From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from mga06.intel.com ([134.134.136.31]:6351 "EHLO mga06.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756607AbcIFO2k
        (ORCPT ); Tue, 6 Sep 2016 10:28:40 -0400
Date: Tue, 6 Sep 2016 10:39:28 -0400
From: Keith Busch <keith.busch@intel.com>
To: Christoph Hellwig
Cc: axboe@fb.com, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org
Subject: Re: [PATCH 4/7] blk-mq: allow the driver to pass in an affinity mask
Message-ID: <20160906143928.GA25201@localhost.localdomain>
References: <1472468013-29936-1-git-send-email-hch@lst.de>
 <1472468013-29936-5-git-send-email-hch@lst.de>
 <20160831163852.GB5598@localhost.localdomain>
 <20160901084624.GC4115@lst.de>
 <20160901142410.GA10903@localhost.localdomain>
 <20160905194759.GA26008@lst.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20160905194759.GA26008@lst.de>
Sender: linux-block-owner@vger.kernel.org
List-Id: linux-block@vger.kernel.org

On Mon, Sep 05, 2016 at 09:48:00PM +0200, Christoph Hellwig wrote:
> On Thu, Sep 01, 2016 at 10:24:10AM -0400, Keith Busch wrote:
> > On Thu, Sep 01, 2016 at 10:46:24AM +0200, Christoph Hellwig wrote:
> > > On Wed, Aug 31, 2016 at 12:38:53PM -0400, Keith Busch wrote:
> > > > This can't be right. We have a single affinity mask for the entire
> > > > set, but what I think we want is one affinity mask for each of the
> > > > nr_io_queues. The irq_create_affinity_mask should then create an array
> > > > of cpumasks based on nr_vecs.
> > >
> > > Nah, this is Thomas' creative abuse of the cpumask type. Every bit set
> > > in the affinity_mask means this is a cpu we allocate a vector / queue to.
> >
> > Yeah, I gathered that's what it was providing, but that's just barely
> > not enough information to do something useful. The CPUs that aren't set
> > have to use a previously assigned vector/queue, but which one?
>
> Always the previous one. Below is a patch to get us back to the
> previous behavior:

No, that's not right. Here's my topology info:

  # numactl --hardware
  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
  node 0 size: 15745 MB
  node 0 free: 15319 MB
  node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
  node 1 size: 16150 MB
  node 1 free: 15758 MB
  node distances:
  node   0   1
    0:  10  21
    1:  21  10

If I have 16 vectors, the affinity_mask generated by what you're doing
looks like 0000ffff, i.e. CPUs 0-15. The first 16 bits are set because
each of those is the first unique CPU, so each gets its own vector, just
as you intended. But if an unset bit simply means "share with the
previous CPU", then all of my thread siblings (CPUs 16-31) end up
sharing with CPU 15. That's awful!

What we want for my CPU topology is CPU 16 paired with CPU 0, 17 paired
with 1, 18 with 2, and so on. You can't convey that information with
this scheme. We need affinity_masks per vector.
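To make the desired mapping concrete, here is a minimal userspace sketch
(illustrative only, not the kernel API): it builds one cpumask per vector
for the 32-CPU / 16-vector topology quoted above, assuming the sibling
stride of 16 taken from the numactl output. It pairs CPU N with CPU N+16
on the same vector, which is exactly the information a single
affinity_mask cannot express:

  /*
   * Sketch: one cpumask per vector for 32 CPUs and 16 vectors.
   * CPU N shares vector N % 16, so hyperthread siblings (stride 16,
   * per the numactl output above) land on the same vector. A real
   * implementation would read the sibling topology from sysfs.
   */
  #include <stdio.h>
  #include <stdint.h>

  #define NR_CPUS 32
  #define NR_VECS 16

  int main(void)
  {
          uint32_t vec_mask[NR_VECS] = { 0 };

          for (int cpu = 0; cpu < NR_CPUS; cpu++)
                  vec_mask[cpu % NR_VECS] |= 1u << cpu;

          /* e.g. vector 0 prints 00010001: CPUs 0 and 16 share it */
          for (int vec = 0; vec < NR_VECS; vec++)
                  printf("vector %2d: cpus %08x\n", vec,
                         (unsigned)vec_mask[vec]);

          return 0;
  }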