From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from verein.lst.de ([213.95.11.211]:40578 "EHLO newverein.lst.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933172AbcIFQu6 (ORCPT ); Tue, 6 Sep 2016 12:50:58 -0400
Date: Tue, 6 Sep 2016 18:50:56 +0200
From: Christoph Hellwig 
To: Keith Busch 
Cc: axboe@fb.com, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org, Thomas Gleixner 
Subject: Re: [PATCH 4/7] blk-mq: allow the driver to pass in an affinity mask
Message-ID: <20160906165056.GB26214@lst.de>
References: <1472468013-29936-1-git-send-email-hch@lst.de>
	<1472468013-29936-5-git-send-email-hch@lst.de>
	<20160831163852.GB5598@localhost.localdomain>
	<20160901084624.GC4115@lst.de>
	<20160901142410.GA10903@localhost.localdomain>
	<20160905194759.GA26008@lst.de>
	<20160906143928.GA25201@localhost.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20160906143928.GA25201@localhost.localdomain>
Sender: linux-block-owner@vger.kernel.org
List-Id: linux-block@vger.kernel.org

[adding Thomas as it's about the affinity_mask he (we) added to the
 IRQ core]

On Tue, Sep 06, 2016 at 10:39:28AM -0400, Keith Busch wrote:
> > Always the previous one.  Below is a patch to get us back to the
> > previous behavior:
> 
> No, that's not right.
> 
> Here's my topology info:
> 
> # numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> node 0 size: 15745 MB
> node 0 free: 15319 MB
> node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> node 1 size: 16150 MB
> node 1 free: 15758 MB
> node distances:
> node   0   1
>   0:  10  21
>   1:  21  10

How do you get that mapping?  Does this CPU use Hyperthreading and thus
expose siblings using topology_sibling_cpumask?  That's the only thing
the old code used for any sort of special casing.  I'll need to see if
I can find a system with such a mapping to reproduce.

> If I have 16 vectors, the affinity_mask generated by what you're doing
> looks like 0000ffff, CPUs 0-15.  So the first 16 bits are set since each
> of those is the first unique CPU, getting a unique vector just like you
> wanted.  If an unset bit just means "share with the previous one", then
> all of my thread siblings (CPUs 16-31) get to share a vector with CPU 15.
> That's awful!
> 
> What we want for my CPU topology is the 16th CPU to pair with CPU 0,
> 17 to pair with 1, 18 with 2, and so on.  You can't convey that
> information with this scheme.  We need affinity_masks per vector.

We actually have per-vector masks, but they are hidden inside the IRQ
core and awkward to use.

We could do the get_first_sibling magic in the blk-mq queue mapping
(and in fact with the current code I guess we need to).

Or we could take a step back from trying to emulate the old code and
look at NUMA nodes instead of siblings, which some folks suggested a
while ago.
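
To make that pairing concrete, here is a quick userspace model of the
get_first_sibling idea (just a sketch, not the old blk-mq code; it
hard-codes the assumption from your numactl output that CPU i and
CPU i + 16 are thread siblings): collapsing every CPU onto its first
thread sibling before handing out queues puts CPU 16 on the same queue
as CPU 0, CPU 17 on the same queue as CPU 1, and so on, which is
exactly the pairing a single per-device affinity mask cannot express.

#include <stdio.h>

#define NR_CPUS		32
#define NR_QUEUES	16

/*
 * Assumed topology (taken from the numactl output above): CPU i and
 * CPU i + 16 are thread siblings, so the first sibling of any CPU is
 * simply cpu % 16.
 */
static int first_sibling(int cpu)
{
	return cpu % (NR_CPUS / 2);
}

int main(void)
{
	int map[NR_CPUS];
	int cpu, queue = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		int first = first_sibling(cpu);

		if (first == cpu)
			map[cpu] = queue++ % NR_QUEUES;	/* first thread of a core gets its own queue */
		else
			map[cpu] = map[first];		/* later siblings share the first sibling's queue */
	}

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu %2d -> queue %2d\n", cpu, map[cpu]);
	return 0;
}

In the kernel the first_sibling() helper would presumably be
cpumask_first(topology_sibling_cpumask(cpu)) instead of a hard-coded
modulo, and that collapse could live either in the blk-mq queue mapping
or in whatever ends up spreading the per-vector masks.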