From: Keith Busch <keith.busch@intel.com>
To: Christoph Hellwig <hch@lst.de>
Cc: axboe@fb.com, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org
Subject: Re: [PATCH 4/7] blk-mq: allow the driver to pass in an affinity mask
Date: Tue, 6 Sep 2016 10:39:28 -0400
Message-ID: <20160906143928.GA25201@localhost.localdomain>
In-Reply-To: <20160905194759.GA26008@lst.de>

On Mon, Sep 05, 2016 at 09:48:00PM +0200, Christoph Hellwig wrote:
> On Thu, Sep 01, 2016 at 10:24:10AM -0400, Keith Busch wrote:
> > On Thu, Sep 01, 2016 at 10:46:24AM +0200, Christoph Hellwig wrote:
> > > On Wed, Aug 31, 2016 at 12:38:53PM -0400, Keith Busch wrote:
> > > > This can't be right. We have a single affinity mask for the entire
> > > > set, but what I think we want is one affinity mask for each of the
> > > > nr_io_queues. The irq_create_affinity_mask should then create an array
> > > > of cpumasks based on nr_vecs.
> > > 
> > > Nah, this is Thomas' creative abuse of the cpumask type.  Every bit set
> > > in the affinity_mask means this is a cpu we allocate a vector / queue to.
> > 
> > Yeah, I gathered that's what it was providing, but that's just barely
> > not enough information to do something useful. The CPUs that aren't set
> > have to use a previously assigned vector/queue, but which one?
> 
> Always the previous one.  Below is a patch to get us back to the
> previous behavior:

No, that's not right.

Here's my topology info:

  # numactl --hardware
  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
  node 0 size: 15745 MB
  node 0 free: 15319 MB
  node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
  node 1 size: 16150 MB
  node 1 free: 15758 MB
  node distances:
  node   0   1
    0:  10  21
    1:  21  10

If I have 16 vectors, the affinity_mask generated by what you're doing
looks like 0000ffff, i.e. CPUs 0-15. The first 16 bits are set because
each of those CPUs is the first one assigned to its vector, so each gets
a unique vector just like you wanted. But if an unset bit just means
"share with the previous one", then all of my thread siblings (CPUs
16-31) end up sharing with CPU 15. That's awful!
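
To make that concrete, here's a quick userspace sketch (not kernel
code; build_map() is just a made-up helper for illustration) of how the
"unset bit shares the previous vector" rule plays out on a 0000ffff
mask with the topology above:

  #include <stdio.h>

  #define NR_CPUS 32

  /* Set bit: the CPU gets the next new vector.
   * Unset bit: the CPU shares whatever vector was handed out last. */
  static void build_map(unsigned long affinity_mask, int map[NR_CPUS])
  {
      int cpu, vec = -1;

      for (cpu = 0; cpu < NR_CPUS; cpu++) {
          if (affinity_mask & (1UL << cpu))
              vec++;
          map[cpu] = vec;
      }
  }

  int main(void)
  {
      int map[NR_CPUS], cpu;

      build_map(0x0000ffffUL, map);  /* 16 vectors, bits 0-15 set */
      for (cpu = 0; cpu < NR_CPUS; cpu++)
          printf("cpu %2d -> vector %d\n", cpu, map[cpu]);
      /* cpus 16-31 all print "vector 15": every sibling piles onto one queue */
      return 0;
  }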

What we want for my CPU topology is CPU 16 paired with CPU 0, 17 with
1, 18 with 2, and so on. You can't convey that information with this
scheme. We need an affinity_mask per vector.
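
For the record, roughly what I mean by per-vector masks on this box
(plain bitmaps rather than the kernel's struct cpumask, purely
illustrative; the sibling offset of 16 matches my topology above):

  #include <stdio.h>

  #define NR_VECS        16
  #define SIBLING_OFFSET 16  /* sibling of cpu n is cpu n+16 here */

  int main(void)
  {
      unsigned long vec_mask[NR_VECS];
      int v;

      /* One mask per vector: vector v covers cpu v and its thread sibling. */
      for (v = 0; v < NR_VECS; v++) {
          vec_mask[v] = (1UL << v) | (1UL << (v + SIBLING_OFFSET));
          printf("vector %2d -> cpus %2d,%2d (mask %08lx)\n",
                 v, v, v + SIBLING_OFFSET, vec_mask[v]);
      }
      return 0;
  }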
