From: Ming Lei <ming.lei@redhat.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Mike Snitzer <snitzer@redhat.com>,
	linux-scsi@vger.kernel.org, Arun Easi <arun.easi@cavium.com>,
	Omar Sandoval <osandov@fb.com>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	James Bottomley <james.bottomley@hansenpartnership.com>,
	Don Brace <don.brace@microsemi.com>,
	Kashyap Desai <kashyap.desai@broadcom.com>,
	Peter Rivera <peter.rivera@broadcom.com>,
	Laurence Oberman <loberman@redhat.com>,
	Meelis Roos <mroos@linux.ee>
Subject: Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
Date: Thu, 8 Mar 2018 17:19:28 +0800	[thread overview]
Message-ID: <20180308091926.GB24816@ming.t460p> (raw)
In-Reply-To: <ad59f1bc-5db6-1f27-9924-b043b76d5257@suse.de>

On Thu, Mar 08, 2018 at 09:41:16AM +0100, Hannes Reinecke wrote:
> On 03/08/2018 09:15 AM, Ming Lei wrote:
> > On Thu, Mar 08, 2018 at 08:50:35AM +0100, Christoph Hellwig wrote:
> >>> +static void hpsa_setup_reply_map(struct ctlr_info *h)
> >>> +{
> >>> +	const struct cpumask *mask;
> >>> +	unsigned int queue, cpu;
> >>> +
> >>> +	for (queue = 0; queue < h->msix_vectors; queue++) {
> >>> +		mask = pci_irq_get_affinity(h->pdev, queue);
> >>> +		if (!mask)
> >>> +			goto fallback;
> >>> +
> >>> +		for_each_cpu(cpu, mask)
> >>> +			h->reply_map[cpu] = queue;
> >>> +	}
> >>> +	return;
> >>> +
> >>> +fallback:
> >>> +	for_each_possible_cpu(cpu)
> >>> +		h->reply_map[cpu] = 0;
> >>> +}
> >>
> >> It seems a little annoying that we have to duplicate this in the driver.
> >> Wouldn't this be solved by your force_blk_mq flag and relying on the
> >> hw_ctx id?
> > 
> > This issue can be solved by force_blk_mq, but that may cause a performance
> > regression for host-wide tagset drivers:
> > 
> > - If the whole tagset is partitioned across the hw queues, each hw queue's
> > depth may not be high enough, and SCSI's IO path may not be efficient
> > enough. Even if we keep each queue's depth at 256, which should be high
> > enough to exploit parallelism from the device's internal point of view, we
> > still can't get good performance.
> > 
> > - If the whole tagset is still shared among all hw queues, the shared
> > tags can be accessed from all CPUs, and IOPS is degraded.
> > 
> > Kashyap has tested both of the above approaches; both hurt IOPS on megaraid_sas.
> > 
> This is precisely the issue I have been worried about, too.
> 
> The problem is not so much the tagspace (which actually is quite small
> memory footprint-wise), but rather the _requests_ indexed by the tags.

But V1 is already done in that way: one shared tag set is used, and requests
are allocated per hw queue with NUMA locality. Kashyap confirmed that IOPS
recovers to normal once iostats is set to 0 with V1 applied:

	https://marc.info/?l=linux-scsi&m=151815231026789&w=2

That means the shared tags do have a big effect on performance.
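
As an aside on the quoted hpsa hunk: the reply_map built there is just a
per-CPU lookup table, and a minimal sketch of how the submission path would
consult it looks like the following (the helper and the command-header field
names here are illustrative assumptions, not copied from the actual patch):

	static void hpsa_pick_reply_queue(struct ctlr_info *h,
					  struct CommandList *c)
	{
		unsigned int cpu = raw_smp_processor_id();

		/* route the completion to the reply queue whose IRQ
		 * affinity covers the submitting CPU */
		c->Header.ReplyQueue = h->reply_map[cpu];
	}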

> 
> We have this:
> 
> struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>                                         unsigned int hctx_idx,
>                                         unsigned int nr_tags,
>                                         unsigned int reserved_tags)
> {
>         struct blk_mq_tags *tags;
>         int node;
> 
>         node = blk_mq_hw_queue_to_node(set->mq_map, hctx_idx);
>         if (node == NUMA_NO_NODE)
>                 node = set->numa_node;
> 
>         tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
>                      BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
>         if (!tags)
>                 return NULL;
> 
>         tags->rqs = kzalloc_node(nr_tags * sizeof(struct request *),
>                       GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY, node);
> 
> 
> I.e. the _entire_ request set is allocated as _one_ array, making it quite
> hard to handle from the lower-level CPU caches.
> Also the 'node' indicator doesn't really help us here, as the requests
> have to be accessed by all CPUs in the shared tag case.
> 
> Would it be possible to move tags->rqs to become a _double_ pointer?
> Then we would have only a shared lookup table, but the requests
> themselves can be allocated per node, depending on the CPU map.
> _And_ it should be easier on the CPU cache ...

That is basically the same as the approach in V1, and even similar to V3,
which introduces a per-node hw queue; in Kashyap's test the performance of
that isn't bad. I believe IOPS can ultimately be improved further once the
scsi_host->host_busy accounting is removed from the IO path and the
megaraid_sas driver is improved, as I mentioned earlier.
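
For illustration only, a rough sketch of that per-node idea (keeping one
shared tags->rqs[] lookup table while backing the requests with per-node
memory) could look like the hypothetical helper below; the function name,
the tag-to-node mapping and the error handling are all assumptions, not
code from V1 or V3:

	static int shared_tags_alloc_rqs(struct blk_mq_tags *tags,
					 struct blk_mq_tag_set *set,
					 unsigned int nr_tags)
	{
		unsigned int i;

		/* one shared lookup table, still indexed by tag */
		tags->rqs = kcalloc(nr_tags, sizeof(struct request *),
				    GFP_KERNEL);
		if (!tags->rqs)
			return -ENOMEM;

		for (i = 0; i < nr_tags; i++) {
			/* spread requests over nodes, e.g. round-robin;
			 * a real version would follow the CPU map */
			int node = i % num_online_nodes();

			tags->rqs[i] = kzalloc_node(sizeof(struct request) +
						    set->cmd_size,
						    GFP_KERNEL, node);
			if (!tags->rqs[i])
				return -ENOMEM;	/* caller unwinds in this sketch */
		}
		return 0;
	}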

Thanks,
Ming


Thread overview: 54+ messages
2018-02-27 10:07 [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Ming Lei
2018-02-27 10:07 ` [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue Ming Lei
2018-03-01 16:18   ` Don Brace
2018-03-01 19:01     ` Laurence Oberman
2018-03-01 21:19       ` Laurence Oberman
2018-03-02  2:16         ` Ming Lei
2018-03-02 14:09           ` Laurence Oberman
2018-03-02 15:03             ` Don Brace
2018-03-02 21:53               ` Laurence Oberman
2018-03-05  2:07                 ` Ming Lei
2018-03-06 17:55                   ` Martin K. Petersen
2018-03-06 19:24                   ` Martin K. Petersen
2018-03-07  0:00                     ` Ming Lei
2018-03-07  3:14                       ` Martin K. Petersen
2018-03-07 14:11                     ` Laurence Oberman
2018-03-08 13:42                       ` Ming Lei
2018-03-08 20:56                         ` Laurence Oberman
2018-03-05  7:23                 ` Kashyap Desai
2018-03-05 14:35                   ` Don Brace
2018-03-05 15:19                   ` Mike Snitzer
2018-03-02  0:47     ` Ming Lei
2018-03-08  7:50   ` Christoph Hellwig
2018-03-08  8:15     ` Ming Lei
2018-03-08  8:41       ` Hannes Reinecke
2018-03-08  9:19         ` Ming Lei [this message]
2018-03-08 15:31         ` Bart Van Assche
2018-02-27 10:07 ` [PATCH V3 2/8] scsi: megaraid_sas: " Ming Lei
2018-02-27 10:07 ` [PATCH V3 3/8] blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags' Ming Lei
2018-03-08  7:51   ` Christoph Hellwig
2018-02-27 10:07 ` [PATCH V3 4/8] blk-mq: introduce BLK_MQ_F_HOST_TAGS Ming Lei
2018-03-08  7:52   ` Christoph Hellwig
2018-03-08  9:35     ` Ming Lei
2018-02-27 10:07 ` [PATCH V3 5/8] scsi: Add template flag 'host_tagset' Ming Lei
2018-02-27 10:07 ` [PATCH V3 6/8] block: null_blk: introduce module parameter of 'g_host_tags' Ming Lei
2018-02-27 10:07 ` [PATCH V3 7/8] scsi: hpsa: improve scsi_mq performance via .host_tagset Ming Lei
2018-03-08  7:54   ` Christoph Hellwig
2018-03-08 10:59     ` Ming Lei
2018-02-27 10:07 ` [PATCH V3 8/8] scsi: megaraid: " Ming Lei
2018-02-28 14:58   ` Kashyap Desai
2018-02-28 15:21     ` Ming Lei
2018-02-28 16:22       ` Laurence Oberman
2018-03-01  5:24         ` Kashyap Desai
2018-03-01  7:58           ` Ming Lei
2018-03-07  5:27     ` Ming Lei
2018-03-07 15:01       ` Kashyap Desai
2018-03-07 16:05         ` Ming Lei
2018-03-07 17:28           ` Kashyap Desai
2018-03-08  1:15             ` Ming Lei
2018-03-08 10:04               ` Kashyap Desai
2018-03-08 11:06                 ` Ming Lei
2018-03-08 11:23                   ` Ming Lei
2018-03-09  6:56                     ` Kashyap Desai
2018-03-09  8:13                       ` Ming Lei
2018-03-01 21:46 ` [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Laurence Oberman
