From: Hannes Reinecke <hare@suse.de>
To: Kashyap Desai <kashyap.desai@broadcom.com>,
	Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Cc: Christoph Hellwig <hch@lst.de>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	James Bottomley <james.bottomley@hansenpartnership.com>,
	linux-scsi@vger.kernel.org,
	Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>,
	PDL-MPT-FUSIONLINUX <mpt-fusionlinux.pdl@broadcom.com>
Subject: Re: [PATCH 00/10] mpt3sas: full mq support
Date: Wed, 15 Feb 2017 11:05:14 +0100
Message-ID: <e34535df-ab24-bf52-579f-faeca8c9c9d4@suse.de>
In-Reply-To: <01e426acae471cf8e599a5100bc8d409@mail.gmail.com>


On 02/15/2017 10:18 AM, Kashyap Desai wrote:
>>
>>
>> Hannes,
>>
>> The results I posted last time were with merging enabled in the block
>> layer. If I disable merging, I don't see much improvement with
>> multiple hw request queues. Here are the results:
>>
>> fio results when nr_hw_queues=1,
>> 4k read when numjobs=24: io=248387MB, bw=1655.1MB/s, iops=423905,
>> runt=150003msec
>>
>> fio results when nr_hw_queues=24,
>> 4k read when numjobs=24: io=263904MB, bw=1759.4MB/s, iops=450393,
>> runt=150001msec
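For reference, the relative gain between the two runs above can be worked out directly from the reported IOPS. The helper below is a hypothetical illustration, not part of the driver or of fio:

```c
#include <assert.h>

/* Hypothetical helper: relative change, in percent, from one fio IOPS
 * reading to another. Positive means the second run was faster. */
static double pct_change(double before, double after)
{
	assert(before > 0.0);	/* a zero baseline has no meaningful ratio */
	return (after - before) / before * 100.0;
}
```

pct_change(423905, 450393) comes out at roughly 6.2%, i.e. with merges disabled, going from 1 to 24 hardware queues buys only a few percent. Applied to the Ventura numbers quoted further down (~1600K vs ~850K IOPS), the same helper gives about -47%.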
> 
> Hannes -
> 
> I worked with Sreekanth and also understand the pros/cons of patch #10,
> "[PATCH 10/10] mpt3sas: scsi-mq interrupt steering".
> 
> In the above patch, the HBA's can_queue is divided among the logical CPUs,
> i.e. we mimic an mpt3sas HBA with multiqueue support by splitting its
> actual resource, which is a single hardware submission queue. This approach
> badly impacts performance in many areas.
> 
> nr_hw_queues = 1 is what I observe to be the best-performing approach, since
> it never throttles IO if sdev->queue_depth is set to the HBA queue depth.
> With nr_hw_queues = "CPUs", IO is throttled at the SCSI level, since we
> never allow more than the "updated can_queue" in the LLD.
> 
True.
And this was actually one of the things I wanted to demonstrate with
this patchset :-)
ATM blk-mq really works best when having a distinct tag space per
port/device. As soon as the hardware provides a _shared_ tag space you
end up with tag starvation issues, as blk-mq only allows you to do a
static split of the available tag space.
While this patchset demonstrates that the HBA itself _does_ benefit from
using blk-mq (especially on highly parallel loads), it also
demonstrates that _blk-mq_ has issues with single-threaded loads on
this HBA (or, rather, this type of HBA, as I doubt this issue is affecting
mpt3sas only).
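To make the "static split" point concrete, here is a toy model (hypothetical names, not blk-mq code) of a host-wide tag pool divided evenly across hardware queues, compared with an idealized shared pool. A single-threaded load that only ever submits through one hardware queue is capped at that queue's slice:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not blk-mq code: tags available to one hardware queue when
 * the host-wide pool is split evenly and statically. */
static unsigned int per_queue_tags(unsigned int total_tags,
				   unsigned int nr_hw_queues)
{
	assert(nr_hw_queues > 0);
	return total_tags / nr_hw_queues;
}

/* Upper bound on commands in flight when the load submits through
 * `active_queues` hardware queues. With `shared` set, any queue may draw
 * from the whole pool (an idealized assumption); otherwise each queue is
 * limited to its static slice. */
static unsigned int max_inflight(unsigned int total_tags,
				 unsigned int nr_hw_queues,
				 unsigned int active_queues, bool shared)
{
	if (shared)
		return total_tags;
	if (active_queues > nr_hw_queues)
		active_queues = nr_hw_queues;
	return per_queue_tags(total_tags, nr_hw_queues) * active_queues;
}
```

With 4096 tags and 96 queues, max_inflight(4096, 96, 1, false) is 42 while the shared variant allows 4096; even with all 96 queues busy, the static split only reaches 4032 because each slice is rounded down.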

> The code below brings the actual HBA can_queue very low (e.g. on a CPU with
> 96 logical cores the new can_queue goes down to 42 if the HBA queue depth is
> 4K). This means we will see lots of IO throttling in the SCSI mid layer,
> because shost->can_queue reaches its limit very soon if you have <fio> jobs
> with a higher QD.
> 
> 	if (ioc->shost->nr_hw_queues > 1) {
> 		ioc->shost->nr_hw_queues = ioc->msix_vector_count;
> 		ioc->shost->can_queue /= ioc->msix_vector_count;
> 	}
>
> I observe a performance regression with 8 SSD drives attached to Ventura
> (the latest IT controller). 16 fio jobs at QD=128 give ~1600K IOPS, but the
> moment I switch to nr_hw_queues = "CPUs", I get barely ~850K IOPS. This is
> mainly because host_busy is stuck at a very low ~169 on my setup.
> 
Which might actually be an issue with the way SCSI is hooked into blk-mq.
The SCSI stack uses 'can_queue' as the limit for 'host_busy', i.e. to
check whether the host is capable of accepting more commands.
As we're limiting can_queue (to get the per-queue command depth right),
we should instead be using the _overall_ command depth for the can_queue
value itself to make the host_busy check work correctly.

I've attached a patch for that; can you test if it makes a difference?
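The effect described above can be sketched with a toy model (all names hypothetical, not the SCSI midlayer's code): a command must pass both the per-queue tag depth and a host-wide host_busy < can_queue check. Dividing can_queue itself, as patch #10 does, caps the whole host at the per-queue value; keeping can_queue at the overall depth and dividing only the per-queue tag depth, as the attached patch does, lets busy queues add up.

```c
#include <assert.h>

/* Toy model of the two limits a command must pass (hypothetical names). */
struct host_limits {
	unsigned int can_queue;		/* host-wide cap compared against host_busy */
	unsigned int nr_hw_queues;
	unsigned int tags_per_hctx;	/* tag depth of each hardware queue */
};

/* Patch #10 behaviour: can_queue itself is divided per queue. */
static struct host_limits limits_divided(unsigned int depth, unsigned int nq)
{
	struct host_limits l;

	assert(nq > 0);
	l.can_queue = depth / nq;
	l.nr_hw_queues = nq;
	l.tags_per_hctx = depth / nq;
	return l;
}

/* Attached patch: can_queue keeps the overall depth; only tags are split. */
static struct host_limits limits_shared(unsigned int depth, unsigned int nq)
{
	struct host_limits l;

	assert(nq > 0);
	l.can_queue = depth;
	l.nr_hw_queues = nq;
	l.tags_per_hctx = depth / nq;
	return l;
}

/* Highest host_busy reachable when `busy_queues` queues have unlimited demand. */
static unsigned int host_busy_ceiling(const struct host_limits *l,
				      unsigned int busy_queues)
{
	unsigned int by_tags;

	if (busy_queues > l->nr_hw_queues)
		busy_queues = l->nr_hw_queues;
	by_tags = l->tags_per_hctx * busy_queues;
	return by_tags < l->can_queue ? by_tags : l->can_queue;
}
```

With a depth of 4096 and 96 queues, 16 busy queues reach a host_busy of only 42 under limits_divided, but 672 under limits_shared.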

> Maybe, as Sreekanth mentioned, the performance improvement you observed is
> because nomerges=2 is not set, so the OS will attempt soft back/front merges.
>
> I debugged a live machine and found that we never see as many parallel
> instances of "scsi_dispatch_cmd" as we expect, because can_queue is too low.
> If we really had a *very* large HBA QD, patch #10 exposing multiple SQs
> might be useful.
> 
As mentioned, the attached patch might help here.
It actually _reduced_ throughput on my end, as the requests never
stayed long enough in the queue to be merged. Hence I've refrained from
posting it.
But as you're able to test with SSDs, this patch really should make a
difference, and should certainly remove the arbitrary stalls due to
host_busy.

> For now, we are looking for an updated version of the patch which keeps
> the IT HBA in SQ mode only (as we do in the <megaraid_sas> driver) and adds
> an interface to use blk_tag in both scsi.mq and !scsi.mq mode. Sreekanth has
> already started working on it, but we may need a full performance test run
> before posting the actual patch.
> Maybe we can cherry-pick a few patches from this series and add blk_tag
> support to improve <mpt3sas> performance later; that approach would not
> make nr_hw_queues tunable.
> 
Sure, no problem with that.
I'll be preparing another submission round, and we can discuss how we go
from there.

Cheers,

Hannes
> Thanks, Kashyap
> 
> 
>>
>> Thanks,
>> Sreekanth


-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

[-- Attachment #2: 0001-mpt3sas-implement-shared_tags-SCSI-host-flag.patch --]
[-- Type: text/x-patch; name="0001-mpt3sas-implement-shared_tags-SCSI-host-flag.patch", Size: 3253 bytes --]

From df424c8618e0b06ded2d978818e6d3df4a54a61d Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Wed, 15 Feb 2017 10:58:01 +0100
Subject: [PATCH] mpt3sas: implement 'shared_tags' SCSI host flag

If the HBA implements a host-wide tagspace we should be signalling
this to the SCSI layer to avoid 'can_queue' being reduced, thereby
inducing I/O stalls.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/mpt3sas/mpt3sas_base.c  | 5 ++---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 ++
 drivers/scsi/scsi_lib.c              | 2 ++
 include/scsi/scsi_host.h             | 5 +++++
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 9e31cae..520aee4 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3544,10 +3544,9 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc,
 	 */
 	ioc->shost->reserved_cmds = INTERNAL_SCSIIO_CMDS_COUNT;
 	ioc->shost->can_queue = ioc->scsiio_depth - ioc->shost->reserved_cmds;
-	if (ioc->shost->nr_hw_queues > 1) {
+	if (ioc->shost->nr_hw_queues > 1)
 		ioc->shost->nr_hw_queues = ioc->msix_vector_count;
-		ioc->shost->can_queue /= ioc->msix_vector_count;
-	}
+
 	dinitprintk(ioc, pr_info(MPT3SAS_FMT
 		"scsi host: can_queue depth (%d), nr_hw_queues (%d)\n",
 		ioc->name, ioc->shost->can_queue, ioc->shost->nr_hw_queues));
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 14f7a9d..4088e1a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -8585,6 +8585,7 @@ static int scsih_map_queues(struct Scsi_Host *shost)
 	.shost_attrs			= mpt3sas_host_attrs,
 	.sdev_attrs			= mpt3sas_dev_attrs,
 	.track_queue_depth		= 1,
+	.shared_tags			= 1,
 	.cmd_size			= sizeof(struct scsiio_tracker),
 };
 
@@ -8624,6 +8625,7 @@ static int scsih_map_queues(struct Scsi_Host *shost)
 	.shost_attrs			= mpt3sas_host_attrs,
 	.sdev_attrs			= mpt3sas_dev_attrs,
 	.track_queue_depth		= 1,
+	.shared_tags			= 1,
 	.cmd_size			= sizeof(struct scsiio_tracker),
 };
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7100aaa..6bb06ed 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2143,6 +2143,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.ops = &scsi_mq_ops;
 	shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1;
 	shost->tag_set.queue_depth = shost->can_queue;
+	if (shost->hostt->shared_tags)
+		shost->tag_set.queue_depth /= shost->tag_set.nr_hw_queues;
 	shost->tag_set.reserved_tags = shost->reserved_cmds;
 	shost->tag_set.cmd_size = cmd_size;
 	shost->tag_set.numa_node = NUMA_NO_NODE;
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index cc83dd6..d344803 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -457,6 +457,11 @@ struct scsi_host_template {
 	unsigned no_async_abort:1;
 
 	/*
+	 * True if the host uses a shared tag space
+	 */
+	unsigned shared_tags:1;
+
+	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
 	unsigned int max_host_blocked;
-- 
1.8.5.6
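The scsi_lib.c hunk above boils down to a small computation. The following user-space sketch closely mirrors it using hypothetical stand-in structs (not the kernel's struct Scsi_Host or struct blk_mq_tag_set), so the resulting per-queue depth can be checked for a few configurations:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fields the hunk reads and writes. */
struct shost_model {
	unsigned int can_queue;
	unsigned int nr_hw_queues;	/* 0 means "not set by the driver" */
	bool shared_tags;
};

struct tag_set_model {
	unsigned int nr_hw_queues;
	unsigned int queue_depth;
};

static struct tag_set_model setup_tags_model(const struct shost_model *sh)
{
	struct tag_set_model ts;

	/* Same fallback as the hunk: an unset queue count means one queue. */
	ts.nr_hw_queues = sh->nr_hw_queues ? sh->nr_hw_queues : 1;
	ts.queue_depth = sh->can_queue;
	/* Host-wide tag space: give each hardware queue an equal slice.
	 * Dividing by the sanitized count avoids a division by zero. */
	if (sh->shared_tags)
		ts.queue_depth /= ts.nr_hw_queues;
	return ts;
}
```

A host with can_queue 4096, 96 queues and shared_tags set gets a per-queue depth of 42, while can_queue itself (and thus the host_busy limit) stays at 4096. Dividing by the sanitized tag_set count rather than the raw shost->nr_hw_queues is a deliberate choice in this sketch: it keeps the computation safe for a driver that sets shared_tags but leaves nr_hw_queues at 0.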

