* [PATCH v3] lpfc: Mitigate high memory pre-allocation by SCSI-MQ
@ 2019-08-16 2:36 James Smart
2019-08-16 16:36 ` Ewan D. Milne
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: James Smart @ 2019-08-16 2:36 UTC (permalink / raw)
To: linux-scsi; +Cc: James Smart, Dick Kennedy
When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of
MQ resources based on shost values set by the driver. In newer versions
of the driver, which attempt to set nr_hw_queues to the cpu count, the
multipliers become excessive, with a single shost having SCSI-MQ
pre-allocation reaching into the multiple-GByte range. NPIV, which
creates additional shosts, only multiplies this overhead. On lower-memory
systems, this can exhaust system memory very quickly, resulting in a
system crash or failures in the driver or elsewhere due to low-memory
conditions.
After testing several scenarios, the situation can be mitigated by
limiting the value set in shost->nr_hw_queues to 4. Although the shost
values were changed, the driver still has per-cpu hardware queues of
its own that allow parallelization per cpu. Testing revealed that
even with the small nr_hw_queues value for SCSI-MQ, performance
levels remained near maximum with the within-driver affinitization.
A module parameter was created to allow the value set for
nr_hw_queues to be tunable.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
v3: add Ming's reviewed-by tag
---
drivers/scsi/lpfc/lpfc.h | 1 +
drivers/scsi/lpfc/lpfc_attr.c | 15 +++++++++++++++
drivers/scsi/lpfc/lpfc_init.c | 10 ++++++----
drivers/scsi/lpfc/lpfc_sli4.h | 5 +++++
4 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 2c3bb8a966e5..bade2e025ecf 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -824,6 +824,7 @@ struct lpfc_hba {
uint32_t cfg_cq_poll_threshold;
uint32_t cfg_cq_max_proc_limit;
uint32_t cfg_fcp_cpu_map;
+ uint32_t cfg_fcp_mq_threshold;
uint32_t cfg_hdw_queue;
uint32_t cfg_irq_chann;
uint32_t cfg_suppress_rsp;
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index ea62322ffe2b..8d8c495b5b60 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -5709,6 +5709,19 @@ LPFC_ATTR_RW(nvme_embed_cmd, 1, 0, 2,
"Embed NVME Command in WQE");
/*
+ * lpfc_fcp_mq_threshold: Set the maximum number of Hardware Queues
+ * the driver will advertise it supports to the SCSI layer.
+ *
+ * 0 = Set nr_hw_queues by the number of CPUs or HW queues.
+ * 1,128 = Manually specify the maximum nr_hw_queue value to be set,
+ *
+ * Value range is [0,128]. Default value is 8.
+ */
+LPFC_ATTR_R(fcp_mq_threshold, LPFC_FCP_MQ_THRESHOLD_DEF,
+ LPFC_FCP_MQ_THRESHOLD_MIN, LPFC_FCP_MQ_THRESHOLD_MAX,
+ "Set the number of SCSI Queues advertised");
+
+/*
* lpfc_hdw_queue: Set the number of Hardware Queues the driver
* will advertise it supports to the NVME and SCSI layers. This also
* will map to the number of CQ/WQ pairs the driver will create.
@@ -6030,6 +6043,7 @@ struct device_attribute *lpfc_hba_attrs[] = {
&dev_attr_lpfc_cq_poll_threshold,
&dev_attr_lpfc_cq_max_proc_limit,
&dev_attr_lpfc_fcp_cpu_map,
+ &dev_attr_lpfc_fcp_mq_threshold,
&dev_attr_lpfc_hdw_queue,
&dev_attr_lpfc_irq_chann,
&dev_attr_lpfc_suppress_rsp,
@@ -7112,6 +7126,7 @@ lpfc_get_cfgparam(struct lpfc_hba *phba)
/* Initialize first burst. Target vs Initiator are different. */
lpfc_nvme_enable_fb_init(phba, lpfc_nvme_enable_fb);
lpfc_nvmet_fb_size_init(phba, lpfc_nvmet_fb_size);
+ lpfc_fcp_mq_threshold_init(phba, lpfc_fcp_mq_threshold);
lpfc_hdw_queue_init(phba, lpfc_hdw_queue);
lpfc_irq_chann_init(phba, lpfc_irq_chann);
lpfc_enable_bbcr_init(phba, lpfc_enable_bbcr);
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index faf43b1d3dbe..03998579d6ee 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -4309,10 +4309,12 @@ lpfc_create_port(struct lpfc_hba *phba, int instance, struct device *dev)
shost->max_cmd_len = 16;
if (phba->sli_rev == LPFC_SLI_REV4) {
- if (phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ)
- shost->nr_hw_queues = phba->cfg_hdw_queue;
- else
- shost->nr_hw_queues = phba->sli4_hba.num_present_cpu;
+ if (!phba->cfg_fcp_mq_threshold ||
+ phba->cfg_fcp_mq_threshold > phba->cfg_hdw_queue)
+ phba->cfg_fcp_mq_threshold = phba->cfg_hdw_queue;
+
+ shost->nr_hw_queues = min_t(int, 2 * num_possible_nodes(),
+ phba->cfg_fcp_mq_threshold);
shost->dma_boundary =
phba->sli4_hba.pc_sli4_params.sge_supp_len-1;
diff --git a/drivers/scsi/lpfc/lpfc_sli4.h b/drivers/scsi/lpfc/lpfc_sli4.h
index 3aeca387b22a..329f7aa7e169 100644
--- a/drivers/scsi/lpfc/lpfc_sli4.h
+++ b/drivers/scsi/lpfc/lpfc_sli4.h
@@ -44,6 +44,11 @@
#define LPFC_HBA_HDWQ_MAX 128
#define LPFC_HBA_HDWQ_DEF 0
+/* FCP MQ queue count limiting */
+#define LPFC_FCP_MQ_THRESHOLD_MIN 0
+#define LPFC_FCP_MQ_THRESHOLD_MAX 128
+#define LPFC_FCP_MQ_THRESHOLD_DEF 8
+
/* Common buffer size to accomidate SCSI and NVME IO buffers */
#define LPFC_COMMON_IO_BUF_SZ 768
--
2.13.7
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v3] lpfc: Mitigate high memory pre-allocation by SCSI-MQ
2019-08-16 2:36 [PATCH v3] lpfc: Mitigate high memory pre-allocation by SCSI-MQ James Smart
@ 2019-08-16 16:36 ` Ewan D. Milne
2019-08-20 2:15 ` Martin K. Petersen
2019-08-26 7:18 ` Hannes Reinecke
2 siblings, 0 replies; 5+ messages in thread
From: Ewan D. Milne @ 2019-08-16 16:36 UTC (permalink / raw)
To: James Smart, linux-scsi; +Cc: Dick Kennedy
On Thu, 2019-08-15 at 19:36 -0700, James Smart wrote:
> When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of
> MQ resources based on shost values set by the driver. In newer cases
> of the driver, which attempts to set nr_hw_queues to the cpu count,
> the multipliers become excessive, with a single shost having SCSI-MQ
> pre-allocation reaching into the multiple GBytes range. NPIV, which
> creates additional shosts, only multiply this overhead. On lower-memory
> systems, this can exhaust system memory very quickly, resulting in a
> system crash or failures in the driver or elsewhere due to low memory
> conditions.
>
> [...]
Looks good.
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
* Re: [PATCH v3] lpfc: Mitigate high memory pre-allocation by SCSI-MQ
2019-08-16 2:36 [PATCH v3] lpfc: Mitigate high memory pre-allocation by SCSI-MQ James Smart
2019-08-16 16:36 ` Ewan D. Milne
@ 2019-08-20 2:15 ` Martin K. Petersen
2019-08-26 7:18 ` Hannes Reinecke
2 siblings, 0 replies; 5+ messages in thread
From: Martin K. Petersen @ 2019-08-20 2:15 UTC (permalink / raw)
To: James Smart; +Cc: linux-scsi, Dick Kennedy
James,
> When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of
> MQ resources based on shost values set by the driver. In newer cases
> of the driver, which attempts to set nr_hw_queues to the cpu count,
> the multipliers become excessive, with a single shost having SCSI-MQ
> pre-allocation reaching into the multiple GBytes range. NPIV, which
> creates additional shosts, only multiply this overhead. On lower-memory
> systems, this can exhaust system memory very quickly, resulting in a
> system crash or failures in the driver or elsewhere due to low memory
> conditions.
Applied to 5.3/scsi-fixes. Thanks!
--
Martin K. Petersen Oracle Linux Engineering
* Re: [PATCH v3] lpfc: Mitigate high memory pre-allocation by SCSI-MQ
2019-08-16 2:36 [PATCH v3] lpfc: Mitigate high memory pre-allocation by SCSI-MQ James Smart
2019-08-16 16:36 ` Ewan D. Milne
2019-08-20 2:15 ` Martin K. Petersen
@ 2019-08-26 7:18 ` Hannes Reinecke
2019-08-26 16:23 ` James Smart
2 siblings, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2019-08-26 7:18 UTC (permalink / raw)
To: James Smart, linux-scsi; +Cc: Dick Kennedy
On 8/16/19 4:36 AM, James Smart wrote:
> When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of
> MQ resources based on shost values set by the driver. In newer cases
> of the driver, which attempts to set nr_hw_queues to the cpu count,
> the multipliers become excessive, with a single shost having SCSI-MQ
> pre-allocation reaching into the multiple GBytes range. NPIV, which
> creates additional shosts, only multiply this overhead. On lower-memory
> systems, this can exhaust system memory very quickly, resulting in a
> system crash or failures in the driver or elsewhere due to low memory
> conditions.
>
> After testing several scenarios, the situation can be mitigated by
> limiting the value set in shost->nr_hw_queues to 4. Although the shost
> values were changed, the driver still had per-cpu hardware queues of
> its own that allowed parallelization per-cpu. Testing revealed that
> even with the smallish number for nr_hw_queues for SCSI-MQ, performance
> levels remained near maximum with the within-driver affiinitization.
>
> [...]
Well, that doesn't actually match my measurements (where I've seen
max I/O performance at about 16 queues); so I guess this is pretty much
setup-specific.
However, I'm somewhat loath to have a cap at 128; we actually have
several machines where we'll have more CPUs than that.
Can't we increase the cap to 512 to give us a bit more leeway during
testing?
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer
* Re: [PATCH v3] lpfc: Mitigate high memory pre-allocation by SCSI-MQ
2019-08-26 7:18 ` Hannes Reinecke
@ 2019-08-26 16:23 ` James Smart
0 siblings, 0 replies; 5+ messages in thread
From: James Smart @ 2019-08-26 16:23 UTC (permalink / raw)
To: Hannes Reinecke, linux-scsi; +Cc: Dick Kennedy, Sagi Grimberg, James Smart
On 8/26/2019 12:18 AM, Hannes Reinecke wrote:
> On 8/16/19 4:36 AM, James Smart wrote:
>> [...]
> Well, that doesn't actually match with my measurements (where I've seen
> max I/O performance at about 16 queues); so I guess this is pretty much
> setup-specific.
Keep in mind, when we ran our benchmarks, the driver was still using
per-cpu hdwq's selected by cpu #.
> However, I'm somewhat loath to have a cap at 128; we actually have
> several machines where we'll be having more CPUs than that.
> Can't we increase the cap to 512 to give us a bit more leeway during
> testing?
I'm fine if you want me to raise the max for the attribute. Keep in
mind, if set to 0, nr_hw_queues can go above 128, to whatever the cpu
count is, assuming that is greater than 128.
-- james