* Re: [PATCH v1] scsi: storvsc: Parameterize nr_hw_queues
[not found] <CAAKRu_aWxovgFagWgY-UDDYb-24ca7yo=C461LXq_2iGt3XFqA@mail.gmail.com>
@ 2021-02-03 1:09 ` Andres Freund
2021-02-05 21:24 ` [PATCH v2 0/2] " Melanie Plageman (Microsoft)
0 siblings, 1 reply; 9+ messages in thread
From: Andres Freund @ 2021-02-03 1:09 UTC (permalink / raw)
To: Melanie Plageman; +Cc: linux-hyperv, kys, haiyangz, sthemmin, sashal
Hi,
On 2021-02-02 15:19:08 -0500, Melanie Plageman wrote:
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 2e4fa77445fd..d72ab6daf9ae 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -378,10 +378,14 @@ static u32 max_outstanding_req_per_channel;
> static int storvsc_change_queue_depth(struct scsi_device *sdev, int queue_depth);
>
> static int storvsc_vcpus_per_sub_channel = 4;
> +static int storvsc_nr_hw_queues = -1;
>
> module_param(storvsc_ringbuffer_size, int, S_IRUGO);
> MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
>
> +module_param(storvsc_nr_hw_queues, int, S_IRUGO);
> +MODULE_PARM_DESC(storvsc_nr_hw_queues, "Number of hardware queues");
Would be nice if the parameter could be changed in a running
system. Obviously that's only going to affect devices that'll be
attached in the future (or detached and reattached), but that's still
nicer than not being able to change at all.
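I.e. something like

	module_param(storvsc_nr_hw_queues, int, S_IRUGO|S_IWUSR);

(untested) should be enough to make it show up writable under
/sys/module/hv_storvsc/parameters/storvsc_nr_hw_queues.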
> @@ -2155,6 +2163,7 @@ static struct fc_function_template fc_transport_functions = {
> static int __init storvsc_drv_init(void)
> {
> int ret;
> + int ncpus = num_present_cpus();
>
> /*
> * Divide the ring buffer data size (which is 1 page less
> @@ -2169,6 +2178,14 @@ static int __init storvsc_drv_init(void)
> vmscsi_size_delta,
> sizeof(u64)));
>
> + if (storvsc_nr_hw_queues > ncpus || storvsc_nr_hw_queues == 0 ||
> + storvsc_nr_hw_queues < -1) {
> + printk(KERN_ERR "storvsc: Invalid storvsc_nr_hw_queues value of %d. "
> + "Valid values include -1 and 1-%d.\n",
> + storvsc_nr_hw_queues, ncpus);
> + return -EINVAL;
> + }
> +
I have a mild preference for just capping the effective value to
num_present_cpus(), rather than erroring out. Would allow you to
e.g. cap the number of queues to 4, with the same param specified on
smaller and bigger systems. Perhaps renaming it to max_hw_queues would
make that more obvious?
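Roughly (untested, and using the suggested rename):

	if (!dev_is_ide) {
		int hw_queues = num_present_cpus();

		if (storvsc_max_hw_queues > 0)
			hw_queues = min(hw_queues, storvsc_max_hw_queues);

		host->nr_hw_queues = hw_queues;
	}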
Greetings,
Andres Freund
* [PATCH v2 0/2] scsi: storvsc: Parameterize nr_hw_queues
2021-02-03 1:09 ` [PATCH v1] scsi: storvsc: Parameterize nr_hw_queues Andres Freund
@ 2021-02-05 21:24 ` Melanie Plageman (Microsoft)
2021-02-05 21:24 ` [PATCH 1/2] scsi: storvsc: Parameterize number hardware queues Melanie Plageman (Microsoft)
2021-02-05 21:24 ` [PATCH 2/2] scsi: storvsc: Make max hw queues updatable Melanie Plageman (Microsoft)
0 siblings, 2 replies; 9+ messages in thread
From: Melanie Plageman (Microsoft) @ 2021-02-05 21:24 UTC (permalink / raw)
To: andres
Cc: haiyangz, kys, linux-hyperv, melanieplageman, sashal, mikelley, sthemmin
The convention on this list seems to be to start a new thread for a new
version of a patch. However, because I made the mistake of sending the
first patch as an attachment, I've included both the original patch
inline and the updates in this patchset.
Melanie Plageman (Microsoft) (2):
scsi: storvsc: Parameterize number hardware queues
scsi: storvsc: Make max hw queues updatable
drivers/scsi/storvsc_drv.c | 29 +++++++++++++++++++++++++++--
1 file changed, 27 insertions(+), 2 deletions(-)
--
2.20.1
* [PATCH 1/2] scsi: storvsc: Parameterize number hardware queues
2021-02-05 21:24 ` [PATCH v2 0/2] " Melanie Plageman (Microsoft)
@ 2021-02-05 21:24 ` Melanie Plageman (Microsoft)
2021-02-05 21:24 ` [PATCH 2/2] scsi: storvsc: Make max hw queues updatable Melanie Plageman (Microsoft)
1 sibling, 0 replies; 9+ messages in thread
From: Melanie Plageman (Microsoft) @ 2021-02-05 21:24 UTC (permalink / raw)
To: andres
Cc: haiyangz, kys, linux-hyperv, melanieplageman, sashal, mikelley, sthemmin
Add ability to set the number of hardware queues. The default value
remains the number of CPUs. This functionality is useful in some
environments (e.g. Microsoft Azure) where decreasing the number of
hardware queues has been shown to improve performance.
Signed-off-by: Melanie Plageman (Microsoft) <melanieplageman@gmail.com>
---
drivers/scsi/storvsc_drv.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 2e4fa77445fd..d72ab6daf9ae 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -378,10 +378,14 @@ static u32 max_outstanding_req_per_channel;
static int storvsc_change_queue_depth(struct scsi_device *sdev, int queue_depth);
static int storvsc_vcpus_per_sub_channel = 4;
+static int storvsc_nr_hw_queues = -1;
module_param(storvsc_ringbuffer_size, int, S_IRUGO);
MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
+module_param(storvsc_nr_hw_queues, int, S_IRUGO);
+MODULE_PARM_DESC(storvsc_nr_hw_queues, "Number of hardware queues");
+
module_param(storvsc_vcpus_per_sub_channel, int, S_IRUGO);
MODULE_PARM_DESC(storvsc_vcpus_per_sub_channel, "Ratio of VCPUs to subchannels");
@@ -2004,8 +2008,12 @@ static int storvsc_probe(struct hv_device *device,
* For non-IDE disks, the host supports multiple channels.
* Set the number of HW queues we are supporting.
*/
- if (!dev_is_ide)
- host->nr_hw_queues = num_present_cpus();
+ if (!dev_is_ide) {
+ if (storvsc_nr_hw_queues == -1)
+ host->nr_hw_queues = num_present_cpus();
+ else
+ host->nr_hw_queues = storvsc_nr_hw_queues;
+ }
/*
* Set the error handler work queue.
@@ -2155,6 +2163,7 @@ static struct fc_function_template fc_transport_functions = {
static int __init storvsc_drv_init(void)
{
int ret;
+ int ncpus = num_present_cpus();
/*
* Divide the ring buffer data size (which is 1 page less
@@ -2169,6 +2178,14 @@ static int __init storvsc_drv_init(void)
vmscsi_size_delta,
sizeof(u64)));
+ if (storvsc_nr_hw_queues > ncpus || storvsc_nr_hw_queues == 0 ||
+ storvsc_nr_hw_queues < -1) {
+ printk(KERN_ERR "storvsc: Invalid storvsc_nr_hw_queues value of %d. "
+ "Valid values include -1 and 1-%d.\n",
+ storvsc_nr_hw_queues, ncpus);
+ return -EINVAL;
+ }
+
#if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
fc_transport_template = fc_attach_transport(&fc_transport_functions);
if (!fc_transport_template)
--
2.20.1
* [PATCH 2/2] scsi: storvsc: Make max hw queues updatable
2021-02-05 21:24 ` [PATCH v2 0/2] " Melanie Plageman (Microsoft)
2021-02-05 21:24 ` [PATCH 1/2] scsi: storvsc: Parameterize number hardware queues Melanie Plageman (Microsoft)
@ 2021-02-05 21:24 ` Melanie Plageman (Microsoft)
2021-02-08 11:51 ` Wei Liu
1 sibling, 1 reply; 9+ messages in thread
From: Melanie Plageman (Microsoft) @ 2021-02-05 21:24 UTC (permalink / raw)
To: andres
Cc: haiyangz, kys, linux-hyperv, melanieplageman, sashal, mikelley, sthemmin
- Change name of parameter to storvsc_max_hw_queues
- Make the parameter updatable on a running system
- Change error to warning for an invalid value of the parameter at
driver init time
Signed-off-by: Melanie Plageman (Microsoft) <melanieplageman@gmail.com>
---
Andres wrote:
>> On 2021-02-02 15:19:08 -0500, Melanie Plageman wrote:
>> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
>> index 2e4fa77445fd..d72ab6daf9ae 100644
>> --- a/drivers/scsi/storvsc_drv.c
>> +++ b/drivers/scsi/storvsc_drv.c
>> @@ -378,10 +378,14 @@ static u32 max_outstanding_req_per_channel;
>> static int storvsc_change_queue_depth(struct scsi_device *sdev, int queue_depth);
>>
>> static int storvsc_vcpus_per_sub_channel = 4;
>> +static int storvsc_nr_hw_queues = -1;
>>
>> module_param(storvsc_ringbuffer_size, int, S_IRUGO);
>> MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
>>
>> +module_param(storvsc_nr_hw_queues, int, S_IRUGO);
>> +MODULE_PARM_DESC(storvsc_nr_hw_queues, "Number of hardware queues");
>
> Would be nice if the parameter could be changed in a running
> system. Obviously that's only going to affect devices that'll be
> attached in the future (or detached and reattached), but that's still
> nicer than not being able to change at all.
My problem with making it updatable is that the user can set it to an
invalid value, and that invalid value is then what they will see in
/sys/module/hv_storvsc/parameters/storvsc_max_hw_queues.
However, if they had set it as a boot-time kernel parameter, then the
value would have been corrected to a valid value.
On probe of a new device, if the value of storvsc_max_hw_queues is
invalid, the value of nr_hw_queues would be set to NCPUs, but, at that
point, do I also change the value in
/sys/module/hv_storvsc/parameters/storvsc_max_hw_queues?
If so, there is a weird intermediate period where it has a value that
isn't actually used or valid.
If not, you can update it to an invalid value while the system is
running but not if you set it at boot-time. Despite the fact that this
invalid value is not used at any point by the driver, it seems weird
that you can get it into a state where it doesn't reflect reality
depending on how you set it.
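One way to avoid that mismatch entirely (which I have not pursued here)
would be to reject bad values at write time with a custom set handler
instead of the plain module_param(), so the sysfs value could never
become invalid in the first place. A rough, untested sketch:

	static int storvsc_set_max_hw_queues(const char *val,
					     const struct kernel_param *kp)
	{
		int num;

		/* Accept -1 (use NCPUs) or 1..num_present_cpus() only. */
		if (kstrtoint(val, 0, &num) || num == 0 || num < -1 ||
		    num > num_present_cpus())
			return -EINVAL;

		return param_set_int(val, kp);
	}

	static const struct kernel_param_ops storvsc_max_hw_queues_ops = {
		.set = storvsc_set_max_hw_queues,
		.get = param_get_int,
	};

	module_param_cb(storvsc_max_hw_queues, &storvsc_max_hw_queues_ops,
			&storvsc_max_hw_queues, S_IRUGO|S_IWUSR);

That is a fair amount of boilerplate for a niche knob, though, so I'm
not convinced it's worth it.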
Maybe this isn't too big of a deal. I noticed the other module parameter
that can be set on a running system in storvsc is logging_level, and,
because of the do_logging() function, it doesn't really matter if the
user sets it to a number higher or lower than one of the pre-defined log
levels. This behavior makes sense to me for logging, but I'm not sure
how I feel about going with a similar strategy for
storvsc_max_hw_queues.
I've gone with making it updatable and not updating the value of
storvsc_max_hw_queues during storvsc_probe(), even if the user had
updated storvsc_max_hw_queues to an invalid value. I just use NCPUs for
the value of nr_hw_queues.
Is there any user facing documentation on any of this? It would be nice
to add a comment there about what this parameter is and note that if you
change it, you need to reinitiate a probe of your device for it to take
effect and with details on the valid values for it and when they will be
reflected in /sys/module...
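If there is such a place, I'd imagine something short along these lines
(wording is only a sketch, based on the v2 behavior):

    storvsc_max_hw_queues:
        Maximum number of hardware queues per storvsc host. Valid values
        are -1 (the default, meaning use the number of present CPUs) or
        1 through the number of present CPUs. It can be set at load time
        (e.g. hv_storvsc.storvsc_max_hw_queues=4 on the kernel command
        line) or changed at runtime via
        /sys/module/hv_storvsc/parameters/storvsc_max_hw_queues; runtime
        changes only affect devices probed afterwards.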
Andres wrote:
>> @@ -2155,6 +2163,7 @@ static struct fc_function_template fc_transport_functions = {
>> static int __init storvsc_drv_init(void)
>> {
>> int ret;
>> + int ncpus = num_present_cpus();
>>
>> /*
>> * Divide the ring buffer data size (which is 1 page less
>> @@ -2169,6 +2178,14 @@ static int __init storvsc_drv_init(void)
>> vmscsi_size_delta,
>> sizeof(u64)));
>>
>> + if (storvsc_nr_hw_queues > ncpus || storvsc_nr_hw_queues == 0 ||
>> + storvsc_nr_hw_queues < -1) {
>> + printk(KERN_ERR "storvsc: Invalid storvsc_nr_hw_queues value of %d. "
>> + "Valid values include -1 and 1-%d.\n",
>> + storvsc_nr_hw_queues, ncpus);
>> + return -EINVAL;
>> + }
>> +
>
> I have a mild preference for just capping the effective value to
> num_present_cpus(), rather than erroring out. Would allow you to
> e.g. cap the number of queues to 4, with the same param specified on
> smaller and bigger systems. Perhaps renaming it to max_hw_queues would
> make that more obvious?
I like max_hw_queues -- I've updated the name to that.
I changed storvsc_drv_init() to update storvsc_max_hw_queues to NCPUs if
an invalid value was provided at module load time.
I agree that it was pretty nasty to have it error out. I wanted to avoid
unexpected behavior for the user.
I changed the error to a warning, so, given a sufficient log level,
a user could find out why the value they set storvsc_max_hw_queues to
does not match the one they see in
/sys/module/hv_storvsc/parameters/storvsc_max_hw_queues
drivers/scsi/storvsc_drv.c | 32 ++++++++++++++++++++------------
1 file changed, 20 insertions(+), 12 deletions(-)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index d72ab6daf9ae..e41f2461851e 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -378,13 +378,13 @@ static u32 max_outstanding_req_per_channel;
static int storvsc_change_queue_depth(struct scsi_device *sdev, int queue_depth);
static int storvsc_vcpus_per_sub_channel = 4;
-static int storvsc_nr_hw_queues = -1;
+static int storvsc_max_hw_queues = -1;
module_param(storvsc_ringbuffer_size, int, S_IRUGO);
MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
-module_param(storvsc_nr_hw_queues, int, S_IRUGO);
-MODULE_PARM_DESC(storvsc_nr_hw_queues, "Number of hardware queues");
+module_param(storvsc_max_hw_queues, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(storvsc_max_hw_queues, "Maximum number of hardware queues");
module_param(storvsc_vcpus_per_sub_channel, int, S_IRUGO);
MODULE_PARM_DESC(storvsc_vcpus_per_sub_channel, "Ratio of VCPUs to subchannels");
@@ -1901,6 +1901,7 @@ static int storvsc_probe(struct hv_device *device,
{
int ret;
int num_cpus = num_online_cpus();
+ int num_present_cpus = num_present_cpus();
struct Scsi_Host *host;
struct hv_host_device *host_dev;
bool dev_is_ide = ((dev_id->driver_data == IDE_GUID) ? true : false);
@@ -2009,10 +2010,18 @@ static int storvsc_probe(struct hv_device *device,
* Set the number of HW queues we are supporting.
*/
if (!dev_is_ide) {
- if (storvsc_nr_hw_queues == -1)
- host->nr_hw_queues = num_present_cpus();
+ if (storvsc_max_hw_queues == -1)
+ host->nr_hw_queues = num_present_cpus;
+ else if (storvsc_max_hw_queues > num_present_cpus ||
+ storvsc_max_hw_queues == 0 ||
+ storvsc_max_hw_queues < -1) {
+ storvsc_log(device, STORVSC_LOGGING_WARN,
+ "Setting nr_hw_queues to %d. storvsc_max_hw_queues value %d is invalid.\n",
+ num_present_cpus, storvsc_max_hw_queues);
+ host->nr_hw_queues = num_present_cpus;
+ }
else
- host->nr_hw_queues = storvsc_nr_hw_queues;
+ host->nr_hw_queues = storvsc_max_hw_queues;
}
/*
@@ -2178,12 +2187,11 @@ static int __init storvsc_drv_init(void)
vmscsi_size_delta,
sizeof(u64)));
- if (storvsc_nr_hw_queues > ncpus || storvsc_nr_hw_queues == 0 ||
- storvsc_nr_hw_queues < -1) {
- printk(KERN_ERR "storvsc: Invalid storvsc_nr_hw_queues value of %d. "
- "Valid values include -1 and 1-%d.\n",
- storvsc_nr_hw_queues, ncpus);
- return -EINVAL;
+ if (storvsc_max_hw_queues > ncpus || storvsc_max_hw_queues == 0 ||
+ storvsc_max_hw_queues < -1) {
+ printk(KERN_WARNING "storvsc: Setting storvsc_max_hw_queues to %d. %d is invalid.\n",
+ ncpus, storvsc_max_hw_queues);
+ storvsc_max_hw_queues = ncpus;
}
#if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
--
2.20.1
* Re: [PATCH 2/2] scsi: storvsc: Make max hw queues updatable
2021-02-05 21:24 ` [PATCH 2/2] scsi: storvsc: Make max hw queues updatable Melanie Plageman (Microsoft)
@ 2021-02-08 11:51 ` Wei Liu
2021-02-09 17:10 ` Melanie Plageman
0 siblings, 1 reply; 9+ messages in thread
From: Wei Liu @ 2021-02-08 11:51 UTC (permalink / raw)
To: Melanie Plageman (Microsoft)
Cc: andres, haiyangz, kys, linux-hyperv, sashal, mikelley, sthemmin, Wei Liu
Hi Melanie
On Fri, Feb 05, 2021 at 09:24:47PM +0000, Melanie Plageman (Microsoft) wrote:
> - Change name of parameter to storvsc_max_hw_queues
> - Make the parameter updatable on a running system
> - Change error to warning for an invalid value of the parameter at
> driver init time
>
> Signed-off-by: Melanie Plageman (Microsoft) <melanieplageman@gmail.com>
This patch contains a section which seems to be a copy&paste error from
another email.
Also, please use get_maintainers.pl to CC the correct maintainers. You
can configure git-send-email such that the correct people are CC'ed
automatically. The storvsc code normally goes via the storage
maintainers' tree.
If you have any question about how to use the get_maintainers.pl script
or the upstreaming process in general, let me know.
Wei.
> ---
>
> Andres wrote:
> >> On 2021-02-02 15:19:08 -0500, Melanie Plageman wrote:
> >> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> >> index 2e4fa77445fd..d72ab6daf9ae 100644
> >> --- a/drivers/scsi/storvsc_drv.c
> >> +++ b/drivers/scsi/storvsc_drv.c
> >> @@ -378,10 +378,14 @@ static u32 max_outstanding_req_per_channel;
> >> static int storvsc_change_queue_depth(struct scsi_device *sdev, int
[...]
* Re: [PATCH 2/2] scsi: storvsc: Make max hw queues updatable
2021-02-08 11:51 ` Wei Liu
@ 2021-02-09 17:10 ` Melanie Plageman
0 siblings, 0 replies; 9+ messages in thread
From: Melanie Plageman @ 2021-02-09 17:10 UTC (permalink / raw)
To: wei.liu
Cc: andres, haiyangz, kys, linux-hyperv, melanieplageman, mikelley,
sashal, sthemmin
From: Melanie Plageman (Microsoft) <melanieplageman@gmail.com>
> This patch contains a section which seems to be a copy&paste error
> from another email.
I don't think that there are any unintentional portions to the email.
Andres gave me feedback on my patch. I addressed his feedback in an
updated version of the patch but wanted to include additional context
and points which did not belong in the commit message. After some
research online, I was led to believe that doing so below the commit
message was an appropriate way to add relevant context to the patch
which may not belong in the commit message (using --annotate or
--compose with git-send-email). The patch still applied cleanly for me.
My intent was the following:
I received feedback on the code in the patch and wanted to address that
feedback with an updated version of the patch. I also wanted to respond
inline to the specific comments with text to continue the discussion.
Additionally, I wanted to make sure that the new patch showed a clear
diff with the old patch so that it was clear what I changed in response
to feedback. If I had started a new thread with this new version, I
feared a two-patch set would no longer make sense, as the two commits
should be fixed up together in a final version.
What would have been the appropriate way to achieve this that aligns
more closely with the convention of the list?
> Also, please use get_maintainers.pl to CC the correct maintainers. You
> can configure git-sendemail such that the correct people are CC'ed
> automatically. The storvsc code normally goes via the storage
> maintainers' tree.
>
> If you have any question about how to use the get_maintainers.pl script
> or the upstreaming process in general, let me know.
I created this CC list after running get_maintainers.pl on my patch (and
then used git-send-email to send it). It returned these people and
lists:
K. Y. Srinivasan [kys@microsoft.com]
Haiyang Zhang [haiyangz@microsoft.com]
Stephen Hemminger [sthemmin@microsoft.com]
Sasha Levin [sashal@kernel.org]
James E.J. Bottomley [jejb@linux.ibm.com]
Martin K. Petersen [martin.petersen@oracle.com]
[linux-hyperv@vger.kernel.org]
[linux-scsi@vger.kernel.org]
[linux-kernel@vger.kernel.org]
I chose to CC the four people listed as supporters for Hyper-V core and drivers
and exclude SCSI maintainers, as this patch doesn't seem to have much impact
outside of storvsc. The same rationale applies to excluding the SCSI list
and the main Linux kernel list.
Regards,
Melanie
* Re: [PATCH v1] scsi: storvsc: Parameterize nr_hw_queues
2021-02-03 0:50 ` Michael Kelley
@ 2021-02-06 0:25 ` Melanie Plageman (Microsoft)
0 siblings, 0 replies; 9+ messages in thread
From: Melanie Plageman (Microsoft) @ 2021-02-06 0:25 UTC (permalink / raw)
To: mikelley
Cc: andres, haiyangz, kys, linux-hyperv, melanieplageman, sashal, sthemmin
> Yes. If you look at other code in storvsc_drv.c, you'll see the use of
> "dev_err" for outputting warning messages. Follow the pattern of
> these other uses of "dev_err", and that should eliminate the
> checkpatch error.
I'm not sure I can use the dev_warn/dev_err macros (or the storvsc_log
macro which uses dev_warn) in storvsc_drv_init(), as it seems like it is
meant to be used with a device, and, at driver initialization, I'm not
sure how to access a specific device.
I originally used printk() after looking for an example of a driver
printing some logging message before erroring out in its init function
and found this in drivers/scsi/iscsi_tcp.c:
static int __init iscsi_sw_tcp_init(void)
{
if (iscsi_max_lun < 1) {
printk(KERN_ERR "iscsi_tcp: Invalid max_lun value of %u\n",
iscsi_max_lun);
return -EINVAL;
}
...
}
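If a plain pr_* call is acceptable for a check that runs before any device
exists, a minimal alternative (untested, using the v2 parameter name)
would be:

	pr_warn("storvsc: Invalid storvsc_max_hw_queues value %d; using %d\n",
		storvsc_max_hw_queues, ncpus);

which would at least address the printk(KERN_ERR ...) preference that
checkpatch flags.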
Unfortunately, the thread has split in two because of some issues with my
original mail. This message has the other updates I've made to the patch [1].
> I'm in agreement that the current handling of I/O queuing in the storvsc driver
> has problems. Your data definitely confirms that, and there are other
> data points that indicate that we need to more fundamentally rethink
> what I/Os get queued where. Storvsc is letting far too many I/Os get
> queued in the VMbus ring buffers and in the underlying Hyper-V.
>
> Adding a module parameter to specify the number of hardware queues
> may be part of the solution. But I really want to step back a bit and
> take into account all the data points we have before deciding what to
> change, what additional parameters to offer (if any), etc. There are
> other ways of limiting the number of I/Os being queued at the driver
> level, and I'm wondering how those tradeoff against adding a module
> parameter. I'm planning to jump in on this topic in just a few weeks,
> and would like to coordinate with you.
There are tradeoffs to reducing the number of hardware queues. It may
not be the right solution for many workloads. However, parameterizing
the number of hardware queues, given that the default would be the same
as the current behavior, doesn't seem like it would be harmful.
virtio-scsi, for example, seems to provide num_queues as a configuration
option in virtscsi_probe() in drivers/scsi/virtio_scsi.c
num_queues = virtscsi_config_get(vdev, num_queues) ? : 1;
num_queues = min_t(unsigned int, nr_cpu_ids, num_queues);
And then uses that value to set nr_hw_queues in the scsi host:
...
shost->nr_hw_queues = num_queues;
In fact, it may be worth modifying the patch to allow setting the number
of hardware queues to any number above zero.
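Mirroring that clamp in storvsc might look roughly like this (untested
sketch, using the v2 parameter name):

	if (storvsc_max_hw_queues > 0)
		host->nr_hw_queues = min_t(unsigned int, nr_cpu_ids,
					   storvsc_max_hw_queues);
	else
		host->nr_hw_queues = num_present_cpus();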
I made it a module parameter to avoid having to change the interface at
any point above storvsc in the stack to allow for configuration of this
parameter.
[1] https://lore.kernel.org/linux-hyperv/20210205212447.3327-3-melanieplageman@gmail.com/
* RE: [PATCH v1] scsi: storvsc: Parameterize nr_hw_queues
2021-02-02 20:25 [PATCH v1] scsi: storvsc: Parameterize nr_hw_queues Melanie Plageman
@ 2021-02-03 0:50 ` Michael Kelley
2021-02-06 0:25 ` Melanie Plageman (Microsoft)
0 siblings, 1 reply; 9+ messages in thread
From: Michael Kelley @ 2021-02-03 0:50 UTC (permalink / raw)
To: melanieplageman, linux-hyperv
From: Melanie Plageman <melanieplageman@gmail.com> Sent: Tuesday, February 2, 2021 12:26 PM
>
> Proposed patch attached.
>
For public mailing lists like linux-hyperv@vger.kernel.org that are
associated with the Linux kernel project, proposed patches should
generally go inline in the email, rather than as an attachment.
>
> While doing some performance tuning of various block device parameters
> on Azure to optimize database workloads, we found that reducing the
> number of hardware queues (nr_hw_queues used in the block-mq layer and
> by storvsc) improved performance of short queue depth writes, like
> database journaling.
>
> The Azure premium disks we were testing on often had around 2-3ms
> latency for a single small write. Given the IOPS and bandwidth
> provisioned on your typical Azure VM paired with an Azure premium disk,
> we found that decreasing the LUN queue_depth substantially from the
> default (default is 2048) was often required to get reasonable IOPS out
> of a small sequential synchronous write with a queue_depth of 1. In our
> investigation, the high default queue_depth resulted in such a large
> number of outstanding requests against the device that a journaling
> write incurred very high latency, slowing overall database performance.
>
> However, even with tuning these defaults (including
> /sys/block/sdX/queue/nr_requests), we still incurred high latency,
> especially on machines with high core counts, as the number of hardware
> queues, nr_hw_queues, is set to be the number of CPUs in storvsc.
> nr_hw_queues is, in turn, used to calculate the number of tags available
> on the block device, meaning that we still ended up with deeper queues
> than intended when requests were submitted to more than one hardware
> queue.
>
> We calculated the optimal block device settings, including our intended
> queue depth, to utilize as much of provisioned bandwidth as possible
> while still getting a reasonable number of IOPS from low queue depth
> sequential IO and random IO, but, without parameterizing the number of
> hardware queues, we weren't able to fully control this queue_depth.
>
> Attached is a patch which adds a module param to control nr_hw_queues.
> The default is the current value (number of CPUs), so it should have no
> impact on users who do not set the param.
>
> As a minimal example of the potential benefit, we found that, starting
> with a baseline of optimized block device tunings, we could
> substantially increase the IOPS of a sequential write job for both a
> large Azure premium SSD and a small Azure premium SSD.
>
> On an Azure Standard_F32s_v2 with a single 16 TiB disk, which has a
> provisioned max bandwidth of 750 MB/s and provisioned max IOPS of
> 18000, running Debian 10 with a Linux kernel built from master
> (v5.11-rc6 at time of patch testing) with the patch applied, with the
> following block device settings:
>
> /sys/block/sdX/device/queue_depth=55
>
> /sys/block/sdX/queue/max_sectors_kb=64,
> read_ahead_kb=2296,
> nr_requests=55,
> wbt_lat_usec=0,
> scheduler=mq-deadline
>
> And this fio job file:
> [global]
> time_based=1
> ioengine=libaio
> direct=1
> runtime=20
>
> [job1]
> name=seq_read
> bs=32k
> size=23G
> rw=read
> numjobs=2
> iodepth=110
> iodepth_batch_submit=55
> iodepth_batch_complete=55
>
> [job2]
> name=seq_write
> bs=8k
> size=10G
> rw=write
> numjobs=1
> iodepth=1
> overwrite=1
>
> With ncpu hardware queues configured, we measured an average of 764 MB/s
> read throughput and 153 write IOPS.
>
> With one hardware queue configured, we measured an average of 763 MB/s
> read throughput and 270 write IOPS.
>
> And on an Azure Standard_F32s_v2 with a single 16 GiB disk, the
> combination having a provisioned max bandwidth of 170 MB/s and a
> provisioned max IOPS of 3500, with the following block device settings:
>
> /sys/block/sdX/device/queue_depth=11
>
> /sys/block/sdX/queue/max_sectors_kb=65,
> read_ahead_kb=520,
> nr_requests=11,
> wbt_lat_usec=0,
> scheduler=mq-deadline
>
> And with this fio job file:
> [global]
> time_based=1
> ioengine=libaio
> direct=1
> runtime=60
>
> [job1]
> name=seq_read
> bs=32k
> size=5G
> rw=read
> numjobs=2
> iodepth=22
> iodepth_batch_submit=11
> iodepth_batch_complete=11
>
> [job2]
> name=seq_write
> bs=8k
> size=3G
> rw=write
> numjobs=1
> iodepth=1
> overwrite=1
>
> With ncpu hardware queues configured, we measured an average of 123 MB/s
> read throughput and 56 write IOPS.
>
> With one hardware queue configured, we measured an average of 165 MB/s
> read throughput and 346 write IOPS.
>
> Providing this option as a module param will help improve performance of
> certain workloads on certain devices.
I'm in agreement that the current handling of I/O queuing in the storvsc driver
has problems. Your data definitely confirms that, and there are other
data points that indicate that we need to more fundamentally rethink
what I/Os get queued where. Storvsc is letting far too many I/Os get
queued in the VMbus ring buffers and in the underlying Hyper-V.
Adding a module parameter to specify the number of hardware queues
may be part of the solution. But I really want to step back a bit and
take into account all the data points we have before deciding what to
change, what additional parameters to offer (if any), etc. There are
other ways of limiting the number of I/Os being queued at the driver
level, and I'm wondering how those tradeoff against adding a module
parameter. I'm planning to jump in on this topic in just a few weeks,
and would like to coordinate with you.
>
> In the attached patch, I check that the value provided for
> storvsc_nr_hw_queues is within a valid range at init time and error out
> if it is not. I noticed this warning from scripts/checkpatch.pl
>
> WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then
> dev_err(dev, ... then pr_err(... to printk(KERN_ERR ...
> #64: FILE: drivers/scsi/storvsc_drv.c:2183:
> printk(KERN_ERR "storvsc: Invalid storvsc_nr_hw_queues value of %d.
>
> Should I be using a different function for printing this message?
Yes. If you look at other code in storvsc_drv.c, you'll see the use of
"dev_err" for outputting warning messages. Follow the pattern of
these other uses of "dev_err", and that should eliminate the
checkpatch error.
Michael
>
> Regards,
> Melanie (Microsoft)
* [PATCH v1] scsi: storvsc: Parameterize nr_hw_queues
@ 2021-02-02 20:25 Melanie Plageman
2021-02-03 0:50 ` Michael Kelley
0 siblings, 1 reply; 9+ messages in thread
From: Melanie Plageman @ 2021-02-02 20:25 UTC (permalink / raw)
To: linux-hyperv
[-- Attachment #1: Type: text/plain, Size: 4926 bytes --]
Proposed patch attached.
While doing some performance tuning of various block device parameters
on Azure to optimize database workloads, we found that reducing the
number of hardware queues (nr_hw_queues used in the block-mq layer and
by storvsc) improved performance of short queue depth writes, like
database journaling.
The Azure premium disks we were testing on often had around 2-3ms
latency for a single small write. Given the IOPS and bandwidth
provisioned on your typical Azure VM paired with an Azure premium disk,
we found that decreasing the LUN queue_depth substantially from the
default (default is 2048) was often required to get reasonable IOPS out
of a small sequential synchronous write with a queue_depth of 1. In our
investigation, the high default queue_depth resulted in such a large
number of outstanding requests against the device that a journaling
write incurred very high latency, slowing overall database performance.
However, even with tuning these defaults (including
/sys/block/sdX/queue/nr_requests), we still incurred high latency,
especially on machines with high core counts, as the number of hardware
queues, nr_hw_queues, is set to be the number of CPUs in storvsc.
nr_hw_queues is, in turn, used to calculate the number of tags available
on the block device, meaning that we still ended up with deeper queues
than intended when requests were submitted to more than one hardware
queue.
We calculated the optimal block device settings, including our intended
queue depth, to utilize as much of provisioned bandwidth as possible
while still getting a reasonable number of IOPS from low queue depth
sequential IO and random IO, but, without parameterizing the number of
hardware queues, we weren't able to fully control this queue_depth.
Attached is a patch which adds a module param to control nr_hw_queues.
The default is the current value (number of CPUs), so it should have no
impact on users who do not set the param.
As a minimal example of the potential benefit, we found that, starting
with a baseline of optimized block device tunings, we could
substantially increase the IOPS of a sequential write job for both a
large Azure premium SSD and a small Azure premium SSD.
On an Azure Standard_F32s_v2 with a single 16 TiB disk, which has a
provisioned max bandwidth of 750 MB/s and provisioned max IOPS of
18000, running Debian 10 with a Linux kernel built from master
(v5.11-rc6 at time of patch testing) with the patch applied, with the
following block device settings:
/sys/block/sdX/device/queue_depth=55
/sys/block/sdX/queue/max_sectors_kb=64,
read_ahead_kb=2296,
nr_requests=55,
wbt_lat_usec=0,
scheduler=mq-deadline
And this fio job file:
[global]
time_based=1
ioengine=libaio
direct=1
runtime=20
[job1]
name=seq_read
bs=32k
size=23G
rw=read
numjobs=2
iodepth=110
iodepth_batch_submit=55
iodepth_batch_complete=55
[job2]
name=seq_write
bs=8k
size=10G
rw=write
numjobs=1
iodepth=1
overwrite=1
With ncpu hardware queues configured, we measured an average of 764 MB/s
read throughput and 153 write IOPS.
With one hardware queue configured, we measured an average of 763 MB/s
read throughput and 270 write IOPS.
And on an Azure Standard_F32s_v2 with a single 16 GiB disk, the
combination having a provisioned max bandwidth of 170 MB/s and a
provisioned max IOPS of 3500, with the following block device settings:
/sys/block/sdX/device/queue_depth=11
/sys/block/sdX/queue/max_sectors_kb=65,
read_ahead_kb=520,
nr_requests=11,
wbt_lat_usec=0,
scheduler=mq-deadline
And with this fio job file:
[global]
time_based=1
ioengine=libaio
direct=1
runtime=60
[job1]
name=seq_read
bs=32k
size=5G
rw=read
numjobs=2
iodepth=22
iodepth_batch_submit=11
iodepth_batch_complete=11
[job2]
name=seq_write
bs=8k
size=3G
rw=write
numjobs=1
iodepth=1
overwrite=1
With ncpu hardware queues configured, we measured an average of 123 MB/s
read throughput and 56 write IOPS.
With one hardware queue configured, we measured an average of 165 MB/s
read throughput and 346 write IOPS.
Providing this option as a module param will help improve performance of
certain workloads on certain devices.
In the attached patch, I check that the value provided for
storvsc_nr_hw_queues is within a valid range at init time and error out
if it is not. I noticed this warning from scripts/checkpatch.pl
WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then
dev_err(dev, ... then pr_err(... to printk(KERN_ERR ...
#64: FILE: drivers/scsi/storvsc_drv.c:2183:
printk(KERN_ERR "storvsc: Invalid storvsc_nr_hw_queues value of %d.
Should I be using a different function for printing this message?
Regards,
Melanie (Microsoft)
[-- Attachment #2: v1-0001-scsi-storvsc-Parameterize-number-hardware-queues.patch --]
[-- Type: application/octet-stream, Size: 2730 bytes --]
From 3232c69d0cb702c538d8654ed8607cec15adfe00 Mon Sep 17 00:00:00 2001
From: "Melanie Plageman (Microsoft)" <melanieplageman@gmail.com>
Date: Tue, 2 Feb 2021 19:19:36 +0000
Subject: [PATCH v1] scsi: storvsc: Parameterize number hardware queues
Add ability to set the number of hardware queues. The default value
remains the number of CPUs. This functionality is useful in some
environments (e.g. Microsoft Azure) where decreasing the number of
hardware queues has been shown to improve performance.
Signed-off-by: Melanie Plageman (Microsoft) <melanieplageman@gmail.com>
---
drivers/scsi/storvsc_drv.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 2e4fa77445fd..d72ab6daf9ae 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -378,10 +378,14 @@ static u32 max_outstanding_req_per_channel;
static int storvsc_change_queue_depth(struct scsi_device *sdev, int queue_depth);
static int storvsc_vcpus_per_sub_channel = 4;
+static int storvsc_nr_hw_queues = -1;
module_param(storvsc_ringbuffer_size, int, S_IRUGO);
MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
+module_param(storvsc_nr_hw_queues, int, S_IRUGO);
+MODULE_PARM_DESC(storvsc_nr_hw_queues, "Number of hardware queues");
+
module_param(storvsc_vcpus_per_sub_channel, int, S_IRUGO);
MODULE_PARM_DESC(storvsc_vcpus_per_sub_channel, "Ratio of VCPUs to subchannels");
@@ -2004,8 +2008,12 @@ static int storvsc_probe(struct hv_device *device,
* For non-IDE disks, the host supports multiple channels.
* Set the number of HW queues we are supporting.
*/
- if (!dev_is_ide)
- host->nr_hw_queues = num_present_cpus();
+ if (!dev_is_ide) {
+ if (storvsc_nr_hw_queues == -1)
+ host->nr_hw_queues = num_present_cpus();
+ else
+ host->nr_hw_queues = storvsc_nr_hw_queues;
+ }
/*
* Set the error handler work queue.
@@ -2155,6 +2163,7 @@ static struct fc_function_template fc_transport_functions = {
static int __init storvsc_drv_init(void)
{
int ret;
+ int ncpus = num_present_cpus();
/*
* Divide the ring buffer data size (which is 1 page less
@@ -2169,6 +2178,14 @@ static int __init storvsc_drv_init(void)
vmscsi_size_delta,
sizeof(u64)));
+ if (storvsc_nr_hw_queues > ncpus || storvsc_nr_hw_queues == 0 ||
+ storvsc_nr_hw_queues < -1) {
+ printk(KERN_ERR "storvsc: Invalid storvsc_nr_hw_queues value of %d. "
+ "Valid values include -1 and 1-%d.\n",
+ storvsc_nr_hw_queues, ncpus);
+ return -EINVAL;
+ }
+
#if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
fc_transport_template = fc_attach_transport(&fc_transport_functions);
if (!fc_transport_template)
--
2.20.1