* [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
@ 2021-10-08 4:35 Dexuan Cui
2021-10-08 12:00 ` John Garry
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Dexuan Cui @ 2021-10-08 4:35 UTC (permalink / raw)
To: kys, sthemmin, wei.liu, jejb, martin.petersen, haiyangz,
ming.lei, bvanassche, john.garry, linux-scsi, linux-hyperv,
longli, mikelley
Cc: linux-kernel, Dexuan Cui, stable
After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
shost->cmd_per_lun to a negative "short" value.
Use min_t(int, ...) to fix the issue.
Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue")
Cc: stable@vger.kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
v1 tried to fix the issue by changing the storvsc driver:
https://lwn.net/ml/linux-kernel/BYAPR21MB1270BBC14D5F1AE69FC31A16BFB09@BYAPR21MB1270.namprd21.prod.outlook.com/
v2 directly fixed the scsi core change instead as Michael Kelley suggested
(refer to the above link).
v3 simplified the commit log, as John Garry suggested.
Added Haiyang's and Ming's Reviewed-by.
drivers/scsi/hosts.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 3f6f14f0cafb..24b72ee4246f 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -220,7 +220,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
goto fail;
}
- shost->cmd_per_lun = min_t(short, shost->cmd_per_lun,
+ /* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */
+ shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
shost->can_queue);
error = scsi_init_sense_cache(shost);
--
2.25.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
2021-10-08 4:35 [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma() Dexuan Cui
@ 2021-10-08 12:00 ` John Garry
2021-10-12 16:41 ` Martin K. Petersen
2021-10-12 20:35 ` Martin K. Petersen
2 siblings, 0 replies; 7+ messages in thread
From: John Garry @ 2021-10-08 12:00 UTC (permalink / raw)
To: Dexuan Cui, kys, sthemmin, wei.liu, jejb, martin.petersen,
haiyangz, ming.lei, bvanassche, linux-scsi, linux-hyperv, longli,
mikelley
Cc: linux-kernel, stable
On 08/10/2021 05:35, Dexuan Cui wrote:
> After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
> boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
> value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
> shost->cmd_per_lun to a negative "short" value.
>
> Use min_t(int, ...) to fix the issue.
>
> Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue")
> Cc:stable@vger.kernel.org
> Signed-off-by: Dexuan Cui<decui@microsoft.com>
> Reviewed-by: Haiyang Zhang<haiyangz@microsoft.com>
> Reviewed-by: Ming Lei<ming.lei@redhat.com>
Reviewed-by: John Garry <john.garry@huawei.com>
thanks
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
2021-10-08 4:35 [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma() Dexuan Cui
2021-10-08 12:00 ` John Garry
@ 2021-10-12 16:41 ` Martin K. Petersen
2021-10-12 18:05 ` Dexuan Cui
2021-10-12 20:35 ` Martin K. Petersen
2 siblings, 1 reply; 7+ messages in thread
From: Martin K. Petersen @ 2021-10-12 16:41 UTC (permalink / raw)
To: Dexuan Cui
Cc: kys, sthemmin, wei.liu, jejb, martin.petersen, haiyangz,
ming.lei, bvanassche, john.garry, linux-scsi, linux-hyperv,
longli, mikelley, linux-kernel, stable
Dexuan,
> After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
> boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
> value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
> shost->cmd_per_lun to a negative "short" value.
>
> Use min_t(int, ...) to fix the issue.
I queued this up as a short term workaround. However, I am hoping that
the rework of the scaling code in storvsc lands soon.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
2021-10-12 16:41 ` Martin K. Petersen
@ 2021-10-12 18:05 ` Dexuan Cui
2021-10-12 21:27 ` Martin K. Petersen
0 siblings, 1 reply; 7+ messages in thread
From: Dexuan Cui @ 2021-10-12 18:05 UTC (permalink / raw)
To: Martin K. Petersen
Cc: KY Srinivasan, Stephen Hemminger, wei.liu, jejb, Haiyang Zhang,
ming.lei, bvanassche, john.garry, linux-scsi, linux-hyperv,
Long Li, Michael Kelley, linux-kernel, stable
> From: Martin K. Petersen <martin.petersen@oracle.com>
> Sent: Tuesday, October 12, 2021 9:42 AM
>
> Dexuan,
>
> > After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
> > boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
> > value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
> > shost->cmd_per_lun to a negative "short" value.
> >
> > Use min_t(int, ...) to fix the issue.
>
> I queued this up as a short term workaround. However, I am hoping that
> the rework of the scaling code in storvsc lands soon.
Thanks, Martin! I know Michael Kelley will improve the netvsc.
Regarding this patch, I'm not sure if it's a "workaround": if it's incorrect to
set a bigger-than-SHRT_MAX scsi_driver.can_queue value, probably we should
change scsi_driver.can_queue from "int" to "u16"? BTW, I guess the "cmd_per_lun"
should also be "u16" rather than "short"?
This was discussed in May, and it looks like the conclusion was not clear to me:
https://lwn.net/ml/linux-kernel/457d23a9-deb0-4ee1-fe7f-5a63605d9686@huawei.com/
Thanks,
-- Dexuan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
2021-10-08 4:35 [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma() Dexuan Cui
2021-10-08 12:00 ` John Garry
2021-10-12 16:41 ` Martin K. Petersen
@ 2021-10-12 20:35 ` Martin K. Petersen
2 siblings, 0 replies; 7+ messages in thread
From: Martin K. Petersen @ 2021-10-12 20:35 UTC (permalink / raw)
To: bvanassche, linux-scsi, ming.lei, john.garry, haiyangz,
Dexuan Cui, mikelley, linux-hyperv, kys, longli, jejb, sthemmin,
wei.liu
Cc: Martin K . Petersen, linux-kernel, stable
On Thu, 7 Oct 2021 21:35:46 -0700, Dexuan Cui wrote:
> After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
> boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
> value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
> shost->cmd_per_lun to a negative "short" value.
>
> Use min_t(int, ...) to fix the issue.
>
> [...]
Applied to 5.15/scsi-fixes, thanks!
[1/1] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
https://git.kernel.org/mkp/scsi/c/50b6cb351636
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
2021-10-12 18:05 ` Dexuan Cui
@ 2021-10-12 21:27 ` Martin K. Petersen
2021-10-12 21:43 ` Dexuan Cui
0 siblings, 1 reply; 7+ messages in thread
From: Martin K. Petersen @ 2021-10-12 21:27 UTC (permalink / raw)
To: Dexuan Cui
Cc: Martin K. Petersen, KY Srinivasan, Stephen Hemminger, wei.liu,
jejb, Haiyang Zhang, ming.lei, bvanassche, john.garry,
linux-scsi, linux-hyperv, Long Li, Michael Kelley, linux-kernel,
stable
Dexuan,
> Regarding this patch, I'm not sure if it's a "workaround": if it's
> incorrect to set a bigger-than-SHRT_MAX scsi_driver.can_queue value,
> probably we should change scsi_driver.can_queue from "int" to "u16"?
> BTW, I guess the "cmd_per_lun" should also be "u16" rather than
> "short"?
I agree that it would be nice to get all this cleaned up. Several,
somewhat peculiar, 25-year old design choices.
cmd_per_lun has traditionally been in the ballpark of low hundreds,
can_queue typically in the low thousands. And the block layer currently
caps at ~10K. Happy to take patches fixing this up, although I am a bit
worried about how much churn it will generate.
That said, I do think that cleaning this up is somewhat orthogonal to
the issue with storvsc. I suspect that allowing a huge amount of
concurrent outstanding commands is going to be detrimental to
performance for most workloads. And from that perspective I think that
the short->int fix, while valid given the type discrepancy, is just
treating the symptom.
Therefore I consider the short->int fix a workaround. And the proper fix
involves looking closely at things are scaled in the storvsc case. Which
I have noted that Michael is working on.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
2021-10-12 21:27 ` Martin K. Petersen
@ 2021-10-12 21:43 ` Dexuan Cui
0 siblings, 0 replies; 7+ messages in thread
From: Dexuan Cui @ 2021-10-12 21:43 UTC (permalink / raw)
To: Martin K. Petersen
Cc: KY Srinivasan, Stephen Hemminger, wei.liu, jejb, Haiyang Zhang,
ming.lei, bvanassche, john.garry, linux-scsi, linux-hyperv,
Long Li, Michael Kelley, linux-kernel, stable
> From: Martin K. Petersen <martin.petersen@oracle.com>
> Sent: Tuesday, October 12, 2021 2:28 PM
> To: Dexuan Cui <decui@microsoft.com>
>
> > Regarding this patch, I'm not sure if it's a "workaround": if it's
> > incorrect to set a bigger-than-SHRT_MAX scsi_driver.can_queue value,
> > probably we should change scsi_driver.can_queue from "int" to "u16"?
>
> > BTW, I guess the "cmd_per_lun" should also be "u16" rather than
> > "short"?
>
> I agree that it would be nice to get all this cleaned up. Several,
> somewhat peculiar, 25-year old design choices.
>
> cmd_per_lun has traditionally been in the ballpark of low hundreds,
> can_queue typically in the low thousands. And the block layer currently
> caps at ~10K. Happy to take patches fixing this up, although I am a bit
> worried about how much churn it will generate.
Thanks for the explanation!
> That said, I do think that cleaning this up is somewhat orthogonal to
> the issue with storvsc. I suspect that allowing a huge amount of
> concurrent outstanding commands is going to be detrimental to
> performance for most workloads. And from that perspective I think that
> the short->int fix, while valid given the type discrepancy, is just
> treating the symptom.
Agreed.
> Therefore I consider the short->int fix a workaround. And the proper fix
> involves looking closely at things are scaled in the storvsc case. Which
> I have noted that Michael is working on.
Agreed. My v1 actually tried to work around the storvsc driver instread. :-)
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-10-12 21:43 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-08 4:35 [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma() Dexuan Cui
2021-10-08 12:00 ` John Garry
2021-10-12 16:41 ` Martin K. Petersen
2021-10-12 18:05 ` Dexuan Cui
2021-10-12 21:27 ` Martin K. Petersen
2021-10-12 21:43 ` Dexuan Cui
2021-10-12 20:35 ` Martin K. Petersen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).