linux-hyperv.vger.kernel.org archive mirror
* [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
@ 2021-10-08  4:35 Dexuan Cui
  2021-10-08 12:00 ` John Garry
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dexuan Cui @ 2021-10-08  4:35 UTC (permalink / raw)
  To: kys, sthemmin, wei.liu, jejb, martin.petersen, haiyangz,
	ming.lei, bvanassche, john.garry, linux-scsi, linux-hyperv,
	longli, mikelley
  Cc: linux-kernel, Dexuan Cui, stable

After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
shost->cmd_per_lun to a negative "short" value.

Use min_t(int, ...) to fix the issue.

Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue")
Cc: stable@vger.kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---

v1 tried to fix the issue by changing the storvsc driver:
https://lwn.net/ml/linux-kernel/BYAPR21MB1270BBC14D5F1AE69FC31A16BFB09@BYAPR21MB1270.namprd21.prod.outlook.com/

v2 directly fixed the scsi core change instead as Michael Kelley suggested
(refer to the above link).

v3 simplified the commit log, as John Garry suggested.
   Added Haiyang's and Ming's Reviewed-by.

 drivers/scsi/hosts.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 3f6f14f0cafb..24b72ee4246f 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -220,7 +220,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 		goto fail;
 	}
 
-	shost->cmd_per_lun = min_t(short, shost->cmd_per_lun,
+	/* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */
+	shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
 				   shost->can_queue);
 
 	error = scsi_init_sense_cache(shost);
-- 
2.25.1
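
The truncation described in the commit log can be reproduced in userspace. What follows is only a rough sketch, not kernel code: the min_t() below is a simplified stand-in for the kernel macro, the two helper names are hypothetical, and the values 2048 and 40000 are made up for illustration (the thread does not state the actual can_queue value on the 416-CPU VM). It also assumes a compiler such as GCC or Clang, where converting an out-of-range int to short wraps modulo 2^16.

```c
#include <assert.h>
#include <limits.h>

/*
 * Simplified userspace stand-in for the kernel's min_t(): cast both
 * operands to `type`, then compare. The real kernel macro also guards
 * against double evaluation; this sketch does not.
 */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/*
 * Behaviour before the patch: can_queue is narrowed to short before the
 * comparison, so a value above SHRT_MAX can turn negative and "win".
 */
short cap_cmd_per_lun_short(short cmd_per_lun, int can_queue)
{
	return min_t(short, cmd_per_lun, can_queue);
}

/*
 * Behaviour after the patch: compare as int. The result is never larger
 * than cmd_per_lun, so storing it back into a short is safe here.
 */
short cap_cmd_per_lun_int(short cmd_per_lun, int can_queue)
{
	return min_t(int, cmd_per_lun, can_queue);
}

/* Hypothetical values: cmd_per_lun = 2048, can_queue = 40000. */
void demo(void)
{
	assert(40000 > SHRT_MAX);
	assert(cap_cmd_per_lun_short(2048, 40000) < 0);
	assert(cap_cmd_per_lun_int(2048, 40000) == 2048);
}
```

With these made-up inputs the short variant returns a negative value, while the int variant leaves cmd_per_lun at 2048, which matches the failure mode and the fix described above.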


* Re: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
  2021-10-08  4:35 [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma() Dexuan Cui
@ 2021-10-08 12:00 ` John Garry
  2021-10-12 16:41 ` Martin K. Petersen
  2021-10-12 20:35 ` Martin K. Petersen
  2 siblings, 0 replies; 7+ messages in thread
From: John Garry @ 2021-10-08 12:00 UTC (permalink / raw)
  To: Dexuan Cui, kys, sthemmin, wei.liu, jejb, martin.petersen,
	haiyangz, ming.lei, bvanassche, linux-scsi, linux-hyperv, longli,
	mikelley
  Cc: linux-kernel, stable

On 08/10/2021 05:35, Dexuan Cui wrote:
> After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
> boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
> value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
> shost->cmd_per_lun to a negative "short" value.
> 
> Use min_t(int, ...) to fix the issue.
> 
> Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>

Reviewed-by: John Garry <john.garry@huawei.com>

thanks

* Re: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
  2021-10-08  4:35 [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma() Dexuan Cui
  2021-10-08 12:00 ` John Garry
@ 2021-10-12 16:41 ` Martin K. Petersen
  2021-10-12 18:05   ` Dexuan Cui
  2021-10-12 20:35 ` Martin K. Petersen
  2 siblings, 1 reply; 7+ messages in thread
From: Martin K. Petersen @ 2021-10-12 16:41 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: kys, sthemmin, wei.liu, jejb, martin.petersen, haiyangz,
	ming.lei, bvanassche, john.garry, linux-scsi, linux-hyperv,
	longli, mikelley, linux-kernel, stable


Dexuan,

> After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
> boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
> value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
> shost->cmd_per_lun to a negative "short" value.
>
> Use min_t(int, ...) to fix the issue.

I queued this up as a short term workaround. However, I am hoping that
the rework of the scaling code in storvsc lands soon.

-- 
Martin K. Petersen	Oracle Linux Engineering

* RE: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
  2021-10-12 16:41 ` Martin K. Petersen
@ 2021-10-12 18:05   ` Dexuan Cui
  2021-10-12 21:27     ` Martin K. Petersen
  0 siblings, 1 reply; 7+ messages in thread
From: Dexuan Cui @ 2021-10-12 18:05 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: KY Srinivasan, Stephen Hemminger, wei.liu, jejb, Haiyang Zhang,
	ming.lei, bvanassche, john.garry, linux-scsi, linux-hyperv,
	Long Li, Michael Kelley, linux-kernel, stable

> From: Martin K. Petersen <martin.petersen@oracle.com>
> Sent: Tuesday, October 12, 2021 9:42 AM
> 
> Dexuan,
> 
> > After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
> > boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
> > value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
> > shost->cmd_per_lun to a negative "short" value.
> >
> > Use min_t(int, ...) to fix the issue.
> 
> I queued this up as a short term workaround. However, I am hoping that
> the rework of the scaling code in storvsc lands soon.

Thanks, Martin! I know Michael Kelley will improve the storvsc driver.

Regarding this patch, I'm not sure if it's a "workaround": if it's incorrect to
set a bigger-than-SHRT_MAX scsi_driver.can_queue value, probably we should
change scsi_driver.can_queue from "int" to "u16"? BTW, I guess the "cmd_per_lun"
should also be "u16" rather than "short"?

This was discussed in May, but the conclusion is not clear to me:
https://lwn.net/ml/linux-kernel/457d23a9-deb0-4ee1-fe7f-5a63605d9686@huawei.com/

Thanks,
-- Dexuan

* Re: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
  2021-10-08  4:35 [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma() Dexuan Cui
  2021-10-08 12:00 ` John Garry
  2021-10-12 16:41 ` Martin K. Petersen
@ 2021-10-12 20:35 ` Martin K. Petersen
  2 siblings, 0 replies; 7+ messages in thread
From: Martin K. Petersen @ 2021-10-12 20:35 UTC (permalink / raw)
  To: bvanassche, linux-scsi, ming.lei, john.garry, haiyangz,
	Dexuan Cui, mikelley, linux-hyperv, kys, longli, jejb, sthemmin,
	wei.liu
  Cc: Martin K . Petersen, linux-kernel, stable

On Thu, 7 Oct 2021 21:35:46 -0700, Dexuan Cui wrote:

> After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
> boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
> value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
> shost->cmd_per_lun to a negative "short" value.
> 
> Use min_t(int, ...) to fix the issue.
> 
> [...]

Applied to 5.15/scsi-fixes, thanks!

[1/1] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
      https://git.kernel.org/mkp/scsi/c/50b6cb351636

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
  2021-10-12 18:05   ` Dexuan Cui
@ 2021-10-12 21:27     ` Martin K. Petersen
  2021-10-12 21:43       ` Dexuan Cui
  0 siblings, 1 reply; 7+ messages in thread
From: Martin K. Petersen @ 2021-10-12 21:27 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Martin K. Petersen, KY Srinivasan, Stephen Hemminger, wei.liu,
	jejb, Haiyang Zhang, ming.lei, bvanassche, john.garry,
	linux-scsi, linux-hyperv, Long Li, Michael Kelley, linux-kernel,
	stable


Dexuan,

> Regarding this patch, I'm not sure if it's a "workaround": if it's
> incorrect to set a bigger-than-SHRT_MAX scsi_driver.can_queue value,
> probably we should change scsi_driver.can_queue from "int" to "u16"?

> BTW, I guess the "cmd_per_lun" should also be "u16" rather than
> "short"?

I agree that it would be nice to get all this cleaned up. There are
several somewhat peculiar, 25-year-old design choices here.

cmd_per_lun has traditionally been in the ballpark of low hundreds,
can_queue typically in the low thousands. And the block layer currently
caps at ~10K. Happy to take patches fixing this up, although I am a bit
worried about how much churn it will generate.

That said, I do think that cleaning this up is somewhat orthogonal to
the issue with storvsc. I suspect that allowing a huge amount of
concurrent outstanding commands is going to be detrimental to
performance for most workloads. And from that perspective I think that
the short->int fix, while valid given the type discrepancy, is just
treating the symptom.

Therefore I consider the short->int fix a workaround. The proper fix
involves looking closely at how things are scaled in the storvsc case,
which I have noted that Michael is working on.

-- 
Martin K. Petersen	Oracle Linux Engineering

* RE: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
  2021-10-12 21:27     ` Martin K. Petersen
@ 2021-10-12 21:43       ` Dexuan Cui
  0 siblings, 0 replies; 7+ messages in thread
From: Dexuan Cui @ 2021-10-12 21:43 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: KY Srinivasan, Stephen Hemminger, wei.liu, jejb, Haiyang Zhang,
	ming.lei, bvanassche, john.garry, linux-scsi, linux-hyperv,
	Long Li, Michael Kelley, linux-kernel, stable

> From: Martin K. Petersen <martin.petersen@oracle.com>
> Sent: Tuesday, October 12, 2021 2:28 PM
> To: Dexuan Cui <decui@microsoft.com>
> 
> > Regarding this patch, I'm not sure if it's a "workaround": if it's
> > incorrect to set a bigger-than-SHRT_MAX scsi_driver.can_queue value,
> > probably we should change scsi_driver.can_queue from "int" to "u16"?
> 
> > BTW, I guess the "cmd_per_lun" should also be "u16" rather than
> > "short"?
> 
> I agree that it would be nice to get all this cleaned up. There are
> several somewhat peculiar, 25-year-old design choices here.
> 
> cmd_per_lun has traditionally been in the ballpark of low hundreds,
> can_queue typically in the low thousands. And the block layer currently
> caps at ~10K. Happy to take patches fixing this up, although I am a bit
> worried about how much churn it will generate.

Thanks for the explanation!
 
> That said, I do think that cleaning this up is somewhat orthogonal to
> the issue with storvsc. I suspect that allowing a huge amount of
> concurrent outstanding commands is going to be detrimental to
> performance for most workloads. And from that perspective I think that
> the short->int fix, while valid given the type discrepancy, is just
> treating the symptom.

Agreed.
 
> Therefore I consider the short->int fix a workaround. The proper fix
> involves looking closely at how things are scaled in the storvsc case,
> which I have noted that Michael is working on.

Agreed. My v1 actually tried to work around the issue in the storvsc driver instead. :-)
