* REQ_HIPRI and SCSI mid-level

From: Douglas Gilbert @ 2021-05-21 21:56 UTC
To: linux-scsi
Cc: Hannes Reinecke, James Bottomley, Martin K. Petersen, Kashyap Desai,
    Ming Lei

The REQ_HIPRI flag on requests is associated with blk_poll() (aka iopoll)
and assumes that user space (or some higher level) will be calling
blk_poll() on requests marked with REQ_HIPRI, and that this will lead to
their completion.

In lk 5.13-rc1 the megaraid and scsi_debug LLDs support blk_poll() [seen
by searching for 'mq_poll'], with more to follow, I assume. I have tested
blk_poll() on the scsi_debug driver using both fio and the new sg driver.
It works well with one caveat: as long as there isn't an error. After
fighting with that error processing from the ULD side (i.e. the new sg
driver) and the LLD side, I am concluding that the glue that holds them
together, namely the mid-level, is not as REQ_HIPRI aware as it should
be.

Yes, REQ_HIPRI is there in scsi_lib.c, but it is missing from
scsi_error.c. How can scsi_error.c re-issue requests _without_ taking
into account that the original was issued with REQ_HIPRI? Well, I don't
know, but I'm pretty sure that is close to the area that I see causing
problems (mainly lockups).

As an example, the scsi_debug driver has an in-use bitmap; when a new
request arrives the code looks for an empty slot in it. Due to
(incorrect) parameter setup that search may fail. If the driver then
returns:
    device_qfull_result = (DID_OK << 16) | SAM_STAT_TASK_SET_FULL;
I see lock-ups if the request in question has REQ_HIPRI set.

If that is changed to:
    device_qfull_result = (DID_ABORT << 16) | SAM_STAT_TASK_SET_FULL;
then my user space test program sees that error and aborts, showing the
TASK SET FULL SCSI status. That is much better than a lockup ...

Having played around with variants of the above for a few weeks, I'd
like to throw this problem into the open :-)

Suggestion: perhaps the eh could give up immediately on any request
with REQ_HIPRI set (i.e. make it a higher level layer's problem).

Doug Gilbert
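
The divergence between those two results can be traced to the
mid-level's disposition logic. A rough paraphrase of the relevant
branches of scsi_decide_disposition() (a sketch of the logic, not the
exact kernel source) looks like this:

    switch (host_byte(scmd->result)) {      /* bits 16-23 of result */
    case DID_OK:
            /* Transport was fine: the SCSI status byte decides. */
            switch (scmd->result & 0xff) {
            case SAM_STAT_TASK_SET_FULL:
                    /*
                     * Requeue for a later retry. The retry is no
                     * longer tied to the caller's blk_poll() loop,
                     * so a polled request can hang here: the lockup
                     * Doug reports.
                     */
                    return ADD_TO_MLQUEUE;
            }
            break;
    case DID_ABORT:
            /* Complete now; the caller sees the error. */
            return SUCCESS;
    }

So DID_OK plus TASK SET FULL is a retry from the mid-level's point of
view, while DID_ABORT is a terminal completion that blk_poll() can reap.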
* Re: REQ_HIPRI and SCSI mid-level

From: Douglas Gilbert @ 2021-05-25 16:03 UTC
To: linux-scsi
Cc: Hannes Reinecke, James Bottomley, Martin K. Petersen, Kashyap Desai,
    Ming Lei

On 2021-05-21 5:56 p.m., Douglas Gilbert wrote:
> The REQ_HIPRI flag on requests is associated with blk_poll() (aka
> iopoll) and assumes that user space (or some higher level) will be
> calling blk_poll() on requests marked with REQ_HIPRI, and that this
> will lead to their completion.
>
> [...]
>
> Suggestion: perhaps the eh could give up immediately on any request
> with REQ_HIPRI set (i.e. make it a higher level layer's problem).

Hmmm, no response.

Any LLD that decides to support blk_poll() and relies on the mid-level
eh handling will get burnt by this. And those LLDs that do their own eh
handling need to take care not to replicate the mid-level's eh handling
in this case.

An even simpler way for the mid-level eh handling to address this
problem, at least partially, is to do:
    rqq->cmd_flags &= ~REQ_HIPRI;
on retried requests. Clearing the REQ_HIPRI flag, if present, should
make the LLD behave the way the mid-level eh processing expects.

Doug Gilbert
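
A minimal sketch of what that flag-clearing could look like, assuming it
runs when the mid-level decides to retry a command (the helper name and
its placement in scsi_error.c are hypothetical, not existing kernel
code):

    /*
     * Hypothetical helper: strip REQ_HIPRI before a mid-level retry
     * so the reissued command completes through the normal interrupt
     * path rather than waiting on a blk_poll() call that will never
     * target the retried request.
     */
    static void scsi_eh_clear_hipri(struct scsi_cmnd *scmd)
    {
            /* scmd->request is the request this command came from */
            scmd->request->cmd_flags &= ~REQ_HIPRI;
    }

Called just before the command is requeued, this would make a retried
command look like an ordinary interrupt-driven one to the LLD.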
* Re: REQ_HIPRI and SCSI mid-level

From: Hannes Reinecke @ 2021-05-27 15:43 UTC
To: dgilbert, linux-scsi
Cc: James Bottomley, Martin K. Petersen, Kashyap Desai, Ming Lei

On 5/25/21 6:03 PM, Douglas Gilbert wrote:
> On 2021-05-21 5:56 p.m., Douglas Gilbert wrote:
>> [...]
>> As an example, the scsi_debug driver has an in-use bitmap; when a new
>> request arrives the code looks for an empty slot in it. Due to
>> (incorrect) parameter setup that search may fail. If the driver then
>> returns:
>>     device_qfull_result = (DID_OK << 16) | SAM_STAT_TASK_SET_FULL;
>> I see lock-ups if the request in question has REQ_HIPRI set.
>>
>> If that is changed to:
>>     device_qfull_result = (DID_ABORT << 16) | SAM_STAT_TASK_SET_FULL;
>> then my user space test program sees that error and aborts, showing
>> the TASK SET FULL SCSI status. That is much better than a lockup ...

That's because with the first result the command is requeued (due to
DID_OK), whereas with the latter result the command is aborted (due to
DID_ABORT).

So the question really is whether we should retry commands which have
REQ_HIPRI set, or whether we shouldn't rather complete them with an
appropriate error code. A bit like enhanced BLOCK_PC requests, if you
will.

>> Having played around with variants of the above for a few weeks, I'd
>> like to throw this problem into the open :-)
>>
>> Suggestion: perhaps the eh could give up immediately on any request
>> with REQ_HIPRI set (i.e. make it a higher level layer's problem).

Like I said above: it's not only the SCSI EH which would need to be
modified, but possibly also the result evaluation in
scsi_decide_disposition(); it's questionable whether a HIPRI command
should be requeued at all.

But this might even affect the NVMe folks; they do return commands with
BLK_STS_RESOURCE, too. Maybe we should open a larger discussion on the
block list.

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
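
One way to express the "don't requeue HIPRI commands" idea in code,
sketched against the spot where the mid-level picks a disposition after
an LLD completion (the REQ_HIPRI check is hypothetical, not existing
kernel code):

    int disposition = scsi_decide_disposition(cmd);

    /*
     * Hypothetical: never requeue a polled command; downgrade the
     * requeue disposition to a completion so the blk_poll() caller
     * sees the error (e.g. TASK SET FULL) instead of spinning
     * forever on a request that was silently retried.
     */
    if ((cmd->request->cmd_flags & REQ_HIPRI) &&
        disposition == ADD_TO_MLQUEUE)
            disposition = SUCCESS;

The command would then complete with whatever result the LLD set,
pushing the retry decision up to the layer doing the polling.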
* Re: REQ_HIPRI and SCSI mid-level

From: Ming Lei @ 2021-05-28 1:32 UTC
To: Hannes Reinecke
Cc: dgilbert, linux-scsi, James Bottomley, Martin K. Petersen,
    Kashyap Desai

On Thu, May 27, 2021 at 05:43:07PM +0200, Hannes Reinecke wrote:
> On 5/25/21 6:03 PM, Douglas Gilbert wrote:
>> [...]
>
> So the question really is whether we should retry commands which have
> REQ_HIPRI set, or whether we shouldn't rather complete them with an
> appropriate error code. A bit like enhanced BLOCK_PC requests, if you
> will.
>
> [...]
>
> Like I said above: it's not only the SCSI EH which would need to be
> modified, but possibly also the result evaluation in
> scsi_decide_disposition(); it's questionable whether a HIPRI command
> should be requeued at all.

Why can't a HIPRI request be requeued?

> But this might even affect the NVMe folks; they do return commands
> with BLK_STS_RESOURCE, too.

The block layer is responsible for re-queueing BLK_STS_RESOURCE
requests, so I still don't understand why this is an issue for HIPRI
requests. Also, rq->mq_hctx doesn't change after its allocation, so
blk_poll() will keep polling the correct hw queue for reaping the IO.

Thanks,
Ming
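
The BLK_STS_RESOURCE behaviour Ming describes is the standard blk-mq
contract. A minimal sketch of how it looks from an LLD's queue_rq side
(the driver and helper names here are illustrative, not from a real
driver):

    static blk_status_t demo_queue_rq(struct blk_mq_hw_ctx *hctx,
                                      const struct blk_mq_queue_data *bd)
    {
            struct request *rq = bd->rq;

            if (!demo_reserve_slot(hctx))    /* e.g. in-use bitmap full */
                    return BLK_STS_RESOURCE; /* block layer requeues rq;
                                                rq->mq_hctx is unchanged */

            blk_mq_start_request(rq);
            demo_issue(rq);                  /* completes later via
                                                blk_mq_complete_request() */
            return BLK_STS_OK;
    }

Because the requeued request stays bound to the same hardware context,
a later blk_poll() on the submit-time cookie still looks at the right
queue.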
* Re: REQ_HIPRI and SCSI mid-level

From: Hannes Reinecke @ 2021-06-01 12:19 UTC
To: Ming Lei
Cc: dgilbert, linux-scsi, James Bottomley, Martin K. Petersen,
    Kashyap Desai

On 5/28/21 3:32 AM, Ming Lei wrote:
> On Thu, May 27, 2021 at 05:43:07PM +0200, Hannes Reinecke wrote:
>> [...]
>> Like I said above: it's not only the SCSI EH which would need to be
>> modified, but possibly also the result evaluation in
>> scsi_decide_disposition(); it's questionable whether a HIPRI command
>> should be requeued at all.
>
> Why can't a HIPRI request be requeued?

Oh, it can. As I said: it's questionable. HIPRI / polled completions
are just that: polling for completions. And a completion indicating a
requeue is _still_ a completion. So one could argue that we should
return here (as it's a completion, and we're polling for completion).

>> But this might even affect the NVMe folks; they do return commands
>> with BLK_STS_RESOURCE, too.
>
> The block layer is responsible for re-queueing BLK_STS_RESOURCE
> requests, so I still don't understand why this is an issue for HIPRI
> requests. Also, rq->mq_hctx doesn't change after its allocation, so
> blk_poll() will keep polling the correct hw queue for reaping the IO.

As mentioned above, I was talking about completions indicating a
requeue. Requeues due to resource shortage on the initiator side would
of course be requeued.

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)
* Re: REQ_HIPRI and SCSI mid-level

From: Ming Lei @ 2021-06-01 13:18 UTC
To: Hannes Reinecke
Cc: dgilbert, linux-scsi, James Bottomley, Martin K. Petersen,
    Kashyap Desai

On Tue, Jun 01, 2021 at 02:19:10PM +0200, Hannes Reinecke wrote:
> On 5/28/21 3:32 AM, Ming Lei wrote:
>> [...]
>> Why can't a HIPRI request be requeued?
>
> Oh, it can. As I said: it's questionable. HIPRI / polled completions
> are just that: polling for completions. And a completion indicating a
> requeue is _still_ a completion. So one could argue that we should
> return here (as it's a completion, and we're polling for completion).

No, blk_poll() actually polls for the bio's completion, not the
request's completion. If the submitted HIPRI bio isn't completed, the
polling won't stop.

>> [...]
>
> As mentioned above, I was talking about completions indicating a
> requeue. Requeues due to resource shortage on the initiator side would
> of course be requeued.

blk_poll() doesn't care about request requeues; what matters is whether
the HIPRI bio is completed or not. So neither the timeout code nor the
EH code needs to handle HIPRI IO specially.

Thanks,
Ming
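
Ming's invariant is easiest to see from the submitter's side. A minimal
sketch of a 5.13-era polled submission loop, in-kernel style (the
bio_done flag and its end_io handler are assumptions for illustration):

    blk_qc_t cookie;
    bool bio_done = false;          /* set to true by bio->bi_end_io */

    bio->bi_opf |= REQ_HIPRI;       /* mark the bio for polling */
    cookie = submit_bio(bio);       /* cookie identifies the hw queue */

    while (!READ_ONCE(bio_done))
            blk_poll(q, cookie, true);   /* spin until the *bio* ends */

The loop terminates only when the bio's end_io runs, so any requeue,
retry, or EH activity on the underlying request is invisible to the
poller as long as it eventually leads to the bio completing.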
* Re: REQ_HIPRI and SCSI mid-level

From: Ming Lei @ 2021-05-26 0:34 UTC
To: Douglas Gilbert
Cc: linux-scsi, Hannes Reinecke, James Bottomley, Martin K. Petersen,
    Kashyap Desai

On Fri, May 21, 2021 at 05:56:19PM -0400, Douglas Gilbert wrote:
> [...]
> Yes, REQ_HIPRI is there in scsi_lib.c, but it is missing from
> scsi_error.c. How can scsi_error.c re-issue requests _without_ taking
> into account that the original was issued with REQ_HIPRI? Well, I
> don't know, but I'm pretty sure that is close to the area that I see
> causing problems (mainly lockups).
>
> [...]
>
> Suggestion: perhaps the eh could give up immediately on any request
> with REQ_HIPRI set (i.e. make it a higher level layer's problem).

One invariant is that polling keeps running until the associated
iocb/bio is completed, so I understand it should be fine for the
timeout handler / EH to ignore REQ_HIPRI. That said, the associated
iocb/bio will be reaped by the upper layer once the EH / timeout
handler makes progress.

Or can you explain the scsi_debug REQ_HIPRI lockup in a bit more
detail?

Thanks,
Ming
Thread overview: 7 messages (newest: 2021-06-01 13:19 UTC)

  2021-05-21 21:56 REQ_HIPRI and SCSI mid-level  Douglas Gilbert
  2021-05-25 16:03 ` Douglas Gilbert
  2021-05-27 15:43   ` Hannes Reinecke
  2021-05-28  1:32     ` Ming Lei
  2021-06-01 12:19       ` Hannes Reinecke
  2021-06-01 13:18         ` Ming Lei
  2021-05-26  0:34 ` Ming Lei