* [PATCH] scsi: core: set result when the command cannot be dispatched
From: Jaesoo Lee @ 2019-04-09 21:53 UTC
  To: James E.J. Bottomley, Martin K. Petersen, Jens Axboe, Douglas Gilbert
  Cc: linux-scsi, linux-block, Roland Dreier

When SCSI blk-mq is enabled, scsi_queue_rq() mishandles dispatch errors:
it does not set the result field of struct scsi_request when the command
cannot be dispatched. Because upper-layer code, including the SG_IO ioctl,
expects any error status to be reported through that result field, the
error is silently ignored, which could cause data corruption for some
applications. This commit also fixes a second bug: the result field is
not initialized when a scsi_request is allocated.

Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
---
 block/scsi_ioctl.c      | 1 +
 drivers/scsi/scsi_lib.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 533f4ae..f2d7979 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
        req->cmd = req->__cmd;
        req->cmd_len = BLK_MAX_CDB;
        req->sense_len = 0;
+       req->result = 0;
 }
 EXPORT_SYMBOL(scsi_req_init);

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2018967..af1488d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
                        ret = BLK_STS_DEV_RESOURCE;
                break;
        default:
+               scsi_req(req)->result = DID_NO_CONNECT << 16;
                /*
                 * Make sure to release all allocated ressources when
                 * we hit an error, as we will never see this command
--
2.7.4
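
The result field packs several status bytes into one int, with the host
byte (a DID_* code) in bits 16-23. That is why the patch writes
DID_NO_CONNECT << 16, and why a result left at 0 reads as success to the
caller. The userspace sketch below is a hedged illustration, not kernel
code: the DID_* values are assumed to match include/scsi/scsi.h (they
agree with the Host_status values 0x01 and 0x0e printed in the test
results in this thread), and make_host_result(), host_byte_model() and
result_is_error() are hypothetical helpers.

```c
#include <stdint.h>

/* Host-byte codes; values assumed to mirror include/scsi/scsi.h. */
enum {
	DID_OK                  = 0x00,
	DID_NO_CONNECT          = 0x01,
	DID_ERROR               = 0x07,
	DID_TRANSPORT_DISRUPTED = 0x0e,
};

/* Place a host byte in bits 16..23, as in "DID_NO_CONNECT << 16". */
static inline int make_host_result(unsigned int host)
{
	return (int)((host & 0xffu) << 16);
}

/* Extract the host byte again (same shape as the kernel's host_byte()). */
static inline unsigned int host_byte_model(int result)
{
	return ((uint32_t)result >> 16) & 0xffu;
}

/* Any non-zero result is an error; 0 is indistinguishable from success,
 * which is what a failed dispatch looked like before this patch. */
static inline int result_is_error(int result)
{
	return result != 0;
}
```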

* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Jaesoo Lee @ 2019-04-09 21:57 UTC
  To: James E.J. Bottomley, Martin K. Petersen, Jens Axboe, Douglas Gilbert
  Cc: linux-scsi, linux-block, Roland Dreier

Hello,

Here are the test results.

0. Kernel config
Version: 5.1-rc1
Boot parameters: dm_mod.use_blk_mq=Y scsi_mod.use_blk_mq=Y

1. Normal state
: (As expected) The command succeeded

$ sg_write_same --lba=100 --xferlen=512 /dev/sg5
$

2. Immediately after bringing down the iSCSI interface at the target
: (As expected) Failed with DID_TRANSPORT_DISRUPTED after a few seconds
$ sg_write_same --lba=100 --xferlen=512 /dev/sg5
Write same: transport: Host_status=0x0e [DID_TRANSPORT_DISRUPTED]
Driver_status=0x00 [DRIVER_OK, SUGGEST_OK]

Write same(10) command failed

3. Immediately after the DID_TRANSPORT_DISRUPTED error
: (As expected) Failed with DID_NO_CONNECT after a few seconds
$ sg_write_same --lba=100 --xferlen=512 /dev/sg5
Write same: transport: Host_status=0x01 [DID_NO_CONNECT]
Driver_status=0x00 [DRIVER_OK, SUGGEST_OK]

Write same(10) command failed

4. Issued IO again
: (As expected) The command failed
$ sg_write_same --lba=100 --xferlen=512 /dev/sg5
Write same: pass through os error: No such device or address
Write same(10) command failed
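
The Host_status values above come back to userspace through the SG_IO
header. As a hedged illustration (not sg3_utils' actual code), this
sketch classifies a completed sg_io_hdr_t from <scsi/sg.h>; the codes
0x01 and 0x0e are taken from the output above, while sg_outcome() and
the HOST_* macro names are hypothetical. A real caller must also check
the ioctl() return value, since case 4 fails in the ioctl itself with
"No such device or address".

```c
#include <scsi/sg.h>

/* Host-status codes seen in the test output above (DID_* values). */
#define HOST_DID_NO_CONNECT          0x01
#define HOST_DID_TRANSPORT_DISRUPTED 0x0e

/*
 * Classify a completed SG_IO request: any non-zero SCSI status, host
 * status or driver status is a failure. Per the commit message, before
 * the fix a command that scsi_queue_rq() failed to dispatch could come
 * back without a fresh error in these fields and slip past this check.
 */
static const char *sg_outcome(const sg_io_hdr_t *io)
{
	if (io->host_status == HOST_DID_NO_CONNECT)
		return "DID_NO_CONNECT";
	if (io->host_status == HOST_DID_TRANSPORT_DISRUPTED)
		return "DID_TRANSPORT_DISRUPTED";
	if (io->host_status != 0)
		return "other host error";
	if (io->status != 0 || io->driver_status != 0)
		return "SCSI or driver error";
	return "ok";
}
```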

Thanks,

Jaesoo Lee.

* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Bart Van Assche @ 2019-04-09 22:14 UTC
  To: Jaesoo Lee, James E.J. Bottomley, Martin K. Petersen, Jens Axboe,
	Douglas Gilbert
  Cc: linux-scsi, linux-block, Roland Dreier

On Tue, 2019-04-09 at 14:53 -0700, Jaesoo Lee wrote:
> When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq.
> Specifically, the bug is not setting result field of scsi_request correctly when
> the dispatch of the command has been failed. Since the upper layer code
> including the sg_io ioctl expects to receive any error status from result field
> of scsi_request, the error is silently ignored and this could cause data
> corruptions for some applications. This commit also fixes another bug that the
> result field is not initialized when scsi_request is allocated.
> 
> Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
> ---
>  block/scsi_ioctl.c      | 1 +
>  drivers/scsi/scsi_lib.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index 533f4ae..f2d7979 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
>         req->cmd = req->__cmd;
>         req->cmd_len = BLK_MAX_CDB;
>         req->sense_len = 0;
> +       req->result = 0;
>  }
>  EXPORT_SYMBOL(scsi_req_init);

What makes you think that this assignment is necessary?

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 2018967..af1488d 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>                         ret = BLK_STS_DEV_RESOURCE;
>                 break;
>         default:
> +               scsi_req(req)->result = DID_NO_CONNECT << 16;
>                 /*
>                  * Make sure to release all allocated ressources when
>                  * we hit an error, as we will never see this command

What leads you to the conclusion that (ret != BLK_STS_OK &&
ret != BLK_STS_RESOURCE) means that there is a connectivity issue?

Thanks,

Bart.

* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Jaesoo Lee @ 2019-04-09 23:29 UTC
  To: Bart Van Assche
  Cc: James E.J. Bottomley, Martin K. Petersen, Jens Axboe,
	Douglas Gilbert, linux-scsi, linux-block, Roland Dreier

Let me comment in line.

On Tue, Apr 9, 2019 at 3:14 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On Tue, 2019-04-09 at 14:53 -0700, Jaesoo Lee wrote:
> > When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq.
> > Specifically, the bug is not setting result field of scsi_request correctly when
> > the dispatch of the command has been failed. Since the upper layer code
> > including the sg_io ioctl expects to receive any error status from result field
> > of scsi_request, the error is silently ignored and this could cause data
> > corruptions for some applications. This commit also fixes another bug that the
> > result field is not initialized when scsi_request is allocated.
> >
> > Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
> > ---
> >  block/scsi_ioctl.c      | 1 +
> >  drivers/scsi/scsi_lib.c | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> > index 533f4ae..f2d7979 100644
> > --- a/block/scsi_ioctl.c
> > +++ b/block/scsi_ioctl.c
> > @@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
> >         req->cmd = req->__cmd;
> >         req->cmd_len = BLK_MAX_CDB;
> >         req->sense_len = 0;
> > +       req->result = 0;
> >  }
> >  EXPORT_SYMBOL(scsi_req_init);
>
> What makes you think that this assignment is necessary?
>

Actually, I noticed this before fixing the main bug, and we might not
see this problem anymore once that bug is fixed.

Previously, because scsi_queue_rq() did not set scsi_req(req)->result,
I found that the application could receive a stale
DID_TRANSPORT_DISRUPTED host_status again when the same 'struct request'
was reallocated for a later I/O.
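
To make the reuse scenario concrete, here is a small userspace model
with hypothetical names throughout; struct model_req is a stand-in for
struct scsi_request, not its real layout. It shows that a result left
over from an earlier DID_TRANSPORT_DISRUPTED failure survives a later
dispatch failure unless one of the two hunks applies, and that the
scsi_queue_rq() assignment alone already overwrites it on that path.

```c
#include <stdbool.h>

/* Host-byte codes; values assumed to mirror include/scsi/scsi.h. */
#define DID_NO_CONNECT          0x01
#define DID_TRANSPORT_DISRUPTED 0x0e

/* Hypothetical stand-in holding only the field that matters here. */
struct model_req {
	int result;
};

/* Models scsi_req_init(); reset_result mirrors the first hunk. */
static void model_req_init(struct model_req *rq, bool reset_result)
{
	if (reset_result)
		rq->result = 0;
}

/* Models the default: branch of scsi_queue_rq() after a dispatch
 * failure; set_result mirrors the second hunk. */
static void model_dispatch_fails(struct model_req *rq, bool set_result)
{
	if (set_result)
		rq->result = DID_NO_CONNECT << 16;
}
```

Under this model the scsi_req_init() reset only matters for a reused
request whose completion path never writes result at all.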

Please let me know if I need to remove this change.

> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 2018967..af1488d 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> >                         ret = BLK_STS_DEV_RESOURCE;
> >                 break;
> >         default:
> > +               scsi_req(req)->result = DID_NO_CONNECT << 16;
> >                 /*
> >                  * Make sure to release all allocated ressources when
> >                  * we hit an error, as we will never see this command
>
> What leads you to the conclusion that (ret != BLK_STS_OK &&
> ret != BLK_STS_RESOURCE) means that there is a connectivity issue?

This matches what the legacy (non-blk-mq) queue path does; I referred to
scsi_prep_return() and scsi_kill_request(), which always set the result
to DID_NO_CONNECT.

However, I think more accurate result handling would be something like:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2018967..21e516e 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1699,6 +1699,10 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
                        ret = BLK_STS_DEV_RESOURCE;
                break;
        default:
+               if (unlikely(!scsi_device_online(sdev)))
+                       scsi_req(req)->result = DID_NO_CONNECT << 16;
+               else
+                       scsi_req(req)->result = DID_ERROR << 16;
                /*
                 * Make sure to release all allocated ressources when
                 * we hit an error, as we will never see this command
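
The proposal above boils down to a two-way choice of host byte. A
minimal userspace model, assuming the device_online flag stands in for
scsi_device_online(sdev) and that the DID_* values match
include/scsi/scsi.h; dispatch_failure_result() is a hypothetical name:

```c
#include <stdbool.h>

/* Host-byte codes; values assumed to mirror include/scsi/scsi.h. */
#define DID_NO_CONNECT 0x01
#define DID_ERROR      0x07

/*
 * Result for a command that could not be dispatched: an offline
 * device reports DID_NO_CONNECT, any other failure reports the more
 * generic DID_ERROR, both shifted into the host byte.
 */
static int dispatch_failure_result(bool device_online)
{
	return (device_online ? DID_ERROR : DID_NO_CONNECT) << 16;
}
```

Reserving DID_NO_CONNECT for the offline case addresses the question of
whether every non-OK, non-resource return really means lost
connectivity.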

>
> Thanks,
>
> Bart.

Thanks,

Jaesoo.

* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Bart Van Assche @ 2019-04-09 23:44 UTC
  To: Jaesoo Lee
  Cc: James E.J. Bottomley, Martin K. Petersen, Jens Axboe,
	Douglas Gilbert, linux-scsi, linux-block, Roland Dreier

On Tue, 2019-04-09 at 16:29 -0700, Jaesoo Lee wrote:
> Let me comment in line.
> 
> On Tue, Apr 9, 2019 at 3:14 PM Bart Van Assche <bvanassche@acm.org> wrote:
> > 
> > On Tue, 2019-04-09 at 14:53 -0700, Jaesoo Lee wrote:
> > > When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq.
> > > Specifically, the bug is not setting result field of scsi_request correctly when
> > > the dispatch of the command has been failed. Since the upper layer code
> > > including the sg_io ioctl expects to receive any error status from result field
> > > of scsi_request, the error is silently ignored and this could cause data
> > > corruptions for some applications. This commit also fixes another bug that the
> > > result field is not initialized when scsi_request is allocated.
> > > 
> > > Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
> > > ---
> > >  block/scsi_ioctl.c      | 1 +
> > >  drivers/scsi/scsi_lib.c | 1 +
> > >  2 files changed, 2 insertions(+)
> > > 
> > > diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> > > index 533f4ae..f2d7979 100644
> > > --- a/block/scsi_ioctl.c
> > > +++ b/block/scsi_ioctl.c
> > > @@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
> > >         req->cmd = req->__cmd;
> > >         req->cmd_len = BLK_MAX_CDB;
> > >         req->sense_len = 0;
> > > +       req->result = 0;
> > >  }
> > >  EXPORT_SYMBOL(scsi_req_init);
> > 
> > What makes you think that this assignment is necessary?
> > 
> 
> Actually, I discovered this before fixing this bug and we might not
> see this problem anymore once this bug is fixed.
> 
> Previously, since we are not setting scsi_req(req)->result in
> scsi_queue_rq, I found that the application could receive another
> DID_TRANSPORT_DISRUPTED host_status again if the same 'struct request'
> is allocated for the IO.
> 
> Please let me know if I need to remove this change.

Since SCSI LLDs have to set that result variable anyway when a request
completes successfully, I'd prefer not to add that assignment.

> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index 2018967..af1488d 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> > >                         ret = BLK_STS_DEV_RESOURCE;
> > >                 break;
> > >         default:
> > > +               scsi_req(req)->result = DID_NO_CONNECT << 16;
> > >                 /*
> > >                  * Make sure to release all allocated ressources when
> > >                  * we hit an error, as we will never see this command
> > 
> > What leads you to the conclusion that (ret != BLK_STS_OK &&
> > ret != BLK_STS_RESOURCE) means that there is a connectivity issue?
> 
> I found this is what we are doing for legacy queue case; I referred to
> scsi_prep_return() and scsi_kill_request() code where we always
> returning DID_NO_CONNECT.
> 
> However, I think proper return code handling should be something like:
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 2018967..21e516e 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1699,6 +1699,10 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>                         ret = BLK_STS_DEV_RESOURCE;
>                 break;
>         default:
> +               if (unlikely(!scsi_device_online(sdev)))
> +                       scsi_req(req)->result = DID_NO_CONNECT << 16;
> +               else
> +                       scsi_req(req)->result = DID_ERROR << 16;
>                 /*
>                  * Make sure to release all allocated ressources when
>                  * we hit an error, as we will never see this command

The above looks better to me than the original patch.

Thanks,

Bart.

* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Jaesoo Lee @ 2019-04-10  0:02 UTC
  To: Bart Van Assche
  Cc: James E.J. Bottomley, Martin K. Petersen, Jens Axboe,
	Douglas Gilbert, linux-scsi, linux-block, Roland Dreier

Let me send a v2 addressing your comments.

Thanks,

Jaesoo Lee.
