* [PATCH] scsi: core: set result when the command cannot be dispatched
From: Jaesoo Lee @ 2019-04-09 21:53 UTC
To: James E.J. Bottomley, Martin K. Petersen, Jens Axboe, Douglas Gilbert
Cc: linux-scsi, linux-block, Roland Dreier
When SCSI blk-mq is enabled, there is a bug in the error handling of
scsi_queue_rq(): the result field of scsi_request is not set when the
command cannot be dispatched. Since upper-layer code, including the SG_IO
ioctl, expects any error status to be reported in the result field of
scsi_request, the error is silently ignored, which could cause data
corruption for some applications. This commit also fixes a second bug:
the result field is not initialized when a scsi_request is allocated.
Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
---
block/scsi_ioctl.c | 1 +
drivers/scsi/scsi_lib.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 533f4ae..f2d7979 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
req->cmd = req->__cmd;
req->cmd_len = BLK_MAX_CDB;
req->sense_len = 0;
+ req->result = 0;
}
EXPORT_SYMBOL(scsi_req_init);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2018967..af1488d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
ret = BLK_STS_DEV_RESOURCE;
break;
default:
+ scsi_req(req)->result = DID_NO_CONNECT << 16;
/*
* Make sure to release all allocated ressources when
* we hit an error, as we will never see this command
--
2.7.4
* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Jaesoo Lee @ 2019-04-09 21:57 UTC
To: James E.J. Bottomley, Martin K. Petersen, Jens Axboe, Douglas Gilbert
Cc: linux-scsi, linux-block, Roland Dreier
Hello,
Here are the test results.
0. Kernel configuration
Version: 5.1-rc1
Boot parameters: dm_mod.use_blk_mq=Y scsi_mod.use_blk_mq=Y
1. Normal state
: (As expected) The command succeeded
$ sg_write_same --lba=100 --xferlen=512 /dev/sg5
$
2. Immediately after bringing down the iSCSI interface at the target
: (As expected) Failed with DID_TRANSPORT_DISRUPTED after a few seconds
$ sg_write_same --lba=100 --xferlen=512 /dev/sg5
Write same: transport: Host_status=0x0e [DID_TRANSPORT_DISRUPTED]
Driver_status=0x00 [DRIVER_OK, SUGGEST_OK]
Write same(10) command failed
3. Immediately after the DID_TRANSPORT_DISRUPTED error
: (As expected) Failed with DID_NO_CONNECT after a few seconds
$ sg_write_same --lba=100 --xferlen=512 /dev/sg5
Write same: transport: Host_status=0x01 [DID_NO_CONNECT]
Driver_status=0x00 [DRIVER_OK, SUGGEST_OK]
Write same(10) command failed
4. Issued IO again
: (As expected) The command failed
$ sg_write_same --lba=100 --xferlen=512 /dev/sg5
Write same: pass through os error: No such device or address
Write same(10) command failed
Thanks,
Jaesoo Lee.
* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Bart Van Assche @ 2019-04-09 22:14 UTC
To: Jaesoo Lee, James E.J. Bottomley, Martin K. Petersen, Jens Axboe,
Douglas Gilbert
Cc: linux-scsi, linux-block, Roland Dreier
On Tue, 2019-04-09 at 14:53 -0700, Jaesoo Lee wrote:
> When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq.
> Specifically, the bug is not setting result field of scsi_request correctly when
> the dispatch of the command has been failed. Since the upper layer code
> including the sg_io ioctl expects to receive any error status from result field
> of scsi_request, the error is silently ignored and this could cause data
> corruptions for some applications. This commit also fixes another bug that the
> result field is not initialized when scsi_request is allocated.
>
> Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
> ---
> block/scsi_ioctl.c | 1 +
> drivers/scsi/scsi_lib.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index 533f4ae..f2d7979 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
> req->cmd = req->__cmd;
> req->cmd_len = BLK_MAX_CDB;
> req->sense_len = 0;
> + req->result = 0;
> }
> EXPORT_SYMBOL(scsi_req_init);
What makes you think that this assignment is necessary?
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 2018967..af1488d 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> ret = BLK_STS_DEV_RESOURCE;
> break;
> default:
> + scsi_req(req)->result = DID_NO_CONNECT << 16;
> /*
> * Make sure to release all allocated ressources when
> * we hit an error, as we will never see this command
What leads you to the conclusion that (ret != BLK_STS_OK &&
ret != BLK_STS_RESOURCE) means that there is a connectivity issue?
Thanks,
Bart.
* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Jaesoo Lee @ 2019-04-09 23:29 UTC
To: Bart Van Assche
Cc: James E.J. Bottomley, Martin K. Petersen, Jens Axboe,
Douglas Gilbert, linux-scsi, linux-block, Roland Dreier
Let me comment inline.
On Tue, Apr 9, 2019 at 3:14 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On Tue, 2019-04-09 at 14:53 -0700, Jaesoo Lee wrote:
> > When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq.
> > Specifically, the bug is not setting result field of scsi_request correctly when
> > the dispatch of the command has been failed. Since the upper layer code
> > including the sg_io ioctl expects to receive any error status from result field
> > of scsi_request, the error is silently ignored and this could cause data
> > corruptions for some applications. This commit also fixes another bug that the
> > result field is not initialized when scsi_request is allocated.
> >
> > Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
> > ---
> > block/scsi_ioctl.c | 1 +
> > drivers/scsi/scsi_lib.c | 1 +
> > 2 files changed, 2 insertions(+)
> >
> > diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> > index 533f4ae..f2d7979 100644
> > --- a/block/scsi_ioctl.c
> > +++ b/block/scsi_ioctl.c
> > @@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
> > req->cmd = req->__cmd;
> > req->cmd_len = BLK_MAX_CDB;
> > req->sense_len = 0;
> > + req->result = 0;
> > }
> > EXPORT_SYMBOL(scsi_req_init);
>
> What makes you think that this assignment is necessary?
>
Actually, I discovered this before fixing the dispatch bug, and we might
not see this problem anymore once that bug is fixed.
Previously, since we were not setting scsi_req(req)->result in
scsi_queue_rq(), I found that the application could receive a stale
DID_TRANSPORT_DISRUPTED host_status again if the same 'struct request'
was allocated again for a new IO.
Please let me know if I should remove this change.
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 2018967..af1488d 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> > ret = BLK_STS_DEV_RESOURCE;
> > break;
> > default:
> > + scsi_req(req)->result = DID_NO_CONNECT << 16;
> > /*
> > * Make sure to release all allocated ressources when
> > * we hit an error, as we will never see this command
>
> What leads you to the conclusion that (ret != BLK_STS_OK &&
> ret != BLK_STS_RESOURCE) means that there is a connectivity issue?
This is what we do in the legacy (non-blk-mq) request path; I referred to
scsi_prep_return() and scsi_kill_request(), which always set the result
to DID_NO_CONNECT.
However, I think proper result handling should look something like this:
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2018967..21e516e 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1699,6 +1699,10 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
ret = BLK_STS_DEV_RESOURCE;
break;
default:
+ if (unlikely(!scsi_device_online(sdev)))
+ scsi_req(req)->result = DID_NO_CONNECT << 16;
+ else
+ scsi_req(req)->result = DID_ERROR << 16;
/*
* Make sure to release all allocated ressources when
* we hit an error, as we will never see this command
>
> Thanks,
>
> Bart.
Thanks,
Jaesoo.
* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Bart Van Assche @ 2019-04-09 23:44 UTC
To: Jaesoo Lee
Cc: James E.J. Bottomley, Martin K. Petersen, Jens Axboe,
Douglas Gilbert, linux-scsi, linux-block, Roland Dreier
On Tue, 2019-04-09 at 16:29 -0700, Jaesoo Lee wrote:
> Let me comment in line.
>
> On Tue, Apr 9, 2019 at 3:14 PM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > On Tue, 2019-04-09 at 14:53 -0700, Jaesoo Lee wrote:
> > > When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq.
> > > Specifically, the bug is not setting result field of scsi_request correctly when
> > > the dispatch of the command has been failed. Since the upper layer code
> > > including the sg_io ioctl expects to receive any error status from result field
> > > of scsi_request, the error is silently ignored and this could cause data
> > > corruptions for some applications. This commit also fixes another bug that the
> > > result field is not initialized when scsi_request is allocated.
> > >
> > > Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
> > > ---
> > > block/scsi_ioctl.c | 1 +
> > > drivers/scsi/scsi_lib.c | 1 +
> > > 2 files changed, 2 insertions(+)
> > >
> > > diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> > > index 533f4ae..f2d7979 100644
> > > --- a/block/scsi_ioctl.c
> > > +++ b/block/scsi_ioctl.c
> > > @@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
> > > req->cmd = req->__cmd;
> > > req->cmd_len = BLK_MAX_CDB;
> > > req->sense_len = 0;
> > > + req->result = 0;
> > > }
> > > EXPORT_SYMBOL(scsi_req_init);
> >
> > What makes you think that this assignment is necessary?
> >
>
> Actually, I discovered this before fixing this bug and we might not
> see this problem anymore once this bug is fixed.
>
> Previously, since we are not setting scsi_req(req)->result in
> scsi_queue_rq, I found that the application could receive another
> DID_TRANSPORT_DISRUPTED host_status again if the same 'struct request'
> is allocated for the IO.
>
> Please let me know if I need to remove this change.
Since SCSI LLDs have to set that result variable anyway when a request
completes successfully, I'd prefer not to add that assignment.
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index 2018967..af1488d 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> > > ret = BLK_STS_DEV_RESOURCE;
> > > break;
> > > default:
> > > + scsi_req(req)->result = DID_NO_CONNECT << 16;
> > > /*
> > > * Make sure to release all allocated ressources when
> > > * we hit an error, as we will never see this command
> >
> > What leads you to the conclusion that (ret != BLK_STS_OK &&
> > ret != BLK_STS_RESOURCE) means that there is a connectivity issue?
>
> I found this is what we are doing for legacy queue case; I referred to
> scsi_prep_return() and scsi_kill_request() code where we always
> returning DID_NO_CONNECT.
>
> However, I think proper return code handling should be something like:
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 2018967..21e516e 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1699,6 +1699,10 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> ret = BLK_STS_DEV_RESOURCE;
> break;
> default:
> + if (unlikely(!scsi_device_online(sdev)))
> + scsi_req(req)->result = DID_NO_CONNECT << 16;
> + else
> + scsi_req(req)->result = DID_ERROR << 16;
> /*
> * Make sure to release all allocated ressources when
> * we hit an error, as we will never see this command
The above looks better to me than the original patch.
Thanks,
Bart.
* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Jaesoo Lee @ 2019-04-10 0:02 UTC
To: Bart Van Assche
Cc: James E.J. Bottomley, Martin K. Petersen, Jens Axboe,
Douglas Gilbert, linux-scsi, linux-block, Roland Dreier
I will send a v2 addressing your comments.
Thanks,
Jaesoo Lee.