* [PATCH] nvme-core: initialize status to NVME_SC_HOST_PATH_ERROR @ 2021-04-12 23:45 Yuanyuan Zhong 2021-04-13 3:19 ` Chaitanya Kulkarni 0 siblings, 1 reply; 7+ messages in thread From: Yuanyuan Zhong @ 2021-04-12 23:45 UTC (permalink / raw) To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, cachen, Yuanyuan Zhong Block layer blk_end_sync_rq() silently drops blk_status_t error. When ->queue_rq returns error, nvme driver should set nvme status explicitly. Otherwise the passthrough command may take the stale status as result. A typical value zero will be interpreted as NVME_SC_SUCCESS, despite the dispatching error from ->queue_rq. Instead of trying to fix it for every error return, this change initialize the status to NVME_SC_HOST_PATH_ERROR during nvme_alloc_request(). Fixes: 27fa9bc54541 ("nvme: split nvme status from block req->errors") Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com> Signed-off-by: Casey Chen <cachen@purestorage.com> --- drivers/nvme/host/core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 0896e21642be..f22323d02755 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -580,6 +580,7 @@ static inline void nvme_clear_nvme_request(struct request *req) nvme_req(req)->flags = 0; req->rq_flags |= RQF_DONTPREP; } + nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR; } static inline unsigned int nvme_req_op(struct nvme_command *cmd) -- 2.31.1 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme-core: initialize status to NVME_SC_HOST_PATH_ERROR 2021-04-12 23:45 [PATCH] nvme-core: initialize status to NVME_SC_HOST_PATH_ERROR Yuanyuan Zhong @ 2021-04-13 3:19 ` Chaitanya Kulkarni 2021-04-14 17:08 ` [PATCH v2] " Yuanyuan Zhong 0 siblings, 1 reply; 7+ messages in thread From: Chaitanya Kulkarni @ 2021-04-13 3:19 UTC (permalink / raw) To: Yuanyuan Zhong; +Cc: kbusch, axboe, hch, sagi, linux-nvme, cachen On 4/12/21 16:55, Yuanyuan Zhong wrote: > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 0896e21642be..f22323d02755 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -580,6 +580,7 @@ static inline void nvme_clear_nvme_request(struct request *req) > nvme_req(req)->flags = 0; > req->rq_flags |= RQF_DONTPREP; > } > + nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR; > } > Since setting the status to the error value in not common when clearing the request this needs a good comment why we are doing it in nvme_clear_nvme_request(). _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2] nvme-core: initialize status to NVME_SC_HOST_PATH_ERROR 2021-04-13 3:19 ` Chaitanya Kulkarni @ 2021-04-14 17:08 ` Yuanyuan Zhong 2021-04-15 19:30 ` Keith Busch 0 siblings, 1 reply; 7+ messages in thread From: Yuanyuan Zhong @ 2021-04-14 17:08 UTC (permalink / raw) To: Chaitanya.Kulkarni Cc: kbusch, axboe, hch, sagi, linux-nvme, cachen, Yuanyuan Zhong Block layer blk_end_sync_rq() silently drops blk_status_t error. When ->queue_rq returns error, nvme driver should set nvme status explicitly. Otherwise the passthrough command may take the stale status as result. A typical value zero will be interpreted as NVME_SC_SUCCESS, despite the dispatching error from ->queue_rq. Instead of trying to fix it for every error return, this change initialize the status to NVME_SC_HOST_PATH_ERROR during nvme_clear_nvme_request(). Fixes: 27fa9bc54541 ("nvme: split nvme status from block req->errors") Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com> Signed-off-by: Casey Chen <cachen@purestorage.com> --- drivers/nvme/host/core.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 0896e21642be..df7d9a515297 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -580,6 +580,14 @@ static inline void nvme_clear_nvme_request(struct request *req) nvme_req(req)->flags = 0; req->rq_flags |= RQF_DONTPREP; } + + /* + * We need to set nvme_req(req)->status whenever ->queue_rq() + * fails the command submission. Ideally an appropriate status + * should be set for each ->queue_rq() error case. + * This is a blanket failsafe to avoid omission. + */ + nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR; } static inline unsigned int nvme_req_op(struct nvme_command *cmd) -- 2.31.1 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v2] nvme-core: initialize status to NVME_SC_HOST_PATH_ERROR 2021-04-14 17:08 ` [PATCH v2] " Yuanyuan Zhong @ 2021-04-15 19:30 ` Keith Busch 2021-04-15 20:17 ` Keith Busch 0 siblings, 1 reply; 7+ messages in thread From: Keith Busch @ 2021-04-15 19:30 UTC (permalink / raw) To: Yuanyuan Zhong; +Cc: Chaitanya.Kulkarni, axboe, hch, sagi, linux-nvme, cachen On Wed, Apr 14, 2021 at 11:08:33AM -0600, Yuanyuan Zhong wrote: > Block layer blk_end_sync_rq() silently drops blk_status_t error. > When ->queue_rq returns error, nvme driver should set nvme status > explicitly. Otherwise the passthrough command may take the stale > status as result. A typical value zero will be interpreted as > NVME_SC_SUCCESS, despite the dispatching error from ->queue_rq. > > Instead of trying to fix it for every error return, this change > initialize the status to NVME_SC_HOST_PATH_ERROR during > nvme_clear_nvme_request(). It doesn't look like these types of errors are unique to nvme. Could we not just have blk_execute_rq() return an error instead? _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2] nvme-core: initialize status to NVME_SC_HOST_PATH_ERROR 2021-04-15 19:30 ` Keith Busch @ 2021-04-15 20:17 ` Keith Busch 2021-04-15 22:11 ` Yuanyuan Zhong 0 siblings, 1 reply; 7+ messages in thread From: Keith Busch @ 2021-04-15 20:17 UTC (permalink / raw) To: Yuanyuan Zhong; +Cc: Chaitanya.Kulkarni, axboe, hch, sagi, linux-nvme, cachen On Thu, Apr 15, 2021 at 12:30:30PM -0700, Keith Busch wrote: > On Wed, Apr 14, 2021 at 11:08:33AM -0600, Yuanyuan Zhong wrote: > > Block layer blk_end_sync_rq() silently drops blk_status_t error. > > When ->queue_rq returns error, nvme driver should set nvme status > > explicitly. Otherwise the passthrough command may take the stale > > status as result. A typical value zero will be interpreted as > > NVME_SC_SUCCESS, despite the dispatching error from ->queue_rq. > > > > Instead of trying to fix it for every error return, this change > > initialize the status to NVME_SC_HOST_PATH_ERROR during > > nvme_clear_nvme_request(). > > It doesn't look like these types of errors are unique to nvme. Could we > not just have blk_execute_rq() return an error instead? The following enables all callers to know if queue_rq() failed: --- diff --git a/block/blk-exec.c b/block/blk-exec.c index beae70a0e5e5..6d12524a91f9 100644 --- a/block/blk-exec.c +++ b/block/blk-exec.c @@ -21,7 +21,7 @@ static void blk_end_sync_rq(struct request *rq, blk_status_t error) { struct completion *waiting = rq->end_io_data; - rq->end_io_data = NULL; + rq->end_io_data = ERR_PTR(blk_status_to_errno(error)); /* * complete last, if this is a stack request the process (and thus @@ -73,7 +73,7 @@ EXPORT_SYMBOL_GPL(blk_execute_rq_nowait); * Insert a fully prepared request at the back of the I/O scheduler queue * for execution and wait for completion. */ -void blk_execute_rq(struct gendisk *bd_disk, struct request *rq, int at_head) +int blk_execute_rq(struct gendisk *bd_disk, struct request *rq, int at_head) { DECLARE_COMPLETION_ONSTACK(wait); unsigned long hang_check; @@ -87,5 +87,6 @@ void blk_execute_rq(struct gendisk *bd_disk, struct request *rq, int at_head) while (!wait_for_completion_io_timeout(&wait, hang_check * (HZ/2))); else wait_for_completion_io(&wait); + return PTR_ERR_OR_ZERO(rq->end_io_data); } EXPORT_SYMBOL(blk_execute_rq); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 1a6f89d1110c..9ed540d168af 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -933,7 +933,7 @@ extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, uns extern int blk_rq_map_user_iov(struct request_queue *, struct request *, struct rq_map_data *, const struct iov_iter *, gfp_t); -extern void blk_execute_rq(struct gendisk *, struct request *, int); +extern int blk_execute_rq(struct gendisk *, struct request *, int); extern void blk_execute_rq_nowait(struct gendisk *, struct request *, int, rq_end_io_fn *); -- _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v2] nvme-core: initialize status to NVME_SC_HOST_PATH_ERROR 2021-04-15 20:17 ` Keith Busch @ 2021-04-15 22:11 ` Yuanyuan Zhong 2021-04-15 22:58 ` Keith Busch 0 siblings, 1 reply; 7+ messages in thread From: Yuanyuan Zhong @ 2021-04-15 22:11 UTC (permalink / raw) To: Keith Busch; +Cc: Chaitanya.Kulkarni, axboe, hch, sagi, linux-nvme, Casey Chen > > It doesn't look like these types of errors are unique to nvme. Could we > > not just have blk_execute_rq() return an error instead? Looking at history of commit b7819b925918 ("block: remove the blk_execute_rq return value") and commit be549d491154 ("scsi: core: set result when the command cannot be dispatched"), it seems scsi code no longer have issue. > -extern void blk_execute_rq(struct gendisk *, struct request *, int); > +extern int blk_execute_rq(struct gendisk *, struct request *, int); Changing blk_execute_rq() prototype needs tree-wide callers update. While it could be a fix, I'd wait for maintainers to chime in. I don't quite like initializing the status for every nvme command. However for such a long standing bug across multiple stable releases, I think it will be an easy backport for stable-tree. -- Regards, Yuanyuan Zhong _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2] nvme-core: initialize status to NVME_SC_HOST_PATH_ERROR 2021-04-15 22:11 ` Yuanyuan Zhong @ 2021-04-15 22:58 ` Keith Busch 0 siblings, 0 replies; 7+ messages in thread From: Keith Busch @ 2021-04-15 22:58 UTC (permalink / raw) To: Yuanyuan Zhong Cc: Chaitanya.Kulkarni, axboe, hch, sagi, linux-nvme, Casey Chen On Thu, Apr 15, 2021 at 03:11:13PM -0700, Yuanyuan Zhong wrote: > > > It doesn't look like these types of errors are unique to nvme. Could we > > > not just have blk_execute_rq() return an error instead? > Looking at history of commit b7819b925918 ("block: remove the blk_execute_rq > return value") and commit be549d491154 ("scsi: core: set result when the > command cannot be dispatched"), it seems scsi code no longer have issue. Those happened when 'struct request' carried an 'errors' field that a caller could check. The struct doesn't have that anymore. > > -extern void blk_execute_rq(struct gendisk *, struct request *, int); > > +extern int blk_execute_rq(struct gendisk *, struct request *, int); > Changing blk_execute_rq() prototype needs tree-wide callers update. Callers aren't relying on an error now, so I don't think this suggestion requires a tree-wide update. It might want an audit to see if existing callers are susceptible to the same bug you found with NVMe, but changing the return as suggested shouldn't introduce new issues. > While it could be a fix, I'd wait for maintainers to chime in. Yes, this does require concensus, but Linux is usually adaptable to changing kernal APIs when it makes sense. I *think* my suggestion makes sense, but I appreciate any feedback. > I don't quite like initializing the status for every nvme command. However for > such a long standing bug across multiple stable releases, I think it will be an > easy backport for stable-tree. Your proposal is easier for stable, however, the mainline fix doesn't necessarily need to consider that. We can always deal with a back-port if the mainline approach does not readily apply. _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-04-15 22:58 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-04-12 23:45 [PATCH] nvme-core: initialize status to NVME_SC_HOST_PATH_ERROR Yuanyuan Zhong 2021-04-13 3:19 ` Chaitanya Kulkarni 2021-04-14 17:08 ` [PATCH v2] " Yuanyuan Zhong 2021-04-15 19:30 ` Keith Busch 2021-04-15 20:17 ` Keith Busch 2021-04-15 22:11 ` Yuanyuan Zhong 2021-04-15 22:58 ` Keith Busch
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).