* [PATCH V3 1/7] ublk_drv: check 'current' instead of 'ubq_daemon'
2022-09-13 4:17 [PATCH V3 0/7] ublk_drv: add USER_RECOVERY support ZiyangZhang
@ 2022-09-13 4:17 ` ZiyangZhang
2022-09-13 4:17 ` [PATCH V3 2/7] ublk_drv: refactor ublk_cancel_queue() ZiyangZhang
` (6 subsequent siblings)
7 siblings, 0 replies; 33+ messages in thread
From: ZiyangZhang @ 2022-09-13 4:17 UTC (permalink / raw)
To: ming.lei
Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi, ZiyangZhang
This check is not atomic, so with the recovery feature, 'ubq_daemon' may be
modified concurrently by the recovery task. Checking 'current' instead is
safe because 'current' never changes.
Also add a comment explaining this check, which is important for
understanding the recovery feature.
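[Editor's illustration, not part of the patch: a toy user-space C model of the reworked check. The `task` and `queue` structs and `ublk_task_exiting()` are hypothetical stand-ins for the kernel's `task_struct`, `ublk_queue`, and the open-coded condition; only the PF_EXITING value mirrors the real header.]

```c
#include <assert.h>
#include <stdbool.h>

#define PF_EXITING 0x00000004 /* value mirrors include/linux/sched.h */

/* Hypothetical stand-ins for the kernel's task_struct and ublk_queue. */
struct task {
	unsigned int flags;
};

struct queue {
	struct task *ubq_daemon; /* may be rewritten by the recovery task */
};

/*
 * The reworked check: a pointer comparison plus current->flags.
 * 'current' is stable for the running task, so reading its own flags
 * is always safe; the old code instead dereferenced
 * ubq->ubq_daemon->flags, i.e. a task pointer which the recovery path
 * may swap concurrently.
 */
static bool ublk_task_exiting(const struct task *cur, const struct queue *ubq)
{
	return cur != ubq->ubq_daemon || (cur->flags & PF_EXITING);
}
```

The check fires either when the caller is not the registered daemon (the task_work fell back to a workqueue) or when the daemon itself is exiting.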
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/ublk_drv.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 6a4a94b4cdf4..c39b67d7133d 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -645,14 +645,22 @@ static inline void __ublk_rq_task_work(struct request *req)
struct ublk_device *ub = ubq->dev;
int tag = req->tag;
struct ublk_io *io = &ubq->ios[tag];
- bool task_exiting = current != ubq->ubq_daemon || ubq_daemon_is_dying(ubq);
unsigned int mapped_bytes;
pr_devel("%s: complete: op %d, qid %d tag %d io_flags %x addr %llx\n",
__func__, io->cmd->cmd_op, ubq->q_id, req->tag, io->flags,
ublk_get_iod(ubq, req->tag)->addr);
- if (unlikely(task_exiting)) {
+ /*
+ * Task is exiting if either:
+ *
+ * (1) current != ubq_daemon.
+ * io_uring_cmd_complete_in_task() tries to run task_work
+ * in a workqueue if ubq_daemon(cmd's task) is PF_EXITING.
+ *
+ * (2) current->flags & PF_EXITING.
+ */
+ if (unlikely(current != ubq->ubq_daemon || current->flags & PF_EXITING)) {
blk_mq_end_request(req, BLK_STS_IOERR);
mod_delayed_work(system_wq, &ub->monitor_work, 0);
return;
--
2.27.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH V3 2/7] ublk_drv: refactor ublk_cancel_queue()
2022-09-13 4:17 [PATCH V3 0/7] ublk_drv: add USER_RECOVERY support ZiyangZhang
2022-09-13 4:17 ` [PATCH V3 1/7] ublk_drv: check 'current' instead of 'ubq_daemon' ZiyangZhang
@ 2022-09-13 4:17 ` ZiyangZhang
2022-09-13 4:17 ` [PATCH V3 3/7] ublk_drv: define macros for recovery feature and check them ZiyangZhang
` (5 subsequent siblings)
7 siblings, 0 replies; 33+ messages in thread
From: ZiyangZhang @ 2022-09-13 4:17 UTC (permalink / raw)
To: ming.lei
Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi, ZiyangZhang
Assume only a few FETCH_REQ ioucmds are sent to ublk_drv and then the
ubq_daemon exits. We have to call io_uring_cmd_done() for all ioucmds
received so that the io_uring ctx will not leak.
ublk_cancel_queue() may be called before START_DEV or after STOP_DEV.
We decrease ubq->nr_io_ready and clear UBLK_IO_FLAG_ACTIVE so that
io_uring_cmd_done() is not called twice for one ioucmd, which would be
a use-after-free. Clearing UBLK_IO_FLAG_ACTIVE also makes the code more
reasonable.
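[Editor's illustration, not part of the patch: a minimal user-space sketch of the refactored cancel loop. The structs are toy stand-ins for `ublk_queue`/`ublk_io`, and `done_calls` is a hypothetical counter standing in for `io_uring_cmd_done()`; only the flag-clearing/decrement pattern mirrors the patch.]

```c
#include <assert.h>

#define UBLK_IO_FLAG_ACTIVE 0x1 /* bit value is illustrative only */

struct io {
	unsigned int flags;
	int done_calls; /* counts io_uring_cmd_done() invocations (test aid) */
};

struct queue {
	int q_depth;
	int nr_io_ready;
	struct io ios[4];
};

/*
 * Sketch of the refactored cancel loop: each active command is
 * completed exactly once. Clearing ACTIVE and decrementing
 * nr_io_ready makes a second invocation a no-op, so the completion
 * can never run twice for one command.
 */
static void cancel_queue(struct queue *ubq)
{
	if (!ubq->nr_io_ready)
		return;
	for (int i = 0; i < ubq->q_depth; i++) {
		struct io *io = &ubq->ios[i];

		if (io->flags & UBLK_IO_FLAG_ACTIVE) {
			io->done_calls++; /* stands in for io_uring_cmd_done() */
			io->flags &= ~UBLK_IO_FLAG_ACTIVE;
			ubq->nr_io_ready--;
		}
	}
}
```

Calling the function a second time finds nr_io_ready already zero and returns immediately, which is exactly the double-completion protection the commit message describes.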
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/ublk_drv.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index c39b67d7133d..0c6db0978ed0 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -963,22 +963,39 @@ static inline bool ublk_queue_ready(struct ublk_queue *ubq)
return ubq->nr_io_ready == ubq->q_depth;
}
+/* If ublk_cancel_queue() is called before sending START_DEV(), ->mutex
+ * provides protection on above update.
+ *
+ * If ublk_cancel_queue() is called after sending START_DEV(), disk is
+ * deleted first, UBLK_IO_RES_ABORT is returned so that any new io
+ * command can't be issued to driver, so updating on io flags and
+ * nr_io_ready is safe here.
+ *
+ * Also ->nr_io_ready is guaranteed to become zero after ublk_cancel_queue()
+ * returns since request queue is either frozen or not present in both two
+ * cases.
+ */
static void ublk_cancel_queue(struct ublk_queue *ubq)
{
int i;
- if (!ublk_queue_ready(ubq))
+ if (!ubq->nr_io_ready)
return;
for (i = 0; i < ubq->q_depth; i++) {
struct ublk_io *io = &ubq->ios[i];
- if (io->flags & UBLK_IO_FLAG_ACTIVE)
+ if (io->flags & UBLK_IO_FLAG_ACTIVE) {
+ pr_devel("%s: done old cmd: qid %d tag %d\n",
+ __func__, ubq->q_id, i);
io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0);
+ io->flags &= ~UBLK_IO_FLAG_ACTIVE;
+ ubq->nr_io_ready--;
+ }
}
/* all io commands are canceled */
- ubq->nr_io_ready = 0;
+ WARN_ON_ONCE(ubq->nr_io_ready);
}
/* Cancel all pending commands, must be called after del_gendisk() returns */
--
2.27.0
* [PATCH V3 3/7] ublk_drv: define macros for recovery feature and check them
2022-09-13 4:17 [PATCH V3 0/7] ublk_drv: add USER_RECOVERY support ZiyangZhang
2022-09-13 4:17 ` [PATCH V3 1/7] ublk_drv: check 'current' instead of 'ubq_daemon' ZiyangZhang
2022-09-13 4:17 ` [PATCH V3 2/7] ublk_drv: refactor ublk_cancel_queue() ZiyangZhang
@ 2022-09-13 4:17 ` ZiyangZhang
2022-09-20 5:04 ` Ming Lei
2022-09-13 4:17 ` [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled ZiyangZhang
` (4 subsequent siblings)
7 siblings, 1 reply; 33+ messages in thread
From: ZiyangZhang @ 2022-09-13 4:17 UTC (permalink / raw)
To: ming.lei
Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi, ZiyangZhang
Define some macros for the recovery feature. In particular, define a new
state, UBLK_S_DEV_QUIESCED, which implies that the ublk_device is
quiesced and ready for recovery. This state can be observed by userspace.
UBLK_F_USER_RECOVERY implies that:
(1) ublk_drv enables the recovery feature. It won't let monitor_work
automatically abort rqs and release the device.
(2) With a dying ubq_daemon, ublk_drv ends (aborts) rqs issued to
userspace (ublksrv) before the crash.
(3) With a dying ubq_daemon, ublk_drv requeues rqs in task work and
ublk_queue_rq().
UBLK_F_USER_RECOVERY_REISSUE implies everything UBLK_F_USER_RECOVERY
implies, except that:
(2) With a dying ubq_daemon, ublk_drv requeues rqs issued to
userspace (ublksrv) before the crash.
UBLK_F_USER_RECOVERY_REISSUE is designed for backends which:
(1) tolerate double writes, because ublk_drv may issue the same rq
twice.
(2) do not let frontend users see I/O errors, such as a read-only FS
or a VM backend.
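[Editor's illustration, not part of the patch: the two flags compose as simple bit tests, and REISSUE is only meaningful on top of the base recovery flag. The flag values below are the ones this patch adds to include/uapi/linux/ublk_cmd.h; the helper names are illustrative stand-ins for the driver's inline functions.]

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bits as added to include/uapi/linux/ublk_cmd.h by this patch. */
#define UBLK_F_USER_RECOVERY         (1UL << 3)
#define UBLK_F_USER_RECOVERY_REISSUE (1UL << 4)

/* Base recovery feature: requeue-or-abort behavior on daemon death. */
static bool queue_can_use_recovery(unsigned long flags)
{
	return flags & UBLK_F_USER_RECOVERY;
}

/*
 * REISSUE only has effect when the base recovery flag is also set:
 * rqs already issued to ublksrv before the crash are requeued
 * (possibly causing double writes) instead of being aborted.
 */
static bool queue_can_use_recovery_reissue(unsigned long flags)
{
	return queue_can_use_recovery(flags) &&
	       (flags & UBLK_F_USER_RECOVERY_REISSUE);
}
```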
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
---
drivers/block/ublk_drv.c | 45 ++++++++++++++++++++++++++++++++++-
include/uapi/linux/ublk_cmd.h | 7 ++++++
2 files changed, 51 insertions(+), 1 deletion(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 0c6db0978ed0..23337bd7c105 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -49,7 +49,9 @@
/* All UBLK_F_* have to be included into UBLK_F_ALL */
#define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
| UBLK_F_URING_CMD_COMP_IN_TASK \
- | UBLK_F_NEED_GET_DATA)
+ | UBLK_F_NEED_GET_DATA \
+ | UBLK_F_USER_RECOVERY \
+ | UBLK_F_USER_RECOVERY_REISSUE)
/* All UBLK_PARAM_TYPE_* should be included here */
#define UBLK_PARAM_TYPE_ALL (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD)
@@ -323,6 +325,47 @@ static inline int ublk_queue_cmd_buf_size(struct ublk_device *ub, int q_id)
PAGE_SIZE);
}
+static inline bool ublk_queue_can_use_recovery(
+ struct ublk_queue *ubq)
+{
+ if (ubq->flags & UBLK_F_USER_RECOVERY)
+ return true;
+ return false;
+}
+
+static inline void ublk_disable_recovery(struct ublk_device *ub)
+{
+ int i;
+
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ struct ublk_queue *ubq = ublk_get_queue(ub, i);
+
+ ubq->flags &= ~UBLK_F_USER_RECOVERY;
+ }
+}
+
+static inline bool ublk_can_use_recovery(struct ublk_device *ub)
+{
+ int i;
+
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ struct ublk_queue *ubq = ublk_get_queue(ub, i);
+
+ if (!ublk_queue_can_use_recovery(ubq))
+ return false;
+ }
+ return true;
+}
+
+static inline bool ublk_queue_can_use_recovery_reissue(
+ struct ublk_queue *ubq)
+{
+ if (ublk_queue_can_use_recovery(ubq) &&
+ (ubq->flags & UBLK_F_USER_RECOVERY_REISSUE))
+ return true;
+ return false;
+}
+
static void ublk_free_disk(struct gendisk *disk)
{
struct ublk_device *ub = disk->private_data;
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index 677edaab2b66..87204c39f1ee 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -17,6 +17,8 @@
#define UBLK_CMD_STOP_DEV 0x07
#define UBLK_CMD_SET_PARAMS 0x08
#define UBLK_CMD_GET_PARAMS 0x09
+#define UBLK_CMD_START_USER_RECOVERY 0x10
+#define UBLK_CMD_END_USER_RECOVERY 0x11
/*
* IO commands, issued by ublk server, and handled by ublk driver.
@@ -74,9 +76,14 @@
*/
#define UBLK_F_NEED_GET_DATA (1UL << 2)
+#define UBLK_F_USER_RECOVERY (1UL << 3)
+
+#define UBLK_F_USER_RECOVERY_REISSUE (1UL << 4)
+
/* device state */
#define UBLK_S_DEV_DEAD 0
#define UBLK_S_DEV_LIVE 1
+#define UBLK_S_DEV_QUIESCED 2
/* shipped via sqe->cmd of io_uring command */
struct ublksrv_ctrl_cmd {
--
2.27.0
* Re: [PATCH V3 3/7] ublk_drv: define macros for recovery feature and check them
2022-09-13 4:17 ` [PATCH V3 3/7] ublk_drv: define macros for recovery feature and check them ZiyangZhang
@ 2022-09-20 5:04 ` Ming Lei
0 siblings, 0 replies; 33+ messages in thread
From: Ming Lei @ 2022-09-20 5:04 UTC (permalink / raw)
To: ZiyangZhang; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On Tue, Sep 13, 2022 at 12:17:03PM +0800, ZiyangZhang wrote:
> Define some macros for recovery feature. Especially define a new state:
> UBLK_S_DEV_QUIESCED which implies that ublk_device is quiesced
> and is ready for recovery. This state can be observed by userspace.
>
> UBLK_F_USER_RECOVERY implies that:
> (1) ublk_drv enables recovery feature. It won't let monitor_work to
> automatically abort rqs and release the device.
> (2) With a dying ubq_daemon, ublk_drv ends(aborts) rqs issued to
> userspace(ublksrv) before crash.
> (3) With a dying ubq_daemon, in task work and ublk_queue_rq(),
> ublk_drv requeues rqs.
>
> UBLK_F_USER_RECOVERY_REISSUE implies that:
> (1) everything UBLK_F_USER_RECOVERY implies except
> (2) With a dying ubq_daemon, ublk_drv requeues rqs issued to
> userspace(ublksrv) before crash.
>
> UBLK_F_USER_RECOVERY_REISSUE is designed for backends which:
> (1) tolerates double-writes because ublk_drv may issue the same rq
> twice.
> (2) does not let frontend users get I/O error. such as read-only FS
> and VM backend.
>
> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
> ---
> drivers/block/ublk_drv.c | 45 ++++++++++++++++++++++++++++++++++-
> include/uapi/linux/ublk_cmd.h | 7 ++++++
> 2 files changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 0c6db0978ed0..23337bd7c105 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -49,7 +49,9 @@
> /* All UBLK_F_* have to be included into UBLK_F_ALL */
> #define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
> | UBLK_F_URING_CMD_COMP_IN_TASK \
> - | UBLK_F_NEED_GET_DATA)
> + | UBLK_F_NEED_GET_DATA \
> + | UBLK_F_USER_RECOVERY \
> + | UBLK_F_USER_RECOVERY_REISSUE)
>
> /* All UBLK_PARAM_TYPE_* should be included here */
> #define UBLK_PARAM_TYPE_ALL (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD)
> @@ -323,6 +325,47 @@ static inline int ublk_queue_cmd_buf_size(struct ublk_device *ub, int q_id)
> PAGE_SIZE);
> }
>
> +static inline bool ublk_queue_can_use_recovery(
> + struct ublk_queue *ubq)
> +{
> + if (ubq->flags & UBLK_F_USER_RECOVERY)
> + return true;
> + return false;
> +}
> +
> +static inline void ublk_disable_recovery(struct ublk_device *ub)
> +{
> + int i;
> +
> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> + struct ublk_queue *ubq = ublk_get_queue(ub, i);
> +
> + ubq->flags &= ~UBLK_F_USER_RECOVERY;
> + }
> +}
The flags are not supposed to change; in particular, ublk_disable_recovery
isn't necessary with my suggestion in the following link:
https://lore.kernel.org/linux-block/YylEjEply6y+bs0B@T590/T/#u
> +
> +static inline bool ublk_can_use_recovery(struct ublk_device *ub)
> +{
> + int i;
> +
> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> + struct ublk_queue *ubq = ublk_get_queue(ub, i);
> +
> + if (!ublk_queue_can_use_recovery(ubq))
> + return false;
> + }
> + return true;
> +}
The above is too tricky; why not check ub->dev_info.flags &
UBLK_F_USER_RECOVERY directly?
> +
> +static inline bool ublk_queue_can_use_recovery_reissue(
> + struct ublk_queue *ubq)
> +{
> + if (ublk_queue_can_use_recovery(ubq) &&
> + (ubq->flags & UBLK_F_USER_RECOVERY_REISSUE))
> + return true;
> + return false;
> +}
> +
> static void ublk_free_disk(struct gendisk *disk)
> {
> struct ublk_device *ub = disk->private_data;
> diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
> index 677edaab2b66..87204c39f1ee 100644
> --- a/include/uapi/linux/ublk_cmd.h
> +++ b/include/uapi/linux/ublk_cmd.h
> @@ -17,6 +17,8 @@
> #define UBLK_CMD_STOP_DEV 0x07
> #define UBLK_CMD_SET_PARAMS 0x08
> #define UBLK_CMD_GET_PARAMS 0x09
> +#define UBLK_CMD_START_USER_RECOVERY 0x10
> +#define UBLK_CMD_END_USER_RECOVERY 0x11
>
> /*
> * IO commands, issued by ublk server, and handled by ublk driver.
> @@ -74,9 +76,14 @@
> */
> #define UBLK_F_NEED_GET_DATA (1UL << 2)
>
> +#define UBLK_F_USER_RECOVERY (1UL << 3)
> +
> +#define UBLK_F_USER_RECOVERY_REISSUE (1UL << 4)
The above are two features. I'd suggest adding UBLK_F_USER_RECOVERY
and its implementation first, then adding one delta patch to support
the new feature UBLK_F_USER_RECOVERY_REISSUE.
Not only is this more helpful for review, it also makes the difference
between the two easier to understand.
thanks,
Ming
* [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled
2022-09-13 4:17 [PATCH V3 0/7] ublk_drv: add USER_RECOVERY support ZiyangZhang
` (2 preceding siblings ...)
2022-09-13 4:17 ` [PATCH V3 3/7] ublk_drv: define macros for recovery feature and check them ZiyangZhang
@ 2022-09-13 4:17 ` ZiyangZhang
2022-09-19 3:55 ` Ming Lei
2022-09-13 4:17 ` [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism ZiyangZhang
` (3 subsequent siblings)
7 siblings, 1 reply; 33+ messages in thread
From: ZiyangZhang @ 2022-09-13 4:17 UTC (permalink / raw)
To: ming.lei
Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi, ZiyangZhang
With the recovery feature enabled, in ublk_queue_rq or task work
(in exit_task_work or the fallback wq), we requeue rqs instead of
ending (aborting) them. Besides, no matter whether the recovery feature
is enabled or disabled, we schedule monitor_work immediately.
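[Editor's illustration, not part of the patch: the requeue-or-abort decision made when a request hits a dying ubq_daemon, as a toy user-space function. `abort_or_requeue` and the enum are hypothetical names; the comments note which kernel calls each branch stands in for.]

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { STS_OK, STS_IOERR } blk_status;

/*
 * Decision sketch for a request that can't be processed because the
 * ubq_daemon is dying: with recovery enabled the rq is requeued (and
 * will be re-dispatched after recovery); without it the rq fails with
 * an I/O error, as before this patch.
 */
static blk_status abort_or_requeue(bool recovery_enabled, bool *requeued)
{
	if (recovery_enabled) {
		*requeued = true; /* blk_mq_requeue_request() + delayed kick */
		return STS_OK;
	}
	*requeued = false;        /* blk_mq_end_request(rq, BLK_STS_IOERR) */
	return STS_IOERR;
}
```

In either branch monitor_work is scheduled immediately by the caller, matching the commit message.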
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
---
drivers/block/ublk_drv.c | 34 ++++++++++++++++++++++++++++++++--
1 file changed, 32 insertions(+), 2 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 23337bd7c105..b067f33a1913 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -682,6 +682,21 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
#define UBLK_REQUEUE_DELAY_MS 3
+static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
+ struct request *rq)
+{
+ pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
+ (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
+ ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
+ /* We cannot process this rq so just requeue it. */
+ if (ublk_queue_can_use_recovery(ubq)) {
+ blk_mq_requeue_request(rq, false);
+ blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
+ } else {
+ blk_mq_end_request(rq, BLK_STS_IOERR);
+ }
+}
+
static inline void __ublk_rq_task_work(struct request *req)
{
struct ublk_queue *ubq = req->mq_hctx->driver_data;
@@ -704,7 +719,7 @@ static inline void __ublk_rq_task_work(struct request *req)
* (2) current->flags & PF_EXITING.
*/
if (unlikely(current != ubq->ubq_daemon || current->flags & PF_EXITING)) {
- blk_mq_end_request(req, BLK_STS_IOERR);
+ __ublk_abort_rq_in_task_work(ubq, req);
mod_delayed_work(system_wq, &ub->monitor_work, 0);
return;
}
@@ -779,6 +794,21 @@ static void ublk_rq_task_work_fn(struct callback_head *work)
__ublk_rq_task_work(req);
}
+static inline blk_status_t __ublk_abort_rq(struct ublk_queue *ubq,
+ struct request *rq)
+{
+ pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
+ (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
+ ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
+ /* We cannot process this rq so just requeue it. */
+ if (ublk_queue_can_use_recovery(ubq)) {
+ blk_mq_requeue_request(rq, false);
+ blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
+ return BLK_STS_OK;
+ }
+ return BLK_STS_IOERR;
+}
+
static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
@@ -796,7 +826,7 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
if (unlikely(ubq_daemon_is_dying(ubq))) {
fail:
mod_delayed_work(system_wq, &ubq->dev->monitor_work, 0);
- return BLK_STS_IOERR;
+ return __ublk_abort_rq(ubq, rq);
}
if (ublk_can_use_task_work(ubq)) {
--
2.27.0
* Re: [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled
2022-09-13 4:17 ` [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled ZiyangZhang
@ 2022-09-19 3:55 ` Ming Lei
2022-09-19 9:12 ` Ziyang Zhang
0 siblings, 1 reply; 33+ messages in thread
From: Ming Lei @ 2022-09-19 3:55 UTC (permalink / raw)
To: ZiyangZhang
Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi, ming.lei
On Tue, Sep 13, 2022 at 12:17:04PM +0800, ZiyangZhang wrote:
> With recovery feature enabled, in ublk_queue_rq or task work
> (in exit_task_work or fallback wq), we requeue rqs instead of
> ending(aborting) them. Besides, No matter recovery feature is enabled
> or disabled, we schedule monitor_work immediately.
>
> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
> ---
> drivers/block/ublk_drv.c | 34 ++++++++++++++++++++++++++++++++--
> 1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 23337bd7c105..b067f33a1913 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -682,6 +682,21 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
>
> #define UBLK_REQUEUE_DELAY_MS 3
>
> +static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
> + struct request *rq)
> +{
> + pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
> + (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
> + ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
> + /* We cannot process this rq so just requeue it. */
> + if (ublk_queue_can_use_recovery(ubq)) {
> + blk_mq_requeue_request(rq, false);
> + blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
Here you needn't kick the requeue list, since we know it can't make
progress. You can do that once before deleting the gendisk,
or once the queue is recovered.
> + } else {
> + blk_mq_end_request(rq, BLK_STS_IOERR);
> + }
> +}
> +
> static inline void __ublk_rq_task_work(struct request *req)
> {
> struct ublk_queue *ubq = req->mq_hctx->driver_data;
> @@ -704,7 +719,7 @@ static inline void __ublk_rq_task_work(struct request *req)
> * (2) current->flags & PF_EXITING.
> */
> if (unlikely(current != ubq->ubq_daemon || current->flags & PF_EXITING)) {
> - blk_mq_end_request(req, BLK_STS_IOERR);
> + __ublk_abort_rq_in_task_work(ubq, req);
> mod_delayed_work(system_wq, &ub->monitor_work, 0);
> return;
> }
> @@ -779,6 +794,21 @@ static void ublk_rq_task_work_fn(struct callback_head *work)
> __ublk_rq_task_work(req);
> }
>
> +static inline blk_status_t __ublk_abort_rq(struct ublk_queue *ubq,
> + struct request *rq)
> +{
> + pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
> + (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
> + ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
> + /* We cannot process this rq so just requeue it. */
> + if (ublk_queue_can_use_recovery(ubq)) {
> + blk_mq_requeue_request(rq, false);
> + blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
Same with above.
Thanks,
Ming
* Re: [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled
2022-09-19 3:55 ` Ming Lei
@ 2022-09-19 9:12 ` Ziyang Zhang
2022-09-19 12:39 ` Ming Lei
0 siblings, 1 reply; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-19 9:12 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On 2022/9/19 11:55, Ming Lei wrote:
> On Tue, Sep 13, 2022 at 12:17:04PM +0800, ZiyangZhang wrote:
>> With recovery feature enabled, in ublk_queue_rq or task work
>> (in exit_task_work or fallback wq), we requeue rqs instead of
>> ending(aborting) them. Besides, No matter recovery feature is enabled
>> or disabled, we schedule monitor_work immediately.
>>
>> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
>> ---
>> drivers/block/ublk_drv.c | 34 ++++++++++++++++++++++++++++++++--
>> 1 file changed, 32 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>> index 23337bd7c105..b067f33a1913 100644
>> --- a/drivers/block/ublk_drv.c
>> +++ b/drivers/block/ublk_drv.c
>> @@ -682,6 +682,21 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
>>
>> #define UBLK_REQUEUE_DELAY_MS 3
>>
>> +static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
>> + struct request *rq)
>> +{
>> + pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
>> + (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
>> + ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
>> + /* We cannot process this rq so just requeue it. */
>> + if (ublk_queue_can_use_recovery(ubq)) {
>> + blk_mq_requeue_request(rq, false);
>> + blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
>
> Here you needn't to kick requeue list since we know it can't make
> progress. And you can do that once before deleting gendisk
> or the queue is recovered.
No, kicking the rq here is necessary.
Consider USER_RECOVERY is enabled and everything goes well.
The user sends STOP_DEV, and we have kicked the requeue list in
ublk_stop_dev() and are about to call del_gendisk().
However, a crash happens now. Then rqs may still be requeued
by ublk_queue_rq() because ublk_queue_rq() sees a dying
ubq_daemon. So del_gendisk() will hang because there are
rqs left in the requeue list and no one kicks them.
BTW, kicking the requeue list after requeueing rqs is really harmless,
since we schedule quiesce_work immediately after finding a
dying ubq_daemon. So few rqs have a chance to be re-dispatched.
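[Editor's illustration, not part of the thread: a toy user-space model of the argument above. Requests placed on blk-mq's requeue list are only handed back to dispatch when the list is "kicked", so a request requeued after the last kick stays stuck until someone kicks again. The structs and counters are hypothetical; they model the behavior, not the kernel's data structures.]

```c
#include <assert.h>

/*
 * Toy model of blk-mq's requeue list: requeued rqs sit on the list
 * and are only moved back to the dispatch path when the list is
 * kicked (blk_mq_kick_requeue_list() in the real driver).
 */
struct requeue_list {
	int pending;    /* rqs sitting on the requeue list */
	int dispatched; /* rqs handed back to the dispatch path */
};

static void requeue(struct requeue_list *l)
{
	l->pending++;
}

static void kick(struct requeue_list *l)
{
	l->dispatched += l->pending;
	l->pending = 0;
}
```

This is why pairing each requeue with a kick is defensive: a request that races in after a one-shot kick before del_gendisk() would otherwise never drain.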
>
>> + } else {
>> + blk_mq_end_request(rq, BLK_STS_IOERR);
>> + }
>> +}
>> +
>> static inline void __ublk_rq_task_work(struct request *req)
>> {
>> struct ublk_queue *ubq = req->mq_hctx->driver_data;
>> @@ -704,7 +719,7 @@ static inline void __ublk_rq_task_work(struct request *req)
>> * (2) current->flags & PF_EXITING.
>> */
>> if (unlikely(current != ubq->ubq_daemon || current->flags & PF_EXITING)) {
>> - blk_mq_end_request(req, BLK_STS_IOERR);
>> + __ublk_abort_rq_in_task_work(ubq, req);
>> mod_delayed_work(system_wq, &ub->monitor_work, 0);
>> return;
>> }
>> @@ -779,6 +794,21 @@ static void ublk_rq_task_work_fn(struct callback_head *work)
>> __ublk_rq_task_work(req);
>> }
>>
>> +static inline blk_status_t __ublk_abort_rq(struct ublk_queue *ubq,
>> + struct request *rq)
>> +{
>> + pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
>> + (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
>> + ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
>> + /* We cannot process this rq so just requeue it. */
>> + if (ublk_queue_can_use_recovery(ubq)) {
>> + blk_mq_requeue_request(rq, false);
>> + blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
>
> Same with above.
>
>
> Thanks,
> Ming
* Re: [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled
2022-09-19 9:12 ` Ziyang Zhang
@ 2022-09-19 12:39 ` Ming Lei
2022-09-20 1:31 ` Ziyang Zhang
0 siblings, 1 reply; 33+ messages in thread
From: Ming Lei @ 2022-09-19 12:39 UTC (permalink / raw)
To: Ziyang Zhang
Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi, ming.lei
On Mon, Sep 19, 2022 at 05:12:21PM +0800, Ziyang Zhang wrote:
> On 2022/9/19 11:55, Ming Lei wrote:
> > On Tue, Sep 13, 2022 at 12:17:04PM +0800, ZiyangZhang wrote:
> >> With recovery feature enabled, in ublk_queue_rq or task work
> >> (in exit_task_work or fallback wq), we requeue rqs instead of
> >> ending(aborting) them. Besides, No matter recovery feature is enabled
> >> or disabled, we schedule monitor_work immediately.
> >>
> >> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
> >> ---
> >> drivers/block/ublk_drv.c | 34 ++++++++++++++++++++++++++++++++--
> >> 1 file changed, 32 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> >> index 23337bd7c105..b067f33a1913 100644
> >> --- a/drivers/block/ublk_drv.c
> >> +++ b/drivers/block/ublk_drv.c
> >> @@ -682,6 +682,21 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
> >>
> >> #define UBLK_REQUEUE_DELAY_MS 3
> >>
> >> +static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
> >> + struct request *rq)
> >> +{
> >> + pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
> >> + (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
> >> + ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
> >> + /* We cannot process this rq so just requeue it. */
> >> + if (ublk_queue_can_use_recovery(ubq)) {
> >> + blk_mq_requeue_request(rq, false);
> >> + blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
> >
> > Here you needn't to kick requeue list since we know it can't make
> > progress. And you can do that once before deleting gendisk
> > or the queue is recovered.
>
> No, kicking rq here is necessary.
>
> Consider USER_RECOVERY is enabled and everything goes well.
> User sends STOP_DEV, and we have kicked requeue list in
> ublk_stop_dev() and are going to call del_gendisk().
> However, a crash happens now. Then rqs may be still requeued
> by ublk_queue_rq() because ublk_queue_rq() sees a dying
> ubq_daemon. So del_gendisk() will hang because there are
> rqs leaving in requeue list and no one kicks them.
Why can't you kick the requeue list before calling del_gendisk()?
>
> BTW, kicking requeue list after requeue rqs is really harmless
> since we schedule quiesce_work immediately after finding a
> dying ubq_daemon. So few rqs have chance to be re-dispatched.
Do you think it makes sense to kick requeue list when the queue
can't handle any request?
Thanks,
Ming
* Re: [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled
2022-09-19 12:39 ` Ming Lei
@ 2022-09-20 1:31 ` Ziyang Zhang
2022-09-20 2:39 ` Ming Lei
0 siblings, 1 reply; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-20 1:31 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On 2022/9/19 20:39, Ming Lei wrote:
> On Mon, Sep 19, 2022 at 05:12:21PM +0800, Ziyang Zhang wrote:
>> On 2022/9/19 11:55, Ming Lei wrote:
>>> On Tue, Sep 13, 2022 at 12:17:04PM +0800, ZiyangZhang wrote:
>>>> With recovery feature enabled, in ublk_queue_rq or task work
>>>> (in exit_task_work or fallback wq), we requeue rqs instead of
>>>> ending(aborting) them. Besides, No matter recovery feature is enabled
>>>> or disabled, we schedule monitor_work immediately.
>>>>
>>>> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
>>>> ---
>>>> drivers/block/ublk_drv.c | 34 ++++++++++++++++++++++++++++++++--
>>>> 1 file changed, 32 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>>>> index 23337bd7c105..b067f33a1913 100644
>>>> --- a/drivers/block/ublk_drv.c
>>>> +++ b/drivers/block/ublk_drv.c
>>>> @@ -682,6 +682,21 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
>>>>
>>>> #define UBLK_REQUEUE_DELAY_MS 3
>>>>
>>>> +static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
>>>> + struct request *rq)
>>>> +{
>>>> + pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
>>>> + (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
>>>> + ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
>>>> + /* We cannot process this rq so just requeue it. */
>>>> + if (ublk_queue_can_use_recovery(ubq)) {
>>>> + blk_mq_requeue_request(rq, false);
>>>> + blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
>>>
>>> Here you needn't to kick requeue list since we know it can't make
>>> progress. And you can do that once before deleting gendisk
>>> or the queue is recovered.
>>
>> No, kicking rq here is necessary.
>>
>> Consider USER_RECOVERY is enabled and everything goes well.
>> User sends STOP_DEV, and we have kicked requeue list in
>> ublk_stop_dev() and are going to call del_gendisk().
>> However, a crash happens now. Then rqs may be still requeued
>> by ublk_queue_rq() because ublk_queue_rq() sees a dying
>> ubq_daemon. So del_gendisk() will hang because there are
>> rqs leaving in requeue list and no one kicks them.
>
> Why can't you kick requeue list before calling del_gendisk().
Yes, we can kick the requeue list once before calling del_gendisk().
But a crash may happen just after the kick but before del_gendisk(),
so some rqs may be requeued at that moment even though we have already
kicked the requeue list. Then del_gendisk() will hang, right?
>
>>
>> BTW, kicking requeue list after requeue rqs is really harmless
>> since we schedule quiesce_work immediately after finding a
>> dying ubq_daemon. So few rqs have chance to be re-dispatched.
>
> Do you think it makes sense to kick requeue list when the queue
> can't handle any request?
I know it does not make sense while the ubq_daemon is dying, but it is
good for handling the situation I described before.
Regards,
Zhang.
* Re: [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled
2022-09-20 1:31 ` Ziyang Zhang
@ 2022-09-20 2:39 ` Ming Lei
2022-09-20 3:04 ` Ziyang Zhang
0 siblings, 1 reply; 33+ messages in thread
From: Ming Lei @ 2022-09-20 2:39 UTC (permalink / raw)
To: Ziyang Zhang; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On Tue, Sep 20, 2022 at 09:31:54AM +0800, Ziyang Zhang wrote:
> On 2022/9/19 20:39, Ming Lei wrote:
> > On Mon, Sep 19, 2022 at 05:12:21PM +0800, Ziyang Zhang wrote:
> >> On 2022/9/19 11:55, Ming Lei wrote:
> >>> On Tue, Sep 13, 2022 at 12:17:04PM +0800, ZiyangZhang wrote:
> >>>> With recovery feature enabled, in ublk_queue_rq or task work
> >>>> (in exit_task_work or fallback wq), we requeue rqs instead of
> >>>> ending(aborting) them. Besides, No matter recovery feature is enabled
> >>>> or disabled, we schedule monitor_work immediately.
> >>>>
> >>>> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
> >>>> ---
> >>>> drivers/block/ublk_drv.c | 34 ++++++++++++++++++++++++++++++++--
> >>>> 1 file changed, 32 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> >>>> index 23337bd7c105..b067f33a1913 100644
> >>>> --- a/drivers/block/ublk_drv.c
> >>>> +++ b/drivers/block/ublk_drv.c
> >>>> @@ -682,6 +682,21 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
> >>>>
> >>>> #define UBLK_REQUEUE_DELAY_MS 3
> >>>>
> >>>> +static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
> >>>> + struct request *rq)
> >>>> +{
> >>>> + pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
> >>>> + (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
> >>>> + ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
> >>>> + /* We cannot process this rq so just requeue it. */
> >>>> + if (ublk_queue_can_use_recovery(ubq)) {
> >>>> + blk_mq_requeue_request(rq, false);
> >>>> + blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
> >>>
> >>> Here you needn't to kick requeue list since we know it can't make
> >>> progress. And you can do that once before deleting gendisk
> >>> or the queue is recovered.
> >>
> >> No, kicking rq here is necessary.
> >>
> >> Consider USER_RECOVERY is enabled and everything goes well.
> >> User sends STOP_DEV, and we have kicked requeue list in
> >> ublk_stop_dev() and are going to call del_gendisk().
> >> However, a crash happens now. Then rqs may be still requeued
> >> by ublk_queue_rq() because ublk_queue_rq() sees a dying
> >> ubq_daemon. So del_gendisk() will hang because there are
> >> rqs leaving in requeue list and no one kicks them.
> >
> > Why can't you kick requeue list before calling del_gendisk().
>
> Yes, we can kick requeue list once before calling del_gendisk().
> But a crash may happen just after kicking but before del_gendisk().
> So some rqs may be requeued at this moment. But we have already
> kicked the requeue list! Then del_gendisk() will hang, right?
->force_abort is set before kicking in ublk_unquiesce_dev(), so
all new requests are failed immediately instead of being requeued,
right?
Thanks,
Ming
* Re: [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled
2022-09-20 2:39 ` Ming Lei
@ 2022-09-20 3:04 ` Ziyang Zhang
2022-09-20 3:18 ` Ming Lei
0 siblings, 1 reply; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-20 3:04 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On 2022/9/20 10:39, Ming Lei wrote:
> On Tue, Sep 20, 2022 at 09:31:54AM +0800, Ziyang Zhang wrote:
>> On 2022/9/19 20:39, Ming Lei wrote:
>>> On Mon, Sep 19, 2022 at 05:12:21PM +0800, Ziyang Zhang wrote:
>>>> On 2022/9/19 11:55, Ming Lei wrote:
>>>>> On Tue, Sep 13, 2022 at 12:17:04PM +0800, ZiyangZhang wrote:
>>>>>> With recovery feature enabled, in ublk_queue_rq or task work
>>>>>> (in exit_task_work or fallback wq), we requeue rqs instead of
>>>>>> ending(aborting) them. Besides, No matter recovery feature is enabled
>>>>>> or disabled, we schedule monitor_work immediately.
>>>>>>
>>>>>> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
>>>>>> ---
>>>>>> drivers/block/ublk_drv.c | 34 ++++++++++++++++++++++++++++++++--
>>>>>> 1 file changed, 32 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>>>>>> index 23337bd7c105..b067f33a1913 100644
>>>>>> --- a/drivers/block/ublk_drv.c
>>>>>> +++ b/drivers/block/ublk_drv.c
>>>>>> @@ -682,6 +682,21 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
>>>>>>
>>>>>> #define UBLK_REQUEUE_DELAY_MS 3
>>>>>>
>>>>>> +static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
>>>>>> + struct request *rq)
>>>>>> +{
>>>>>> + pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
>>>>>> + (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
>>>>>> + ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
>>>>>> + /* We cannot process this rq so just requeue it. */
>>>>>> + if (ublk_queue_can_use_recovery(ubq)) {
>>>>>> + blk_mq_requeue_request(rq, false);
>>>>>> + blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
>>>>>
>>>>> Here you needn't to kick requeue list since we know it can't make
>>>>> progress. And you can do that once before deleting gendisk
>>>>> or the queue is recovered.
>>>>
>>>> No, kicking rq here is necessary.
>>>>
>>>> Consider USER_RECOVERY is enabled and everything goes well.
>>>> User sends STOP_DEV, and we have kicked requeue list in
>>>> ublk_stop_dev() and are going to call del_gendisk().
>>>> However, a crash happens now. Then rqs may be still requeued
>>>> by ublk_queue_rq() because ublk_queue_rq() sees a dying
>>>> ubq_daemon. So del_gendisk() will hang because there are
>>>> rqs leaving in requeue list and no one kicks them.
>>>
>>> Why can't you kick requeue list before calling del_gendisk().
>>
>> Yes, we can kick requeue list once before calling del_gendisk().
>> But a crash may happen just after kicking but before del_gendisk().
>> So some rqs may be requeued at this moment. But we have already
>> kicked the requeue list! Then del_gendisk() will hang, right?
>
> ->force_abort is set before kicking in ublk_unquiesce_dev(), so
> all new requests are failed immediately instead of being requeued,
> right?
>
->force_abort is not helpful here because there may be a fallback wq running
which can requeue rqs after the requeue list is kicked.
In ublk_unquiesce_dev(), I simply disable recovery feature
if ub's state is UBLK_S_DEV_LIVE while stopping ublk_dev.
Note: We can make sure fallback wq does not run if we wait until all rqs with
ACTIVE flag set are requeued. This is done in quiesce_work(). But it cannot
run while ublk_stop_dev() is running...
Regards,
Zhang.
* Re: [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled
2022-09-20 3:04 ` Ziyang Zhang
@ 2022-09-20 3:18 ` Ming Lei
2022-09-20 3:34 ` Ziyang Zhang
0 siblings, 1 reply; 33+ messages in thread
From: Ming Lei @ 2022-09-20 3:18 UTC (permalink / raw)
To: Ziyang Zhang; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On Tue, Sep 20, 2022 at 11:04:30AM +0800, Ziyang Zhang wrote:
> On 2022/9/20 10:39, Ming Lei wrote:
> > On Tue, Sep 20, 2022 at 09:31:54AM +0800, Ziyang Zhang wrote:
> >> On 2022/9/19 20:39, Ming Lei wrote:
> >>> On Mon, Sep 19, 2022 at 05:12:21PM +0800, Ziyang Zhang wrote:
> >>>> On 2022/9/19 11:55, Ming Lei wrote:
> >>>>> On Tue, Sep 13, 2022 at 12:17:04PM +0800, ZiyangZhang wrote:
> >>>>>> With recovery feature enabled, in ublk_queue_rq or task work
> >>>>>> (in exit_task_work or fallback wq), we requeue rqs instead of
> >>>>>> ending(aborting) them. Besides, No matter recovery feature is enabled
> >>>>>> or disabled, we schedule monitor_work immediately.
> >>>>>>
> >>>>>> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
> >>>>>> ---
> >>>>>> drivers/block/ublk_drv.c | 34 ++++++++++++++++++++++++++++++++--
> >>>>>> 1 file changed, 32 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> >>>>>> index 23337bd7c105..b067f33a1913 100644
> >>>>>> --- a/drivers/block/ublk_drv.c
> >>>>>> +++ b/drivers/block/ublk_drv.c
> >>>>>> @@ -682,6 +682,21 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
> >>>>>>
> >>>>>> #define UBLK_REQUEUE_DELAY_MS 3
> >>>>>>
> >>>>>> +static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
> >>>>>> + struct request *rq)
> >>>>>> +{
> >>>>>> + pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
> >>>>>> + (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
> >>>>>> + ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
> >>>>>> + /* We cannot process this rq so just requeue it. */
> >>>>>> + if (ublk_queue_can_use_recovery(ubq)) {
> >>>>>> + blk_mq_requeue_request(rq, false);
> >>>>>> + blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
> >>>>>
> >>>>> Here you needn't to kick requeue list since we know it can't make
> >>>>> progress. And you can do that once before deleting gendisk
> >>>>> or the queue is recovered.
> >>>>
> >>>> No, kicking rq here is necessary.
> >>>>
> >>>> Consider USER_RECOVERY is enabled and everything goes well.
> >>>> User sends STOP_DEV, and we have kicked requeue list in
> >>>> ublk_stop_dev() and are going to call del_gendisk().
> >>>> However, a crash happens now. Then rqs may be still requeued
> >>>> by ublk_queue_rq() because ublk_queue_rq() sees a dying
> >>>> ubq_daemon. So del_gendisk() will hang because there are
> >>>> rqs leaving in requeue list and no one kicks them.
> >>>
> >>> Why can't you kick requeue list before calling del_gendisk().
> >>
> >> Yes, we can kick requeue list once before calling del_gendisk().
> >> But a crash may happen just after kicking but before del_gendisk().
> >> So some rqs may be requeued at this moment. But we have already
> >> kicked the requeue list! Then del_gendisk() will hang, right?
> >
> > ->force_abort is set before kicking in ublk_unquiesce_dev(), so
> > all new requests are failed immediately instead of being requeued,
> > right?
> >
>
> ->force_abort is not helpful here because there may be a fallback wq running
> which can requeue rqs after the requeue list is kicked.
After ublk_wait_tagset_rqs_idle() returns, there can't be any
pending requests in fallback wq or task work, can there?
Please look at the 2nd point of the commit log; I did ask you
to add words about the fallback wq and task work.
>
> In ublk_unquiesce_dev(), I simply disable recovery feature
> if ub's state is UBLK_S_DEV_LIVE while stopping ublk_dev.
monitor_work will provide forward progress in case user
recovery is not applied.
>
> Note: We can make sure fallback wq does not run if we wait until all rqs with
> ACTIVE flag set are requeued. This is done in quiesce_work(). But it cannot
> run while ublk_stop_dev() is running...
Please take a look at the delta patch I just sent, which is supposed to be
simpler for addressing this corner case.
Thanks,
Ming
* Re: [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled
2022-09-20 3:18 ` Ming Lei
@ 2022-09-20 3:34 ` Ziyang Zhang
2022-09-20 4:41 ` Ming Lei
0 siblings, 1 reply; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-20 3:34 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On 2022/9/20 11:18, Ming Lei wrote:
> On Tue, Sep 20, 2022 at 11:04:30AM +0800, Ziyang Zhang wrote:
>> On 2022/9/20 10:39, Ming Lei wrote:
>>> On Tue, Sep 20, 2022 at 09:31:54AM +0800, Ziyang Zhang wrote:
>>>> On 2022/9/19 20:39, Ming Lei wrote:
>>>>> On Mon, Sep 19, 2022 at 05:12:21PM +0800, Ziyang Zhang wrote:
>>>>>> On 2022/9/19 11:55, Ming Lei wrote:
>>>>>>> On Tue, Sep 13, 2022 at 12:17:04PM +0800, ZiyangZhang wrote:
>>>>>>>> With recovery feature enabled, in ublk_queue_rq or task work
>>>>>>>> (in exit_task_work or fallback wq), we requeue rqs instead of
>>>>>>>> ending(aborting) them. Besides, No matter recovery feature is enabled
>>>>>>>> or disabled, we schedule monitor_work immediately.
>>>>>>>>
>>>>>>>> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
>>>>>>>> ---
>>>>>>>> drivers/block/ublk_drv.c | 34 ++++++++++++++++++++++++++++++++--
>>>>>>>> 1 file changed, 32 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>>>>>>>> index 23337bd7c105..b067f33a1913 100644
>>>>>>>> --- a/drivers/block/ublk_drv.c
>>>>>>>> +++ b/drivers/block/ublk_drv.c
>>>>>>>> @@ -682,6 +682,21 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
>>>>>>>>
>>>>>>>> #define UBLK_REQUEUE_DELAY_MS 3
>>>>>>>>
>>>>>>>> +static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
>>>>>>>> + struct request *rq)
>>>>>>>> +{
>>>>>>>> + pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
>>>>>>>> + (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
>>>>>>>> + ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
>>>>>>>> + /* We cannot process this rq so just requeue it. */
>>>>>>>> + if (ublk_queue_can_use_recovery(ubq)) {
>>>>>>>> + blk_mq_requeue_request(rq, false);
>>>>>>>> + blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
>>>>>>>
>>>>>>> Here you needn't to kick requeue list since we know it can't make
>>>>>>> progress. And you can do that once before deleting gendisk
>>>>>>> or the queue is recovered.
>>>>>>
>>>>>> No, kicking rq here is necessary.
>>>>>>
>>>>>> Consider USER_RECOVERY is enabled and everything goes well.
>>>>>> User sends STOP_DEV, and we have kicked requeue list in
>>>>>> ublk_stop_dev() and are going to call del_gendisk().
>>>>>> However, a crash happens now. Then rqs may be still requeued
>>>>>> by ublk_queue_rq() because ublk_queue_rq() sees a dying
>>>>>> ubq_daemon. So del_gendisk() will hang because there are
>>>>>> rqs leaving in requeue list and no one kicks them.
>>>>>
>>>>> Why can't you kick requeue list before calling del_gendisk().
>>>>
>>>> Yes, we can kick requeue list once before calling del_gendisk().
>>>> But a crash may happen just after kicking but before del_gendisk().
>>>> So some rqs may be requeued at this moment. But we have already
>>>> kicked the requeue list! Then del_gendisk() will hang, right?
>>>
>>> ->force_abort is set before kicking in ublk_unquiesce_dev(), so
>>> all new requests are failed immediately instead of being requeued,
>>> right?
>>>
>>
>> ->force_abort is not helpful here because there may be a fallback wq running
>> which can requeue rqs after the requeue list is kicked.
>
> After ublk_wait_tagset_rqs_idle() returns, there can't be any
> > pending requests in fallback wq or task work, can there?
Please consider this case: a crash happens while ublk_stop_dev() is
running. In that case I cannot schedule quiesce_work or call
ublk_wait_tagset_rqs_idle(). This is because quiesce_work has to
acquire ub_mutex to quiesce the request queue.
>
> Please look at the 2nd point of the comment log, and I did ask you
> to add the words for fallback wq and task work.
>
>>
>> In ublk_unquiesce_dev(), I simply disable recovery feature
>> if ub's state is UBLK_S_DEV_LIVE while stopping ublk_dev.
>
> monitor work will provide forward progress in case of not applying
> user recovery.
Yes, that's why I disable the recovery feature in ublk_stop_dev() if quiesce_work
has not run. In this case monitor_work can abort the rqs.
>
>>
>> Note: We can make sure fallback wq does not run if we wait until all rqs with
>> ACTIVE flag set are requeued. This is done in quiesce_work(). But it cannot
>> run while ublk_stop_dev() is running...
>
> Please take a look at the delta patch I just sent, which is supposed to be
> simpler for addressing this corner case.
I saw your patch; it handles rqs with ACTIVE unset, but I am concerned about
rqs with ACTIVE set.
Regards,
Zhang.
* Re: [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled
2022-09-20 3:34 ` Ziyang Zhang
@ 2022-09-20 4:41 ` Ming Lei
0 siblings, 0 replies; 33+ messages in thread
From: Ming Lei @ 2022-09-20 4:41 UTC (permalink / raw)
To: Ziyang Zhang; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On Tue, Sep 20, 2022 at 11:34:32AM +0800, Ziyang Zhang wrote:
> On 2022/9/20 11:18, Ming Lei wrote:
> > On Tue, Sep 20, 2022 at 11:04:30AM +0800, Ziyang Zhang wrote:
> >> On 2022/9/20 10:39, Ming Lei wrote:
> >>> On Tue, Sep 20, 2022 at 09:31:54AM +0800, Ziyang Zhang wrote:
> >>>> On 2022/9/19 20:39, Ming Lei wrote:
> >>>>> On Mon, Sep 19, 2022 at 05:12:21PM +0800, Ziyang Zhang wrote:
> >>>>>> On 2022/9/19 11:55, Ming Lei wrote:
> >>>>>>> On Tue, Sep 13, 2022 at 12:17:04PM +0800, ZiyangZhang wrote:
> >>>>>>>> With recovery feature enabled, in ublk_queue_rq or task work
> >>>>>>>> (in exit_task_work or fallback wq), we requeue rqs instead of
> >>>>>>>> ending(aborting) them. Besides, No matter recovery feature is enabled
> >>>>>>>> or disabled, we schedule monitor_work immediately.
> >>>>>>>>
> >>>>>>>> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
> >>>>>>>> ---
> >>>>>>>> drivers/block/ublk_drv.c | 34 ++++++++++++++++++++++++++++++++--
> >>>>>>>> 1 file changed, 32 insertions(+), 2 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> >>>>>>>> index 23337bd7c105..b067f33a1913 100644
> >>>>>>>> --- a/drivers/block/ublk_drv.c
> >>>>>>>> +++ b/drivers/block/ublk_drv.c
> >>>>>>>> @@ -682,6 +682,21 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
> >>>>>>>>
> >>>>>>>> #define UBLK_REQUEUE_DELAY_MS 3
> >>>>>>>>
> >>>>>>>> +static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
> >>>>>>>> + struct request *rq)
> >>>>>>>> +{
> >>>>>>>> + pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
> >>>>>>>> + (ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
> >>>>>>>> + ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
> >>>>>>>> + /* We cannot process this rq so just requeue it. */
> >>>>>>>> + if (ublk_queue_can_use_recovery(ubq)) {
> >>>>>>>> + blk_mq_requeue_request(rq, false);
> >>>>>>>> + blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
> >>>>>>>
> >>>>>>> Here you needn't to kick requeue list since we know it can't make
> >>>>>>> progress. And you can do that once before deleting gendisk
> >>>>>>> or the queue is recovered.
> >>>>>>
> >>>>>> No, kicking rq here is necessary.
> >>>>>>
> >>>>>> Consider USER_RECOVERY is enabled and everything goes well.
> >>>>>> User sends STOP_DEV, and we have kicked requeue list in
> >>>>>> ublk_stop_dev() and are going to call del_gendisk().
> >>>>>> However, a crash happens now. Then rqs may be still requeued
> >>>>>> by ublk_queue_rq() because ublk_queue_rq() sees a dying
> >>>>>> ubq_daemon. So del_gendisk() will hang because there are
> >>>>>> rqs leaving in requeue list and no one kicks them.
> >>>>>
> >>>>> Why can't you kick requeue list before calling del_gendisk().
> >>>>
> >>>> Yes, we can kick requeue list once before calling del_gendisk().
> >>>> But a crash may happen just after kicking but before del_gendisk().
> >>>> So some rqs may be requeued at this moment. But we have already
> >>>> kicked the requeue list! Then del_gendisk() will hang, right?
> >>>
> >>> ->force_abort is set before kicking in ublk_unquiesce_dev(), so
> >>> all new requests are failed immediately instead of being requeued,
> >>> right?
> >>>
> >>
> >> ->force_abort is not helpful here because there may be a fallback wq running
> >> which can requeue rqs after the requeue list is kicked.
> >
> > After ublk_wait_tagset_rqs_idle() returns, there can't be any
> > pending requests in fallback wq or task work, can there
> Please consider this case: a crash happens while ublk_stop_dev() is
> running. In that case I cannot schedule quiesce_work or call
> ublk_wait_tagset_rqs_idle(). This is because quiesce_work has to
> acquire ub_mutex to quiesce the request queue.
The issue can be addressed more reliably, cleanly and consistently in the
following way; then you needn't switch between the two modes.
ublk_stop_dev()
	if (ublk_can_use_recovery(ub)) {
		if (ub->dev_info.state == UBLK_S_DEV_LIVE)
			__ublk_quiesce_dev(ub); //lockless version
		ublk_unquiesce_dev();
	}
Meantime it is no longer necessary to disable the recovery feature in
ublk_unquiesce_dev().
thanks,
Ming
* [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-13 4:17 [PATCH V3 0/7] ublk_drv: add USER_RECOVERY support ZiyangZhang
` (3 preceding siblings ...)
2022-09-13 4:17 ` [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled ZiyangZhang
@ 2022-09-13 4:17 ` ZiyangZhang
2022-09-19 9:32 ` Ming Lei
2022-09-13 4:17 ` [PATCH V3 6/7] ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support ZiyangZhang
` (2 subsequent siblings)
7 siblings, 1 reply; 33+ messages in thread
From: ZiyangZhang @ 2022-09-13 4:17 UTC (permalink / raw)
To: ming.lei
Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi, ZiyangZhang
With USER_RECOVERY feature enabled, the monitor_work schedules
quiesce_work after finding a dying ubq_daemon. The quiesce_work's job
is to:
(1) quiesce request queue.
(2) check if there is any INFLIGHT rq with UBLK_IO_FLAG_ACTIVE unset.
If so, we retry until all these rqs are requeued by ublk_queue_rq()
and task_work and become IDLE.
(3) requeue/abort inflight rqs issued to the crashed ubq_daemon before. If
UBLK_F_USER_RECOVERY_REISSUE is set, the rq is requeued; otherwise it is
aborted.
(4) complete all ioucmds by calling io_uring_cmd_done(). We are safe to
do so because no ioucmd can be referenced now.
(5) set ub's state to UBLK_S_DEV_QUIESCED, which means we are ready for
recovery. This state is exposed to userspace by GET_DEV_INFO.
The driver can always handle STOP_DEV and clean up everything no matter
whether ub's state is LIVE or QUIESCED. After ub's state is UBLK_S_DEV_QUIESCED,
the user can recover with a new process by sending START_USER_RECOVERY.
Note: we do not change the default behavior with the recovery feature
disabled. monitor_work still schedules stop_work and aborts inflight
rqs. Finally the ublk_device is released.
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
---
drivers/block/ublk_drv.c | 168 +++++++++++++++++++++++++++++++++++++--
1 file changed, 161 insertions(+), 7 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index b067f33a1913..4409a130d0b6 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -121,7 +121,7 @@ struct ublk_queue {
unsigned long io_addr; /* mapped vm address */
unsigned int max_io_sz;
- bool abort_work_pending;
+ bool force_abort;
unsigned short nr_io_ready; /* how many ios setup */
struct ublk_device *dev;
struct ublk_io ios[0];
@@ -163,6 +163,7 @@ struct ublk_device {
* monitor each queue's daemon periodically
*/
struct delayed_work monitor_work;
+ struct work_struct quiesce_work;
struct work_struct stop_work;
};
@@ -660,6 +661,11 @@ static void __ublk_fail_req(struct ublk_io *io, struct request *req)
WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
if (!(io->flags & UBLK_IO_FLAG_ABORTED)) {
+ pr_devel("%s: abort rq: qid %d tag %d io_flags %x\n",
+ __func__,
+ ((struct ublk_queue *)req->mq_hctx->driver_data)->q_id,
+ req->tag,
+ io->flags);
io->flags |= UBLK_IO_FLAG_ABORTED;
blk_mq_end_request(req, BLK_STS_IOERR);
}
@@ -820,6 +826,21 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
res = ublk_setup_iod(ubq, rq);
if (unlikely(res != BLK_STS_OK))
return BLK_STS_IOERR;
+ /* With recovery feature enabled, force_abort is set in
+ * ublk_stop_dev() before calling del_gendisk() if ub's state
+ * is QUIESCED. We have to abort all requeued and new rqs here
+ * to let del_gendisk() move on. Besides, we do not call
+ * io_uring_cmd_complete_in_task() to avoid UAF on io_uring ctx.
+ *
+ * Note: force_abort is guaranteed to be seen because it is set
+ * before request queue is unquiesced.
+ */
+ if (unlikely(ubq->force_abort)) {
+ pr_devel("%s: abort rq: qid %d tag %d io_flags %x\n",
+ __func__, ubq->q_id, rq->tag,
+ ubq->ios[rq->tag].flags);
+ return BLK_STS_IOERR;
+ }
blk_mq_start_request(bd->rq);
@@ -1003,6 +1024,101 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
ublk_put_device(ub);
}
+static bool ublk_check_inflight_rq(struct request *rq, void *data)
+{
+ struct ublk_queue *ubq = rq->mq_hctx->driver_data;
+ struct ublk_io *io = &ubq->ios[rq->tag];
+ bool *busy = data;
+
+ if (io->flags & UBLK_IO_FLAG_ACTIVE) {
+ *busy = true;
+ return false;
+ }
+ return true;
+}
+
+static void ublk_wait_tagset_rqs_idle(struct ublk_device *ub)
+{
+ bool busy = false;
+
+ WARN_ON_ONCE(!blk_queue_quiesced(ub->ub_disk->queue));
+ while (true) {
+ blk_mq_tagset_busy_iter(&ub->tag_set,
+ ublk_check_inflight_rq, &busy);
+ if (busy)
+ msleep(UBLK_REQUEUE_DELAY_MS);
+ else
+ break;
+ }
+}
+
+static void ublk_quiesce_queue(struct ublk_device *ub,
+ struct ublk_queue *ubq)
+{
+ int i;
+
+ for (i = 0; i < ubq->q_depth; i++) {
+ struct ublk_io *io = &ubq->ios[i];
+
+ if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) {
+ struct request *rq = blk_mq_tag_to_rq(
+ ub->tag_set.tags[ubq->q_id], i);
+
+ WARN_ON_ONCE(!rq);
+ pr_devel("%s: %s rq: qid %d tag %d io_flags %x\n", __func__,
+ ublk_queue_can_use_recovery_reissue(ubq) ?
+ "requeue" : "abort",
+ ubq->q_id, i, io->flags);
+ if (ublk_queue_can_use_recovery_reissue(ubq))
+ blk_mq_requeue_request(rq, false);
+ else
+ __ublk_fail_req(io, rq);
+ } else {
+ pr_devel("%s: done old cmd: qid %d tag %d\n",
+ __func__, ubq->q_id, i);
+ io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0);
+ io->flags &= ~UBLK_IO_FLAG_ACTIVE;
+ }
+ ubq->nr_io_ready--;
+ }
+ WARN_ON_ONCE(ubq->nr_io_ready);
+}
+
+static void ublk_quiesce_dev(struct ublk_device *ub)
+{
+ int i;
+
+ mutex_lock(&ub->mutex);
+ if (ub->dev_info.state != UBLK_S_DEV_LIVE)
+ goto unlock;
+
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ struct ublk_queue *ubq = ublk_get_queue(ub, i);
+
+ if (!ubq_daemon_is_dying(ubq))
+ goto unlock;
+ }
+ blk_mq_quiesce_queue(ub->ub_disk->queue);
+ ublk_wait_tagset_rqs_idle(ub);
+ pr_devel("%s: quiesce ub: dev_id %d\n",
+ __func__, ub->dev_info.dev_id);
+
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+ ublk_quiesce_queue(ub, ublk_get_queue(ub, i));
+
+ ub->dev_info.state = UBLK_S_DEV_QUIESCED;
+ unlock:
+ mutex_unlock(&ub->mutex);
+}
+
+static void ublk_quiesce_work_fn(struct work_struct *work)
+{
+ struct ublk_device *ub =
+ container_of(work, struct ublk_device, quiesce_work);
+
+ ublk_quiesce_dev(ub);
+}
+
static void ublk_daemon_monitor_work(struct work_struct *work)
{
struct ublk_device *ub =
@@ -1013,10 +1129,14 @@ static void ublk_daemon_monitor_work(struct work_struct *work)
struct ublk_queue *ubq = ublk_get_queue(ub, i);
if (ubq_daemon_is_dying(ubq)) {
- schedule_work(&ub->stop_work);
-
- /* abort queue is for making forward progress */
- ublk_abort_queue(ub, ubq);
+ if (ublk_queue_can_use_recovery(ubq)) {
+ schedule_work(&ub->quiesce_work);
+ } else {
+ schedule_work(&ub->stop_work);
+
+ /* abort queue is for making forward progress */
+ ublk_abort_queue(ub, ubq);
+ }
}
}
@@ -1080,12 +1200,43 @@ static void ublk_cancel_dev(struct ublk_device *ub)
ublk_cancel_queue(ublk_get_queue(ub, i));
}
+static void ublk_unquiesce_dev(struct ublk_device *ub)
+{
+ int i;
+
+ pr_devel("%s: ub state %s\n", __func__,
+ ub->dev_info.state == UBLK_S_DEV_LIVE ?
+ "LIVE" : "QUIESCED");
+ if (ub->dev_info.state == UBLK_S_DEV_LIVE) {
+ /*
+ * quiesce_work cannot be running. We let monitor_work,
+ * ublk_queue_rq() and task_work abort rqs instead of
+ * requeuing them with a dying ubq_daemon. Then
+ * del_gendisk() can move on.
+ */
+ ublk_disable_recovery(ub);
+ } else {
+ /* quiesce_work has run. We let requeued rqs be aborted
+ * before running fallback_wq. "force_abort" must be seen
+ * after request queue is unquiesced. Then del_gendisk()
+ * can move on.
+ */
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+ ublk_get_queue(ub, i)->force_abort = true;
+
+ blk_mq_unquiesce_queue(ub->ub_disk->queue);
+ /* We may have requeued some rqs in ublk_quiesce_queue() */
+ blk_mq_kick_requeue_list(ub->ub_disk->queue);
+ }
+}
+
static void ublk_stop_dev(struct ublk_device *ub)
{
mutex_lock(&ub->mutex);
- if (ub->dev_info.state != UBLK_S_DEV_LIVE)
+ if (ub->dev_info.state == UBLK_S_DEV_DEAD)
goto unlock;
-
+ if (ublk_can_use_recovery(ub))
+ ublk_unquiesce_dev(ub);
del_gendisk(ub->ub_disk);
ub->dev_info.state = UBLK_S_DEV_DEAD;
ub->dev_info.ublksrv_pid = -1;
@@ -1409,6 +1560,7 @@ static void ublk_remove(struct ublk_device *ub)
{
ublk_stop_dev(ub);
cancel_work_sync(&ub->stop_work);
+ cancel_work_sync(&ub->quiesce_work);
cdev_device_del(&ub->cdev, &ub->cdev_dev);
put_device(&ub->cdev_dev);
}
@@ -1585,6 +1737,7 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
goto out_unlock;
mutex_init(&ub->mutex);
spin_lock_init(&ub->mm_lock);
+ INIT_WORK(&ub->quiesce_work, ublk_quiesce_work_fn);
INIT_WORK(&ub->stop_work, ublk_stop_work_fn);
INIT_DELAYED_WORK(&ub->monitor_work, ublk_daemon_monitor_work);
@@ -1705,6 +1858,7 @@ static int ublk_ctrl_stop_dev(struct io_uring_cmd *cmd)
ublk_stop_dev(ub);
cancel_work_sync(&ub->stop_work);
+ cancel_work_sync(&ub->quiesce_work);
ublk_put_device(ub);
return 0;
--
2.27.0
* Re: [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-13 4:17 ` [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism ZiyangZhang
@ 2022-09-19 9:32 ` Ming Lei
2022-09-19 9:55 ` Ziyang Zhang
0 siblings, 1 reply; 33+ messages in thread
From: Ming Lei @ 2022-09-19 9:32 UTC (permalink / raw)
To: ZiyangZhang
Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi, ming.lei
On Tue, Sep 13, 2022 at 12:17:05PM +0800, ZiyangZhang wrote:
> With USER_RECOVERY feature enabled, the monitor_work schedules
> quiesce_work after finding a dying ubq_daemon. The quiesce_work's job
> is to:
> (1) quiesce request queue.
> (2) check if there is any INFLIGHT rq with UBLK_IO_FLAG_ACTIVE unset.
> If so, we retry until all these rqs are requeued by ublk_queue_rq()
> and task_work and become IDLE.
These requests should be being handled by task work or the io_uring
fallback wq; I suggest adding words to that effect here.
> (3) requeue/abort inflight rqs issued to the crash ubq_daemon before. If
> UBLK_F_USER_RECOVERY_REISSUE is set, rq is requeued; or it is
> aborted.
> (4) complete all ioucmds by calling io_uring_cmd_done(). We are safe to
> do so because no ioucmd can be referenced now.
> (5) set ub's state to UBLK_S_DEV_QUIESCED, which means we are ready for
> recovery. This state is exposed to userspace by GET_DEV_INFO.
>
> The driver can always handle STOP_DEV and cleanup everything no matter
> ub's state is LIVE or QUIESCED. After ub's state is UBLK_S_DEV_QUIESCED,
> user can recover with new process by sending START_USER_RECOVERY.
>
> Note: we do not change the default behavior with recovery feature
> disabled. monitor_work still schedules stop_work and abort inflight
> rqs. Finally ublk_device is released.
This version looks much better than before.
>
> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
> ---
> drivers/block/ublk_drv.c | 168 +++++++++++++++++++++++++++++++++++++--
> 1 file changed, 161 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index b067f33a1913..4409a130d0b6 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -121,7 +121,7 @@ struct ublk_queue {
>
> unsigned long io_addr; /* mapped vm address */
> unsigned int max_io_sz;
> - bool abort_work_pending;
> + bool force_abort;
> unsigned short nr_io_ready; /* how many ios setup */
> struct ublk_device *dev;
> struct ublk_io ios[0];
> @@ -163,6 +163,7 @@ struct ublk_device {
> * monitor each queue's daemon periodically
> */
> struct delayed_work monitor_work;
> + struct work_struct quiesce_work;
> struct work_struct stop_work;
> };
>
> @@ -660,6 +661,11 @@ static void __ublk_fail_req(struct ublk_io *io, struct request *req)
> WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
>
> if (!(io->flags & UBLK_IO_FLAG_ABORTED)) {
> + pr_devel("%s: abort rq: qid %d tag %d io_flags %x\n",
> + __func__,
> + ((struct ublk_queue *)req->mq_hctx->driver_data)->q_id,
req->mq_hctx->queue_num is cleaner.
> + req->tag,
> + io->flags);
> io->flags |= UBLK_IO_FLAG_ABORTED;
> blk_mq_end_request(req, BLK_STS_IOERR);
> }
> @@ -820,6 +826,21 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> res = ublk_setup_iod(ubq, rq);
> if (unlikely(res != BLK_STS_OK))
> return BLK_STS_IOERR;
> + /* With recovery feature enabled, force_abort is set in
> + * ublk_stop_dev() before calling del_gendisk() if ub's state
> + * is QUIESCED. We have to abort all requeued and new rqs here
> + * to let del_gendisk() move on. Besides, we do not call
> + * io_uring_cmd_complete_in_task() to avoid UAF on io_uring ctx.
> + *
> + * Note: force_abort is guaranteed to be seen because it is set
> + * before request queue is unquiesced.
> + */
> + if (unlikely(ubq->force_abort)) {
> + pr_devel("%s: abort rq: qid %d tag %d io_flags %x\n",
> + __func__, ubq->q_id, rq->tag,
> + ubq->ios[rq->tag].flags);
> + return BLK_STS_IOERR;
> + }
>
> blk_mq_start_request(bd->rq);
>
> @@ -1003,6 +1024,101 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
> ublk_put_device(ub);
> }
>
> +static bool ublk_check_inflight_rq(struct request *rq, void *data)
> +{
> + struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> + struct ublk_io *io = &ubq->ios[rq->tag];
> + bool *busy = data;
> +
> + if (io->flags & UBLK_IO_FLAG_ACTIVE) {
> + *busy = true;
> + return false;
> + }
> + return true;
> +}
> +
> +static void ublk_wait_tagset_rqs_idle(struct ublk_device *ub)
> +{
> + bool busy = false;
> +
> + WARN_ON_ONCE(!blk_queue_quiesced(ub->ub_disk->queue));
> + while (true) {
> + blk_mq_tagset_busy_iter(&ub->tag_set,
> + ublk_check_inflight_rq, &busy);
> + if (busy)
> + msleep(UBLK_REQUEUE_DELAY_MS);
> + else
> + break;
> + }
> +}
> +
> +static void ublk_quiesce_queue(struct ublk_device *ub,
> + struct ublk_queue *ubq)
> +{
> + int i;
> +
> + for (i = 0; i < ubq->q_depth; i++) {
> + struct ublk_io *io = &ubq->ios[i];
> +
> + if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) {
> + struct request *rq = blk_mq_tag_to_rq(
> + ub->tag_set.tags[ubq->q_id], i);
> +
> + WARN_ON_ONCE(!rq);
> + pr_devel("%s: %s rq: qid %d tag %d io_flags %x\n", __func__,
> + ublk_queue_can_use_recovery_reissue(ubq) ?
> + "requeue" : "abort",
> + ubq->q_id, i, io->flags);
> + if (ublk_queue_can_use_recovery_reissue(ubq))
> + blk_mq_requeue_request(rq, false);
This way is too violent.
There may be just one dying queue, but you requeue all requests
from every queue. I'd suggest taking the approach in ublk_daemon_monitor_work(),
i.e. requeuing only the requests of the dying queue.
That said, you can still re-use the logic in ublk_abort_queue()/ublk_daemon_monitor_work()
for making forward progress, just replacing the request abort with a requeue in
ublk_abort_queue().
> + else
> + __ublk_fail_req(io, rq);
> + } else {
> + pr_devel("%s: done old cmd: qid %d tag %d\n",
> + __func__, ubq->q_id, i);
> + io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0);
> + io->flags &= ~UBLK_IO_FLAG_ACTIVE;
> + }
> + ubq->nr_io_ready--;
> + }
> + WARN_ON_ONCE(ubq->nr_io_ready);
> +}
> +
> +static void ublk_quiesce_dev(struct ublk_device *ub)
> +{
> + int i;
> +
> + mutex_lock(&ub->mutex);
> + if (ub->dev_info.state != UBLK_S_DEV_LIVE)
> + goto unlock;
> +
> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> + struct ublk_queue *ubq = ublk_get_queue(ub, i);
> +
> + if (!ubq_daemon_is_dying(ubq))
> + goto unlock;
> + }
> + blk_mq_quiesce_queue(ub->ub_disk->queue);
> + ublk_wait_tagset_rqs_idle(ub);
> + pr_devel("%s: quiesce ub: dev_id %d\n",
> + __func__, ub->dev_info.dev_id);
> +
> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
> + ublk_quiesce_queue(ub, ublk_get_queue(ub, i));
> +
> + ub->dev_info.state = UBLK_S_DEV_QUIESCED;
> + unlock:
> + mutex_unlock(&ub->mutex);
> +}
> +
> +static void ublk_quiesce_work_fn(struct work_struct *work)
> +{
> + struct ublk_device *ub =
> + container_of(work, struct ublk_device, quiesce_work);
> +
> + ublk_quiesce_dev(ub);
> +}
> +
> static void ublk_daemon_monitor_work(struct work_struct *work)
> {
> struct ublk_device *ub =
> @@ -1013,10 +1129,14 @@ static void ublk_daemon_monitor_work(struct work_struct *work)
> struct ublk_queue *ubq = ublk_get_queue(ub, i);
>
> if (ubq_daemon_is_dying(ubq)) {
> - schedule_work(&ub->stop_work);
> -
> - /* abort queue is for making forward progress */
> - ublk_abort_queue(ub, ubq);
> + if (ublk_queue_can_use_recovery(ubq)) {
> + schedule_work(&ub->quiesce_work);
> + } else {
> + schedule_work(&ub->stop_work);
> +
> + /* abort queue is for making forward progress */
> + ublk_abort_queue(ub, ubq);
> + }
If the quiesce work and the stop work are always scheduled exclusively,
the two can be defined as a union, though that is not a big deal.
Thanks,
Ming
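The wait loop in the quoted ublk_wait_tagset_rqs_idle() boils down to: scan every tag, and sleep-retry while any io still has UBLK_IO_FLAG_ACTIVE set. A minimal userspace C sketch of that polling pattern follows (a simulation, not driver code; the fake_io array and wait_rqs_idle() are hypothetical stand-ins for the tag set and the kernel loop):

```c
#include <assert.h>
#include <stdbool.h>

#define UBLK_IO_FLAG_ACTIVE 0x01 /* same meaning as the driver flag */

struct fake_io {
	unsigned int flags;
};

/* Analog of one blk_mq_tagset_busy_iter() pass: idle only when no
 * io in the set still has ACTIVE set. */
static bool tagset_idle(const struct fake_io *ios, int nr)
{
	for (int i = 0; i < nr; i++)
		if (ios[i].flags & UBLK_IO_FLAG_ACTIVE)
			return false;
	return true;
}

/*
 * The driver msleep()s between scans; here the "daemon" side clears
 * one ACTIVE flag per round so the simulation terminates. Returns
 * the number of scan rounds taken.
 */
static int wait_rqs_idle(struct fake_io *ios, int nr)
{
	int rounds = 0;

	while (!tagset_idle(ios, nr)) {
		/* msleep(UBLK_REQUEUE_DELAY_MS) in the driver */
		for (int i = 0; i < nr; i++) {
			if (ios[i].flags & UBLK_IO_FLAG_ACTIVE) {
				ios[i].flags &= ~UBLK_IO_FLAG_ACTIVE;
				break;
			}
		}
		rounds++;
	}
	return rounds;
}
```

In the driver the flag is cleared when ublk_queue_rq()/task_work requeues the rq; the loop here only emulates that progress.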
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-19 9:32 ` Ming Lei
@ 2022-09-19 9:55 ` Ziyang Zhang
2022-09-19 12:33 ` Ming Lei
0 siblings, 1 reply; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-19 9:55 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On 2022/9/19 17:32, Ming Lei wrote:
> On Tue, Sep 13, 2022 at 12:17:05PM +0800, ZiyangZhang wrote:
>> With USER_RECOVERY feature enabled, the monitor_work schedules
>> quiesce_work after finding a dying ubq_daemon. The quiesce_work's job
>> is to:
>> (1) quiesce request queue.
>> (2) check if there is any INFLIGHT rq with UBLK_IO_FLAG_ACTIVE unset.
>> If so, we retry until all these rqs are requeued by ublk_queue_rq()
>> and task_work and become IDLE.
>
> These requests should be being handled by task work or the io_uring
> fallback wq, and suggest to add the words here.
Will do so.
>
>> (3) requeue/abort inflight rqs issued to the crash ubq_daemon before. If
>> UBLK_F_USER_RECOVERY_REISSUE is set, rq is requeued; or it is
>> aborted.
>> (4) complete all ioucmds by calling io_uring_cmd_done(). We are safe to
>> do so because no ioucmd can be referenced now.
>> (5) set ub's state to UBLK_S_DEV_QUIESCED, which means we are ready for
>> recovery. This state is exposed to userspace by GET_DEV_INFO.
>>
>> The driver can always handle STOP_DEV and cleanup everything no matter
>> ub's state is LIVE or QUIESCED. After ub's state is UBLK_S_DEV_QUIESCED,
>> user can recover with new process by sending START_USER_RECOVERY.
>>
>> Note: we do not change the default behavior with recovery feature
>> disabled. monitor_work still schedules stop_work and abort inflight
>> rqs. Finally ublk_device is released.
>
> This version looks much better than before.
Thanks :)
>
>>
>> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
>> ---
>> drivers/block/ublk_drv.c | 168 +++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 161 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>> index b067f33a1913..4409a130d0b6 100644
>> --- a/drivers/block/ublk_drv.c
>> +++ b/drivers/block/ublk_drv.c
>> @@ -121,7 +121,7 @@ struct ublk_queue {
>>
>> unsigned long io_addr; /* mapped vm address */
>> unsigned int max_io_sz;
>> - bool abort_work_pending;
>> + bool force_abort;
>> unsigned short nr_io_ready; /* how many ios setup */
>> struct ublk_device *dev;
>> struct ublk_io ios[0];
>> @@ -163,6 +163,7 @@ struct ublk_device {
>> * monitor each queue's daemon periodically
>> */
>> struct delayed_work monitor_work;
>> + struct work_struct quiesce_work;
>> struct work_struct stop_work;
>> };
>>
>> @@ -660,6 +661,11 @@ static void __ublk_fail_req(struct ublk_io *io, struct request *req)
>> WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
>>
>> if (!(io->flags & UBLK_IO_FLAG_ABORTED)) {
>> + pr_devel("%s: abort rq: qid %d tag %d io_flags %x\n",
>> + __func__,
>> + ((struct ublk_queue *)req->mq_hctx->driver_data)->q_id,
>
> req->mq_hctx->queue_num is cleaner.
Ok.
>
>> + req->tag,
>> + io->flags);
>> io->flags |= UBLK_IO_FLAG_ABORTED;
>> blk_mq_end_request(req, BLK_STS_IOERR);
>> }
>> @@ -820,6 +826,21 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
>> res = ublk_setup_iod(ubq, rq);
>> if (unlikely(res != BLK_STS_OK))
>> return BLK_STS_IOERR;
>> + /* With recovery feature enabled, force_abort is set in
>> + * ublk_stop_dev() before calling del_gendisk() if ub's state
>> + * is QUIESCED. We have to abort all requeued and new rqs here
>> + * to let del_gendisk() move on. Besides, we do not call
>> + * io_uring_cmd_complete_in_task() to avoid UAF on io_uring ctx.
>> + *
>> + * Note: force_abort is guaranteed to be seen because it is set
>> + * before request queue is unquiesced.
>> + */
>> + if (unlikely(ubq->force_abort)) {
>> + pr_devel("%s: abort rq: qid %d tag %d io_flags %x\n",
>> + __func__, ubq->q_id, rq->tag,
>> + ubq->ios[rq->tag].flags);
>> + return BLK_STS_IOERR;
>> + }
>>
>> blk_mq_start_request(bd->rq);
>>
>> @@ -1003,6 +1024,101 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
>> ublk_put_device(ub);
>> }
>>
>> +static bool ublk_check_inflight_rq(struct request *rq, void *data)
>> +{
>> + struct ublk_queue *ubq = rq->mq_hctx->driver_data;
>> + struct ublk_io *io = &ubq->ios[rq->tag];
>> + bool *busy = data;
>> +
>> + if (io->flags & UBLK_IO_FLAG_ACTIVE) {
>> + *busy = true;
>> + return false;
>> + }
>> + return true;
>> +}
>> +
>> +static void ublk_wait_tagset_rqs_idle(struct ublk_device *ub)
>> +{
>> + bool busy = false;
>> +
>> + WARN_ON_ONCE(!blk_queue_quiesced(ub->ub_disk->queue));
>> + while (true) {
>> + blk_mq_tagset_busy_iter(&ub->tag_set,
>> + ublk_check_inflight_rq, &busy);
>> + if (busy)
>> + msleep(UBLK_REQUEUE_DELAY_MS);
>> + else
>> + break;
>> + }
>> +}
>> +
>> +static void ublk_quiesce_queue(struct ublk_device *ub,
>> + struct ublk_queue *ubq)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < ubq->q_depth; i++) {
>> + struct ublk_io *io = &ubq->ios[i];
>> +
>> + if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) {
>> + struct request *rq = blk_mq_tag_to_rq(
>> + ub->tag_set.tags[ubq->q_id], i);
>> +
>> + WARN_ON_ONCE(!rq);
>> + pr_devel("%s: %s rq: qid %d tag %d io_flags %x\n", __func__,
>> + ublk_queue_can_use_recovery_reissue(ubq) ?
>> + "requeue" : "abort",
>> + ubq->q_id, i, io->flags);
>> + if (ublk_queue_can_use_recovery_reissue(ubq))
>> + blk_mq_requeue_request(rq, false);
>
> This way is too violent.
>
> There may be just one queue dying, but you requeue all requests
> from any queue. I'd suggest to take the approach in ublk_daemon_monitor_work(),
> such as, just requeuing requests in dying queue.
If we want to start a new process after a crash for USER_RECOVERY, all old ubq_daemons
must exit and rqs of all queues have to be requeued/aborted. We cannot let live
ubq_daemons run any more because they do not belong to the new process.
BTW, I really wonder how there could be just one dying queue. All queues must be
dying shortly after any ubq_daemon dies, since they are all pthreads of the same process.
>
> That said you still can re-use the logic in ublk_abort_queue()/ublk_daemon_monitor_work()
> for making progress, just changing aborting request with requeue in
> ublk_abort_queue().
I get your point, but it may be hard to reuse the logic in ublk_daemon_monitor_work()
because:
(1) we have to quiesce the request queue in ublk_quiesce_dev(), and this has to be
done while holding ub_mutex.
(2) ublk_quiesce_dev() cannot run after rqs have been requeued/aborted.
>
>> + else
>> + __ublk_fail_req(io, rq);
>> + } else {
>> + pr_devel("%s: done old cmd: qid %d tag %d\n",
>> + __func__, ubq->q_id, i);
>> + io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0);
>> + io->flags &= ~UBLK_IO_FLAG_ACTIVE;
>> + }
>> + ubq->nr_io_ready--;
>> + }
>> + WARN_ON_ONCE(ubq->nr_io_ready);
>> +}
>> +
>> +static void ublk_quiesce_dev(struct ublk_device *ub)
>> +{
>> + int i;
>> +
>> + mutex_lock(&ub->mutex);
>> + if (ub->dev_info.state != UBLK_S_DEV_LIVE)
>> + goto unlock;
>> +
>> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
>> + struct ublk_queue *ubq = ublk_get_queue(ub, i);
>> +
>> + if (!ubq_daemon_is_dying(ubq))
>> + goto unlock;
>> + }
>> + blk_mq_quiesce_queue(ub->ub_disk->queue);
>> + ublk_wait_tagset_rqs_idle(ub);
>> + pr_devel("%s: quiesce ub: dev_id %d\n",
>> + __func__, ub->dev_info.dev_id);
>> +
>> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
>> + ublk_quiesce_queue(ub, ublk_get_queue(ub, i));
>> +
>> + ub->dev_info.state = UBLK_S_DEV_QUIESCED;
>> + unlock:
>> + mutex_unlock(&ub->mutex);
>> +}
>> +
>> +static void ublk_quiesce_work_fn(struct work_struct *work)
>> +{
>> + struct ublk_device *ub =
>> + container_of(work, struct ublk_device, quiesce_work);
>> +
>> + ublk_quiesce_dev(ub);
>> +}
>> +
>> static void ublk_daemon_monitor_work(struct work_struct *work)
>> {
>> struct ublk_device *ub =
>> @@ -1013,10 +1129,14 @@ static void ublk_daemon_monitor_work(struct work_struct *work)
>> struct ublk_queue *ubq = ublk_get_queue(ub, i);
>>
>> if (ubq_daemon_is_dying(ubq)) {
>> - schedule_work(&ub->stop_work);
>> -
>> - /* abort queue is for making forward progress */
>> - ublk_abort_queue(ub, ubq);
>> + if (ublk_queue_can_use_recovery(ubq)) {
>> + schedule_work(&ub->quiesce_work);
>> + } else {
>> + schedule_work(&ub->stop_work);
>> +
>> + /* abort queue is for making forward progress */
>> + ublk_abort_queue(ub, ubq);
>> + }
>
> If quiesce work are always scheduled exclusively with stop work,
> the two can be defined as union, but not one big deal.
OK.
Regards,
Zhang
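The requeue-or-abort choice debated above hinges on a single flag check. A userspace C sketch of that decision (the flag bit positions are assumed from the series' uapi additions, not verified here; dispose_inflight_rq() is a hypothetical name):

```c
#include <assert.h>
#include <string.h>

/* Assumed bit positions for the series' new feature flags. */
#define UBLK_F_USER_RECOVERY         (1UL << 3)
#define UBLK_F_USER_RECOVERY_REISSUE (1UL << 4)

/*
 * What ublk_quiesce_queue() does with an inflight rq whose daemon
 * died: requeue it for the new daemon when REISSUE is set, otherwise
 * fail it with BLK_STS_IOERR via __ublk_fail_req().
 */
static const char *dispose_inflight_rq(unsigned long ub_flags)
{
	if (ub_flags & UBLK_F_USER_RECOVERY_REISSUE)
		return "requeue"; /* blk_mq_requeue_request(rq, false) */
	return "abort";           /* blk_mq_end_request(rq, BLK_STS_IOERR) */
}
```

REISSUE suits backends where replaying a write is safe; plain USER_RECOVERY fails the rq back to upper layers instead.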
* Re: [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-19 9:55 ` Ziyang Zhang
@ 2022-09-19 12:33 ` Ming Lei
2022-09-20 1:49 ` Ziyang Zhang
0 siblings, 1 reply; 33+ messages in thread
From: Ming Lei @ 2022-09-19 12:33 UTC (permalink / raw)
To: Ziyang Zhang
Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi, ming.lei
On Mon, Sep 19, 2022 at 05:55:05PM +0800, Ziyang Zhang wrote:
> On 2022/9/19 17:32, Ming Lei wrote:
> > On Tue, Sep 13, 2022 at 12:17:05PM +0800, ZiyangZhang wrote:
> >> With USER_RECOVERY feature enabled, the monitor_work schedules
> >> quiesce_work after finding a dying ubq_daemon. The quiesce_work's job
> >> is to:
> >> (1) quiesce request queue.
> >> (2) check if there is any INFLIGHT rq with UBLK_IO_FLAG_ACTIVE unset.
> >> If so, we retry until all these rqs are requeued by ublk_queue_rq()
> >> and task_work and become IDLE.
> >
> > These requests should be being handled by task work or the io_uring
> > fallback wq, and suggest to add the words here.
>
> Will do so.
>
> >
> >> (3) requeue/abort inflight rqs issued to the crash ubq_daemon before. If
> >> UBLK_F_USER_RECOVERY_REISSUE is set, rq is requeued; or it is
> >> aborted.
> >> (4) complete all ioucmds by calling io_uring_cmd_done(). We are safe to
> >> do so because no ioucmd can be referenced now.
> >> (5) set ub's state to UBLK_S_DEV_QUIESCED, which means we are ready for
> >> recovery. This state is exposed to userspace by GET_DEV_INFO.
> >>
> >> The driver can always handle STOP_DEV and cleanup everything no matter
> >> ub's state is LIVE or QUIESCED. After ub's state is UBLK_S_DEV_QUIESCED,
> >> user can recover with new process by sending START_USER_RECOVERY.
> >>
> >> Note: we do not change the default behavior with recovery feature
> >> disabled. monitor_work still schedules stop_work and abort inflight
> >> rqs. Finally ublk_device is released.
> >
> > This version looks much better than before.
>
> Thanks :)
>
> >
> >>
> >> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
> >> ---
> >> drivers/block/ublk_drv.c | 168 +++++++++++++++++++++++++++++++++++++--
> >> 1 file changed, 161 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> >> index b067f33a1913..4409a130d0b6 100644
> >> --- a/drivers/block/ublk_drv.c
> >> +++ b/drivers/block/ublk_drv.c
> >> @@ -121,7 +121,7 @@ struct ublk_queue {
> >>
> >> unsigned long io_addr; /* mapped vm address */
> >> unsigned int max_io_sz;
> >> - bool abort_work_pending;
> >> + bool force_abort;
> >> unsigned short nr_io_ready; /* how many ios setup */
> >> struct ublk_device *dev;
> >> struct ublk_io ios[0];
> >> @@ -163,6 +163,7 @@ struct ublk_device {
> >> * monitor each queue's daemon periodically
> >> */
> >> struct delayed_work monitor_work;
> >> + struct work_struct quiesce_work;
> >> struct work_struct stop_work;
> >> };
> >>
> >> @@ -660,6 +661,11 @@ static void __ublk_fail_req(struct ublk_io *io, struct request *req)
> >> WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
> >>
> >> if (!(io->flags & UBLK_IO_FLAG_ABORTED)) {
> >> + pr_devel("%s: abort rq: qid %d tag %d io_flags %x\n",
> >> + __func__,
> >> + ((struct ublk_queue *)req->mq_hctx->driver_data)->q_id,
> >
> > req->mq_hctx->queue_num is cleaner.
>
> Ok.
>
> >
> >> + req->tag,
> >> + io->flags);
> >> io->flags |= UBLK_IO_FLAG_ABORTED;
> >> blk_mq_end_request(req, BLK_STS_IOERR);
> >> }
> >> @@ -820,6 +826,21 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> >> res = ublk_setup_iod(ubq, rq);
> >> if (unlikely(res != BLK_STS_OK))
> >> return BLK_STS_IOERR;
> >> + /* With recovery feature enabled, force_abort is set in
> >> + * ublk_stop_dev() before calling del_gendisk() if ub's state
> >> + * is QUIESCED. We have to abort all requeued and new rqs here
> >> + * to let del_gendisk() move on. Besides, we do not call
> >> + * io_uring_cmd_complete_in_task() to avoid UAF on io_uring ctx.
> >> + *
> >> + * Note: force_abort is guaranteed to be seen because it is set
> >> + * before request queue is unquiesced.
> >> + */
> >> + if (unlikely(ubq->force_abort)) {
> >> + pr_devel("%s: abort rq: qid %d tag %d io_flags %x\n",
> >> + __func__, ubq->q_id, rq->tag,
> >> + ubq->ios[rq->tag].flags);
> >> + return BLK_STS_IOERR;
> >> + }
> >>
> >> blk_mq_start_request(bd->rq);
> >>
> >> @@ -1003,6 +1024,101 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
> >> ublk_put_device(ub);
> >> }
> >>
> >> +static bool ublk_check_inflight_rq(struct request *rq, void *data)
> >> +{
> >> + struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> >> + struct ublk_io *io = &ubq->ios[rq->tag];
> >> + bool *busy = data;
> >> +
> >> + if (io->flags & UBLK_IO_FLAG_ACTIVE) {
> >> + *busy = true;
> >> + return false;
> >> + }
> >> + return true;
> >> +}
> >> +
> >> +static void ublk_wait_tagset_rqs_idle(struct ublk_device *ub)
> >> +{
> >> + bool busy = false;
> >> +
> >> + WARN_ON_ONCE(!blk_queue_quiesced(ub->ub_disk->queue));
> >> + while (true) {
> >> + blk_mq_tagset_busy_iter(&ub->tag_set,
> >> + ublk_check_inflight_rq, &busy);
> >> + if (busy)
> >> + msleep(UBLK_REQUEUE_DELAY_MS);
> >> + else
> >> + break;
> >> + }
> >> +}
> >> +
> >> +static void ublk_quiesce_queue(struct ublk_device *ub,
> >> + struct ublk_queue *ubq)
> >> +{
> >> + int i;
> >> +
> >> + for (i = 0; i < ubq->q_depth; i++) {
> >> + struct ublk_io *io = &ubq->ios[i];
> >> +
> >> + if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) {
> >> + struct request *rq = blk_mq_tag_to_rq(
> >> + ub->tag_set.tags[ubq->q_id], i);
> >> +
> >> + WARN_ON_ONCE(!rq);
> >> + pr_devel("%s: %s rq: qid %d tag %d io_flags %x\n", __func__,
> >> + ublk_queue_can_use_recovery_reissue(ubq) ?
> >> + "requeue" : "abort",
> >> + ubq->q_id, i, io->flags);
> >> + if (ublk_queue_can_use_recovery_reissue(ubq))
> >> + blk_mq_requeue_request(rq, false);
> >
> > This way is too violent.
> >
> > There may be just one queue dying, but you requeue all requests
> > from any queue. I'd suggest to take the approach in ublk_daemon_monitor_work(),
> > such as, just requeuing requests in dying queue.
>
> If we want to start a new process after a crash for USER_RECOVERY, all old ubq_daemons
> must exit and rqs of all queues have to be requeued/aborted. We cannot let live
> ubq_daemons run any more because they do not belong to the new process.
IMO, the old process really can still exist, and recently I even received a
requirement for switching a queue from one thread to another.
What we should do is get all inflight requests done and cancel all io
commands, no matter whether the ubq pthread is dead or alive.
>
> BTW, I really wonder why there could be just one queue dying? All queues must be dying
> shortly after any ubq_daemon is dying since they are all pthreads in the same process.
You can't assume it is always so. Maybe one pthread dies first and the others
die later, or maybe just one is dead.
If one queue's pthread is alive, you may get into trouble by simply requeuing
its requests; that is why I suggest re-using the logic of
ublk_daemon_monitor_work()/ublk_abort_queue().
When stopping the device, the request queue is frozen in del_gendisk() and all
in-flight requests are drained, and the monitor work provides that
guarantee.
For user recovery, the monitor work should help you too, by aborting a dying
queue until all requests are drained.
>
> >
> > That said you still can re-use the logic in ublk_abort_queue()/ublk_daemon_monitor_work()
> > for making progress, just changing aborting request with requeue in
> > ublk_abort_queue().
>
> I get your point, but it may be hard to reuse the logic in ublk_daemon_monitor_work()
> because:
> (1) we have to quiesce request queue in ublk_quiesce_dev(). This has to be done with
> ub_mutex.
> (2) ublk_quiesce_dev() cannot be run after rqs are requeued/aborted.
I don't get your point. The request queue needs to be quiesced once; then inflight
requests are either requeued if the queue is dying, or completed by the queue's
pthread if it is alive. As you mentioned, in reality all pthreads will usually be
killed, but the timing can differ, and I think you cannot requeue a request if its
ubq pthread isn't dying.
Thanks,
Ming
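The dispatch rule under discussion, where a dying queue triggers quiesce_work when recovery is enabled and stop_work (plus abort for forward progress) otherwise, can be sketched as a small userspace C decision function (a simulation; the enum and names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

enum monitor_action {
	ACT_NONE,           /* daemon alive: nothing to do */
	ACT_QUIESCE,        /* schedule_work(&ub->quiesce_work) */
	ACT_STOP_AND_ABORT, /* schedule_work(&ub->stop_work) + ublk_abort_queue() */
};

/* Per-queue decision made by ublk_daemon_monitor_work(). */
static enum monitor_action monitor_decide(bool daemon_dying, bool can_recover)
{
	if (!daemon_dying)
		return ACT_NONE;
	return can_recover ? ACT_QUIESCE : ACT_STOP_AND_ABORT;
}
```

Since a given device only ever takes one of the two work paths, the quiesce and stop work structs could share a union, as noted earlier in the thread.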
* Re: [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-19 12:33 ` Ming Lei
@ 2022-09-20 1:49 ` Ziyang Zhang
2022-09-20 3:04 ` Ming Lei
0 siblings, 1 reply; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-20 1:49 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On 2022/9/19 20:33, Ming Lei wrote:
>>>> +
>>>> +static void ublk_quiesce_queue(struct ublk_device *ub,
>>>> + struct ublk_queue *ubq)
>>>> +{
>>>> + int i;
>>>> +
>>>> + for (i = 0; i < ubq->q_depth; i++) {
>>>> + struct ublk_io *io = &ubq->ios[i];
>>>> +
>>>> + if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) {
>>>> + struct request *rq = blk_mq_tag_to_rq(
>>>> + ub->tag_set.tags[ubq->q_id], i);
>>>> +
>>>> + WARN_ON_ONCE(!rq);
>>>> + pr_devel("%s: %s rq: qid %d tag %d io_flags %x\n", __func__,
>>>> + ublk_queue_can_use_recovery_reissue(ubq) ?
>>>> + "requeue" : "abort",
>>>> + ubq->q_id, i, io->flags);
>>>> + if (ublk_queue_can_use_recovery_reissue(ubq))
>>>> + blk_mq_requeue_request(rq, false);
>>>
>>> This way is too violent.
>>>
>>> There may be just one queue dying, but you requeue all requests
>>> from any queue. I'd suggest to take the approach in ublk_daemon_monitor_work(),
>>> such as, just requeuing requests in dying queue.
>>
>> If we want to start a new process after a crash for USER_RECOVERY, all old ubq_daemons
>> must exit and rqs of all queues have to be requeued/aborted. We cannot let live
>> ubq_daemons run any more because they do not belong to the new process.
>
> IMO, the old process really can exist, and recently even I got such
> requirement for switching queue from one thread to another.
For now, only one process can open /dev/ublkcX, so a new process is necessary.
If you think "per ubq_daemon" recovery is reasonable, I can do that in the future
once multiple processes are supported. But I really suggest we keep the current
design as the first step: it assumes all ubq_daemons have exited and a new process
is started, and that meets our requirement.
BTW, START_USER_RECOVERY would then have to be reconsidered because we may need to
pass a ubq_id with it.
>
> What we should do is to get all inflight requests done, and cancel all io
> commands, no matter if the ubq pthread is dead or live.
>
>>
>> BTW, I really wonder why there could be just one queue dying? All queues must be dying
>> shortly after any ubq_daemon is dying since they are all pthreads in the same process.
>
> You can't assume it is always so. Maybe one pthread is dead first, and
> others are dying later, maybe just one is dead.
Yes, I know only one pthread may be dead while the others keep running, but for now
ublk_drv only supports one process opening the same /dev/ublkcX, so the other pthreads
must die later (no matter whether they are aborted by a signal or exit by themselves).
>
> If one queue's pthread is live, you may get trouble by simply requeuing
> the request, that is why I suggest to re-use the logic of
> ublk_daemon_monitor_work/ublk_abort_queue().
Actually, if any ubq_daemon is alive, no rqs are requeued; please see the check in
ublk_quiesce_dev(). It always makes sure that ALL ubq_daemons are dying before it
starts the quiesce job.
>
> For stopping device, request queue is frozen in del_gendisk() and all
> in-flight requests are drained, and monitor work provides such
> guarantee.
>
> For user recovery, monitor work should help you too by aborting one
> queue if it is dying until all requests are drained.
Monitor work can schedule quiesce_work if it finds a dying ubq_daemon.
Then quiesce_work calls ublk_quiesce_dev(). I do this because ublk_quiesce_dev()
has to wait for all inflight rqs with ACTIVE set to be requeued.
>
>>
>>>
>>> That said you still can re-use the logic in ublk_abort_queue()/ublk_daemon_monitor_work()
>>> for making progress, just changing aborting request with requeue in
>>> ublk_abort_queue().
>>
>> I get your point, but it may be hard to reuse the logic in ublk_daemon_monitor_work()
>> because:
>> (1) we have to quiesce request queue in ublk_quiesce_dev(). This has to be done with
>> ub_mutex.
>> (2) ublk_quiesce_dev() cannot be run after rqs are requeued/aborted.
>
> I don't get your point, the request queue needs to be quiesced once, then
> either inflight requests are requeued if the queue is dying, or completed by
> the queue's pthread if it is live. As you mentioned, in reality, most times,
> all pthreads will be killed, but timing can be different, and I think
> you can not requeue one request if the ubq pthread isn't dying.
I do not requeue rqs with a live ubq_daemon. ublk_quiesce_dev() always starts
after all ubq_daemons are dying.
Regards,
Zhang.
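The gate described above, where ublk_quiesce_dev() proceeds only once every ubq_daemon is dying, is essentially an all-of check over the queues. A hypothetical userspace C sketch (the boolean array stands in for per-queue ubq_daemon_is_dying() results):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mirrors the loop at the top of ublk_quiesce_dev(): if any queue's
 * daemon is still alive, bail out and do not quiesce the device.
 */
static bool ready_to_quiesce(const bool *daemon_dying, int nr_queues)
{
	for (int i = 0; i < nr_queues; i++)
		if (!daemon_dying[i])
			return false; /* a live daemon blocks device-wide quiesce */
	return true;
}
```

This is exactly why no request belonging to a live daemon is ever requeued under the current design.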
* Re: [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-20 1:49 ` Ziyang Zhang
@ 2022-09-20 3:04 ` Ming Lei
2022-09-20 3:24 ` Ziyang Zhang
2022-09-20 4:45 ` Ziyang Zhang
0 siblings, 2 replies; 33+ messages in thread
From: Ming Lei @ 2022-09-20 3:04 UTC (permalink / raw)
To: Ziyang Zhang; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On Tue, Sep 20, 2022 at 09:49:33AM +0800, Ziyang Zhang wrote:
> On 2022/9/19 20:33, Ming Lei wrote:
> >>>> +
> >>>> +static void ublk_quiesce_queue(struct ublk_device *ub,
> >>>> + struct ublk_queue *ubq)
> >>>> +{
> >>>> + int i;
> >>>> +
> >>>> + for (i = 0; i < ubq->q_depth; i++) {
> >>>> + struct ublk_io *io = &ubq->ios[i];
> >>>> +
> >>>> + if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) {
> >>>> + struct request *rq = blk_mq_tag_to_rq(
> >>>> + ub->tag_set.tags[ubq->q_id], i);
> >>>> +
> >>>> + WARN_ON_ONCE(!rq);
> >>>> + pr_devel("%s: %s rq: qid %d tag %d io_flags %x\n", __func__,
> >>>> + ublk_queue_can_use_recovery_reissue(ubq) ?
> >>>> + "requeue" : "abort",
> >>>> + ubq->q_id, i, io->flags);
> >>>> + if (ublk_queue_can_use_recovery_reissue(ubq))
> >>>> + blk_mq_requeue_request(rq, false);
> >>>
> >>> This way is too violent.
> >>>
> >>> There may be just one queue dying, but you requeue all requests
> >>> from any queue. I'd suggest to take the approach in ublk_daemon_monitor_work(),
> >>> such as, just requeuing requests in dying queue.
> >>
> >> If we want to start a new process after a crash for USER_RECOVERY, all old ubq_daemons
> >> must exit and rqs of all queues have to be requeued/aborted. We cannot let live
> >> ubq_daemons run any more because they do not belong to the new process.
> >
> > IMO, the old process really can exist, and recently even I got such
> > requirement for switching queue from one thread to another.
>
> For now, only one process can open /dev/ublkcX, so a new process is necessary now.
>
> If you think "per ubq_daemon" recovery is reasonable, I can do that in the future
> if multiple processes is supported. But I really suggest that we can keep current
> design as the first step which assumes all ubq_daemons are exited and a new process
> is started, and that really meets our requirement.
>
> BTW, START_USER_RECOVERY has to be reconsidered because we may need to pass a ubq_id
> with it.
>
> >
> > What we should do is to get all inflight requests done, and cancel all io
> > commands, no matter if the ubq pthread is dead or live.
> >
> >>
> >> BTW, I really wonder why there could be just one queue dying? All queues must be dying
> >> shortly after any ubq_daemon is dying since they are all pthreads in the same process.
> >
> > You can't assume it is always so. Maybe one pthread is dead first, and
> > others are dying later, maybe just one is dead.
>
> Yes, I know there may be only one pthread is dead while others keep running, but now
> ublk_drv only support one process opening the same /dev/ublkcX, so other pthreads
> must dead(no matter they are aborted by signal or themselves) later.
>
> >
> > If one queue's pthread is live, you may get trouble by simply requeuing
> > the request, that is why I suggest to re-use the logic of
> > ublk_daemon_monitor_work/ublk_abort_queue().
>
> Actually, if any ubq_daemon is live, no rqs are requeued, please see the check in
> ublk_quiesce_dev(). It always makes sure that ALL ubq_daemons are dying, then it
> starts quiesce jobs.
OK, it looks like I missed this point, but you should quiesce the queue at the
beginning of ublk_quiesce_dev(), so that the transition period is kept as short as
possible. Otherwise, if one queue's pthread isn't dying, the device can stay in this
partly-working state forever.
>
> >
> > For stopping device, request queue is frozen in del_gendisk() and all
> > in-flight requests are drained, and monitor work provides such
> > guarantee.
> >
> > For user recovery, monitor work should help you too by aborting one
> > queue if it is dying until all requests are drained.
>
> Monitor work can schedule quiesce_work if it finds a dying ubq_daemon.
> Then quiesce_work calls ublk_quiesce_dev(). I do this because ublk_quiesce_dev()
> has to wait all inflight rqs with ACTIVE set being requeued.
>
> >
> >>
> >>>
> >>> That said you still can re-use the logic in ublk_abort_queue()/ublk_daemon_monitor_work()
> >>> for making progress, just changing aborting request with requeue in
> >>> ublk_abort_queue().
> >>
> >> I get your point, but it may be hard to reuse the logic in ublk_daemon_monitor_work()
> >> because:
> >> (1) we have to quiesce request queue in ublk_quiesce_dev(). This has to be done with
> >> ub_mutex.
> >> (2) ublk_quiesce_dev() cannot be run after rqs are requeued/aborted.
The following delta patch against patch 5 shows the idea:
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 4409a130d0b6..60c5786c4711 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -656,7 +656,8 @@ static void ublk_complete_rq(struct request *req)
* Also aborting may not be started yet, keep in mind that one failed
* request may be issued by block layer again.
*/
-static void __ublk_fail_req(struct ublk_io *io, struct request *req)
+static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
+ struct request *req)
{
WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
@@ -667,7 +668,10 @@ static void __ublk_fail_req(struct ublk_io *io, struct request *req)
req->tag,
io->flags);
io->flags |= UBLK_IO_FLAG_ABORTED;
- blk_mq_end_request(req, BLK_STS_IOERR);
+ if (ublk_queue_can_use_recovery_reissue(ubq))
+ blk_mq_requeue_request(req, false);
+ else
+ blk_mq_end_request(req, BLK_STS_IOERR);
}
}
@@ -1018,7 +1022,7 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
*/
rq = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], i);
if (rq)
- __ublk_fail_req(io, rq);
+ __ublk_fail_req(ubq, io, rq);
}
}
ublk_put_device(ub);
@@ -1026,12 +1030,10 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
static bool ublk_check_inflight_rq(struct request *rq, void *data)
{
- struct ublk_queue *ubq = rq->mq_hctx->driver_data;
- struct ublk_io *io = &ubq->ios[rq->tag];
- bool *busy = data;
+ bool *idle = data;
- if (io->flags & UBLK_IO_FLAG_ACTIVE) {
- *busy = true;
+ if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT) {
+ *idle = false;
return false;
}
return true;
@@ -1039,16 +1041,15 @@ static bool ublk_check_inflight_rq(struct request *rq, void *data)
static void ublk_wait_tagset_rqs_idle(struct ublk_device *ub)
{
- bool busy = false;
+ bool idle = true;
WARN_ON_ONCE(!blk_queue_quiesced(ub->ub_disk->queue));
while (true) {
blk_mq_tagset_busy_iter(&ub->tag_set,
- ublk_check_inflight_rq, &busy);
- if (busy)
- msleep(UBLK_REQUEUE_DELAY_MS);
- else
+ ublk_check_inflight_rq, &idle);
+ if (idle)
break;
+ msleep(UBLK_REQUEUE_DELAY_MS);
}
}
@@ -1069,10 +1070,7 @@ static void ublk_quiesce_queue(struct ublk_device *ub,
ublk_queue_can_use_recovery_reissue(ubq) ?
"requeue" : "abort",
ubq->q_id, i, io->flags);
- if (ublk_queue_can_use_recovery_reissue(ubq))
- blk_mq_requeue_request(rq, false);
- else
- __ublk_fail_req(io, rq);
+ __ublk_fail_req(ubq, io, rq);
} else {
pr_devel("%s: done old cmd: qid %d tag %d\n",
__func__, ubq->q_id, i);
@@ -1092,12 +1090,6 @@ static void ublk_quiesce_dev(struct ublk_device *ub)
if (ub->dev_info.state != UBLK_S_DEV_LIVE)
goto unlock;
- for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
- struct ublk_queue *ubq = ublk_get_queue(ub, i);
-
- if (!ubq_daemon_is_dying(ubq))
- goto unlock;
- }
blk_mq_quiesce_queue(ub->ub_disk->queue);
ublk_wait_tagset_rqs_idle(ub);
pr_devel("%s: quiesce ub: dev_id %d\n",
@@ -1129,14 +1121,13 @@ static void ublk_daemon_monitor_work(struct work_struct *work)
struct ublk_queue *ubq = ublk_get_queue(ub, i);
if (ubq_daemon_is_dying(ubq)) {
- if (ublk_queue_can_use_recovery(ubq)) {
+ if (ublk_queue_can_use_recovery(ubq))
schedule_work(&ub->quiesce_work);
- } else {
+ else
schedule_work(&ub->stop_work);
- /* abort queue is for making forward progress */
- ublk_abort_queue(ub, ubq);
- }
+ /* abort queue is for making forward progress */
+ ublk_abort_queue(ub, ubq);
}
}
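As a side note for readers outside the kernel tree, the busy-to-idle polarity flip in the delta above (stop iterating early on the first in-flight request, otherwise sleep and retry) can be exercised standalone. The enum values and helper names below are userspace stand-ins for the blk-mq API, not the real kernel interfaces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for blk-mq request states; values are illustrative only. */
enum rq_state { MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE };

/* Mirrors ublk_check_inflight_rq(): clear *idle and stop iterating
 * as soon as one in-flight request is found. */
static bool check_inflight(enum rq_state s, void *data)
{
	bool *idle = data;

	if (s == MQ_RQ_IN_FLIGHT) {
		*idle = false;
		return false;	/* returning false stops the iteration */
	}
	return true;
}

/* One pass over a "tag set"; mirrors one blk_mq_tagset_busy_iter() call
 * in ublk_wait_tagset_rqs_idle(). */
static bool tagset_is_idle(const enum rq_state *rqs, size_t n)
{
	bool idle = true;

	for (size_t i = 0; i < n; i++)
		if (!check_inflight(rqs[i], &idle))
			break;
	return idle;
}
```

The real loop simply repeats this pass with an msleep() between attempts until the pass reports idle.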
Thanks,
Ming
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-20 3:04 ` Ming Lei
@ 2022-09-20 3:24 ` Ziyang Zhang
2022-09-20 4:01 ` Ming Lei
2022-09-20 4:45 ` Ziyang Zhang
1 sibling, 1 reply; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-20 3:24 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On 2022/9/20 11:04, Ming Lei wrote:
> On Tue, Sep 20, 2022 at 09:49:33AM +0800, Ziyang Zhang wrote:
>
> Follows the delta patch against patch 5 for showing the idea:
>
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 4409a130d0b6..60c5786c4711 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -656,7 +656,8 @@ static void ublk_complete_rq(struct request *req)
> * Also aborting may not be started yet, keep in mind that one failed
> * request may be issued by block layer again.
> */
> -static void __ublk_fail_req(struct ublk_io *io, struct request *req)
> +static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
> + struct request *req)
> {
> WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
>
> @@ -667,7 +668,10 @@ static void __ublk_fail_req(struct ublk_io *io, struct request *req)
> req->tag,
> io->flags);
> io->flags |= UBLK_IO_FLAG_ABORTED;
> - blk_mq_end_request(req, BLK_STS_IOERR);
> + if (ublk_queue_can_use_recovery_reissue(ubq))
> + blk_mq_requeue_request(req, false);
Here is one problem:
We reset io->flags to 0 in ublk_queue_reinit(), which is called before the new
ubq_daemon with FETCH_REQ is accepted. ublk_abort_queue() is not protected by
ub_mutex and is called many times in monitor_work, so the same rq may be
requeued multiple times.
With recovery disabled, there is no such problem since io->flags does not change
until the ublk_dev is released.
In my patch 5 I requeue the same rq only once, so re-using ublk_abort_queue() is
hard for the recovery feature.
> + else
> + blk_mq_end_request(req, BLK_STS_IOERR);
> }
> }
>
> @@ -1018,7 +1022,7 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
> */
> rq = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], i);
> if (rq)
> - __ublk_fail_req(io, rq);
> + __ublk_fail_req(ubq, io, rq);
> }
> }
> ublk_put_device(ub);
> @@ -1026,12 +1030,10 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
>
> static bool ublk_check_inflight_rq(struct request *rq, void *data)
> {
> - struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> - struct ublk_io *io = &ubq->ios[rq->tag];
> - bool *busy = data;
> + bool *idle = data;
>
> - if (io->flags & UBLK_IO_FLAG_ACTIVE) {
> - *busy = true;
> + if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT) {
> + *idle = false;
> return false;
> }
> return true;
> @@ -1039,16 +1041,15 @@ static bool ublk_check_inflight_rq(struct request *rq, void *data)
>
> static void ublk_wait_tagset_rqs_idle(struct ublk_device *ub)
> {
> - bool busy = false;
> + bool idle = true;
>
> WARN_ON_ONCE(!blk_queue_quiesced(ub->ub_disk->queue));
> while (true) {
> blk_mq_tagset_busy_iter(&ub->tag_set,
> - ublk_check_inflight_rq, &busy);
> - if (busy)
> - msleep(UBLK_REQUEUE_DELAY_MS);
> - else
> + ublk_check_inflight_rq, &idle);
> + if (idle)
> break;
> + msleep(UBLK_REQUEUE_DELAY_MS);
> }
> }
>
> @@ -1069,10 +1070,7 @@ static void ublk_quiesce_queue(struct ublk_device *ub,
> ublk_queue_can_use_recovery_reissue(ubq) ?
> "requeue" : "abort",
> ubq->q_id, i, io->flags);
> - if (ublk_queue_can_use_recovery_reissue(ubq))
> - blk_mq_requeue_request(rq, false);
> - else
> - __ublk_fail_req(io, rq);
> + __ublk_fail_req(ubq, io, rq);
> } else {
> pr_devel("%s: done old cmd: qid %d tag %d\n",
> __func__, ubq->q_id, i);
> @@ -1092,12 +1090,6 @@ static void ublk_quiesce_dev(struct ublk_device *ub)
> if (ub->dev_info.state != UBLK_S_DEV_LIVE)
> goto unlock;
>
> - for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> - struct ublk_queue *ubq = ublk_get_queue(ub, i);
> -
> - if (!ubq_daemon_is_dying(ubq))
> - goto unlock;
> - }
> blk_mq_quiesce_queue(ub->ub_disk->queue);
> ublk_wait_tagset_rqs_idle(ub);
> pr_devel("%s: quiesce ub: dev_id %d\n",
> @@ -1129,14 +1121,13 @@ static void ublk_daemon_monitor_work(struct work_struct *work)
> struct ublk_queue *ubq = ublk_get_queue(ub, i);
>
> if (ubq_daemon_is_dying(ubq)) {
> - if (ublk_queue_can_use_recovery(ubq)) {
> + if (ublk_queue_can_use_recovery(ubq))
> schedule_work(&ub->quiesce_work);
> - } else {
> + else
> schedule_work(&ub->stop_work);
>
> - /* abort queue is for making forward progress */
> - ublk_abort_queue(ub, ubq);
> - }
> + /* abort queue is for making forward progress */
> + ublk_abort_queue(ub, ubq);
> }
> }
>
>
>
>
>
> Thanks,
> Ming
* Re: [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-20 3:24 ` Ziyang Zhang
@ 2022-09-20 4:01 ` Ming Lei
2022-09-20 4:39 ` Ziyang Zhang
0 siblings, 1 reply; 33+ messages in thread
From: Ming Lei @ 2022-09-20 4:01 UTC (permalink / raw)
To: Ziyang Zhang; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On Tue, Sep 20, 2022 at 11:24:12AM +0800, Ziyang Zhang wrote:
> On 2022/9/20 11:04, Ming Lei wrote:
> > On Tue, Sep 20, 2022 at 09:49:33AM +0800, Ziyang Zhang wrote:
> >
> > Follows the delta patch against patch 5 for showing the idea:
> >
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 4409a130d0b6..60c5786c4711 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -656,7 +656,8 @@ static void ublk_complete_rq(struct request *req)
> > * Also aborting may not be started yet, keep in mind that one failed
> > * request may be issued by block layer again.
> > */
> > -static void __ublk_fail_req(struct ublk_io *io, struct request *req)
> > +static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
> > + struct request *req)
> > {
> > WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
> >
> > @@ -667,7 +668,10 @@ static void __ublk_fail_req(struct ublk_io *io, struct request *req)
> > req->tag,
> > io->flags);
> > io->flags |= UBLK_IO_FLAG_ABORTED;
> > - blk_mq_end_request(req, BLK_STS_IOERR);
> > + if (ublk_queue_can_use_recovery_reissue(ubq))
> > + blk_mq_requeue_request(req, false);
>
> Here is one problem:
> We reset io->flags to 0 in ublk_queue_reinit() and it is called before new
As we agreed, ublk_queue_reinit() will be moved to ublk_ch_release(), which runs when
there isn't any inflight request: each one has been completed by either the ublk server
or __ublk_fail_req().
So clearing io->flags isn't related to quiescing the device.
> ubq_daemon with FETCH_REQ is accepted. ublk_abort_queue() is not protected with
> ub_mutex and it is called many times in monitor_work. So same rq may be requeued
> multiple times.
UBLK_IO_FLAG_ABORTED is set for the slot, so one req is ended or requeued
just once.
>
> With recovery disabled, there is no such problem since io->flags does not change
> until ublk_dev is released.
But we have agreed that ublk_queue_reinit() can be moved to release
handler of /dev/ublkcN.
>
> In my patch 5 I only requeue the same rq once. So re-using ublk_abort_queue() is
> hard for recovery feature.
No, the same rq is just requeued once. Here the point is:
1) reuse the previous pattern in ublk_stop_dev(), which has proved to work
reliably
2) avoid staying in a half-working state forever
3) the idea behind it is simpler.
Thanks.
Ming
* Re: [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-20 4:01 ` Ming Lei
@ 2022-09-20 4:39 ` Ziyang Zhang
2022-09-20 4:49 ` Ming Lei
0 siblings, 1 reply; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-20 4:39 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On 2022/9/20 12:01, Ming Lei wrote:
> On Tue, Sep 20, 2022 at 11:24:12AM +0800, Ziyang Zhang wrote:
>> On 2022/9/20 11:04, Ming Lei wrote:
>>> On Tue, Sep 20, 2022 at 09:49:33AM +0800, Ziyang Zhang wrote:
>>>
>>> Follows the delta patch against patch 5 for showing the idea:
>>>
>>>
>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>>> index 4409a130d0b6..60c5786c4711 100644
>>> --- a/drivers/block/ublk_drv.c
>>> +++ b/drivers/block/ublk_drv.c
>>> @@ -656,7 +656,8 @@ static void ublk_complete_rq(struct request *req)
>>> * Also aborting may not be started yet, keep in mind that one failed
>>> * request may be issued by block layer again.
>>> */
>>> -static void __ublk_fail_req(struct ublk_io *io, struct request *req)
>>> +static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
>>> + struct request *req)
>>> {
>>> WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
>>>
>>> @@ -667,7 +668,10 @@ static void __ublk_fail_req(struct ublk_io *io, struct request *req)
>>> req->tag,
>>> io->flags);
>>> io->flags |= UBLK_IO_FLAG_ABORTED;
>>> - blk_mq_end_request(req, BLK_STS_IOERR);
>>> + if (ublk_queue_can_use_recovery_reissue(ubq))
>>> + blk_mq_requeue_request(req, false);
>>
>> Here is one problem:
>> We reset io->flags to 0 in ublk_queue_reinit() and it is called before new
>
> As we agreed, ublk_queue_reinit() will be moved to ublk_ch_release(), when there isn't
> any inflight request, which is completed by either ublk server or __ublk_fail_req().
>
> So clearing io->flags isn't related to quiescing the device.
>
>> ubq_daemon with FETCH_REQ is accepted. ublk_abort_queue() is not protected with
>> ub_mutex and it is called many times in monitor_work. So same rq may be requeued
>> multiple times.
>
> UBLK_IO_FLAG_ABORTED is set for the slot, so one req is only ended or
> requeued just once.
Yes, we can move ublk_queue_reinit() into ublk_ch_release(), but monitor_work is scheduled
periodically, so ublk_abort_queue() is called multiple times. As ublk_queue_reinit() clears
io->flags, ublk_abort_queue() can requeue the same rq twice. Note that monitor_work can be
scheduled after ublk_ch_release().
>
>>
>> With recovery disabled, there is no such problem since io->flags does not change
>> until ublk_dev is released.
>
> But we have agreed that ublk_queue_reinit() can be moved to release
> handler of /dev/ublkcN.
>
>>
>> In my patch 5 I only requeue the same rq once. So re-using ublk_abort_queue() is
>> hard for recovery feature.
>
> No, the same rq is just requeued once. Here the point is:
>
> 1) reuse the previous pattern in ublk_stop_dev(), which has proved to work
> reliably
>
> 2) avoid staying in a half-working state forever
>
> 3) the idea behind it is simpler.
Ming, your patch requeues rqs with ACTIVE unset; these rqs have been issued to the
dying ubq_daemon. What I am concerned about is inflight rqs with ACTIVE set.
Regards,
Zhang.
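The two behaviors being debated here can be modeled in plain userspace C. The sketch below is illustrative only: the flag value and struct layout are assumptions, and fail_count stands in for the requeue/end_request side effect. It shows both that the UBLK_IO_FLAG_ABORTED guard makes __ublk_fail_req() idempotent across repeated monitor_work passes, and that a ublk_queue_reinit() between two passes defeats the guard:

```c
#include <assert.h>

#define UBLK_IO_FLAG_ABORTED 0x4	/* bit value assumed for illustration */

struct io_slot {
	unsigned flags;
	int fail_count;		/* times the rq was actually requeued/ended */
};

/* Models the guard in __ublk_fail_req(): the ABORTED bit makes repeated
 * calls for the same slot a no-op. */
static void fail_req(struct io_slot *io)
{
	if (io->flags & UBLK_IO_FLAG_ABORTED)
		return;
	io->flags |= UBLK_IO_FLAG_ABORTED;
	io->fail_count++;
}

/* Models ublk_queue_reinit() clearing io->flags for a fresh FETCH_REQ. */
static void queue_reinit(struct io_slot *io)
{
	io->flags = 0;
}
```

Whether the second fail_req() after queue_reinit() can actually happen is exactly the ordering question: it cannot once monitor_work is guaranteed to be shut down before the reinit.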
* Re: [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-20 4:39 ` Ziyang Zhang
@ 2022-09-20 4:49 ` Ming Lei
2022-09-20 5:03 ` Ziyang Zhang
0 siblings, 1 reply; 33+ messages in thread
From: Ming Lei @ 2022-09-20 4:49 UTC (permalink / raw)
To: Ziyang Zhang; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On Tue, Sep 20, 2022 at 12:39:31PM +0800, Ziyang Zhang wrote:
> On 2022/9/20 12:01, Ming Lei wrote:
> > On Tue, Sep 20, 2022 at 11:24:12AM +0800, Ziyang Zhang wrote:
> >> On 2022/9/20 11:04, Ming Lei wrote:
> >>> On Tue, Sep 20, 2022 at 09:49:33AM +0800, Ziyang Zhang wrote:
> >>>
> >>> Follows the delta patch against patch 5 for showing the idea:
> >>>
> >>>
> >>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> >>> index 4409a130d0b6..60c5786c4711 100644
> >>> --- a/drivers/block/ublk_drv.c
> >>> +++ b/drivers/block/ublk_drv.c
> >>> @@ -656,7 +656,8 @@ static void ublk_complete_rq(struct request *req)
> >>> * Also aborting may not be started yet, keep in mind that one failed
> >>> * request may be issued by block layer again.
> >>> */
> >>> -static void __ublk_fail_req(struct ublk_io *io, struct request *req)
> >>> +static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
> >>> + struct request *req)
> >>> {
> >>> WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
> >>>
> >>> @@ -667,7 +668,10 @@ static void __ublk_fail_req(struct ublk_io *io, struct request *req)
> >>> req->tag,
> >>> io->flags);
> >>> io->flags |= UBLK_IO_FLAG_ABORTED;
> >>> - blk_mq_end_request(req, BLK_STS_IOERR);
> >>> + if (ublk_queue_can_use_recovery_reissue(ubq))
> >>> + blk_mq_requeue_request(req, false);
> >>
> >> Here is one problem:
> >> We reset io->flags to 0 in ublk_queue_reinit() and it is called before new
> >
> > As we agreed, ublk_queue_reinit() will be moved to ublk_ch_release(), when there isn't
> > any inflight request, which is completed by either ublk server or __ublk_fail_req().
> >
> > So clearing io->flags isn't related to quiescing the device.
> >
> >> ubq_daemon with FETCH_REQ is accepted. ublk_abort_queue() is not protected with
> >> ub_mutex and it is called many times in monitor_work. So same rq may be requeued
> >> multiple times.
> >
> > UBLK_IO_FLAG_ABORTED is set for the slot, so one req is only ended or
> > requeued just once.
>
> Yes, we can move ublk_queue_reinit() into ublk_ch_release(), but monitor_work is scheduled
> periodically, so ublk_abort_queue() is called multiple times. As ublk_queue_reinit() clears
> io->flags, ublk_abort_queue() can requeue the same rq twice. Note that monitor_work can be
> scheduled after ublk_ch_release().
No, monitor work is supposed to be shut down after in-flight requests are
drained.
>
> >
> >>
> >> With recovery disabled, there is no such problem since io->flags does not change
> >> until ublk_dev is released.
> >
> > But we have agreed that ublk_queue_reinit() can be moved to release
> > handler of /dev/ublkcN.
> >
> >>
> >> In my patch 5 I only requeue the same rq once. So re-using ublk_abort_queue() is
> >> hard for recovery feature.
> >
> > No, the same rq is just requeued once. Here the point is:
> >
> > 1) reuse the previous pattern in ublk_stop_dev(), which has proved to work
> > reliably
> >
> > 2) avoid staying in a half-working state forever
> >
> > 3) the idea behind it is simpler.
>
> Ming, your patch requeues rqs with ACTIVE unset; these rqs have been issued to the
> dying ubq_daemon. What I am concerned about is inflight rqs with ACTIVE set.
My patch drains all inflight requests no matter if ACTIVE is set or not,
and that is the reason why it is simpler.
Thanks,
Ming
* Re: [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-20 4:49 ` Ming Lei
@ 2022-09-20 5:03 ` Ziyang Zhang
0 siblings, 0 replies; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-20 5:03 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On 2022/9/20 12:49, Ming Lei wrote:
> On Tue, Sep 20, 2022 at 12:39:31PM +0800, Ziyang Zhang wrote:
>> On 2022/9/20 12:01, Ming Lei wrote:
>>> On Tue, Sep 20, 2022 at 11:24:12AM +0800, Ziyang Zhang wrote:
>>>> On 2022/9/20 11:04, Ming Lei wrote:
>>>>> On Tue, Sep 20, 2022 at 09:49:33AM +0800, Ziyang Zhang wrote:
>>>>>
>>>>> Follows the delta patch against patch 5 for showing the idea:
>>>>>
>>>>>
>>>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>>>>> index 4409a130d0b6..60c5786c4711 100644
>>>>> --- a/drivers/block/ublk_drv.c
>>>>> +++ b/drivers/block/ublk_drv.c
>>>>> @@ -656,7 +656,8 @@ static void ublk_complete_rq(struct request *req)
>>>>> * Also aborting may not be started yet, keep in mind that one failed
>>>>> * request may be issued by block layer again.
>>>>> */
>>>>> -static void __ublk_fail_req(struct ublk_io *io, struct request *req)
>>>>> +static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
>>>>> + struct request *req)
>>>>> {
>>>>> WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
>>>>>
>>>>> @@ -667,7 +668,10 @@ static void __ublk_fail_req(struct ublk_io *io, struct request *req)
>>>>> req->tag,
>>>>> io->flags);
>>>>> io->flags |= UBLK_IO_FLAG_ABORTED;
>>>>> - blk_mq_end_request(req, BLK_STS_IOERR);
>>>>> + if (ublk_queue_can_use_recovery_reissue(ubq))
>>>>> + blk_mq_requeue_request(req, false);
>>>>
>>>> Here is one problem:
>>>> We reset io->flags to 0 in ublk_queue_reinit() and it is called before new
>>>
>>> As we agreed, ublk_queue_reinit() will be moved to ublk_ch_release(), when there isn't
>>> any inflight request, which is completed by either ublk server or __ublk_fail_req().
>>>
>>> So clearing io->flags isn't related to quiescing the device.
>>>
>>>> ubq_daemon with FETCH_REQ is accepted. ublk_abort_queue() is not protected with
>>>> ub_mutex and it is called many times in monitor_work. So same rq may be requeued
>>>> multiple times.
>>>
>>> UBLK_IO_FLAG_ABORTED is set for the slot, so one req is only ended or
>>> requeued just once.
>>
>> Yes, we can move ublk_queue_reinit() into ublk_ch_release(), but monitor_work is scheduled
>> periodically, so ublk_abort_queue() is called multiple times. As ublk_queue_reinit() clears
>> io->flags, ublk_abort_queue() can requeue the same rq twice. Note that monitor_work can be
>> scheduled after ublk_ch_release().
>
> No, monitor work is supposed to be shut down after in-flight requests are
> drained.
Let's add cancel_delayed_work_sync(&ub->monitor_work) in ublk_ch_release().
monitor_work should not be scheduled after ub's state is QUIESCED.
Regards,
Zhang.
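The fix being proposed (cancel the self-rearming monitor before anything may clear io->flags) can be sketched with a toy model. This is a userspace simulation, not the kernel workqueue API; toy_cancel_sync() only models the guarantee that cancel_delayed_work_sync() provides, namely that no further run happens after it returns:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of ub->monitor_work: a delayed work item that rearms itself. */
struct toy_work {
	bool pending;
	int runs;	/* number of monitor/abort passes executed */
};

static void monitor_fn(struct toy_work *w)
{
	w->runs++;		/* one abort pass over the queues */
	w->pending = true;	/* monitor_work reschedules itself */
}

/* Models cancel_delayed_work_sync(): after this, nothing runs again. */
static void toy_cancel_sync(struct toy_work *w)
{
	w->pending = false;
}

/* Advance "time": execute the work once if it is pending. */
static void tick(struct toy_work *w)
{
	if (w->pending) {
		w->pending = false;
		monitor_fn(w);
	}
}
```

Calling toy_cancel_sync() at the point corresponding to ublk_ch_release() guarantees no abort pass can race with a later queue reinit.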
* Re: [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-20 3:04 ` Ming Lei
2022-09-20 3:24 ` Ziyang Zhang
@ 2022-09-20 4:45 ` Ziyang Zhang
2022-09-20 5:05 ` Ziyang Zhang
1 sibling, 1 reply; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-20 4:45 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On 2022/9/20 11:04, Ming Lei wrote:
> On Tue, Sep 20, 2022 at 09:49:33AM +0800, Ziyang Zhang wrote:
>> On 2022/9/19 20:33, Ming Lei wrote:
>>>>>> +
>>>>>> +static void ublk_quiesce_queue(struct ublk_device *ub,
>>>>>> + struct ublk_queue *ubq)
>>>>>> +{
>>>>>> + int i;
>>>>>> +
>>>>>> + for (i = 0; i < ubq->q_depth; i++) {
>>>>>> + struct ublk_io *io = &ubq->ios[i];
>>>>>> +
>>>>>> + if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) {
>>>>>> + struct request *rq = blk_mq_tag_to_rq(
>>>>>> + ub->tag_set.tags[ubq->q_id], i);
>>>>>> +
>>>>>> + WARN_ON_ONCE(!rq);
>>>>>> + pr_devel("%s: %s rq: qid %d tag %d io_flags %x\n", __func__,
>>>>>> + ublk_queue_can_use_recovery_reissue(ubq) ?
>>>>>> + "requeue" : "abort",
>>>>>> + ubq->q_id, i, io->flags);
>>>>>> + if (ublk_queue_can_use_recovery_reissue(ubq))
>>>>>> + blk_mq_requeue_request(rq, false);
>>>>>
>>>>> This way is too violent.
>>>>>
>>>>> There may be just one queue dying, but you requeue all requests
>>>>> from any queue. I'd suggest to take the approach in ublk_daemon_monitor_work(),
>>>>> such as, just requeuing requests in dying queue.
>>>>
>>>> If we want to start a new process after a crash for USER_RECOVERY, all old ubq_daemons
>>>> must exit and rqs of all queues have to be requeued/aborted. We cannot let live
>>>> ubq_daemons run any more because they do not belong to the new process.
>>>
>>> IMO, the old process really can exist, and recently even I got such
>>> requirement for switching queue from one thread to another.
>>
>> For now, only one process can open /dev/ublkcX, so a new process is necessary now.
>>
>> If you think "per ubq_daemon" recovery is reasonable, I can do that in the future
>> if multiple processes is supported. But I really suggest that we can keep current
>> design as the first step which assumes all ubq_daemons are exited and a new process
>> is started, and that really meets our requirement.
>>
>> BTW, START_USER_RECOVERY has to be reconsidered because we may need to pass a ubq_id
>> with it.
>>
>>>
>>> What we should do is to get all inflight requests done, and cancel all io
>>> commands, no matter if the ubq pthread is dead or live.
>>>
>>>>
>>>> BTW, I really wonder why there could be just one queue dying? All queues must be dying
>>>> shortly after any ubq_daemon is dying since they are all pthreads in the same process.
>>>
>>> You can't assume it is always so. Maybe one pthread is dead first, and
>>> others are dying later, maybe just one is dead.
>>
>> Yes, I know only one pthread may be dead while others keep running, but for now
>> ublk_drv only supports one process opening the same /dev/ublkcX, so the other pthreads
>> must die later (no matter whether they are aborted by a signal or exit by themselves).
>>
>>>
>>> If one queue's pthread is live, you may get trouble by simply requeuing
>>> the request, that is why I suggest to re-use the logic of
>>> ublk_daemon_monitor_work/ublk_abort_queue().
>>
>> Actually, if any ubq_daemon is live, no rqs are requeued, please see the check in
>> ublk_quiesce_dev(). It always makes sure that ALL ubq_daemons are dying, then it
>> starts quiesce jobs.
>
> OK, it looks like I missed this point, but you should have quiesced the queue at the
> beginning of ublk_quiesce_dev(); then the transition period can be kept
> as short as possible. Otherwise, if one queue pthread isn't dying, the
> device can be kept in this part-working state forever.
>
Ming, this is what you said in PATCH V2:
"
The simplest handling might be to exit all ublk queues first, and re-create one
new process to recover all since the request queue is required to be
quiesced first, and all ublk queue is actually quiesced too. So from user
viewpoint, there is nothing visible comparing with just recovering
single ubq daemon/pthread.
"
So I assume that quiesce_work starts only after all ubq_daemons are dying.
Note that current ublk does not support multiple processes opening the same chardev.
Really we should agree on this. My suggestion is that we only consider "all ubq_daemons
are dying".
You mention that someone wants to enable "switching the ubq_daemon pthread to another
one", and I think it is another feature, not the recovery feature.
Regards,
Zhang.
* Re: [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism
2022-09-20 4:45 ` Ziyang Zhang
@ 2022-09-20 5:05 ` Ziyang Zhang
0 siblings, 0 replies; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-20 5:05 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On 2022/9/20 12:45, Ziyang Zhang wrote:
> On 2022/9/20 11:04, Ming Lei wrote:
>> On Tue, Sep 20, 2022 at 09:49:33AM +0800, Ziyang Zhang wrote:
>>> On 2022/9/19 20:33, Ming Lei wrote:
>>>>>>> +
>>>>>>> +static void ublk_quiesce_queue(struct ublk_device *ub,
>>>>>>> + struct ublk_queue *ubq)
>>>>>>> +{
>>>>>>> + int i;
>>>>>>> +
>>>>>>> + for (i = 0; i < ubq->q_depth; i++) {
>>>>>>> + struct ublk_io *io = &ubq->ios[i];
>>>>>>> +
>>>>>>> + if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) {
>>>>>>> + struct request *rq = blk_mq_tag_to_rq(
>>>>>>> + ub->tag_set.tags[ubq->q_id], i);
>>>>>>> +
>>>>>>> + WARN_ON_ONCE(!rq);
>>>>>>> + pr_devel("%s: %s rq: qid %d tag %d io_flags %x\n", __func__,
>>>>>>> + ublk_queue_can_use_recovery_reissue(ubq) ?
>>>>>>> + "requeue" : "abort",
>>>>>>> + ubq->q_id, i, io->flags);
>>>>>>> + if (ublk_queue_can_use_recovery_reissue(ubq))
>>>>>>> + blk_mq_requeue_request(rq, false);
>>>>>>
>>>>>> This way is too violent.
>>>>>>
>>>>>> There may be just one queue dying, but you requeue all requests
>>>>>> from any queue. I'd suggest to take the approach in ublk_daemon_monitor_work(),
>>>>>> such as, just requeuing requests in dying queue.
>>>>>
>>>>> If we want to start a new process after a crash for USER_RECOVERY, all old ubq_daemons
>>>>> must exit and rqs of all queues have to be requeued/aborted. We cannot let live
>>>>> ubq_daemons run any more because they do not belong to the new process.
>>>>
>>>> IMO, the old process really can exist, and recently even I got such
>>>> requirement for switching queue from one thread to another.
>>>
>>> For now, only one process can open /dev/ublkcX, so a new process is necessary now.
>>>
>>> If you think "per ubq_daemon" recovery is reasonable, I can do that in the future
>>> if multiple processes is supported. But I really suggest that we can keep current
>>> design as the first step which assumes all ubq_daemons are exited and a new process
>>> is started, and that really meets our requirement.
>>>
>>> BTW, START_USER_RECOVERY has to be reconsidered because we may need to pass a ubq_id
>>> with it.
>>>
>>>>
>>>> What we should do is to get all inflight requests done, and cancel all io
>>>> commands, no matter if the ubq pthread is dead or live.
>>>>
>>>>>
>>>>> BTW, I really wonder why there could be just one queue dying? All queues must be dying
>>>>> shortly after any ubq_daemon is dying since they are all pthreads in the same process.
>>>>
>>>> You can't assume it is always so. Maybe one pthread is dead first, and
>>>> others are dying later, maybe just one is dead.
>>>
>>> Yes, I know only one pthread may be dead while others keep running, but for now
>>> ublk_drv only supports one process opening the same /dev/ublkcX, so the other pthreads
>>> must die later (no matter whether they are aborted by a signal or exit by themselves).
>>>
>>>>
>>>> If one queue's pthread is live, you may get trouble by simply requeuing
>>>> the request, that is why I suggest to re-use the logic of
>>>> ublk_daemon_monitor_work/ublk_abort_queue().
>>>
>>> Actually, if any ubq_daemon is live, no rqs are requeued, please see the check in
>>> ublk_quiesce_dev(). It always makes sure that ALL ubq_daemons are dying, then it
>>> starts quiesce jobs.
>>
>> OK, it looks like I missed this point, but you should have quiesced the queue at the
>> beginning of ublk_quiesce_dev(); then the transition period can be kept
>> as short as possible. Otherwise, if one queue pthread isn't dying, the
>> device can be kept in this part-working state forever.
>>
>
> Ming, this is what you said in PATCH V2:
> "
> The simplest handling might be to exit all ublk queues first, and re-create one
> new process to recover all since the request queue is required to be
> quiesced first, and all ublk queue is actually quiesced too. So from user
> viewpoint, there is nothing visible comparing with just recovering
> single ubq daemon/pthread.
> "
>
> So I assume that quiesce_work starts only after all ubq_daemons are dying.
> Note that current ublk does not support multiple processes opening the same chardev.
>
> Really we should agree on this. My suggestion is that we only consider "all ubq_daemons
> are dying".
>
> You mention that someone wants to enable "switching the ubq_daemon pthread to another
> one", and I think it is another feature, not the recovery feature.
>
> Regards,
> Zhang.
This should be considered very carefully, Ming.
* [PATCH V3 6/7] ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support
2022-09-13 4:17 [PATCH V3 0/7] ublk_drv: add USER_RECOVERY support ZiyangZhang
` (4 preceding siblings ...)
2022-09-13 4:17 ` [PATCH V3 5/7] ublk_drv: consider recovery feature in aborting mechanism ZiyangZhang
@ 2022-09-13 4:17 ` ZiyangZhang
2022-09-19 13:03 ` Ming Lei
2022-09-13 4:17 ` [PATCH V3 7/7] ublk_drv: do not run monitor_work while ub's state is QUIESCED ZiyangZhang
2022-09-19 2:17 ` [PATCH V3 0/7] ublk_drv: add USER_RECOVERY support Ziyang Zhang
7 siblings, 1 reply; 33+ messages in thread
From: ZiyangZhang @ 2022-09-13 4:17 UTC (permalink / raw)
To: ming.lei
Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi, ZiyangZhang
START_USER_RECOVERY and END_USER_RECOVERY are two new control commands
to support the user recovery feature.
After a crash, the user should send START_USER_RECOVERY, which will:
(1) check that (a) the current ublk_device is UBLK_S_DEV_QUIESCED, which
was set by quiesce_work, and (b) the file struct is released.
(2) reinit all ubqs, including:
(a) put the task_struct and reset ->ubq_daemon to NULL.
(b) reset all ublk_io.
(3) reset ub->mm to NULL.
Then the user should start a new process and send FETCH_REQ on each
ubq_daemon.
Finally, the user should send END_USER_RECOVERY, which will:
(1) wait for all new ubq_daemons to get ready.
(2) update ublksrv_pid.
(3) unquiesce the request queue and expect incoming ublk_queue_rq().
(4) convert ub's state to UBLK_S_DEV_LIVE.
Note: STOP_DEV can be handled between START_USER_RECOVERY and
END_USER_RECOVERY. This is helpful to users who cannot start a new process
after sending the START_USER_RECOVERY ctrl-cmd.
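The state transitions described above can be condensed into a small model. This is a userspace sketch only: the error values and field names are illustrative, and the "wait for all new ubq_daemons" step of END_USER_RECOVERY is modeled as an error return rather than a blocking wait_for_completion():

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative states and error codes; names mirror the driver. */
enum dev_state { UBLK_S_DEV_LIVE, UBLK_S_DEV_QUIESCED };
#define ERR_BUSY  (-16)
#define ERR_INVAL (-22)

struct ublk_dev_model {
	enum dev_state state;
	bool ch_open;		/* UB_STATE_OPEN: /dev/ublkcX still held */
	bool queues_ready;	/* all new ubq_daemons sent FETCH_REQ */
	int ublksrv_pid;
};

/* START_USER_RECOVERY: only legal once the old process has released the
 * char device and quiesce_work has quiesced the device. */
static int start_recovery(struct ublk_dev_model *ub)
{
	if (ub->ch_open || ub->state != UBLK_S_DEV_QUIESCED)
		return ERR_BUSY;
	ub->queues_ready = false;	/* stands in for ublk_queue_reinit() */
	return 0;
}

/* END_USER_RECOVERY: record the new server pid, unquiesce, go LIVE. */
static int end_recovery(struct ublk_dev_model *ub, int new_pid)
{
	if (ub->state != UBLK_S_DEV_QUIESCED)
		return ERR_BUSY;
	if (!ub->queues_ready)
		return ERR_INVAL;	/* new daemons not ready yet */
	ub->ublksrv_pid = new_pid;
	ub->state = UBLK_S_DEV_LIVE;	/* queue unquiesced at this point */
	return 0;
}
```

The checks mirror the (1)/(2) preconditions in the commit message: START is refused while the old fd is open or the device is not quiesced, and END only completes once the new daemons are ready.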
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
---
drivers/block/ublk_drv.c | 116 +++++++++++++++++++++++++++++++++++++++
1 file changed, 116 insertions(+)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 4409a130d0b6..3a3af80ee938 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1961,6 +1961,116 @@ static int ublk_ctrl_set_params(struct io_uring_cmd *cmd)
return ret;
}
+static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
+{
+ int i;
+
+ WARN_ON_ONCE(!(ubq->ubq_daemon && ubq_daemon_is_dying(ubq)));
+ /* All old ioucmds have to be completed */
+ WARN_ON_ONCE(ubq->nr_io_ready);
+ pr_devel("%s: prepare for recovering qid %d\n", __func__, ubq->q_id);
+ /* old daemon is PF_EXITING, put it now */
+ put_task_struct(ubq->ubq_daemon);
+ /* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */
+ ubq->ubq_daemon = NULL;
+
+ for (i = 0; i < ubq->q_depth; i++) {
+ struct ublk_io *io = &ubq->ios[i];
+
+ /* forget everything now and be ready for new FETCH_REQ */
+ io->flags = 0;
+ io->cmd = NULL;
+ io->addr = 0;
+ }
+}
+
+static int ublk_ctrl_start_recovery(struct io_uring_cmd *cmd)
+{
+ struct ublksrv_ctrl_cmd *header = (struct ublksrv_ctrl_cmd *)cmd->cmd;
+ struct ublk_device *ub;
+ int ret = -EINVAL;
+ int i;
+
+ ub = ublk_get_device_from_id(header->dev_id);
+ if (!ub)
+ return ret;
+
+ mutex_lock(&ub->mutex);
+ if (!ublk_can_use_recovery(ub))
+ goto out_unlock;
+ /*
+ * START_RECOVERY is only allowed after:
+ *
+ * (1) UB_STATE_OPEN is not set, which means the dying process is exited
+ * and related io_uring ctx is freed so file struct of /dev/ublkcX is
+ * released.
+ *
+ * (2) UBLK_S_DEV_QUIESCED is set, which means the quiesce_work:
+ * (a) has quiesced the request queue
+ * (b) has requeued every inflight rq whose io_flags is ACTIVE
+ * (c) has requeued/aborted every inflight rq whose io_flags is NOT ACTIVE
+ * (d) has completed/canceled all ioucmds owned by the dying process
+ */
+ if (test_bit(UB_STATE_OPEN, &ub->state) ||
+ ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+ pr_devel("%s: start recovery for dev id %d.\n", __func__, header->dev_id);
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+ ublk_queue_reinit(ub, ublk_get_queue(ub, i));
+ /* set to NULL, otherwise new ubq_daemon cannot mmap the io_cmd_buf */
+ ub->mm = NULL;
+ ub->nr_queues_ready = 0;
+ init_completion(&ub->completion);
+ ret = 0;
+ out_unlock:
+ mutex_unlock(&ub->mutex);
+ ublk_put_device(ub);
+ return ret;
+}
+
+static int ublk_ctrl_end_recovery(struct io_uring_cmd *cmd)
+{
+ struct ublksrv_ctrl_cmd *header = (struct ublksrv_ctrl_cmd *)cmd->cmd;
+ int ublksrv_pid = (int)header->data[0];
+ struct ublk_device *ub;
+ int ret = -EINVAL;
+
+ ub = ublk_get_device_from_id(header->dev_id);
+ if (!ub)
+ return ret;
+
+ pr_devel("%s: Waiting for new ubq_daemons(nr: %d) to be ready, dev id %d...\n",
+ __func__, ub->dev_info.nr_hw_queues, header->dev_id);
+ /* wait until the new ubq_daemons have sent all FETCH_REQs */
+ wait_for_completion_interruptible(&ub->completion);
+ pr_devel("%s: All new ubq_daemons(nr: %d) are ready, dev id %d\n",
+ __func__, ub->dev_info.nr_hw_queues, header->dev_id);
+
+ mutex_lock(&ub->mutex);
+ if (!ublk_can_use_recovery(ub))
+ goto out_unlock;
+
+ if (ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+ ub->dev_info.ublksrv_pid = ublksrv_pid;
+ pr_devel("%s: new ublksrv_pid %d, dev id %d\n",
+ __func__, ublksrv_pid, header->dev_id);
+ blk_mq_unquiesce_queue(ub->ub_disk->queue);
+ pr_devel("%s: queue unquiesced, dev id %d.\n",
+ __func__, header->dev_id);
+ blk_mq_kick_requeue_list(ub->ub_disk->queue);
+ ub->dev_info.state = UBLK_S_DEV_LIVE;
+ ret = 0;
+ out_unlock:
+ mutex_unlock(&ub->mutex);
+ ublk_put_device(ub);
+ return ret;
+}
+
static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
unsigned int issue_flags)
{
@@ -2002,6 +2112,12 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
case UBLK_CMD_SET_PARAMS:
ret = ublk_ctrl_set_params(cmd);
break;
+ case UBLK_CMD_START_USER_RECOVERY:
+ ret = ublk_ctrl_start_recovery(cmd);
+ break;
+ case UBLK_CMD_END_USER_RECOVERY:
+ ret = ublk_ctrl_end_recovery(cmd);
+ break;
default:
break;
}
--
2.27.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
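The gating in ublk_ctrl_start_recovery() and the state transition in ublk_ctrl_end_recovery() can be sketched as a plain userspace C model; the struct and function names below are illustrative stand-ins for the driver's internals, not kernel API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative model of the device state the two ctrl-cmds inspect. */
enum dev_state { S_DEV_LIVE, S_DEV_QUIESCED, S_DEV_DEAD };

struct dev_model {
	int open;               /* stands in for UB_STATE_OPEN */
	enum dev_state state;   /* stands in for ub->dev_info.state */
	void *mm;               /* stands in for ub->mm */
	int nr_queues_ready;
};

/*
 * START_USER_RECOVERY: only allowed once the dying daemon has closed
 * /dev/ublkcX (open == 0) and quiesce_work has run (state QUIESCED).
 */
static int start_recovery(struct dev_model *d)
{
	if (d->open || d->state != S_DEV_QUIESCED)
		return -EBUSY;
	d->mm = NULL;           /* new daemon must mmap io_cmd_buf afresh */
	d->nr_queues_ready = 0;
	return 0;
}

/* END_USER_RECOVERY: flips a quiesced device back to LIVE. */
static int end_recovery(struct dev_model *d, int new_pid)
{
	if (d->state != S_DEV_QUIESCED)
		return -EBUSY;
	(void)new_pid;          /* would become dev_info.ublksrv_pid */
	d->state = S_DEV_LIVE;
	return 0;
}
```

As in the patch, sending START while the chardev is still open fails with -EBUSY, and END is rejected unless the device is quiesced.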
* Re: [PATCH V3 6/7] ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support
2022-09-13 4:17 ` [PATCH V3 6/7] ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support ZiyangZhang
@ 2022-09-19 13:03 ` Ming Lei
2022-09-20 2:41 ` Ziyang Zhang
0 siblings, 1 reply; 33+ messages in thread
From: Ming Lei @ 2022-09-19 13:03 UTC (permalink / raw)
To: ZiyangZhang; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On Tue, Sep 13, 2022 at 12:17:06PM +0800, ZiyangZhang wrote:
> START_USER_RECOVERY and END_USER_RECOVERY are two new control commands
> to support user recovery feature.
>
> After a crash, user should send START_USER_RECOVERY, it will:
> (1) check if (a)current ublk_device is UBLK_S_DEV_QUIESCED which was
> set by quiesce_work and (b)the file struct is released.
> (2) reinit all ubqs, including:
> (a) put the task_struct and reset ->ubq_daemon to NULL.
> (b) reset all ublk_io.
> (3) reset ub->mm to NULL.
>
> Then, user should start a new process and send FETCH_REQ on each
> ubq_daemon.
>
> Finally, user should send END_USER_RECOVERY, it will:
> (1) wait for all new ubq_daemons getting ready.
> (2) update ublksrv_pid
> (3) unquiesce the request queue and expect incoming ublk_queue_rq()
> (4) convert ub's state to UBLK_S_DEV_LIVE
>
> Note: we can handle STOP_DEV between START_USER_RECOVERY and
> END_USER_RECOVERY. This is helpful to users who cannot start new process
> after sending START_USER_RECOVERY ctrl-cmd.
>
> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
> ---
> drivers/block/ublk_drv.c | 116 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 116 insertions(+)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 4409a130d0b6..3a3af80ee938 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -1961,6 +1961,116 @@ static int ublk_ctrl_set_params(struct io_uring_cmd *cmd)
> return ret;
> }
>
> +static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
> +{
> + int i;
> +
> + WARN_ON_ONCE(!(ubq->ubq_daemon && ubq_daemon_is_dying(ubq)));
> + /* All old ioucmds have to be completed */
> + WARN_ON_ONCE(ubq->nr_io_ready);
> + pr_devel("%s: prepare for recovering qid %d\n", __func__, ubq->q_id);
> + /* old daemon is PF_EXITING, put it now */
> + put_task_struct(ubq->ubq_daemon);
> + /* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */
> + ubq->ubq_daemon = NULL;
> +
> + for (i = 0; i < ubq->q_depth; i++) {
> + struct ublk_io *io = &ubq->ios[i];
> +
> + /* forget everything now and be ready for new FETCH_REQ */
> + io->flags = 0;
> + io->cmd = NULL;
> + io->addr = 0;
> + }
> +}
> +
> +static int ublk_ctrl_start_recovery(struct io_uring_cmd *cmd)
> +{
> + struct ublksrv_ctrl_cmd *header = (struct ublksrv_ctrl_cmd *)cmd->cmd;
> + struct ublk_device *ub;
> + int ret = -EINVAL;
> + int i;
> +
> + ub = ublk_get_device_from_id(header->dev_id);
> + if (!ub)
> + return ret;
> +
> + mutex_lock(&ub->mutex);
> + if (!ublk_can_use_recovery(ub))
> + goto out_unlock;
> + /*
> + * START_RECOVERY is only allowed after:
> + *
> + * (1) UB_STATE_OPEN is not set, which means the dying process has exited
> + * and related io_uring ctx is freed so file struct of /dev/ublkcX is
> + * released.
> + *
> + * (2) UBLK_S_DEV_QUIESCED is set, which means the quiesce_work:
> + * (a) has quiesced the request queue
> + * (b) has requeued every inflight rq whose io_flags is ACTIVE
> + * (c) has requeued/aborted every inflight rq whose io_flags is NOT ACTIVE
> + * (d) has completed/canceled all ioucmds owned by the dying process
> + */
> + if (test_bit(UB_STATE_OPEN, &ub->state) ||
> + ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
> + ret = -EBUSY;
> + goto out_unlock;
> + }
> + pr_devel("%s: start recovery for dev id %d.\n", __func__, header->dev_id);
> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
> + ublk_queue_reinit(ub, ublk_get_queue(ub, i));
> + /* set to NULL, otherwise new ubq_daemon cannot mmap the io_cmd_buf */
> + ub->mm = NULL;
> + ub->nr_queues_ready = 0;
I am wondering why you don't move the above (queue reinit, clearing ub->mm)
into ublk_ch_release(); it looks cleaner to clear this stuff there.
Meanwhile, I guess one control command might be enough for user recovery.
Thanks,
Ming
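For reference, the reset that ublk_queue_reinit() performs — wherever it ends up living — amounts to the following; this is a hedged userspace model with made-up struct names mirroring the driver's per-queue state, not driver code:

```c
#include <assert.h>
#include <stddef.h>

#define Q_DEPTH 4

/* Illustrative mirror of struct ublk_io's per-slot state. */
struct io_slot {
	unsigned int flags;
	void *cmd;
	unsigned long long addr;
};

/* Illustrative mirror of the per-queue state touched by reinit. */
struct queue_model {
	void *daemon;           /* stands in for ubq->ubq_daemon */
	int nr_io_ready;
	struct io_slot ios[Q_DEPTH];
};

/*
 * Drop the dead daemon reference and forget all per-io state so the
 * device accepts a fresh FETCH_REQ on every slot from the new daemon.
 */
static void queue_reinit(struct queue_model *q)
{
	q->daemon = NULL;
	for (int i = 0; i < Q_DEPTH; i++) {
		q->ios[i].flags = 0;
		q->ios[i].cmd = NULL;
		q->ios[i].addr = 0;
	}
}
```

Whether this runs from the ctrl-cmd or from ublk_ch_release(), the invariant is the same: every slot must look as if no FETCH_REQ was ever issued.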
* Re: [PATCH V3 6/7] ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support
2022-09-19 13:03 ` Ming Lei
@ 2022-09-20 2:41 ` Ziyang Zhang
0 siblings, 0 replies; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-20 2:41 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
On 2022/9/19 21:03, Ming Lei wrote:
> On Tue, Sep 13, 2022 at 12:17:06PM +0800, ZiyangZhang wrote:
>> START_USER_RECOVERY and END_USER_RECOVERY are two new control commands
>> to support user recovery feature.
>>
>> After a crash, user should send START_USER_RECOVERY, it will:
>> (1) check if (a)current ublk_device is UBLK_S_DEV_QUIESCED which was
>> set by quiesce_work and (b)the file struct is released.
>> (2) reinit all ubqs, including:
>> (a) put the task_struct and reset ->ubq_daemon to NULL.
>> (b) reset all ublk_io.
>> (3) reset ub->mm to NULL.
>>
>> Then, user should start a new process and send FETCH_REQ on each
>> ubq_daemon.
>>
>> Finally, user should send END_USER_RECOVERY, it will:
>> (1) wait for all new ubq_daemons getting ready.
>> (2) update ublksrv_pid
>> (3) unquiesce the request queue and expect incoming ublk_queue_rq()
>> (4) convert ub's state to UBLK_S_DEV_LIVE
>>
>> Note: we can handle STOP_DEV between START_USER_RECOVERY and
>> END_USER_RECOVERY. This is helpful to users who cannot start new process
>> after sending START_USER_RECOVERY ctrl-cmd.
>>
>> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
>> ---
>> drivers/block/ublk_drv.c | 116 +++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 116 insertions(+)
>>
>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>> index 4409a130d0b6..3a3af80ee938 100644
>> --- a/drivers/block/ublk_drv.c
>> +++ b/drivers/block/ublk_drv.c
>> @@ -1961,6 +1961,116 @@ static int ublk_ctrl_set_params(struct io_uring_cmd *cmd)
>> return ret;
>> }
>>
>> +static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
>> +{
>> + int i;
>> +
>> + WARN_ON_ONCE(!(ubq->ubq_daemon && ubq_daemon_is_dying(ubq)));
>> + /* All old ioucmds have to be completed */
>> + WARN_ON_ONCE(ubq->nr_io_ready);
>> + pr_devel("%s: prepare for recovering qid %d\n", __func__, ubq->q_id);
>> + /* old daemon is PF_EXITING, put it now */
>> + put_task_struct(ubq->ubq_daemon);
>> + /* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */
>> + ubq->ubq_daemon = NULL;
>> +
>> + for (i = 0; i < ubq->q_depth; i++) {
>> + struct ublk_io *io = &ubq->ios[i];
>> +
>> + /* forget everything now and be ready for new FETCH_REQ */
>> + io->flags = 0;
>> + io->cmd = NULL;
>> + io->addr = 0;
>> + }
>> +}
>> +
>> +static int ublk_ctrl_start_recovery(struct io_uring_cmd *cmd)
>> +{
>> + struct ublksrv_ctrl_cmd *header = (struct ublksrv_ctrl_cmd *)cmd->cmd;
>> + struct ublk_device *ub;
>> + int ret = -EINVAL;
>> + int i;
>> +
>> + ub = ublk_get_device_from_id(header->dev_id);
>> + if (!ub)
>> + return ret;
>> +
>> + mutex_lock(&ub->mutex);
>> + if (!ublk_can_use_recovery(ub))
>> + goto out_unlock;
>> + /*
>> + * START_RECOVERY is only allowed after:
>> + *
>> + * (1) UB_STATE_OPEN is not set, which means the dying process has exited
>> + * and related io_uring ctx is freed so file struct of /dev/ublkcX is
>> + * released.
>> + *
>> + * (2) UBLK_S_DEV_QUIESCED is set, which means the quiesce_work:
>> + * (a) has quiesced the request queue
>> + * (b) has requeued every inflight rq whose io_flags is ACTIVE
>> + * (c) has requeued/aborted every inflight rq whose io_flags is NOT ACTIVE
>> + * (d) has completed/canceled all ioucmds owned by the dying process
>> + */
>> + if (test_bit(UB_STATE_OPEN, &ub->state) ||
>> + ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
>> + ret = -EBUSY;
>> + goto out_unlock;
>> + }
>> + pr_devel("%s: start recovery for dev id %d.\n", __func__, header->dev_id);
>> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
>> + ublk_queue_reinit(ub, ublk_get_queue(ub, i));
>> + /* set to NULL, otherwise new ubq_daemon cannot mmap the io_cmd_buf */
>> + ub->mm = NULL;
>> + ub->nr_queues_ready = 0;
>
> I am wondering why you don't move the above (queue reinit, clearing ub->mm)
> into ublk_ch_release(); it looks cleaner to clear this stuff there.
> Meanwhile, I guess one control command might be enough for user recovery.
OK. START_USER_RECOVERY just does cleanup for a new process, and
ublk_ch_release() does a similar thing since it is always called when the
chardev is released. Our new process must open the chardev only after it is
released, so (queue reinit, clearing ub->mm) can be done in ublk_ch_release().
Regards,
Zhang.
* [PATCH V3 7/7] ublk_drv: do not run monitor_work while ub's state is QUIESCED
2022-09-13 4:17 [PATCH V3 0/7] ublk_drv: add USER_RECOVERY support ZiyangZhang
` (5 preceding siblings ...)
2022-09-13 4:17 ` [PATCH V3 6/7] ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support ZiyangZhang
@ 2022-09-13 4:17 ` ZiyangZhang
2022-09-19 2:17 ` [PATCH V3 0/7] ublk_drv: add USER_RECOVERY support Ziyang Zhang
7 siblings, 0 replies; 33+ messages in thread
From: ZiyangZhang @ 2022-09-13 4:17 UTC (permalink / raw)
To: ming.lei
Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi, ZiyangZhang
START_USER_RECOVERY releases the task_struct of the ubq_daemon and resets
->ubq_daemon to NULL, so the check on ubq_daemon in monitor_work causes a
use-after-free (UAF).
Besides, monitor_work is not necessary in the QUIESCED state since we have
already scheduled quiesce_work and quiesced all ubqs.
Do not let monitor_work reschedule itself if the state is QUIESCED. Also,
cancel it in START_USER_RECOVERY and re-schedule it in END_USER_RECOVERY to
avoid the UAF.
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
---
drivers/block/ublk_drv.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 3a3af80ee938..044f9b4a0d08 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1143,10 +1143,13 @@ static void ublk_daemon_monitor_work(struct work_struct *work)
/*
* We can't schedule monitor work after ublk_remove() is started.
*
- * No need ub->mutex, monitor work are canceled after state is marked
- * as DEAD, so DEAD state is observed reliably.
+ * We can't schedule monitor work after ub is QUIESCED because
+ * ubq_daemon may be NULL during user recovery.
+ *
+ * No need for ub->mutex: monitor work is canceled once the state is no
+ * longer UBLK_S_DEV_LIVE, so the new state is observed reliably.
*/
- if (ub->dev_info.state != UBLK_S_DEV_DEAD)
+ if (ub->dev_info.state == UBLK_S_DEV_LIVE)
schedule_delayed_work(&ub->monitor_work,
UBLK_DAEMON_MONITOR_PERIOD);
}
@@ -2016,6 +2019,7 @@ static int ublk_ctrl_start_recovery(struct io_uring_cmd *cmd)
ret = -EBUSY;
goto out_unlock;
}
+ cancel_delayed_work_sync(&ub->monitor_work);
pr_devel("%s: start recovery for dev id %d.\n", __func__, header->dev_id);
for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
ublk_queue_reinit(ub, ublk_get_queue(ub, i));
@@ -2064,6 +2068,7 @@ static int ublk_ctrl_end_recovery(struct io_uring_cmd *cmd)
__func__, header->dev_id);
blk_mq_kick_requeue_list(ub->ub_disk->queue);
ub->dev_info.state = UBLK_S_DEV_LIVE;
+ schedule_delayed_work(&ub->monitor_work, UBLK_DAEMON_MONITOR_PERIOD);
ret = 0;
out_unlock:
mutex_unlock(&ub->mutex);
--
2.27.0
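The rescheduling rule this patch changes boils down to a single predicate; modeled below in standalone C, where the enum is a stand-in for the driver's UBLK_S_DEV_* states:

```c
#include <assert.h>

enum dev_state { S_DEV_LIVE, S_DEV_QUIESCED, S_DEV_DEAD };

/*
 * Before the patch, monitor_work re-armed itself whenever the state was
 * not DEAD — including QUIESCED, where ubq_daemon may already be NULL.
 */
static int resched_before(enum dev_state s)
{
	return s != S_DEV_DEAD;
}

/* After the patch it re-arms only while the device is LIVE. */
static int resched_after(enum dev_state s)
{
	return s == S_DEV_LIVE;
}
```

The two predicates differ exactly in the QUIESCED state, which is the window where the UAF could occur.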
* Re: [PATCH V3 0/7] ublk_drv: add USER_RECOVERY support
2022-09-13 4:17 [PATCH V3 0/7] ublk_drv: add USER_RECOVERY support ZiyangZhang
` (6 preceding siblings ...)
2022-09-13 4:17 ` [PATCH V3 7/7] ublk_drv: do not run monitor_work while ub's state is QUIESCED ZiyangZhang
@ 2022-09-19 2:17 ` Ziyang Zhang
7 siblings, 0 replies; 33+ messages in thread
From: Ziyang Zhang @ 2022-09-19 2:17 UTC (permalink / raw)
To: ming.lei; +Cc: axboe, xiaoguang.wang, linux-block, linux-kernel, joseph.qi
Ping...