* [RFC PATCH V2 0/6] ublk_drv: add USER_RECOVERY support
From: ZiyangZhang @ 2022-08-31 15:51 UTC
  To: ming.lei, axboe
  Cc: xiaoguang.wang, linux-block, linux-kernel, joseph.qi, ZiyangZhang

ublk_drv is a driver that simply passes all blk-mq rqs to ublksrv[1] in
userspace. For each ublk queue, there is one ubq_daemon(pthread).
All ubq_daemons share the same process which opens /dev/ublkcX.
The ubq_daemon code infinitely loops on io_uring_enter() to
send/receive io_uring cmds which pass information of blk-mq
rqs.

Since the real IO handler (the process opening /dev/ublkcX) is in
userspace, it could crash if:
(1) the user kills it with kill -9 because of an IO hang on the
    backend, a system reboot, etc.
(2) the process hits an exception (segfault, divide error, OOM...)
Therefore, the kernel driver has to deal with a dying process.

Currently, if one ubq_daemon (pthread) or the process crashes, ublk_drv
must abort the dying ubq, stop the device and release everything.
This is not a good choice in practice because users do not expect
aborted requests, I/O errors and a released device. They may want
a recovery mechanism so that no requests are aborted and no I/O
error occurs. In short, users just want everything to work as usual.

This RFC patchset implements USER_RECOVERY support. If the process
crashes, we allow ublksrv to provide a new process and new ubq_daemons.
We do not support recovery of a single ubq_daemon (pthread) because a
pthread rarely crashes on its own.

Note: The responsibility for recovery belongs to the user who opens
/dev/ublkcX. After a process crash, the kernel driver only switches
the device's status to be ready for recovery or termination (STOP_DEV).
This patchset does not cover how to detect such a process crash in
userspace. A very straightforward idea may be adding a watchdog.
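
As an illustration of the watchdog idea, here is a minimal sketch in
shell. It is not part of this patchset or of ublksrv: the function
names daemon_alive and watch_and_recover are hypothetical, and polling
with "kill -0" is only one possible way to notice that the process has
gone away (it also reports failure if the caller lacks permission to
signal the pid).

```bash
#!/usr/bin/env bash
# Hypothetical watchdog sketch; names are illustrative only.

# Return 0 if a process with the given pid exists and can be signalled.
daemon_alive() {
	kill -0 "$1" 2>/dev/null
}

# Poll the process every INTERVAL seconds. Once it is gone, run the
# recovery command once and return that command's exit status.
# usage: watch_and_recover PID INTERVAL RECOVER_CMD [ARGS...]
watch_and_recover() {
	local pid=$1 interval=$2
	shift 2
	while daemon_alive "$pid"; do
		sleep "$interval"
	done
	"$@"
}
```

With the modified ublksrv used by the sample script in this letter, it
could be invoked as
    $ watch_and_recover "$pid" 1 ./ublk recover -n 0
The sample script sleeps a few seconds between the kill and the
recover command, so a real watchdog may need a similar delay before
recovering.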

The recovery feature is quite useful for real products. In detail,
we support this scenario:
(1) /dev/ublkc0 is opened by process 0.
(2) Fio is running on /dev/ublkb0 exposed by ublk_drv and all
    rqs are handled by process 0.
(3) Process 0 suddenly crashes (e.g. segfault).
(4) Fio is still running and submits IOs (but these IOs cannot
    complete now).
(5) The user recovers with process 1 and attaches it to /dev/ublkc0.
(6) All rqs are handled by process 1 now and IOs can be
    completed again.

Note: The backend must tolerate double-write because we re-issue
a rq that was already sent to the old (dying) process. We allow users
to choose whether to re-issue these rqs or not, please see patch 7 for
more detail.
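
To make "tolerate double-write" concrete, here is a small sketch under
the assumption of a file-backed target that applies each write as a
positional overwrite. The helpers write_block and backend_sum are
hypothetical and not part of ublksrv. Applying the same 4 KiB write to
the same block twice leaves the file identical to applying it once,
which is the property a re-issued rq relies on. A backend that appended
every write instead would store the data twice.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names and the 4 KiB block size are assumptions.

# Write the first 4 KiB of DATA_FILE into FILE at block index BLOCK,
# without truncating the rest of FILE.
# usage: write_block FILE BLOCK DATA_FILE
write_block() {
	dd if="$3" of="$1" bs=4096 seek="$2" count=1 conv=notrunc 2>/dev/null
}

# Print a checksum of FILE's contents, to compare backend states.
backend_sum() {
	cksum < "$1"
}
```

Calling write_block twice with the same arguments and comparing
backend_sum before and after the second call should show no change.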

We provide a sample script here to simulate the above steps:

***************************script***************************
LOOPS=10

# Print the pid of the ublksrv process serving device 0.
__ublk_get_pid() {
	./ublk list -n 0 | grep "pid" | awk '{print $7}'
}

# Kill the ublksrv process and recover the device, $LOOPS times.
ublk_recover_kill()
{
	for CNT in $(seq $LOOPS); do
		dmesg -C
		pid=$(__ublk_get_pid)
		echo -e "*** kill $pid now ***"
		kill -9 $pid
		sleep 6
		echo -e "*** recover now ***"
		./ublk recover -n 0
		sleep 6
	done
}

# Add a null-target device, run fio on it and crash/recover meanwhile.
ublk_test()
{
	dmesg -C
	echo -e "*** add ublk device ***"
	./ublk add -t null -d 4 -i 1
	sleep 2
	echo -e "*** start fio ***"
	fio --bs=4k \
	    --filename=/dev/ublkb0 \
	    --runtime=140s \
	    --rw=read &
	sleep 4
	ublk_recover_kill
	# wait for the background fio job to finish
	wait
	echo -e "*** delete ublk device ***"
	./ublk del -n 0
}

for CNT in $(seq 4); do
	modprobe -rv ublk_drv
	modprobe ublk_drv
	echo -e "************ round $CNT ************"
	ublk_test
	sleep 5
done
***************************script***************************

You may run it with our modified ublksrv[2], which supports the
recovery feature. No I/O error occurs and you can verify it
by typing
    $ perf-tools/bin/tpoint block:block_rq_error

The basic idea of USER_RECOVERY is quite straightforward:

(1) release/free everything that belongs to the dying process.

    Note: Since ublk_drv does save information about the user process,
    this work is important because we don't expect any resource
    leakage. Particularly, ioucmds from the dying ubq_daemons
    need to be completed (freed).

(2) init ublk queues, including requeuing/aborting rqs.

(3) allow new ubq_daemons to issue FETCH_REQ.

Here are the steps to recover:

(1) The monitor_work detects a crash. It should requeue/abort inflight
    rqs, complete old ioucmds and quiesce the request queue to block any
    incoming ublk_queue_rq(). The ublk device is then ready for a
    recovery/stop procedure.

(2) After a process crash, the user sends the START_USER_RECOVERY
    ctrl-cmd to /dev/ublk-control with a dev_id X (such as 3 for
    /dev/ublkc3).

(3) ublk_drv then prepares for a new process to attach to /dev/ublkcX.
    All ublk_io structures are cleared and ubq_daemons are reset.

(4) The user starts a new process with its ubq_daemons (pthreads) and
    sends FETCH_REQ by io_uring_enter() to make all ubqs ready. The
    user must correctly set up queues, flags and so on (how to persist
    the user's information is outside the scope of this patchset).

(5) The user sends the END_USER_RECOVERY ctrl-cmd to /dev/ublk-control
    with a dev_id X.

(6) ublk_drv waits for all ubq_daemons to get ready. Then it unquiesces
    the request queue and new rqs are allowed.
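
The recovery steps above can be read as a small state machine. The
sketch below models it in shell purely for illustration. The state
names LIVE, QUIESCED, RECOVERING and DEAD are hypothetical labels and
are not taken from the driver. Waiting for all ubq_daemons to be ready
is folded into the END_USER_RECOVERY transition. Any state/event pair
not listed is simply not modelled here and is rejected.

```bash
#!/usr/bin/env bash
# Illustrative model only; state names are assumptions, not driver code.

# Print the next device state for STATE and EVENT, or return 1 if the
# pair is not part of this model.
# usage: next_state STATE EVENT
next_state() {
	case "$1:$2" in
	LIVE:crash)                      echo QUIESCED ;;   # monitor_work quiesces the queue
	QUIESCED:START_USER_RECOVERY)    echo RECOVERING ;; # ublk_io cleared, daemons reset
	RECOVERING:END_USER_RECOVERY)    echo LIVE ;;       # all ubqs ready, queue unquiesced
	LIVE:STOP_DEV|QUIESCED:STOP_DEV) echo DEAD ;;       # termination instead of recovery
	*)                               return 1 ;;
	esac
}
```

For example, feeding the events crash, START_USER_RECOVERY and
END_USER_RECOVERY in order, starting from LIVE, ends in LIVE again,
while END_USER_RECOVERY on a LIVE device is rejected.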

You should use the ublksrv[2] and tests[3] provided by us. We add 2
additional tests to verify that the recovery feature works. Our code
will be submitted as a PR to Ming's repo soon.

[1] https://github.com/ming1/ubdsrv
[2] https://github.com/old-memories/ubdsrv/tree/recovery-v1
[3] https://github.com/old-memories/ubdsrv/tree/recovery-v1/tests/generic

Since V1:
(1) refactor cover letter. Add introduction on "how to detect a crash" and
    "why we need recovery feature".
(2) do not refactor task_work and ublk_queue_rq().
(3) allow users to freely stop/recover the device.
(4) add comment on ublk_cancel_queue().
(5) refactor monitor_work and aborting mechanism since we add recovery
    mechanism in monitor_work.

ZiyangZhang (6):
  ublk_drv: check 'current' instead of 'ubq_daemon'
  ublk_drv: refactor ublk_cancel_queue()
  ublk_drv: define macros for recovery feature and check them
  ublk_drv: requeue rqs with recovery feature enabled
  ublk_drv: consider recovery feature in aborting mechanism
  ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support

 drivers/block/ublk_drv.c      | 439 +++++++++++++++++++++++++++++++---
 include/uapi/linux/ublk_cmd.h |   7 +
 2 files changed, 419 insertions(+), 27 deletions(-)

-- 
2.27.0

