From: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
To: ming.lei@redhat.com, axboe@kernel.dk
Cc: xiaoguang.wang@linux.alibaba.com, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, joseph.qi@linux.alibaba.com,
ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Subject: [RFC PATCH 0/9] ublk_drv: add USER_RECOVERY support
Date: Wed, 24 Aug 2022 13:47:35 +0800 [thread overview]
Message-ID: <20220824054744.77812-1-ZiyangZhang@linux.alibaba.com> (raw)
ublk_drv is a driver simply passes all blk-mq rqs to ublksrv[1] in
userspace. For each ublk queue, there is one ubq_daemon(pthread).
All ubq_daemons share the same process which opens /dev/ublkcX.
The ubq_daemon code infinitely loops on io_uring_enter() to
send/receive io_uring cmds which pass information of blk-mq
rqs.
Now, if one ubq_daemon(pthread) or the process crashes, ublk_drv
must abort the dying ubq, stop the device and release everything.
This is not a good choice in practice because users do not expect
aborted requests, I/O errors and a released device. They may want
a recovery machenism so that no requests are aborted and no I/O
error occurs. Anyway, users just want everything works as uaual.
This RFC patchset implements USER_RECOVERY support. If the process
crashes, we allow ublksrv to provide new process and ubq_daemons.
We do not support single ubq_daemon(pthread) recovery because a
pthread rarely crashes.
Recovery feature is quite useful for products do not expect to
return any I/O error to frontend users. In detail, we support
this scenario:
(1) The /dev/ublkc0 is opened by process 0;
(2) Fio is running on /dev/ublkb0 exposed by ublk_drv and all
rqs are handled by process 0.
(3) Process 0 suddenly crashes(e.g. segfault);
(4) Fio is still running and submit IOs(but these IOs cannot
complete now)
(5) User recovers with process 1 and attach it to /dev/ublkc0
(6) All rqs are handled by process 1 now and IOs can be
completed now.
Note: The backend must tolerate double-write because we re-issue
a rq sent to the old(dying) process before. We allow users to
choose whether re-issue these rqs or not, please see patch 7 for
more detail.
We provide a sample script here to simulate the above steps:
***************************script***************************
LOOPS=10
__ublk_get_pid() {
pid=`./ublk list -n 0 | grep "pid" | awk '{print $7}'`
echo $pid
}
ublk_recover_kill()
{
for CNT in `seq $LOOPS`; do
dmesg -C
pid=`__ublk_get_pid`
echo -e "*** kill $pid now ***"
kill -9 $pid
sleep 2
echo -e "*** recover now ***"
./ublk recover -n 0
sleep 4
done
}
ublk_test()
{
dmesg -C
echo -e "*** add ublk device ***"
./ublk add -t null -d 4 -i 1
sleep 2
echo -e "*** start fio ***"
fio --bs=4k \
--filename=/dev/ublkb0 \
--runtime=100s \
--rw=read &
sleep 4
ublk_recover_kill
wait
echo -e "*** delete ublk device ***"
./ublk del -n 0
}
for CNT in `seq 4`; do
modprobe -rv ublk_drv
modprobe ublk_drv
echo -e "************ round $CNT ************"
ublk_test
sleep 5
done
***************************script***************************
You may run it with our modified ublksrv[3] which supports
recovey feature. No I/O error occurs and you can verify it
by typing
$ perf-tools/bin/tpoint block:block_rq_error
The basic idea of USER_RECOVERY is quite straightfoward:
(1) release/free everything belongs to the dying process.
Note: Since ublk_drv does save information about user process,
this work is important because we don't expect any resource
lekage. Particularly, ioucmds from the dying ubq_daemons
need to be completed(freed). Current ublk_drv code cannot satisfy
our need while considering USER_RECOVERY. So we refactor some code
shown in patch 1-5 to gracefully free all ioucmds.
(2) init ublk queues including requeuing/aborting rqs.
(3) allow new ubq_daemons issue FETCH_REQ.
Here is steps to reocver:
(1) For a user, after a process crash(how he detect a crash is not related
to this patchset), he sends START_USER_RECOVERY ctrl-cmd to
/dev/ublk-control with a dev_id X (such as 3 for /dev/ublkc3).
(2) Then ublk_drv should perpare for a new process to attach /dev/ublkcX.
We have described this before. The driver must quiesce the request
queue to ban any incoming ublk_queue_rq().
(3) Then, user should start a new process and ubq_daemons(pthreads) and
send FETCH_REQ by io_uring_enter() to make all ubqs be ready. The
user must correctly setup queues, flags and so on(how to persist
ublksrv information is not related to this patchset).
(4) The user sends END_USER_RECOVERY ctrl-cmd to /dev/ublk-control with a
dev_id X.
(5) ublk_drv waits for all ubq_daemons getting ready. Then it unquiesces
request queue and new rqs are allowed.
After applying refactor patches(patch 1-5), with current ublksrv[1], all
tests[2] passes. Note that refactor patches DO NOT involve any recovery
feature.
After applying all patches(patch 1-9), you should use ublksrv[3] and
tests[4] provided by us. We add 2 additional tests to verify that
recovery feature works. our code will be PR-ed to Ming's repo soon.
[1] https://github.com/ming1/ubdsrv
[2] https://github.com/ming1/ubdsrv/tree/master/tests
[3] https://github.com/old-memories/ubdsrv/tree/recovery-v1
[4] https://github.com/old-memories/ubdsrv/tree/recovery-v1/tests/generic
ZiyangZhang (9):
ublk_drv: check 'current' instead of 'ubq_daemon'
ublk_drv: refactor ublk_cancel_queue()
ublk_drv: add a helper to get ioucmd from pdu
ublk_drv: refactor __ublk_rq_task_work() and aborting machenism
ublk_drv: refactor ublk_stop_dev()
ublk_drv: add pr_devel() to prepare for recovery feature
ublk_drv: define macros for recovery feature and check them
ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support
ublk_drv: do not schedule monitor_work with recovery feature enabled
drivers/block/ublk_drv.c | 517 ++++++++++++++++++++++++++--------
include/uapi/linux/ublk_cmd.h | 7 +
2 files changed, 408 insertions(+), 116 deletions(-)
--
2.27.0
next reply other threads:[~2022-08-24 5:48 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-24 5:47 ZiyangZhang [this message]
2022-08-24 5:47 ` [RFC PATCH 1/9] ublk_drv: check 'current' instead of 'ubq_daemon' ZiyangZhang
2022-08-29 2:13 ` Ming Lei
2022-08-24 5:47 ` [RFC PATCH 2/9] ublk_drv: refactor ublk_cancel_queue() ZiyangZhang
2022-08-29 3:01 ` Ming Lei
2022-08-29 4:50 ` Ziyang Zhang
2022-08-24 5:47 ` [RFC PATCH 3/9] ublk_drv: add a helper to get ioucmd from pdu ZiyangZhang
2022-08-29 3:06 ` Ming Lei
2022-08-29 4:59 ` Ziyang Zhang
2022-08-24 5:47 ` [RFC PATCH 4/9] ublk_drv: refactor __ublk_rq_task_work() and aborting machenism ZiyangZhang
2022-08-29 5:40 ` Ming Lei
2022-08-29 6:13 ` Ziyang Zhang
2022-08-29 8:11 ` Ming Lei
2022-08-29 9:09 ` Ziyang Zhang
2022-08-29 10:02 ` Ming Lei
2022-08-24 5:47 ` [RFC PATCH 5/9] ublk_drv: refactor ublk_stop_dev() ZiyangZhang
2022-08-24 5:47 ` [RFC PATCH 6/9] ublk_drv: add pr_devel() to prepare for recovery feature ZiyangZhang
2022-08-24 5:47 ` [RFC PATCH 7/9] ublk_drv: define macros for recovery feature and check them ZiyangZhang
2022-08-24 5:47 ` [RFC PATCH 8/9] ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support ZiyangZhang
2022-08-24 5:47 ` [RFC PATCH 9/9] ublk_drv: do not schedule monitor_work with recovery feature enabled ZiyangZhang
2022-08-29 2:08 ` [RFC PATCH 0/9] ublk_drv: add USER_RECOVERY support Ming Lei
2022-08-29 4:00 ` Ziyang Zhang
2022-08-29 5:56 ` Ming Lei
2022-08-29 7:29 ` Ziyang Zhang
2022-08-29 8:38 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220824054744.77812-1-ZiyangZhang@linux.alibaba.com \
--to=ziyangzhang@linux.alibaba.com \
--cc=axboe@kernel.dk \
--cc=joseph.qi@linux.alibaba.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=xiaoguang.wang@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).