linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Cc: axboe@kernel.dk, xiaoguang.wang@linux.alibaba.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	joseph.qi@linux.alibaba.com, ming.lei@redhat.com
Subject: Re: [RFC PATCH 0/9] ublk_drv: add USER_RECOVERY support
Date: Mon, 29 Aug 2022 10:08:11 +0800	[thread overview]
Message-ID: <Ywwfi7Dgi0JC2kQ/@T590> (raw)
In-Reply-To: <20220824054744.77812-1-ZiyangZhang@linux.alibaba.com>

On Wed, Aug 24, 2022 at 01:47:35PM +0800, ZiyangZhang wrote:
> ublk_drv is a driver simply passes all blk-mq rqs to ublksrv[1] in
> userspace. For each ublk queue, there is one ubq_daemon(pthread).
> All ubq_daemons share the same process which opens /dev/ublkcX.
> The ubq_daemon code infinitely loops on io_uring_enter() to
> send/receive io_uring cmds which pass information of blk-mq
> rqs.
> 
> Now, if one ubq_daemon(pthread) or the process crashes, ublk_drv
> must abort the dying ubq, stop the device and release everything.
> This is not a good choice in practice because users do not expect
> aborted requests, I/O errors and a released device. They may want
> a recovery machenism so that no requests are aborted and no I/O
> error occurs. Anyway, users just want everything works as uaual.

I understand the requirement is that both /dev/ublkbN and /dev/ublkcN
won't be deleted & re-added from user viewpoint after user recovery,
so the device context won't be lost.

> 
> This RFC patchset implements USER_RECOVERY support. If the process
> crashes, we allow ublksrv to provide new process and ubq_daemons.
> We do not support single ubq_daemon(pthread) recovery because a
> pthread rarely crashes.
> 
> Recovery feature is quite useful for products do not expect to
> return any I/O error to frontend users.

That looks one very ideal requirement. To be honest, no any block driver
can guarantee that 100%, so it is just one soft requirement?

Cause memory allocation may fail, network may be disconnected,
re-creating pthread or process may fail too, ...

> In detail, we support
> this scenario:
> (1) The /dev/ublkc0 is opened by process 0;
> (2) Fio is running on /dev/ublkb0 exposed by ublk_drv and all
>     rqs are handled by process 0.
> (3) Process 0 suddenly crashes(e.g. segfault);
> (4) Fio is still running and submit IOs(but these IOs cannot
>     complete now)
> (5) User recovers with process 1 and attach it to /dev/ublkc0
> (6) All rqs are handled by process 1 now and IOs can be
>     completed now.
> 
> Note: The backend must tolerate double-write because we re-issue
> a rq sent to the old(dying) process before. We allow users to
> choose whether re-issue these rqs or not, please see patch 7 for
> more detail.
> 
> We provide a sample script here to simulate the above steps:
> 
> ***************************script***************************
> LOOPS=10
> 
> __ublk_get_pid() {
> 	pid=`./ublk list -n 0 | grep "pid" | awk '{print $7}'`
> 	echo $pid
> }
> 
> ublk_recover_kill()
> {
> 	for CNT in `seq $LOOPS`; do
> 		dmesg -C
>                 pid=`__ublk_get_pid`
>                 echo -e "*** kill $pid now ***"
> 		kill -9 $pid
> 		sleep 2
>                 echo -e "*** recover now ***"
>                 ./ublk recover -n 0

The current behavior is that /dev/ublkb* is removed after device is
aborted because ubq daemon is killed.

What if 'ublk recover' command isn't sent? So the current behavior
without recovery is changed? Or just changed with this feature enabled?

BTW, I do not mean the change isn't reasonable, but suggest to document
the user visible change, so it can get reviewed from either user
viewpoint and technical point.

> 		sleep 4
> 	done
> }
> 
> ublk_test()
> {
>         dmesg -C
>         echo -e "*** add ublk device ***"
>         ./ublk add -t null -d 4 -i 1
>         sleep 2
>         echo -e "*** start fio ***"
>         fio --bs=4k \
>             --filename=/dev/ublkb0 \
>             --runtime=100s \
>             --rw=read &
>         sleep 4
>         ublk_recover_kill
>         wait
>         echo -e "*** delete ublk device ***"
>         ./ublk del -n 0
> }
> 
> for CNT in `seq 4`; do
>         modprobe -rv ublk_drv
>         modprobe ublk_drv
>         echo -e "************ round $CNT ************"
>         ublk_test
>         sleep 5
> done
> ***************************script***************************
> 
> You may run it with our modified ublksrv[3] which supports
> recovey feature. No I/O error occurs and you can verify it
> by typing
>     $ perf-tools/bin/tpoint block:block_rq_error
> 
> The basic idea of USER_RECOVERY is quite straightfoward:
> 
> (1) release/free everything belongs to the dying process.
> 
>     Note: Since ublk_drv does save information about user process,
>     this work is important because we don't expect any resource
>     lekage. Particularly, ioucmds from the dying ubq_daemons
>     need to be completed(freed). Current ublk_drv code cannot satisfy
>     our need while considering USER_RECOVERY. So we refactor some code
>     shown in patch 1-5 to gracefully free all ioucmds.
> 
> (2) init ublk queues including requeuing/aborting rqs.
> 
> (3) allow new ubq_daemons issue FETCH_REQ.
> 
> Here is steps to reocver:
> 
> (1) For a user, after a process crash(how he detect a crash is not related
>     to this patchset), he sends START_USER_RECOVERY ctrl-cmd to

I'd suggest to describe crash detector a bit at least, as one whole use case,
crash detector should be the input of the use case of user recovery, which is
usually one part of use case when modeling software requirement/design.

Such as, crash is detected after the ubq daemon pthread/process is crashed?
Will you consider io hang in the daemon pthread/process? IMO, long-term,
the crash detector utility should be part of ublksrv.

We don't implement ublk driver's IO timeout yet, but that implementation may be
related with this recovery feature closely, such as, one simple approach is to
kill ubq-daemon if we can't move on after retrying several times, then
let userspace detect & recovery.


Thanks,
Ming


  parent reply	other threads:[~2022-08-29  2:09 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-24  5:47 [RFC PATCH 0/9] ublk_drv: add USER_RECOVERY support ZiyangZhang
2022-08-24  5:47 ` [RFC PATCH 1/9] ublk_drv: check 'current' instead of 'ubq_daemon' ZiyangZhang
2022-08-29  2:13   ` Ming Lei
2022-08-24  5:47 ` [RFC PATCH 2/9] ublk_drv: refactor ublk_cancel_queue() ZiyangZhang
2022-08-29  3:01   ` Ming Lei
2022-08-29  4:50     ` Ziyang Zhang
2022-08-24  5:47 ` [RFC PATCH 3/9] ublk_drv: add a helper to get ioucmd from pdu ZiyangZhang
2022-08-29  3:06   ` Ming Lei
2022-08-29  4:59     ` Ziyang Zhang
2022-08-24  5:47 ` [RFC PATCH 4/9] ublk_drv: refactor __ublk_rq_task_work() and aborting machenism ZiyangZhang
2022-08-29  5:40   ` Ming Lei
2022-08-29  6:13     ` Ziyang Zhang
2022-08-29  8:11       ` Ming Lei
2022-08-29  9:09         ` Ziyang Zhang
2022-08-29 10:02           ` Ming Lei
2022-08-24  5:47 ` [RFC PATCH 5/9] ublk_drv: refactor ublk_stop_dev() ZiyangZhang
2022-08-24  5:47 ` [RFC PATCH 6/9] ublk_drv: add pr_devel() to prepare for recovery feature ZiyangZhang
2022-08-24  5:47 ` [RFC PATCH 7/9] ublk_drv: define macros for recovery feature and check them ZiyangZhang
2022-08-24  5:47 ` [RFC PATCH 8/9] ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support ZiyangZhang
2022-08-24  5:47 ` [RFC PATCH 9/9] ublk_drv: do not schedule monitor_work with recovery feature enabled ZiyangZhang
2022-08-29  2:08 ` Ming Lei [this message]
2022-08-29  4:00   ` [RFC PATCH 0/9] ublk_drv: add USER_RECOVERY support Ziyang Zhang
2022-08-29  5:56     ` Ming Lei
2022-08-29  7:29       ` Ziyang Zhang
2022-08-29  8:38         ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ywwfi7Dgi0JC2kQ/@T590 \
    --to=ming.lei@redhat.com \
    --cc=ZiyangZhang@linux.alibaba.com \
    --cc=axboe@kernel.dk \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=xiaoguang.wang@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).