From: Ming Lei <ming.lei@redhat.com>
To: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	io-uring@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	Gabriel Krisman Bertazi <krisman@collabora.com>,
	Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Subject: Re: [PATCH V5 1/2] ublk_drv: add io_uring based userspace block driver
Date: Fri, 15 Jul 2022 10:04:57 +0800
Message-ID: <YtDLSdwbV/utn0Qv@T590>
In-Reply-To: <fe9508ae-f12a-2216-1160-145308d746f5@linux.alibaba.com>

On Thu, Jul 14, 2022 at 09:23:40PM +0800, Ziyang Zhang wrote:
> On 2022/7/14 18:48, Ming Lei wrote:
> > On Thu, Jul 14, 2022 at 06:20:38PM +0800, Ziyang Zhang wrote:
> >> On 2022/7/13 22:07, Ming Lei wrote:
> >>> This is the driver part of the userspace block driver (ublk driver); the
> >>> other part is the userspace daemon (ublksrv) [1].
> >>>
> >>> The two parts communicate via io_uring's IORING_OP_URING_CMD, with one
> >>> shared cmd buffer for storing io commands. The buffer is read-only for
> >>> ublksrv and written by the ublk driver, and each io command in it is
> >>> indexed directly by its io request tag.
> >>>
> >>> For example, when one READ io request is submitted to the ublk block
> >>> device, the ublk driver first stores the io command into the cmd buffer,
> >>> then completes one IORING_OP_URING_CMD to notify ublksrv. That URING_CMD
> >>> was issued to the ublk driver beforehand by ublksrv in order to get
> >>> notified of new io requests, and each URING_CMD is associated with one
> >>> io request by its tag.
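
To make the tag indexing concrete: looking up an io command in the shared
cmd buffer is basically a single array access per tag. A minimal sketch
(ublksrv_io_desc is the per-tag descriptor; see the patch for the exact
code):

	/* sketch: one fixed-size descriptor per request tag in the
	 * shared, mmap'ed cmd buffer -- read-only for ublksrv and
	 * written by the ublk driver
	 */
	static inline struct ublksrv_io_desc *
	ublk_get_iod(struct ublk_queue *ubq, int tag)
	{
		return (struct ublksrv_io_desc *)
			&(ubq->io_cmd_buf[tag * sizeof(struct ublksrv_io_desc)]);
	}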
> >>>
> >>> After ublksrv gets the io command, it translates and handles the ublk io
> >>> request; for the ublk-loop target, for instance, ublksrv translates the
> >>> request into the same request on another file or disk, just like the
> >>> kernel loop block driver. In ublksrv's implementation, the io is still
> >>> handled by io_uring and shares the same ring with the IORING_OP_URING_CMD
> >>> commands. When the target io request is done, the same IORING_OP_URING_CMD
> >>> is issued to the ublk driver both to commit the io request's result and
> >>> to get notified of the next io request.
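
On the daemon side, committing the result and fetching the next request
for a tag is a single sqe. A rough sketch (field names are from the ublk
UAPI; ring, dev_fd, q_id, tag, res and io_buf stand for whatever the
daemon tracks per queue/tag):

	/* sketch: commit the completed io's result and re-arm this tag
	 * with one IORING_OP_URING_CMD sqe
	 */
	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	struct ublksrv_io_cmd *cmd = (struct ublksrv_io_cmd *)&sqe->cmd;

	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd     = dev_fd;		/* /dev/ublkcN char device */
	sqe->cmd_op = UBLK_IO_COMMIT_AND_FETCH_REQ;
	cmd->q_id   = q_id;
	cmd->tag    = tag;
	cmd->result = res;		/* bytes handled, or -errno */
	cmd->addr   = (__u64)(uintptr_t)io_buf;	/* per-tag io buffer */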
> >>>
> >>> Another thing done by the ublk driver is to copy data between the kernel
> >>> io request and ublksrv's io buffer (see the sketch after this list):
> >>>
> >>> 1) before ublksrv handles a WRITE request, copy the request's data into
> >>> ublksrv's userspace io buffer, so that ublksrv can handle the write
> >>> request
> >>>
> >>> 2) after ublksrv handles a READ request, copy ublksrv's userspace io
> >>> buffer into the READ request, then the ublk driver can complete the READ
> >>> request
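
The copy direction is keyed off the request op only; a sketch (here
ublk_copy_pages() and the direction flags are stand-in names, not
necessarily the patch's helpers):

	/* sketch: bounce data between the blk-mq request pages and the
	 * daemon's per-tag userspace buffer
	 */
	if (req_op(req) == REQ_OP_WRITE)
		/* before dispatching to ublksrv: request -> user buffer */
		ublk_copy_pages(req, io->addr, COPY_TO_DAEMON);
	else if (req_op(req) == REQ_OP_READ)
		/* when committing the result: user buffer -> request */
		ublk_copy_pages(req, io->addr, COPY_FROM_DAEMON);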
> >>>
> >>> We may switch to zero copy once mm is ready to support it.
> >>>
> >>> The ublk driver doesn't handle any logic of the specific userspace
> >>> driver, so it stays small and simple.
> >>>
> >>> [1] ublksrv
> >>>
> >>> https://github.com/ming1/ubdsrv
> >>>
> >>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> >>> ---
> >>
> >>
> >> Hi, Ming
> >>
> >> I find that a big change from v4 to v5 is the simplification of locking.
> >>
> >> In v5 you removed ubq->abort_lock; I want to ask why it is OK to remove it.
> > 
> > Actually V4 and the previous versions dealt with the issue in an
> > overly complicated way.
> > 
> >>
> >> If you have time, could you explain how ublk deals with the potential
> >> races among: 1) queue_rq, 2) ublk_abort_queue, 3) ublk_ctrl_stop_dev and
> >> 4) ublk_rq_task_work? (The locking in ublk really confuses me...)
> > 
> > One big change is the following code:
> > 
> > __ublk_rq_task_work():
> > 	bool task_exiting = current != ubq->ubq_daemon ||
> > 		(current->flags & PF_EXITING);
> > 	...
> > 	if (unlikely(task_exiting)) {
> > 		blk_mq_end_request(req, BLK_STS_IOERR);
> > 		mod_delayed_work(system_wq, &ub->monitor_work, 0);
> > 		return;
> > 	}
> > 
> > Abort is always started after PF_EXITING is set, and once PF_EXITING is
> > set, __ublk_rq_task_work() fails the request immediately, so io->flags
> > won't be touched and there is no race with abort. Also, PF_EXITING is a
> > per-task flag that can only be set before __ublk_rq_task_work() is
> > called, and setting it is actually serialized with running the task
> > work function.
> > 
> > In ublk_queue_rq() we don't touch io->flags, so there is no race with
> > abort there either.
> > 
> > Wrt. ublk_ctrl_stop_dev(), it isn't directly related to abort: if
> > del_gendisk() waits for inflight IO, the abort work will be started to
> > make forward progress. After del_gendisk() returns, there can't be any
> > inflight io any more, so it is safe to cancel the remaining pending io
> > commands.
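
The stop path is roughly the following (a simplified sketch of
ublk_stop_dev(); error handling trimmed):

	static void ublk_stop_dev(struct ublk_device *ub)
	{
		mutex_lock(&ub->mutex);
		if (ub->dev_info.state == UBLK_S_DEV_LIVE) {
			/* waits for inflight requests; if the daemon is
			 * dying, monitor_work aborts them so this call
			 * can make forward progress
			 */
			del_gendisk(ub->ub_disk);
			ub->dev_info.state = UBLK_S_DEV_DEAD;
		}
		mutex_unlock(&ub->mutex);

		/* 2nd stage: no inflight io can exist now, so any
		 * still-queued io command can be safely completed
		 */
		ublk_cancel_dev(ub);
	}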
> > 
> 
> Thanks, Ming. I understand the aborting code now. And it looks good to me.
> 
> Previously I thought monitor_work and task_work might be scheduled at
> the same time while the task is exiting, so that blk_mq_end_request()
> could be called twice on the same tag.
> 
> But I find there is a check on ublk_io's flags (UBLK_IO_FLAG_ACTIVE) in
> the abort path run from ublk_daemon_monitor_work(), while task_work ends
> the ublk_io immediately (with UBLK_IO_FLAG_ACTIVE still set, not cleared
> yet).
> 
> So there is no chance of a second blk_mq_end_request() on the same tag.
> 
> Besides, for ublk_ios with UBLK_IO_FLAG_ACTIVE set, the stop_work
> scheduled by monitor_work will call ublk_cancel_queue(), which sends
> cqes with UBLK_IO_RES_ABORT.
> 
> Putting it together:
> 
> When the daemon is PF_EXITING:
> 
> 1) the ublk_io currently in task_work: its request is ended immediately

Precisely: it is just that the ublk io request is ended immediately, so
io->flags won't be touched.

> 
> 2) UBLK_IO_FLAG_ACTIVE unset: aborted in ublk_daemon_monitor_work

This part is important for making forward progress; that is why it has
to be done in a wq context.

> 
> 3) UBLK_IO_FLAG_ACTIVE set: send a cqe with UBLK_IO_RES_ABORT

This is the 2nd stage of aborting, after the disk is deleted.
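
That 2nd stage is just completing every still-active io command; roughly
(a sketch based on the V5 code, see the patch for the exact version):

	static void ublk_cancel_queue(struct ublk_queue *ubq)
	{
		int i;

		for (i = 0; i < ubq->q_depth; i++) {
			struct ublk_io *io = &ubq->ios[i];

			/* ACTIVE means the io command is still queued in
			 * the driver; complete it with UBLK_IO_RES_ABORT
			 * so the daemon sees the cqe and can exit
			 */
			if (io->flags & UBLK_IO_FLAG_ACTIVE)
				io_uring_cmd_done(io->cmd,
						  UBLK_IO_RES_ABORT, 0);
		}
	}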

In short, it is a two-stage abort, and the idea is actually
straightforward.

> 
> 
> Hope I'm correct this time. :)

Absolutely.

> 
> >>
> >>
> >> [...]
> >>
> >>> +
> >>> +/*
> >>> + * __ublk_fail_req() may be called from abort context or ->ubq_daemon
> >>> + * context during exiting, so lock is required.
> >>> + *
> >>> + * Also aborting may not be started yet, keep in mind that one failed
> >>> + * request may be issued by block layer again.
> >>> + */
> >>> +static void __ublk_fail_req(struct ublk_io *io, struct request *req)
> >>> +{
> >>> +	WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
> >>> +
> >>> +	if (!(io->flags & UBLK_IO_FLAG_ABORTED)) {
> >>> +		io->flags |= UBLK_IO_FLAG_ABORTED;
> >>> +		blk_mq_end_request(req, BLK_STS_IOERR);
> >>> +	}
> >>> +}
> >>> +
> >>
> >> [...]
> >>
> >>> +
> >>> +/*
> >>> + * When ->ubq_daemon is exiting, either new request is ended immediately,
> >>> + * or any queued io command is drained, so it is safe to abort queue
> >>> + * lockless
> >>> + */
> >>> +static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
> >>> +{
> >>> +	int i;
> >>> +
> >>> +	if (!ublk_get_device(ub))
> >>> +		return;
> >>> +
> >>> +	for (i = 0; i < ubq->q_depth; i++) {
> >>> +		struct ublk_io *io = &ubq->ios[i];
> >>> +
> >>> +		if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) {
> >>> +			struct request *rq;
> >>> +
> >>> +			/*
> >>> +			 * Either we fail the request or ublk_rq_task_work_fn
> >>> +			 * will do it
> >>> +			 */
> >>> +			rq = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], i);
> >>> +			if (rq)
> >>> +				__ublk_fail_req(io, rq);
> >>> +		}
> >>> +	}
> >>> +	ublk_put_device(ub);
> >>> +}
> >>> +
> >>
> >>
> >> Another problem: 
> >>
> >> 1) comment of __ublk_fail_req():  "so lock is required"
> > 
> > Yeah, now __ublk_fail_req() is only called in the abort context and no
> > longer races with task work, so the lock isn't needed.
> 
> Ok, I see.
> 
> > 
> >>
> >> 2) comment of ublk_abort_queue(): "so it is safe to abort queue lockless"
> > 
> > This comment is updated in v5, and it is correct.
> > 
> >>
> >> 3) ublk_abort_queue() calls __ublk_fail_req() on all ubqs.
> > 
> > No, ublk_abort_queue() only aborts the passed ubq, so if one ubq daemon
> > is aborted, the other ubqs can still handle IO while the disk is being
> > deleted.
> 
> Ok, I see.
> 
> I think if one ubq daemon is killed (and the blk-mq requests related to
> it are aborted), stop_work should call del_gendisk(), and the other ubq
> daemons can still complete blk-mq requests, but no new blk-mq requests
> will be issued.
> After that, these unkilled ubq daemons will get UBLK_IO_RES_ABORT cqes
> and exit by themselves.

Yeah, you are right.


Thanks,
Ming


Thread overview: 16+ messages
2022-07-13 14:07 [PATCH V5 0/2] ublk: add io_uring based userspace block driver Ming Lei
2022-07-13 14:07 ` [PATCH V5 1/2] ublk_drv: " Ming Lei
2022-07-14 10:20   ` Ziyang Zhang
2022-07-14 10:48     ` Ming Lei
2022-07-14 13:23       ` Ziyang Zhang
2022-07-15  2:04         ` Ming Lei [this message]
2022-07-15  6:07           ` Ziyang Zhang
2022-07-13 14:07 ` [PATCH V5 2/2] ublk_drv: support to complete io command via task_work_add Ming Lei
2022-07-13 20:25 ` [PATCH V5 0/2] ublk: add io_uring based userspace block driver Jens Axboe
2022-07-14  0:19   ` Ming Lei
2022-07-14  2:54     ` Jens Axboe
2022-07-14  2:59       ` Jens Axboe
2022-07-14  5:30         ` Ming Lei
2022-07-19 10:15     ` Geert Uytterhoeven
2022-07-14  2:54 ` Jens Axboe
2022-07-14 14:41 ` Gabriel Krisman Bertazi
