io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <tom.leiming@gmail.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "Denis V. Lunev" <den@virtuozzo.com>,
	io-uring@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Kirill Tkhai <kirill.tkhai@openvz.org>,
	Manuel Bentele <development@manuel-bentele.de>,
	qemu-devel@nongnu.org, Kevin Wolf <kwolf@redhat.com>,
	rjones@redhat.com, Xie Yongji <xieyongji@bytedance.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	Josef Bacik <josef@toxicpanda.com>
Subject: Re: ublk-qcow2: ublk-qcow2 is available
Date: Thu, 6 Oct 2022 23:09:48 +0800	[thread overview]
Message-ID: <Yz7vvNKSNRyBVObo@T590> (raw)
In-Reply-To: <Yz7fTANAxAQ8KT4v@fedora>

On Thu, Oct 06, 2022 at 09:59:40AM -0400, Stefan Hajnoczi wrote:
> On Thu, Oct 06, 2022 at 06:26:15PM +0800, Ming Lei wrote:
> > On Wed, Oct 05, 2022 at 11:11:32AM -0400, Stefan Hajnoczi wrote:
> > > On Tue, Oct 04, 2022 at 01:57:50AM +0200, Denis V. Lunev wrote:
> > > > On 10/3/22 21:53, Stefan Hajnoczi wrote:
> > > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote:
> > > > > > ublk-qcow2 is available now.
> > > > > Cool, thanks for sharing!
> > > > yep
> > > > 
> > > > > > So far it provides basic read/write function, and compression and snapshot
> > > > > > aren't supported yet. The target/backend implementation is completely
> > > > > > based on io_uring, and share the same io_uring with ublk IO command
> > > > > > handler, just like what ublk-loop does.
> > > > > > 
> > > > > > Follows the main motivations of ublk-qcow2:
> > > > > > 
> > > > > > - building one complicated target from scratch helps libublksrv APIs/functions
> > > > > >    become mature/stable more quickly, since qcow2 is complicated and needs more
> > > > > >    requirement from libublksrv compared with other simple ones(loop, null)
> > > > > > 
> > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as
> > > > > >    ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2
> > > > > >    might useful be for covering requirement in this field
> > > > There is one important thing to keep in mind about all partly-userspace
> > > > implementations though:
> > > > * any single allocation happened in the context of the
> > > >    userspace daemon through try_to_free_pages() in
> > > >    kernel has a possibility to trigger the operation,
> > > >    which will require userspace daemon action, which
> > > >    is inside the kernel now.
> > > > * the probability of this is higher in the overcommitted
> > > >    environment
> > > > 
> > > > This was the main motivation of us in favor for the in-kernel
> > > > implementation.
> > > 
> > > CCed Josef Bacik because the Linux NBD driver has dealt with memory
> > > reclaim hangs in the past.
> > > 
> > > Josef: Any thoughts on userspace block drivers (whether NBD or ublk) and
> > > how to avoid hangs in memory reclaim?
> > 
> > If I remember correctly, there isn't new report after the last NBD(TCMU) deadlock
> > in memory reclaim was addressed by 8d19f1c8e193 ("prctl: PR_{G,S}ET_IO_FLUSHER
> > to support controlling memory reclaim").
> 
> Denis: I'm trying to understand the problem you described. Is this
> correct:
> 
> Due to memory pressure, the kernel reclaims pages and submits a write to
> a ublk block device. The userspace process attempts to allocate memory
> in order to service the write request, but it gets stuck because there
> is no memory available. As a result reclaim gets stuck, the system is
> unable to free more memory and therefore it hangs?

The process should be killed in this situation if PR_SET_IO_FLUSHER
is applied since the page allocation is done in VM fault handler.

Firstly in theory the userspace part should provide forward progress
guarantee in code path for handling IO, such as reserving/mlock pages
for such situation. However, this issue isn't unique for nbd or ublk,
all userspace block device should have such potential risk, and vduse
is no exception, IMO.

Secondly with proper/enough swap space, I think it is hard to trigger
such kind of issue.

Finally ublk driver has added user recovery commands for recovering from
crash, and ublksrv will support it soon.

Thanks,
Ming

  reply	other threads:[~2022-10-06 15:10 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-30  9:24 ublk-qcow2: ublk-qcow2 is available Ming Lei
2022-10-03 19:53 ` Stefan Hajnoczi
2022-10-03 23:57   ` Denis V. Lunev
2022-10-05 15:11     ` Stefan Hajnoczi
2022-10-06 10:26       ` Ming Lei
2022-10-06 13:59         ` Stefan Hajnoczi
2022-10-06 15:09           ` Ming Lei [this message]
2022-10-06 18:29             ` Stefan Hajnoczi
2022-10-07 11:21               ` Ming Lei
2022-10-04  9:43   ` Ming Lei
2022-10-04 13:53     ` Stefan Hajnoczi
2022-10-05  4:18       ` Ming Lei
2022-10-05 12:21         ` Stefan Hajnoczi
2022-10-05 12:38           ` Denis V. Lunev
2022-10-06 11:24           ` Ming Lei
2022-10-07 10:04             ` Yongji Xie
2022-10-07 10:51               ` Ming Lei
2022-10-07 11:21                 ` Yongji Xie
2022-10-07 11:23                   ` Ming Lei
2022-10-08  8:43         ` Ziyang Zhang
2022-10-12 14:22           ` Stefan Hajnoczi
2022-10-13  6:48             ` Yongji Xie
2022-10-13 16:02               ` Stefan Hajnoczi
2022-10-14 12:56               ` Ming Lei
2022-10-17 11:11                 ` Yongji Xie
2022-10-18  6:59                   ` Ming Lei
2022-10-18 13:17                     ` Yongji Xie
2022-10-18 14:54                       ` Stefan Hajnoczi
2022-10-19  9:09                         ` Ming Lei
2022-10-24 16:11                           ` Stefan Hajnoczi
2022-10-21  5:33                         ` Yongji Xie
2022-10-21  6:30                           ` Jason Wang
2022-10-25  8:17                             ` Yongji Xie
2022-10-25 12:02                               ` Stefan Hajnoczi
2022-10-28 13:33                                 ` Yongji Xie
2022-11-01  2:36                                 ` Jason Wang
2022-11-02 19:13                                   ` Stefan Hajnoczi
2022-11-04  6:55                                     ` Jason Wang
2022-10-21  6:28                     ` Jason Wang
2022-10-06 10:14       ` Richard W.M. Jones
2022-10-12 14:15         ` Stefan Hajnoczi
2022-10-13  1:50           ` Ming Lei
2022-10-13 16:01             ` Stefan Hajnoczi
2022-10-04  5:43 ` Manuel Bentele

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yz7vvNKSNRyBVObo@T590 \
    --to=tom.leiming@gmail.com \
    --cc=den@virtuozzo.com \
    --cc=development@manuel-bentele.de \
    --cc=io-uring@vger.kernel.org \
    --cc=josef@toxicpanda.com \
    --cc=kirill.tkhai@openvz.org \
    --cc=kwolf@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rjones@redhat.com \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=xieyongji@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).