From: Sarthak Kukreti <sarthakkukreti@chromium.org>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
Daniil Lunev <dlunev@google.com>, Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
"Michael S . Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Bart Van Assche <bvanassche@google.com>,
Mike Snitzer <snitzer@kernel.org>,
linux-kernel@vger.kernel.org,
Gwendal Grignou <gwendal@google.com>,
virtualization@lists.linux-foundation.org, dm-devel@redhat.com,
Andreas Dilger <adilger.kernel@dilger.ca>,
Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
linux-ext4@vger.kernel.org, Evan Green <evgreen@google.com>,
Alasdair Kergon <agk@redhat.com>
Subject: Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
Date: Thu, 29 Dec 2022 00:17:00 -0800
Message-ID: <CAG9=OMPQEoMVpXD8PeHwkymwk-zfB3mSvDO_W6h0S3Zom62JBQ@mail.gmail.com>
In-Reply-To: <Yy29y/jUvWM6GRZ5@redhat.com>
On Fri, Sep 23, 2022 at 7:08 AM Mike Snitzer <snitzer@redhat.com> wrote:
>
> On Fri, Sep 23 2022 at 4:51P -0400,
> Christoph Hellwig <hch@infradead.org> wrote:
>
> > On Wed, Sep 21, 2022 at 07:48:50AM +1000, Daniil Lunev wrote:
> > > > There is no such thing as WRITE UNAVAILABLE in NVMe.
> > > > Apologies, that is WRITE UNCORRECTABLE. Chapter 3.2.7 of
> > > NVM Express NVM Command Set Specification 1.0b
> >
> > Write uncorrectable is a very different thing, and the equivalent of the
> > horribly misnamed SCSI WRITE LONG COMMAND. It injects an unrecoverable
> > error, and does not provision anything.
> >
> > > * Each application is potentially allowed to consume the entirety
> > > of the disk space - there is no strict size limit per application
> > > * Applications need to pre-allocate space at times, for which
> > > they use fallocate. Once the operation succeeds, the application
> > > assumes the space is guaranteed to be there for it.
> > > * Since filesystems on the volumes are independent, filesystem
> > > level enforcement of size constraints is impossible and the only
> > > common level is the thin pool, thus, each fallocate has to find its
> > > representation in thin pool one way or another - otherwise you
> > > may end up in the situation, where FS thinks it has allocated space
> > > but when it tries to actually write it, the thin pool is already
> > > exhausted.
> > > * Hole-Punching fallocate will not reach the thin pool, so the only
> > > solution presently is zero-writing pre-allocate.
> >
> > To me it sounds like you want a non-thin pool in dm-thin and/or
> > guaranteed space reservations for it.
>
> What is implemented in this patchset: enablement for dm-thinp to
> actually provide guarantees which fallocate requires.
>
> Seems you're getting hung up on the finishing details in HW (details
> which are _not_ the point of this patchset).
>
> The proposed changes are in service to _Linux_ code. The patchset
> implements the primitive from top (ext4) to bottom (dm-thinp, loop).
> It stops short of implementing handling everywhere that'd need it
> (e.g. in XFS, etc). But those changes can come as follow-on work once
> the primitive is established top to bottom.
>
> But you know all this ;)
>
> > > * Thus, a provisioning block operation allows an interface specific
> > > operation that guarantees the presence of the block in the
> > > mapped space. LVM Thin-pool itself is the primary target for our
> > > use case but the argument is that this operation maps well to
> > > other interfaces which allow thinly provisioned units.
> >
> > I think where you are trying to go here is badly mistaken. With flash
> > (or hard drive SMR) there is no such thing as provisioning LBAs. Every
> > write is out of place, and a one time space allocation does not help
> > you at all. So fundamentally what you try to do here just goes against
> > the actual physics of modern storage media. While there are some
> > layers that keep up a pretence, trying to do that at an exposed API
> > level is a really bad idea.
>
> This doesn't need to be so feudal. Reserving an LBA in physical HW
> really isn't the point.
>
> Fact remains: an operation that ensures space is actually reserved via
> fallocate is long overdue (just because an FS did its job doesn't mean
> underlying layers reflect that). And certainly useful, even if "only"
> benefiting dm-thinp and the loop driver. Like other block primitives,
> REQ_OP_PROVISION is filtered out by block core if the device doesn't
> support it.
>
> That said, I agree with Brian Foster that we need really solid
> documentation and justification for why fallocate mode=0 cannot be
> used (but the case has been made in this thread).
>
> Also, I do see an issue with the implementation (relative to stacked
> devices): dm_table_supports_provision() is too myopic about DM. It
> needs to go a step further and verify that some layer in the stack
> actually services REQ_OP_PROVISION. Will respond to DM patch too.
>
Thanks all for the suggestions and feedback! I just posted v2 (more
than a bit belatedly) on the various mailing lists with the relevant
fixes, documentation and some benchmarks on performance.
Best
Sarthak