From: Sarthak Kukreti <sarthakkukreti@chromium.org>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Daniil Lunev <dlunev@google.com>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Bart Van Assche <bvanassche@google.com>,
	Mike Snitzer <snitzer@kernel.org>,
	linux-kernel@vger.kernel.org,
	Gwendal Grignou <gwendal@google.com>,
	virtualization@lists.linux-foundation.org, dm-devel@redhat.com,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-ext4@vger.kernel.org, Evan Green <evgreen@google.com>,
	Alasdair Kergon <agk@redhat.com>
Subject: Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
Date: Thu, 29 Dec 2022 00:17:00 -0800
Message-ID: <CAG9=OMPQEoMVpXD8PeHwkymwk-zfB3mSvDO_W6h0S3Zom62JBQ@mail.gmail.com>
In-Reply-To: <Yy29y/jUvWM6GRZ5@redhat.com>

On Fri, Sep 23, 2022 at 7:08 AM Mike Snitzer <snitzer@redhat.com> wrote:
>
> On Fri, Sep 23 2022 at  4:51P -0400,
> Christoph Hellwig <hch@infradead.org> wrote:
>
> > On Wed, Sep 21, 2022 at 07:48:50AM +1000, Daniil Lunev wrote:
> > > > There is no such thing as WRITE UNAVAILABLE in NVMe.
> > > Apologies, that is WRITE UNCORRECTABLE. Chapter 3.2.7 of
> > > NVM Express NVM Command Set Specification 1.0b
> >
> > Write uncorrectable is a very different thing, and the equivalent of the
> > horribly misnamed SCSI WRITE LONG COMMAND.  It injects an unrecoverable
> > error, and does not provision anything.
> >
> > > * Each application is potentially allowed to consume the entirety
> > >   of the disk space - there is no strict size limit per application
> > > * Applications sometimes need to pre-allocate space, for which
> > >   they use fallocate. Once the operation succeeds, the application
> > >   assumes the space is guaranteed to be there for it.
> > > * Since filesystems on the volumes are independent, filesystem
> > >   level enforcement of size constraints is impossible and the only
> > >   common level is the thin pool, thus, each fallocate has to find its
> > >   representation in thin pool one way or another - otherwise you
> > >   may end up in a situation where the FS thinks it has allocated
> > >   space, but when it actually tries to write to it, the thin pool
> > >   is already exhausted.
> > > * Hole-Punching fallocate will not reach the thin pool, so the only
> > >   solution presently is zero-writing pre-allocate.
> >
> > To me it sounds like you want a non-thin pool in dm-thin and/or
> > guaranteed space reservations for it.
>
> What is implemented in this patchset: enablement for dm-thinp to
> actually provide guarantees which fallocate requires.
>
> Seems you're getting hung up on the finishing details in HW (details
> which are _not_ the point of this patchset).
>
> The proposed changes are in service to _Linux_ code. The patchset
> implements the primitive from top (ext4) to bottom (dm-thinp, loop).
> It stops short of implementing handling everywhere that'd need it
> (e.g. in XFS, etc). But those changes can come as follow-on work once
> the primitive is established top to bottom.
>
> But you know all this ;)
>
> > > * Thus, a provisioning block operation allows an interface specific
> > >   operation that guarantees the presence of the block in the
> > >   mapped space. LVM Thin-pool itself is the primary target for our
> > >   use case but the argument is that this operation maps well to
> > >   other interfaces which allow thinly provisioned units.
> >
> > I think where you are trying to go here is badly mistaken.  With flash
> > (or hard drive SMR) there is no such thing as provisioning LBAs.  Every
> > write is out of place, and a one time space allocation does not help
> > you at all.  So fundamentally what you try to do here just goes against
> > the actual physics of modern storage media.  While there are some
> > layers that keep up a pretence, trying to do that at an exposed API
> > level is a really bad idea.
>
> This doesn't need to be so feudal.  Reserving an LBA in physical HW
> really isn't the point.
>
> Fact remains: an operation that ensures space is actually reserved via
> fallocate is long overdue (just because an FS did its job doesn't mean
> underlying layers reflect that). And certainly useful, even if "only"
> benefiting dm-thinp and the loop driver. Like other block primitives,
> REQ_OP_PROVISION is filtered out by block core if the device doesn't
> support it.
>
> That said, I agree with Brian Foster that we need really solid
> documentation and justification for why fallocate mode=0 cannot be
> used (but the case has been made in this thread).
>
> Also, I do see an issue with the implementation (relative to stacked
> devices): dm_table_supports_provision() is too myopic about DM. It
> needs to go a step further and verify that some layer in the stack
> actually services REQ_OP_PROVISION. Will respond to DM patch too.
>
Thanks all for the suggestions and feedback! I just posted v2 (more
than a bit belatedly) on the various mailing lists with the relevant
fixes, documentation and some benchmarks on performance.

Best
Sarthak
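
The two preallocation approaches contrasted in the quoted discussion
(fallocate mode=0 versus zero-writing) can be sketched in userspace C
roughly as follows. This is a minimal illustration, not code from the
patchset: the function names reserve_fs_only() and
reserve_by_zero_write() are hypothetical, and the proposed
FALLOC_FL_PROVISION flag is deliberately not used because it is an RFC
proposal and may not exist in installed headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * fallocate mode=0: the filesystem reserves blocks for the range.
 * As argued in the thread, this alone does not guarantee that an
 * underlying thin pool has allocated space for those blocks.
 * Returns 0 on success or a negative errno value.
 */
int reserve_fs_only(int fd, off_t offset, off_t len)
{
	return fallocate(fd, 0, offset, len) == 0 ? 0 : -errno;
}

/*
 * Zero-writing preallocation: write zeros over the range and flush,
 * so the writes actually reach the layers below the filesystem.
 * WARNING: this overwrites any existing data in the range, so it is
 * only suitable for freshly created (or known-empty) regions.
 * Returns 0 on success or a negative errno value.
 */
int reserve_by_zero_write(int fd, off_t offset, off_t len)
{
	static const char zeros[64 * 1024];	/* zero-initialized buffer */
	off_t done = 0;

	while (done < len) {
		size_t chunk = sizeof(zeros);
		ssize_t n;

		if ((off_t)chunk > len - done)
			chunk = (size_t)(len - done);

		n = pwrite(fd, zeros, chunk, offset + done);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -errno;
		}
		if (n == 0)
			return -EIO;		/* no progress, give up */
		done += n;
	}

	/* Push the zeros down the stack instead of leaving them cached. */
	return fsync(fd) == 0 ? 0 : -errno;
}
```

The cost of the second function grows with the size of the range, since
every byte is written and flushed; that is the overhead a dedicated
provisioning operation is meant to avoid.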
