linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sarthak Kukreti <sarthakkukreti@chromium.org>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: dm-devel@redhat.com, linux-block@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Jens Axboe <axboe@kernel.dk>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Alasdair Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@kernel.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Bart Van Assche <bvanassche@google.com>,
	Daniil Lunev <dlunev@google.com>, Evan Green <evgreen@google.com>,
	Gwendal Grignou <gwendal@google.com>
Subject: Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
Date: Fri, 16 Sep 2022 11:48:34 -0700	[thread overview]
Message-ID: <CAG9=OMPHZqdDhX=M+ovdg5fa3x4-Q_1r5SWPa8pMTQw0mr5fPg@mail.gmail.com> (raw)
In-Reply-To: <YyQTM5PRT2o/GDwy@fedora>

On Thu, Sep 15, 2022 at 11:10 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Thu, Sep 15, 2022 at 09:48:18AM -0700, Sarthak Kukreti wrote:
> > From: Sarthak Kukreti <sarthakkukreti@chromium.org>
> >
> > Hi,
> >
> > This patch series is an RFC of a mechanism to pass through provision requests on stacked thinly provisioned storage devices/filesystems.
> >
> > The linux kernel provides several mechanisms to set up thinly provisioned block storage abstractions (eg. dm-thin, loop devices over sparse files), either directly as block devices or backing storage for filesystems. Currently, short of writing data to either the device or filesystem, there is no way for users to pre-allocate space for use in such storage setups. Consider the following use-cases:
> >
> > 1) Suspend-to-disk and resume from a dm-thin device: In order to ensure that the underlying thinpool metadata is not modified during the suspend mechanism, the dm-thin device needs to be fully provisioned.
> > 2) If a filesystem uses a loop device over a sparse file, fallocate() on the filesystem will allocate blocks for files but the underlying sparse file will remain intact.
> > 3) Another example is virtual machine using a sparse file/dm-thin as a storage device; by default, allocations within the VM boundaries will not affect the host.
> > 4) Several storage standards support mechanisms for thin provisioning on real hardware devices. For example:
> >   a. The NVMe spec 1.0b section 2.1.1 loosely talks about thin provisioning: "When the THINP bit in the NSFEAT field of the Identify Namespace data structure is set to ‘1’, the controller ... shall track the number of allocated blocks in the Namespace Utilization field"
> >   b. The SCSi Block Commands reference - 4 section references "Thin provisioned logical units",
> >   c. UFS 3.0 spec section 13.3.3 references "Thin provisioning".
>
> When REQ_OP_PROVISION is sent on an already-allocated range of blocks,
> are those blocks zeroed? NVMe Write Zeroes with Deallocate=0 works this
> way, for example. That behavior is counterintuitive since the operation
> name suggests it just affects the logical block's provisioning state,
> not the contents of the blocks.
>
No, the blocks are not zeroed. The current implementation (in the dm
patch) is to indeed look at the provisioned state of the logical block
and provision if it is unmapped. if the block is already allocated,
REQ_OP_PROVISION should have no effect on the contents of the block.
Similarly, in the file semantics, sending an FALLOC_FL_PROVISION
requests for extents already mapped should not affect the contents in
the extents.

> > In all of the above situations, currently the only way for pre-allocating space is to issue writes (or use WRITE_ZEROES/WRITE_SAME). However, that does not scale well with larger pre-allocation sizes.
>
> What exactly is the issue with WRITE_ZEROES scalability? Are you
> referring to cases where the device doesn't support an efficient
> WRITE_ZEROES command and actually writes blocks filled with zeroes
> instead of updating internal allocation metadata cheaply?
>
Yes. On ChromiumOS, we regularly deal with storage devices that don't
support WRITE_ZEROES or that need to have it disabled, via a quirk,
due to a bug in the vendor's implementation. Using WRITE_ZEROES for
allocation makes the allocation path quite slow for such devices (not
to mention the effect on storage lifetime), so having a separate
provisioning construct is very appealing. Even for devices that do
support an efficient WRITE_ZEROES implementation but don't support
logical provisioning per-se, I suppose that the allocation path might
be a bit faster (the device driver's request queue would report
'max_provision_sectors'=0 and the request would be short circuited
there) although I haven't benchmarked the difference.

Sarthak

> Stefan

  reply	other threads:[~2022-09-16 18:48 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-15 16:48 [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 1/8] block: Introduce provisioning primitives Sarthak Kukreti
2022-09-23 15:15   ` Mike Snitzer
2022-12-29  8:17     ` Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 2/8] dm: Add support for block provisioning Sarthak Kukreti
2022-09-23 14:23   ` Mike Snitzer
2022-12-29  8:22     ` Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 3/8] virtio_blk: Add support for provision requests Sarthak Kukreti
2022-09-16  5:48   ` Stefan Hajnoczi
2022-09-20  2:33     ` Sarthak Kukreti
2022-09-27 21:37   ` Michael S. Tsirkin
2022-09-15 16:48 ` [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION Sarthak Kukreti
2022-09-16 11:56   ` Brian Foster
2022-09-16 21:02     ` Sarthak Kukreti
2022-09-21 15:39       ` Brian Foster
2022-09-22  8:04         ` Sarthak Kukreti
2022-09-22 18:29           ` Brian Foster
2022-12-29  8:13             ` Sarthak Kukreti
2022-09-20  7:49   ` Christoph Hellwig
2022-09-21  5:54     ` Sarthak Kukreti
2022-09-21 15:21       ` Mike Snitzer
2022-09-22  8:08         ` Sarthak Kukreti
2022-09-23  8:45       ` Christoph Hellwig
2022-12-29  8:14         ` Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 5/8] loop: Add support for provision requests Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 6/8] ext4: Add support for FALLOC_FL_PROVISION Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 7/8] ext4: Add mount option for provisioning blocks during allocations Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 8/8] ext4: Add a per-file provision override xattr Sarthak Kukreti
2022-09-16  6:09 ` [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage Stefan Hajnoczi
2022-09-16 18:48   ` Sarthak Kukreti [this message]
2022-09-16 20:01     ` Bart Van Assche
2022-09-16 21:59       ` Sarthak Kukreti
2022-09-20  7:46     ` Christoph Hellwig
     [not found]       ` <CAAKderPF5Z5QLxyEb80Y+90+eR0sfRmL-WfgXLp=eL=HxWSZ9g@mail.gmail.com>
2022-09-20 11:30         ` Christoph Hellwig
     [not found]           ` <CAAKderNcHpbBqWqqd5-WuKLRCQQUt7a_4D4ti4gy15+fKGK0vQ@mail.gmail.com>
2022-09-21 15:08             ` Mike Snitzer
2022-09-23  8:51             ` Christoph Hellwig
2022-09-23 14:08               ` Mike Snitzer
2022-12-29  8:17                 ` Sarthak Kukreti
2022-09-17  3:03 ` [dm-devel] " Darrick J. Wong
2022-09-17 19:46   ` Sarthak Kukreti
2022-09-19 16:36     ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAG9=OMPHZqdDhX=M+ovdg5fa3x4-Q_1r5SWPa8pMTQw0mr5fPg@mail.gmail.com' \
    --to=sarthakkukreti@chromium.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=agk@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@google.com \
    --cc=dlunev@google.com \
    --cc=dm-devel@redhat.com \
    --cc=evgreen@google.com \
    --cc=gwendal@google.com \
    --cc=jasowang@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=snitzer@kernel.org \
    --cc=stefanha@redhat.com \
    --cc=tytso@mit.edu \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).