From: Sarthak Kukreti <sarthakkukreti@chromium.org>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	sarthakkukreti@google.com, dm-devel@redhat.com,
	linux-block@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Alasdair Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Brian Foster <bfoster@redhat.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Bart Van Assche <bvanassche@google.com>,
	Daniil Lunev <dlunev@google.com>
Subject: Re: [PATCH v2 3/7] fs: Introduce FALLOC_FL_PROVISION
Date: Thu, 30 Mar 2023 17:28:35 -0700
Message-ID: <CAG9=OMM_0D+ck6=0dfjBi0B_zqTbp3i28tFDr8c3e1TQip1sQA@mail.gmail.com>
In-Reply-To: <Y7bxjKusa2L/TNRE@mit.edu>

On Thu, Jan 5, 2023 at 7:49 AM Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Wed, Jan 04, 2023 at 01:22:06PM -0800, Sarthak Kukreti wrote:
> > > How expensive is this expected to be?  Is this why you wanted a separate
> > > mode flag?
> >
> > Yes, the exact latency will depend on the stacked block devices and
> > the fragmentation at the allocation layers.
> >
> > I did a quick benchmark of fallocate() with:
> > A) ext4 filesystem mounted with 'noprovision'
> > B) ext4 filesystem mounted with 'provision' on a dm-thin device.
> > C) ext4 filesystem mounted with 'provision' on a loop device with a
> > sparse backing file on the filesystem in (B).
> >
> > I tested file sizes from 512M to 8G. The time taken for fallocate() in
> > (A) remains expectedly flat at ~0.01-0.02s, but for (B) it scales from
> > 0.03s to 0.4s, and for (C) it scales from 0.04s to 0.52s (I captured
> > the exact time distribution in the cover letter:
> > https://marc.info/?l=linux-ext4&m=167230113520636&w=2).
> >
> > +0.5s for an 8G fallocate doesn't sound like a lot, but I think
> > fragmentation and how the block device is layered can make this worse...
>
> If userspace uses fallocate(2) there are generally two reasons.
> Either they **really** don't want to get the NOSPC, in which case
> noprovision will not give them what they want unless we modify their
> source code to add this new FALLOC_FL_PROVISION flag --- which may not
> be possible if it is provided in a binary-only format (for example,
> proprietary databases shipped by companies beginning with the letters
> 'I' or 'O').
>
> Or, they really care about avoiding fragmentation by giving a hint to
> the file system that layout is important, and so **please** allocate
> the space right away so that it is more likely that the space will be
> laid out in a contiguous fashion.  Of course, the moment you use
> thin-provisioning this goes out the window, since even if the space is
> contiguous on the dm-thin layer, on the underlying storage layer it is
> likely that things will be fragmented to a fare-thee-well, and either
> (a) you have a vast amount of flash to try to mitigate the performance
> hit of using thin-provisioning (for example, hardware thin-provisioning
> such as EMC storage arrays), or (b) you really don't care about
> performance since space savings is what you're going for.
>
> So.... because of the issue of changing the semantics of what
> fallocate(2) will guarantee, unless programs are forced to change
> their code to use this new FALLOC flag, I really am not very fond of
> it.
>
> I suspect that using a mount option (which should default to
> "provision"; if you want to break user API expectations, it should
> require a mount option for the system administrator to explicitly OK
> such a change), is OK.
>
Understood. I dropped the FALLOC flag from the series in v3; instead,
we now rely on the filesystem's mount/policy.

> As far as the per-file mode --- I'm not convinced it's really
> necessary.  In general, if you are using thin-provisioning, file systems
> tend to be used explicitly for one purpose, so adding the complexity
> of doing it on a per-file basis is probably not really needed.  That
> being said, your existing prototype requires searching for the
> extended attribute on every single file allocation, which is not a
> great idea.  On a system with SELinux enabled, every file will have an
> xattr block, and requiring that it be searched on every file
> allocation would be unfortunate.  It would be better to check for the
> xattr when the file is opened, and then set a flag in the struct
> file.  However, it might be better to see if there is a real demand
> for such a feature before adding it.
>
Thanks for the feedback! On ChromeOS, we still have filesystems shared
between applications, partly due to inertia of adoption. So, we have a
few cases where applications need to share a filesystem but with
differing provisioning policies.

One more idea that I've been exploring in this space, which uses the
above file-based mechanism, is to use a 'provisioning disabled'
fallocated file to make the apparent free space in the thinly
provisioned filesystem match the space available in the thinpool. In
theory, this prevents userspace applications from writing much more
than what's available on the thinpool. In practice, it depends on the
responsiveness of the service that monitors and resizes this 'storage
balloon'.
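
As a rough illustration of the balloon sizing described above, here is a
minimal userspace sketch. It is not code from the series: the function
names and the three byte counts it takes are hypothetical, and it does
not show how the file gets marked 'provisioning disabled' (that part
relies on the per-file mechanism from the patches). It only assumes the
monitoring service can read the filesystem's reported free space, the
current balloon size, and the free space in the thinpool.

```python
import os


def balloon_target_bytes(fs_free: int, balloon_size: int, pool_free: int) -> int:
    """Return the balloon size that makes apparent free space match the pool.

    fs_free:      free bytes the filesystem currently reports
    balloon_size: bytes currently held by the balloon file
    pool_free:    bytes still available in the thinpool
    """
    # Free space the filesystem would report if the balloon were empty.
    free_without_balloon = fs_free + balloon_size
    # Hold back whatever the pool cannot actually back; never go negative.
    return max(0, free_without_balloon - pool_free)


def resize_balloon(fd: int, target: int) -> None:
    """Grow or shrink the (non-sparse) balloon file behind fd to target bytes."""
    current = os.fstat(fd).st_size
    if target > current:
        # Reserve filesystem blocks for the new tail of the file.
        os.posix_fallocate(fd, current, target - current)
    elif target < current:
        # Hand the blocks past the new end back to the filesystem.
        os.ftruncate(fd, target)
```

For example, with 100 units free in the filesystem, an empty balloon and
60 units free in the pool, the target is 40. How often the service
recomputes this is exactly the responsiveness concern mentioned above.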

Best
Sarthak

>                                                 - Ted
