From: Adam Borowski <kilobyte@angband.pl>
To: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
Cc: Marat Khalili <mkh@rqc.ru>, Duncan <1i5t5.duncan@cox.net>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: qemu-kvm VM died during partial raid1 problems of btrfs
Date: Tue, 12 Sep 2017 22:00:57 +0200
Message-ID: <20170912200057.3mrgtahlvszkg334@angband.pl>
In-Reply-To: <7019ace9-723e-0220-6136-473ac3574b55@gmail.com>

On Tue, Sep 12, 2017 at 03:11:52PM -0400, Austin S. Hemmelgarn wrote:
> On 2017-09-12 14:43, Adam Borowski wrote:
> > On Tue, Sep 12, 2017 at 01:36:48PM -0400, Austin S. Hemmelgarn wrote:
> > > On 2017-09-12 13:21, Adam Borowski wrote:
> > > > There's fallocate -d, but that for some reason touches mtime which makes
> > > > rsync go again.  This can be handled manually but is still not nice.
> > 
> > Yeah, the underlying ioctl does modify the file, it's merely fallocate -d
> > calling it on regions that are already zero.  The ioctl doesn't know that,
> > so fallocate would have to restore the mtime by itself.
> > 
> > There's also another problem: such a check + ioctl are racy.  Unlike defrag
> > or FILE_EXTENT_SAME, you thus can't use it on a file that's in use (or could
> > suddenly become in use).  Fixing this would need kernel support, either as
> > FILE_EXTENT_SAME with /dev/zero or as a new mode of fallocate.
> A new fallocate mode would be more likely.  Adding special code to the
> EXTENT_SAME ioctl and then requiring implementation on filesystems that
> don't otherwise support it is not likely to get anywhere.  A new fallocate
> mode though would be easy, especially considering that a naive
> implementation is easy.

Sounds like a good idea.  If we go this way, there's a question about the
interface.  There's a choice between:
A) check if the whole range is zero; if even a single bit is one, abort
B) dig many holes, with a given granularity (perhaps left to the
   filesystem's choice)
or even both.  The former is more consistent with FILE_EXTENT_SAME, the
latter can be smarter (like, digging a 4k hole is bad for fragmentation but
replacing a whole extent, no matter how small, is always a win).
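
To make the difference between A and B concrete, here is a rough userspace
sketch in Python.  It only inspects a buffer and punches nothing; the names
is_all_zero and zero_blocks are made up for illustration, and a kernel-side
implementation would look quite different.

```python
from typing import List, Tuple


def is_all_zero(data: bytes) -> bool:
    """Option A: True only if every byte in the range is zero."""
    return not any(data)


def zero_blocks(data: bytes, granularity: int) -> List[Tuple[int, int]]:
    """Option B: return (offset, length) runs of all-zero aligned blocks.

    Only whole blocks of `granularity` bytes are considered; a trailing
    partial block is ignored.  Adjacent zero blocks are merged into one run.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    runs: List[Tuple[int, int]] = []
    zero_block = bytes(granularity)
    for off in range(0, len(data) - granularity + 1, granularity):
        if data[off:off + granularity] == zero_block:
            if runs and runs[-1][0] + runs[-1][1] == off:
                # Extend the previous run instead of starting a new one.
                runs[-1] = (runs[-1][0], runs[-1][1] + granularity)
            else:
                runs.append((off, granularity))
    return runs
```

With A, a single stray byte anywhere rejects the whole range; with B, the
same range still yields every zero run that fits the chosen granularity.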

> That said, I'm not 100% certain if it's necessary.  Intentionally calling
> fallocate on a file in use is not something most people are going to do
> normally anyway, since there is already a TOCTOU race in the fallocate -d
> implementation as things are right now.

_Current_ fallocate -d suffers from races; the whole gain from doing this
kernel-side would be eliminating those races.  The use cases are about the
same as for FILE_EXTENT_SAME: you don't need to stop the world.  Heck, as I
mentioned before, it conceptually _is_ FILE_EXTENT_SAME with /dev/zero, other
than your (good) point about filesystems other than btrfs and xfs.
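
The race in question can be simulated in a few lines of Python.  This is an
illustration only: racy_punch and its `between` hook are hypothetical, and
the "punch" is faked by writing zeros (which is how a punched range reads
back) so that it runs on any filesystem.

```python
import os
import tempfile


def racy_punch(fd, offset, length, between=None):
    """Check-then-punch the way userspace has to do it; NOT atomic.

    `between` is a hypothetical hook standing in for another process that
    writes to the file in the window between the check and the punch.
    """
    data = os.pread(fd, length, offset)    # time of check
    if any(data):
        return False                       # range holds data, leave it alone
    if between is not None:
        between()                          # another writer sneaks in here
    os.pwrite(fd, bytes(length), offset)   # time of use: stand-in for the punch
    return True


def demo():
    """Return True if data written inside the race window was destroyed."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.pwrite(fd, bytes(4096), 0)
        racy_punch(fd, 0, 4096,
                   between=lambda: os.pwrite(fd, b"important", 0))
        return os.pread(fd, 9, 0) == bytes(9)
```

Whatever lands in the file between the pread and the "punch" is silently
lost, which is the window a kernel-side check would close.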

> > For now, though, I wonder -- should we send fine folks at util-linux a patch
> > to make fallocate -d restore mtime, either always or on an option?
> It would need to be an option, because it also suffers from a TOCTOU race
> (other things might have changed the mtime while you were punching holes),
> and it breaks from existing behavior.  I think such an option would be
> useful, but not universally (for example, I don't care if the mtime on my VM
> images changes, as it typically matches the current date and time since the
> VM's are running constantly other than when doing maintenance like punching
> holes in the images).

Noted.  Both Marat's and my use cases, though, involve VMs that are off most
of the time, and at least for me, turned on only to test something.
Touching mtime makes rsync run again, and it's freaking _slow_: more than
40 minutes for a 40GB VM (source: SSD, target: deduped HDD).
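
Until such an option exists, the manual workaround looks roughly like the
Python sketch below: remember the timestamps, dig the holes, put the
timestamps back.  run_preserving_mtime and dig_holes are made-up helper
names, and the sketch has exactly the race mentioned above: a concurrent
writer's mtime update gets overwritten as well.  By default rsync skips
files whose size and mtime are unchanged, so restoring mtime is enough to
avoid the re-transfer.

```python
import os
import subprocess


def run_preserving_mtime(path, operation):
    """Run operation(path), then restore the original atime and mtime.

    Racy by design: if anything else modifies the file while the operation
    runs, that modification's mtime update is overwritten too.
    """
    before = os.stat(path)
    operation(path)
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))


def dig_holes(path):
    """Call util-linux fallocate to turn zeroed regions into holes."""
    subprocess.run(["fallocate", "--dig-holes", path], check=True)


# Example (only safe for a VM image that is not in use):
#   run_preserving_mtime("/path/to/vm.img", dig_holes)
```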


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ I've read an article about how lively happy music boosts
⣾⠁⢰⠒⠀⣿⡁ productivity.  You can read it, too, you just need the
⢿⡄⠘⠷⠚⠋⠀ right music while doing so.  I recommend Skepticism
⠈⠳⣄⠀⠀⠀⠀ (funeral doom metal).
