From: Zygo Blaxell <>
To: David Howells <>
Cc: Andreas Dilger <>,
	Christoph Hellwig <>,
	Qu Wenruo <>,
	linux-fsdevel <>,
	Al Viro <>,
	"Theodore Y. Ts'o" <>,
	"Darrick J. Wong" <>,
	Chris Mason <>, Josef Bacik <>,
	David Sterba <>,
	linux-ext4 <>,
	linux-xfs <>,
	linux-btrfs <>,
	Linux Kernel Mailing List <>
Subject: Re: Problems with determining data presence by examining extents?
Date: Sun, 26 Jan 2020 13:19:58 -0500
Message-ID: <>
In-Reply-To: <>

On Wed, Jan 15, 2020 at 11:09:03PM +0000, David Howells wrote:
> Andreas Dilger <> wrote:
> > > It would also have to say that blocks of zeros shouldn't be optimised away.
> > 
> > I don't necessarily see that as a requirement, so long as the filesystem
> > stores a "block" at that offset, but it could dedupe all zero-filled blocks
> > to the same "zero block".  That still allows saving storage space, while
> > keeping the semantics of "this block was written into the file" rather than
> > "there is a hole at this offset".
> Yeah, that's more what I was thinking of.  Provided I can find out that
> something is present, it should be fine.

I'm curious how this proposal handles an application punching a hole
through the cache.  Does the hole get cached, or does that operation have
to be synchronous with the server?  Or is it a moot point because no
server supports hole punching, so the hole gets replaced with equivalent
zero block data writes?

Zero blocks are stupidly common in typical user data corpora, and a
naive block-oriented deduper can create monster extents with millions
or even billions of references if it doesn't have some special handling
for zero blocks.  Even if they don't trigger filesystem performance bugs
or hit RAM or other implementation limits, it's still bigger and slower
to use zero-filled data blocks than just using holes for zero blocks.
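
To illustrate the kind of special handling meant here, a block-oriented
deduper can test for all-zero blocks before hashing, so they never end up
as one extent with a huge reference count.  This is only a minimal sketch,
not code from any real deduper; the 4096-byte block size and the names
BLOCK_SIZE, is_zero_block and plan_dedupe are assumptions made up for
the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only
_ZERO_BLOCK = bytes(BLOCK_SIZE)


def is_zero_block(block: bytes) -> bool:
    """Return True if the block contains only zero bytes."""
    return block == _ZERO_BLOCK[:len(block)]


def plan_dedupe(data: bytes):
    """Split data into blocks and classify each one.

    Returns (holes, unique, duplicates):
      holes      - offsets of all-zero blocks (candidates for hole punching)
      unique     - offsets of the first copy of each distinct non-zero block
      duplicates - (offset, offset_of_first_copy) pairs for repeated blocks
    """
    holes, first_seen, duplicates = [], {}, []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        if is_zero_block(block):
            # Zero blocks never enter the hash table, so they cannot
            # accumulate millions of references to a single extent.
            holes.append(offset)
            continue
        digest = hashlib.sha256(block).digest()
        if digest in first_seen:
            duplicates.append((offset, first_seen[digest]))
        else:
            first_seen[digest] = offset
    return holes, list(first_seen.values()), duplicates
```

A real deduper would then punch holes at the "holes" offsets and share
extents for the "duplicates" pairs; the sketch only does the
classification.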

In the bees deduper for btrfs, zero blocks get replaced with holes
unconditionally in uncompressed extents, and in compressed extents if the
extent consists entirely of zeros (a long run of zero bytes is compressed
to a few bits by all supported compression algorithms, and hole metadata
is much larger than a few bits, so no gain is possible if anything less
than the entire compressed extent is eliminated).  That behavior could
be adjusted to support this use case, as a non-default user option.
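
The rule described above can be written down as a small decision
function.  This is a sketch of the stated policy only, not the bees
implementation; the function name and the preserve_zero_blocks flag
(standing in for the non-default user option) are hypothetical.

```python
def zero_blocks_to_punch(blocks_are_zero, compressed,
                         preserve_zero_blocks=False):
    """Return indices of blocks in one extent to replace with holes.

    blocks_are_zero      - list of bools, one per block in the extent
    compressed           - True if the extent is stored compressed
    preserve_zero_blocks - hypothetical opt-in flag: keep zero blocks as
                           data so "written" stays distinguishable from
                           "hole"
    """
    if preserve_zero_blocks:
        return []
    zero_indices = [i for i, is_zero in enumerate(blocks_are_zero) if is_zero]
    if not compressed:
        # Uncompressed extent: every zero block becomes a hole.
        return zero_indices
    # Compressed extent: zeros already cost only a few bits, so a hole
    # only pays off if the whole extent can be dropped.
    if blocks_are_zero and all(blocks_are_zero):
        return zero_indices
    return []
```

For example, a compressed extent with one non-zero block is left alone,
while the same block pattern in an uncompressed extent has its zero
blocks punched out.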

For defrag a similar optimization is possible:  read a long run of
consecutive zero data blocks, write a prealloc extent.  I don't know of
anyone doing that in real life, but it would play havoc with anything
trying to store information in FIEMAP data (or related ioctls like

I think an explicit dirty-cache-data metadata structure is a good idea
despite implementation complexity.  It would eliminate dependencies on
non-portable filesystem behavior, and not abuse a facility that might
already be in active (ab)use by other existing things.  If you have
a writeback cache, you need to properly control write ordering with a
purpose-built metadata structure, or fsync() will be meaningless through
your caching layer, and after a crash you'll upload whatever confused,
delalloc-reordered, torn-written steaming crap is on the local disk to
the backing store.
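
One possible shape for such a structure, as a deliberately simplified
sketch: a separate append-only log of dirty ranges, each with a checksum
of the data, written only after the data itself has been fsync()ed.
After a crash, only ranges whose cached bytes still match their logged
checksum are candidates for upload.  Everything here (the JSON record
format, the function names, the use of SHA-256) is an assumption for
illustration, and the sketch does not handle overlapping writes or log
truncation after upload.

```python
import hashlib
import json
import os


def record_dirty_write(cache_fd, log_fd, offset, data):
    """Write data into the cache file, then log the range as dirty.

    log_fd is assumed to be opened with O_APPEND.
    """
    os.pwrite(cache_fd, data, offset)
    os.fsync(cache_fd)  # data must be durable before the log mentions it
    entry = {
        "offset": offset,
        "length": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    os.write(log_fd, (json.dumps(entry) + "\n").encode())
    os.fsync(log_fd)  # only now is the write considered dirty-and-valid


def recover_dirty_ranges(cache_fd, log_path):
    """Return (offset, length) ranges that are safe to upload after a crash."""
    good = []
    with open(log_path, "rb") as log:
        for line in log:
            try:
                entry = json.loads(line)
            except ValueError:
                break  # torn final log record: ignore it and stop
            data = os.pread(cache_fd, entry["length"], entry["offset"])
            if hashlib.sha256(data).hexdigest() == entry["sha256"]:
                good.append((entry["offset"], entry["length"]))
            # A mismatch means the cached bytes are torn or were
            # overwritten, so the range is not uploaded.
    return good
```

The point of the ordering is that the log never describes data that is
not already on disk, and the checksum catches anything that was torn or
reordered underneath it.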

> David
