Re: Problems with determining data presence by examining extents?

From: David Howells <dhowells@redhat.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: dhowells@redhat.com, linux-fsdevel@vger.kernel.org,
	viro@zeniv.linux.org.uk, hch@lst.de, tytso@mit.edu,
	adilger.kernel@dilger.ca, darrick.wong@oracle.com, clm@fb.com,
	josef@toxicpanda.com, dsterba@suse.com,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Problems with determining data presence by examining extents?
Date: Wed, 15 Jan 2020 14:50:22 +0000
Message-ID: <27263.1579099822@warthog.procyon.org.uk>
In-Reply-To: <6330a53c-781b-83d7-8293-405787979736@gmx.com>

Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

> "Unaligned" means "unaligned to fs sector size". In btrfs it's page
> size, thus it shouldn't be a problem for your 256K block size.

Cool.

> > Same answer as above.  Btw, since I'm using DIO reads and writes, would these
> > get compressed?
> 
> Yes. DIO will also be compressed unless you set the inode to nocompression.
> 
> And you may not like this btrfs internal design:
> Compressed extent can only be as large as 128K (uncompressed size).
> 
> So 256K block write will be split into 2 extents anyway.
> And since compressed extents have non-contiguous physical offsets,
> fiemap will always report two extents, even if you're always writing in
> 256K blocks.
> 
> Not sure if this matters though.

Not a problem, provided I can read them with a single DIO read.  I just need
to know whether the data is present.  I don't need to know where it is or what
hoops the filesystem goes through to get it.

> > I'm not sure this isn't the same answer as above either, except if this
> > results in parts of the file being "filled in" with blocks of zeros that I
> > haven't supplied.
> 
> For example, you have written 256K of data, all filled with 0xaa,
> and it has been committed to disk.
> The next time, you write another 256K of data, all filled with 0xaa.
> Then instead of writing this data to disk, the fs chooses to reuse
> your previously written data by reflinking to it.

That's fine as long as the filesystem says it's there when I ask for it.
Having it shared isn't a problem.

But that brings me back to the original issue and that's the potential problem
of the filesystem optimising storage by adding or removing blocks of zero
bytes.  If either of those can happen, I cannot rely on the filesystem
metadata.

> So fiemap would report that your latter 256K has the same bytenr as your
> previous 256K write (since it's reflinked), with the SHARED flag set.

It might be better for me to use SEEK_HOLE than fiemap - barring the slight
issues that SEEK_HOLE has no upper bound and that writes may be taking place
at the same time.
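
To illustrate what I mean, here is a rough userspace sketch of a per-block
presence check built on SEEK_DATA/SEEK_HOLE.  It is only an illustration: the
helper name block_present() and the BLOCK_SIZE constant are made up for this
example, and it assumes 256K-aligned blocks and a filesystem that reports
holes accurately (which is exactly the guarantee in question).  Because
SEEK_HOLE has no upper bound, the returned offset is compared against the end
of the block by hand.  It does nothing about concurrent writes.

```python
import errno
import os

BLOCK_SIZE = 256 * 1024  # block size discussed in this thread (assumed here)


def block_present(fd, index, block_size=BLOCK_SIZE):
    """Return True if the filesystem reports data for the whole block."""
    start = index * block_size
    end = start + block_size
    try:
        # First offset >= start that the filesystem reports as data.
        data = os.lseek(fd, start, os.SEEK_DATA)
    except OSError as err:
        # ENXIO: no data at or after start (start is in a trailing hole
        # or at/after EOF).
        if err.errno == errno.ENXIO:
            return False
        raise
    if data != start:
        # The block begins with a hole.
        return False
    # First hole at or after start; EOF counts as an implicit hole.
    # SEEK_HOLE cannot be bounded, so check against the block end here.
    hole = os.lseek(fd, start, os.SEEK_HOLE)
    return hole >= end
```

A filesystem that does not track holes will report everything below EOF as
data, so a True result from this check is only as trustworthy as the
filesystem's metadata.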

David