Linux-ext4 Archive on lore.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: David Howells <dhowells@redhat.com>,
	linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
	hch@lst.de, tytso@mit.edu, adilger.kernel@dilger.ca,
	darrick.wong@oracle.com, clm@fb.com, josef@toxicpanda.com,
	dsterba@suse.com
Cc: linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Problems with determining data presence by examining extents?
Date: Wed, 15 Jan 2020 11:54:20 +0800
Message-ID: <00fc7691-77d5-5947-5493-5c97f262da81@gmx.com> (raw)
In-Reply-To: <4467.1579020509@warthog.procyon.org.uk>

On 2020/1/15 12:48 AM, David Howells wrote:
> Again with regard to my rewrite of fscache and cachefiles:
> 
> 	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
> 
> I've got rid of my use of bmap()!  Hooray!
> 
> However, I'm informed that I can't trust the extent map of a backing file to
> tell me accurately whether content exists in a file because:
> 
>  (a) Not-quite-contiguous extents may be joined by insertion of blocks of
>      zeros by the filesystem optimising itself.  This would give me a false
>      positive when trying to detect the presence of data.

At least for btrfs, only unaligned extents get padding zeros.

But I guess other fs could do whatever they want to optimize themselves.

> 
>  (b) Blocks of zeros that I write into the file may get punched out by
>      filesystem optimisation since a read back would be expected to read zeros
>      there anyway, provided it's below the EOF.  This would give me a false
>      negative.

I know some qemu disk backends have such zero detection.
But btrfs does not, so this is another per-fs behavior.
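
To make (a) and (b) concrete, here is a minimal userspace sketch of this kind
of probing. It assumes lseek() with SEEK_DATA/SEEK_HOLE on Linux rather than
the in-kernel path cachefiles would actually use; the helper names and the
256k granule constant are made up for illustration.

```python
import errno
import os
import tempfile

BLOCK = 256 * 1024  # hypothetical cache granule, matching the 256k blocks above


def data_ranges(fd, size):
    """Return [(start, end), ...] byte ranges the filesystem reports as data."""
    ranges = []
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # no data between offset and EOF
                break
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)  # EOF counts as a hole
        ranges.append((start, end))
        offset = end
    return ranges


def demo():
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.ftruncate(fd, 4 * BLOCK)             # four granules, all holes
        os.pwrite(fd, b"x" * BLOCK, 2 * BLOCK)  # real data in granule 2
        os.pwrite(fd, bytes(BLOCK), 0)          # explicitly written zeros
        ranges = data_ranges(fd, 4 * BLOCK)
        hole = os.pread(fd, BLOCK, 1 * BLOCK)   # granule 1 was never written
        return ranges, hole
```

Whether granule 0 (the explicit zeros) or granule 1 (never written) shows up
in the returned ranges is up to the filesystem, while both read back as
zeros. That is why neither the reported ranges nor the contents can tell
"written as zeros" apart from "never written".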

And problem (c):

(c): A multi-device fs (btrfs) can have its own logical address mapping.
This means the bytenr returned makes no sense to the end user unless it is
used within that fs-specific address space.

This is even trickier when considering single-device btrfs.
It still uses the same logical address space, just like any multi-device
btrfs.

And it is completely possible for a single 1T btrfs to have logical
addresses mapped beyond 10T or even more. (Any aligned bytenr in the range
[0, U64_MAX) is a valid btrfs logical address.)
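
A toy illustration of (c), assuming a made-up chunk table; this is not the
real btrfs chunk tree format, and every name and number in it is
hypothetical. It only shows how a logical address above 10T can live on a
device that is far smaller.

```python
from bisect import bisect_right

TIB = 1 << 40
GIB = 1 << 30

# Hypothetical chunk table: (logical_start, length, device_offset).
# Logical addresses start at 10T although the device could be only 1T.
CHUNKS = [
    (10 * TIB, GIB, 0),
    (10 * TIB + GIB, GIB, 5 * GIB),
]


def logical_to_physical(logical):
    """Map a logical bytenr to a device offset, or None if it is unmapped."""
    starts = [chunk[0] for chunk in CHUNKS]
    index = bisect_right(starts, logical) - 1
    if index < 0:
        return None  # below the first chunk
    start, length, device_offset = CHUNKS[index]
    if logical >= start + length:
        return None  # falls in a gap past the end of that chunk
    return device_offset + (logical - start)
```

A bytenr reported by the fs is only meaningful after a lookup like this, and
that lookup is private to the filesystem.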


You won't like this case either.
(d): Compressed extents
One compressed extent can represent more data than its on-disk size.

Furthermore, the current fiemap interface hasn't considered this case, so
it only reports the in-memory size (aka the file size), with no way to
represent the on-disk size.
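
For reference, a small sketch that decodes one struct fiemap_extent record
as laid out in <linux/fiemap.h>. The sample values used with it are invented;
the point is that the record carries a single fe_length field, so there is
nowhere to put a separate on-disk size.

```python
import struct

# struct fiemap_extent from <linux/fiemap.h>:
#   __u64 fe_logical, fe_physical, fe_length, fe_reserved64[2];
#   __u32 fe_flags, fe_reserved[3];
FIEMAP_EXTENT = struct.Struct("=QQQ2QI3I")
FIEMAP_EXTENT_ENCODED = 0x00000008  # data is encoded (e.g. compressed)


def describe(raw):
    """Decode one fiemap_extent record into a dict of the fields that matter."""
    fields = FIEMAP_EXTENT.unpack(raw)
    fe_logical, fe_physical, fe_length = fields[0:3]
    fe_flags = fields[5]
    return {
        "file_offset": fe_logical,
        "fs_address": fe_physical,
        "length": fe_length,  # the only size field in the record
        "encoded": bool(fe_flags & FIEMAP_EXTENT_ENCODED),
    }


# Invented sample: a 128k extent flagged as encoded.
sample = FIEMAP_EXTENT.pack(0, 10 << 40, 128 * 1024, 0, 0,
                            FIEMAP_EXTENT_ENCODED, 0, 0, 0)
info = describe(sample)
```

A caller can see that the extent is encoded, but not how many bytes it
actually occupies on disk.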


And even more bad news:
(e): Write-time dedupe
Although no known fs has implemented it yet (btrfs used to try to
support it, and I guess XFS could do it in theory too), you won't
know when a fs might get such an "awesome" feature.

With it, your write may be checked and never reach disk if it matches
existing extents.

This is a little like the zero-detection auto-punch.

> 
> Is there some setting I can use to prevent these scenarios on a file - or can
> one be added?

I guess not.

> 
> Without being able to trust the filesystem to tell me accurately what I've
> written into it, I have to use some other mechanism.  Currently, I've switched
> to storing a map in an xattr with 1 bit per 256k block, but that gets hard to
> use if the file grows particularly large and also has integrity consequences -
> though those are hopefully limited as I'm now using DIO to store data into the
> cache.

Would you like to explain why you want to know such fs internal info?

Thanks,
Qu
> 
> If it helps, I'm downloading data in aligned 256k blocks and storing data in
> those same aligned 256k blocks, so if that makes it easier...
> 
> David
> 
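
The xattr bitmap David describes (1 bit per 256k block) can be sketched
roughly as below. The bit order and helper names are assumptions, not taken
from his patches. XATTR_SIZE_MAX is the VFS-wide limit from <linux/limits.h>;
individual filesystems may allow less.

```python
BLOCK = 256 * 1024      # one bit tracks one aligned 256k block
XATTR_SIZE_MAX = 65536  # VFS upper bound on a single xattr value


def bitmap_bytes(file_size):
    """Bytes of bitmap needed to cover file_size at 1 bit per block."""
    blocks = -(-file_size // BLOCK)  # ceiling division
    return -(-blocks // 8)


def set_present(bitmap, offset):
    """Mark the block containing offset as present (LSB-first, an assumption)."""
    block = offset // BLOCK
    bitmap[block // 8] |= 1 << (block % 8)


def is_present(bitmap, offset):
    """Report whether the block containing offset has been marked present."""
    block = offset // BLOCK
    return bool(bitmap[block // 8] & (1 << (block % 8)))


# Largest file a single maximum-size xattr could describe.
MAX_TRACKED = XATTR_SIZE_MAX * 8 * BLOCK
```

Under these assumptions a 1 TiB file needs a 512 KiB bitmap, while one
maximum-size xattr covers at most 128 GiB, which matches the concern that
the map gets hard to use once the file grows large.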



