Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
From: Andreas Dilger <adilger@dilger.ca>
To: David Howells <dhowells@redhat.com>, Christoph Hellwig <hch@lst.de>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	"Theodore Y. Ts'o" <tytso@mit.edu>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Problems with determining data presence by examining extents?
Date: Wed, 15 Jan 2020 12:48:44 -0700
Message-ID: <C0F67EC5-7B5D-4179-9F28-95B84D9CC326@dilger.ca> (raw)
In-Reply-To: <20200115133101.GA28583@lst.de>

[-- Attachment #1: Type: text/plain, Size: 2714 bytes --]

On Jan 15, 2020, at 6:31 AM, Christoph Hellwig <hch@lst.de> wrote:
> 
> On Wed, Jan 15, 2020 at 09:10:44PM +0800, Qu Wenruo wrote:
>>> That allows userspace to distinguish fe_physical addresses that may be
>>> on different devices.  This isn't in the kernel yet, since it is mostly
>>> useful only for Btrfs and nobody has implemented it there.  I can give
>>> you details if working on this for Btrfs is of interest to you.
>> 
>> IMHO it's not good enough.
>> 
>> The concern is, one extent can exist on multiple devices (mirrors for
>> RAID1/RAID10/RAID1C2/RAID1C3, or stripes for RAID5/6).
>> I didn't see how it can be easily implemented even with extra fields.
>> 
>> And even we implement it, it can be too complex or bug prune to fill
>> per-device info.
> 
> It's also completely bogus for the use cases to start with.  fiemap
> is a debug tool reporting the file system layout.  Using it for anything
> related to actual data storage and data integrity is a receipe for
> disaster.  As said the right thing for the use case would be something
> like the NFS READ_PLUS operation.  If we can't get that easily it can
> be emulated using lseek SEEK_DATA / SEEK_HOLE assuming no other thread
> could be writing to the file, or the raciness doesn't matter.

I don't think either of those will be any better than FIEMAP, if the reason
is that the underlying filesystem is filling in holes with actual data
blocks to optimize the IO pattern.  SEEK_HOLE would not find a hole in
the block allocation, and would happily return the block of zeroes to
the caller.  Also, it isn't clear if SEEK_HOLE considers an allocated but
unwritten extent to be a hole or a block?

I think what is needed here is an fadvise/ioctl that tells the filesystem
"don't allocate blocks unless actually written" for that file.  Storing
anything in a separate data structure is a recipe for disaster, since it
will become inconsistent after a crash, or filesystem corruption+e2fsck,
and will unnecessarily bloat the on-disk metadata for every file to hold
redundant information.

I don't see COW/reflink/compression as being a problem in this case, since
what cachefiles cares about is whether there is _any_ data for a given
logical offset, not where/how the data is stored.  IF FIEMAP was used for
a btrfs backing filesystem, it would need the "EXTENT_DATA_COMPRESSED"
feature to be implemented as well, so that it can distinguish the logical
vs. physical allocations.  I don't think that would be needed for SEEK_HOLE
and SEEK_DATA, so long as they handle unwritten extents properly (and are
correctly implemented in the first place, some filesystems fall back to
always returning the next block for SEEK_DATA).

Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

  reply index

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-14 16:48 David Howells
2020-01-14 22:49 ` Theodore Y. Ts'o
2020-01-15  3:54 ` Qu Wenruo
2020-01-15 12:46   ` Andreas Dilger
2020-01-15 13:10     ` Qu Wenruo
2020-01-15 13:31       ` Christoph Hellwig
2020-01-15 19:48         ` Andreas Dilger [this message]
2020-01-16 10:16           ` Christoph Hellwig
2020-01-15 20:55         ` David Howells
2020-01-15 22:11           ` Andreas Dilger
2020-01-15 23:09           ` David Howells
2020-01-26 18:19             ` Zygo Blaxell
2020-01-15 14:35       ` David Howells
2020-01-15 14:48         ` Christoph Hellwig
2020-01-15 14:59         ` David Howells
2020-01-16 10:13           ` Christoph Hellwig
2020-01-17 16:43           ` David Howells
2020-01-15 14:20   ` David Howells
2020-01-15  8:38 ` Christoph Hellwig
2020-01-15 13:50 ` David Howells
2020-01-15 14:05 ` David Howells
2020-01-15 14:24   ` Qu Wenruo
2020-01-15 14:50   ` David Howells
2020-01-15 14:15 ` David Howells

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=C0F67EC5-7B5D-4179-9F28-95B84D9CC326@dilger.ca \
    --to=adilger@dilger.ca \
    --cc=clm@fb.com \
    --cc=darrick.wong@oracle.com \
    --cc=dhowells@redhat.com \
    --cc=dsterba@suse.com \
    --cc=hch@lst.de \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git