linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: dhowells@redhat.com, linux-fsdevel@vger.kernel.org,
	viro@zeniv.linux.org.uk, hch@lst.de, tytso@mit.edu,
	adilger.kernel@dilger.ca, darrick.wong@oracle.com, clm@fb.com,
	josef@toxicpanda.com, dsterba@suse.com,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Problems with determining data presence by examining extents?
Date: Wed, 15 Jan 2020 14:05:03 +0000	[thread overview]
Message-ID: <23358.1579097103@warthog.procyon.org.uk> (raw)
In-Reply-To: <00fc7691-77d5-5947-5493-5c97f262da81@gmx.com>

Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

> At least for btrfs, only unaligned extents get padding zeros.

What is "unaligned" defined as?  The revised cachefiles reads and writes 256k
blocks, except for the last - which gets rounded up to the nearest page (which
I'm assuming will be some multiple of the direct-I/O granularity).  The actual
size of the data is noted in an xattr so I don't need to rely on the size of
the cachefile.

> (c): A multi-device fs (btrfs) can have their own logical address mapping.
> Meaning the bytenr returned makes no sense to end user, unless used for
> that fs specific address space.

For the purpose of cachefiles, I don't care where it is, only whether or not
it exists.  Further, if a DIO read will return a short read when it hits a
hole, then I only really care about detecting whether the first byte exists in
the block.

It might be cheaper, I suppose, to initiate the read and have it fail
immediately if no data at all is present in the block than to do a separate
ask of the filesystem.

> You won't like this case either.
> (d): Compressed extents
> One compressed extent can represents more data than its on-disk size.

Same answer as above.  Btw, since I'm using DIO reads and writes, would these
get compressed?

> And even more bad news:
> (e): write time dedupe
> Although no fs known has implemented it yet (btrfs used to try to
> support that, and I guess XFS could do it in theory too), you won't
> known when a fs could get such "awesome" feature.

I'm not sure this isn't the same answer as above either, except if this
results in parts of the file being "filled in" with blocks of zeros that I
haven't supplied.  Couldn't this be disabled on an inode-by-inode basis, say
with an ioctl?

> > Without being able to trust the filesystem to tell me accurately what I've
> > written into it, I have to use some other mechanism.  Currently, I've
> > switched to storing a map in an xattr with 1 bit per 256k block, but that
> > gets hard to use if the file grows particularly large and also has
> > integrity consequences - though those are hopefully limited as I'm now
> > using DIO to store data into the cache.
> 
> Would you like to explain why you want to know such fs internal info?

As Andreas pointed out, fscache+cachefiles is used to cache data locally for
network filesystems (9p, afs, ceph, cifs, nfs).  Cached files may be sparse,
with unreferenced blocks not currently stored in the cache.

I'm attempting to move to a model where I don't use bmap and don't monitor
bit-waitqueues to find out when page flags flip on backing files so that I can
copy data out, but rather use DIO directly to/from the network filesystem
inode pages.

Since the backing filesystem has to keep track of whether data is stored in a
file, it would seem a shame to have to maintain a parallel copy on the same
medium, with the coherency issues that entail.

David



  parent reply	other threads:[~2020-01-15 14:05 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-14 16:48 Problems with determining data presence by examining extents? David Howells
2020-01-14 22:49 ` Theodore Y. Ts'o
2020-01-15  3:54 ` Qu Wenruo
2020-01-15 12:46   ` Andreas Dilger
2020-01-15 13:10     ` Qu Wenruo
2020-01-15 13:31       ` Christoph Hellwig
2020-01-15 19:48         ` Andreas Dilger
2020-01-16 10:16           ` Christoph Hellwig
2020-01-15 20:55         ` David Howells
2020-01-15 22:11           ` Andreas Dilger
2020-01-15 23:09           ` David Howells
2020-01-26 18:19             ` Zygo Blaxell
2020-01-15 14:35       ` David Howells
2020-01-15 14:48         ` Christoph Hellwig
2020-01-15 14:59         ` David Howells
2020-01-16 10:13           ` Christoph Hellwig
2020-01-17 16:43           ` David Howells
2020-01-15 14:20   ` David Howells
2020-01-15  8:38 ` Christoph Hellwig
2020-01-15 13:50 ` David Howells
2020-01-15 14:05 ` David Howells [this message]
2020-01-15 14:24   ` Qu Wenruo
2020-01-15 14:50   ` David Howells
2020-01-15 14:15 ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=23358.1579097103@warthog.procyon.org.uk \
    --to=dhowells@redhat.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=clm@fb.com \
    --cc=darrick.wong@oracle.com \
    --cc=dsterba@suse.com \
    --cc=hch@lst.de \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).