Linux-ext4 Archive on lore.kernel.org
From: David Howells <dhowells@redhat.com>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: dhowells@redhat.com, linux-fsdevel@vger.kernel.org,
	viro@zeniv.linux.org.uk, hch@lst.de, adilger.kernel@dilger.ca,
	darrick.wong@oracle.com, clm@fb.com, josef@toxicpanda.com,
	dsterba@suse.com, linux-ext4@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: Problems with determining data presence by examining extents?
Date: Wed, 15 Jan 2020 13:50:11 +0000
Message-ID: <22056.1579096211@warthog.procyon.org.uk>
In-Reply-To: <20200114224917.GA165687@mit.edu>

Theodore Y. Ts'o <tytso@mit.edu> wrote:

> but I'm not sure we would want to make any guarantees with respect to (b).

Um.  That would potentially make disconnected operation problematic.  Now,
it's unlikely that I'll want to store a 256KiB block of zeros, but not
impossible.

> I suspect I understand why you want this; I've fielded some requests
> from people wanting to do something very like this at $WORK, for what I
> assume to be the same reason you're seeking to do this: to do
> incremental caching of files, letting the file system track what
> has and hasn't been cached yet.

Exactly so.  If I can't tap into the filesystem's own map of what data is
present in a file, then I have to do it myself in parallel.  Keeping my own
list or map has a number of issues:

 (1) It's redundant.  I have to maintain a second copy of what the filesystem
     already maintains.  This uses extra space.

 (2) My map may get out of step with the filesystem after a crash.  The
     filesystem has tools to deal with this in its own structures.

 (3) If the file is very large and sparse, then keeping a bit-per-block map in
     a single xattr may not suffice or may become unmanageable.  There's a
     limit of 64k on an xattr value, which for bit-per-256k limits the maximum
     mappable size to 128GiB (I could use multiple xattrs, but some
     filesystems may have total xattr limits) and whatever the size, I need a
     single buffer big enough to hold it.

     I could use a second file as a metadata cache - but that has worse
     coherency properties.  (As I understand it, setxattr is synchronous and
     journalled.)
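
A quick check of the arithmetic in (3), as a sketch only.  It assumes the 64k
limit means 65536 bytes of xattr value and that one bit covers one 256KiB
block; the function names are made up for illustration.  Under those
assumptions a single xattr covers 128GiB, and a 1TiB file would need a 512KiB
map.

```python
XATTR_VALUE_MAX = 64 * 1024   # bytes; assumed per-xattr value limit ("64k")
BLOCK_SIZE = 256 * 1024       # bytes of file data tracked by one bit


def max_mappable_bytes(xattr_bytes=XATTR_VALUE_MAX, block_size=BLOCK_SIZE):
    """Largest file size a one-bit-per-block map in one xattr can cover."""
    return xattr_bytes * 8 * block_size


def map_bytes_needed(file_size, block_size=BLOCK_SIZE):
    """Bytes of bitmap needed to cover file_size, rounding up."""
    blocks = -(-file_size // block_size)   # ceiling division
    return -(-blocks // 8)


if __name__ == "__main__":
    print(max_mappable_bytes() // 2**30, "GiB per xattr")
    print(map_bytes_needed(2**40) // 1024, "KiB of map for a 1TiB file")
```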

> If we were going to add such a facility, what we could perhaps do is
> to define a new flag indicating that a particular file should have no
> extent mapping optimization applied, such that FIEMAP would return a
> mapping if and only if userspace had written to a particular block, or
> had requested that a block be preallocated using fallocate().  The
> flag could only be set on a zero-length file, and this might disable
> certain advanced file system features, such as reflink, at the file
> system's discretion; and there might be unspecified performance
> impacts if this flag is set on a file.

That would be fine for cachefiles.

Also, I don't need to know *where* the data is, only that the first byte of my
block exists.  It would be enough if a DIO read returned short when it reached
a hole.
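
For illustration, the nearest thing userspace has today is probing with
lseek(SEEK_DATA).  The sketch below is not the short-read behaviour described
above, and it is subject to the very problem this thread is about: the
filesystem may report data where it has zero-filled, or a hole where zeros
were written.  The helper name and the 256KiB default are assumptions made
for the example.

```python
import errno
import os

BLOCK_SIZE = 256 * 1024  # assumed cache block size


def first_byte_present(fd, block_index, block_size=BLOCK_SIZE):
    """Return True if the filesystem reports data at the block's first byte.

    SEEK_DATA returns the offset of the next data at or after the given
    offset, so the first byte is reported present only if the result
    equals the offset we asked about.
    """
    offset = block_index * block_size
    try:
        next_data = os.lseek(fd, offset, os.SEEK_DATA)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            # No data at or beyond this offset (trailing hole or past EOF).
            return False
        raise
    return next_data == offset
```

Note that a filesystem without hole support may treat the whole file as data,
in which case this reports every block inside the file as present.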

David


