Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
From: David Howells <dhowells@redhat.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: dhowells@redhat.com, linux-fsdevel@vger.kernel.org,
	viro@zeniv.linux.org.uk, hch@lst.de, tytso@mit.edu,
	adilger.kernel@dilger.ca, darrick.wong@oracle.com, clm@fb.com,
	josef@toxicpanda.com, dsterba@suse.com,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Problems with determining data presence by examining extents?
Date: Wed, 15 Jan 2020 14:50:22 +0000
Message-ID: <27263.1579099822@warthog.procyon.org.uk> (raw)
In-Reply-To: <6330a53c-781b-83d7-8293-405787979736@gmx.com>

Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

> "Unaligned" means "unaligned to fs sector size". In btrfs it's page
> size, thus it shouldn't be a problem for your 256K block size.

Cool.

> > Same answer as above.  Btw, since I'm using DIO reads and writes, would these
> > get compressed?
> 
> Yes. DIO will also be compressed unless you set the inode to nocompression.
> 
> And you may not like this btrfs internal design:
> Compressed extent can only be as large as 128K (uncompressed size).
> 
> So 256K block write will be split into 2 extents anyway.
> And since compressed extent will cause non-continuous physical offset,
> it will always be two extents to fiemap, even you're always writing in
> 256K block size.
> 
> Not sure if this matters though.

Not a problem, provided I can read them with a single DIO read.  I just need
to know whether the data is present.  I don't need to know where it is or what
hoops the filesystem goes through to get it.

> > I'm not sure this isn't the same answer as above either, except if this
> > results in parts of the file being "filled in" with blocks of zeros that I
> > haven't supplied.
> 
> The example would be, you have written 256K data, all filled with 0xaa.
> And it committed to disk.
> Then the next time you write another 256K data, all filled with 0xaa.
> Then instead of writing this data onto disk, the fs chooses to reuse
> your previous written data, doing a reflink to it.

That's fine as long as the filesystem says it's there when I ask for it.
Having it shared isn't a problem.

But that brings me back to the original issue and that's the potential problem
of the filesystem optimising storage by adding or removing blocks of zero
bytes.  If either of those can happen, I cannot rely on the filesystem
metadata.

> So fiemap would report your latter 256K has the same bytenr of your
> previous 256K write (since it's reflinked), and with SHARED flag.

It might be better for me to use SEEK_HOLE than fiemap - barring the slight
issues that SEEK_HOLE has no upper bound and that writes may be taking place
at the same time.

David


  parent reply index

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-14 16:48 David Howells
2020-01-14 22:49 ` Theodore Y. Ts'o
2020-01-15  3:54 ` Qu Wenruo
2020-01-15 12:46   ` Andreas Dilger
2020-01-15 13:10     ` Qu Wenruo
2020-01-15 13:31       ` Christoph Hellwig
2020-01-15 19:48         ` Andreas Dilger
2020-01-16 10:16           ` Christoph Hellwig
2020-01-15 20:55         ` David Howells
2020-01-15 22:11           ` Andreas Dilger
2020-01-15 23:09           ` David Howells
2020-01-26 18:19             ` Zygo Blaxell
2020-01-15 14:35       ` David Howells
2020-01-15 14:48         ` Christoph Hellwig
2020-01-15 14:59         ` David Howells
2020-01-16 10:13           ` Christoph Hellwig
2020-01-17 16:43           ` David Howells
2020-01-15 14:20   ` David Howells
2020-01-15  8:38 ` Christoph Hellwig
2020-01-15 13:50 ` David Howells
2020-01-15 14:05 ` David Howells
2020-01-15 14:24   ` Qu Wenruo
2020-01-15 14:50   ` David Howells [this message]
2020-01-15 14:15 ` David Howells

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=27263.1579099822@warthog.procyon.org.uk \
    --to=dhowells@redhat.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=clm@fb.com \
    --cc=darrick.wong@oracle.com \
    --cc=dsterba@suse.com \
    --cc=hch@lst.de \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git