Linux-Fsdevel Archive on lore.kernel.org
* Strange SEEK_HOLE / SEEK_DATA behavior
@ 2020-10-26 14:57 Jan Kara
  2020-10-26 15:14 ` Matthew Wilcox
From: Jan Kara @ 2020-10-26 14:57 UTC
  To: linux-fsdevel; +Cc: linux-ext4, linux-xfs, Matthew Wilcox

Hello!

When reviewing Matthew's THP patches I've noticed one odd behavior which
got copied from current iomap seek hole/data helpers. Currently we have:

# fallocate -l 4096 testfile
# xfs_io -x -c "seek -h 0" testfile
Whence	Result
HOLE	0
# dd if=testfile bs=4096 count=1 of=/dev/null
# xfs_io -x -c "seek -h 0" testfile
Whence	Result
HOLE	4096

So once we read from an unwritten extent, the areas with cached pages
suddenly become treated as data. Later, when the pages get evicted, they
are treated as holes again. Strictly speaking I wouldn't say this is a bug,
since nobody promises we won't treat holes as data, but it looks weird.
Shouldn't we still treat clean pages over unwritten extents as holes and
only treat a page as data once it becomes dirty? What do other people
think?
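
For anyone who wants to try the sequence above without xfs_io, here is a
minimal sketch using lseek(2) from Python. The names first_hole() and demo()
are made up for illustration, and os.posix_fallocate() only approximates
`fallocate -l 4096` (glibc may fall back to writing zeroes where the
filesystem has no fallocate support). What the two values turn out to be
depends on the filesystem and kernel, so none are asserted here.

```python
import os
import tempfile


def first_hole(path, start=0):
    """Return the offset lseek(SEEK_HOLE) reports at or after `start`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, start, os.SEEK_HOLE)
    finally:
        os.close(fd)


def demo(directory="."):
    """Preallocate 4096 bytes, then query SEEK_HOLE before and after a read.

    Returns (before, after). `directory` picks the filesystem under test.
    """
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        # Roughly `fallocate -l 4096 testfile`.
        os.posix_fallocate(f.fileno(), 0, 4096)
        before = first_hole(f.name)
        # Buffered read to populate the page cache, like the dd step.
        with open(f.name, "rb") as reader:
            reader.read(4096)
        after = first_hole(f.name)
        return before, after
```

On a filesystem and kernel showing the behavior described here, demo() would
be expected to mirror the two xfs_io results above.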

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: Strange SEEK_HOLE / SEEK_DATA behavior
  2020-10-26 14:57 Strange SEEK_HOLE / SEEK_DATA behavior Jan Kara
@ 2020-10-26 15:14 ` Matthew Wilcox
  2020-10-26 16:48   ` Jan Kara
From: Matthew Wilcox @ 2020-10-26 15:14 UTC
  To: Jan Kara; +Cc: linux-fsdevel, linux-ext4, linux-xfs

On Mon, Oct 26, 2020 at 03:57:10PM +0100, Jan Kara wrote:
> Hello!
> 
> When reviewing Matthew's THP patches I've noticed one odd behavior which
> got copied from current iomap seek hole/data helpers. Currently we have:
> 
> # fallocate -l 4096 testfile
> # xfs_io -x -c "seek -h 0" testfile
> Whence	Result
> HOLE	0
> # dd if=testfile bs=4096 count=1 of=/dev/null
> # xfs_io -x -c "seek -h 0" testfile
> Whence	Result
> HOLE	4096
> 
> So once we read from an unwritten extent, the areas with cached pages
> suddenly become treated as data. Later, when the pages get evicted, they
> are treated as holes again. Strictly speaking I wouldn't say this is a bug,
> since nobody promises we won't treat holes as data, but it looks weird.
> Shouldn't we still treat clean pages over unwritten extents as holes and
> only treat a page as data once it becomes dirty? What do other people
> think?

I think we actually discussed this recently.  Unless I misunderstood
one or both messages:

https://lore.kernel.org/linux-fsdevel/20201014223743.GD7391@dread.disaster.area/

I agree it's not great, but I'm not sure it's worth getting it "right"
by tracking whether a page contains only zeroes.

I have been vaguely thinking about optimising for read-mostly workloads
on sparse files by storing a magic entry that means "use the zero
page" in the page cache instead of a page, like DAX does (only better).
It hasn't risen to the top of my list yet.  Does anyone have a workload
that would benefit from it?

(I don't mean "can anybody construct one"; that's trivially possible.
I mean, do any customers care about the performance of that workload?)
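
A toy model of the "magic entry" idea, purely to make it concrete. This is
not how the kernel page cache or DAX is implemented; ToyPageCache,
ZERO_ENTRY, is_hole and read_block are all hypothetical names invented for
this sketch. The point is only that a hole can be remembered by a marker
instead of a freshly allocated page of zeroes.

```python
PAGE_SIZE = 4096
ZERO_PAGE = bytes(PAGE_SIZE)   # one shared, read-only page of zeroes
ZERO_ENTRY = object()          # hypothetical marker: "use ZERO_PAGE here"


class ToyPageCache:
    """Maps page index -> page contents, or ZERO_ENTRY for holes."""

    def __init__(self, is_hole, read_block):
        self.entries = {}
        self.is_hole = is_hole        # callback: index -> bool (assumed)
        self.read_block = read_block  # callback: index -> bytes (assumed)

    def read_page(self, index):
        entry = self.entries.get(index)
        if entry is None:
            # Cache miss: store the marker for a hole, real contents otherwise.
            if self.is_hole(index):
                entry = ZERO_ENTRY
            else:
                entry = self.read_block(index)
            self.entries[index] = entry
        return ZERO_PAGE if entry is ZERO_ENTRY else entry

    def allocated_pages(self):
        """Count entries that hold real page contents rather than the marker."""
        return sum(1 for e in self.entries.values() if e is not ZERO_ENTRY)
```

In this model, reading a mostly sparse file fills the cache with markers and
allocates page contents only for the indexes that actually hold data.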

* Re: Strange SEEK_HOLE / SEEK_DATA behavior
  2020-10-26 15:14 ` Matthew Wilcox
@ 2020-10-26 16:48   ` Jan Kara
From: Jan Kara @ 2020-10-26 16:48 UTC
  To: Matthew Wilcox; +Cc: Jan Kara, linux-fsdevel, linux-ext4, linux-xfs

On Mon 26-10-20 15:14:04, Matthew Wilcox wrote:
> On Mon, Oct 26, 2020 at 03:57:10PM +0100, Jan Kara wrote:
> > Hello!
> > 
> > When reviewing Matthew's THP patches I've noticed one odd behavior which
> > got copied from current iomap seek hole/data helpers. Currently we have:
> > 
> > # fallocate -l 4096 testfile
> > # xfs_io -x -c "seek -h 0" testfile
> > Whence	Result
> > HOLE	0
> > # dd if=testfile bs=4096 count=1 of=/dev/null
> > # xfs_io -x -c "seek -h 0" testfile
> > Whence	Result
> > HOLE	4096
> > 
> > So once we read from an unwritten extent, the areas with cached pages
> > suddenly become treated as data. Later, when the pages get evicted, they
> > are treated as holes again. Strictly speaking I wouldn't say this is a bug,
> > since nobody promises we won't treat holes as data, but it looks weird.
> > Shouldn't we still treat clean pages over unwritten extents as holes and
> > only treat a page as data once it becomes dirty? What do other people
> > think?
> 
> I think we actually discussed this recently.  Unless I misunderstood
> one or both messages:
> 
> https://lore.kernel.org/linux-fsdevel/20201014223743.GD7391@dread.disaster.area/

Thanks for the link. That indeed explains it: the concern is that if we
checked PageDirty as I suggested, it would be racy (the page could have
been written out just before we found it but after we received the block
mapping from the filesystem). So using PageUptodate is less racy (although
still somewhat racy, because the page could also be reclaimed).
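
The race can be illustrated with a toy model. This is a sketch of the
reasoning only, not the iomap code: Page, is_data() and the `test` parameter
are invented for illustration, and `extent_unwritten` stands for the
(possibly stale) block mapping obtained before the page was looked up.

```python
from dataclasses import dataclass


@dataclass
class Page:
    uptodate: bool = False
    dirty: bool = False


def is_data(extent_unwritten, page, test="uptodate"):
    """Classify one page-sized range as data (True) or hole (False).

    `extent_unwritten` is the mapping as seen earlier; `page` is the cached
    page found afterwards, or None if nothing is cached.
    """
    if not extent_unwritten:
        return True            # written extent: always data
    if page is None:
        return False           # unwritten extent, nothing cached: hole
    # Unwritten extent with a cached page: the choice of flag decides.
    return page.dirty if test == "dirty" else page.uptodate


# Page was dirtied with real data, then written back after the mapping was
# fetched: the mapping still says "unwritten", but the page is now clean.
written_back = Page(uptodate=True, dirty=False)
dirty_test_result = is_data(True, written_back, test="dirty")        # False
uptodate_test_result = is_data(True, written_back, test="uptodate")  # True
```

In this model the PageDirty-style test reports real data as a hole, while
the PageUptodate-style test errs the other way and reports a clean,
never-written page as data, which is the behavior seen in the original
report.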

> I agree it's not great, but I'm not sure it's worth getting it "right"
> by tracking whether a page contains only zeroes.

Yeah, I don't think it's worth it just for this.

> I have been vaguely thinking about optimising for read-mostly workloads
> on sparse files by storing a magic entry that means "use the zero
> page" in the page cache instead of a page, like DAX does (only better).
> It hasn't risen to the top of my list yet.  Does anyone have a workload
> that would benefit from it?
> 
> (I don't mean "can anybody construct one"; that's trivially possible.
> I mean, do any customers care about the performance of that workload?)

No workload comes to my mind now.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
