Linux-ext4 Archive on
From: Dave Chinner <>
To: Matthew Wilcox <>
Cc: Jan Kara <>, Ted Tso <>,
	Christoph Hellwig <>,
	Amir Goldstein <>
Subject: Re: [PATCH 0/7 RFC v3] fs: Hole punch vs page cache filling races
Date: Wed, 21 Apr 2021 08:12:55 +1000
Message-ID: <20210420221255.GX63242@dread.disaster.area>
In-Reply-To: <>

On Mon, Apr 19, 2021 at 04:20:08PM +0100, Matthew Wilcox wrote:
> On Tue, Apr 13, 2021 at 01:28:44PM +0200, Jan Kara wrote:
> > Also when writing the documentation I came across one question: Do we mandate
> > i_mapping_sem for truncate + hole punch for all filesystems or just for
> > filesystems that support hole punching (or other complex fallocate operations)?
> > I wrote the documentation so that we require every filesystem to use
> > i_mapping_sem. This makes locking rules simpler, we can also add asserts when
> > all filesystems are converted. The downside is that simple filesystems now pay
> > the overhead of the locking unnecessary for them. The overhead is small
> > (uncontended rwsem acquisition for truncate) so I don't think we care and the
> > simplicity is worth it but I wanted to spell this out.
> What do we actually get in return for supporting these complex fallocate
> operations?  Someone added them for a reason, but does that reason
> actually benefit me?  Other than running xfstests, how many times has
> holepunch been called on your laptop in the last week?

Quite a lot, actually.

> I don't want to
> incur even one extra instruction per I/O operation to support something
> that happens twice a week; that's a bad tradeoff.

Hole punching gets into all sorts of interesting places. For
example, did you know that issuing fstrim (discards) or "write
zeroes" on a file-backed loopback device will issue hole punches to
the underlying file? nvmet does the same. Userspace iscsi server
implementations (e.g. TGT) do the same thing and have for a long
time. NFSv4 servers issue hole punching based on client side
requests, too.

Then there's Kubernetes management tools. Samba. Qemu. Libvirt.
Mysql. Network-Manager. Gluster. Chromium. RocksDB. Swift. Systemd.
The list of core system infrastructure we have that uses hole
punching is quite large...

So, really, hole punching is something that happens a lot and in
many unexpected places. You can argue that your laptop doesn't use
it, but that really doesn't matter in the bigger scheme of things.
Hole punching is something applications expect to work and not
corrupt data....
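
For readers who have not seen one, the hole punches discussed above
typically reach the filesystem as an fallocate(2) call with
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE. The following is a minimal
userspace sketch, not code from any of the projects named above; the
helper names punch_hole() and punch_hole_demo() and the temporary file
path are made up for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* fallocate(), FALLOC_FL_* (needs _GNU_SOURCE) */
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Deallocate [offset, offset + len) of an open file without changing its
 * size. Returns 0 on success, -1 with errno set on failure.
 */
static int punch_hole(int fd, off_t offset, off_t len)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, len);
}

/*
 * Hypothetical demo: write 8 KiB of 0xAA to a temporary file, punch out
 * the first 4 KiB and verify what a reader sees afterwards.
 * Returns 1 if the hole was punched and verified, 0 if the filesystem
 * does not support hole punching, -1 on any other error.
 */
int punch_hole_demo(void)
{
    char path[] = "/tmp/punch-demo-XXXXXX";
    unsigned char buf[8192];
    struct stat st;
    int ret = -1;
    size_t i;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */

    memset(buf, 0xAA, sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        goto out;

    if (punch_hole(fd, 0, 4096) != 0) {
        ret = (errno == EOPNOTSUPP) ? 0 : -1;
        goto out;
    }

    /* KEEP_SIZE: the file length must be unchanged. */
    if (fstat(fd, &st) != 0 || st.st_size != (off_t)sizeof(buf))
        goto out;

    if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
        goto out;

    /* The punched range reads back as zeroes, the rest is untouched. */
    for (i = 0; i < 4096; i++)
        if (buf[i] != 0)
            goto out;
    for (i = 4096; i < sizeof(buf); i++)
        if (buf[i] != 0xAA)
            goto out;

    ret = 1;
out:
    close(fd);
    return ret;
}
```

The "not corrupt data" expectation is the check at the end: a reader
must see zeroes in the punched range and the original bytes everywhere
else, even if the pages were cached before the punch.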

> Can we implement holepunch as a NOP?  Or return -ENOTTY?  Those both
> seem like better solutions than adding an extra rwsem to every inode.

We've already added this extra rwsem to ext4 and XFS - it's a sunk
cost for almost every production machine out there in the wild. It
needs to be made generic so we can optimise the implementation and
not have to implement a unicorn in every filesystem to work around
the fact the page cache and page faults have no internal
serialisation mechanism against filesystem operations that directly
manipulate and invalidate large ranges of the backing storage the
page cache sits over.

> Failing that, is there a bigger hammer we can use on the holepunch side
> (eg preventing all concurrent accesses while the holepunch is happening)
> to reduce the overhead on the read side?

That's what we currently do and what Jan is trying to refine....


Dave Chinner

Thread overview: 24+ messages
2021-04-13 11:28 Jan Kara
2021-04-13 11:28 ` [PATCH 1/7] mm: Fix comments mentioning i_mutex Jan Kara
2021-04-13 12:38   ` Christoph Hellwig
2021-04-13 11:28 ` [PATCH 2/7] mm: Protect operations adding pages to page cache with i_mapping_lock Jan Kara
2021-04-13 12:57   ` Christoph Hellwig
2021-04-13 13:56     ` Jan Kara
2021-04-14  0:01   ` Dave Chinner
2021-04-14 12:23     ` Jan Kara
2021-04-14 21:57       ` Dave Chinner
2021-04-15 13:11         ` Jan Kara
2021-04-14 22:25     ` Matthew Wilcox
2021-04-15  2:05       ` Dave Chinner
2021-04-13 11:28 ` [PATCH 3/7] ext4: Convert to use inode->i_mapping_sem Jan Kara
2021-04-13 11:28 ` [PATCH 4/7] ext2: Convert to using i_mapping_sem Jan Kara
2021-04-13 11:28 ` [PATCH 5/7] xfs: Convert to use i_mapping_sem Jan Kara
2021-04-13 13:05   ` Christoph Hellwig
2021-04-13 13:42     ` Jan Kara
2021-04-13 11:28 ` [PATCH 6/7] zonefs: Convert to using i_mapping_sem Jan Kara
2021-04-13 11:28 ` [PATCH 7/7] fs: Remove i_mapping_sem protection from .page_mkwrite handlers Jan Kara
2021-04-13 13:09 ` [PATCH 0/7 RFC v3] fs: Hole punch vs page cache filling races Christoph Hellwig
2021-04-13 14:17   ` Jan Kara
2021-04-19 15:20 ` Matthew Wilcox
2021-04-19 16:25   ` Jan Kara
2021-04-20 22:12   ` Dave Chinner [this message]
