linux-ext4.vger.kernel.org archive mirror
* [PATCH 0/3 RFC] fs: Hole punch vs page cache filling races
@ 2021-01-20 16:06 Jan Kara
  2021-01-20 16:06 ` [PATCH 1/3] mm: Do not pass iter into generic_file_buffered_read_get_pages() Jan Kara
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Jan Kara @ 2021-01-20 16:06 UTC
  To: linux-fsdevel; +Cc: Matthew Wilcox, linux-ext4, Jan Kara

Hello,

Amir has reported [1] that ext4 has potential issues when reads race
with hole punching, possibly exposing stale data from freed blocks or even
corrupting the filesystem when stale mapping data gets used for writeout. The
problem is that during hole punching, new page cache pages can get instantiated
in a punched range after truncate_inode_pages() has run but before the
filesystem removes blocks from the file.  In principle any filesystem
implementing hole punching thus needs to implement a mechanism to block
instantiating page cache pages during hole punching to avoid this race. This is
further complicated by the fact that there are multiple places that can
instantiate pages in page cache.  We can have regular read(2) or page fault
doing this but fadvise(2) or madvise(2) can also result in reading in page
cache pages through force_page_cache_readahead().
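
As a rough illustration of the window described above, here is a minimal
userspace toy model (not kernel code and not taken from the patches). The
race_* names, the one-page file, and block number 7 are all assumptions made
up for the sketch. It only shows the ordering problem: a page instantiated
between the page cache truncation and the block removal keeps a mapping to a
block the file no longer owns.

```c
#include <assert.h>
#include <stdbool.h>

#define RACE_NO_BLOCK (-1)

/* Toy model of a one-page file; every name here is illustrative. */
struct race_file {
	int block;        /* disk block mapped at offset 0, or RACE_NO_BLOCK */
	bool page_cached; /* is a page cache page present for offset 0? */
	int page_block;   /* block mapping the cached page was filled from */
};

/* Models truncate_inode_pages(): evict the page covering the hole. */
static void race_truncate_pages(struct race_file *f)
{
	f->page_cached = false;
	f->page_block = RACE_NO_BLOCK;
}

/* Models the filesystem removing the blocks from the file. */
static void race_free_blocks(struct race_file *f)
{
	f->block = RACE_NO_BLOCK;
}

/* Models read(2), a page fault, or readahead instantiating the page. */
static void race_fill_page(struct race_file *f)
{
	if (!f->page_cached) {
		f->page_cached = true;
		f->page_block = f->block;
	}
}

/* A cached page is stale if it maps a block the file no longer owns. */
static bool race_page_is_stale(const struct race_file *f)
{
	return f->page_cached && f->page_block != RACE_NO_BLOCK &&
	       f->page_block != f->block;
}

/* Runs one hole punch; returns true if a stale page is left behind. */
bool race_punch_hole(bool reader_in_window)
{
	struct race_file f = { .block = 7, .page_cached = true, .page_block = 7 };

	race_truncate_pages(&f);
	if (reader_in_window)
		race_fill_page(&f);	/* the race window */
	race_free_blocks(&f);
	race_fill_page(&f);		/* a later read of the punched range */

	assert(f.block == RACE_NO_BLOCK);
	return race_page_is_stale(&f);
}
```

In this model race_punch_hole(true) reports a stale page while
race_punch_hole(false) does not, which is why page instantiation has to be
blocked for the whole duration of the punch.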

There are a couple of ways to fix this. The first way (currently implemented by
XFS) is to protect read(2) and *advise(2) calls with i_rwsem so that they are
serialized with hole punching. This is easy to do but as a result all reads
would be serialized with writes, and thus mixed read-write workloads would
suffer heavily on ext4. Thus for ext4 I want to use EXT4_I(inode)->i_mmap_sem
for serialization of reads and hole punching, the same serialization that is
already used in ext4 to close this race for page faults. This is
conceptually simple but lock ordering is troublesome - since
EXT4_I(inode)->i_mmap_sem is used in page fault path, it ranks below mmap_sem.
Thus we cannot simply grab EXT4_I(inode)->i_mmap_sem in ext4_file_read_iter()
as generic_file_buffered_read() copies data to userspace which may require
grabbing mmap_sem. Also grabbing EXT4_I(inode)->i_mmap_sem in ext4_readpages()
/ ext4_readpage() is problematic because at that point we already have locked
pages instantiated in the page cache. So EXT4_I(inode)->i_mmap_sem would
effectively rank below page lock which is too low in the locking hierarchy.  So
for ext4 (and other filesystems with similar locking constraints - F2FS, GFS2,
OCFS2, ...) we'd need another hook in the read path that can wrap around
insertion of pages into page cache but does not contain copying of data into
userspace.
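
The ordering constraints above can be sketched with a small userspace
lock-rank checker. This is only a model under stated assumptions: the rank_*
names are hypothetical, the three locks are reduced to an enum ordered
outermost first (mmap_sem, then i_mmap_sem, then the page lock), and each
scenario is a simplified sequence of acquisitions rather than the real call
chain.

```c
#include <assert.h>
#include <stdbool.h>

/* Locks listed outermost first, following the ordering described above. */
enum rank_lock { RANK_MMAP_SEM, RANK_I_MMAP_SEM, RANK_PAGE_LOCK, RANK_NR };

struct rank_ctx {
	bool held[RANK_NR];
	bool violated;	/* set once any acquisition breaks the ordering */
};

/* Valid only while no lock at the same or an inner level is held. */
static void rank_acquire(struct rank_ctx *c, enum rank_lock l)
{
	for (int i = l; i < RANK_NR; i++)
		if (c->held[i])
			c->violated = true;
	c->held[l] = true;
}

static void rank_release(struct rank_ctx *c, enum rank_lock l)
{
	assert(c->held[l]);
	c->held[l] = false;
}

/* Page fault: mmap_sem, then i_mmap_sem, then the page lock. */
bool rank_page_fault(void)
{
	struct rank_ctx c = { .violated = false };

	rank_acquire(&c, RANK_MMAP_SEM);
	rank_acquire(&c, RANK_I_MMAP_SEM);
	rank_acquire(&c, RANK_PAGE_LOCK);
	rank_release(&c, RANK_PAGE_LOCK);
	rank_release(&c, RANK_I_MMAP_SEM);
	rank_release(&c, RANK_MMAP_SEM);
	return c.violated;
}

/* i_mmap_sem held across the whole read, including the copy to userspace. */
bool rank_lock_in_read_iter(void)
{
	struct rank_ctx c = { .violated = false };

	rank_acquire(&c, RANK_I_MMAP_SEM);
	rank_acquire(&c, RANK_PAGE_LOCK);
	rank_release(&c, RANK_PAGE_LOCK);
	rank_acquire(&c, RANK_MMAP_SEM);	/* copy to userspace faults */
	rank_release(&c, RANK_MMAP_SEM);
	rank_release(&c, RANK_I_MMAP_SEM);
	return c.violated;
}

/* i_mmap_sem taken in readpage, where the page is already locked. */
bool rank_lock_in_readpage(void)
{
	struct rank_ctx c = { .violated = false };

	rank_acquire(&c, RANK_PAGE_LOCK);
	rank_acquire(&c, RANK_I_MMAP_SEM);
	rank_release(&c, RANK_I_MMAP_SEM);
	rank_release(&c, RANK_PAGE_LOCK);
	return c.violated;
}

/* Hook around page instantiation only; the copy happens after unlock. */
bool rank_lock_in_hook(void)
{
	struct rank_ctx c = { .violated = false };

	rank_acquire(&c, RANK_I_MMAP_SEM);
	rank_acquire(&c, RANK_PAGE_LOCK);
	rank_release(&c, RANK_PAGE_LOCK);
	rank_release(&c, RANK_I_MMAP_SEM);
	rank_acquire(&c, RANK_MMAP_SEM);	/* copy to userspace faults */
	rank_release(&c, RANK_MMAP_SEM);
	return c.violated;
}
```

Only rank_lock_in_read_iter() and rank_lock_in_readpage() report a violation;
the page fault path and the hook placement keep the ordering intact.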

This patch set implements one possible form of such a hook - we essentially
abstract generic_file_buffered_read_get_pages() into a hook. I'm not completely
sold on the naming or the API, or even whether this is the best place for the
hook. But I wanted to send something out for further discussion. For example,
another workable option for ext4 would be to have an aops hook for adding a
page into page cache (essentially abstract add_to_page_cache_lru()). The
slight downside is that it would mean per-page acquisition of the lock instead
of per-batch-of-pages acquisition. Also, if we ever transition to range locking
the mapping, per-batch locking would be more efficient.
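
To make the per-batch versus per-page trade-off concrete, here is a minimal
userspace sketch. It is not the API from the patches: struct hook_aops, its
fill_pages and add_page members, and all other hook_* names are hypothetical,
and taking the lock is reduced to bumping a counter.

```c
#include <assert.h>

struct hook_mapping;

/* Hypothetical ops table; both hooks are optional and purely illustrative. */
struct hook_aops {
	/* Per-batch hook: instantiate nr pages starting at index. */
	int (*fill_pages)(struct hook_mapping *m, unsigned long index,
			  unsigned int nr);
	/* Per-page hook: instantiate the single page at index. */
	int (*add_page)(struct hook_mapping *m, unsigned long index);
};

struct hook_mapping {
	const struct hook_aops *ops;
	unsigned int lock_acquisitions;	/* stands in for taking i_mmap_sem */
	unsigned int pages_added;
};

/* Generic per-page insertion, with no filesystem locking. */
static int hook_generic_add_page(struct hook_mapping *m, unsigned long index)
{
	(void)index;
	m->pages_added++;
	return 0;
}

/* Generic batch fill: calls the per-page hook when the filesystem has one. */
static int hook_generic_fill_pages(struct hook_mapping *m, unsigned long index,
				   unsigned int nr)
{
	for (unsigned int i = 0; i < nr; i++) {
		int ret = m->ops->add_page ? m->ops->add_page(m, index + i)
					   : hook_generic_add_page(m, index + i);
		if (ret)
			return ret;
	}
	return 0;
}

/* Filesystem per-batch hook: one lock round trip for the whole batch. */
static int hook_fs_fill_pages(struct hook_mapping *m, unsigned long index,
			      unsigned int nr)
{
	int ret;

	m->lock_acquisitions++;		/* lock would be taken here */
	ret = hook_generic_fill_pages(m, index, nr);
	/* lock would be dropped here, before any copy to userspace */
	return ret;
}

/* Filesystem per-page hook: one lock round trip per page. */
static int hook_fs_add_page(struct hook_mapping *m, unsigned long index)
{
	m->lock_acquisitions++;
	return hook_generic_add_page(m, index);
}

const struct hook_aops hook_batch_aops = { .fill_pages = hook_fs_fill_pages };
const struct hook_aops hook_per_page_aops = { .add_page = hook_fs_add_page };

/* Read path entry point: returns how many times the lock was taken. */
unsigned int hook_read_get_pages(const struct hook_aops *ops, unsigned int nr)
{
	struct hook_mapping m = { .ops = ops };
	int ret = ops->fill_pages ? ops->fill_pages(&m, 0, nr)
				  : hook_generic_fill_pages(&m, 0, nr);

	assert(ret == 0 && m.pages_added == nr);
	return m.lock_acquisitions;
}
```

For a batch of 16 pages, hook_read_get_pages(&hook_batch_aops, 16) takes the
lock once, while hook_read_get_pages(&hook_per_page_aops, 16) takes it 16
times.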

What do people think about this?

								Honza

[1] https://lore.kernel.org/linux-fsdevel/CAOQ4uxjQNmxqmtA_VbYW0Su9rKRk2zobJmahcyeaEVOFKVQ5dw@mail.gmail.com/


end of thread, other threads:[~2021-04-06 16:50 UTC | newest]

Thread overview: 17+ messages
2021-01-20 16:06 [PATCH 0/3 RFC] fs: Hole punch vs page cache filling races Jan Kara
2021-01-20 16:06 ` [PATCH 1/3] mm: Do not pass iter into generic_file_buffered_read_get_pages() Jan Kara
2021-01-20 16:18   ` Christoph Hellwig
2021-01-20 16:06 ` [PATCH 2/3] mm: Provide address_space operation for filling pages for read Jan Kara
2021-01-20 16:20   ` Christoph Hellwig
2021-01-20 17:27     ` Jan Kara
2021-01-20 17:28       ` Christoph Hellwig
2021-01-20 17:56         ` Matthew Wilcox
2021-04-02 21:17     ` Kent Overstreet
2021-04-06 12:21       ` Jan Kara
2021-01-20 16:06 ` [PATCH 3/3] ext4: Fix stale data exposure when read races with hole punch Jan Kara
2021-01-21 19:27 ` [PATCH 0/3 RFC] fs: Hole punch vs page cache filling races Matthew Wilcox
2021-01-22 14:32   ` Jan Kara
2021-04-02 19:34 ` Theodore Ts'o
2021-04-06 12:17   ` Jan Kara
2021-04-06 16:45     ` Theodore Ts'o
2021-04-06 16:50       ` Theodore Ts'o
