linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/7] DAX fsync/msync support
From: Ross Zwisler @ 2015-12-08 19:18 UTC
  To: linux-kernel
  Cc: Ross Zwisler, H. Peter Anvin, J. Bruce Fields, Theodore Ts'o,
	Alexander Viro, Andreas Dilger, Dave Chinner, Ingo Molnar,
	Jan Kara, Jeff Layton, Matthew Wilcox, Thomas Gleixner,
	linux-ext4, linux-fsdevel, linux-mm, linux-nvdimm, x86, xfs,
	Andrew Morton, Dan Williams, Matthew Wilcox, Dave Hansen

This patch series adds a slimmed down version of fsync/msync support to
DAX.  The major change versus v2 of this patch series is that we no longer
remove DAX entries from the radix tree during fsync/msync calls.  Instead
the list of DAX entries in the radix tree grows for the lifetime of the
mapping.  We reclaim DAX entries from the radix tree via
clear_exceptional_entry() for truncate, when the filesystem is unmounted,
etc.
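
As a rough illustration of the entry lifetime described above, here is a
small userspace sketch.  It is not code from this series: the toy_* names
and the fixed-size array standing in for the radix tree are hypothetical.
It only shows that writeback visits dirty-tagged entries without removing
them, and that entries go away on truncate.  What happens to the dirty tag
after a flush is not described in this cover letter, so the sketch leaves
the tag alone.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_SLOTS 8	/* hypothetical stand-in for the radix tree */

struct toy_mapping {
	bool present[TOY_NR_SLOTS];	/* an entry exists at this offset */
	bool dirty[TOY_NR_SLOTS];	/* the entry is tagged dirty */
};

/* Any fault: make sure an entry exists.  A new entry starts out clean. */
static void toy_fault(struct toy_mapping *m, size_t index)
{
	m->present[index] = true;
}

/* Write-enable path (the role dax_pfn_mkwrite() plays): tag it dirty. */
static void toy_mkwrite(struct toy_mapping *m, size_t index)
{
	toy_fault(m, index);
	m->dirty[index] = true;
}

/*
 * fsync/msync: visit every dirty-tagged entry and leave it in place.
 * Returns how many entries were visited.
 */
static size_t toy_writeback(struct toy_mapping *m)
{
	size_t flushed = 0;

	for (size_t i = 0; i < TOY_NR_SLOTS; i++) {
		if (m->present[i] && m->dirty[i]) {
			/* real code would write back CPU caches here */
			flushed++;
		}
	}
	return flushed;
}

/* Truncate: the only place in this sketch where entries are dropped. */
static void toy_truncate(struct toy_mapping *m, size_t start)
{
	for (size_t i = start; i < TOY_NR_SLOTS; i++) {
		m->present[i] = false;
		m->dirty[i] = false;
	}
}

/* Count the entries currently held by the mapping. */
static size_t toy_nr_entries(const struct toy_mapping *m)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_NR_SLOTS; i++)
		n += m->present[i];
	return n;
}
```

With a read fault at offset 1 and a write at offset 2, toy_writeback()
visits one entry and toy_nr_entries() still reports two afterwards; only
toy_truncate() brings the count back down.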

This change was made because removing radix tree entries during writeback
operations opens up a number of race conditions between those writeback
operations and page faults.  In the non-DAX case
these races are dealt with using the page lock, but we don't have a good
replacement lock with the same granularity.  These races could leave us in
a place where we have a DAX page that is dirty and writeable from userspace
but no longer in the radix tree.  This page would then be skipped during
subsequent writeback operations, which is unacceptable.

I do plan to keep trying to solve these race conditions so that we can
have a more optimal fsync/msync solution for DAX, but I wanted to get this
set out for v4.5 consideration while I continue working.  While
suboptimal, the solution in this series gives us correct behavior for DAX
fsync/msync and seems like a reasonable short-term compromise.

This series is built upon v4.4-rc4 plus the recent ext4 DAX series from Jan
Kara (http://www.spinics.net/lists/linux-ext4/msg49951.html) and a recent
XFS fix from Dave Chinner (https://lkml.org/lkml/2015/12/2/923).  The tree
with all this working can be found here:

https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=fsync_v3

Other changes versus v2:
 - Renamed dax_fsync() to dax_writeback_mapping_range(). (Dave Chinner)
 - Removed REQ_FUA/REQ_FLUSH support from the PMEM driver and instead just
   make the call to wmb_pmem() in dax_writeback_mapping_range().  (Dan)
 - Reworked some BUG_ON() calls to be a WARN_ON() followed by an error
   return.
 - Moved call to dax_writeback_mapping_range() from the filesystems down
   into filemap_write_and_wait_range(). (Dave Chinner)
 - Fixed handling of DAX read faults so they create a radix tree entry but
   don't mark it as dirty until the follow-up dax_pfn_mkwrite() call.
 - Updated clear_exceptional_entry() and dax_writeback_one() so they
   validate the DAX radix tree entry before they use it. (Dave Chinner)
 - Added a comment to find_get_entries_tag() to explain the restart
   condition. (Dave Chinner)
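
For readers unfamiliar with the "WARN_ON() followed by an error return"
pattern mentioned in the list above, here is a minimal userspace sketch.
It is an illustration only, not code from the patches: TOY_WARN_ON(), the
entry type values and the choice of -EIO are all hypothetical.  The point
is that an unexpected entry is reported and rejected before use, rather
than taking the whole machine down as BUG_ON() would.

```c
#include <errno.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): report, then yield the condition. */
static int toy_warn_on(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}
#define TOY_WARN_ON(cond) toy_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical entry types; the real encoding lives in the patches. */
enum toy_entry_type {
	TOY_ENTRY_PTE = 1,
	TOY_ENTRY_PMD = 2,
};

/*
 * Validate an entry before using it.  An unknown type produces a
 * warning and an error return instead of a crash.
 */
static int toy_validate_entry(int type)
{
	if (TOY_WARN_ON(type != TOY_ENTRY_PTE && type != TOY_ENTRY_PMD))
		return -EIO;	/* assumed error code, for illustration */
	return 0;
}
```

A caller can then propagate the error: toy_validate_entry(TOY_ENTRY_PTE)
returns 0, while an unrecognized value prints a warning and returns -EIO.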

Ross Zwisler (7):
  pmem: add wb_cache_pmem() to the PMEM API
  dax: support dirty DAX entries in radix tree
  mm: add find_get_entries_tag()
  dax: add support for fsync/sync
  ext2: call dax_pfn_mkwrite() for DAX fsync/msync
  ext4: call dax_pfn_mkwrite() for DAX fsync/msync
  xfs: call dax_pfn_mkwrite() for DAX fsync/msync

 arch/x86/include/asm/pmem.h |  11 ++--
 fs/block_dev.c              |   3 +-
 fs/dax.c                    | 147 ++++++++++++++++++++++++++++++++++++++++++--
 fs/ext2/file.c              |   4 +-
 fs/ext4/file.c              |   4 +-
 fs/inode.c                  |   1 +
 fs/xfs/xfs_file.c           |   7 ++-
 include/linux/dax.h         |   7 +++
 include/linux/fs.h          |   1 +
 include/linux/pagemap.h     |   3 +
 include/linux/pmem.h        |  22 ++++++-
 include/linux/radix-tree.h  |   9 +++
 mm/filemap.c                |  84 ++++++++++++++++++++++++-
 mm/truncate.c               |  64 +++++++++++--------
 14 files changed, 319 insertions(+), 48 deletions(-)

-- 
2.5.0

