From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: linux-kernel@vger.kernel.org
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Dave Chinner <david@fromorbit.com>,
	Ingo Molnar <mingo@redhat.com>, Jan Kara <jack@suse.com>,
	Jeff Layton <jlayton@poochiereds.net>,
	Matthew Wilcox <willy@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-nvdimm@lists.01.org, x86@kernel.org,
	xfs@oss.sgi.com, Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>
Subject: [PATCH v3 0/7] DAX fsync/msync support
Date: Tue,  8 Dec 2015 12:18:38 -0700
Message-ID: <1449602325-20572-1-git-send-email-ross.zwisler@linux.intel.com>

This patch series adds a slimmed-down version of fsync/msync support to
DAX.  The major change versus v2 of this patch series is that we no longer
remove DAX entries from the radix tree during fsync/msync calls.  Instead,
the set of DAX entries in the radix tree only grows over the lifetime of
the mapping.  We reclaim DAX entries from the radix tree via
clear_exceptional_entry() on truncate, when the filesystem is unmounted,
etc.

This change was made because removing radix tree entries during writeback
operations opens up a number of race conditions between those writeback
operations and page faults.  In the non-DAX case these races are dealt
with using the page lock, but we don't have a good replacement lock with
the same granularity.  These races could leave us with a DAX page that is
dirty and writeable from userspace but no longer in the radix tree.  This
page would then be skipped during subsequent writeback operations, which
is unacceptable.
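
As a rough illustration of the approach described above, here is a small
userspace model, not kernel code.  Every name in it (toy_mapping,
toy_entry, toy_write_fault() and so on) is hypothetical, and a flat array
stands in for the radix tree.  It only shows the bookkeeping rule:
writeback clears the dirty tag but leaves the entry in place, and only
truncate removes entries.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 8

/* Hypothetical stand-in for one radix tree slot; not a kernel type. */
struct toy_entry {
	bool present;	/* an entry exists for this index */
	bool dirty;	/* stands in for the dirty tag */
};

struct toy_mapping {
	struct toy_entry slots[NR_SLOTS];
};

/* Read fault: create the entry but leave it clean. */
void toy_read_fault(struct toy_mapping *m, size_t idx)
{
	m->slots[idx].present = true;
}

/* Write fault (the mkwrite step): create the entry if needed, tag it dirty. */
void toy_write_fault(struct toy_mapping *m, size_t idx)
{
	m->slots[idx].present = true;
	m->slots[idx].dirty = true;
}

/*
 * Writeback: "flush" every dirty entry and clear its tag.  The entry
 * itself stays, so a racing fault can never find it missing.
 * Returns the number of entries flushed.
 */
size_t toy_writeback(struct toy_mapping *m)
{
	size_t i, flushed = 0;

	for (i = 0; i < NR_SLOTS; i++) {
		if (m->slots[i].present && m->slots[i].dirty) {
			m->slots[i].dirty = false;
			flushed++;
		}
	}
	return flushed;
}

/* Truncate: the only place in this model where entries are removed. */
void toy_truncate(struct toy_mapping *m)
{
	size_t i;

	for (i = 0; i < NR_SLOTS; i++) {
		m->slots[i].present = false;
		m->slots[i].dirty = false;
	}
}

/* Count the entries currently held by the mapping. */
size_t toy_count_present(const struct toy_mapping *m)
{
	size_t i, n = 0;

	for (i = 0; i < NR_SLOTS; i++)
		if (m->slots[i].present)
			n++;
	return n;
}
```

In this model a second toy_writeback() with no intervening write fault
flushes nothing, yet toy_count_present() is unchanged, which is the
"grows for the lifetime of the mapping" cost mentioned above.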

I do plan to keep working on these race conditions so that we can have a
more optimal fsync/msync solution for DAX, but I wanted to get this set
out for v4.5 consideration while I continue working.  While suboptimal,
the solution in this series gives us correct behavior for DAX fsync/msync
and seems like a reasonable short-term compromise.
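
For context, the userspace pattern whose durability this series is about
looks roughly like the sketch below.  It uses only standard POSIX calls;
the helper name write_and_sync() is hypothetical, and nothing in it is
DAX specific: the same code runs on any filesystem, and the point of the
series is that msync()/fsync() also do the right thing when the file
lives on a DAX mount.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store msg into a shared mapping of path, then ask the kernel to make
 * it durable with msync() and fsync().
 * Returns 0 on success, -1 on any failure.
 */
int write_and_sync(const char *path, const char *msg)
{
	size_t len = strlen(msg);
	char *addr;
	int ret = -1;
	int fd;

	if (len == 0)
		return -1;	/* mmap() rejects zero-length mappings */

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return -1;

	/* Size the file so the whole mapping is backed by it. */
	if (ftruncate(fd, (off_t)len) < 0)
		goto out_close;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		goto out_close;

	/* Plain stores through the mapping dirty the page(s). */
	memcpy(addr, msg, len);

	/* Either call should be enough to flush the dirty range. */
	if (msync(addr, len, MS_SYNC) == 0 && fsync(fd) == 0)
		ret = 0;

	munmap(addr, len);
out_close:
	close(fd);
	return ret;
}
```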

This series is built upon v4.4-rc4 plus the recent ext4 DAX series from Jan
Kara (http://www.spinics.net/lists/linux-ext4/msg49951.html) and a recent
XFS fix from Dave Chinner (https://lkml.org/lkml/2015/12/2/923).  The tree
with all this working can be found here:

https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=fsync_v3

Other changes versus v2:
 - Renamed dax_fsync() to dax_writeback_mapping_range(). (Dave Chinner)
 - Removed REQ_FUA/REQ_FLUSH support from the PMEM driver and instead just
   call wmb_pmem() in dax_writeback_mapping_range().  (Dan)
 - Reworked some BUG_ON() calls to be a WARN_ON() followed by an error
   return.
 - Moved call to dax_writeback_mapping_range() from the filesystems down
   into filemap_write_and_wait_range(). (Dave Chinner)
 - Fixed handling of DAX read faults so they create a radix tree entry but
   don't mark it as dirty until the follow-up dax_pfn_mkwrite() call.
 - Updated clear_exceptional_entry() and dax_writeback_one() so they
   validate the DAX radix tree entry before they use it. (Dave Chinner)
 - Added a comment to find_get_entries_tag() to explain the restart
   condition. (Dave Chinner)
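
The "WARN_ON() followed by an error return" item and the entry validation
item in the list above follow a common pattern, sketched here in
userspace C.  toy_warn_on(), struct toy_dax_entry and toy_writeback_one()
are hypothetical stand-ins, not the code in the patches: the idea is just
that an unexpected entry is reported and rejected with an error instead
of taking the whole system down.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for WARN_ON(): report the problem,
 * never abort, and return whether the condition was true.
 */
int toy_warn_on(int cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond != 0;
}

/* Hypothetical entry layout, for illustration only. */
struct toy_dax_entry {
	unsigned long sector;
	int valid;
};

/*
 * Validate the entry before using it.  A bad entry produces a warning
 * and -EIO rather than a crash.
 */
int toy_writeback_one(const struct toy_dax_entry *e)
{
	if (toy_warn_on(e == NULL || !e->valid, "unexpected DAX entry"))
		return -EIO;

	/* A real implementation would flush the range for e->sector here. */
	return 0;
}
```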

Ross Zwisler (7):
  pmem: add wb_cache_pmem() to the PMEM API
  dax: support dirty DAX entries in radix tree
  mm: add find_get_entries_tag()
  dax: add support for fsync/sync
  ext2: call dax_pfn_mkwrite() for DAX fsync/msync
  ext4: call dax_pfn_mkwrite() for DAX fsync/msync
  xfs: call dax_pfn_mkwrite() for DAX fsync/msync

 arch/x86/include/asm/pmem.h |  11 ++--
 fs/block_dev.c              |   3 +-
 fs/dax.c                    | 147 ++++++++++++++++++++++++++++++++++++++++++--
 fs/ext2/file.c              |   4 +-
 fs/ext4/file.c              |   4 +-
 fs/inode.c                  |   1 +
 fs/xfs/xfs_file.c           |   7 ++-
 include/linux/dax.h         |   7 +++
 include/linux/fs.h          |   1 +
 include/linux/pagemap.h     |   3 +
 include/linux/pmem.h        |  22 ++++++-
 include/linux/radix-tree.h  |   9 +++
 mm/filemap.c                |  84 ++++++++++++++++++++++++-
 mm/truncate.c               |  64 +++++++++++--------
 14 files changed, 319 insertions(+), 48 deletions(-)

-- 
2.5.0

Thread overview:
2015-12-08 19:18 [PATCH v3 0/7] DAX fsync/msync support Ross Zwisler
2015-12-08 19:18 ` [PATCH v3 1/7] pmem: add wb_cache_pmem() to the PMEM API Ross Zwisler
2015-12-08 19:18 ` [PATCH v3 2/7] dax: support dirty DAX entries in radix tree Ross Zwisler
2015-12-18  9:01   ` Jan Kara
2015-12-19  5:23     ` Ross Zwisler
2015-12-08 19:18 ` [PATCH v3 3/7] mm: add find_get_entries_tag() Ross Zwisler
2015-12-09 19:44   ` Dan Williams
2015-12-10 20:24     ` Ross Zwisler
2015-12-10 20:31       ` Dan Williams
2015-12-18  9:33   ` Jan Kara
2015-12-08 19:18 ` [PATCH v3 4/7] dax: add support for fsync/sync Ross Zwisler
2015-12-08 19:18 ` [PATCH v3 5/7] ext2: call dax_pfn_mkwrite() for DAX fsync/msync Ross Zwisler
2015-12-08 19:18 ` [PATCH v3 6/7] ext4: " Ross Zwisler
2015-12-08 19:18 ` [PATCH v3 7/7] xfs: " Ross Zwisler
