linux-api.vger.kernel.org archive mirror
* [RFC PATCH 0/2] daxfile: enable byte-addressable updates to pmem
@ 2017-06-17  1:15 Dan Williams
From: Dan Williams @ 2017-06-17  1:15 UTC
  To: akpm
  Cc: Jan Kara, linux-nvdimm, linux-api, Dave Chinner, linux-kernel,
	linux-mm, linux-fsdevel, Christoph Hellwig

Quoting PATCH 2/2:

    To date, the full promise of byte-addressable access to persistent
    memory has only been half realized via the filesystem-dax interface. The
    current filesystem-dax mechanism allows an application to consume (read)
    data from persistent storage at byte-size granularity, bypassing the
    full page reads required by traditional storage devices.
    
    Now, for writes, applications still need to contend with
    page-granularity dirtying and flushing semantics as well as filesystem
    coordination for metadata updates after any mmap write. The current
    situation precludes use cases that leverage byte-granularity / in-place
    updates to persistent media.
    
    To get around this limitation there are some specialized applications
    that are using the device-dax interface to bypass the overhead and
    data-safety problems of the current filesystem-dax mmap-write path.
    QEMU-KVM is forced to use device-dax to safely pass through persistent
    memory to a guest [1]. Some specialized databases are using device-dax
    for byte-granularity writes. Outside of those cases, device-dax is
    difficult for general purpose persistent memory applications to consume.
    There is demand for access to pmem without needing to contend with
    special device configuration and other device-dax limitations.
    
    The 'daxfile' interface satisfies this demand and realizes one of Dave
    Chinner's ideas for allowing pmem applications to safely bypass
    fsync/msync requirements. The idea is to make the file immutable with
    respect to the offset-to-block mappings for every extent in the file
    [2]. It turns out that filesystems already need to make this guarantee
    today. This property is needed for files marked as swap files.
    
    The new daxctl() syscall manages setting a file into 'static-dax' mode
    whereby it arranges for the file to be treated as a swapfile as far as
    the filesystem is concerned, but not registered with the core-mm as
    swapfile space. A file in this mode is then safe to be mapped and
    written without the requirement to fsync/msync the writes.  The cpu
    cache management for flushing data to persistence can be handled
    completely in userspace.
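
To make the "offset-to-block mappings" mentioned in the quoted changelog
concrete, here is a minimal userspace sketch that lists a file's extents
with the existing FIEMAP ioctl. FIEMAP is not part of this patch set (the
series walks block mappings in-kernel via bmap_walk()); it is used here
only as an assumption-free way to look at the mappings that static-dax
mode would pin. The function name print_extents() and the MAX_EXTENTS
limit are made up for illustration.

```c
#include <fcntl.h>
#include <linux/fiemap.h>   /* struct fiemap, struct fiemap_extent */
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MAX_EXTENTS 32      /* illustrative limit, not from the patches */

/*
 * Print the logical-offset -> physical-offset mapping of up to
 * MAX_EXTENTS extents of @path. Returns the number of extents
 * reported, or -1 on error (including filesystems without FIEMAP).
 */
int print_extents(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	struct fiemap *fm = calloc(1, sizeof(*fm) +
			MAX_EXTENTS * sizeof(struct fiemap_extent));
	if (!fm) {
		close(fd);
		return -1;
	}

	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;   /* whole file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;     /* flush delayed allocation first */
	fm->fm_extent_count = MAX_EXTENTS;

	int n = -1;
	if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0) {
		n = (int)fm->fm_mapped_extents;
		for (int i = 0; i < n; i++) {
			struct fiemap_extent *e = &fm->fm_extents[i];
			printf("logical %llu -> physical %llu, length %llu\n",
			       (unsigned long long)e->fe_logical,
			       (unsigned long long)e->fe_physical,
			       (unsigned long long)e->fe_length);
		}
	}

	free(fm);
	close(fd);
	return n;
}
```

On an ordinary file these mappings may change at any time (for example
through truncate, hole punch, or defragmentation); the proposal is that a
file in static-dax mode keeps them fixed, as a swapfile does today.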
   
As the patches show, some TODOs remain in the code, but the approach
otherwise appears to remove the need for persistent memory applications
to coordinate every write to a file mapping with fsync/msync.
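
As a rough illustration of what "cache management handled completely in
userspace" could look like, the sketch below maps a file, updates a few
bytes in place, and flushes only the affected cache lines. It assumes
x86-64 with the CLFLUSH instruction and a 64-byte cache line; the helper
names persist_range() and update_in_place() are hypothetical, and the
proposed daxctl() call is deliberately not shown. On a mapping that is
not in the proposed static-dax mode, flushing CPU caches alone does not
make the data durable, so msync() is still required there.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#if defined(__x86_64__)
#include <emmintrin.h>      /* _mm_clflush(), _mm_sfence() */
#define HAVE_CLFLUSH 1
#endif

#define CACHELINE 64        /* assumed cache line size */

/* Flush the CPU cache lines covering [addr, addr + len). */
int persist_range(void *addr, size_t len)
{
#ifdef HAVE_CLFLUSH
	uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
	uintptr_t end = (uintptr_t)addr + len;

	for (; p < end; p += CACHELINE)
		_mm_clflush((const void *)p);
	/* Fence after the flushes; required for the weaker CLFLUSHOPT/CLWB
	 * variants, harmless with plain CLFLUSH. */
	_mm_sfence();
	return 0;
#else
	/* Portable fallback: page-granularity msync(), i.e. exactly the
	 * overhead the proposal wants to avoid. */
	uintptr_t pg = (uintptr_t)sysconf(_SC_PAGESIZE);
	uintptr_t start = (uintptr_t)addr & ~(pg - 1);

	return msync((void *)start, (uintptr_t)addr + len - start, MS_SYNC);
#endif
}

/* Copy @n bytes from @src to offset @off of @path through a shared
 * mapping, then flush just those bytes. Returns 0 on success. */
int update_in_place(const char *path, size_t off, const void *src, size_t n)
{
	assert(src != NULL && n > 0);

	int fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	struct stat st;
	if (fstat(fd, &st) < 0 || off > (size_t)st.st_size ||
	    n > (size_t)st.st_size - off) {
		close(fd);
		return -1;
	}

	size_t size = (size_t)st.st_size;
	char *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);              /* the mapping stays valid after close() */
	if (base == MAP_FAILED)
		return -1;

	memcpy(base + off, src, n);
	int rc = persist_range(base + off, n);

	munmap(base, size);
	return rc;
}
```

A caller would use it as update_in_place("/mnt/pmem/data", 128, "hello", 5),
where the path is a placeholder for a file on a DAX-capable filesystem.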

[1]: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg01207.html
[2]: https://lkml.org/lkml/2016/9/11/159

---

Dan Williams (2):
      mm: introduce bmap_walk()
      mm, fs: daxfile, an interface for byte-addressable updates to pmem


 arch/x86/entry/syscalls/syscall_64.tbl |    1 
 include/linux/dax.h                    |    9 ++
 include/linux/fs.h                     |    3 +
 include/linux/syscalls.h               |    1 
 include/uapi/linux/dax.h               |    8 +
 mm/Kconfig                             |    5 +
 mm/Makefile                            |    1 
 mm/daxfile.c                           |  186 ++++++++++++++++++++++++++++++++
 mm/page_io.c                           |  117 +++++++++++++++++---
 9 files changed, 312 insertions(+), 19 deletions(-)
 create mode 100644 include/uapi/linux/dax.h
 create mode 100644 mm/daxfile.c

