nvdimm.lists.linux.dev archive mirror
* [RFC PATCH 0/7] dax, ext4: Synchronous page faults
@ 2017-07-27 13:12 Jan Kara
From: Jan Kara @ 2017-07-27 13:12 UTC
  To: linux-fsdevel
  Cc: Christoph Hellwig, Jan Kara, linux-nvdimm, Dave Chinner,
	linux-xfs, Andy Lutomirski, linux-ext4

Hello,

after the recent discussions about whether and how to make flushing of DAX
mappings possible from userspace, so that they can be flushed at finer than
page granularity and without the overhead of a syscall, I've decided to take
a stab at implementing the "synchronous page faults" idea for ext4, so that
we can see whether it is reasonably possible to implement and what such an
implementation would look like. This patch set is the result.

The functionality these patches implement: We have an inode flag (currently
I abuse the S_SYNC inode flag for this, and IMHO it kind of makes sense, but
if people hate that I'm certainly open to using a new flag in the final
implementation) that marks the inode as requiring synchronous page faults.
The guarantee provided by this flag is: while a block is writably mapped
into page tables, it is guaranteed to be visible in the file at that offset
even after a crash.

The way I implement this is that ->iomap_begin() indicates by a flag that the
inode's block mapping metadata is unstable and may need flushing (it uses the
same test as whether fdatasync() has metadata to write). If so, DAX maps the
page table entries read-only and returns a special flag, VM_FAULT_RO, to the
filesystem fault handler. The handler then calls fdatasync()
(vfs_fsync_range()) for the affected range and after that calls DAX code to
write-enable the page table entries.
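
The flow described above can be modelled in a few lines of userspace C. This
is only a sketch for illustration, not code from the patches: every model_*
name, the two structs, and the flag values are hypothetical stand-ins, and
the real kernel interfaces take different arguments. It shows the ordering
the cover letter describes: map read-only, flush the metadata, then
write-enable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; only the name VM_FAULT_RO comes from the text. */
enum { MODEL_FAULT_NOPAGE = 0x1, MODEL_FAULT_RO = 0x2 };

struct model_inode {
	bool sync_faults;	/* stands in for the S_SYNC inode flag */
	bool metadata_dirty;	/* block mapping metadata not yet on media */
};

struct model_pte {
	bool present;
	bool writable;
};

/* Stands in for ->iomap_begin() flagging unstable mapping metadata. */
static bool model_iomap_needs_sync(const struct model_inode *inode)
{
	return inode->sync_faults && inode->metadata_dirty;
}

/* Stands in for the DAX fault path: map read-only if a flush is needed. */
static int model_dax_fault(const struct model_inode *inode,
			   struct model_pte *pte)
{
	pte->present = true;
	if (model_iomap_needs_sync(inode)) {
		pte->writable = false;
		return MODEL_FAULT_NOPAGE | MODEL_FAULT_RO;
	}
	pte->writable = true;
	return MODEL_FAULT_NOPAGE;
}

/* Stands in for fdatasync() / vfs_fsync_range() on the faulted range. */
static void model_fdatasync(struct model_inode *inode)
{
	inode->metadata_dirty = false;
}

/* Stands in for the DAX call that write-enables the page table entry. */
static void model_pfn_mkwrite(struct model_pte *pte)
{
	pte->writable = true;
}

/* Stands in for the filesystem's write fault handler. */
static int model_fs_write_fault(struct model_inode *inode,
				struct model_pte *pte)
{
	int ret = model_dax_fault(inode, pte);

	if (ret & MODEL_FAULT_RO) {
		model_fdatasync(inode);		/* make the mapping durable */
		model_pfn_mkwrite(pte);		/* only now allow stores */
		ret &= ~MODEL_FAULT_RO;
	}
	return ret;
}
```

In this model the entry never becomes writable while metadata_dirty is still
set, which is the guarantee stated for the inode flag.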

From my (fairly limited) knowledge of XFS, it seems XFS should be able to do
the same, and it should even be possible for a filesystem to implement safe
remapping of a file offset to a different block (e.g. break reflink, do
defrag, or similar stuff) like this:

1) Block page faults
2) fdatasync() the remapped range (there can be outstanding data
   modifications not yet flushed)
3) unmap_mapping_range()
4) Now remap blocks
5) Unblock page faults

Basically we do the same on events like punch hole so there is not much new
there.
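
The five-step remapping sequence above can be written down as a small
userspace model. This is a sketch under stated assumptions: struct
model_file and model_remap() are hypothetical names invented here, and each
assignment merely stands in for the kernel operation named in the
corresponding step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one file offset that is mapped via DAX. */
struct model_file {
	bool faults_blocked;	/* page faults are currently held off */
	bool data_dirty;	/* stores not yet flushed to media */
	bool mapped;		/* offset is present in some page table */
	int block;		/* block currently backing the offset */
};

/* Remap the offset to new_block, following steps 1) to 5) in order. */
static int model_remap(struct model_file *f, int new_block)
{
	f->faults_blocked = true;	/* 1) block page faults */
	f->data_dirty = false;		/* 2) fdatasync() the remapped range */
	f->mapped = false;		/* 3) unmap_mapping_range() */

	/* The remap is only safe once nothing is dirty or still mapped. */
	if (f->data_dirty || f->mapped) {
		f->faults_blocked = false;
		return -1;
	}

	f->block = new_block;		/* 4) now remap blocks */
	f->faults_blocked = false;	/* 5) unblock page faults */
	return 0;
}
```

The next write fault on that offset would then go through the synchronous
fault path again and see the new block.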

There are a couple of open questions with this implementation:

1) Is it worth the hassle?
2) Is S_SYNC a good flag to use or should we use a new inode flag?
3) VM_FAULT_RO, and especially passing the resulting 'pfn' from
   dax_iomap_fault() through the filesystem fault handler to
   dax_pfn_mkwrite() in vmf->orig_pte, is a bit of a hack. So far I'm not
   sure how to refactor things to make this cleaner.

Anyway, here are the patches, comments are welcome.

								Honza


Thread overview: 41+ messages
2017-07-27 13:12 [RFC PATCH 0/7] dax, ext4: Synchronous page faults Jan Kara
2017-07-27 13:12 ` [PATCH 1/7] mm: Remove VM_FAULT_HWPOISON_LARGE_MASK Jan Kara
2017-07-27 21:57   ` Ross Zwisler
2017-08-01 10:52   ` Christoph Hellwig
2017-07-27 13:12 ` [PATCH 2/7] dax: Add sync argument to dax_iomap_fault() Jan Kara
2017-07-27 22:06   ` Ross Zwisler
2017-07-28  9:40     ` Jan Kara
2017-07-27 13:12 ` [PATCH 3/7] dax: Simplify arguments of dax_insert_mapping() Jan Kara
2017-07-27 22:09   ` Ross Zwisler
2017-08-01 10:54   ` Christoph Hellwig
2017-07-27 13:12 ` [PATCH 4/7] dax: Make dax_insert_mapping() return VM_FAULT_ state Jan Kara
2017-07-27 22:22   ` Ross Zwisler
2017-07-28  9:43     ` Jan Kara
2017-07-27 13:12 ` [PATCH 5/7] dax, iomap: Add support for synchronous faults Jan Kara
2017-07-27 22:42   ` Ross Zwisler
2017-08-01 10:56     ` Christoph Hellwig
2017-07-27 13:12 ` [PATCH 6/7] dax: Implement dax_pfn_mkwrite() Jan Kara
2017-07-27 22:53   ` Ross Zwisler
2017-07-27 23:04     ` Ross Zwisler
2017-07-28 10:37     ` Jan Kara
2017-07-27 13:12 ` [PATCH 7/7] ext4: Support for synchronous DAX faults Jan Kara
2017-07-27 22:57   ` Ross Zwisler
2017-07-27 14:09 ` [RFC PATCH 0/7] dax, ext4: Synchronous page faults Jeff Moyer
2017-07-27 21:57   ` Ross Zwisler
2017-07-28  2:05     ` Andy Lutomirski
2017-07-28  9:38       ` Jan Kara
2017-08-01 11:02         ` Christoph Hellwig
2017-08-01 11:26           ` Jan Kara
2017-08-08  0:24             ` Dan Williams
2017-08-11 10:03               ` Christoph Hellwig
2017-08-13  2:44                 ` Dan Williams
2017-08-13  9:25                   ` Christoph Hellwig
2017-08-13 17:08                     ` Dan Williams
2017-08-14  8:30                     ` Jan Kara
2017-08-14 14:04                     ` Boaz Harrosh
2017-08-14 16:03                       ` Dan Williams
2017-08-15  9:06                         ` Boaz Harrosh
2017-08-15  9:44                           ` Boaz Harrosh
2017-08-21 19:57                         ` Ross Zwisler
2017-08-17 16:08                       ` Jan Kara
2017-08-01 10:52 ` Christoph Hellwig
