linux-api.vger.kernel.org archive mirror
* [PATCH 0/17 v4] dax, ext4, xfs: Synchronous page faults
@ 2017-10-19 12:57 Jan Kara
  2017-10-19 12:58 ` [PATCH 12/17] mm: Define MAP_SYNC and VM_SYNC flags Jan Kara
       [not found] ` <20171019125817.11580-1-jack-AlSwsSmVLrQ@public.gmane.org>
  0 siblings, 2 replies; 28+ messages in thread
From: Jan Kara @ 2017-10-19 12:57 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Jan Kara, linux-nvdimm, linux-api, linux-xfs,
	linux-ext4

Hello,

here is the fourth version of my patches implementing synchronous page faults
for DAX mappings. They make it possible to flush DAX mappings from userspace,
so that flushing can happen at finer than page granularity and without the
overhead of a syscall.

We use a new mmap flag MAP_SYNC to indicate that page faults for the mapping
should be synchronous.  The guarantee provided by this flag is: While a block
is writeably mapped into page tables of this mapping, it is guaranteed to be
visible in the file at that offset even after a crash.
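
As a rough illustration of how userspace might request such a mapping (this
sketch is not part of the series), the flag is combined with
MAP_SHARED_VALIDATE so that kernels or filesystems without support reject the
call instead of silently ignoring it. The helper name map_sync() is
hypothetical, and the fallback flag values are those of mainline Linux on x86;
treat them as an assumption relative to this posting.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers. Values as in mainline Linux on x86;
 * an assumption here, not something taken from this patch posting. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper: try to map `len` bytes of `fd` with MAP_SYNC.
 * Returns the mapping, or NULL with errno set. EOPNOTSUPP is the expected
 * error when the file does not support synchronous faults; EINVAL is what
 * a kernel without MAP_SHARED_VALIDATE would return.
 */
static void *map_sync(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

A caller would fall back to a plain MAP_SHARED mapping plus fsync() or msync()
when map_sync() returns NULL. With MAP_SYNC in effect, it is still up to
userspace to flush the data itself from CPU caches.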

The implementation works as follows: ->iomap_begin() indicates by a flag that
the inode's block mapping metadata is unstable and may need flushing (it uses
the same test as the one deciding whether fdatasync() has metadata to write).
If so, the DAX fault handler refrains from inserting / write-enabling the page
table entry and returns a special flag, VM_FAULT_NEEDDSYNC, together with the
PFN to map, to the filesystem fault handler. The handler then calls fdatasync()
(vfs_fsync_range()) for the affected range and afterwards calls DAX code to
update the page table entry appropriately.
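
The control flow described above can be modelled in a few lines of userspace
C. This is a toy sketch only: every type and function name below (toy_inode,
toy_pte, toy_dax_fault, toy_fs_fault) is hypothetical and none of it is kernel
code. It just shows the two-step handshake, where the DAX layer hands back a
pfn and the filesystem handler syncs before the page table entry is installed.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
enum fault_result { FAULT_DONE, FAULT_NEEDDSYNC };

struct toy_inode {
    bool metadata_dirty;  /* would fdatasync() have metadata to write? */
    int fsync_calls;      /* counts simulated vfs_fsync_range() calls */
};

struct toy_pte {
    bool present;
    bool writable;
    unsigned long pfn;
};

/* Models the DAX fault handler: on a MAP_SYNC mapping with unstable block
 * mapping metadata, leave the PTE alone and hand the pfn back instead. */
static enum fault_result toy_dax_fault(const struct toy_inode *inode,
                                       bool map_sync, unsigned long pfn,
                                       struct toy_pte *pte,
                                       unsigned long *pfnp)
{
    if (map_sync && inode->metadata_dirty) {
        *pfnp = pfn;
        return FAULT_NEEDDSYNC;
    }
    pte->present = true;
    pte->writable = true;
    pte->pfn = pfn;
    return FAULT_DONE;
}

/* Models the filesystem fault handler: if the DAX layer asks for a sync,
 * flush the metadata first and only then install the writable entry. */
static void toy_fs_fault(struct toy_inode *inode, bool map_sync,
                         unsigned long pfn, struct toy_pte *pte)
{
    unsigned long ret_pfn = 0;

    if (toy_dax_fault(inode, map_sync, pfn, pte, &ret_pfn) == FAULT_NEEDDSYNC) {
        inode->fsync_calls++;           /* stands in for vfs_fsync_range() */
        inode->metadata_dirty = false;
        pte->present = true;            /* stands in for the DAX finish step */
        pte->writable = true;
        pte->pfn = ret_pfn;
    }
}
```

In this model a fault on a mapping without MAP_SYNC never triggers the sync,
which matches the idea that only synchronous mappings pay the extra cost.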

I did some basic performance testing on the patches over ramdisk - timed
latency of page faults when faulting 512 pages. I did several tests: with file
preallocated / with file empty, with background file copying going on / without
it, with / without MAP_SYNC (so that we get comparison).  The results are
(numbers are in microseconds):

File preallocated, no background load, no MAP_SYNC:
min=9 avg=10 max=46
8 - 15 us: 508
16 - 31 us: 3
32 - 63 us: 1

File preallocated, no background load, MAP_SYNC:
min=9 avg=10 max=47
8 - 15 us: 508
16 - 31 us: 2
32 - 63 us: 2

File empty, no background load, no MAP_SYNC:
min=21 avg=22 max=70
16 - 31 us: 506
32 - 63 us: 5
64 - 127 us: 1

File empty, no background load, MAP_SYNC:
min=40 avg=124 max=242
32 - 63 us: 1
64 - 127 us: 333
128 - 255 us: 178

File empty, background load, no MAP_SYNC:
min=21 avg=23 max=67
16 - 31 us: 507
32 - 63 us: 4
64 - 127 us: 1

File empty, background load, MAP_SYNC:
min=94 avg=112 max=181
64 - 127 us: 489
128 - 255 us: 23

So the difference between MAP_SYNC and non-MAP_SYNC faults is about
100-200 us when we need to wait for a transaction commit in this setup.
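
The histograms above group latencies into power-of-two buckets. The
measurement tool itself is not shown in this posting, so the following is only
a guess at how such buckets could be computed; the names bucket_index,
hist_add and hist_print are hypothetical.

```c
#include <stdio.h>

#define NBUCKETS 32

/* Index of the power-of-two bucket holding `us`: bucket i covers
 * [2^i, 2^(i+1) - 1], so 9 us falls into "8 - 15" (i = 3).
 * Values 0 and 1 both land in bucket 0. */
static unsigned bucket_index(unsigned us)
{
    unsigned i = 0;

    while (us > 1) {
        us >>= 1;
        i++;
    }
    return i;
}

/* Account one latency sample (in microseconds) in the histogram. */
static void hist_add(unsigned long hist[NBUCKETS], unsigned us)
{
    hist[bucket_index(us)]++;
}

/* Print the non-empty buckets in the same "low - high us: count" layout. */
static void hist_print(const unsigned long hist[NBUCKETS])
{
    for (unsigned i = 0; i < NBUCKETS; i++) {
        if (hist[i])
            printf("%llu - %llu us: %lu\n",
                   1ULL << i, (1ULL << (i + 1)) - 1, hist[i]);
    }
}
```

Feeding it the 512 per-fault latencies of one run would give bucket counts
that sum to 512, as each of the result blocks above does.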

Anyway, here are the patches and AFAICT the series is pretty much complete.
The only missing piece is the tests, which Ross is working on. Comments are
welcome.

Changes since v3:
* updated some changelogs
* folded fs support for VM_SYNC flag into patches implementing the
  functionality
* removed ->mmap_validate, use ->mmap_supported_flags instead
* added some Reviewed-by tags
* added manpage patch

Changes since v2:
* avoid unnecessary flushing of faulted page (Ross) - I've realized it makes no
  sense to remeasure my benchmark results (after actually doing that and seeing
  no difference, sigh) since I use ramdisk and not real PMEM HW and so flushes
  are ignored.
* handle nojournal mode of ext4
* other smaller cleanups & fixes (Ross)
* factor larger part of finishing of synchronous fault into a helper (Christoph)
* reorder pfnp argument of dax_iomap_fault() (Christoph)
* add XFS support from Christoph
* use proper MAP_SYNC support in mmap(2)
* rebased on top of 4.14-rc4

Changes since v1:
* switched to using mmap flag MAP_SYNC
* cleaned up fault handlers to avoid passing pfn in vmf->orig_pte
* switched to not touching page tables before we are ready to insert final
  entry as it was unnecessary and not really simplifying anything
* renamed fault flag to VM_FAULT_NEEDDSYNC
* other smaller fixes found by reviewers

								Honza

* [PATCH 0/17 v5] dax, ext4, xfs: Synchronous page faults
@ 2017-10-24 15:23 Jan Kara
       [not found] ` <20171024152415.22864-1-jack-AlSwsSmVLrQ@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2017-10-24 15:23 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ross Zwisler, Christoph Hellwig, linux-ext4, linux-nvdimm,
	linux-fsdevel, linux-xfs, linux-api, linux-mm, Jan Kara

Hello,

here is the fifth version of my patches implementing synchronous page faults
for DAX mappings. They make it possible to flush DAX mappings from userspace,
so that flushing can happen at finer than page granularity and without the
overhead of a syscall.

We use a new mmap flag MAP_SYNC to indicate that page faults for the mapping
should be synchronous.  The guarantee provided by this flag is: While a block
is writeably mapped into page tables of this mapping, it is guaranteed to be
visible in the file at that offset even after a crash.

The implementation works as follows: ->iomap_begin() indicates by a flag that
the inode's block mapping metadata is unstable and may need flushing (it uses
the same test as the one deciding whether fdatasync() has metadata to write).
If so, the DAX fault handler refrains from inserting / write-enabling the page
table entry and returns a special flag, VM_FAULT_NEEDDSYNC, together with the
PFN to map, to the filesystem fault handler. The handler then calls fdatasync()
(vfs_fsync_range()) for the affected range and afterwards calls DAX code to
update the page table entry appropriately.

I did some basic performance testing on the patches over ramdisk - timed
latency of page faults when faulting 512 pages. I did several tests: with file
preallocated / with file empty, with background file copying going on / without
it, with / without MAP_SYNC (so that we get comparison).  The results are
(numbers are in microseconds):

File preallocated, no background load, no MAP_SYNC:
min=9 avg=10 max=46
8 - 15 us: 508
16 - 31 us: 3
32 - 63 us: 1

File preallocated, no background load, MAP_SYNC:
min=9 avg=10 max=47
8 - 15 us: 508
16 - 31 us: 2
32 - 63 us: 2

File empty, no background load, no MAP_SYNC:
min=21 avg=22 max=70
16 - 31 us: 506
32 - 63 us: 5
64 - 127 us: 1

File empty, no background load, MAP_SYNC:
min=40 avg=124 max=242
32 - 63 us: 1
64 - 127 us: 333
128 - 255 us: 178

File empty, background load, no MAP_SYNC:
min=21 avg=23 max=67
16 - 31 us: 507
32 - 63 us: 4
64 - 127 us: 1

File empty, background load, MAP_SYNC:
min=94 avg=112 max=181
64 - 127 us: 489
128 - 255 us: 23

So the difference between MAP_SYNC and non-MAP_SYNC faults is about
100-200 us when we need to wait for a transaction commit in this setup.

Anyway, here are the patches, and since Ross has already posted his patches to
test the functionality, I think we are ready to get this merged. I've talked
with Dan and he said he could take the patches through his tree; I'd just like
to get a final ack from Christoph on the patch modifying mmap(2).  Comments are
welcome.

Changes since v4:
* fixed a couple of minor things in the manpage
* made legacy mmap flags always supported, removed them from the mask declared
  to be supported by ext4 and xfs
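
The second item can be pictured with a small userspace model of the flag
check. It is a sketch under assumptions: the TOY_ constants mirror x86 flag
values, TOY_LEGACY_MASK is a deliberately shortened stand-in for the real
legacy mask, and toy_validate_flags() is a hypothetical name, not the function
used in the patches.

```c
#include <errno.h>

/* Flag values as on x86 Linux, repeated only to keep the sketch
 * self-contained. */
#define TOY_MAP_SHARED          0x01
#define TOY_MAP_PRIVATE         0x02
#define TOY_MAP_SHARED_VALIDATE 0x03
#define TOY_MAP_TYPE            0x0f
#define TOY_MAP_FIXED           0x10
#define TOY_MAP_POPULATE        0x8000
#define TOY_MAP_SYNC            0x80000

/* Hypothetical, abbreviated set of "legacy" flags that are always accepted
 * and so need not be listed by each filesystem. */
#define TOY_LEGACY_MASK (TOY_MAP_TYPE | TOY_MAP_FIXED | TOY_MAP_POPULATE)

/* Returns 0 if the flags are acceptable, -EOPNOTSUPP otherwise.
 * `fs_supported` models the extra flags a filesystem declares. */
static int toy_validate_flags(unsigned long flags, unsigned long fs_supported)
{
    unsigned long mask = TOY_LEGACY_MASK | fs_supported;

    /* Plain MAP_SHARED / MAP_PRIVATE keep ignoring unknown flags. */
    if ((flags & TOY_MAP_TYPE) != TOY_MAP_SHARED_VALIDATE)
        return 0;
    /* MAP_SHARED_VALIDATE rejects anything outside the combined mask. */
    if (flags & ~mask)
        return -EOPNOTSUPP;
    return 0;
}
```

With this split, a filesystem in the model only declares TOY_MAP_SYNC, and a
legacy flag such as TOY_MAP_POPULATE passes validation everywhere.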

Changes since v3:
* updated some changelogs
* folded fs support for VM_SYNC flag into patches implementing the
  functionality
* removed ->mmap_validate, use ->mmap_supported_flags instead
* added some Reviewed-by tags
* added manpage patch

Changes since v2:
* avoid unnecessary flushing of faulted page (Ross) - I've realized it makes no
  sense to remeasure my benchmark results (after actually doing that and seeing
  no difference, sigh) since I use ramdisk and not real PMEM HW and so flushes
  are ignored.
* handle nojournal mode of ext4
* other smaller cleanups & fixes (Ross)
* factor larger part of finishing of synchronous fault into a helper (Christoph)
* reorder pfnp argument of dax_iomap_fault() (Christoph)
* add XFS support from Christoph
* use proper MAP_SYNC support in mmap(2)
* rebased on top of 4.14-rc4

Changes since v1:
* switched to using mmap flag MAP_SYNC
* cleaned up fault handlers to avoid passing pfn in vmf->orig_pte
* switched to not touching page tables before we are ready to insert final
  entry as it was unnecessary and not really simplifying anything
* renamed fault flag to VM_FAULT_NEEDDSYNC
* other smaller fixes found by reviewers

								Honza

Thread overview: 28+ messages
2017-10-19 12:57 [PATCH 0/17 v4] dax, ext4, xfs: Synchronous page faults Jan Kara
2017-10-19 12:58 ` [PATCH 12/17] mm: Define MAP_SYNC and VM_SYNC flags Jan Kara
     [not found] ` <20171019125817.11580-1-jack-AlSwsSmVLrQ@public.gmane.org>
2017-10-19 12:58   ` [PATCH 01/17] mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags Jan Kara
     [not found]     ` <20171019125817.11580-2-jack-AlSwsSmVLrQ@public.gmane.org>
2017-10-19 16:48       ` Dan Williams
2017-10-20  7:27     ` Christoph Hellwig
     [not found]       ` <20171020072707.GA18000-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2017-10-24 13:08         ` Jan Kara
2017-10-19 12:58   ` [PATCH 02/17] mm: Remove VM_FAULT_HWPOISON_LARGE_MASK Jan Kara
2017-10-19 12:58   ` [PATCH 03/17] dax: Simplify arguments of dax_insert_mapping() Jan Kara
2017-10-19 12:58   ` [PATCH 04/17] dax: Factor out getting of pfn out of iomap Jan Kara
2017-10-19 12:58   ` [PATCH 05/17] dax: Create local variable for VMA in dax_iomap_pte_fault() Jan Kara
2017-10-19 12:58   ` [PATCH 06/17] dax: Create local variable for vmf->flags & FAULT_FLAG_WRITE test Jan Kara
2017-10-19 12:58   ` [PATCH 07/17] dax: Inline dax_insert_mapping() into the callsite Jan Kara
2017-10-19 12:58   ` [PATCH 08/17] dax: Inline dax_pmd_insert_mapping() " Jan Kara
2017-10-19 12:58   ` [PATCH 09/17] dax: Fix comment describing dax_iomap_fault() Jan Kara
2017-10-19 12:58   ` [PATCH 10/17] dax: Allow dax_iomap_fault() to return pfn Jan Kara
2017-10-19 12:58   ` [PATCH 11/17] dax: Allow tuning whether dax_insert_mapping_entry() dirties entry Jan Kara
2017-10-19 12:58   ` [PATCH 13/17] dax, iomap: Add support for synchronous faults Jan Kara
2017-10-19 12:58   ` [PATCH 14/17] dax: Implement dax_finish_sync_fault() Jan Kara
2017-10-19 12:58   ` [PATCH 15/17] ext4: Simplify error handling in ext4_dax_huge_fault() Jan Kara
2017-10-19 12:58   ` [PATCH 16/17] ext4: Support for synchronous DAX faults Jan Kara
2017-10-19 12:58   ` [PATCH 17/17] xfs: support " Jan Kara
2017-10-19 13:17     ` Christoph Hellwig
2017-10-19 12:58   ` [PATCH] mmap.2: Add description of MAP_SHARED_VALIDATE and MAP_SYNC Jan Kara
     [not found]     ` <20171019125817.11580-19-jack-AlSwsSmVLrQ@public.gmane.org>
2017-10-20 21:47       ` Ross Zwisler
     [not found]         ` <20171020214753.GA15733-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2017-10-24 13:27           ` Jan Kara
     [not found]             ` <20171024132713.GD8556-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2017-10-24 14:55               ` Ross Zwisler
2017-10-24 15:23 [PATCH 0/17 v5] dax, ext4, xfs: Synchronous page faults Jan Kara
     [not found] ` <20171024152415.22864-1-jack-AlSwsSmVLrQ@public.gmane.org>
2017-10-24 15:23   ` [PATCH 01/17] mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags Jan Kara
     [not found]     ` <20171024152415.22864-2-jack-AlSwsSmVLrQ@public.gmane.org>
2017-10-24 21:21       ` Ross Zwisler
