* [RFC PATCH 0/13 v2] dax, ext4: Synchronous page faults
@ 2017-08-17 16:08 ` Jan Kara
  0 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

Hello,

here is the second version of my patches implementing synchronous page faults
for DAX mappings. They make it possible to flush DAX mappings from userspace at
finer-than-page granularity while avoiding the overhead of a syscall.

We use a new mmap flag MAP_SYNC to indicate that page faults for the mapping
should be synchronous. The guarantee provided by this flag is: while a block
is writeably mapped into the page tables of this mapping, it is guaranteed to
be visible in the file at that offset even after a crash.
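
For illustration, here is a minimal userspace sketch of how such a mapping is
meant to be used (the details are my assumptions, not part of this series:
MAP_SYNC has to come from headers matching a patched kernel, and the cache-line
flush assumes an x86 CPU with CLWB, compiled with -mclwb):

#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <immintrin.h>

#ifndef MAP_SYNC
#error "these headers do not provide MAP_SYNC"
#endif

#define FLUSH_ALIGN 64UL

/* Persist a small record through a MAP_SYNC mapping: write it and flush
 * only the touched cache lines - no msync()/fsync() afterwards. */
static int write_record(int fd, off_t off, const void *buf, size_t len)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	off_t page = off & ~((off_t)pagesize - 1);
	size_t maplen = (size_t)(off - page) + len;
	char *p = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_SYNC, fd, page);
	char *dst;
	uintptr_t a;

	if (p == MAP_FAILED)
		return -1;
	dst = p + (off - page);
	memcpy(dst, buf, len);
	/* CLWB each touched cache line, then fence to order the flushes. */
	for (a = (uintptr_t)dst & ~(FLUSH_ALIGN - 1);
	     a < (uintptr_t)dst + len; a += FLUSH_ALIGN)
		_mm_clwb((void *)a);
	_mm_sfence();
	munmap(p, maplen);
	return 0;
}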

The implementation works as follows: ->iomap_begin() indicates via a flag that
the inode's block mapping metadata is unstable and may need flushing (it uses
the same test as the one deciding whether fdatasync() has metadata to write).
If so, the DAX fault handler refrains from inserting / write-enabling the page
table entry and returns the special flag VM_FAULT_NEEDDSYNC together with a PFN
to map to the filesystem fault handler. The handler then calls fdatasync()
(vfs_fsync_range()) for the affected range and after that calls back into the
DAX code to update the page table entry appropriately.
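
In pseudo-code, the filesystem side of this handshake looks roughly like the
sketch below (helper names and signatures are placeholders for illustration,
not the final interface of the series):

/* Rough sketch only - dax_iomap_fault()'s signature, dax_insert_pfn() and
 * fs_iomap_ops are placeholders here, not the final API. */
static int fs_dax_fault(struct vm_fault *vmf)
{
	loff_t start = (loff_t)vmf->pgoff << PAGE_SHIFT;
	pfn_t pfn;
	int result;

	result = dax_iomap_fault(vmf, &pfn, &fs_iomap_ops);
	if (result & VM_FAULT_NEEDDSYNC) {
		/* Block mapping metadata is not stable yet - sync it first. */
		if (vfs_fsync_range(vmf->vma->vm_file, start,
				    start + PAGE_SIZE - 1, 1 /* datasync */))
			return VM_FAULT_SIGBUS;
		/* Now it is safe to map (and write-enable) the returned PFN. */
		result = dax_insert_pfn(vmf, pfn);
	}
	return result;
}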

From my (fairly limited) knowledge of XFS it seems XFS should be able to do the
same, and it should even be possible for a filesystem to implement safe
remapping of a file offset to a different block (i.e. break reflink, do defrag,
or similar) like:

1) Block page faults
2) fdatasync() remapped range (there can be outstanding data modifications
   not yet flushed)
3) unmap_mapping_range()
4) Now remap blocks
5) Unblock page faults

Basically we do the same on events like hole punching, so there is not much new
there; a rough sketch of the sequence follows.
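
Here fs_block_faults() / fs_unblock_faults() and fs_remap_blocks() stand in
for the filesystem's own primitives (hypothetical names; XFS would for example
use its MMAPLOCK and its reflink/defrag machinery):

/* Sketch only - the fs_* helpers are hypothetical stand-ins. */
static int fs_remap_range(struct file *file, loff_t pos, loff_t len)
{
	struct inode *inode = file_inode(file);
	int error;

	fs_block_faults(inode);					/* 1) block page faults */
	error = vfs_fsync_range(file, pos, pos + len - 1, 1);	/* 2) fdatasync the range */
	if (!error) {
		unmap_mapping_range(inode->i_mapping, pos, len, 1);	/* 3) */
		error = fs_remap_blocks(inode, pos, len);		/* 4) remap blocks */
	}
	fs_unblock_faults(inode);				/* 5) unblock page faults */
	return error;
}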

Note that the implementation of the MAP_SYNC flag is pretty crude for now, just
enough to enable testing, since Dan is working in the same area on another mmap
flag. Once the decision on how to implement the new mmap flag is settled, I can
clean up that patch.

I did some basic performance testing of the patches over a ramdisk - I timed
the latency of page faults when faulting 512 pages. I ran several tests: with
the file preallocated / with the file empty, with background file copying going
on / without it, and with / without MAP_SYNC (so that we get a comparison). The
results are (numbers are in microseconds):

File preallocated, no background load, no MAP_SYNC:
min=5 avg=6 max=42
4 - 7 us: 398
8 - 15 us: 110
16 - 31 us: 2
32 - 63 us: 2

File preallocated, no background load, MAP_SYNC:
min=10 avg=10 max=43
8 - 15 us: 509
16 - 31 us: 2
32 - 63 us: 1

File empty, no background load, no MAP_SYNC:
min=21 avg=23 max=76
16 - 31 us: 503
32 - 63 us: 8
64 - 127 us: 1

File empty, no background load, MAP_SYNC:
min=91 avg=108 max=234
64 - 127 us: 467
128 - 255 us: 45

File empty, background load, no MAP_SYNC:
min=21 avg=23 max=67
16 - 31 us: 507
32 - 63 us: 4
64 - 127 us: 1

File empty, background load, MAP_SYNC:
min=94 avg=112 max=181
64 - 127 us: 489
128 - 255 us: 23

So here we can see that the difference between MAP_SYNC and non-MAP_SYNC is
about 100-200 us when we need to wait for a transaction commit in this setup.

Anyway, here are the patches; comments are welcome.

Changes since v1:
* switched to using mmap flag MAP_SYNC
* cleaned up fault handlers to avoid passing pfn in vmf->orig_pte
* switched to not touching page tables before we are ready to insert final
  entry as it was unnecessary and not really simplifying anything
* renamed fault flag to VM_FAULT_NEEDDSYNC
* other smaller fixes found by reviewers

								Honza

* [PATCH 01/13] mm: Remove VM_FAULT_HWPOISON_LARGE_MASK
  2017-08-17 16:08 ` Jan Kara
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

It is unused.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/mm.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 483e84cf9fc6..fa036093e76c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1143,8 +1143,6 @@ static inline void clear_page_pfmemalloc(struct page *page)
 #define VM_FAULT_FALLBACK 0x0800	/* huge page fault failed, fall back to small */
 #define VM_FAULT_DONE_COW   0x1000	/* ->fault has fully handled COW */
 
-#define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */
-
 #define VM_FAULT_ERROR	(VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | \
 			 VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE | \
 			 VM_FAULT_FALLBACK)
-- 
2.12.3

* [PATCH 02/13] dax: Simplify arguments of dax_insert_mapping()
  2017-08-17 16:08 ` Jan Kara
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

dax_insert_mapping() has lots of arguments and a lot of them are actually
duplicated by passing the vm_fault structure as well. Change the function to
take the same arguments as dax_pmd_insert_mapping().

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 75760f7bdf5d..fbbb6d176987 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -814,23 +814,30 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 }
 EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 
-static int dax_insert_mapping(struct address_space *mapping,
-		struct block_device *bdev, struct dax_device *dax_dev,
-		sector_t sector, size_t size, void *entry,
-		struct vm_area_struct *vma, struct vm_fault *vmf)
+static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
+{
+	return iomap->blkno + (((pos & PAGE_MASK) - iomap->offset) >> 9);
+}
+
+static int dax_insert_mapping(struct vm_fault *vmf, struct iomap *iomap,
+			      loff_t pos, void *entry)
 {
+	const sector_t sector = dax_iomap_sector(iomap, pos);
+	struct vm_area_struct *vma = vmf->vma;
+	struct address_space *mapping = vma->vm_file->f_mapping;
 	unsigned long vaddr = vmf->address;
 	void *ret, *kaddr;
 	pgoff_t pgoff;
 	int id, rc;
 	pfn_t pfn;
 
-	rc = bdev_dax_pgoff(bdev, sector, size, &pgoff);
+	rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
 	if (rc)
 		return rc;
 
 	id = dax_read_lock();
-	rc = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size), &kaddr, &pfn);
+	rc = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(PAGE_SIZE),
+			       &kaddr, &pfn);
 	if (rc < 0) {
 		dax_read_unlock(id);
 		return rc;
@@ -930,11 +937,6 @@ int __dax_zero_page_range(struct block_device *bdev,
 }
 EXPORT_SYMBOL_GPL(__dax_zero_page_range);
 
-static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
-{
-	return iomap->blkno + (((pos & PAGE_MASK) - iomap->offset) >> 9);
-}
-
 static loff_t
 dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap)
@@ -1076,7 +1078,6 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
 	struct inode *inode = mapping->host;
 	unsigned long vaddr = vmf->address;
 	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
-	sector_t sector;
 	struct iomap iomap = { 0 };
 	unsigned flags = IOMAP_FAULT;
 	int error, major = 0;
@@ -1129,9 +1130,9 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
 		goto error_finish_iomap;
 	}
 
-	sector = dax_iomap_sector(&iomap, pos);
-
 	if (vmf->cow_page) {
+		sector_t sector = dax_iomap_sector(&iomap, pos);
+
 		switch (iomap.type) {
 		case IOMAP_HOLE:
 		case IOMAP_UNWRITTEN:
@@ -1164,8 +1165,7 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
 			count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
 			major = VM_FAULT_MAJOR;
 		}
-		error = dax_insert_mapping(mapping, iomap.bdev, iomap.dax_dev,
-				sector, PAGE_SIZE, entry, vmf->vma, vmf);
+		error = dax_insert_mapping(vmf, &iomap, pos, entry);
 		/* -EBUSY is fine, somebody else faulted on the same PTE */
 		if (error == -EBUSY)
 			error = 0;
-- 
2.12.3

* [PATCH 03/13] dax: Factor out getting of pfn out of iomap
  2017-08-17 16:08 ` Jan Kara
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

Factor out the code that gets a pfn out of an iomap; it is shared between the
PTE and PMD fault paths.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c | 81 +++++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 42 insertions(+), 39 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index fbbb6d176987..1122356f8b88 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -819,30 +819,53 @@ static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
 	return iomap->blkno + (((pos & PAGE_MASK) - iomap->offset) >> 9);
 }
 
-static int dax_insert_mapping(struct vm_fault *vmf, struct iomap *iomap,
-			      loff_t pos, void *entry)
+static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
+			 pfn_t *pfnp)
 {
 	const sector_t sector = dax_iomap_sector(iomap, pos);
-	struct vm_area_struct *vma = vmf->vma;
-	struct address_space *mapping = vma->vm_file->f_mapping;
-	unsigned long vaddr = vmf->address;
-	void *ret, *kaddr;
 	pgoff_t pgoff;
+	void *kaddr;
 	int id, rc;
-	pfn_t pfn;
+	long length;
 
-	rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
+	rc = bdev_dax_pgoff(iomap->bdev, sector, size, &pgoff);
 	if (rc)
 		return rc;
-
 	id = dax_read_lock();
-	rc = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(PAGE_SIZE),
-			       &kaddr, &pfn);
-	if (rc < 0) {
-		dax_read_unlock(id);
-		return rc;
+	length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
+				   &kaddr, pfnp);
+	if (length < 0) {
+		rc = length;
+		goto out;
 	}
+	rc = -EINVAL;
+	if (PFN_PHYS(length) < size)
+		goto out;
+	if (pfn_t_to_pfn(*pfnp) & (PHYS_PFN(size)-1))
+		goto out;
+	/* For larger pages we need devmap */
+	if (length > 1 && !pfn_t_devmap(*pfnp))
+		goto out;
+	rc = 0;
+out:
 	dax_read_unlock(id);
+	return rc;
+}
+
+static int dax_insert_mapping(struct vm_fault *vmf, struct iomap *iomap,
+			      loff_t pos, void *entry)
+{
+	const sector_t sector = dax_iomap_sector(iomap, pos);
+	struct vm_area_struct *vma = vmf->vma;
+	struct address_space *mapping = vma->vm_file->f_mapping;
+	unsigned long vaddr = vmf->address;
+	void *ret;
+	int rc;
+	pfn_t pfn;
+
+	rc = dax_iomap_pfn(iomap, pos, PAGE_SIZE, &pfn);
+	if (rc < 0)
+		return rc;
 
 	ret = dax_insert_mapping_entry(mapping, vmf, entry, sector, 0);
 	if (IS_ERR(ret))
@@ -1218,44 +1241,24 @@ static int dax_pmd_insert_mapping(struct vm_fault *vmf, struct iomap *iomap,
 {
 	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
 	const sector_t sector = dax_iomap_sector(iomap, pos);
-	struct dax_device *dax_dev = iomap->dax_dev;
-	struct block_device *bdev = iomap->bdev;
 	struct inode *inode = mapping->host;
-	const size_t size = PMD_SIZE;
-	void *ret = NULL, *kaddr;
-	long length = 0;
-	pgoff_t pgoff;
+	void *ret = NULL;
 	pfn_t pfn;
-	int id;
+	int rc;
 
-	if (bdev_dax_pgoff(bdev, sector, size, &pgoff) != 0)
+	rc = dax_iomap_pfn(iomap, pos, PMD_SIZE, &pfn);
+	if (rc < 0)
 		goto fallback;
 
-	id = dax_read_lock();
-	length = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size), &kaddr, &pfn);
-	if (length < 0)
-		goto unlock_fallback;
-	length = PFN_PHYS(length);
-
-	if (length < size)
-		goto unlock_fallback;
-	if (pfn_t_to_pfn(pfn) & PG_PMD_COLOUR)
-		goto unlock_fallback;
-	if (!pfn_t_devmap(pfn))
-		goto unlock_fallback;
-	dax_read_unlock(id);
-
 	ret = dax_insert_mapping_entry(mapping, vmf, entry, sector,
 			RADIX_DAX_PMD);
 	if (IS_ERR(ret))
 		goto fallback;
 
-	trace_dax_pmd_insert_mapping(inode, vmf, length, pfn, ret);
+	trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, ret);
 	return vmf_insert_pfn_pmd(vmf->vma, vmf->address, vmf->pmd,
 			pfn, vmf->flags & FAULT_FLAG_WRITE);
 
-unlock_fallback:
-	dax_read_unlock(id);
 fallback:
 	trace_dax_pmd_insert_mapping_fallback(inode, vmf, length, pfn, ret);
 	return VM_FAULT_FALLBACK;
-- 
2.12.3

* [PATCH 04/13] dax: Create local variable for VMA in dax_iomap_pte_fault()
  2017-08-17 16:08 ` Jan Kara
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

There are already two users and more are coming.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 1122356f8b88..75aa0d9fb39f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1097,7 +1097,8 @@ static int dax_fault_return(int error)
 static int dax_iomap_pte_fault(struct vm_fault *vmf,
 			       const struct iomap_ops *ops)
 {
-	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
+	struct vm_area_struct *vma = vmf->vma;
+	struct address_space *mapping = vma->vm_file->f_mapping;
 	struct inode *inode = mapping->host;
 	unsigned long vaddr = vmf->address;
 	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
@@ -1185,7 +1186,7 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
 	case IOMAP_MAPPED:
 		if (iomap.flags & IOMAP_F_NEW) {
 			count_vm_event(PGMAJFAULT);
-			count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
+			count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
 			major = VM_FAULT_MAJOR;
 		}
 		error = dax_insert_mapping(vmf, &iomap, pos, entry);
-- 
2.12.3

* [PATCH 05/13] dax: Create local variable for vmf->flags & FAULT_FLAG_WRITE test
  2017-08-17 16:08 ` Jan Kara
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

There are already two users of the vmf->flags & FAULT_FLAG_WRITE test in
dax_iomap_pte_fault() and more are coming, so cache the result in a
local 'write' variable.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 75aa0d9fb39f..7c150eddc01a 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1105,6 +1105,7 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
 	struct iomap iomap = { 0 };
 	unsigned flags = IOMAP_FAULT;
 	int error, major = 0;
+	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	int vmf_ret = 0;
 	void *entry;
 
@@ -1119,7 +1120,7 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
 		goto out;
 	}
 
-	if ((vmf->flags & FAULT_FLAG_WRITE) && !vmf->cow_page)
+	if (write && !vmf->cow_page)
 		flags |= IOMAP_WRITE;
 
 	entry = grab_mapping_entry(mapping, vmf->pgoff, 0);
@@ -1196,7 +1197,7 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
 		break;
 	case IOMAP_UNWRITTEN:
 	case IOMAP_HOLE:
-		if (!(vmf->flags & FAULT_FLAG_WRITE)) {
+		if (!write) {
 			vmf_ret = dax_load_hole(mapping, entry, vmf);
 			goto finish_iomap;
 		}
-- 
2.12.3

* [PATCH 06/13] dax: Inline dax_insert_mapping() into the callsite
  2017-08-17 16:08 ` Jan Kara
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

dax_insert_mapping() has only one callsite, and we will need to further
fine-tune what it does for synchronous faults. Just inline it into the
callsite so that we don't have to pass awkward bools around.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c | 46 +++++++++++++++++++---------------------------
 1 file changed, 19 insertions(+), 27 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 7c150eddc01a..766cb840c276 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -852,32 +852,6 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
 	return rc;
 }
 
-static int dax_insert_mapping(struct vm_fault *vmf, struct iomap *iomap,
-			      loff_t pos, void *entry)
-{
-	const sector_t sector = dax_iomap_sector(iomap, pos);
-	struct vm_area_struct *vma = vmf->vma;
-	struct address_space *mapping = vma->vm_file->f_mapping;
-	unsigned long vaddr = vmf->address;
-	void *ret;
-	int rc;
-	pfn_t pfn;
-
-	rc = dax_iomap_pfn(iomap, pos, PAGE_SIZE, &pfn);
-	if (rc < 0)
-		return rc;
-
-	ret = dax_insert_mapping_entry(mapping, vmf, entry, sector, 0);
-	if (IS_ERR(ret))
-		return PTR_ERR(ret);
-
-	trace_dax_insert_mapping(mapping->host, vmf, ret);
-	if (vmf->flags & FAULT_FLAG_WRITE)
-		return vm_insert_mixed_mkwrite(vma, vaddr, pfn);
-	else
-		return vm_insert_mixed(vma, vaddr, pfn);
-}
-
 /*
  * The user has performed a load from a hole in the file.  Allocating a new
  * page in the file would cause excessive storage usage for workloads with
@@ -1108,6 +1082,7 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	int vmf_ret = 0;
 	void *entry;
+	pfn_t pfn;
 
 	trace_dax_pte_fault(inode, vmf, vmf_ret);
 	/*
@@ -1190,7 +1165,24 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
 			count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
 			major = VM_FAULT_MAJOR;
 		}
-		error = dax_insert_mapping(vmf, &iomap, pos, entry);
+		error = dax_iomap_pfn(&iomap, pos, PAGE_SIZE, &pfn);
+		if (error < 0)
+			goto error_finish_iomap;
+
+		entry = dax_insert_mapping_entry(mapping, vmf, entry,
+						 dax_iomap_sector(&iomap, pos),
+						 0);
+		if (IS_ERR(entry)) {
+			error = PTR_ERR(entry);
+			goto error_finish_iomap;
+		}
+
+		trace_dax_insert_mapping(inode, vmf, entry);
+		if (write)
+			error = vm_insert_mixed_mkwrite(vma, vaddr, pfn);
+		else
+			error = vm_insert_mixed(vma, vaddr, pfn);
+
 		/* -EBUSY is fine, somebody else faulted on the same PTE */
 		if (error == -EBUSY)
 			error = 0;
-- 
2.12.3
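
For contrast, here is a rough sketch of the alternative the changelog
argues against (entirely hypothetical; nothing like it is proposed in
this series): keeping dax_insert_mapping() and threading the synchronous
fault state through it. The extra sync and pfnp parameters below are the
"awkward bools" problem in code form.

	/*
	 * Hypothetical alternative that this patch avoids: the helper's
	 * signature and every caller grow arguments whose meaning is hard
	 * to see at the callsite.
	 */
	static int dax_insert_mapping_alt(struct vm_fault *vmf,
					  struct iomap *iomap, loff_t pos,
					  void *entry, bool sync, pfn_t *pfnp)
	{
		struct vm_area_struct *vma = vmf->vma;
		struct address_space *mapping = vma->vm_file->f_mapping;
		void *ret;
		pfn_t pfn;
		int rc;

		rc = dax_iomap_pfn(iomap, pos, PAGE_SIZE, &pfn);
		if (rc < 0)
			return rc;

		ret = dax_insert_mapping_entry(mapping, vmf, entry,
					       dax_iomap_sector(iomap, pos), 0);
		if (IS_ERR(ret))
			return PTR_ERR(ret);

		if (sync) {
			/* Hand the PFN back; the caller must fsync() the
			 * range before the page table entry is installed. */
			*pfnp = pfn;
			return 0;
		}

		if (vmf->flags & FAULT_FLAG_WRITE)
			return vm_insert_mixed_mkwrite(vma, vmf->address, pfn);
		return vm_insert_mixed(vma, vmf->address, pfn);
	}

Inlining instead lets dax_iomap_pte_fault() act on the iomap flags and
the PFN directly, without growing the helper's parameter list.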


* [PATCH 07/13] dax: Inline dax_pmd_insert_mapping() into the callsite
  2017-08-17 16:08 ` Jan Kara
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

dax_pmd_insert_mapping() has only one callsite, and we will need to
further fine-tune what it does for synchronous faults. Just inline it
into the callsite so that we don't have to pass awkward bools around.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c                      | 43 ++++++++++++++-----------------------------
 include/trace/events/fs_dax.h |  1 -
 2 files changed, 14 insertions(+), 30 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 766cb840c276..54d2a9161f71 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1230,34 +1230,6 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
  */
 #define PG_PMD_COLOUR	((PMD_SIZE >> PAGE_SHIFT) - 1)
 
-static int dax_pmd_insert_mapping(struct vm_fault *vmf, struct iomap *iomap,
-		loff_t pos, void *entry)
-{
-	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
-	const sector_t sector = dax_iomap_sector(iomap, pos);
-	struct inode *inode = mapping->host;
-	void *ret = NULL;
-	pfn_t pfn;
-	int rc;
-
-	rc = dax_iomap_pfn(iomap, pos, PMD_SIZE, &pfn);
-	if (rc < 0)
-		goto fallback;
-
-	ret = dax_insert_mapping_entry(mapping, vmf, entry, sector,
-			RADIX_DAX_PMD);
-	if (IS_ERR(ret))
-		goto fallback;
-
-	trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, ret);
-	return vmf_insert_pfn_pmd(vmf->vma, vmf->address, vmf->pmd,
-			pfn, vmf->flags & FAULT_FLAG_WRITE);
-
-fallback:
-	trace_dax_pmd_insert_mapping_fallback(inode, vmf, length, pfn, ret);
-	return VM_FAULT_FALLBACK;
-}
-
 static int dax_pmd_load_hole(struct vm_fault *vmf, struct iomap *iomap,
 		void *entry)
 {
@@ -1312,6 +1284,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
 	void *entry;
 	loff_t pos;
 	int error;
+	pfn_t pfn;
 
 	/*
 	 * Check whether offset isn't beyond end of file now. Caller is
@@ -1379,7 +1352,19 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
 
 	switch (iomap.type) {
 	case IOMAP_MAPPED:
-		result = dax_pmd_insert_mapping(vmf, &iomap, pos, entry);
+		error = dax_iomap_pfn(&iomap, pos, PMD_SIZE, &pfn);
+		if (error < 0)
+			goto finish_iomap;
+
+		entry = dax_insert_mapping_entry(mapping, vmf, entry,
+						dax_iomap_sector(&iomap, pos),
+						RADIX_DAX_PMD);
+		if (IS_ERR(entry))
+			goto finish_iomap;
+
+		trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
+		result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
+					    write);
 		break;
 	case IOMAP_UNWRITTEN:
 	case IOMAP_HOLE:
diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h
index fbc4a06f7310..88a9d19b8ff8 100644
--- a/include/trace/events/fs_dax.h
+++ b/include/trace/events/fs_dax.h
@@ -148,7 +148,6 @@ DEFINE_EVENT(dax_pmd_insert_mapping_class, name, \
 	TP_ARGS(inode, vmf, length, pfn, radix_entry))
 
 DEFINE_PMD_INSERT_MAPPING_EVENT(dax_pmd_insert_mapping);
-DEFINE_PMD_INSERT_MAPPING_EVENT(dax_pmd_insert_mapping_fallback);
 
 DECLARE_EVENT_CLASS(dax_pte_fault_class,
 	TP_PROTO(struct inode *inode, struct vm_fault *vmf, int result),
-- 
2.12.3

* [PATCH 08/13] dax: Fix comment describing dax_iomap_fault()
  2017-08-17 16:08 ` Jan Kara
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

Add missing argument description.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 54d2a9161f71..85ea49bbbdbf 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1414,7 +1414,8 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
 /**
  * dax_iomap_fault - handle a page fault on a DAX file
  * @vmf: The description of the fault
- * @ops: iomap ops passed from the file system
+ * @pe_size: Size of the page to fault in
+ * @ops: Iomap ops passed from the file system
  *
  * When a page fault occurs, filesystems may call this helper in
  * their fault handler for DAX files. dax_iomap_fault() assumes the caller
-- 
2.12.3

* [PATCH 09/13] dax: Allow dax_iomap_fault() to return pfn
  2017-08-17 16:08 ` Jan Kara
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

For synchronous page faults, dax_iomap_fault() will need to return the
PFN, which will then need to be inserted into the page tables after
fsync() completes. Add the necessary parameter to dax_iomap_fault().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c            | 13 +++++++------
 fs/ext2/file.c      |  2 +-
 fs/ext4/file.c      |  2 +-
 fs/xfs/xfs_file.c   |  8 ++++----
 include/linux/dax.h |  2 +-
 5 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 85ea49bbbdbf..bc040e654cc9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1069,7 +1069,7 @@ static int dax_fault_return(int error)
 }
 
 static int dax_iomap_pte_fault(struct vm_fault *vmf,
-			       const struct iomap_ops *ops)
+			       const struct iomap_ops *ops, pfn_t *pfnp)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct address_space *mapping = vma->vm_file->f_mapping;
@@ -1270,7 +1270,7 @@ static int dax_pmd_load_hole(struct vm_fault *vmf, struct iomap *iomap,
 }
 
 static int dax_iomap_pmd_fault(struct vm_fault *vmf,
-			       const struct iomap_ops *ops)
+			       const struct iomap_ops *ops, pfn_t *pfnp)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct address_space *mapping = vma->vm_file->f_mapping;
@@ -1405,7 +1405,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
 }
 #else
 static int dax_iomap_pmd_fault(struct vm_fault *vmf,
-			       const struct iomap_ops *ops)
+			       const struct iomap_ops *ops, pfn_t *pfnp)
 {
 	return VM_FAULT_FALLBACK;
 }
@@ -1416,6 +1416,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
  * @vmf: The description of the fault
  * @pe_size: Size of the page to fault in
  * @ops: Iomap ops passed from the file system
+ * @pfnp: PFN to insert for synchronous faults if fsync is required
  *
  * When a page fault occurs, filesystems may call this helper in
  * their fault handler for DAX files. dax_iomap_fault() assumes the caller
@@ -1423,13 +1424,13 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
  * successfully.
  */
 int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
-		    const struct iomap_ops *ops)
+		    const struct iomap_ops *ops, pfn_t *pfnp)
 {
 	switch (pe_size) {
 	case PE_SIZE_PTE:
-		return dax_iomap_pte_fault(vmf, ops);
+		return dax_iomap_pte_fault(vmf, ops, pfnp);
 	case PE_SIZE_PMD:
-		return dax_iomap_pmd_fault(vmf, ops);
+		return dax_iomap_pmd_fault(vmf, ops, pfnp);
 	default:
 		return VM_FAULT_FALLBACK;
 	}
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index ff3a3636a5ca..689f17b5f444 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -99,7 +99,7 @@ static int ext2_dax_fault(struct vm_fault *vmf)
 	}
 	down_read(&ei->dax_sem);
 
-	ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &ext2_iomap_ops);
+	ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &ext2_iomap_ops, NULL);
 
 	up_read(&ei->dax_sem);
 	if (vmf->flags & FAULT_FLAG_WRITE)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 9dda70edba74..f84bb29e941e 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -291,7 +291,7 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
 		down_read(&EXT4_I(inode)->i_mmap_sem);
 	}
 	if (!IS_ERR(handle))
-		result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops);
+		result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, NULL);
 	else
 		result = VM_FAULT_SIGBUS;
 	if (write) {
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 62db8ffa83b9..c17ca982272c 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1032,7 +1032,7 @@ xfs_filemap_page_mkwrite(
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 
 	if (IS_DAX(inode)) {
-		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops);
+		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops, NULL);
 	} else {
 		ret = iomap_page_mkwrite(vmf, &xfs_iomap_ops);
 		ret = block_page_mkwrite_return(ret);
@@ -1059,7 +1059,7 @@ xfs_filemap_fault(
 
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 	if (IS_DAX(inode))
-		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops);
+		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops, NULL);
 	else
 		ret = filemap_fault(vmf);
 	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
@@ -1094,7 +1094,7 @@ xfs_filemap_huge_fault(
 	}
 
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
-	ret = dax_iomap_fault(vmf, pe_size, &xfs_iomap_ops);
+	ret = dax_iomap_fault(vmf, pe_size, &xfs_iomap_ops, NULL);
 	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 
 	if (vmf->flags & FAULT_FLAG_WRITE)
@@ -1130,7 +1130,7 @@ xfs_filemap_pfn_mkwrite(
 	if (vmf->pgoff >= size)
 		ret = VM_FAULT_SIGBUS;
 	else if (IS_DAX(inode))
-		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops);
+		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops, NULL);
 	xfs_iunlock(ip, XFS_MMAPLOCK_SHARED);
 	sb_end_pagefault(inode->i_sb);
 	return ret;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index d0e32729ad1e..8f493d9879f7 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -91,7 +91,7 @@ void dax_write_cache(struct dax_device *dax_dev, bool wc);
 ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops);
 int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
-		    const struct iomap_ops *ops);
+		    const struct iomap_ops *ops, pfn_t *pfnp);
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
-- 
2.12.3
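
To show where the new argument is headed, here is a rough sketch of how
a filesystem fault handler might use it once the rest of the series is
in place. The VM_FAULT_NEEDDSYNC result bit and the
dax_finish_sync_fault() helper are assumptions about later patches, and
example_iomap_ops stands in for the filesystem's iomap ops; none of
these are introduced by this patch.

	/*
	 * Sketch of a ->huge_fault handler once the whole series lands.
	 * VM_FAULT_NEEDDSYNC, dax_finish_sync_fault() and example_iomap_ops
	 * are assumptions / stand-ins, not part of this patch.
	 */
	static int example_dax_huge_fault(struct vm_fault *vmf,
					  enum page_entry_size pe_size)
	{
		pfn_t pfn;
		int result;

		/* For a synchronous mapping this may stop short of
		 * installing the PTE and hand the PFN back in 'pfn'. */
		result = dax_iomap_fault(vmf, pe_size, &example_iomap_ops,
					 &pfn);

		if (result & VM_FAULT_NEEDDSYNC) {
			/* Block mapping metadata is not durable yet: fsync
			 * the range, then install the saved PFN. */
			result = dax_finish_sync_fault(vmf, pe_size, pfn);
		}

		return result;
	}

Passing NULL instead, as the ext2, ext4 and xfs callers do in this
patch, keeps the current behaviour until a filesystem opts in.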


* [PATCH 09/13] dax: Allow dax_iomap_fault() to return pfn
@ 2017-08-17 16:08   ` Jan Kara
  0 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-nvdimm, Andy Lutomirski, linux-ext4, linux-xfs,
	Christoph Hellwig, Ross Zwisler, Dan Williams, Boaz Harrosh,
	Jan Kara

For synchronous page faults, dax_iomap_fault() will need to return the
PFN, which will then need to be inserted into the page tables after
fsync() completes. Add the necessary parameter to dax_iomap_fault().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c            | 13 +++++++------
 fs/ext2/file.c      |  2 +-
 fs/ext4/file.c      |  2 +-
 fs/xfs/xfs_file.c   |  8 ++++----
 include/linux/dax.h |  2 +-
 5 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 85ea49bbbdbf..bc040e654cc9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1069,7 +1069,7 @@ static int dax_fault_return(int error)
 }
 
 static int dax_iomap_pte_fault(struct vm_fault *vmf,
-			       const struct iomap_ops *ops)
+			       const struct iomap_ops *ops, pfn_t *pfnp)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct address_space *mapping = vma->vm_file->f_mapping;
@@ -1270,7 +1270,7 @@ static int dax_pmd_load_hole(struct vm_fault *vmf, struct iomap *iomap,
 }
 
 static int dax_iomap_pmd_fault(struct vm_fault *vmf,
-			       const struct iomap_ops *ops)
+			       const struct iomap_ops *ops, pfn_t *pfnp)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct address_space *mapping = vma->vm_file->f_mapping;
@@ -1405,7 +1405,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
 }
 #else
 static int dax_iomap_pmd_fault(struct vm_fault *vmf,
-			       const struct iomap_ops *ops)
+			       const struct iomap_ops *ops, pfn_t *pfnp)
 {
 	return VM_FAULT_FALLBACK;
 }
@@ -1416,6 +1416,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
  * @vmf: The description of the fault
  * @pe_size: Size of the page to fault in
  * @ops: Iomap ops passed from the file system
+ * @pfnp: PFN to insert for synchronous faults if fsync is required
  *
  * When a page fault occurs, filesystems may call this helper in
  * their fault handler for DAX files. dax_iomap_fault() assumes the caller
@@ -1423,13 +1424,13 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
  * successfully.
  */
 int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
-		    const struct iomap_ops *ops)
+		    const struct iomap_ops *ops, pfn_t *pfnp)
 {
 	switch (pe_size) {
 	case PE_SIZE_PTE:
-		return dax_iomap_pte_fault(vmf, ops);
+		return dax_iomap_pte_fault(vmf, ops, pfnp);
 	case PE_SIZE_PMD:
-		return dax_iomap_pmd_fault(vmf, ops);
+		return dax_iomap_pmd_fault(vmf, ops, pfnp);
 	default:
 		return VM_FAULT_FALLBACK;
 	}
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index ff3a3636a5ca..689f17b5f444 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -99,7 +99,7 @@ static int ext2_dax_fault(struct vm_fault *vmf)
 	}
 	down_read(&ei->dax_sem);
 
-	ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &ext2_iomap_ops);
+	ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &ext2_iomap_ops, NULL);
 
 	up_read(&ei->dax_sem);
 	if (vmf->flags & FAULT_FLAG_WRITE)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 9dda70edba74..f84bb29e941e 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -291,7 +291,7 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
 		down_read(&EXT4_I(inode)->i_mmap_sem);
 	}
 	if (!IS_ERR(handle))
-		result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops);
+		result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, NULL);
 	else
 		result = VM_FAULT_SIGBUS;
 	if (write) {
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 62db8ffa83b9..c17ca982272c 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1032,7 +1032,7 @@ xfs_filemap_page_mkwrite(
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 
 	if (IS_DAX(inode)) {
-		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops);
+		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops, NULL);
 	} else {
 		ret = iomap_page_mkwrite(vmf, &xfs_iomap_ops);
 		ret = block_page_mkwrite_return(ret);
@@ -1059,7 +1059,7 @@ xfs_filemap_fault(
 
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 	if (IS_DAX(inode))
-		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops);
+		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops, NULL);
 	else
 		ret = filemap_fault(vmf);
 	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
@@ -1094,7 +1094,7 @@ xfs_filemap_huge_fault(
 	}
 
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
-	ret = dax_iomap_fault(vmf, pe_size, &xfs_iomap_ops);
+	ret = dax_iomap_fault(vmf, pe_size, &xfs_iomap_ops, NULL);
 	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 
 	if (vmf->flags & FAULT_FLAG_WRITE)
@@ -1130,7 +1130,7 @@ xfs_filemap_pfn_mkwrite(
 	if (vmf->pgoff >= size)
 		ret = VM_FAULT_SIGBUS;
 	else if (IS_DAX(inode))
-		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops);
+		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops, NULL);
 	xfs_iunlock(ip, XFS_MMAPLOCK_SHARED);
 	sb_end_pagefault(inode->i_sb);
 	return ret;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index d0e32729ad1e..8f493d9879f7 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -91,7 +91,7 @@ void dax_write_cache(struct dax_device *dax_dev, bool wc);
 ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops);
 int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
-		    const struct iomap_ops *ops);
+		    const struct iomap_ops *ops, pfn_t *pfnp);
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
-- 
2.12.3


^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH 10/13] mm: Wire up MAP_SYNC
  2017-08-17 16:08 ` Jan Kara
  (?)
  (?)
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

Pretty crude for now...

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/file.c                  | 2 ++
 include/linux/mm.h              | 1 +
 include/linux/mman.h            | 3 ++-
 include/uapi/asm-generic/mman.h | 1 +
 mm/mmap.c                       | 5 +++++
 5 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index f84bb29e941e..850037e140d7 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -340,6 +340,8 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 		vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
 	} else {
 		vma->vm_ops = &ext4_file_vm_ops;
+		if (vma->vm_flags & VM_SYNC)
+			return -EOPNOTSUPP;
 	}
 	return 0;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fa036093e76c..d0fb385414a4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -188,6 +188,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_ACCOUNT	0x00100000	/* Is a VM accounted object */
 #define VM_NORESERVE	0x00200000	/* should the VM suppress accounting */
 #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
+#define VM_SYNC		0x00800000	/* Synchronous page faults */
 #define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
 #define VM_ARCH_2	0x02000000
 #define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
diff --git a/include/linux/mman.h b/include/linux/mman.h
index c8367041fafd..c38279b651e5 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -86,7 +86,8 @@ calc_vm_flag_bits(unsigned long flags)
 {
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
-	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    );
+	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
+	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      );
 }
 
 unsigned long vm_commit_limit(void);
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index 7162cd4cca73..00e55627d2df 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -12,6 +12,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_SYNC	0x80000		/* perform synchronous page faults for the mapping */
 
 /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
 
diff --git a/mm/mmap.c b/mm/mmap.c
index f19efcf75418..18453c04b09f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1423,12 +1423,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 				return -ENODEV;
 			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
 				return -EINVAL;
+			if (!(vm_flags & VM_SHARED) && (vm_flags & VM_SYNC))
+				return -EINVAL;
 			break;
 
 		default:
 			return -EINVAL;
 		}
 	} else {
+		if (vm_flags & VM_SYNC)
+			return -EINVAL;
+
 		switch (flags & MAP_TYPE) {
 		case MAP_SHARED:
 			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 142+ messages in thread
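
For illustration, here is a minimal userspace sketch of how a program
would be expected to use the flag wired up above: map a file on a DAX
filesystem with MAP_SHARED | MAP_SYNC, store to it, and flush CPU
caches instead of calling fsync(2). The path /mnt/pmem/file is only a
placeholder, the MAP_SYNC fallback define simply mirrors the uapi value
added above, and the clwb/sfence helper is x86-specific - this is a
sketch of the intended usage under those assumptions, not code from the
series itself.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SYNC
#define MAP_SYNC 0x80000	/* value from asm-generic/mman.h above */
#endif

/* Flush one cache line and order it; x86-only, loop over larger ranges. */
static void flush(void *addr)
{
	asm volatile("clwb %0" : "+m" (*(volatile char *)addr));
	asm volatile("sfence" ::: "memory");
}

int main(void)
{
	char *p;
	int fd = open("/mnt/pmem/file", O_RDWR);	/* file on a DAX fs */

	if (fd < 0) {
		perror("open");
		return 1;
	}
	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_SYNC, fd, 0);
	if (p == MAP_FAILED) {
		/* e.g. -EOPNOTSUPP from ext4_file_mmap() above without DAX */
		perror("mmap");
		close(fd);
		return 1;
	}
	strcpy(p, "persistent once the write fault has completed");
	flush(p);
	/* No fsync()/msync(): the fault already made the block mapping safe. */
	close(fd);
	return 0;
}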

* [PATCH 11/13] dax, iomap: Add support for synchronous faults
  2017-08-17 16:08 ` Jan Kara
  (?)
  (?)
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

Add a flag to the iomap interface informing the caller that the inode
needs fdatasync(2) for the returned extent to become persistent, and
use it in the DAX fault code for synchronous (VM_SYNC) write faults.
In that case we do not install the page table entry right away;
dax_iomap_fault() instead returns the new VM_FAULT_NEEDDSYNC flag
together with the PFN to map. The filesystem fault handler is then
responsible for calling fdatasync(2) and inserting the PFN into the
page tables read-write once the metadata is committed.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c              | 31 +++++++++++++++++++++++++++++++
 include/linux/iomap.h |  2 ++
 include/linux/mm.h    |  6 +++++-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index bc040e654cc9..ca88fc356786 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1177,6 +1177,22 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
 			goto error_finish_iomap;
 		}
 
+		/*
+		 * If we are doing synchronous page fault and inode needs fsync,
+		 * we can insert PTE into page tables only after that happens.
+		 * Skip insertion for now and return the pfn so that caller can
+		 * insert it after fsync is done.
+		 */
+		if (write && (vma->vm_flags & VM_SYNC) &&
+		    (iomap.flags & IOMAP_F_NEEDDSYNC)) {
+			if (WARN_ON_ONCE(!pfnp)) {
+				error = -EIO;
+				goto error_finish_iomap;
+			}
+			*pfnp = pfn;
+			vmf_ret = VM_FAULT_NEEDDSYNC | major;
+			goto finish_iomap;
+		}
 		trace_dax_insert_mapping(inode, vmf, entry);
 		if (write)
 			error = vm_insert_mixed_mkwrite(vma, vaddr, pfn);
@@ -1362,6 +1378,21 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
 		if (IS_ERR(entry))
 			goto finish_iomap;
 
+		/*
+		 * If we are doing synchronous page fault and inode needs fsync,
+		 * we can insert PMD into page tables only after that happens.
+		 * Skip insertion for now and return the pfn so that caller can
+		 * insert it after fsync is done.
+		 */
+		if (write && (vmf->vma->vm_flags & VM_SYNC) &&
+		    (iomap.flags & IOMAP_F_NEEDDSYNC)) {
+			if (WARN_ON_ONCE(!pfnp))
+				goto finish_iomap;
+			*pfnp = pfn;
+			result = VM_FAULT_NEEDDSYNC;
+			goto finish_iomap;
+		}
+
 		trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
 		result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
 					    write);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index f64dc6ce5161..957463602f6e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -22,6 +22,8 @@ struct vm_fault;
  * Flags for all iomap mappings:
  */
 #define IOMAP_F_NEW	0x01	/* blocks have been newly allocated */
+#define IOMAP_F_NEEDDSYNC	0x02	/* inode needs fdatasync for storage to
+					 * become persistent */
 
 /*
  * Flags that only need to be reported for IOMAP_REPORT requests:
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d0fb385414a4..20e95c3a7701 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1143,6 +1143,9 @@ static inline void clear_page_pfmemalloc(struct page *page)
 #define VM_FAULT_RETRY	0x0400	/* ->fault blocked, must retry */
 #define VM_FAULT_FALLBACK 0x0800	/* huge page fault failed, fall back to small */
 #define VM_FAULT_DONE_COW   0x1000	/* ->fault has fully handled COW */
+#define VM_FAULT_NEEDDSYNC  0x2000	/* ->fault did not modify page tables
+					 * and needs fsync() to complete (for
+					 * synchronous page faults in DAX) */
 
 #define VM_FAULT_ERROR	(VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | \
 			 VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE | \
@@ -1160,7 +1163,8 @@ static inline void clear_page_pfmemalloc(struct page *page)
 	{ VM_FAULT_LOCKED,		"LOCKED" }, \
 	{ VM_FAULT_RETRY,		"RETRY" }, \
 	{ VM_FAULT_FALLBACK,		"FALLBACK" }, \
-	{ VM_FAULT_DONE_COW,		"DONE_COW" }
+	{ VM_FAULT_DONE_COW,		"DONE_COW" }, \
+	{ VM_FAULT_NEEDDSYNC,		"NEEDDSYNC" }
 
 /* Encode hstate index for a hwpoisoned large page */
 #define VM_FAULT_SET_HINDEX(x) ((x) << 12)
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 142+ messages in thread
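
As a rough sketch of the contract this flag creates on the filesystem
side: ->iomap_begin() simply tags a write mapping whose metadata is not
yet on stable storage. Here example_map_blocks() and
fs_metadata_is_stable() are made-up stand-ins for filesystem-specific
code; ext4's real check, built on jbd2_transaction_committed(), is in
the last patch of the series.

/*
 * Sketch only: tell DAX that the extent returned for a write will not
 * be persistent until fdatasync() commits the filesystem metadata.
 */
static int example_iomap_begin(struct inode *inode, loff_t offset,
			       loff_t length, unsigned flags,
			       struct iomap *iomap)
{
	int ret = example_map_blocks(inode, offset, length, flags, iomap);

	if (ret)
		return ret;
	/* Only write faults care; read faults never need the fdatasync. */
	if ((flags & IOMAP_WRITE) && !fs_metadata_is_stable(inode))
		iomap->flags |= IOMAP_F_NEEDDSYNC;
	return 0;
}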

* [PATCH 12/13] dax: Implement dax_insert_pfn_mkwrite()
  2017-08-17 16:08 ` Jan Kara
  (?)
  (?)
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

Implement a function that inserts a writeable page table entry (PTE or
PMD) and takes care of marking it dirty in the radix tree. This
function will be used to finish a synchronous page fault after fsync
is done.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c                      | 52 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/dax.h           |  2 ++
 include/trace/events/fs_dax.h |  2 ++
 3 files changed, 56 insertions(+)

diff --git a/fs/dax.c b/fs/dax.c
index ca88fc356786..ca81084f6608 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1467,3 +1467,55 @@ int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
 	}
 }
 EXPORT_SYMBOL_GPL(dax_iomap_fault);
+
+/**
+ * dax_insert_pfn_mkwrite - insert PTE or PMD entry into page tables
+ * @vmf: The description of the fault
+ * @pe_size: Size of entry to be inserted
+ * @pfn: PFN to insert
+ *
+ * This function inserts writeable PTE or PMD entry into page tables for mmaped
+ * DAX file.  It takes care of marking corresponding radix tree entry as dirty
+ * as well.
+ */
+int dax_insert_pfn_mkwrite(struct vm_fault *vmf, enum page_entry_size pe_size,
+			   pfn_t pfn)
+{
+	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
+	void *entry, **slot;
+	pgoff_t index = vmf->pgoff;
+	int vmf_ret, error;
+
+	spin_lock_irq(&mapping->tree_lock);
+	entry = get_unlocked_mapping_entry(mapping, index, &slot);
+	/* Did we race with someone splitting entry or so? */
+	if (!entry || (pe_size == PE_SIZE_PTE && !dax_is_pte_entry(entry)) ||
+	    (pe_size == PE_SIZE_PMD && !dax_is_pmd_entry(entry))) {
+		put_unlocked_mapping_entry(mapping, index, entry);
+		spin_unlock_irq(&mapping->tree_lock);
+		trace_dax_insert_pfn_mkwrite_no_entry(mapping->host, vmf,
+						      VM_FAULT_NOPAGE);
+		return VM_FAULT_NOPAGE;
+	}
+	radix_tree_tag_set(&mapping->page_tree, index, PAGECACHE_TAG_DIRTY);
+	entry = lock_slot(mapping, slot);
+	spin_unlock_irq(&mapping->tree_lock);
+	switch (pe_size) {
+	case PE_SIZE_PTE:
+		error = vm_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn);
+		vmf_ret = dax_fault_return(error);
+		break;
+#ifdef CONFIG_FS_DAX_PMD
+	case PE_SIZE_PMD:
+		vmf_ret = vmf_insert_pfn_pmd(vmf->vma, vmf->address, vmf->pmd,
+			pfn, true);
+		break;
+#endif
+	default:
+		vmf_ret = VM_FAULT_FALLBACK;
+	}
+	put_locked_mapping_entry(mapping, index);
+	trace_dax_insert_pfn_mkwrite(mapping->host, vmf, vmf_ret);
+	return vmf_ret;
+}
+EXPORT_SYMBOL_GPL(dax_insert_pfn_mkwrite);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 8f493d9879f7..d8e67995a958 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -92,6 +92,8 @@ ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops);
 int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
 		    const struct iomap_ops *ops, pfn_t *pfnp);
+int dax_insert_pfn_mkwrite(struct vm_fault *vmf, enum page_entry_size pe_size,
+			   pfn_t pfn);
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h
index 88a9d19b8ff8..7725459fafef 100644
--- a/include/trace/events/fs_dax.h
+++ b/include/trace/events/fs_dax.h
@@ -190,6 +190,8 @@ DEFINE_EVENT(dax_pte_fault_class, name, \
 DEFINE_PTE_FAULT_EVENT(dax_pte_fault);
 DEFINE_PTE_FAULT_EVENT(dax_pte_fault_done);
 DEFINE_PTE_FAULT_EVENT(dax_load_hole);
+DEFINE_PTE_FAULT_EVENT(dax_insert_pfn_mkwrite_no_entry);
+DEFINE_PTE_FAULT_EVENT(dax_insert_pfn_mkwrite);
 
 TRACE_EVENT(dax_insert_mapping,
 	TP_PROTO(struct inode *inode, struct vm_fault *vmf, void *radix_entry),
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 142+ messages in thread
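
Together with the VM_FAULT_NEEDDSYNC return from dax_iomap_fault(),
the helper above gives a filesystem everything it needs to finish a
synchronous fault. A condensed sketch of the expected calling sequence
(locking, journalling and error paths omitted, example_iomap_ops is a
placeholder; ext4 wires this up for real in the next patch):

static int example_dax_huge_fault(struct vm_fault *vmf,
				  enum page_entry_size pe_size)
{
	bool write = vmf->flags & FAULT_FLAG_WRITE;
	pfn_t pfn;
	int ret;

	/* Pass &pfn so DAX can hand back the PFN it did not map yet. */
	ret = dax_iomap_fault(vmf, pe_size, &example_iomap_ops, &pfn);
	if (write && (ret & VM_FAULT_NEEDDSYNC)) {
		loff_t start = (loff_t)vmf->pgoff << PAGE_SHIFT;
		size_t len = (pe_size == PE_SIZE_PMD) ? HPAGE_PMD_SIZE
						      : PAGE_SIZE;

		/* Commit the metadata ->iomap_begin() flagged as unstable. */
		if (vfs_fsync_range(vmf->vma->vm_file, start,
				    start + len - 1, 1))
			return VM_FAULT_SIGBUS;
		/* Block mapping is stable now; map the PFN write-enabled. */
		ret = dax_insert_pfn_mkwrite(vmf, pe_size, pfn);
	}
	return ret;
}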

* [PATCH 13/13] ext4: Support for synchronous DAX faults
  2017-08-17 16:08 ` Jan Kara
  (?)
@ 2017-08-17 16:08   ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-17 16:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-ext4

We return the IOMAP_F_NEEDDSYNC flag from ext4_iomap_begin() for a
synchronous write fault when the inode has uncommitted metadata
changes. In the fault handler ext4_dax_huge_fault() we then detect
this case, call vfs_fsync_range() to make sure all metadata is
committed, and call dax_insert_pfn_mkwrite() to map the PFN writeably.
Note that this also dirties the corresponding radix tree entry, which
is what we want - fsync(2) will still provide data integrity
guarantees for applications not using userspace flushing. And
applications that do use userspace flushing can avoid calling fsync(2)
and thus avoid its performance overhead.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/file.c       | 36 ++++++++++++++++++++++++++++++------
 fs/ext4/inode.c      |  4 ++++
 fs/jbd2/journal.c    | 17 +++++++++++++++++
 include/linux/jbd2.h |  1 +
 4 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 850037e140d7..3765c4ed1368 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -280,6 +280,7 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	struct super_block *sb = inode->i_sb;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
+	pfn_t pfn;
 
 	if (write) {
 		sb_start_pagefault(sb);
@@ -287,16 +288,39 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
 		down_read(&EXT4_I(inode)->i_mmap_sem);
 		handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
 					       EXT4_DATA_TRANS_BLOCKS(sb));
+		if (IS_ERR(handle)) {
+			up_read(&EXT4_I(inode)->i_mmap_sem);
+			sb_end_pagefault(sb);
+			return VM_FAULT_SIGBUS;
+		}
 	} else {
 		down_read(&EXT4_I(inode)->i_mmap_sem);
 	}
-	if (!IS_ERR(handle))
-		result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, NULL);
-	else
-		result = VM_FAULT_SIGBUS;
+	result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, &pfn);
 	if (write) {
-		if (!IS_ERR(handle))
-			ext4_journal_stop(handle);
+		ext4_journal_stop(handle);
+		/* Write fault but PFN mapped only RO? */
+		if (result & VM_FAULT_NEEDDSYNC) {
+			int err;
+			loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
+			size_t len = 0;
+
+			if (pe_size == PE_SIZE_PTE)
+				len = PAGE_SIZE;
+#ifdef CONFIG_FS_DAX_PMD
+			else if (pe_size == PE_SIZE_PMD)
+				len = HPAGE_PMD_SIZE;
+#endif
+			else
+				WARN_ON_ONCE(1);
+			err = vfs_fsync_range(vmf->vma->vm_file, start,
+					      start + len - 1, 1);
+			if (err)
+				result = VM_FAULT_SIGBUS;
+			else
+				result = dax_insert_pfn_mkwrite(vmf, pe_size,
+								pfn);
+		}
 		up_read(&EXT4_I(inode)->i_mmap_sem);
 		sb_end_pagefault(sb);
 	} else {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3c600f02673f..7a7529c3f0c8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3429,6 +3429,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	}
 
 	iomap->flags = 0;
+	if ((flags & IOMAP_WRITE) &&
+	    !jbd2_transaction_committed(EXT4_SB(inode->i_sb)->s_journal,
+					EXT4_I(inode)->i_datasync_tid))
+		iomap->flags |= IOMAP_F_NEEDDSYNC;
 	bdev = inode->i_sb->s_bdev;
 	iomap->bdev = bdev;
 	if (blk_queue_dax(bdev->bd_queue))
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 7d5ef3bf3f3e..fa8cde498b4b 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -738,6 +738,23 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid)
 	return err;
 }
 
+/* Return 1 when transaction with given tid has already committed. */
+int jbd2_transaction_committed(journal_t *journal, tid_t tid)
+{
+	int ret = 1;
+
+	read_lock(&journal->j_state_lock);
+	if (journal->j_running_transaction &&
+	    journal->j_running_transaction->t_tid == tid)
+		ret = 0;
+	if (journal->j_committing_transaction &&
+	    journal->j_committing_transaction->t_tid == tid)
+		ret = 0;
+	read_unlock(&journal->j_state_lock);
+	return ret;
+}
+EXPORT_SYMBOL(jbd2_transaction_committed);
+
 /*
  * When this function returns the transaction corresponding to tid
  * will be completed.  If the transaction has currently running, start
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 606b6bce3a5b..296d1e0ea87b 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1367,6 +1367,7 @@ int jbd2_log_start_commit(journal_t *journal, tid_t tid);
 int __jbd2_log_start_commit(journal_t *journal, tid_t tid);
 int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
 int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
+int jbd2_transaction_committed(journal_t *journal, tid_t tid);
 int jbd2_complete_transaction(journal_t *journal, tid_t tid);
 int jbd2_log_do_checkpoint(journal_t *journal);
 int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid);
-- 
2.12.3


^ permalink raw reply related	[flat|nested] 142+ messages in thread

* Re: [PATCH 03/13] dax: Factor out getting of pfn out of iomap
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-18 22:06     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-18 22:06 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:05PM +0200, Jan Kara wrote:
> Factor out code to get pfn out of iomap that is shared between PTE and
> PMD fault path.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Yep, this looks correct to me.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 04/13] dax: Create local variable for VMA in dax_iomap_pte_fault()
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-18 22:08     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-18 22:08 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:06PM +0200, Jan Kara wrote:
> There are already two users and more are coming.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Sure.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 05/13] dax: Create local variable for vmf->flags & FAULT_FLAG_WRITE test
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-18 22:08     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-18 22:08 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:07PM +0200, Jan Kara wrote:
> There are already two users and more are coming.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Sure, and this is much nicer to read.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 06/13] dax: Inline dax_insert_mapping() into the callsite
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-18 22:10     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-18 22:10 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:08PM +0200, Jan Kara wrote:
> dax_insert_mapping() has only one callsite and we will need to further
> fine tune what it does for synchronous faults. Just inline it into the
> callsite so that we don't have to pass awkward bools around.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Looks good.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 07/13] dax: Inline dax_pmd_insert_mapping() into the callsite
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-18 22:12     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-18 22:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:09PM +0200, Jan Kara wrote:
> dax_pmd_insert_mapping() has only one callsite and we will need to
> further fine tune what it does for synchronous faults. Just inline it
> into the callsite so that we don't have to pass awkward bools around.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Sure, looks good.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 08/13] dax: Fix comment describing dax_iomap_fault()
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-18 22:12     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-18 22:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:10PM +0200, Jan Kara wrote:
> Add missing argument description.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 09/13] dax: Allow dax_iomap_fault() to return pfn
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-21 18:45     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-21 18:45 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:11PM +0200, Jan Kara wrote:
> For synchronous page fault dax_iomap_fault() will need to return PFN
> which will then need to be inserted into page tables after fsync()
> completes. Add necessary parameter to dax_iomap_fault().
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Yep, this seems like a nice, straightforward way of doing things.  I like this
better than the vmf->orig_pte solution from the previous RFC.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-21 18:58     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-21 18:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:13PM +0200, Jan Kara wrote:
> Add a flag to iomap interface informing the caller that inode needs
> fdatasync(2) for returned extent to become persistent and use it in DAX
> fault code so that we map such extents only read only. We propagate the
> information that the page table entry has been inserted write-protected
> from dax_iomap_fault() with a new VM_FAULT_RO flag. Filesystem fault
> handler is then responsible for calling fdatasync(2) and updating page
> tables to map pfns read-write. dax_iomap_fault() also takes care of
> updating vmf->orig_pte to match the PTE that was inserted so that we can
> safely recheck that PTE did not change while write-enabling it.

This changelog needs a little love.  s/VM_FAULT_RO/VM_FAULT_NEEDDSYNC/, the
new path doesn't do the RO mapping, but instead just does the entire RW
mapping after the fdatasync is complete, the vmf->orig_pte manipulation went
away, etc.

> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/dax.c              | 31 +++++++++++++++++++++++++++++++
>  include/linux/iomap.h |  2 ++
>  include/linux/mm.h    |  6 +++++-
>  3 files changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index bc040e654cc9..ca88fc356786 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1177,6 +1177,22 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
>  			goto error_finish_iomap;
>  		}
>  
> +		/*
> +		 * If we are doing synchronous page fault and inode needs fsync,
> +		 * we can insert PTE into page tables only after that happens.
> +		 * Skip insertion for now and return the pfn so that caller can
> +		 * insert it after fsync is done.
> +		 */
> +		if (write && (vma->vm_flags & VM_SYNC) &&
> +		    (iomap.flags & IOMAP_F_NEEDDSYNC)) {

Just a small nit, but I don't think we really need to check for 'write' here.
The fact that IOMAP_F_NEEDDSYNC is set tells us that we are doing a write.

	if ((flags & IOMAP_WRITE) &&
	    !jbd2_transaction_committed(EXT4_SB(inode->i_sb)->s_journal,
					EXT4_I(inode)->i_datasync_tid))
		iomap->flags |= IOMAP_F_NEEDDSYNC;

Ditto for the PMD case.
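
Something like the following is what I have in mind for the PTE case (just a
sketch, assuming - as the ext4 hunk above shows - that IOMAP_F_NEEDDSYNC is
only ever set for IOMAP_WRITE mappings):

		/* Sync fault: defer the PTE insert until after fdatasync(). */
		if ((vma->vm_flags & VM_SYNC) &&
		    (iomap.flags & IOMAP_F_NEEDDSYNC)) {
			if (WARN_ON_ONCE(!pfnp)) {
				error = -EIO;
				goto error_finish_iomap;
			}
			*pfnp = pfn;
			vmf_ret = VM_FAULT_NEEDDSYNC | major;
			goto finish_iomap;
		}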

With that one simplification and a cleaned up changelog, you can add:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

> +			if (WARN_ON_ONCE(!pfnp)) {
> +				error = -EIO;
> +				goto error_finish_iomap;
> +			}
> +			*pfnp = pfn;
> +			vmf_ret = VM_FAULT_NEEDDSYNC | major;
> +			goto finish_iomap;
> +		}
>  		trace_dax_insert_mapping(inode, vmf, entry);
>  		if (write)
>  			error = vm_insert_mixed_mkwrite(vma, vaddr, pfn);

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 12/13] dax: Implement dax_insert_pfn_mkwrite()
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-21 19:01     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-21 19:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:14PM +0200, Jan Kara wrote:
> Implement a function that inserts a writeable page table entry (PTE or
> PMD) and takes care of marking it dirty in the radix tree. This function
> will be used to finish synchronous page fault.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Yep, this looks great.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 13/13] ext4: Support for synchronous DAX faults
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-21 19:19     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-21 19:19 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:15PM +0200, Jan Kara wrote:
> We return IOMAP_F_NEEDDSYNC flag from ext4_iomap_begin() for a
> synchronous write fault when inode has some uncommitted metadata
> changes. In the fault handler ext4_dax_fault() we then detect this case,
> call vfs_fsync_range() to make sure all metadata is committed, and call
> dax_pfn_mkwrite() to mark PTE as writeable. Note that this will also

Need to fix up the above line a little -
s/dax_pfn_mkwrite/dax_insert_pfn_mkwrite/, and we insert the PTE as well as
make it writeable.

> dirty corresponding radix tree entry which is what we want - fsync(2)
> will still provide data integrity guarantees for applications not using
> userspace flushing. And applications using userspace flushing can avoid
> calling fsync(2) and thus avoid the performance overhead.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/file.c       | 36 ++++++++++++++++++++++++++++++------
>  fs/ext4/inode.c      |  4 ++++
>  fs/jbd2/journal.c    | 17 +++++++++++++++++
>  include/linux/jbd2.h |  1 +
>  4 files changed, 52 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 850037e140d7..3765c4ed1368 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -280,6 +280,7 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
>  	struct inode *inode = file_inode(vmf->vma->vm_file);
>  	struct super_block *sb = inode->i_sb;
>  	bool write = vmf->flags & FAULT_FLAG_WRITE;
> +	pfn_t pfn;
>  
>  	if (write) {
>  		sb_start_pagefault(sb);
> @@ -287,16 +288,39 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
>  		down_read(&EXT4_I(inode)->i_mmap_sem);
>  		handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
>  					       EXT4_DATA_TRANS_BLOCKS(sb));
> +		if (IS_ERR(handle)) {
> +			up_read(&EXT4_I(inode)->i_mmap_sem);
> +			sb_end_pagefault(sb);
> +			return VM_FAULT_SIGBUS;
> +		}
>  	} else {
>  		down_read(&EXT4_I(inode)->i_mmap_sem);
>  	}
> -	if (!IS_ERR(handle))
> -		result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, NULL);
> -	else
> -		result = VM_FAULT_SIGBUS;
> +	result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, &pfn);
>  	if (write) {
> -		if (!IS_ERR(handle))
> -			ext4_journal_stop(handle);
> +		ext4_journal_stop(handle);
> +		/* Write fault but PFN mapped only RO? */

The above comment is out of date.

> +		if (result & VM_FAULT_NEEDDSYNC) {
> +			int err;
> +			loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
> +			size_t len = 0;
> +
> +			if (pe_size == PE_SIZE_PTE)
> +				len = PAGE_SIZE;
> +#ifdef CONFIG_FS_DAX_PMD
> +			else if (pe_size == PE_SIZE_PMD)
> +				len = HPAGE_PMD_SIZE;

In fs/dax.c we always use PMD_SIZE.  It looks like HPAGE_PMD_SIZE and PMD_SIZE
are always the same (from include/linux/huge_mm.h, the only definition of
HPAGE_PMD_SIZE):

#define HPAGE_PMD_SHIFT PMD_SHIFT
#define HPAGE_PMD_SIZE	((1UL) << HPAGE_PMD_SHIFT)

and AFAICT PMD_SIZE is defined to be 1<<PMD_SHIFT for all architectures as
well.  I don't understand why we have both?

In any case, neither HPAGE_PMD_SIZE nor PMD_SIZE are used anywhere else in the
ext4 code, so can we use PMD_SIZE here for consistency?  If they ever did
manage to be different, I think we'd want PMD_SIZE anyway.
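
Concretely, the length selection could then look like the sketch below (same
logic as your hunk with PMD_SIZE substituted - untested, just for
illustration):

			if (pe_size == PE_SIZE_PTE)
				len = PAGE_SIZE;
#ifdef CONFIG_FS_DAX_PMD
			else if (pe_size == PE_SIZE_PMD)
				len = PMD_SIZE;
#endif
			else
				WARN_ON_ONCE(1);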

With those nits and an updated changelog:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-21 21:09     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-21 21:09 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:13PM +0200, Jan Kara wrote:
> Add a flag to iomap interface informing the caller that inode needs
> fdatasync(2) for returned extent to become persistent and use it in DAX
> fault code so that we map such extents only read only. We propagate the
> information that the page table entry has been inserted write-protected
> from dax_iomap_fault() with a new VM_FAULT_RO flag. Filesystem fault
> handler is then responsible for calling fdatasync(2) and updating page
> tables to map pfns read-write. dax_iomap_fault() also takes care of
> updating vmf->orig_pte to match the PTE that was inserted so that we can
> safely recheck that PTE did not change while write-enabling it.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/dax.c              | 31 +++++++++++++++++++++++++++++++
>  include/linux/iomap.h |  2 ++
>  include/linux/mm.h    |  6 +++++-
>  3 files changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index bc040e654cc9..ca88fc356786 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1177,6 +1177,22 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
>  			goto error_finish_iomap;
>  		}
>  
> +		/*
> +		 * If we are doing synchronous page fault and inode needs fsync,
> +		 * we can insert PTE into page tables only after that happens.
> +		 * Skip insertion for now and return the pfn so that caller can
> +		 * insert it after fsync is done.
> +		 */
> +		if (write && (vma->vm_flags & VM_SYNC) &&
> +		    (iomap.flags & IOMAP_F_NEEDDSYNC)) {
> +			if (WARN_ON_ONCE(!pfnp)) {
> +				error = -EIO;
> +				goto error_finish_iomap;
> +			}
> +			*pfnp = pfn;
> +			vmf_ret = VM_FAULT_NEEDDSYNC | major;
> +			goto finish_iomap;
> +		}

Sorry for the second reply, but I spotted this during my testing.

The radix tree entry is inserted and marked as dirty by the
dax_insert_mapping_entry() call a few lines above this newly added block.

I think that this patch should prevent the radix tree entry that we insert
from being marked as dirty, and let the dax_insert_pfn_mkwrite() handler do
that work.  Right now it is being made dirty twice, which we don't need.

Just inserting the entry as clean here and then marking it as dirty later in
dax_insert_pfn_mkwrite() keeps the radix tree entry dirty state consistent
with the PTE dirty state.  It also solves an issue we have right now where the
newly inserted dirty entry will immediately be flushed as part of the
vfs_fsync_range() call that the filesystem will do before
dax_insert_pfn_mkwrite(). 

For example, here's a trace of a PMD write fault on a completely sparse file:

  dax_pmd_fault: dev 259:0 ino 0xc shared WRITE|ALLOW_RETRY|KILLABLE|USER
  address 0x7feab8e00000 vm_start 0x7feab8e00000 vm_end 0x7feab9000000 pgoff
  0x0 max_pgoff 0x1ff 
  
  dax_pmd_fault_done: dev 259:0 ino 0xc shared WRITE|ALLOW_RETRY|KILLABLE|USER
  address 0x7feab8e00000 vm_start 0x7feab8e00000 vm_end 0x7feab9000000 pgoff
  0x0 max_pgoff 0x1ff NEEDDSYNC
  
  dax_writeback_range: dev 259:0 ino 0xc pgoff 0x0-0x1ff
  
  dax_writeback_one: dev 259:0 ino 0xc pgoff 0x0 pglen 0x200
  
  dax_writeback_range_done: dev 259:0 ino 0xc pgoff 0x1-0x1ff
  
  dax_insert_pfn_mkwrite: dev 259:0 ino 0xc shared
  WRITE|ALLOW_RETRY|KILLABLE|USER address 0x7feab8e00000 pgoff 0x0 NOPAGE

The PMD that we are writing back with dax_writeback_one() is the one that we
just made dirty via the first 1/2 of the sync fault, before we've installed a
page table entry.  This fix might speed up some of your test measurements as
well.
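
To illustrate, here is a rough sketch of what I mean for the first half of the
sync fault (the extra "dirty" argument to dax_insert_mapping_entry() is
hypothetical and only there to show the idea):

		bool sync = write && (vma->vm_flags & VM_SYNC) &&
			    (iomap.flags & IOMAP_F_NEEDDSYNC);

		/*
		 * Insert the radix tree entry clean when the PTE insert is
		 * deferred; dax_insert_pfn_mkwrite() then dirties it once the
		 * PTE is actually made writeable after fdatasync().
		 */
		entry = dax_insert_mapping_entry(mapping, vmf, entry, sector,
						 0, /* dirty */ write && !sync);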

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 10/13] mm: Wire up MAP_SYNC
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-21 21:37     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-21 21:37 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:12PM +0200, Jan Kara wrote:
> Pretty crude for now...
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/file.c                  | 2 ++
>  include/linux/mm.h              | 1 +
>  include/linux/mman.h            | 3 ++-
>  include/uapi/asm-generic/mman.h | 1 +
>  mm/mmap.c                       | 5 +++++
>  5 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index f84bb29e941e..850037e140d7 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -340,6 +340,8 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
>  		vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
>  	} else {
>  		vma->vm_ops = &ext4_file_vm_ops;
> +		if (vma->vm_flags & VM_SYNC)
> +			return -EOPNOTSUPP;
>  	}
>  	return 0;
>  }
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index fa036093e76c..d0fb385414a4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -188,6 +188,7 @@ extern unsigned int kobjsize(const void *objp);
>  #define VM_ACCOUNT	0x00100000	/* Is a VM accounted object */
>  #define VM_NORESERVE	0x00200000	/* should the VM suppress accounting */
>  #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
> +#define VM_SYNC		0x00800000	/* Synchronous page faults */
>  #define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
>  #define VM_ARCH_2	0x02000000
>  #define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index c8367041fafd..c38279b651e5 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -86,7 +86,8 @@ calc_vm_flag_bits(unsigned long flags)
>  {
>  	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
>  	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
> -	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    );
> +	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> +	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      );
>  }
>  
>  unsigned long vm_commit_limit(void);
> diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
> index 7162cd4cca73..00e55627d2df 100644
> --- a/include/uapi/asm-generic/mman.h
> +++ b/include/uapi/asm-generic/mman.h
> @@ -12,6 +12,7 @@
>  #define MAP_NONBLOCK	0x10000		/* do not block on IO */
>  #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
> +#define MAP_SYNC	0x80000		/* perform synchronous page faults for the mapping */
>  
>  /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
>  
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f19efcf75418..18453c04b09f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1423,12 +1423,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  				return -ENODEV;
>  			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
>  				return -EINVAL;
> +			if (!(vm_flags & VM_SHARED) && (vm_flags & VM_SYNC))
> +				return -EINVAL;

I know this will be reworked with Dan's new mmap() interface, but I was
curious what the !(vm_flags & VM_SHARED) check here was for.  We're in a
MAP_PRIVATE case, so is it ever possible for VM_SHARED to be set in vm_flags?
I tried to make this happen with various test scenarios, but wasn't able to.

>  			break;
>  
>  		default:
>  			return -EINVAL;
>  		}
>  	} else {
> +		if (vm_flags & VM_SYNC)
> +			return -EINVAL;
> +
>  		switch (flags & MAP_TYPE) {
>  		case MAP_SHARED:
>  			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
> -- 
> 2.12.3
> 

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 10/13] mm: Wire up MAP_SYNC
  2017-08-17 16:08   ` Jan Kara
  (?)
@ 2017-08-21 21:57     ` Ross Zwisler
  -1 siblings, 0 replies; 142+ messages in thread
From: Ross Zwisler @ 2017-08-21 21:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:12PM +0200, Jan Kara wrote:
> Pretty crude for now...
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

One other thing that should probably be wired up before this is all said and
done is the VmFlags string in /proc/<pid>/smaps.  Right now when we set this
flag it ends up as ??:

7f44e6cbd000-7f44e6dbd000 rw-s 00000000 103:00 12              /root/dax/data
Size:               1024 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr sh mr mw me ms ?? mm hg 

The quick one-liner at the end of this patch changes that to:

7fe30e87f000-7fe30e97f000 rw-s 00000000 103:00 12               /root/dax/data
Size:               1024 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr sh mr mw me ms sf mm hg

I think that software can rely on this flag for userspace flushing without
worrying about any new TOCTOU races because there isn't a way to unset the
VM_SYNC flag once it is set - it should be valid as long as the mmap() remains
open and the mmap'd address is valid.
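
As an aside, a minimal user-space sketch (not part of the patch set) of what
relying on this looks like: map a DAX file with MAP_SYNC and look for the new
"sf" bit in the process's own smaps.  The file path is taken from the smaps
dump above, and MAP_SYNC is defined by hand with the value from the
asm-generic/mman.h hunk, since libc headers don't carry it yet.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SYNC
#define MAP_SYNC 0x80000        /* value wired up by this patch */
#endif

int main(void)
{
        int fd = open("/root/dax/data", O_RDWR);
        if (fd < 0)
                return 1;

        void *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_SYNC, fd, 0);
        if (addr == MAP_FAILED) {       /* e.g. ext4 returns -EOPNOTSUPP for non-DAX */
                close(fd);
                return 1;
        }

        /*
         * With MAP_SYNC the block and its metadata are persistent by the time
         * the fault completes, so making this store durable is only a matter
         * of flushing it out of the CPU caches - no fsync()/msync() needed.
         */
        memcpy(addr, "hello", 5);

        /* Debug aid: print the VmFlags lines that carry the "sf" bit. */
        FILE *smaps = fopen("/proc/self/smaps", "r");
        char line[256];
        while (smaps && fgets(line, sizeof(line), smaps))
                if (!strncmp(line, "VmFlags:", 8) && strstr(line, " sf"))
                        fputs(line, stdout);
        if (smaps)
                fclose(smaps);

        munmap(addr, 4096);
        close(fd);
        return 0;
}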

--- 8< ---
 fs/proc/task_mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index b836fd6..a2a82ed 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -650,6 +650,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_ACCOUNT)]	= "ac",
 		[ilog2(VM_NORESERVE)]	= "nr",
 		[ilog2(VM_HUGETLB)]	= "ht",
+		[ilog2(VM_SYNC)]	= "sf",
 		[ilog2(VM_ARCH_1)]	= "ar",
 		[ilog2(VM_DONTDUMP)]	= "dd",
 #ifdef CONFIG_MEM_SOFT_DIRTY

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* Re: [PATCH 10/13] mm: Wire up MAP_SYNC
  2017-08-21 21:57     ` Ross Zwisler
@ 2017-08-22  9:34       ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-22  9:34 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-fsdevel, linux-ext4

On Mon 21-08-17 15:57:03, Ross Zwisler wrote:
> On Thu, Aug 17, 2017 at 06:08:12PM +0200, Jan Kara wrote:
> > Pretty crude for now...
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> One other thing that should probably be wired up before this is all said and
> done is the VmFlag string in /proc/<pid>/smaps.  Right now when we set this
> flag it ends up as ??:

Thanks. Patch folded so that I don't forget about this when updating the
patch to the new interface :).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 10/13] mm: Wire up MAP_SYNC
  2017-08-21 21:37     ` Ross Zwisler
@ 2017-08-22  9:36       ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-22  9:36 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-fsdevel, linux-ext4

On Mon 21-08-17 15:37:04, Ross Zwisler wrote:
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index f19efcf75418..18453c04b09f 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1423,12 +1423,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> >  				return -ENODEV;
> >  			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
> >  				return -EINVAL;
> > +			if (!(vm_flags & VM_SHARED) && (vm_flags & VM_SYNC))
> > +				return -EINVAL;
> 
> I know this will be reworked with Dan's new mmap() interface, but I was
> curious what the !(vm_flags & VM_SHARED) check here was for.  We're in a
> MAP_PRIVATE case, so is it ever possible for VM_SHARED to be set in vm_flags?
> I tried to make this happen with some various test scenarios, but wasn't able.

I was also caught by this :). Check how the MAP_SHARED case above falls
through to the MAP_PRIVATE case...
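
To make the fall-through concrete, here is a tiny, self-contained model of
that switch (the flag values are made-up stand-ins rather than the kernel's;
only the control flow mirrors mm/mmap.c):

#include <stdio.h>

#define XMAP_SHARED     0x01
#define XMAP_PRIVATE    0x02
#define XMAP_TYPE       0x0f

#define XVM_SHARED      0x1
#define XVM_SYNC        0x2

static int model_do_mmap(unsigned long flags, unsigned long vm_flags)
{
        switch (flags & XMAP_TYPE) {
        case XMAP_SHARED:
                vm_flags |= XVM_SHARED;
                /* fall through - shared mappings reuse the private checks */
        case XMAP_PRIVATE:
                /* Reached for XMAP_SHARED too, so vm_flags may have
                 * XVM_SHARED set here; that's what the check tests for. */
                if (!(vm_flags & XVM_SHARED) && (vm_flags & XVM_SYNC))
                        return -22;     /* private + sync: -EINVAL */
                return 0;
        default:
                return -22;
        }
}

int main(void)
{
        printf("shared+sync  -> %d\n", model_do_mmap(XMAP_SHARED, XVM_SYNC));
        printf("private+sync -> %d\n", model_do_mmap(XMAP_PRIVATE, XVM_SYNC));
        return 0;
}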

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults
  2017-08-21 18:58     ` Ross Zwisler
@ 2017-08-22  9:46       ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-22  9:46 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-fsdevel, linux-ext4

On Mon 21-08-17 12:58:30, Ross Zwisler wrote:
> On Thu, Aug 17, 2017 at 06:08:13PM +0200, Jan Kara wrote:
> > Add a flag to iomap interface informing the caller that inode needs
> > fdatasync(2) for returned extent to become persistent and use it in DAX
> > fault code so that we map such extents only read only. We propagate the
> > information that the page table entry has been inserted write-protected
> > from dax_iomap_fault() with a new VM_FAULT_RO flag. Filesystem fault
> > handler is then responsible for calling fdatasync(2) and updating page
> > tables to map pfns read-write. dax_iomap_fault() also takes care of
> > updating vmf->orig_pte to match the PTE that was inserted so that we can
> > safely recheck that PTE did not change while write-enabling it.
> 
> This changelog needs a little love.  s/VM_FAULT_RO/VM_FAULT_NEEDDSYNC/, the
> new path doesn't do the RO mapping, but instead just does the entire RW
> mapping after the fdatasync is complete, the vmf->orig_pte manipulation went
> away, etc.

Yeah, fixed. Thanks for noticing.

> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/dax.c              | 31 +++++++++++++++++++++++++++++++
> >  include/linux/iomap.h |  2 ++
> >  include/linux/mm.h    |  6 +++++-
> >  3 files changed, 38 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/dax.c b/fs/dax.c
> > index bc040e654cc9..ca88fc356786 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -1177,6 +1177,22 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
> >  			goto error_finish_iomap;
> >  		}
> >  
> > +		/*
> > +		 * If we are doing synchronous page fault and inode needs fsync,
> > +		 * we can insert PTE into page tables only after that happens.
> > +		 * Skip insertion for now and return the pfn so that caller can
> > +		 * insert it after fsync is done.
> > +		 */
> > +		if (write && (vma->vm_flags & VM_SYNC) &&
> > +		    (iomap.flags & IOMAP_F_NEEDDSYNC)) {
> 
> Just a small nit, but I don't think we really need to check for 'write' here.
> The fact that IOMAP_F_NEEDDSYNC is set tells us that we are doing a write.
> 
> 	if ((flags & IOMAP_WRITE) &&
> 	    !jbd2_transaction_committed(EXT4_SB(inode->i_sb)->s_journal,
> 					EXT4_I(inode)->i_datasync_tid))
> 		iomap->flags |= IOMAP_F_NEEDDSYNC;
> 
> Ditto for the PMD case.

OK, done.

> With that one simplification and a cleaned up changelog, you can add:
> 
> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

Thanks.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults
  2017-08-21 21:09     ` Ross Zwisler
  (?)
@ 2017-08-22 10:08       ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-22 10:08 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-fsdevel, linux-ext4

On Mon 21-08-17 15:09:16, Ross Zwisler wrote:
> On Thu, Aug 17, 2017 at 06:08:13PM +0200, Jan Kara wrote:
> > Add a flag to iomap interface informing the caller that inode needs
> > fdatasync(2) for returned extent to become persistent and use it in DAX
> > fault code so that we map such extents only read only. We propagate the
> > information that the page table entry has been inserted write-protected
> > from dax_iomap_fault() with a new VM_FAULT_RO flag. Filesystem fault
> > handler is then responsible for calling fdatasync(2) and updating page
> > tables to map pfns read-write. dax_iomap_fault() also takes care of
> > updating vmf->orig_pte to match the PTE that was inserted so that we can
> > safely recheck that PTE did not change while write-enabling it.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/dax.c              | 31 +++++++++++++++++++++++++++++++
> >  include/linux/iomap.h |  2 ++
> >  include/linux/mm.h    |  6 +++++-
> >  3 files changed, 38 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/dax.c b/fs/dax.c
> > index bc040e654cc9..ca88fc356786 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -1177,6 +1177,22 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
> >  			goto error_finish_iomap;
> >  		}
> >  
> > +		/*
> > +		 * If we are doing synchronous page fault and inode needs fsync,
> > +		 * we can insert PTE into page tables only after that happens.
> > +		 * Skip insertion for now and return the pfn so that caller can
> > +		 * insert it after fsync is done.
> > +		 */
> > +		if (write && (vma->vm_flags & VM_SYNC) &&
> > +		    (iomap.flags & IOMAP_F_NEEDDSYNC)) {
> > +			if (WARN_ON_ONCE(!pfnp)) {
> > +				error = -EIO;
> > +				goto error_finish_iomap;
> > +			}
> > +			*pfnp = pfn;
> > +			vmf_ret = VM_FAULT_NEEDDSYNC | major;
> > +			goto finish_iomap;
> > +		}
> 
> Sorry for the second reply, but I spotted this during my testing.
> 
> The radix tree entry is inserted and marked as dirty by the
> dax_insert_mapping_entry() call a few lines above this newly added block.

Yes, I know, and this is actually deliberate. My original thinking was that
we *want* the following vfs_fsync_range() to flush out any changes that may
still be lingering in CPU caches for the block. Thinking about this again,
though, changes through write(2) or pre-zeroing of the block are non-temporal
anyway, and changes through mmap(2) will already dirty the radix tree entry,
so it seems you are right that we don't need the dirtying here. I'll change
this. Thanks for asking.
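
Roughly, the change could end up looking like the sketch below; the "sync"
local and especially the extra "dirty" argument to dax_insert_mapping_entry()
are assumptions made for illustration, not lines taken from the series:

		bool sync = write && (vma->vm_flags & VM_SYNC) &&
			    (iomap.flags & IOMAP_F_NEEDDSYNC);

		/*
		 * Don't dirty the radix tree entry for a synchronous fault;
		 * flushing the data is left to the application in that case.
		 */
		entry = dax_insert_mapping_entry(mapping, vmf, entry, sector,
						 0, write && !sync);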

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 13/13] ext4: Support for synchronous DAX faults
  2017-08-21 19:19     ` Ross Zwisler
  (?)
@ 2017-08-22 10:18       ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-22 10:18 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Christoph Hellwig, Boaz Harrosh, Jan Kara, linux-nvdimm,
	linux-xfs, Andy Lutomirski, linux-fsdevel, linux-ext4

On Mon 21-08-17 13:19:48, Ross Zwisler wrote:
> On Thu, Aug 17, 2017 at 06:08:15PM +0200, Jan Kara wrote:
> > We return IOMAP_F_NEEDDSYNC flag from ext4_iomap_begin() for a
> > synchronous write fault when inode has some uncommitted metadata
> > changes. In the fault handler ext4_dax_fault() we then detect this case,
> > call vfs_fsync_range() to make sure all metadata is committed, and call
> > dax_pfn_mkwrite() to mark PTE as writeable. Note that this will also
> 
> Need to fix up the above line a little -
> s/dax_pfn_mkwrite/dax_insert_pfn_mkwrite/, and we insert the PTE as well as
> make it writeable.

Fixed up, thanks.

> >  	if (write) {
> > -		if (!IS_ERR(handle))
> > -			ext4_journal_stop(handle);
> > +		ext4_journal_stop(handle);
> > +		/* Write fault but PFN mapped only RO? */
> 
> The above comment is out of date.

Fixed.

> > +		if (result & VM_FAULT_NEEDDSYNC) {
> > +			int err;
> > +			loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
> > +			size_t len = 0;
> > +
> > +			if (pe_size == PE_SIZE_PTE)
> > +				len = PAGE_SIZE;
> > +#ifdef CONFIG_FS_DAX_PMD
> > +			else if (pe_size == PE_SIZE_PMD)
> > +				len = HPAGE_PMD_SIZE;
> 
> In fs/dax.c we always use PMD_SIZE.  It looks like HPAGE_PMD_SIZE and PMD_SIZE
> are always the same (from include/linux/huge_mm.h, the only definition of
> HPAGE_PMD_SIZE):
> 
> #define HPAGE_PMD_SHIFT PMD_SHIFT
> #define HPAGE_PMD_SIZE	((1UL) << HPAGE_PMD_SHIFT)
> 
> and AFAICT PMD_SIZE is defined to be 1<<PMD_SHIFT for all architectures as
> well.  I don't understand why we have both?
> 
> In any case, neither HPAGE_PMD_SIZE nor PMD_SIZE are used anywhere else in the
> ext4 code, so can we use PMD_SIZE here for consistency?  If they ever did
> manage to be different, I think we'd want PMD_SIZE anyway.

Yeah, I've changed that to PMD_SIZE.
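
Since the quoted hunk is cut off above, here is a sketch of how the whole
branch reads with that rename folded in.  The vfs_fsync_range() and
dax_insert_pfn_mkwrite() calls are reconstructed from the changelog and from
the helper Ross mentions above, so treat the exact argument order as an
assumption rather than a quote from the next revision:

		if (result & VM_FAULT_NEEDDSYNC) {
			int err;
			loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
			size_t len = (pe_size == PE_SIZE_PMD) ? PMD_SIZE :
								PAGE_SIZE;

			/* Commit metadata covering the faulted range ... */
			err = vfs_fsync_range(vmf->vma->vm_file, start,
					      start + len - 1, 1);
			if (err)
				result = VM_FAULT_SIGBUS;
			else
				/* ... then insert the pfn write-enabled. */
				result = dax_insert_pfn_mkwrite(vmf, pe_size,
								pfn);
		}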

> With those nits and an updated changelog:
> 
> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

Thanks!

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 10/13] mm: Wire up MAP_SYNC
  2017-08-21 21:57     ` Ross Zwisler
  (?)
@ 2017-08-22 17:27       ` Dan Williams
  -1 siblings, 0 replies; 142+ messages in thread
From: Dan Williams @ 2017-08-22 17:27 UTC (permalink / raw)
  To: Ross Zwisler, Jan Kara, linux-fsdevel, linux-nvdimm,
	Andy Lutomirski, linux-ext4, linux-xfs, Christoph Hellwig,
	Dan Williams, Boaz Harrosh

On Mon, Aug 21, 2017 at 2:57 PM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> On Thu, Aug 17, 2017 at 06:08:12PM +0200, Jan Kara wrote:
>> Pretty crude for now...
>>
>> Signed-off-by: Jan Kara <jack@suse.cz>
>
> One other thing that should probably be wired up before this is all said and
> done is the VmFlag string in /proc/<pid>/smaps.  Right now when we set this
> flag it ends up as ??:
>
> 7f44e6cbd000-7f44e6dbd000 rw-s 00000000 103:00 12              /root/dax/data
> Size:               1024 kB
> Rss:                   0 kB
> Pss:                   0 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:         0 kB
> Referenced:            0 kB
> Anonymous:             0 kB
> LazyFree:              0 kB
> AnonHugePages:         0 kB
> ShmemPmdMapped:        0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr sh mr mw me ms ?? mm hg
>
> The quick one-liner at the end of this patch changes that to:
>
> 7fe30e87f000-7fe30e97f000 rw-s 00000000 103:00 12               /root/dax/data
> Size:               1024 kB
> Rss:                   0 kB
> Pss:                   0 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:         0 kB
> Referenced:            0 kB
> Anonymous:             0 kB
> LazyFree:              0 kB
> AnonHugePages:         0 kB
> ShmemPmdMapped:        0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr sh mr mw me ms sf mm hg
>
> I think that software can rely on this flag for userspace flushing without
> worrying about any new TOCTOU races because there isn't a way to unset the
> VM_SYNC flag once it is set - it should be valid as long as the mmap() remains
> open and the mmap'd address is valid.
>
> --- 8< ---
>  fs/proc/task_mmu.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index b836fd6..a2a82ed 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -650,6 +650,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>                 [ilog2(VM_ACCOUNT)]     = "ac",
>                 [ilog2(VM_NORESERVE)]   = "nr",
>                 [ilog2(VM_HUGETLB)]     = "ht",
> +               [ilog2(VM_SYNC)]        = "sf",
>                 [ilog2(VM_ARCH_1)]      = "ar",
>                 [ilog2(VM_DONTDUMP)]    = "dd",
>  #ifdef CONFIG_MEM_SOFT_DIRTY

So, I'm not sure I agree with this. I'm explicitly *not* advertising
MAP_DIRECT in ->vm_flags in my patches because we've seen applications
try to use smaps as an API rather than a debug tool. The TOCTOU race
is fundamentally unsolvable unless you trust that the agent that set up the
mapping will not tear it down while you've observed the sync flag.
Otherwise, if you do trust that the mapping will not be torn down, then
userspace can already just trust itself and not rely on the kernel to
communicate the flag state.

I'm not against adding it, but the reasoning should be debugging and
not an API guarantee that applications will rely on, and unfortunately
I think we've seen that applications will rely on smaps no matter how
we document it.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 03/13] dax: Factor out getting of pfn out of iomap
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-23 18:30     ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-23 18:30 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-xfs, linux-ext4, linux-nvdimm, Christoph Hellwig,
	Andy Lutomirski, linux-fsdevel, Boaz Harrosh

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 04/13] dax: Create local variable for VMA in dax_iomap_pte_fault()
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-23 18:30     ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-23 18:30 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-xfs, linux-ext4, linux-nvdimm, Christoph Hellwig,
	Andy Lutomirski, linux-fsdevel, Boaz Harrosh

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 05/13] dax: Create local variable for vmf->flags & FAULT_FLAG_WRITE test
  2017-08-17 16:08   ` Jan Kara
  (?)
@ 2017-08-23 18:31     ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-23 18:31 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-xfs, linux-ext4, linux-nvdimm, Christoph Hellwig,
	Andy Lutomirski, linux-fsdevel, Boaz Harrosh

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 06/13] dax: Inline dax_insert_mapping() into the callsite
  2017-08-17 16:08   ` Jan Kara
  (?)
@ 2017-08-23 18:31     ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-23 18:31 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-xfs, linux-ext4, linux-nvdimm, Christoph Hellwig,
	Andy Lutomirski, linux-fsdevel, Boaz Harrosh

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 07/13] dax: Inline dax_pmd_insert_mapping() into the callsite
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-23 18:32     ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-23 18:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-xfs, linux-ext4, linux-nvdimm, Christoph Hellwig,
	Andy Lutomirski, linux-fsdevel, Boaz Harrosh

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 08/13] dax: Fix comment describing dax_iomap_fault()
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-23 18:32     ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-23 18:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-xfs, linux-ext4, linux-nvdimm, Christoph Hellwig,
	Andy Lutomirski, linux-fsdevel, Boaz Harrosh

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 09/13] dax: Allow dax_iomap_fault() to return pfn
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-23 18:34     ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-23 18:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-xfs, linux-ext4, linux-nvdimm, Christoph Hellwig,
	Andy Lutomirski, linux-fsdevel, Boaz Harrosh

> @@ -1416,6 +1416,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
>   * @vmf: The description of the fault
>   * @pe_size: Size of the page to fault in
>   * @ops: Iomap ops passed from the file system
> + * @pfnp: PFN to insert for synchronous faults if fsync is required
>   *
>   * When a page fault occurs, filesystems may call this helper in
>   * their fault handler for DAX files. dax_iomap_fault() assumes the caller
> @@ -1423,13 +1424,13 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
>   * successfully.
>   */
>  int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
> -		    const struct iomap_ops *ops)
> +		    const struct iomap_ops *ops, pfn_t *pfnp)

Please keep the iomap_ops argument the last one for the exported
function (and probably all others for consistency).

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 13/13] ext4: Support for synchronous DAX faults
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-23 18:37     ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-23 18:37 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-xfs, linux-ext4, linux-nvdimm, Christoph Hellwig,
	Andy Lutomirski, linux-fsdevel, Boaz Harrosh

> +	pfn_t pfn;
>  
>  	if (write) {
>  		sb_start_pagefault(sb);
> @@ -287,16 +288,39 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
>  		down_read(&EXT4_I(inode)->i_mmap_sem);
>  		handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
>  					       EXT4_DATA_TRANS_BLOCKS(sb));
> +		if (IS_ERR(handle)) {
> +			up_read(&EXT4_I(inode)->i_mmap_sem);
> +			sb_end_pagefault(sb);
> +			return VM_FAULT_SIGBUS;
> +		}
>  	} else {
>  		down_read(&EXT4_I(inode)->i_mmap_sem);
>  	}
> -	if (!IS_ERR(handle))
> -		result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, NULL);
> -	else
> -		result = VM_FAULT_SIGBUS;
> +	result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, &pfn);

Maybe split the error handling refactor into a simple prep patch to
make this one more readable?

> +		/* Write fault but PFN mapped only RO? */
> +		if (result & VM_FAULT_NEEDDSYNC) {
> +			int err;
> +			loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
> +			size_t len = 0;
> +
> +			if (pe_size == PE_SIZE_PTE)
> +				len = PAGE_SIZE;
> +#ifdef CONFIG_FS_DAX_PMD
> +			else if (pe_size == PE_SIZE_PMD)
> +				len = HPAGE_PMD_SIZE;
> +#endif
> +			else
> +				WARN_ON_ONCE(1);
> +			err = vfs_fsync_range(vmf->vma->vm_file, start,
> +					      start + len - 1, 1);
> +			if (err)
> +				result = VM_FAULT_SIGBUS;
> +			else
> +				result = dax_insert_pfn_mkwrite(vmf, pe_size,
> +								pfn);
> +		}

I think this needs to become a helper exported from the DAX code,
way too much magic inside the file system as-is.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 10/13] mm: Wire up MAP_SYNC
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-23 18:43     ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-23 18:43 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:12PM +0200, Jan Kara wrote:
> Pretty crude for now...
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/file.c                  | 2 ++
>  include/linux/mm.h              | 1 +
>  include/linux/mman.h            | 3 ++-
>  include/uapi/asm-generic/mman.h | 1 +
>  mm/mmap.c                       | 5 +++++
>  5 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index f84bb29e941e..850037e140d7 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -340,6 +340,8 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
>  		vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
>  	} else {
>  		vma->vm_ops = &ext4_file_vm_ops;
> +		if (vma->vm_flags & VM_SYNC)
> +			return -EOPNOTSUPP;
>  	}

So each mmap instance would need to reject the flag explicitly?

Or do I misunderstand this VM_SYNC flag?

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 10/13] mm: Wire up MAP_SYNC
  2017-08-23 18:43     ` Christoph Hellwig
@ 2017-08-24  7:16       ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-24  7:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Boaz Harrosh, Jan Kara, linux-nvdimm, linux-xfs, Andy Lutomirski,
	linux-fsdevel, linux-ext4

On Wed 23-08-17 11:43:49, Christoph Hellwig wrote:
> On Thu, Aug 17, 2017 at 06:08:12PM +0200, Jan Kara wrote:
> > Pretty crude for now...
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/ext4/file.c                  | 2 ++
> >  include/linux/mm.h              | 1 +
> >  include/linux/mman.h            | 3 ++-
> >  include/uapi/asm-generic/mman.h | 1 +
> >  mm/mmap.c                       | 5 +++++
> >  5 files changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> > index f84bb29e941e..850037e140d7 100644
> > --- a/fs/ext4/file.c
> > +++ b/fs/ext4/file.c
> > @@ -340,6 +340,8 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
> >  		vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
> >  	} else {
> >  		vma->vm_ops = &ext4_file_vm_ops;
> > +		if (vma->vm_flags & VM_SYNC)
> > +			return -EOPNOTSUPP;
> >  	}
> 
> So each mmap instance would need to reject the flag explicitly?
> 
> Or do I misunderstand this VM_SYNC flag?

Yes, if this should be cleaned up, then each mmap instance not supporting
the flag would need to reject it explicitly. However, Dan's version of the
mmap() syscall carries a mask of supported flags, so once I switch to that,
support becomes purely opt-in. Or I could just reject VM_SYNC for any
!IS_DAX inode, so that only ext2 & xfs would need an explicit rejection...
But the biggest problem with this patch is that we still need to settle on
a safe way of adding a new mmap flag.
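
Just to illustrate the !IS_DAX variant: the check would boil down to
something like this in the generic mmap path (only a sketch, the helper
name below is a placeholder, nothing like this is in the series yet):

/* Sketch only: refuse MAP_SYNC on mappings that cannot honor it */
static int mmap_validate_map_sync(struct file *file, struct vm_area_struct *vma)
{
	if (!(vma->vm_flags & VM_SYNC))
		return 0;
	if (!IS_DAX(file_inode(file)))
		return -EOPNOTSUPP;
	return 0;
}

That way only filesystems that do support DAX but not synchronous faults
would have to carry an explicit rejection of their own.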

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 13/13] ext4: Support for synchronous DAX faults
  2017-08-23 18:37     ` Christoph Hellwig
@ 2017-08-24  7:18       ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-24  7:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-ext4, Jan Kara, linux-nvdimm, linux-xfs, Andy Lutomirski,
	linux-fsdevel, Boaz Harrosh

On Wed 23-08-17 11:37:14, Christoph Hellwig wrote:
> > +	pfn_t pfn;
> >  
> >  	if (write) {
> >  		sb_start_pagefault(sb);
> > @@ -287,16 +288,39 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
> >  		down_read(&EXT4_I(inode)->i_mmap_sem);
> >  		handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
> >  					       EXT4_DATA_TRANS_BLOCKS(sb));
> > +		if (IS_ERR(handle)) {
> > +			up_read(&EXT4_I(inode)->i_mmap_sem);
> > +			sb_end_pagefault(sb);
> > +			return VM_FAULT_SIGBUS;
> > +		}
> >  	} else {
> >  		down_read(&EXT4_I(inode)->i_mmap_sem);
> >  	}
> > -	if (!IS_ERR(handle))
> > -		result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, NULL);
> > -	else
> > -		result = VM_FAULT_SIGBUS;
> > +	result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, &pfn);
> 
> Maybe split the error handling refactor into a simple prep patch to
> make this one more readable?

OK, will do.

> > +		/* Write fault but PFN mapped only RO? */
> > +		if (result & VM_FAULT_NEEDDSYNC) {
> > +			int err;
> > +			loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
> > +			size_t len = 0;
> > +
> > +			if (pe_size == PE_SIZE_PTE)
> > +				len = PAGE_SIZE;
> > +#ifdef CONFIG_FS_DAX_PMD
> > +			else if (pe_size == PE_SIZE_PMD)
> > +				len = HPAGE_PMD_SIZE;
> > +#endif
> > +			else
> > +				WARN_ON_ONCE(1);
> > +			err = vfs_fsync_range(vmf->vma->vm_file, start,
> > +					      start + len - 1, 1);
> > +			if (err)
> > +				result = VM_FAULT_SIGBUS;
> > +			else
> > +				result = dax_insert_pfn_mkwrite(vmf, pe_size,
> > +								pfn);
> > +		}
> 
> I think this needs to become a helper exported from the DAX code,
> way too much magic inside the file system as-is.

Good point, there isn't anything fs-specific in there.
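
Something like this is what I would move to fs/dax.c (rough sketch only,
the helper name is just a placeholder; it is basically the ext4 hunk above
with nothing fs-specific left in it):

/* Sketch, name is a placeholder: commit metadata, then map the pfn writeably */
static int dax_finish_sync_fault(struct vm_fault *vmf,
				 enum page_entry_size pe_size, pfn_t pfn)
{
	int err;
	loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
	size_t len = 0;

	if (pe_size == PE_SIZE_PTE)
		len = PAGE_SIZE;
#ifdef CONFIG_FS_DAX_PMD
	else if (pe_size == PE_SIZE_PMD)
		len = HPAGE_PMD_SIZE;
#endif
	else
		WARN_ON_ONCE(1);
	/* Commit the metadata the new block mapping depends on... */
	err = vfs_fsync_range(vmf->vma->vm_file, start, start + len - 1, 1);
	if (err)
		return VM_FAULT_SIGBUS;
	/* ...and only then insert / write-enable the PTE for the pfn. */
	return dax_insert_pfn_mkwrite(vmf, pe_size, pfn);
}

The filesystem fault handler would then just call this when it sees
VM_FAULT_NEEDDSYNC in the return value of dax_iomap_fault().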

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 09/13] dax: Allow dax_iomap_fault() to return pfn
  2017-08-23 18:34     ` Christoph Hellwig
@ 2017-08-24  7:26       ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-24  7:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-ext4, Jan Kara, linux-nvdimm, linux-xfs, Andy Lutomirski,
	linux-fsdevel, Boaz Harrosh

On Wed 23-08-17 11:34:00, Christoph Hellwig wrote:
> > @@ -1416,6 +1416,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
> >   * @vmf: The description of the fault
> >   * @pe_size: Size of the page to fault in
> >   * @ops: Iomap ops passed from the file system
> > + * @pfnp: PFN to insert for synchronous faults if fsync is required
> >   *
> >   * When a page fault occurs, filesystems may call this helper in
> >   * their fault handler for DAX files. dax_iomap_fault() assumes the caller
> > @@ -1423,13 +1424,13 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
> >   * successfully.
> >   */
> >  int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
> > -		    const struct iomap_ops *ops)
> > +		    const struct iomap_ops *ops, pfn_t *pfnp)
> 
> Please keep the iomap_ops argument the last one for the exported
> function (and probably all others for consistency).

Hum, I wanted the output argument to be the last one. But I don't care
much. Swapped.
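
The exported prototype then ends up looking like this (sketch):

int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
		    pfn_t *pfnp, const struct iomap_ops *ops);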

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-24 12:27     ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-24 12:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

Just curious:  how does IOMAP_F_NEEDDSYNC practically differ
from IOMAP_F_NEW?

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 13/13] ext4: Support for synchronous DAX faults
  2017-08-17 16:08   ` Jan Kara
@ 2017-08-24 12:31     ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-24 12:31 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 17, 2017 at 06:08:15PM +0200, Jan Kara wrote:
> We return IOMAP_F_NEEDDSYNC flag from ext4_iomap_begin() for a
> synchronous write fault when inode has some uncommitted metadata
> changes. In the fault handler ext4_dax_fault() we then detect this case,
> call vfs_fsync_range() to make sure all metadata is committed, and call
> dax_pfn_mkwrite() to mark PTE as writeable. Note that this will also
> dirty corresponding radix tree entry which is what we want - fsync(2)
> will still provide data integrity guarantees for applications not using
> userspace flushing. And applications using userspace flushing can avoid
> calling fsync(2) and thus avoid the performance overhead.

Why is this only wired up for the huge_fault handler and not the
regular one?

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults
  2017-08-24 12:27     ` Christoph Hellwig
@ 2017-08-24 12:34       ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-24 12:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Boaz Harrosh, Jan Kara, linux-nvdimm, linux-xfs, Andy Lutomirski,
	linux-fsdevel, linux-ext4

On Thu 24-08-17 05:27:20, Christoph Hellwig wrote:
> Just curious:  how does IOMAP_F_NEEDDSYNC practically differ
> from IOMAP_F_NEW?

In a subtle but important way ;). The main difference is that if the extent
has already been allocated by a previous write but the transaction that
allocated it is not yet committed, we will return IOMAP_F_NEEDDSYNC but not
IOMAP_F_NEW.
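
To illustrate the difference on the ->iomap_begin() side (a sketch only;
both the function and the ext4_inode_uncommitted_metadata() helper below
are placeholder names, the latter standing for whatever test tells us that
fdatasync() would still have metadata to commit):

/* Sketch: how the two flags get set independently in ext4_iomap_begin() */
static void ext4_iomap_set_fault_flags(struct inode *inode,
				       bool newly_allocated,
				       struct iomap *iomap)
{
	if (newly_allocated)
		iomap->flags |= IOMAP_F_NEW;
	if (ext4_inode_uncommitted_metadata(inode))
		iomap->flags |= IOMAP_F_NEEDDSYNC;
}

So IOMAP_F_NEW means "this operation allocated the block", while
IOMAP_F_NEEDDSYNC means "the block mapping is not yet on stable storage",
and the latter can be true even when the former is not.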

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 13/13] ext4: Support for synchronous DAX faults
  2017-08-24 12:31     ` Christoph Hellwig
@ 2017-08-24 12:34       ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-24 12:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Boaz Harrosh, linux-nvdimm, linux-xfs,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 24, 2017 at 05:31:26AM -0700, Christoph Hellwig wrote:
> On Thu, Aug 17, 2017 at 06:08:15PM +0200, Jan Kara wrote:
> > We return IOMAP_F_NEEDDSYNC flag from ext4_iomap_begin() for a
> > synchronous write fault when inode has some uncommitted metadata
> > changes. In the fault handler ext4_dax_fault() we then detect this case,
> > call vfs_fsync_range() to make sure all metadata is committed, and call
> > dax_pfn_mkwrite() to mark PTE as writeable. Note that this will also
> > dirty corresponding radix tree entry which is what we want - fsync(2)
> > will still provide data integrity guarantees for applications not using
> > userspace flushing. And applications using userspace flushing can avoid
> > calling fsync(2) and thus avoid the performance overhead.
> 
> Why is this only wired up for the huge_fault handler and not the
> regular one?

Ah, turns out ext4 implements ->fault in terms of ->huge_fault.

We'll really need to sort out this mess of fault handlers before
doing too much surgery here.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 13/13] ext4: Support for synchronous DAX faults
  2017-08-24 12:31     ` Christoph Hellwig
@ 2017-08-24 12:36       ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-24 12:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Boaz Harrosh, Jan Kara, linux-nvdimm, linux-xfs, Andy Lutomirski,
	linux-fsdevel, linux-ext4

On Thu 24-08-17 05:31:26, Christoph Hellwig wrote:
> On Thu, Aug 17, 2017 at 06:08:15PM +0200, Jan Kara wrote:
> > We return IOMAP_F_NEEDDSYNC flag from ext4_iomap_begin() for a
> > synchronous write fault when inode has some uncommitted metadata
> > changes. In the fault handler ext4_dax_fault() we then detect this case,
> > call vfs_fsync_range() to make sure all metadata is committed, and call
> > dax_pfn_mkwrite() to mark PTE as writeable. Note that this will also
> > dirty corresponding radix tree entry which is what we want - fsync(2)
> > will still provide data integrity guarantees for applications not using
> > userspace flushing. And applications using userspace flushing can avoid
> > calling fsync(2) and thus avoid the performance overhead.
> 
> Why is this only wired up for the huge_fault handler and not the
> regular one?

We do handle both. It is just that the ext4 naming is a bit confusing:
ext4_dax_fault() is implemented in terms of ext4_dax_huge_fault().
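
That is, roughly this in fs/ext4/file.c:

static int ext4_dax_fault(struct vm_fault *vmf)
{
	return ext4_dax_huge_fault(vmf, PE_SIZE_PTE);
}

so the PTE-sized ->fault path goes through exactly the same code as the
huge fault path, just with pe_size fixed to PE_SIZE_PTE.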

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults
  2017-08-24 12:34       ` Jan Kara
@ 2017-08-24 13:38         ` Christoph Hellwig
  -1 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2017-08-24 13:38 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-xfs, Boaz Harrosh, linux-nvdimm, Christoph Hellwig,
	Andy Lutomirski, linux-fsdevel, linux-ext4

On Thu, Aug 24, 2017 at 02:34:51PM +0200, Jan Kara wrote:
> In a subtle but important way ;). The main difference is that if the extent
> has been already allocated by previous write, but the changing transaction
> is not yet committed, we will return IOMAP_F_NEEDDSYNC but not IOMAP_F_NEW.

Ok.  How about an IOMAP_F_DIRTY flag and a better explanation?

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults
  2017-08-24 13:38         ` Christoph Hellwig
@ 2017-08-24 16:45           ` Jan Kara
  -1 siblings, 0 replies; 142+ messages in thread
From: Jan Kara @ 2017-08-24 16:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Boaz Harrosh, Jan Kara, linux-nvdimm, linux-xfs, Andy Lutomirski,
	linux-fsdevel, linux-ext4

On Thu 24-08-17 06:38:09, Christoph Hellwig wrote:
> On Thu, Aug 24, 2017 at 02:34:51PM +0200, Jan Kara wrote:
> > In a subtle but important way ;). The main difference is that if the extent
> > has been already allocated by previous write, but the changing transaction
> > is not yet committed, we will return IOMAP_F_NEEDDSYNC but not IOMAP_F_NEW.
> 
> Ok.  How about an IOMAP_F_DIRTY flag and a better explanation?

OK, will change it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 142+ messages in thread

Thread overview:
2017-08-17 16:08 [RFC PATCH 0/13 v2] dax, ext4: Synchronous page faults Jan Kara
2017-08-17 16:08 ` [PATCH 01/13] mm: Remove VM_FAULT_HWPOISON_LARGE_MASK Jan Kara
2017-08-17 16:08 ` [PATCH 02/13] dax: Simplify arguments of dax_insert_mapping() Jan Kara
2017-08-17 16:08 ` [PATCH 03/13] dax: Factor out getting of pfn out of iomap Jan Kara
2017-08-18 22:06   ` Ross Zwisler
2017-08-23 18:30   ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 04/13] dax: Create local variable for VMA in dax_iomap_pte_fault() Jan Kara
2017-08-18 22:08   ` Ross Zwisler
2017-08-23 18:30   ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 05/13] dax: Create local variable for vmf->flags & FAULT_FLAG_WRITE test Jan Kara
2017-08-18 22:08   ` Ross Zwisler
2017-08-23 18:31   ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 06/13] dax: Inline dax_insert_mapping() into the callsite Jan Kara
2017-08-18 22:10   ` Ross Zwisler
2017-08-23 18:31   ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 07/13] dax: Inline dax_pmd_insert_mapping() " Jan Kara
2017-08-18 22:12   ` Ross Zwisler
2017-08-23 18:32   ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 08/13] dax: Fix comment describing dax_iomap_fault() Jan Kara
2017-08-18 22:12   ` Ross Zwisler
2017-08-23 18:32   ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 09/13] dax: Allow dax_iomap_fault() to return pfn Jan Kara
2017-08-21 18:45   ` Ross Zwisler
2017-08-23 18:34   ` Christoph Hellwig
2017-08-24  7:26     ` Jan Kara
2017-08-17 16:08 ` [PATCH 10/13] mm: Wire up MAP_SYNC Jan Kara
2017-08-21 21:37   ` Ross Zwisler
2017-08-22  9:36     ` Jan Kara
2017-08-21 21:57   ` Ross Zwisler
2017-08-22  9:34     ` Jan Kara
2017-08-22 17:27     ` Dan Williams
2017-08-23 18:43   ` Christoph Hellwig
2017-08-24  7:16     ` Jan Kara
2017-08-17 16:08 ` [PATCH 11/13] dax, iomap: Add support for synchronous faults Jan Kara
2017-08-21 18:58   ` Ross Zwisler
2017-08-22  9:46     ` Jan Kara
2017-08-21 21:09   ` Ross Zwisler
2017-08-22 10:08     ` Jan Kara
2017-08-24 12:27   ` Christoph Hellwig
2017-08-24 12:34     ` Jan Kara
2017-08-24 13:38       ` Christoph Hellwig
2017-08-24 16:45         ` Jan Kara
2017-08-17 16:08 ` [PATCH 12/13] dax: Implement dax_insert_pfn_mkwrite() Jan Kara
2017-08-21 19:01   ` Ross Zwisler
2017-08-17 16:08 ` [PATCH 13/13] ext4: Support for synchronous DAX faults Jan Kara
2017-08-21 19:19   ` Ross Zwisler
2017-08-22 10:18     ` Jan Kara
2017-08-23 18:37   ` Christoph Hellwig
2017-08-24  7:18     ` Jan Kara
2017-08-24 12:31   ` Christoph Hellwig
2017-08-24 12:34     ` Christoph Hellwig
2017-08-24 12:36     ` Jan Kara
