From: Christoph Hellwig <hch@infradead.org>
To: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>,
	linux-nvdimm@lists.01.org, Dave Chinner <david@fromorbit.com>,
	linux-xfs@vger.kernel.org, Andy Lutomirski <luto@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [RFC PATCH 0/7] dax, ext4: Synchronous page faults
Date: Tue, 1 Aug 2017 03:52:19 -0700
Message-ID: <20170801105219.GA6742@infradead.org>
In-Reply-To: <20170727131245.28279-1-jack@suse.cz>

On Thu, Jul 27, 2017 at 03:12:38PM +0200, Jan Kara wrote:
> So the functionality this patches implement: We have an inode flag (currently
> I abuse S_SYNC inode flag for this and IMHO it kind of makes sense but if
> people hate that I'm certainly open to using new flag in the final
> implementation) that marks inode as requiring synchronous page faults.
> The guarantee provided by this flag on inode is: While a block is writeably
> mapped into page tables, it is guaranteed to be visible in the file at that
> offset also after a crash.

I think the right interface for page fault behavior is an mmap
flag, MAP_SYNC or similar, which will be optional, and a failure of
a MAP_SYNC mmap will indicate that this behavior can't be provided
for the given file descriptor.
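
To make the proposed semantics concrete, here is a minimal userspace
sketch of "try MAP_SYNC, and treat failure as 'not supported for this
file'".  It assumes the form in which the flag was later merged
(Linux 4.15, where MAP_SYNC must be combined with MAP_SHARED_VALIDATE);
map_file_sync() is a hypothetical helper name, and the fallback
#defines use the generic Linux UAPI values, which may differ on some
architectures.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers (assumed generic Linux UAPI values). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Hypothetical helper: map the first `len` bytes of `path` read/write,
 * asking for synchronous page faults first.
 *
 * On success *is_sync is 1 if the MAP_SYNC mapping was granted, or 0 if
 * we fell back to a plain MAP_SHARED mapping (in which case the caller
 * still needs msync()/fsync() for durability).
 * Returns MAP_FAILED if the file cannot be opened or mapped at all.
 */
void *map_file_sync(const char *path, size_t len, int *is_sync)
{
	assert(path != NULL && is_sync != NULL);
	*is_sync = 0;

	int fd = open(path, O_RDWR);
	if (fd < 0)
		return MAP_FAILED;

	/* MAP_SHARED_VALIDATE makes the kernel reject flags it cannot honor. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (p != MAP_FAILED) {
		*is_sync = 1;
	} else {
		/* Synchronous faults unavailable here: use an ordinary mapping. */
		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	}

	close(fd);	/* the mapping stays valid after the fd is closed */
	return p;
}
```

The point of doing this per mapping is that the application learns at
mmap time whether the guarantee holds, instead of depending on an inode
flag it may not control.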

> From my (fairly limited) knowledge of XFS it seems XFS should be able to do the
> same and it should be even possible for filesystem to implement safe remapping
> of a file offset to a different block (i.e. break reflink, do defrag, or
> similar stuff) like:

It should.  But what I'm worried about for both ext4 and XFS is the
worst-case behavior that the page fault path can now hit, e.g. flushing
a potentially full log.  Do you have any numbers on how long your
ext4 page faults take with this in the worst case?

> There are couple of open questions with this implementation:
> 
> 1) Is it worth the hassle?

For that I'd really like to see performance numbers.  And compared to
the immutable nightmare that Dan proposed this looks orders of magnitude
better.

> 2) Is S_SYNC good flag to use or should we use a new inode flag?

I think the right interface is mmap, as said above.  But even if not,
we should not simply reuse existing flags with a well-defined (although
not particularly useful) behavior.

> 3) VM_FAULT_RO and especially passing of resulting 'pfn' from
>    dax_iomap_fault() through filesystem fault handler to dax_pfn_mkwrite() in
>    vmf->orig_pte is a bit of a hack. So far I'm not sure how to refactor
>    things to make this cleaner.

I'll take a look.

