All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>,
	Boaz Harrosh <boazh@netapp.com>,
	linux-nvdimm@lists.01.org, linux-xfs@vger.kernel.org,
	Andy Lutomirski <luto@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH 13/13] ext4: Support for synchronous DAX faults
Date: Mon, 21 Aug 2017 13:19:48 -0600	[thread overview]
Message-ID: <20170821191948.GD26220@linux.intel.com> (raw)
In-Reply-To: <20170817160815.30466-14-jack@suse.cz>

On Thu, Aug 17, 2017 at 06:08:15PM +0200, Jan Kara wrote:
> We return IOMAP_F_NEEDDSYNC flag from ext4_iomap_begin() for a
> synchronous write fault when inode has some uncommitted metadata
> changes. In the fault handler ext4_dax_fault() we then detect this case,
> call vfs_fsync_range() to make sure all metadata is committed, and call
> dax_pfn_mkwrite() to mark PTE as writeable. Note that this will also

Need to fix up the above line a little -
s/dax_pfn_mkwrite/dax_insert_pfn_mkwrite/, and we insert the PTE as well as
make it writeable.

> dirty corresponding radix tree entry which is what we want - fsync(2)
> will still provide data integrity guarantees for applications not using
> userspace flushing. And applications using userspace flushing can avoid
> calling fsync(2) and thus avoid the performance overhead.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/file.c       | 36 ++++++++++++++++++++++++++++++------
>  fs/ext4/inode.c      |  4 ++++
>  fs/jbd2/journal.c    | 17 +++++++++++++++++
>  include/linux/jbd2.h |  1 +
>  4 files changed, 52 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 850037e140d7..3765c4ed1368 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -280,6 +280,7 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
>  	struct inode *inode = file_inode(vmf->vma->vm_file);
>  	struct super_block *sb = inode->i_sb;
>  	bool write = vmf->flags & FAULT_FLAG_WRITE;
> +	pfn_t pfn;
>  
>  	if (write) {
>  		sb_start_pagefault(sb);
> @@ -287,16 +288,39 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
>  		down_read(&EXT4_I(inode)->i_mmap_sem);
>  		handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
>  					       EXT4_DATA_TRANS_BLOCKS(sb));
> +		if (IS_ERR(handle)) {
> +			up_read(&EXT4_I(inode)->i_mmap_sem);
> +			sb_end_pagefault(sb);
> +			return VM_FAULT_SIGBUS;
> +		}
>  	} else {
>  		down_read(&EXT4_I(inode)->i_mmap_sem);
>  	}
> -	if (!IS_ERR(handle))
> -		result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, NULL);
> -	else
> -		result = VM_FAULT_SIGBUS;
> +	result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, &pfn);
>  	if (write) {
> -		if (!IS_ERR(handle))
> -			ext4_journal_stop(handle);
> +		ext4_journal_stop(handle);
> +		/* Write fault but PFN mapped only RO? */

The above comment is out of date.

> +		if (result & VM_FAULT_NEEDDSYNC) {
> +			int err;
> +			loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
> +			size_t len = 0;
> +
> +			if (pe_size == PE_SIZE_PTE)
> +				len = PAGE_SIZE;
> +#ifdef CONFIG_FS_DAX_PMD
> +			else if (pe_size == PE_SIZE_PMD)
> +				len = HPAGE_PMD_SIZE;

In fs/dax.c we always use PMD_SIZE.  It looks like HPAGE_PMD_SIZE and PMD_SIZE
are always the same (from include/linux/huge_mm.h, the only defintion of
HPAGE_PMD_SIZE):

#define HPAGE_PMD_SHIFT PMD_SHIFT
#define HPAGE_PMD_SIZE	((1UL) << HPAGE_PMD_SHIFT)

and AFAICT PMD_SIZE is defined to be 1<<PMD_SHIFT for all architectures as
well.  I don't understand why we have both?

In any case, neither HPAGE_PMD_SIZE nor PMD_SIZE are used anywhere else in the
ext4 code, so can we use PMD_SIZE here for consistency?  If they ever did
manage to be different, I think we'd want PMD_SIZE anyway.

With those nits and an updated changelog:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org,
	Andy Lutomirski <luto@kernel.org>,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Boaz Harrosh <boazh@netapp.com>
Subject: Re: [PATCH 13/13] ext4: Support for synchronous DAX faults
Date: Mon, 21 Aug 2017 13:19:48 -0600	[thread overview]
Message-ID: <20170821191948.GD26220@linux.intel.com> (raw)
In-Reply-To: <20170817160815.30466-14-jack@suse.cz>

On Thu, Aug 17, 2017 at 06:08:15PM +0200, Jan Kara wrote:
> We return IOMAP_F_NEEDDSYNC flag from ext4_iomap_begin() for a
> synchronous write fault when inode has some uncommitted metadata
> changes. In the fault handler ext4_dax_fault() we then detect this case,
> call vfs_fsync_range() to make sure all metadata is committed, and call
> dax_pfn_mkwrite() to mark PTE as writeable. Note that this will also

Need to fix up the above line a little -
s/dax_pfn_mkwrite/dax_insert_pfn_mkwrite/, and we insert the PTE as well as
make it writeable.

> dirty corresponding radix tree entry which is what we want - fsync(2)
> will still provide data integrity guarantees for applications not using
> userspace flushing. And applications using userspace flushing can avoid
> calling fsync(2) and thus avoid the performance overhead.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/file.c       | 36 ++++++++++++++++++++++++++++++------
>  fs/ext4/inode.c      |  4 ++++
>  fs/jbd2/journal.c    | 17 +++++++++++++++++
>  include/linux/jbd2.h |  1 +
>  4 files changed, 52 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 850037e140d7..3765c4ed1368 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -280,6 +280,7 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
>  	struct inode *inode = file_inode(vmf->vma->vm_file);
>  	struct super_block *sb = inode->i_sb;
>  	bool write = vmf->flags & FAULT_FLAG_WRITE;
> +	pfn_t pfn;
>  
>  	if (write) {
>  		sb_start_pagefault(sb);
> @@ -287,16 +288,39 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
>  		down_read(&EXT4_I(inode)->i_mmap_sem);
>  		handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
>  					       EXT4_DATA_TRANS_BLOCKS(sb));
> +		if (IS_ERR(handle)) {
> +			up_read(&EXT4_I(inode)->i_mmap_sem);
> +			sb_end_pagefault(sb);
> +			return VM_FAULT_SIGBUS;
> +		}
>  	} else {
>  		down_read(&EXT4_I(inode)->i_mmap_sem);
>  	}
> -	if (!IS_ERR(handle))
> -		result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, NULL);
> -	else
> -		result = VM_FAULT_SIGBUS;
> +	result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, &pfn);
>  	if (write) {
> -		if (!IS_ERR(handle))
> -			ext4_journal_stop(handle);
> +		ext4_journal_stop(handle);
> +		/* Write fault but PFN mapped only RO? */

The above comment is out of date.

> +		if (result & VM_FAULT_NEEDDSYNC) {
> +			int err;
> +			loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
> +			size_t len = 0;
> +
> +			if (pe_size == PE_SIZE_PTE)
> +				len = PAGE_SIZE;
> +#ifdef CONFIG_FS_DAX_PMD
> +			else if (pe_size == PE_SIZE_PMD)
> +				len = HPAGE_PMD_SIZE;

In fs/dax.c we always use PMD_SIZE.  It looks like HPAGE_PMD_SIZE and PMD_SIZE
are always the same (from include/linux/huge_mm.h, the only defintion of
HPAGE_PMD_SIZE):

#define HPAGE_PMD_SHIFT PMD_SHIFT
#define HPAGE_PMD_SIZE	((1UL) << HPAGE_PMD_SHIFT)

and AFAICT PMD_SIZE is defined to be 1<<PMD_SHIFT for all architectures as
well.  I don't understand why we have both?

In any case, neither HPAGE_PMD_SIZE nor PMD_SIZE are used anywhere else in the
ext4 code, so can we use PMD_SIZE here for consistency?  If they ever did
manage to be different, I think we'd want PMD_SIZE anyway.

With those nits and an updated changelog:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

  reply	other threads:[~2017-08-21 19:17 UTC|newest]

Thread overview: 142+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-17 16:08 [RFC PATCH 0/13 v2] dax, ext4: Synchronous page faults Jan Kara
2017-08-17 16:08 ` Jan Kara
2017-08-17 16:08 ` Jan Kara
2017-08-17 16:08 ` Jan Kara
2017-08-17 16:08 ` [PATCH 01/13] mm: Remove VM_FAULT_HWPOISON_LARGE_MASK Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08 ` [PATCH 02/13] dax: Simplify arguments of dax_insert_mapping() Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08 ` [PATCH 03/13] dax: Factor out getting of pfn out of iomap Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-18 22:06   ` Ross Zwisler
2017-08-18 22:06     ` Ross Zwisler
2017-08-23 18:30   ` Christoph Hellwig
2017-08-23 18:30     ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 04/13] dax: Create local variable for VMA in dax_iomap_pte_fault() Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-18 22:08   ` Ross Zwisler
2017-08-18 22:08     ` Ross Zwisler
2017-08-23 18:30   ` Christoph Hellwig
2017-08-23 18:30     ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 05/13] dax: Create local variable for vmf->flags & FAULT_FLAG_WRITE test Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-18 22:08   ` Ross Zwisler
2017-08-18 22:08     ` Ross Zwisler
2017-08-18 22:08     ` Ross Zwisler
2017-08-23 18:31   ` Christoph Hellwig
2017-08-23 18:31     ` Christoph Hellwig
2017-08-23 18:31     ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 06/13] dax: Inline dax_insert_mapping() into the callsite Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-18 22:10   ` Ross Zwisler
2017-08-18 22:10     ` Ross Zwisler
2017-08-18 22:10     ` Ross Zwisler
2017-08-23 18:31   ` Christoph Hellwig
2017-08-23 18:31     ` Christoph Hellwig
2017-08-23 18:31     ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 07/13] dax: Inline dax_pmd_insert_mapping() " Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-18 22:12   ` Ross Zwisler
2017-08-18 22:12     ` Ross Zwisler
2017-08-18 22:12     ` Ross Zwisler
2017-08-23 18:32   ` Christoph Hellwig
2017-08-23 18:32     ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 08/13] dax: Fix comment describing dax_iomap_fault() Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-18 22:12   ` Ross Zwisler
2017-08-18 22:12     ` Ross Zwisler
2017-08-23 18:32   ` Christoph Hellwig
2017-08-23 18:32     ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 09/13] dax: Allow dax_iomap_fault() to return pfn Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-21 18:45   ` Ross Zwisler
2017-08-21 18:45     ` Ross Zwisler
2017-08-23 18:34   ` Christoph Hellwig
2017-08-23 18:34     ` Christoph Hellwig
2017-08-23 18:34     ` Christoph Hellwig
2017-08-24  7:26     ` Jan Kara
2017-08-24  7:26       ` Jan Kara
2017-08-17 16:08 ` [PATCH 10/13] mm: Wire up MAP_SYNC Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-21 21:37   ` Ross Zwisler
2017-08-21 21:37     ` Ross Zwisler
2017-08-22  9:36     ` Jan Kara
2017-08-22  9:36       ` Jan Kara
2017-08-21 21:57   ` Ross Zwisler
2017-08-21 21:57     ` Ross Zwisler
2017-08-21 21:57     ` Ross Zwisler
2017-08-22  9:34     ` Jan Kara
2017-08-22  9:34       ` Jan Kara
2017-08-22 17:27     ` Dan Williams
2017-08-22 17:27       ` Dan Williams
2017-08-22 17:27       ` Dan Williams
2017-08-23 18:43   ` Christoph Hellwig
2017-08-23 18:43     ` Christoph Hellwig
2017-08-23 18:43     ` Christoph Hellwig
2017-08-24  7:16     ` Jan Kara
2017-08-24  7:16       ` Jan Kara
2017-08-17 16:08 ` [PATCH 11/13] dax, iomap: Add support for synchronous faults Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-21 18:58   ` Ross Zwisler
2017-08-21 18:58     ` Ross Zwisler
2017-08-22  9:46     ` Jan Kara
2017-08-22  9:46       ` Jan Kara
2017-08-21 21:09   ` Ross Zwisler
2017-08-21 21:09     ` Ross Zwisler
2017-08-22 10:08     ` Jan Kara
2017-08-22 10:08       ` Jan Kara
2017-08-22 10:08       ` Jan Kara
2017-08-24 12:27   ` Christoph Hellwig
2017-08-24 12:27     ` Christoph Hellwig
2017-08-24 12:34     ` Jan Kara
2017-08-24 12:34       ` Jan Kara
2017-08-24 13:38       ` Christoph Hellwig
2017-08-24 13:38         ` Christoph Hellwig
2017-08-24 16:45         ` Jan Kara
2017-08-24 16:45           ` Jan Kara
2017-08-17 16:08 ` [PATCH 12/13] dax: Implement dax_insert_pfn_mkwrite() Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-21 19:01   ` Ross Zwisler
2017-08-21 19:01     ` Ross Zwisler
2017-08-17 16:08 ` [PATCH 13/13] ext4: Support for synchronous DAX faults Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-17 16:08   ` Jan Kara
2017-08-21 19:19   ` Ross Zwisler [this message]
2017-08-21 19:19     ` Ross Zwisler
2017-08-22 10:18     ` Jan Kara
2017-08-22 10:18       ` Jan Kara
2017-08-22 10:18       ` Jan Kara
2017-08-23 18:37   ` Christoph Hellwig
2017-08-23 18:37     ` Christoph Hellwig
2017-08-24  7:18     ` Jan Kara
2017-08-24  7:18       ` Jan Kara
2017-08-24 12:31   ` Christoph Hellwig
2017-08-24 12:31     ` Christoph Hellwig
2017-08-24 12:34     ` Christoph Hellwig
2017-08-24 12:34       ` Christoph Hellwig
2017-08-24 12:34       ` Christoph Hellwig
2017-08-24 12:36     ` Jan Kara
2017-08-24 12:36       ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170821191948.GD26220@linux.intel.com \
    --to=ross.zwisler@linux.intel.com \
    --cc=boazh@netapp.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=luto@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.