From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>,
	linux-nvdimm@lists.01.org, Dave Chinner <david@fromorbit.com>,
	linux-xfs@vger.kernel.org, Andy Lutomirski <luto@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH 7/7] ext4: Support for synchronous DAX faults
Date: Thu, 27 Jul 2017 16:57:39 -0600
Message-ID: <20170727225739.GH22000@linux.intel.com>
In-Reply-To: <20170727131245.28279-8-jack@suse.cz>

On Thu, Jul 27, 2017 at 03:12:45PM +0200, Jan Kara wrote:
> We return IOMAP_F_NEEDDSYNC flag from ext4_iomap_begin() for a
> synchronous write fault when inode has some uncommitted metadata
> changes. In the fault handler ext4_dax_fault() we then detect this case,
> call vfs_fsync_range() to make sure all metadata is committed, and call
> dax_pfn_mkwrite() to mark PTE as writeable. Note that this will also
> dirty corresponding radix tree entry which is what we want - fsync(2)
> will still provide data integrity guarantees for applications not using
> userspace flushing. And applications using userspace flushing can avoid
> calling fsync(2) and thus avoid the performance overhead.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/file.c       | 35 +++++++++++++++++++++++++++++------
>  fs/ext4/inode.c      |  4 ++++
>  fs/jbd2/journal.c    | 16 ++++++++++++++++
>  include/linux/jbd2.h |  1 +
>  4 files changed, 50 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index d401403e5095..b221d0b546b0 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -287,16 +287,39 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
>  		down_read(&EXT4_I(inode)->i_mmap_sem);
>  		handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
>  					       EXT4_DATA_TRANS_BLOCKS(sb));
> +		if (IS_ERR(handle)) {
> +			up_read(&EXT4_I(inode)->i_mmap_sem);
> +			sb_end_pagefault(sb);
> +			return VM_FAULT_SIGBUS;
> +		}

Yay, this error handling seems cleaner to me anyway.

>  	} else {
>  		down_read(&EXT4_I(inode)->i_mmap_sem);
>  	}
> -	if (!IS_ERR(handle))
> -		result = dax_iomap_fault(vmf, pe_size, false, &ext4_iomap_ops);
> -	else
> -		result = VM_FAULT_SIGBUS;
> +	result = dax_iomap_fault(vmf, pe_size, IS_SYNC(inode), &ext4_iomap_ops);
>  	if (write) {
> -		if (!IS_ERR(handle))
> -			ext4_journal_stop(handle);
> +		ext4_journal_stop(handle);
> +		/* Write fault but PFN mapped only RO? */
> +		if (result & VM_FAULT_RO) {
> +			int err;
> +			loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
> +			size_t len = 0;
> +
> +			if (pe_size == PE_SIZE_PTE)
> +				len = PAGE_SIZE;
> +#ifdef CONFIG_FS_DAX_PMD
> +			else if (pe_size == PE_SIZE_PMD)
> +				len = HPAGE_PMD_SIZE;
> +			else
> +				WARN_ON_ONCE(1);

I think this "else WARN_ON_ONCE(1);" should live outside of the
#ifdef CONFIG_FS_DAX_PMD block so that we get warned in all configs if we
get an unsupported pe_size.
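
A minimal userspace sketch of that suggestion follows. It is not kernel code: the enum, the two size macros, and the helper name fault_len() are stand-ins defined here so the snippet compiles on its own, and the unsupported_warned flag takes the place of WARN_ON_ONCE(1). The only point is the placement of the final "else" after the #endif, so it is reached whether or not CONFIG_FS_DAX_PMD is defined.

```c
#include <stddef.h>

/* Stand-ins for kernel definitions; the values are illustrative only. */
enum page_entry_size { PE_SIZE_PTE = 0, PE_SIZE_PMD, PE_SIZE_PUD };
#define PAGE_SIZE      4096UL
#define HPAGE_PMD_SIZE (512UL * PAGE_SIZE)

/* Set where the kernel code would call WARN_ON_ONCE(1). */
static int unsupported_warned;

/* Hypothetical helper: length of the range to sync for a given fault size. */
static size_t fault_len(enum page_entry_size pe_size)
{
	size_t len = 0;

	if (pe_size == PE_SIZE_PTE)
		len = PAGE_SIZE;
#ifdef CONFIG_FS_DAX_PMD
	else if (pe_size == PE_SIZE_PMD)
		len = HPAGE_PMD_SIZE;
#endif
	else
		unsupported_warned = 1;	/* reached in every config */

	return len;
}
```

With this layout, a build without CONFIG_FS_DAX_PMD falls through to the warning for a PMD-sized fault instead of silently leaving len at 0.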

> +#endif
> +			WARN_ON_ONCE(!IS_SYNC(inode));
> +			err = vfs_fsync_range(vmf->vma->vm_file, start,
> +					      start + len - 1, 1);
> +			if (err)
> +				result = VM_FAULT_SIGBUS;
> +			else
> +				result = dax_pfn_mkwrite(vmf, pe_size);
> +		}
>  		up_read(&EXT4_I(inode)->i_mmap_sem);
>  		sb_end_pagefault(sb);
>  	} else {
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3c600f02673f..e68231bb227c 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3429,6 +3429,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>  	}
>  
>  	iomap->flags = 0;
> +	if ((flags & IOMAP_FAULT) && (flags & IOMAP_WRITE) && IS_SYNC(inode) &&
> +	    !jbd2_transaction_committed(EXT4_SB(inode->i_sb)->s_journal,
> +					EXT4_I(inode)->i_datasync_tid))
> +		iomap->flags |= IOMAP_F_NEEDDSYNC;

Do we need to check for (flags & IOMAP_FAULT), or can we rely on the fact that
we are in ext4_iomap_begin()?
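
To make the stakes of that question concrete, here is a small userspace sketch of the predicate with and without the IOMAP_FAULT test. The bit values and the helper need_dsync() are assumptions made up for this illustration and are not taken from the kernel headers; is_sync and committed stand in for IS_SYNC(inode) and the jbd2_transaction_committed() result.

```c
#include <stdbool.h>

/* Illustrative stand-in bit values, not copied from kernel headers. */
#define IOMAP_WRITE (1u << 0)
#define IOMAP_FAULT (1u << 3)

/*
 * Hypothetical helper mirroring the condition in the hunk above.
 * check_fault selects whether the IOMAP_FAULT test is applied.
 */
static bool need_dsync(unsigned int flags, bool is_sync, bool committed,
		       bool check_fault)
{
	if (check_fault && !(flags & IOMAP_FAULT))
		return false;
	return (flags & IOMAP_WRITE) && is_sync && !committed;
}
```

The two variants differ only for a caller that passes IOMAP_WRITE without IOMAP_FAULT: dropping the check would set the flag for that caller as well. Whether ext4_iomap_begin() has such a caller is exactly what the question above asks.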


