All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	linux-kernel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Dan Williams <dan.j.williams@intel.com>, Jan Kara <jack@suse.com>,
	Matthew Wilcox <willy@linux.intel.com>,
	linux-ext4@vger.kernel.org, linux-nvdimm@lists.01.org
Subject: Re: [PATCH] ext2, ext4: Fix issue with missing journal entry
Date: Thu, 28 Jan 2016 09:32:11 -0700	[thread overview]
Message-ID: <20160128163211.GA19770@linux.intel.com> (raw)
In-Reply-To: <20160128131630.GI7726@quack.suse.cz>

On Thu, Jan 28, 2016 at 02:16:30PM +0100, Jan Kara wrote:
> On Wed 27-01-16 12:01:48, Ross Zwisler wrote:
> > As it is currently written ext4_dax_mkwrite() assumes that the call into
> > __dax_mkwrite() will not have to do a block allocation so it doesn't create
> > a journal entry.  For a read that creates a zero page to cover a hole
> > followed by a write that actually allocates storage this is incorrect.  The
> > ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
> > get_blocks() to allocate storage.
> > 
> > Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
> > as this function already has all the logic needed to allocate a journal
> > entry and call __dax_fault().
> > 
> > Also update the ext2 fault handlers in this same way to remove duplicate
> > code and keep the logic between ext2 and ext4 the same.
> > 
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> 
> Ah, ok, you are right. The patch looks good but Matthew is reworking the
> area more (so ext4_da_mkwrite() is likely to return) so it this worth it?
> Or do you expect Matthew's patches to land much later?

Yep, Matthew is in the process of reworking all of the DAX fault handling.

I was thinking that we might want to take this patch for v4.5, since it fixes
a bug that I'm guessing could lead to some sort of corruption (lack of a
journal entry entry for an allocating write), and then Matthew's reworks would
land in v4.6?

> > ---
> >  fs/ext2/file.c | 19 +------------------
> >  fs/ext4/file.c | 19 ++-----------------
> >  2 files changed, 3 insertions(+), 35 deletions(-)
> > 
> > diff --git a/fs/ext2/file.c b/fs/ext2/file.c
> > index 2c88d68..c1400b1 100644
> > --- a/fs/ext2/file.c
> > +++ b/fs/ext2/file.c
> > @@ -80,23 +80,6 @@ static int ext2_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
> >  	return ret;
> >  }
> >  
> > -static int ext2_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > -{
> > -	struct inode *inode = file_inode(vma->vm_file);
> > -	struct ext2_inode_info *ei = EXT2_I(inode);
> > -	int ret;
> > -
> > -	sb_start_pagefault(inode->i_sb);
> > -	file_update_time(vma->vm_file);
> > -	down_read(&ei->dax_sem);
> > -
> > -	ret = __dax_mkwrite(vma, vmf, ext2_get_block, NULL);
> > -
> > -	up_read(&ei->dax_sem);
> > -	sb_end_pagefault(inode->i_sb);
> > -	return ret;
> > -}
> > -
> >  static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
> >  		struct vm_fault *vmf)
> >  {
> > @@ -124,7 +107,7 @@ static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
> >  static const struct vm_operations_struct ext2_dax_vm_ops = {
> >  	.fault		= ext2_dax_fault,
> >  	.pmd_fault	= ext2_dax_pmd_fault,
> > -	.page_mkwrite	= ext2_dax_mkwrite,
> > +	.page_mkwrite	= ext2_dax_fault,
> >  	.pfn_mkwrite	= ext2_dax_pfn_mkwrite,
> >  };
> >  
> > diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> > index 1126436..d2e8500 100644
> > --- a/fs/ext4/file.c
> > +++ b/fs/ext4/file.c
> > @@ -262,23 +262,8 @@ static int ext4_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
> >  	return result;
> >  }
> >  
> > -static int ext4_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > -{
> > -	int err;
> > -	struct inode *inode = file_inode(vma->vm_file);
> > -
> > -	sb_start_pagefault(inode->i_sb);
> > -	file_update_time(vma->vm_file);
> > -	down_read(&EXT4_I(inode)->i_mmap_sem);
> > -	err = __dax_mkwrite(vma, vmf, ext4_dax_mmap_get_block, NULL);
> > -	up_read(&EXT4_I(inode)->i_mmap_sem);
> > -	sb_end_pagefault(inode->i_sb);
> > -
> > -	return err;
> > -}
> > -
> >  /*
> > - * Handle write fault for VM_MIXEDMAP mappings. Similarly to ext4_dax_mkwrite()
> > + * Handle write fault for VM_MIXEDMAP mappings. Similarly to ext4_dax_fault()
> >   * handler we check for races agaist truncate. Note that since we cycle through
> >   * i_mmap_sem, we are sure that also any hole punching that began before we
> >   * were called is finished by now and so if it included part of the file we
> > @@ -311,7 +296,7 @@ static int ext4_dax_pfn_mkwrite(struct vm_area_struct *vma,
> >  static const struct vm_operations_struct ext4_dax_vm_ops = {
> >  	.fault		= ext4_dax_fault,
> >  	.pmd_fault	= ext4_dax_pmd_fault,
> > -	.page_mkwrite	= ext4_dax_mkwrite,
> > +	.page_mkwrite	= ext4_dax_fault,
> >  	.pfn_mkwrite	= ext4_dax_pfn_mkwrite,
> >  };
> >  #else
> > -- 
> > 2.5.0
> > 
> > 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	linux-kernel@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
	Dan Williams <dan.j.williams@intel.com>, Jan Kara <jack@suse.com>,
	Matthew Wilcox <willy@linux.intel.com>,
	linux-ext4@vger.kernel.org, linux-nvdimm@ml01.01.org
Subject: Re: [PATCH] ext2, ext4: Fix issue with missing journal entry
Date: Thu, 28 Jan 2016 09:32:11 -0700	[thread overview]
Message-ID: <20160128163211.GA19770@linux.intel.com> (raw)
In-Reply-To: <20160128131630.GI7726@quack.suse.cz>

On Thu, Jan 28, 2016 at 02:16:30PM +0100, Jan Kara wrote:
> On Wed 27-01-16 12:01:48, Ross Zwisler wrote:
> > As it is currently written ext4_dax_mkwrite() assumes that the call into
> > __dax_mkwrite() will not have to do a block allocation so it doesn't create
> > a journal entry.  For a read that creates a zero page to cover a hole
> > followed by a write that actually allocates storage this is incorrect.  The
> > ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
> > get_blocks() to allocate storage.
> > 
> > Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
> > as this function already has all the logic needed to allocate a journal
> > entry and call __dax_fault().
> > 
> > Also update the ext2 fault handlers in this same way to remove duplicate
> > code and keep the logic between ext2 and ext4 the same.
> > 
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> 
> Ah, ok, you are right. The patch looks good but Matthew is reworking the
> area more (so ext4_da_mkwrite() is likely to return) so it this worth it?
> Or do you expect Matthew's patches to land much later?

Yep, Matthew is in the process of reworking all of the DAX fault handling.

I was thinking that we might want to take this patch for v4.5, since it fixes
a bug that I'm guessing could lead to some sort of corruption (lack of a
journal entry entry for an allocating write), and then Matthew's reworks would
land in v4.6?

> > ---
> >  fs/ext2/file.c | 19 +------------------
> >  fs/ext4/file.c | 19 ++-----------------
> >  2 files changed, 3 insertions(+), 35 deletions(-)
> > 
> > diff --git a/fs/ext2/file.c b/fs/ext2/file.c
> > index 2c88d68..c1400b1 100644
> > --- a/fs/ext2/file.c
> > +++ b/fs/ext2/file.c
> > @@ -80,23 +80,6 @@ static int ext2_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
> >  	return ret;
> >  }
> >  
> > -static int ext2_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > -{
> > -	struct inode *inode = file_inode(vma->vm_file);
> > -	struct ext2_inode_info *ei = EXT2_I(inode);
> > -	int ret;
> > -
> > -	sb_start_pagefault(inode->i_sb);
> > -	file_update_time(vma->vm_file);
> > -	down_read(&ei->dax_sem);
> > -
> > -	ret = __dax_mkwrite(vma, vmf, ext2_get_block, NULL);
> > -
> > -	up_read(&ei->dax_sem);
> > -	sb_end_pagefault(inode->i_sb);
> > -	return ret;
> > -}
> > -
> >  static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
> >  		struct vm_fault *vmf)
> >  {
> > @@ -124,7 +107,7 @@ static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
> >  static const struct vm_operations_struct ext2_dax_vm_ops = {
> >  	.fault		= ext2_dax_fault,
> >  	.pmd_fault	= ext2_dax_pmd_fault,
> > -	.page_mkwrite	= ext2_dax_mkwrite,
> > +	.page_mkwrite	= ext2_dax_fault,
> >  	.pfn_mkwrite	= ext2_dax_pfn_mkwrite,
> >  };
> >  
> > diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> > index 1126436..d2e8500 100644
> > --- a/fs/ext4/file.c
> > +++ b/fs/ext4/file.c
> > @@ -262,23 +262,8 @@ static int ext4_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
> >  	return result;
> >  }
> >  
> > -static int ext4_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > -{
> > -	int err;
> > -	struct inode *inode = file_inode(vma->vm_file);
> > -
> > -	sb_start_pagefault(inode->i_sb);
> > -	file_update_time(vma->vm_file);
> > -	down_read(&EXT4_I(inode)->i_mmap_sem);
> > -	err = __dax_mkwrite(vma, vmf, ext4_dax_mmap_get_block, NULL);
> > -	up_read(&EXT4_I(inode)->i_mmap_sem);
> > -	sb_end_pagefault(inode->i_sb);
> > -
> > -	return err;
> > -}
> > -
> >  /*
> > - * Handle write fault for VM_MIXEDMAP mappings. Similarly to ext4_dax_mkwrite()
> > + * Handle write fault for VM_MIXEDMAP mappings. Similarly to ext4_dax_fault()
> >   * handler we check for races agaist truncate. Note that since we cycle through
> >   * i_mmap_sem, we are sure that also any hole punching that began before we
> >   * were called is finished by now and so if it included part of the file we
> > @@ -311,7 +296,7 @@ static int ext4_dax_pfn_mkwrite(struct vm_area_struct *vma,
> >  static const struct vm_operations_struct ext4_dax_vm_ops = {
> >  	.fault		= ext4_dax_fault,
> >  	.pmd_fault	= ext4_dax_pmd_fault,
> > -	.page_mkwrite	= ext4_dax_mkwrite,
> > +	.page_mkwrite	= ext4_dax_fault,
> >  	.pfn_mkwrite	= ext4_dax_pfn_mkwrite,
> >  };
> >  #else
> > -- 
> > 2.5.0
> > 
> > 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

  reply	other threads:[~2016-01-28 16:32 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-27 19:01 [PATCH] ext2, ext4: Fix issue with missing journal entry Ross Zwisler
2016-01-27 19:01 ` Ross Zwisler
2016-01-27 19:01 ` Ross Zwisler
2016-01-28 13:16 ` Jan Kara
2016-01-28 13:16   ` Jan Kara
2016-01-28 16:32   ` Ross Zwisler [this message]
2016-01-28 16:32     ` Ross Zwisler
2016-02-24 20:44     ` Ross Zwisler
2016-02-24 20:44       ` Ross Zwisler
2016-02-25  8:37       ` Jan Kara
2016-02-25  8:37         ` Jan Kara
2016-02-27 19:21         ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160128163211.GA19770@linux.intel.com \
    --to=ross.zwisler@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=jack@suse.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=tytso@mit.edu \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.