All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boaz Harrosh <boaz@plexistor.com>
To: Boaz Harrosh <boaz@plexistor.com>,
	Dave Chinner <david@fromorbit.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Jan Kara <jack@suse.cz>, Hugh Dickins <hughd@google.com>,
	Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, linux-nvdimm <linux-nvdimm@ml01.01.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Eryu Guan <eguan@redhat.com>
Subject: Re: [PATCH 3/3] RFC: dax: dax_prepare_freeze
Date: Tue, 24 Mar 2015 14:37:45 +0200	[thread overview]
Message-ID: <55115A99.40705@plexistor.com> (raw)
In-Reply-To: <55100D10.6090902@plexistor.com>

On 03/23/2015 02:54 PM, Boaz Harrosh wrote:
> From: Boaz Harrosh <boaz@plexistor.com>
> 
> When freezing an FS, we must write protect all IS_DAX()
> inodes that have an mmap mapping on an inode. Otherwise
> application will be able to modify previously faulted-in
> file pages.
> 
> I'm actually doing a full unmap_mapping_range because
> there is no readily available "mapping_write_protect" like
> functionality. I do not think it is worth it to define one
> just for here and just for some extra read-faults after an
> fs_freeze.
> 
> How hot-path is fs_freeze at all?
> 

OK So reinspecting this was a complete raw RFC. I need to do
more work on this thing

comments below ...

> CC: Jan Kara <jack@suse.cz>
> CC: Matthew Wilcox <matthew.r.wilcox@intel.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
> ---
>  fs/dax.c           | 30 ++++++++++++++++++++++++++++++
>  fs/super.c         |  3 +++
>  include/linux/fs.h |  1 +
>  3 files changed, 34 insertions(+)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index d0bd1f4..f3fc28b 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -549,3 +549,33 @@ int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
>  	return dax_zero_page_range(inode, from, length, get_block);
>  }
>  EXPORT_SYMBOL_GPL(dax_truncate_page);
> +
> +/* This is meant to be called as part of freeze_super. otherwise we might
> + * Need some extra locking before calling here.
> + */
> +void dax_prepare_freeze(struct super_block *sb)
> +{
> +	struct inode *inode;
> +
> +	/* TODO: each DAX fs has some private mount option to enable DAX. If
> +	 * We made that option a generic MS_DAX_ENABLE super_block flag we could
> +	 * Avoid the 95% extra unneeded loop-on-all-inodes every freeze.
> +	 * if (!(sb->s_flags & MS_DAX_ENABLE))
> +	 *	return 0;
> +	 */
> +
> +	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		/* TODO: For freezing we can actually do with write-protecting
> +		 * the page. But I cannot find a ready made function that does
> +		 * that for a giving mapping (with all the proper locking).
> +		 * How performance sensitive is the all sb_freeze API?
> +		 * For now we can just unmap the all mapping, and pay extra
> +		 * on read faults.
> +		 */
> +		/* NOTE: Do not unmap private COW mapped pages it will not
> +		 * modify the FS.
> +		 */
> +		if (IS_DAX(inode))
> +			unmap_mapping_range(inode->i_mapping, 0, 0, 0);

So what happens here is that we loop on all sb->s_inodes every freeze
and in the not DAX case just do nothing.

It could be nice to have a flag at the sb level to tel us if we need
to expect IS_DAX() inodes at all, for example when we are mounted on
an harddisk it should not be set.

All of ext2/4 and now Dave's xfs have their own
	XFS_MOUNT_DAX / EXT2_MOUNT_DAX / EXT4_MOUNT_DAX

Is it OK if I unify all this on sb->s_flags |= MS_MOUNT_DAX so I can check it
here in Generic code? The option parsing will be done by each FS but
the flag be global?

> +	}
> +}
> diff --git a/fs/super.c b/fs/super.c
> index 2b7dc90..9ef490c 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1329,6 +1329,9 @@ int freeze_super(struct super_block *sb)
>  	/* All writers are done so after syncing there won't be dirty data */
>  	sync_filesystem(sb);
>  
> +	/* Need to take care of DAX mmaped inodes */
> +	dax_prepare_freeze(sb);
> +

So if CONFIG_FS_DAX is not set this will not compile I need to
define an empty one if not set

Cheers
Boaz


>  	/* Now wait for internal filesystem counter */
>  	sb->s_writers.frozen = SB_FREEZE_FS;
>  	smp_wmb();
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 24af817..3b943d4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2599,6 +2599,7 @@ int dax_truncate_page(struct inode *, loff_t from, get_block_t);
>  int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t);
>  int dax_pfn_mkwrite(struct vm_area_struct *, struct vm_fault *);
>  #define dax_mkwrite(vma, vmf, gb)	dax_fault(vma, vmf, gb)
> +void dax_prepare_freeze(struct super_block *sb);
>  
>  #ifdef CONFIG_BLOCK
>  typedef void (dio_submit_t)(int rw, struct bio *bio, struct inode *inode,
> 


WARNING: multiple messages have this Message-ID
From: Boaz Harrosh <boaz@plexistor.com>
To: Boaz Harrosh <boaz@plexistor.com>,
	Dave Chinner <david@fromorbit.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Jan Kara <jack@suse.cz>, Hugh Dickins <hughd@google.com>,
	Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, linux-nvdimm <linux-nvdimm@ml01.01.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Eryu Guan <eguan@redhat.com>
Subject: Re: [PATCH 3/3] RFC: dax: dax_prepare_freeze
Date: Tue, 24 Mar 2015 14:37:45 +0200	[thread overview]
Message-ID: <55115A99.40705@plexistor.com> (raw)
In-Reply-To: <55100D10.6090902@plexistor.com>

On 03/23/2015 02:54 PM, Boaz Harrosh wrote:
> From: Boaz Harrosh <boaz@plexistor.com>
> 
> When freezing an FS, we must write protect all IS_DAX()
> inodes that have an mmap mapping on an inode. Otherwise
> application will be able to modify previously faulted-in
> file pages.
> 
> I'm actually doing a full unmap_mapping_range because
> there is no readily available "mapping_write_protect" like
> functionality. I do not think it is worth it to define one
> just for here and just for some extra read-faults after an
> fs_freeze.
> 
> How hot-path is fs_freeze at all?
> 

OK So reinspecting this was a complete raw RFC. I need to do
more work on this thing

comments below ...

> CC: Jan Kara <jack@suse.cz>
> CC: Matthew Wilcox <matthew.r.wilcox@intel.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
> ---
>  fs/dax.c           | 30 ++++++++++++++++++++++++++++++
>  fs/super.c         |  3 +++
>  include/linux/fs.h |  1 +
>  3 files changed, 34 insertions(+)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index d0bd1f4..f3fc28b 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -549,3 +549,33 @@ int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
>  	return dax_zero_page_range(inode, from, length, get_block);
>  }
>  EXPORT_SYMBOL_GPL(dax_truncate_page);
> +
> +/* This is meant to be called as part of freeze_super. otherwise we might
> + * Need some extra locking before calling here.
> + */
> +void dax_prepare_freeze(struct super_block *sb)
> +{
> +	struct inode *inode;
> +
> +	/* TODO: each DAX fs has some private mount option to enable DAX. If
> +	 * We made that option a generic MS_DAX_ENABLE super_block flag we could
> +	 * Avoid the 95% extra unneeded loop-on-all-inodes every freeze.
> +	 * if (!(sb->s_flags & MS_DAX_ENABLE))
> +	 *	return 0;
> +	 */
> +
> +	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		/* TODO: For freezing we can actually do with write-protecting
> +		 * the page. But I cannot find a ready made function that does
> +		 * that for a giving mapping (with all the proper locking).
> +		 * How performance sensitive is the all sb_freeze API?
> +		 * For now we can just unmap the all mapping, and pay extra
> +		 * on read faults.
> +		 */
> +		/* NOTE: Do not unmap private COW mapped pages it will not
> +		 * modify the FS.
> +		 */
> +		if (IS_DAX(inode))
> +			unmap_mapping_range(inode->i_mapping, 0, 0, 0);

So what happens here is that we loop on all sb->s_inodes every freeze
and in the not DAX case just do nothing.

It could be nice to have a flag at the sb level to tel us if we need
to expect IS_DAX() inodes at all, for example when we are mounted on
an harddisk it should not be set.

All of ext2/4 and now Dave's xfs have their own
	XFS_MOUNT_DAX / EXT2_MOUNT_DAX / EXT4_MOUNT_DAX

Is it OK if I unify all this on sb->s_flags |= MS_MOUNT_DAX so I can check it
here in Generic code? The option parsing will be done by each FS but
the flag be global?

> +	}
> +}
> diff --git a/fs/super.c b/fs/super.c
> index 2b7dc90..9ef490c 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1329,6 +1329,9 @@ int freeze_super(struct super_block *sb)
>  	/* All writers are done so after syncing there won't be dirty data */
>  	sync_filesystem(sb);
>  
> +	/* Need to take care of DAX mmaped inodes */
> +	dax_prepare_freeze(sb);
> +

So if CONFIG_FS_DAX is not set this will not compile I need to
define an empty one if not set

Cheers
Boaz


>  	/* Now wait for internal filesystem counter */
>  	sb->s_writers.frozen = SB_FREEZE_FS;
>  	smp_wmb();
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 24af817..3b943d4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2599,6 +2599,7 @@ int dax_truncate_page(struct inode *, loff_t from, get_block_t);
>  int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t);
>  int dax_pfn_mkwrite(struct vm_area_struct *, struct vm_fault *);
>  #define dax_mkwrite(vma, vmf, gb)	dax_fault(vma, vmf, gb)
> +void dax_prepare_freeze(struct super_block *sb);
>  
>  #ifdef CONFIG_BLOCK
>  typedef void (dio_submit_t)(int rw, struct bio *bio, struct inode *inode,
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2015-03-24 12:37 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-23 12:47 [PATCH 0/3 v3] dax: Fix mmap-write not updating c/mtime Boaz Harrosh
2015-03-23 12:47 ` Boaz Harrosh
2015-03-23 12:49 ` [PATCH 1/3] mm: New pfn_mkwrite same as page_mkwrite for VM_PFNMAP Boaz Harrosh
2015-03-23 22:49   ` Andrew Morton
2015-03-23 22:49     ` Andrew Morton
2015-03-23 12:52 ` [PATCH 2/3] dax: use pfn_mkwrite to update c/mtime + freeze protection Boaz Harrosh
2015-03-23 12:54 ` [PATCH 3/3] RFC: dax: dax_prepare_freeze Boaz Harrosh
2015-03-23 22:40   ` Dave Chinner
2015-03-23 22:40     ` Dave Chinner
2015-03-24  6:14     ` Boaz Harrosh
2015-03-24  6:14       ` Boaz Harrosh
2015-03-25  2:22       ` Dave Chinner
2015-03-25  2:22         ` Dave Chinner
2015-03-25  8:10         ` Boaz Harrosh
2015-03-25  9:29           ` Dave Chinner
2015-03-25  9:29             ` Dave Chinner
2015-03-25 10:19             ` Boaz Harrosh
2015-03-25 10:19               ` Boaz Harrosh
2015-03-25 20:00               ` Dave Chinner
2015-03-25 20:00                 ` Dave Chinner
2015-03-26  8:02                 ` Boaz Harrosh
2015-03-26 20:58                   ` Dave Chinner
2015-03-26 20:58                     ` Dave Chinner
2015-03-24 12:37   ` Boaz Harrosh [this message]
2015-03-24 12:37     ` Boaz Harrosh
2015-03-25  2:26     ` Dave Chinner
2015-03-25  2:26       ` Dave Chinner
2015-03-25  8:31       ` Boaz Harrosh
2015-03-25  8:31         ` Boaz Harrosh
2015-03-25  9:41         ` Dave Chinner
2015-03-25  9:41           ` Dave Chinner
2015-03-25 10:40           ` Boaz Harrosh
2015-03-25 10:40             ` Boaz Harrosh
2015-03-25 20:05             ` Dave Chinner
2015-03-25 20:05               ` Dave Chinner
2015-03-23 12:56 ` [PATCH v4] xfstest: generic/080 test that mmap-write updates c/mtime Boaz Harrosh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55115A99.40705@plexistor.com \
    --to=boaz@plexistor.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=eguan@redhat.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@ml01.01.org \
    --cc=matthew.r.wilcox@intel.com \
    --cc=mgorman@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.