All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH v2 0/3] overlayfs: support freeze/thaw/syncfs
@ 2017-01-23 12:32 Amir Goldstein
  2017-01-23 12:32 ` [RFC][PATCH v2 1/3] vfs: freeze protect overlay inode on file_start_write() Amir Goldstein
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Amir Goldstein @ 2017-01-23 12:32 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel

Miklos,

Take #2 for overlayfs freeze.

Patch #2 I am quite confident is a bug fix and I have written an
xfs specific test for it.

Patch #1 is a POC of what I think overlay freezing should be like,
although we may want to optimize it with some file flag?

Patch #3 just enables overlayfs freezing with NOP freeze_fs()
unfreeze_fs() operations, so I could test it.  It behaves as
expected - overlay and underlying fs can be frozen independently
and writes continue when both fs are thawed.

I believe that mmap freeze protection is not covered, although I am
not sure exactly how it works?

Am I missing anything else?

Amir Goldstein (3):
  vfs: freeze protect overlay inode on file_start_write()
  ovl: properly implement sync_filesystem()
  ovl: support freeze/thaw super

 fs/overlayfs/super.c | 32 ++++++++++++++++++++++++++++++++
 include/linux/fs.h   | 17 ++++++++++++++++-
 2 files changed, 48 insertions(+), 1 deletion(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC][PATCH v2 1/3] vfs: freeze protect overlay inode on file_start_write()
  2017-01-23 12:32 [RFC][PATCH v2 0/3] overlayfs: support freeze/thaw/syncfs Amir Goldstein
@ 2017-01-23 12:32 ` Amir Goldstein
  2017-01-23 19:43   ` [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write() Amir Goldstein
  2017-01-23 12:32 ` [RFC][PATCH v2 2/3] ovl: properly implement sync_filesystem() Amir Goldstein
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Amir Goldstein @ 2017-01-23 12:32 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel

Before writing to overlay file, need to check if either overlay
or upper fs are frozen.
This allows freezing overlayfs mount independently of freezing
underlying fs.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 include/linux/fs.h | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4a7f3cc..71811be 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2547,14 +2547,26 @@ static inline void file_start_write(struct file *file)
 {
 	if (!S_ISREG(file_inode(file)->i_mode))
 		return;
+	__sb_start_write(locks_inode(file)->i_sb, SB_FREEZE_WRITE, true);
+	if (likely(locks_inode(file) == file_inode(file)))
+		return;
 	__sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, true);
+
 }
 
 static inline bool file_start_write_trylock(struct file *file)
 {
+	int ret;
+
 	if (!S_ISREG(file_inode(file)->i_mode))
 		return true;
-	return __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, false);
+	ret = __sb_start_write(locks_inode(file)->i_sb, SB_FREEZE_WRITE, false);
+	if (!ret || likely(locks_inode(file) == file_inode(file)))
+		return ret;
+	ret = __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, false);
+	if (!ret)
+		__sb_end_write(locks_inode(file)->i_sb, SB_FREEZE_WRITE);
+	return ret;
 }
 
 static inline void file_end_write(struct file *file)
@@ -2562,6 +2574,9 @@ static inline void file_end_write(struct file *file)
 	if (!S_ISREG(file_inode(file)->i_mode))
 		return;
 	__sb_end_write(file_inode(file)->i_sb, SB_FREEZE_WRITE);
+	if (likely(locks_inode(file) == file_inode(file)))
+		return;
+	__sb_end_write(locks_inode(file)->i_sb, SB_FREEZE_WRITE);
 }
 
 /*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC][PATCH v2 2/3] ovl: properly implement sync_filesystem()
  2017-01-23 12:32 [RFC][PATCH v2 0/3] overlayfs: support freeze/thaw/syncfs Amir Goldstein
  2017-01-23 12:32 ` [RFC][PATCH v2 1/3] vfs: freeze protect overlay inode on file_start_write() Amir Goldstein
@ 2017-01-23 12:32 ` Amir Goldstein
  2017-01-23 12:32 ` [RFC][PATCH v2 3/3] ovl: support freeze/thaw super Amir Goldstein
  2017-01-23 19:52 ` [RFC][PATCH v2 0/3] overlayfs: support freeze/thaw/syncfs Amir Goldstein
  3 siblings, 0 replies; 18+ messages in thread
From: Amir Goldstein @ 2017-01-23 12:32 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel

overlayfs syncs all inode pages on sync_filesystem(), but it also
needs to call s_op->sync_fs() of upper fs for metadata sync.

This fixes correctness of syncfs(2) as demonstrated by following
xfs specific test:

xfs_sync_stats()
{
	echo $1
	echo -n "xfs_log_force = "
	grep log /proc/fs/xfs/stat  | awk '{ print $5 }'
}

xfs_sync_stats "before touch"
touch x
xfs_sync_stats "after touch"
xfs_io -c syncfs .
xfs_sync_stats "after syncfs"
xfs_io -c fsync x
xfs_sync_stats "after fsync"
xfs_io -c fsync x
xfs_sync_stats "after fsync #2"

When this test is run in overlay mount over xfs, log force
count does not increase with syncfs command.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/super.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 6792bb7..346f668 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -160,6 +160,25 @@ static void ovl_put_super(struct super_block *sb)
 	kfree(ufs);
 }
 
+static int ovl_sync_fs(struct super_block *sb, int wait)
+{
+	struct ovl_fs *ufs = sb->s_fs_info;
+	struct super_block *upper_sb;
+	int ret;
+
+	if (!ufs->upper_mnt)
+		return 0;
+	upper_sb = ufs->upper_mnt->mnt_sb;
+	if (!upper_sb->s_op->sync_fs)
+		return 0;
+
+	/* real inodes have already been synced by sync_filesystem(ovl_sb) */
+	down_read(&upper_sb->s_umount);
+	ret = upper_sb->s_op->sync_fs(upper_sb, wait);
+	up_read(&upper_sb->s_umount);
+	return ret;
+}
+
 /**
  * ovl_statfs
  * @sb: The overlayfs super block
@@ -222,6 +241,7 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
 
 static const struct super_operations ovl_super_operations = {
 	.put_super	= ovl_put_super,
+	.sync_fs	= ovl_sync_fs,
 	.statfs		= ovl_statfs,
 	.show_options	= ovl_show_options,
 	.remount_fs	= ovl_remount,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC][PATCH v2 3/3] ovl: support freeze/thaw super
  2017-01-23 12:32 [RFC][PATCH v2 0/3] overlayfs: support freeze/thaw/syncfs Amir Goldstein
  2017-01-23 12:32 ` [RFC][PATCH v2 1/3] vfs: freeze protect overlay inode on file_start_write() Amir Goldstein
  2017-01-23 12:32 ` [RFC][PATCH v2 2/3] ovl: properly implement sync_filesystem() Amir Goldstein
@ 2017-01-23 12:32 ` Amir Goldstein
  2017-01-23 19:52 ` [RFC][PATCH v2 0/3] overlayfs: support freeze/thaw/syncfs Amir Goldstein
  3 siblings, 0 replies; 18+ messages in thread
From: Amir Goldstein @ 2017-01-23 12:32 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/super.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 346f668..cb85eeb 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -179,6 +179,16 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
 	return ret;
 }
 
+static int ovl_freeze(struct super_block *sb)
+{
+	return 0;
+}
+
+static int ovl_unfreeze(struct super_block *sb)
+{
+	return 0;
+}
+
 /**
  * ovl_statfs
  * @sb: The overlayfs super block
@@ -242,6 +252,8 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
 static const struct super_operations ovl_super_operations = {
 	.put_super	= ovl_put_super,
 	.sync_fs	= ovl_sync_fs,
+	.freeze_fs	= ovl_freeze,
+	.unfreeze_fs	= ovl_unfreeze,
 	.statfs		= ovl_statfs,
 	.show_options	= ovl_show_options,
 	.remount_fs	= ovl_remount,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-01-23 12:32 ` [RFC][PATCH v2 1/3] vfs: freeze protect overlay inode on file_start_write() Amir Goldstein
@ 2017-01-23 19:43   ` Amir Goldstein
  2017-01-24 10:09     ` Jan Kara
  2017-01-27 11:09     ` Miklos Szeredi
  0 siblings, 2 replies; 18+ messages in thread
From: Amir Goldstein @ 2017-01-23 19:43 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel

Before calling write f_ops, call file_start_write() instead
of sb_start_write().

This ensures freeze protection for both overlay and upper fs
when file is open from an overlayfs mount.

Replace {sb,file}_start_write() for {copy,clone}_file_range() and
for fallocate().

For dedup_file_range() there is no need for mnt_want_write_file().
File is already open for write, so we already have mnt_want_write()
and we only need file_start_write().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/open.c          |  4 ++--
 fs/read_write.c    | 12 ++++--------
 include/linux/fs.h | 26 +++++++++++++-------------
 3 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 9921f70..d3fe1a1 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -316,7 +316,7 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	if (!file->f_op->fallocate)
 		return -EOPNOTSUPP;
 
-	sb_start_write(inode->i_sb);
+	file_start_write(file);
 	ret = file->f_op->fallocate(file, mode, offset, len);
 
 	/*
@@ -329,7 +329,7 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	if (ret == 0)
 		fsnotify_modify(file);
 
-	sb_end_write(inode->i_sb);
+	file_end_write(file);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(vfs_fallocate);
diff --git a/fs/read_write.c b/fs/read_write.c
index 5816d4c..1e42487 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1538,7 +1538,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (len == 0)
 		return 0;
 
-	sb_start_write(inode_out->i_sb);
+	file_start_write(file_out);
 
 	/*
 	 * Try cloning first, this is supported by more file systems, and
@@ -1574,7 +1574,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	inc_syscr(current);
 	inc_syscw(current);
 
-	sb_end_write(inode_out->i_sb);
+	file_end_write(file_out);
 
 	return ret;
 }
@@ -1976,11 +1976,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 		}
 		dst = file_inode(dst_file);
 
-		ret = mnt_want_write_file(dst_file);
-		if (ret) {
-			info->status = ret;
-			goto next_loop;
-		}
+		file_start_write(dst_file);
 
 		dst_off = info->dest_offset;
 		ret = clone_verify_area(dst_file, dst_off, len, true);
@@ -2013,7 +2009,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 		}
 
 next_file:
-		mnt_drop_write_file(dst_file);
+		file_end_write(dst_file);
 next_loop:
 		fdput(dst_fd);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 71811be..8a4dbc6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1741,19 +1741,6 @@ extern int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 extern int vfs_dedupe_file_range(struct file *file,
 				 struct file_dedupe_range *same);
 
-static inline int do_clone_file_range(struct file *file_in, loff_t pos_in,
-				      struct file *file_out, loff_t pos_out,
-				      u64 len)
-{
-	int ret;
-
-	sb_start_write(file_inode(file_out)->i_sb);
-	ret = vfs_clone_file_range(file_in, pos_in, file_out, pos_out, len);
-	sb_end_write(file_inode(file_out)->i_sb);
-
-	return ret;
-}
-
 struct super_operations {
    	struct inode *(*alloc_inode)(struct super_block *sb);
 	void (*destroy_inode)(struct inode *);
@@ -2579,6 +2566,19 @@ static inline void file_end_write(struct file *file)
 	__sb_end_write(locks_inode(file)->i_sb, SB_FREEZE_WRITE);
 }
 
+static inline int do_clone_file_range(struct file *file_in, loff_t pos_in,
+				      struct file *file_out, loff_t pos_out,
+				      u64 len)
+{
+	int ret;
+
+	file_start_write(file_out);
+	ret = vfs_clone_file_range(file_in, pos_in, file_out, pos_out, len);
+	file_end_write(file_out);
+
+	return ret;
+}
+
 /*
  * get_write_access() gets write permission for a file.
  * put_write_access() releases this write permission.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 0/3] overlayfs: support freeze/thaw/syncfs
  2017-01-23 12:32 [RFC][PATCH v2 0/3] overlayfs: support freeze/thaw/syncfs Amir Goldstein
                   ` (2 preceding siblings ...)
  2017-01-23 12:32 ` [RFC][PATCH v2 3/3] ovl: support freeze/thaw super Amir Goldstein
@ 2017-01-23 19:52 ` Amir Goldstein
  3 siblings, 0 replies; 18+ messages in thread
From: Amir Goldstein @ 2017-01-23 19:52 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Jan 23, 2017 at 2:32 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Miklos,
>
> Take #2 for overlayfs freeze.
>
> Patch #2 I am quite confident is a bug fix and I have written an
> xfs specific test for it.
>
> Patch #1 is a POC of what I think overlay freezing should be like,
> although we may want to optimize it with some file flag?
>
> Patch #3 just enables overlayfs freezing with NOP freeze_fs()
> unfreeze_fs() operations, so I could test it.  It behaves as
> expected - overlay and underlying fs can be frozen independently
> and writes continue when both fs are thawed.
>
> I believe that mmap freeze protection is not covered, although I am
> not sure exactly how it works?
>
> Am I missing anything else?

Found a few missed spots ({copy,clone,dedup}_file_range() and
fallocate(). sent patch #4.

Anyway, I totally understand if the reaction to this patch set is
"what is the justification for overlayfs freeze", because outside
my use case I haven't found any yet.

I would still appreciate if you could tell me whether I am in the
general right direction, as I do need to carry these patches for
overlayfs snapshots.

As for patch #2, I am not sure how often apps use syncfs(2),
and whether or not any of the containers that use overlayfs relay
on it in some way, but it looks to me like something worth fixing.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-01-23 19:43   ` [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write() Amir Goldstein
@ 2017-01-24 10:09     ` Jan Kara
  2017-01-27 11:09     ` Miklos Szeredi
  1 sibling, 0 replies; 18+ messages in thread
From: Jan Kara @ 2017-01-24 10:09 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Jan Kara, Al Viro, linux-unionfs, linux-fsdevel

On Mon 23-01-17 21:43:16, Amir Goldstein wrote:
> Before calling write f_ops, call file_start_write() instead
> of sb_start_write().
> 
> This ensures freeze protection for both overlay and upper fs
> when file is open from an overlayfs mount.
> 
> Replace {sb,file}_start_write() for {copy,clone}_file_range() and
> for fallocate().
> 
> For dedup_file_range() there is no need for mnt_want_write_file().
> File is already open for write, so we already have mnt_want_write()
> and we only need file_start_write().
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

The patch looks like a good cleanup regardless of your overlayfs work. It
seems we just missed these places when converting freeze protection to
file_start_write(). You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> ---
>  fs/open.c          |  4 ++--
>  fs/read_write.c    | 12 ++++--------
>  include/linux/fs.h | 26 +++++++++++++-------------
>  3 files changed, 19 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index 9921f70..d3fe1a1 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -316,7 +316,7 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  	if (!file->f_op->fallocate)
>  		return -EOPNOTSUPP;
>  
> -	sb_start_write(inode->i_sb);
> +	file_start_write(file);
>  	ret = file->f_op->fallocate(file, mode, offset, len);
>  
>  	/*
> @@ -329,7 +329,7 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  	if (ret == 0)
>  		fsnotify_modify(file);
>  
> -	sb_end_write(inode->i_sb);
> +	file_end_write(file);
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(vfs_fallocate);
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 5816d4c..1e42487 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1538,7 +1538,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	if (len == 0)
>  		return 0;
>  
> -	sb_start_write(inode_out->i_sb);
> +	file_start_write(file_out);
>  
>  	/*
>  	 * Try cloning first, this is supported by more file systems, and
> @@ -1574,7 +1574,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	inc_syscr(current);
>  	inc_syscw(current);
>  
> -	sb_end_write(inode_out->i_sb);
> +	file_end_write(file_out);
>  
>  	return ret;
>  }
> @@ -1976,11 +1976,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
>  		}
>  		dst = file_inode(dst_file);
>  
> -		ret = mnt_want_write_file(dst_file);
> -		if (ret) {
> -			info->status = ret;
> -			goto next_loop;
> -		}
> +		file_start_write(dst_file);
>  
>  		dst_off = info->dest_offset;
>  		ret = clone_verify_area(dst_file, dst_off, len, true);
> @@ -2013,7 +2009,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
>  		}
>  
>  next_file:
> -		mnt_drop_write_file(dst_file);
> +		file_end_write(dst_file);
>  next_loop:
>  		fdput(dst_fd);
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 71811be..8a4dbc6 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1741,19 +1741,6 @@ extern int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>  extern int vfs_dedupe_file_range(struct file *file,
>  				 struct file_dedupe_range *same);
>  
> -static inline int do_clone_file_range(struct file *file_in, loff_t pos_in,
> -				      struct file *file_out, loff_t pos_out,
> -				      u64 len)
> -{
> -	int ret;
> -
> -	sb_start_write(file_inode(file_out)->i_sb);
> -	ret = vfs_clone_file_range(file_in, pos_in, file_out, pos_out, len);
> -	sb_end_write(file_inode(file_out)->i_sb);
> -
> -	return ret;
> -}
> -
>  struct super_operations {
>     	struct inode *(*alloc_inode)(struct super_block *sb);
>  	void (*destroy_inode)(struct inode *);
> @@ -2579,6 +2566,19 @@ static inline void file_end_write(struct file *file)
>  	__sb_end_write(locks_inode(file)->i_sb, SB_FREEZE_WRITE);
>  }
>  
> +static inline int do_clone_file_range(struct file *file_in, loff_t pos_in,
> +				      struct file *file_out, loff_t pos_out,
> +				      u64 len)
> +{
> +	int ret;
> +
> +	file_start_write(file_out);
> +	ret = vfs_clone_file_range(file_in, pos_in, file_out, pos_out, len);
> +	file_end_write(file_out);
> +
> +	return ret;
> +}
> +
>  /*
>   * get_write_access() gets write permission for a file.
>   * put_write_access() releases this write permission.
> -- 
> 2.7.4
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-01-23 19:43   ` [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write() Amir Goldstein
  2017-01-24 10:09     ` Jan Kara
@ 2017-01-27 11:09     ` Miklos Szeredi
  2017-01-27 11:50       ` Amir Goldstein
  2017-01-31  7:11       ` Amir Goldstein
  1 sibling, 2 replies; 18+ messages in thread
From: Miklos Szeredi @ 2017-01-27 11:09 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Jan 23, 2017 at 8:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Before calling write f_ops, call file_start_write() instead
> of sb_start_write().
>
> This ensures freeze protection for both overlay and upper fs
> when file is open from an overlayfs mount.
>
> Replace {sb,file}_start_write() for {copy,clone}_file_range() and
> for fallocate().
>
> For dedup_file_range() there is no need for mnt_want_write_file().
> File is already open for write, so we already have mnt_want_write()
> and we only need file_start_write().

Being opened for write is not verified if capable(CAP_SYS_ADMIN).
Ugly special case, don't ask me why it's done...

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-01-27 11:09     ` Miklos Szeredi
@ 2017-01-27 11:50       ` Amir Goldstein
  2017-01-27 16:20         ` Amir Goldstein
  2017-01-31  7:11       ` Amir Goldstein
  1 sibling, 1 reply; 18+ messages in thread
From: Amir Goldstein @ 2017-01-27 11:50 UTC (permalink / raw)
  To: Miklos Szeredi, Christoph Hellwig, Darrick J. Wong
  Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel

On Fri, Jan 27, 2017 at 1:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Mon, Jan 23, 2017 at 8:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Before calling write f_ops, call file_start_write() instead
>> of sb_start_write().
>>
>> This ensures freeze protection for both overlay and upper fs
>> when file is open from an overlayfs mount.
>>
>> Replace {sb,file}_start_write() for {copy,clone}_file_range() and
>> for fallocate().
>>
>> For dedup_file_range() there is no need for mnt_want_write_file().
>> File is already open for write, so we already have mnt_want_write()
>> and we only need file_start_write().
>
> Being opened for write is not verified if capable(CAP_SYS_ADMIN).
> Ugly special case, don't ask me why it's done...
>

Christoph, Darrick, is that by design?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-01-27 11:50       ` Amir Goldstein
@ 2017-01-27 16:20         ` Amir Goldstein
  2017-01-27 16:31           ` Darrick J. Wong
                             ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Amir Goldstein @ 2017-01-27 16:20 UTC (permalink / raw)
  To: Miklos Szeredi, Michael Kerrisk (man-pages)
  Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel,
	Christoph Hellwig, Darrick J. Wong

On Fri, Jan 27, 2017 at 1:50 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Fri, Jan 27, 2017 at 1:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Mon, Jan 23, 2017 at 8:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Before calling write f_ops, call file_start_write() instead
>>> of sb_start_write().
>>>
>>> This ensures freeze protection for both overlay and upper fs
>>> when file is open from an overlayfs mount.
>>>
>>> Replace {sb,file}_start_write() for {copy,clone}_file_range() and
>>> for fallocate().
>>>
>>> For dedup_file_range() there is no need for mnt_want_write_file().
>>> File is already open for write, so we already have mnt_want_write()
>>> and we only need file_start_write().
>>
>> Being opened for write is not verified if capable(CAP_SYS_ADMIN).
>> Ugly special case, don't ask me why it's done...
>>
>
> Christoph, Darrick, is that by design?

Anyway, whether is makes sense or not, that's a legacy from
BTRFS_IOC_FILE_EXTENT_SAME, we probably have to live with.

Michael, I recon man page needs updating.

I'll remove this hunk from the patch.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-01-27 16:20         ` Amir Goldstein
@ 2017-01-27 16:31           ` Darrick J. Wong
  2017-01-27 17:29           ` Amir Goldstein
  2017-01-27 19:30           ` Michael Kerrisk (man-pages)
  2 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-27 16:31 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Michael Kerrisk (man-pages),
	Jan Kara, Al Viro, linux-unionfs, linux-fsdevel,
	Christoph Hellwig, mfasheh, linux-btrfs

[adding mfasheh & btrfs list to cc]

On Fri, Jan 27, 2017 at 06:20:12PM +0200, Amir Goldstein wrote:
> On Fri, Jan 27, 2017 at 1:50 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> > On Fri, Jan 27, 2017 at 1:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> >> On Mon, Jan 23, 2017 at 8:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> >>> Before calling write f_ops, call file_start_write() instead
> >>> of sb_start_write().
> >>>
> >>> This ensures freeze protection for both overlay and upper fs
> >>> when file is open from an overlayfs mount.
> >>>
> >>> Replace {sb,file}_start_write() for {copy,clone}_file_range() and
> >>> for fallocate().
> >>>
> >>> For dedup_file_range() there is no need for mnt_want_write_file().
> >>> File is already open for write, so we already have mnt_want_write()
> >>> and we only need file_start_write().
> >>
> >> Being opened for write is not verified if capable(CAP_SYS_ADMIN).
> >> Ugly special case, don't ask me why it's done...
> >>
> >
> > Christoph, Darrick, is that by design?
> 
> Anyway, whether is makes sense or not, that's a legacy from
> BTRFS_IOC_FILE_EXTENT_SAME, we probably have to live with.
> 
> Michael, I recon man page needs updating.
> 
> I'll remove this hunk from the patch.

I /think/ that behavior (CAP_SYS_ADMIN not requiring destfd to be open
for writes in order to dedupe) was intentional; it seems to date back to
the original ioctl in 2013.  My guess of the justification is that we're
not really writing to dest, so if the admin comes along with an O_RDONLY
destfd it's ok?

<shrug> Let's see if we get any bites from the btrfs developers. :)

--D

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-01-27 16:20         ` Amir Goldstein
  2017-01-27 16:31           ` Darrick J. Wong
@ 2017-01-27 17:29           ` Amir Goldstein
  2017-01-27 17:51             ` Christoph Hellwig
  2017-01-27 19:30           ` Michael Kerrisk (man-pages)
  2 siblings, 1 reply; 18+ messages in thread
From: Amir Goldstein @ 2017-01-27 17:29 UTC (permalink / raw)
  To: Miklos Szeredi, Michael Kerrisk (man-pages)
  Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel,
	Christoph Hellwig, Darrick J. Wong

On Fri, Jan 27, 2017 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Fri, Jan 27, 2017 at 1:50 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Fri, Jan 27, 2017 at 1:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Mon, Jan 23, 2017 at 8:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> Before calling write f_ops, call file_start_write() instead
>>>> of sb_start_write().
>>>>
>>>> This ensures freeze protection for both overlay and upper fs
>>>> when file is open from an overlayfs mount.
>>>>
>>>> Replace {sb,file}_start_write() for {copy,clone}_file_range() and
>>>> for fallocate().
>>>>


Oh boy! there is more.

IIUC, file_start_write() is in order were write to regular file should get
freeze protection and write to special file should not.

So after eliminating dedup_range from this patch, from the 3 remaining ops:
fallocate(), clone_file_range() and copy_file_range() all have nice stories.

fallocate() can operate or regular file directory and blockdev.
blockdev does not call for freeze protection, if we use file_start_write()
fs that supports fallocate on directory (anybody?) will not get freeze
protection.

clone_file_range() operates only on regular file, so file_start_write() seems
in order. especially if clone_range() is ever extended to operate on blockdev,
which does not sound such a far fetched idea.

copy_file_range() looks like it was meant to operate only on regular files,
but neither syscall nor vfs helper actually check that.

Jan,
Would it make sense to add directory to file types for which file_start_write()
gets freeze protection to cover the fallocate() case correctly?

Christoph,
To your understanding, is it correct to add the IS_REG check in
vfs_copy_file_range()?

Michael,
man page for copy_file_range(2) does not explicitly mention regular files
but it seems implied and also EINVAL does not mention the case of
fd is not a regular file, which is how xfs (and probably other fs too) respond.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-01-27 17:29           ` Amir Goldstein
@ 2017-01-27 17:51             ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2017-01-27 17:51 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Michael Kerrisk (man-pages),
	Jan Kara, Al Viro, linux-unionfs, linux-fsdevel,
	Christoph Hellwig, Darrick J. Wong

On Fri, Jan 27, 2017 at 07:29:00PM +0200, Amir Goldstein wrote:
> Christoph,
> To your understanding, is it correct to add the IS_REG check in
> vfs_copy_file_range()?

Yes.  The only thing that could maybe make sense would be block devices.
But I'd prefer to only add then on an as-needed basis.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-01-27 16:20         ` Amir Goldstein
  2017-01-27 16:31           ` Darrick J. Wong
  2017-01-27 17:29           ` Amir Goldstein
@ 2017-01-27 19:30           ` Michael Kerrisk (man-pages)
  2017-01-27 20:09             ` Amir Goldstein
  2 siblings, 1 reply; 18+ messages in thread
From: Michael Kerrisk (man-pages) @ 2017-01-27 19:30 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: mtk.manpages, Jan Kara, Al Viro, linux-unionfs, linux-fsdevel,
	Christoph Hellwig, Darrick J. Wong

On 01/28/2017 05:20 AM, Amir Goldstein wrote:
> On Fri, Jan 27, 2017 at 1:50 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Fri, Jan 27, 2017 at 1:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Mon, Jan 23, 2017 at 8:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> Before calling write f_ops, call file_start_write() instead
>>>> of sb_start_write().
>>>>
>>>> This ensures freeze protection for both overlay and upper fs
>>>> when file is open from an overlayfs mount.
>>>>
>>>> Replace {sb,file}_start_write() for {copy,clone}_file_range() and
>>>> for fallocate().
>>>>
>>>> For dedup_file_range() there is no need for mnt_want_write_file().
>>>> File is already open for write, so we already have mnt_want_write()
>>>> and we only need file_start_write().
>>>
>>> Being opened for write is not verified if capable(CAP_SYS_ADMIN).
>>> Ugly special case, don't ask me why it's done...
>>>
>>
>> Christoph, Darrick, is that by design?
> 
> Anyway, whether is makes sense or not, that's a legacy from
> BTRFS_IOC_FILE_EXTENT_SAME, we probably have to live with.
> 
> Michael, I recon man page needs updating.

Sorry--can you elaborate on what changes are required?

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-01-27 19:30           ` Michael Kerrisk (man-pages)
@ 2017-01-27 20:09             ` Amir Goldstein
  0 siblings, 0 replies; 18+ messages in thread
From: Amir Goldstein @ 2017-01-27 20:09 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Miklos Szeredi, Jan Kara, Al Viro, linux-unionfs, linux-fsdevel,
	Christoph Hellwig, Darrick J. Wong

On Fri, Jan 27, 2017 at 9:30 PM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> On 01/28/2017 05:20 AM, Amir Goldstein wrote:
>> On Fri, Jan 27, 2017 at 1:50 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Fri, Jan 27, 2017 at 1:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>> On Mon, Jan 23, 2017 at 8:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>> Before calling write f_ops, call file_start_write() instead
>>>>> of sb_start_write().
>>>>>
>>>>> This ensures freeze protection for both overlay and upper fs
>>>>> when file is open from an overlayfs mount.
>>>>>
>>>>> Replace {sb,file}_start_write() for {copy,clone}_file_range() and
>>>>> for fallocate().
>>>>>
>>>>> For dedup_file_range() there is no need for mnt_want_write_file().
>>>>> File is already open for write, so we already have mnt_want_write()
>>>>> and we only need file_start_write().
>>>>
>>>> Being opened for write is not verified if capable(CAP_SYS_ADMIN).
>>>> Ugly special case, don't ask me why it's done...
>>>>
>>>
>>> Christoph, Darrick, is that by design?
>>
>> Anyway, whether is makes sense or not, that's a legacy from
>> BTRFS_IOC_FILE_EXTENT_SAME, we probably have to live with.
>>
>> Michael, I recon man page needs updating.
>
> Sorry--can you elaborate on what changes are required?
>

1. IOCTL-FIDEDUPERANGE(2)

"During the call, src_fd must be open for reading and dest_fd must be
open for writing"

Apparently, with CAP_SYS_ADMIN dest_fd is allowed to be open for read-only
<eyebrows raised>. It makes some sense, because the data of dest_fd is not being
modified. Its not really clear to me why admin would need this capability in the
first place, but that's the way it is.

2. copy_file_range(2)

It doesn't say anywhere that fd_in and fd_out are supposed to be regular files
and that EINVAL/EISDIR can be returns if they are not, but that is
probably the behavior
of all file systems.
Per Christoph's suggestion, I am going to enforce IS_REG in the vfs helper,
which is consistent with behavior of clone_file_range and dedupe_file_range

3. fallocate(2)

The entire man page says nothing about what to expect from fallocate
of directory,
but the error codes document:
"ENODEV fd does not refer to a regular file or a directory"

In reality, file systems return EBADF for directory fd and the vfs
helper code says:
        /*
         * Let individual file system decide if it supports preallocation
         * for directories or not.
         */
        if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
            !S_ISBLK(inode->i_mode))
                return -ENODEV;

IS_BLK was added quite recently so man page is out of date is that regard.

And I really wonder if there is any file system that "decides to support
preallocation for directories"?
Because it would make life easier for me to fix correctness for freeze
protection of fallocate if vfs helper would deny falloacte a directory.

fallocate() seems to be the only operation, theoretically allowed on
a directory file descriptor, but only if open for write.
Is there any other beast of this sort?
If not, and no fs ever implemented this bizar functionality, then
perhaps it is best to get rid of that corner case.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-01-27 11:09     ` Miklos Szeredi
  2017-01-27 11:50       ` Amir Goldstein
@ 2017-01-31  7:11       ` Amir Goldstein
  2017-02-06 14:49         ` Amir Goldstein
  1 sibling, 1 reply; 18+ messages in thread
From: Amir Goldstein @ 2017-01-31  7:11 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel

On Fri, Jan 27, 2017 at 1:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Mon, Jan 23, 2017 at 8:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Before calling write f_ops, call file_start_write() instead
>> of sb_start_write().
>>
>> This ensures freeze protection for both overlay and upper fs
>> when file is open from an overlayfs mount.
>>
>> Replace {sb,file}_start_write() for {copy,clone}_file_range() and
>> for fallocate().
>>
>> For dedup_file_range() there is no need for mnt_want_write_file().
>> File is already open for write, so we already have mnt_want_write()
>> and we only need file_start_write().
>
> Being opened for write is not verified if capable(CAP_SYS_ADMIN).
> Ugly special case, don't ask me why it's done...
>


Miklos,

Your comment was correct, but you applied the patch as is with the
dedup_file_range()
change to overlayfs-next regardless. mistake??

I was preparing to re-send without the dedup_file_range() bits, but then
I realized that it is possible to get to fallocate() and copy_file_range() with
non regular file, so I did not re-send.

I'll re-post with another patch to limit  fallocate() and copy_file_range() to
regular file in vfs helper.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-01-31  7:11       ` Amir Goldstein
@ 2017-02-06 14:49         ` Amir Goldstein
  2017-02-07 15:03           ` Miklos Szeredi
  0 siblings, 1 reply; 18+ messages in thread
From: Amir Goldstein @ 2017-02-06 14:49 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel

On Tue, Jan 31, 2017 at 9:11 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Fri, Jan 27, 2017 at 1:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Mon, Jan 23, 2017 at 8:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Before calling write f_ops, call file_start_write() instead
>>> of sb_start_write().
>>>
>>> This ensures freeze protection for both overlay and upper fs
>>> when file is open from an overlayfs mount.
>>>
>>> Replace {sb,file}_start_write() for {copy,clone}_file_range() and
>>> for fallocate().
>>>
>>> For dedup_file_range() there is no need for mnt_want_write_file().
>>> File is already open for write, so we already have mnt_want_write()
>>> and we only need file_start_write().
>>
>> Being opened for write is not verified if capable(CAP_SYS_ADMIN).
>> Ugly special case, don't ask me why it's done...
>>
>
>
> Miklos,
>
> Your comment was correct, but you applied the patch as is with the
> dedup_file_range()
> change to overlayfs-next regardless. mistake??
>

Ping.
overlayfs-next still hold a wrong patch.
Already posted v3 with some more vfs fixes.


> I was preparing to re-send without the dedup_file_range() bits, but then
> I realized that it is possible to get to fallocate() and copy_file_range() with
> non regular file, so I did not re-send.
>
> I'll re-post with another patch to limit  fallocate() and copy_file_range() to
> regular file in vfs helper.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
  2017-02-06 14:49         ` Amir Goldstein
@ 2017-02-07 15:03           ` Miklos Szeredi
  0 siblings, 0 replies; 18+ messages in thread
From: Miklos Szeredi @ 2017-02-07 15:03 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Al Viro, linux-unionfs, linux-fsdevel

On Mon, Feb 6, 2017 at 3:49 PM, Amir Goldstein <amir73il@gmail.com> wrote:

> Ping.
> overlayfs-next still hold a wrong patch.
> Already posted v3 with some more vfs fixes.

Pushed.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-02-07 15:03 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-23 12:32 [RFC][PATCH v2 0/3] overlayfs: support freeze/thaw/syncfs Amir Goldstein
2017-01-23 12:32 ` [RFC][PATCH v2 1/3] vfs: freeze protect overlay inode on file_start_write() Amir Goldstein
2017-01-23 19:43   ` [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write() Amir Goldstein
2017-01-24 10:09     ` Jan Kara
2017-01-27 11:09     ` Miklos Szeredi
2017-01-27 11:50       ` Amir Goldstein
2017-01-27 16:20         ` Amir Goldstein
2017-01-27 16:31           ` Darrick J. Wong
2017-01-27 17:29           ` Amir Goldstein
2017-01-27 17:51             ` Christoph Hellwig
2017-01-27 19:30           ` Michael Kerrisk (man-pages)
2017-01-27 20:09             ` Amir Goldstein
2017-01-31  7:11       ` Amir Goldstein
2017-02-06 14:49         ` Amir Goldstein
2017-02-07 15:03           ` Miklos Szeredi
2017-01-23 12:32 ` [RFC][PATCH v2 2/3] ovl: properly implement sync_filesystem() Amir Goldstein
2017-01-23 12:32 ` [RFC][PATCH v2 3/3] ovl: support freeze/thaw super Amir Goldstein
2017-01-23 19:52 ` [RFC][PATCH v2 0/3] overlayfs: support freeze/thaw/syncfs Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.