linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 00/39] overlayfs: stack file operations
@ 2018-05-29 14:43 Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 01/39] vfs: dedupe: return loff_t Miklos Szeredi
                   ` (38 more replies)
  0 siblings, 39 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Up till now overlayfs didn't stack regular file operations.  Instead, when
a file was opened on an overlay, the file from one of the underlying layers
would be opened and any file operations performed would directly go to the
underlying file on a real filesystem.

This mostly works well, but various hacks were added to the VFS to work
around issues with this:

 - d_path() and friends
 - relatime handling
 - file locking
 - fsnotify
 - writecount handling

There are also issues that are unresolved before this patchset:

 - ioctls that need write access but can be performed on an O_RDONLY fd
 - ro/rw inconsistency: file on lower layer opened for read-only will
   return stale data on read after copy-up and modification
 - ro/rw inconsistency for mmap: file on lower layer mapped shared will
   contain stale data after copy-up and modification

This patch series reverts the VFS hacks (with the exception of d_path) and
fixes the unresolved issues.  We need to keep d_path related hacks, because
memory maps are still not stacked, yet d_path() should keep working on
vma->vm_file->f_path.

No regressions were observed after running various test suites (xfstests,
ltp, unionmount-testsuite, pjd-fstest).

Performance impact of stacking was found to be minimal.  Memory use for
open overlay files increases by about 256 bytes or 12%.

Git tree is here:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-rorw

---
Miklos Szeredi (39):
  vfs: dedupe: return loff_t
  vfs: dedupe: rationalize args
  vfs: dedupe: extract helper for a single dedup
  vfs: add path_open()
  vfs: optionally don't account file in nr_files
  vfs: add f_op->pre_mmap()
  vfs: export vfs_ioctl() to modules
  vfs: export vfs_dedupe_file_range_one() to modules
  ovl: copy up times
  ovl: copy up inode flags
  Revert "Revert "ovl: get_write_access() in truncate""
  ovl: copy up file size as well
  ovl: deal with overlay files in ovl_d_real()
  ovl: stack file ops
  ovl: add helper to return real file
  ovl: add ovl_read_iter()
  ovl: add ovl_write_iter()
  ovl: add ovl_fsync()
  ovl: add ovl_mmap()
  ovl: add ovl_fallocate()
  ovl: add lsattr/chattr support
  ovl: add ovl_fiemap()
  ovl: add O_DIRECT support
  ovl: add reflink/copyfile/dedup support
  vfs: don't open real
  ovl: copy-up on MAP_SHARED
  ovl: obsolete "check_copy_up" module option
  ovl: fix documentation of non-standard behavior
  vfs: simplify dentry_open()
  Revert "ovl: fix may_write_real() for overlayfs directories"
  Revert "ovl: don't allow writing ioctl on lower layer"
  vfs: fix freeze protection in mnt_want_write_file() for overlayfs
  Revert "ovl: fix relatime for directories"
  Revert "vfs: update ovl inode before relatime check"
  Revert "vfs: add flags to d_real()"
  Revert "vfs: do get_write_access() on upper layer of overlayfs"
  Partially revert "locks: fix file locking on overlayfs"
  Revert "fsnotify: support overlayfs"
  vfs: remove open_flags from d_real()

 Documentation/filesystems/Locking       |   4 +-
 Documentation/filesystems/overlayfs.txt |  60 ++--
 Documentation/filesystems/vfs.txt       |  19 +-
 fs/btrfs/ctree.h                        |   5 +-
 fs/btrfs/ioctl.c                        |   7 +-
 fs/file_table.c                         |  13 +-
 fs/inode.c                              |  46 +--
 fs/internal.h                           |  17 +-
 fs/ioctl.c                              |   1 +
 fs/locks.c                              |  20 +-
 fs/namei.c                              |   2 +-
 fs/namespace.c                          |  69 +----
 fs/ocfs2/file.c                         |  10 +-
 fs/open.c                               |  74 ++---
 fs/overlayfs/Kconfig                    |  21 ++
 fs/overlayfs/Makefile                   |   4 +-
 fs/overlayfs/copy_up.c                  |  30 +-
 fs/overlayfs/dir.c                      |  31 +-
 fs/overlayfs/file.c                     | 509 ++++++++++++++++++++++++++++++++
 fs/overlayfs/inode.c                    |  63 +++-
 fs/overlayfs/overlayfs.h                |  21 +-
 fs/overlayfs/ovl_entry.h                |   1 +
 fs/overlayfs/super.c                    |  65 ++--
 fs/overlayfs/util.c                     |  11 +-
 fs/read_write.c                         |  91 +++---
 fs/xattr.c                              |   9 +-
 fs/xfs/xfs_file.c                       |   8 +-
 include/linux/dcache.h                  |  15 +-
 include/linux/fs.h                      |  32 +-
 include/linux/fsnotify.h                |  14 +-
 include/uapi/linux/fs.h                 |   1 -
 mm/util.c                               |   5 +
 32 files changed, 898 insertions(+), 380 deletions(-)
 create mode 100644 fs/overlayfs/file.c

-- 
2.14.3
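
The read-only/read-write inconsistency listed among the unresolved issues
in the cover letter can be illustrated with a small user-space model.  This
is only a sketch of the idea, not code from the series: all toy_* names are
hypothetical, and a string pointer stands in for file contents.  A file
that caches the real inode at open time keeps reading the lower layer after
copy-up, whereas a stacked file that resolves the real inode on each
operation sees the upper copy.

```c
#include <stddef.h>

/* Hypothetical stand-in for an inode on a real filesystem. */
struct toy_inode {
	const char *data;
};

/* Hypothetical overlay inode: upper stays NULL until copy-up happens. */
struct toy_ovl_inode {
	const struct toy_inode *lower;
	const struct toy_inode *upper;
};

/* Pick the inode that currently backs the overlay inode. */
const struct toy_inode *toy_real_inode(const struct toy_ovl_inode *oi)
{
	return oi->upper ? oi->upper : oi->lower;
}

/* Unstacked model: the real inode is chosen once, at open time. */
struct toy_unstacked_file {
	const struct toy_inode *real;
};

/* Stacked model: the file keeps a reference to the overlay inode. */
struct toy_stacked_file {
	const struct toy_ovl_inode *ovl;
};

void toy_open_unstacked(struct toy_unstacked_file *f,
			const struct toy_ovl_inode *oi)
{
	f->real = toy_real_inode(oi);
}

const char *toy_read_unstacked(const struct toy_unstacked_file *f)
{
	/* Always reads whatever was real when the file was opened. */
	return f->real->data;
}

const char *toy_read_stacked(const struct toy_stacked_file *f)
{
	/* Re-resolves the real inode, so a later copy-up is noticed. */
	return toy_real_inode(f->ovl)->data;
}
```

In this model, opening both kinds of file before setting oi.upper and then
reading leaves the unstacked file on the lower inode's data, while the
stacked file returns the upper inode's data.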


* [PATCH 01/39] vfs: dedupe: return loff_t
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-06-04  8:43   ` Christoph Hellwig
  2018-05-29 14:43 ` [PATCH 02/39] vfs: dedupe: rationalize args Miklos Szeredi
                   ` (37 subsequent siblings)
  38 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

f_op->dedupe_file_range() takes a u64 length to dedupe and returns the
actual length deduped as an ssize_t.  This breaks badly on 32-bit archs,
since the returned length is truncated and may overflow into the sign bit
(xfs and ocfs2 are affected; btrfs limits the actual length to 16MiB).

Returning loff_t should be good, since clone_verify_area() makes sure that
the supplied length doesn't overflow.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/btrfs/ctree.h   |  4 ++--
 fs/btrfs/ioctl.c   |  6 +++---
 fs/ocfs2/file.c    | 10 +++++-----
 fs/read_write.c    |  2 +-
 fs/xfs/xfs_file.c  |  2 +-
 include/linux/fs.h |  2 +-
 6 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0d422c9908b8..990e011c9f0c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3271,8 +3271,8 @@ void btrfs_get_block_group_info(struct list_head *groups_list,
 				struct btrfs_ioctl_space_info *space);
 void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
 			       struct btrfs_ioctl_balance_args *bargs);
-ssize_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
-			   struct file *dst_file, u64 dst_loff);
+loff_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
+			    struct file *dst_file, u64 dst_loff);
 
 /* file.c */
 int __init btrfs_auto_defrag_init(void);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 632e26d6f7ce..1b5cc5fd4868 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3194,13 +3194,13 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 
 #define BTRFS_MAX_DEDUPE_LEN	SZ_16M
 
-ssize_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
-				struct file *dst_file, u64 dst_loff)
+loff_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
+			       struct file *dst_file, u64 dst_loff)
 {
 	struct inode *src = file_inode(src_file);
 	struct inode *dst = file_inode(dst_file);
 	u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
-	ssize_t res;
+	int res;
 
 	if (olen > BTRFS_MAX_DEDUPE_LEN)
 		olen = BTRFS_MAX_DEDUPE_LEN;
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 6ee94bc23f5b..4a81d82ab7f6 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2537,11 +2537,11 @@ static int ocfs2_file_clone_range(struct file *file_in,
 					 len, false);
 }
 
-static ssize_t ocfs2_file_dedupe_range(struct file *src_file,
-				       u64 loff,
-				       u64 len,
-				       struct file *dst_file,
-				       u64 dst_loff)
+static loff_t ocfs2_file_dedupe_range(struct file *src_file,
+				      u64 loff,
+				      u64 len,
+				      struct file *dst_file,
+				      u64 dst_loff)
 {
 	int error;
 
diff --git a/fs/read_write.c b/fs/read_write.c
index c4eabbfc90df..c41e2a1eb7c7 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1976,7 +1976,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 	u16 count = same->dest_count;
 	struct file *dst_file;
 	loff_t dst_off;
-	ssize_t deduped;
+	loff_t deduped;
 
 	if (!(file->f_mode & FMODE_READ))
 		return -EINVAL;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e70fb8ccecea..cf51d47efdb6 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -872,7 +872,7 @@ xfs_file_clone_range(
 				     len, false);
 }
 
-STATIC ssize_t
+STATIC loff_t
 xfs_file_dedupe_range(
 	struct file	*src_file,
 	u64		loff,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4f637a9b213d..8e49defc7aab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1738,7 +1738,7 @@ struct file_operations {
 			loff_t, size_t, unsigned int);
 	int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t,
 			u64);
-	ssize_t (*dedupe_file_range)(struct file *, u64, u64, struct file *,
+	loff_t (*dedupe_file_range)(struct file *, u64, u64, struct file *,
 			u64);
 } __randomize_layout;
 
-- 
2.14.3
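
The truncation described in the commit message of this patch can be
reproduced in user space.  The following is a minimal sketch, not kernel
code: int32_t stands in for ssize_t on a 32-bit arch and int64_t for
loff_t, and both helper names are hypothetical.

```c
#include <stdint.h>

/* Model of the old return type on a 32-bit arch: ssize_t is 32 bits. */
int32_t toy_return_as_ssize32(uint64_t deduped_len)
{
	/*
	 * Converting an out-of-range value to a signed type is
	 * implementation-defined; GCC and Clang keep the low 32 bits.
	 */
	return (int32_t)deduped_len;
}

/* Model of the new return type: loff_t is 64 bits on every arch. */
int64_t toy_return_as_loff(uint64_t deduped_len)
{
	return (int64_t)deduped_len;
}
```

With these helpers a deduped length of 2GiB comes back negative through the
32-bit type, so a caller would mistake it for an error, while the 64-bit
type returns it unchanged.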


* [PATCH 02/39] vfs: dedupe: rationalize args
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 01/39] vfs: dedupe: return loff_t Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-06-06 15:02   ` Darrick J. Wong
  2018-05-29 14:43 ` [PATCH 03/39] vfs: dedupe: extract helper for a single dedup Miklos Szeredi
                   ` (36 subsequent siblings)
  38 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Clean up f_op->dedupe_file_range() interface.

1) Use loff_t for offsets and length instead of u64
2) Order the arguments the same way as {copy|clone}_file_range().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/btrfs/ctree.h   | 5 +++--
 fs/btrfs/ioctl.c   | 5 +++--
 fs/ocfs2/file.c    | 6 +++---
 fs/read_write.c    | 4 ++--
 fs/xfs/xfs_file.c  | 6 +++---
 include/linux/fs.h | 4 ++--
 6 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 990e011c9f0c..5968ba5aa0d1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3271,8 +3271,9 @@ void btrfs_get_block_group_info(struct list_head *groups_list,
 				struct btrfs_ioctl_space_info *space);
 void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
 			       struct btrfs_ioctl_balance_args *bargs);
-loff_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
-			    struct file *dst_file, u64 dst_loff);
+loff_t btrfs_dedupe_file_range(struct file *src_file, loff_t loff,
+			       struct file *dst_file, loff_t dst_loff,
+			       loff_t olen);
 
 /* file.c */
 int __init btrfs_auto_defrag_init(void);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 1b5cc5fd4868..70eac76804df 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3194,8 +3194,9 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 
 #define BTRFS_MAX_DEDUPE_LEN	SZ_16M
 
-loff_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
-			       struct file *dst_file, u64 dst_loff)
+loff_t btrfs_dedupe_file_range(struct file *src_file, loff_t loff,
+			       struct file *dst_file, loff_t dst_loff,
+			       loff_t olen)
 {
 	struct inode *src = file_inode(src_file);
 	struct inode *dst = file_inode(dst_file);
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 4a81d82ab7f6..a024715cd227 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2538,10 +2538,10 @@ static int ocfs2_file_clone_range(struct file *file_in,
 }
 
 static loff_t ocfs2_file_dedupe_range(struct file *src_file,
-				      u64 loff,
-				      u64 len,
+				      loff_t loff,
 				      struct file *dst_file,
-				      u64 dst_loff)
+				      loff_t dst_loff,
+				      loff_t len)
 {
 	int error;
 
diff --git a/fs/read_write.c b/fs/read_write.c
index c41e2a1eb7c7..1818581cadf6 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -2046,8 +2046,8 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 			info->status = -EINVAL;
 		} else {
 			deduped = dst_file->f_op->dedupe_file_range(file, off,
-							len, dst_file,
-							info->dest_offset);
+							dst_file,
+							info->dest_offset, len);
 			if (deduped == -EBADE)
 				info->status = FILE_DEDUPE_RANGE_DIFFERS;
 			else if (deduped < 0)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index cf51d47efdb6..75704edfba82 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -875,10 +875,10 @@ xfs_file_clone_range(
 STATIC loff_t
 xfs_file_dedupe_range(
 	struct file	*src_file,
-	u64		loff,
-	u64		len,
+	loff_t		loff,
 	struct file	*dst_file,
-	u64		dst_loff)
+	loff_t		dst_loff,
+	loff_t		len)
 {
 	struct inode	*srci = file_inode(src_file);
 	u64		max_dedupe;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8e49defc7aab..b0f290944220 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1738,8 +1738,8 @@ struct file_operations {
 			loff_t, size_t, unsigned int);
 	int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t,
 			u64);
-	loff_t (*dedupe_file_range)(struct file *, u64, u64, struct file *,
-			u64);
+	loff_t (*dedupe_file_range)(struct file *, loff_t,
+				    struct file *, loff_t, loff_t);
 } __randomize_layout;
 
 struct inode_operations {
-- 
2.14.3


* [PATCH 03/39] vfs: dedupe: extract helper for a single dedup
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 01/39] vfs: dedupe: return loff_t Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 02/39] vfs: dedupe: rationalize args Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 15:41   ` Amir Goldstein
  2018-06-04  8:44   ` Christoph Hellwig
  2018-05-29 14:43 ` [PATCH 04/39] vfs: add path_open() Miklos Szeredi
                   ` (35 subsequent siblings)
  38 siblings, 2 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Extract vfs_dedupe_file_range_one() helper to deal with a single dedup
request.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/read_write.c | 89 +++++++++++++++++++++++++++++++--------------------------
 1 file changed, 49 insertions(+), 40 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 1818581cadf6..82a53c44c0aa 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1964,6 +1964,44 @@ int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 }
 EXPORT_SYMBOL(vfs_dedupe_file_range_compare);
 
+static s64 vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
+				     struct file *dst_file, loff_t dst_pos,
+				     u64 len)
+{
+	s64 ret;
+
+	ret = mnt_want_write_file(dst_file);
+	if (ret)
+		return ret;
+
+	ret = clone_verify_area(dst_file, dst_pos, len, true);
+	if (ret < 0)
+		goto out_drop_write;
+
+	ret = -EINVAL;
+	if (!(capable(CAP_SYS_ADMIN) || (dst_file->f_mode & FMODE_WRITE)))
+		goto out_drop_write;
+
+	ret = -EXDEV;
+	if (src_file->f_path.mnt != dst_file->f_path.mnt)
+		goto out_drop_write;
+
+	ret = -EISDIR;
+	if (S_ISDIR(file_inode(dst_file)->i_mode))
+		goto out_drop_write;
+
+	ret = -EINVAL;
+	if (!dst_file->f_op->dedupe_file_range)
+		goto out_drop_write;
+
+	ret = dst_file->f_op->dedupe_file_range(src_file, src_pos,
+						dst_file, dst_pos, len);
+out_drop_write:
+	mnt_drop_write_file(dst_file);
+
+	return ret;
+}
+
 int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 {
 	struct file_dedupe_range_info *info;
@@ -1972,10 +2010,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 	u64 len;
 	int i;
 	int ret;
-	bool is_admin = capable(CAP_SYS_ADMIN);
 	u16 count = same->dest_count;
-	struct file *dst_file;
-	loff_t dst_off;
 	loff_t deduped;
 
 	if (!(file->f_mode & FMODE_READ))
@@ -2010,54 +2045,28 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 	}
 
 	for (i = 0, info = same->info; i < count; i++, info++) {
-		struct inode *dst;
 		struct fd dst_fd = fdget(info->dest_fd);
+		struct file *dst_file = dst_fd.file;
 
-		dst_file = dst_fd.file;
 		if (!dst_file) {
 			info->status = -EBADF;
 			goto next_loop;
 		}
-		dst = file_inode(dst_file);
-
-		ret = mnt_want_write_file(dst_file);
-		if (ret) {
-			info->status = ret;
-			goto next_loop;
-		}
-
-		dst_off = info->dest_offset;
-		ret = clone_verify_area(dst_file, dst_off, len, true);
-		if (ret < 0) {
-			info->status = ret;
-			goto next_file;
-		}
-		ret = 0;
 
 		if (info->reserved) {
 			info->status = -EINVAL;
-		} else if (!(is_admin || (dst_file->f_mode & FMODE_WRITE))) {
-			info->status = -EINVAL;
-		} else if (file->f_path.mnt != dst_file->f_path.mnt) {
-			info->status = -EXDEV;
-		} else if (S_ISDIR(dst->i_mode)) {
-			info->status = -EISDIR;
-		} else if (dst_file->f_op->dedupe_file_range == NULL) {
-			info->status = -EINVAL;
-		} else {
-			deduped = dst_file->f_op->dedupe_file_range(file, off,
-							dst_file,
-							info->dest_offset, len);
-			if (deduped == -EBADE)
-				info->status = FILE_DEDUPE_RANGE_DIFFERS;
-			else if (deduped < 0)
-				info->status = deduped;
-			else
-				info->bytes_deduped += deduped;
+			goto next_loop;
 		}
 
-next_file:
-		mnt_drop_write_file(dst_file);
+		deduped = vfs_dedupe_file_range_one(file, off, dst_file,
+						    info->dest_offset, len);
+		if (deduped == -EBADE)
+			info->status = FILE_DEDUPE_RANGE_DIFFERS;
+		else if (deduped < 0)
+			info->status = deduped;
+		else
+			info->bytes_deduped += deduped;
+
 next_loop:
 		fdput(dst_fd);
 
-- 
2.14.3
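
After this patch the loop in vfs_dedupe_file_range() only has to translate
the helper's return value into the per-destination result.  That mapping
can be sketched in user space as below.  The struct is a cut-down,
hypothetical stand-in for struct file_dedupe_range_info, toy_record_result()
is an invented name, and the EBADE fallback value is the Linux one, supplied
only for hosts whose errno.h lacks it.

```c
#include <errno.h>
#include <stdint.h>

#ifndef EBADE
#define EBADE 52	/* Linux value, assumed for hosts without EBADE */
#endif

/* Status values as defined in include/uapi/linux/fs.h. */
#define FILE_DEDUPE_RANGE_SAME		0
#define FILE_DEDUPE_RANGE_DIFFERS	1

/* Cut-down, hypothetical stand-in for struct file_dedupe_range_info. */
struct toy_dedupe_info {
	uint64_t bytes_deduped;
	int32_t status;
};

/* Fold the return value of a single dedupe into the per-file result. */
void toy_record_result(struct toy_dedupe_info *info, int64_t deduped)
{
	if (deduped == -EBADE)
		info->status = FILE_DEDUPE_RANGE_DIFFERS;	/* data differs */
	else if (deduped < 0)
		info->status = (int32_t)deduped;		/* real error */
	else
		info->bytes_deduped += (uint64_t)deduped;	/* success */
}
```

A non-negative return leaves status untouched, so a zero-initialised entry
keeps reporting FILE_DEDUPE_RANGE_SAME on success.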


* [PATCH 04/39] vfs: add path_open()
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (2 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 03/39] vfs: dedupe: extract helper for a single dedup Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-06-04  8:46   ` Christoph Hellwig
  2018-05-29 14:43 ` [PATCH 05/39] vfs: optionally don't account file in nr_files Miklos Szeredi
                   ` (34 subsequent siblings)
  38 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Currently opening an overlay file results in:

 - the real file on the underlying layer being opened
 - f_path being set to the overlay {mount, dentry} pair

This patch adds a new helper that allows the above to be explicitly
performed.  I.e. it's the same as dentry_open(), except the underlying
inode to open is given as a separate argument.

This is in preparation for stacking I/O operations on overlay files.

Later, when implicit opening is removed, dentry_open() can be implemented
by just calling path_open().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/open.c          | 31 +++++++++++++++++++++++++++++++
 include/linux/fs.h |  2 ++
 2 files changed, 33 insertions(+)

diff --git a/fs/open.c b/fs/open.c
index c5ee7cd60424..d0bf7f061a1a 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -906,6 +906,37 @@ int vfs_open(const struct path *path, struct file *file,
 	return do_dentry_open(file, d_backing_inode(dentry), NULL, cred);
 }
 
+/**
+ * path_open() - Open an inode by a particular name.
+ * @path: The name of the file.
+ * @flags: The O_ flags used to open this file.
+ * @inode: The inode to open.
+ * @cred: The task's credentials used when opening this file.
+ *
+ * Context: Process context.
+ * Return: A pointer to a struct file or an IS_ERR pointer.  Cannot return NULL.
+ */
+struct file *path_open(const struct path *path, int flags, struct inode *inode,
+		       const struct cred *cred)
+{
+	struct file *file;
+	int retval;
+
+	file = get_empty_filp();
+	if (IS_ERR(file))
+		return file;
+
+	file->f_flags = flags;
+	file->f_path = *path;
+	retval = do_dentry_open(file, inode, NULL, cred);
+	if (retval) {
+		put_filp(file);
+		return ERR_PTR(retval);
+	}
+	return file;
+}
+EXPORT_SYMBOL(path_open);
+
 struct file *dentry_open(const struct path *path, int flags,
 			 const struct cred *cred)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b0f290944220..9473e68280d0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2401,6 +2401,8 @@ extern struct file *filp_open(const char *, int, umode_t);
 extern struct file *file_open_root(struct dentry *, struct vfsmount *,
 				   const char *, int, umode_t);
 extern struct file * dentry_open(const struct path *, int, const struct cred *);
+extern struct file *path_open(const struct path *, int, struct inode *,
+			      const struct cred *);
 extern int filp_close(struct file *, fl_owner_t id);
 
 extern struct filename *getname_flags(const char __user *, int, int *);
-- 
2.14.3


* [PATCH 05/39] vfs: optionally don't account file in nr_files
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (3 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 04/39] vfs: add path_open() Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-06-04  8:47   ` Christoph Hellwig
  2018-06-10  4:41   ` Al Viro
  2018-05-29 14:43 ` [PATCH 06/39] vfs: add f_op->pre_mmap() Miklos Szeredi
                   ` (33 subsequent siblings)
  38 siblings, 2 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Stacking file operations in overlay will store an extra open file for each
overlay file opened.

The added overhead is just that of a "struct file", which is about 256
bytes.  Overlay already pins an extra dentry and inode while the file is
open, and those add up to a much larger overhead.

For fear of breaking working setups, don't start accounting the extra file.

The implementation adds a bool argument to path_open() to control whether
the returned file is to be accounted or not.  If the file is not accounted,
f_mode will contain FMODE_NOACCOUNT, so that when freeing the file the
count is not decremented.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/file_table.c    | 13 +++++++++----
 fs/internal.h      |  7 ++++++-
 fs/open.c          | 10 +++++-----
 include/linux/fs.h |  5 ++++-
 4 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 7ec0b3e5f05d..60376bfa04cf 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -51,7 +51,8 @@ static void file_free_rcu(struct rcu_head *head)
 
 static inline void file_free(struct file *f)
 {
-	percpu_counter_dec(&nr_files);
+	if (!(f->f_mode & FMODE_NOACCOUNT))
+		percpu_counter_dec(&nr_files);
 	call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
 }
 
@@ -100,7 +101,7 @@ int proc_nr_files(struct ctl_table *table, int write,
  * done, you will imbalance int the mount's writer count
  * and a warning at __fput() time.
  */
-struct file *get_empty_filp(void)
+struct file *__get_empty_filp(bool account)
 {
 	const struct cred *cred = current_cred();
 	static long old_max;
@@ -110,7 +111,8 @@ struct file *get_empty_filp(void)
 	/*
 	 * Privileged users can go above max_files
 	 */
-	if (get_nr_files() >= files_stat.max_files && !capable(CAP_SYS_ADMIN)) {
+	if (account &&
+	    get_nr_files() >= files_stat.max_files && !capable(CAP_SYS_ADMIN)) {
 		/*
 		 * percpu_counters are inaccurate.  Do an expensive check before
 		 * we go and fail.
@@ -123,7 +125,10 @@ struct file *get_empty_filp(void)
 	if (unlikely(!f))
 		return ERR_PTR(-ENOMEM);
 
-	percpu_counter_inc(&nr_files);
+	if (account)
+		percpu_counter_inc(&nr_files);
+	else
+		f->f_mode = FMODE_NOACCOUNT;
 	f->f_cred = get_cred(cred);
 	error = security_file_alloc(f);
 	if (unlikely(error)) {
diff --git a/fs/internal.h b/fs/internal.h
index e08972db0303..b82725ba3054 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -93,7 +93,12 @@ extern void chroot_fs_refs(const struct path *, const struct path *);
 /*
  * file_table.c
  */
-extern struct file *get_empty_filp(void);
+extern struct file *__get_empty_filp(bool account);
+
+static inline struct file *get_empty_filp(void)
+{
+	return __get_empty_filp(true);
+}
 
 /*
  * super.c
diff --git a/fs/open.c b/fs/open.c
index d0bf7f061a1a..6e52fd6fea7c 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -732,8 +732,8 @@ static int do_dentry_open(struct file *f,
 	static const struct file_operations empty_fops = {};
 	int error;
 
-	f->f_mode = OPEN_FMODE(f->f_flags) | FMODE_LSEEK |
-				FMODE_PREAD | FMODE_PWRITE;
+	f->f_mode = (f->f_mode & FMODE_NOACCOUNT) | OPEN_FMODE(f->f_flags) |
+		FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE;
 
 	path_get(&f->f_path);
 	f->f_inode = inode;
@@ -743,7 +743,7 @@ static int do_dentry_open(struct file *f,
 	f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
 
 	if (unlikely(f->f_flags & O_PATH)) {
-		f->f_mode = FMODE_PATH;
+		f->f_mode = (f->f_mode & FMODE_NOACCOUNT) | FMODE_PATH;
 		f->f_op = &empty_fops;
 		goto done;
 	}
@@ -917,12 +917,12 @@ int vfs_open(const struct path *path, struct file *file,
  * Return: A pointer to a struct file or an IS_ERR pointer.  Cannot return NULL.
  */
 struct file *path_open(const struct path *path, int flags, struct inode *inode,
-		       const struct cred *cred)
+		       const struct cred *cred, bool account)
 {
 	struct file *file;
 	int retval;
 
-	file = get_empty_filp();
+	file = __get_empty_filp(account);
 	if (IS_ERR(file))
 		return file;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9473e68280d0..ecc854c75611 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -153,6 +153,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File is capable of returning -EAGAIN if I/O will block */
 #define FMODE_NOWAIT	((__force fmode_t)0x8000000)
 
+/* File does not contribute to nr_files count */
+#define FMODE_NOACCOUNT	((__force fmode_t)0x10000000)
+
 /*
  * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
  * that indicates that they should check the contents of the iovec are
@@ -2402,7 +2405,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *,
 				   const char *, int, umode_t);
 extern struct file * dentry_open(const struct path *, int, const struct cred *);
 extern struct file *path_open(const struct path *, int, struct inode *,
-			      const struct cred *);
+			      const struct cred *, bool);
 extern int filp_close(struct file *, fl_owner_t id);
 
 extern struct filename *getname_flags(const char __user *, int, int *);
-- 
2.14.3
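
The accounting scheme described in this patch can be modelled in user space
as follows.  This is a hedged sketch rather than the kernel implementation:
a plain long replaces the percpu counter, the toy_* names and TOY_* flag
macros are hypothetical, and toy_open() only mirrors the way
do_dentry_open() carries the flag over when it recomputes f_mode.

```c
#include <stdbool.h>
#include <stdlib.h>

#define TOY_FMODE_READ		0x1u
#define TOY_FMODE_NOACCOUNT	0x10000000u

/* Stand-in for the nr_files percpu counter. */
long toy_nr_files;

struct toy_file {
	unsigned int f_mode;
};

/* Allocate a file; unaccounted files are tagged instead of counted. */
struct toy_file *toy_get_empty_filp(bool account)
{
	struct toy_file *f = calloc(1, sizeof(*f));

	if (!f)
		return NULL;
	if (account)
		toy_nr_files++;
	else
		f->f_mode = TOY_FMODE_NOACCOUNT;
	return f;
}

/* Recompute f_mode at open time, but keep the accounting flag. */
void toy_open(struct toy_file *f, unsigned int open_mode)
{
	f->f_mode = (f->f_mode & TOY_FMODE_NOACCOUNT) | open_mode;
}

/* Only files that were counted on allocation are uncounted on free. */
void toy_file_free(struct toy_file *f)
{
	if (!(f->f_mode & TOY_FMODE_NOACCOUNT))
		toy_nr_files--;
	free(f);
}
```

Because the flag survives toy_open(), freeing an unaccounted file leaves the
counter alone, and the counter returns to its starting value once every
accounted file has been freed.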


* [PATCH 06/39] vfs: add f_op->pre_mmap()
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (4 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 05/39] vfs: optionally don't account file in nr_files Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-06-04  8:48   ` Christoph Hellwig
  2018-05-29 14:43 ` [PATCH 07/39] vfs: export vfs_ioctl() to modules Miklos Szeredi
                   ` (32 subsequent siblings)
  38 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This is needed by overlayfs to be able to copy up a file from a read-only
lower layer to a writable layer when the file is mapped shared.  When
copying up, overlayfs takes VFS locks that would violate locking order when
nested inside mmap_sem.

Add a new f_op->pre_mmap method, which is called before taking mmap_sem.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 Documentation/filesystems/Locking | 1 +
 Documentation/filesystems/vfs.txt | 3 +++
 include/linux/fs.h                | 1 +
 mm/util.c                         | 5 +++++
 4 files changed, 10 insertions(+)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 75d2d57e2c44..60e76060baff 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -442,6 +442,7 @@ prototypes:
 	unsigned int (*poll) (struct file *, struct poll_table_struct *);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
+	int (*pre_mmap) (struct file *, unsigned long, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
 	int (*open) (struct inode *, struct file *);
 	int (*flush) (struct file *);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5fd325df59e2..2bc77ea8aef4 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -859,6 +859,7 @@ struct file_operations {
 	unsigned int (*poll) (struct file *, struct poll_table_struct *);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
+	int (*pre_mmap) (struct file *, unsigned long, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
 	int (*mremap)(struct file *, struct vm_area_struct *);
 	int (*open) (struct inode *, struct file *);
@@ -906,6 +907,8 @@ otherwise noted.
   compat_ioctl: called by the ioctl(2) system call when 32 bit system calls
  	 are used on 64 bit kernels.
 
+  pre_mmap: called before mmap, without mmap_sem being held yet.
+
   mmap: called by the mmap(2) system call
 
   open: called by the VFS when an inode should be opened. When the VFS
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ecc854c75611..1ea3f153b7f8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1716,6 +1716,7 @@ struct file_operations {
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
+	int (*pre_mmap) (struct file *, unsigned long, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
 	unsigned long mmap_supported_flags;
 	int (*open) (struct inode *, struct file *);
diff --git a/mm/util.c b/mm/util.c
index 45fc3169e7b0..11cd375e1a19 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -352,6 +352,11 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 
 	ret = security_mmap_file(file, prot, flag);
 	if (!ret) {
+		if (file && file->f_op->pre_mmap) {
+			ret = file->f_op->pre_mmap(file, prot, flag);
+			if (ret)
+				return ret;
+		}
 		if (down_write_killable(&mm->mmap_sem))
 			return -EINTR;
 		ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
-- 
2.14.3
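
The call ordering that this patch introduces in vm_mmap_pgoff() can be
sketched with a user-space model.  Everything here is an assumption made for
illustration: a bool stands in for mmap_sem, the toy_* names are
hypothetical, and the callbacks merely record whether the "lock" was held
when they ran.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for mm->mmap_sem: true while the "lock" is held. */
bool toy_mmap_sem_held;

struct toy_file;

struct toy_fops {
	int (*pre_mmap)(struct toy_file *file);	/* optional hook */
	int (*mmap)(struct toy_file *file);
};

struct toy_file {
	const struct toy_fops *f_op;
	bool pre_mmap_saw_lock;
	bool mmap_saw_lock;
};

/* Model of the patched path: hook first, then lock, then mmap. */
int toy_vm_mmap(struct toy_file *file)
{
	int ret;

	if (file->f_op->pre_mmap) {
		/* Runs unlocked, so it may take locks ordered before mmap_sem. */
		ret = file->f_op->pre_mmap(file);
		if (ret)
			return ret;
	}
	toy_mmap_sem_held = true;
	ret = file->f_op->mmap(file);
	toy_mmap_sem_held = false;
	return ret;
}

/* A copy-up would happen here in overlayfs; the toy only records state. */
int toy_ovl_pre_mmap(struct toy_file *file)
{
	file->pre_mmap_saw_lock = toy_mmap_sem_held;
	return 0;
}

int toy_ovl_mmap(struct toy_file *file)
{
	file->mmap_saw_lock = toy_mmap_sem_held;
	return 0;
}

const struct toy_fops toy_ovl_fops = {
	.pre_mmap = toy_ovl_pre_mmap,
	.mmap = toy_ovl_mmap,
};
```

Calling toy_vm_mmap() on a file that uses toy_ovl_fops shows the hook
running without the lock and the mmap method running with it, which is the
ordering the commit message asks for.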


* [PATCH 07/39] vfs: export vfs_ioctl() to modules
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (5 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 06/39] vfs: add f_op->pre_mmap() Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-06-04  8:49   ` Christoph Hellwig
  2018-05-29 14:43 ` [PATCH 08/39] vfs: export vfs_dedupe_file_range_one() " Miklos Szeredi
                   ` (31 subsequent siblings)
  38 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This is needed by the stacked ioctl implementation in overlayfs.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/internal.h      | 1 -
 fs/ioctl.c         | 1 +
 include/linux/fs.h | 2 ++
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/internal.h b/fs/internal.h
index b82725ba3054..6821cf475fc6 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -189,7 +189,6 @@ extern const struct dentry_operations ns_dentry_operations;
  */
 extern int do_vfs_ioctl(struct file *file, unsigned int fd, unsigned int cmd,
 		    unsigned long arg);
-extern long vfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
 
 /*
  * iomap support:
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 4823431d1c9d..41071915f411 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -49,6 +49,7 @@ long vfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
  out:
 	return error;
 }
+EXPORT_SYMBOL(vfs_ioctl);
 
 static int ioctl_fibmap(struct file *filp, int __user *p)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1ea3f153b7f8..598c60092c11 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1623,6 +1623,8 @@ int vfs_mkobj(struct dentry *, umode_t,
 		int (*f)(struct dentry *, umode_t, void *),
 		void *);
 
+extern long vfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+
 /*
  * VFS file helper functions.
  */
-- 
2.14.3


* [PATCH 08/39] vfs: export vfs_dedupe_file_range_one() to modules
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (6 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 07/39] vfs: export vfs_ioctl() to modules Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 09/39] ovl: copy up times Miklos Szeredi
                   ` (30 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This is needed by the stacked dedupe implementation in overlayfs.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/read_write.c    | 6 +++---
 include/linux/fs.h | 4 ++++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 82a53c44c0aa..4d61375a0de4 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1964,9 +1964,8 @@ int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 }
 EXPORT_SYMBOL(vfs_dedupe_file_range_compare);
 
-static s64 vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
-				     struct file *dst_file, loff_t dst_pos,
-				     u64 len)
+s64 vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
+			      struct file *dst_file, loff_t dst_pos, u64 len)
 {
 	s64 ret;
 
@@ -2001,6 +2000,7 @@ static s64 vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
 
 	return ret;
 }
+EXPORT_SYMBOL(vfs_dedupe_file_range_one);
 
 int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 598c60092c11..6961feda6915 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1817,6 +1817,10 @@ extern int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 					 loff_t len, bool *is_same);
 extern int vfs_dedupe_file_range(struct file *file,
 				 struct file_dedupe_range *same);
+extern s64 vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
+				     struct file *dst_file, loff_t dst_pos,
+				     u64 len);
+
 
 struct super_operations {
    	struct inode *(*alloc_inode)(struct super_block *sb);
-- 
2.14.3


* [PATCH 09/39] ovl: copy up times
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (7 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 08/39] vfs: export vfs_dedupe_file_range_one() " Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 10/39] ovl: copy up inode flags Miklos Szeredi
                   ` (29 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Copy up mtime and ctime to overlay inode after times in real object are
modified.  Be careful not to dirty cachelines when not necessary.
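
As an illustration of the "don't dirty cachelines" remark, here is a minimal
userspace sketch of one common way to do it: compare before storing, so an
unchanged inode is only read, never written.  The struct toy_inode type and
the toy_copy_times() helper are hypothetical stand-ins, not code from this
patch, and the patch itself may achieve the goal differently.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the mtime/ctime fields of a kernel inode. */
struct toy_inode {
	struct timespec mtime;
	struct timespec ctime;
};

static bool ts_equal(const struct timespec *a, const struct timespec *b)
{
	return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}

/*
 * Copy times from the real inode to the overlay inode, but only store
 * when a value differs, so the cacheline of an up-to-date inode stays
 * clean.  Returns true if anything was written.
 */
static bool toy_copy_times(const struct toy_inode *from, struct toy_inode *to)
{
	bool dirtied = false;

	if (!ts_equal(&to->mtime, &from->mtime)) {
		to->mtime = from->mtime;
		dirtied = true;
	}
	if (!ts_equal(&to->ctime, &from->ctime)) {
		to->ctime = from->ctime;
		dirtied = true;
	}
	return dirtied;
}
```

Calling toy_copy_times() a second time with the same source performs no
stores at all, which is the property the commit message asks for.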

This is in preparation for moving overlay functionality out of the VFS.

This patch shouldn't have any observable effect.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/dir.c       | 31 ++++++++++++++++++++++++-------
 fs/overlayfs/inode.c     |  3 +++
 fs/overlayfs/overlayfs.h |  2 +-
 fs/overlayfs/util.c      | 10 +++++++++-
 4 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index b2bc313241a6..8d8e063e4706 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -241,7 +241,7 @@ static int ovl_instantiate(struct dentry *dentry, struct inode *inode,
 		.newinode = inode,
 	};
 
-	ovl_dentry_version_inc(dentry->d_parent, false);
+	ovl_dir_modified(dentry->d_parent, false);
 	ovl_dentry_set_upper_alias(dentry);
 	if (!hardlink) {
 		/*
@@ -721,7 +721,7 @@ static int ovl_remove_and_whiteout(struct dentry *dentry,
 	if (err)
 		goto out_d_drop;
 
-	ovl_dentry_version_inc(dentry->d_parent, true);
+	ovl_dir_modified(dentry->d_parent, true);
 out_d_drop:
 	d_drop(dentry);
 out_dput_upper:
@@ -766,7 +766,7 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
 		err = vfs_rmdir(dir, upper);
 	else
 		err = vfs_unlink(dir, upper, NULL);
-	ovl_dentry_version_inc(dentry->d_parent, ovl_type_origin(dentry));
+	ovl_dir_modified(dentry->d_parent, ovl_type_origin(dentry));
 
 	/*
 	 * Keeping this dentry hashed would mean having to release
@@ -796,6 +796,7 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 	int err;
 	bool locked = false;
 	const struct cred *old_cred;
+	struct dentry *upperdentry;
 	bool lower_positive = ovl_lower_positive(dentry);
 	LIST_HEAD(list);
 
@@ -831,6 +832,17 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 			drop_nlink(dentry->d_inode);
 	}
 	ovl_nlink_end(dentry, locked);
+
+	/*
+	 * Copy ctime
+	 *
+	 * Note: we fail to update ctime if there was no copy-up, only a
+	 * whiteout
+	 */
+	upperdentry = ovl_dentry_upper(dentry);
+	if (upperdentry)
+		ovl_copyattr(d_inode(upperdentry), d_inode(dentry));
+
 out_drop_write:
 	ovl_drop_write(dentry);
 out:
@@ -1137,10 +1149,15 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 			drop_nlink(d_inode(new));
 	}
 
-	ovl_dentry_version_inc(old->d_parent, ovl_type_origin(old) ||
-			       (!overwrite && ovl_type_origin(new)));
-	ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old) ||
-			       (d_inode(new) && ovl_type_origin(new)));
+	ovl_dir_modified(old->d_parent, ovl_type_origin(old) ||
+			 (!overwrite && ovl_type_origin(new)));
+	ovl_dir_modified(new->d_parent, ovl_type_origin(old) ||
+			 (d_inode(new) && ovl_type_origin(new)));
+
+	/* copy ctime: */
+	ovl_copyattr(d_inode(olddentry), d_inode(old));
+	if (d_inode(new) && ovl_dentry_upper(new))
+		ovl_copyattr(d_inode(newdentry), d_inode(new));
 
 out_dput:
 	dput(newdentry);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 1db5b3b458a1..24fc27683a57 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -304,6 +304,9 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 	}
 	revert_creds(old_cred);
 
+	/* copy c/mtime */
+	ovl_copyattr(d_inode(realdentry), inode);
+
 out_drop_write:
 	ovl_drop_write(dentry);
 out:
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 3c5e9f18b0d9..eeaad0710704 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -231,7 +231,7 @@ void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
 void ovl_inode_init(struct inode *inode, struct dentry *upperdentry,
 		    struct dentry *lowerdentry);
 void ovl_inode_update(struct inode *inode, struct dentry *upperdentry);
-void ovl_dentry_version_inc(struct dentry *dentry, bool impurity);
+void ovl_dir_modified(struct dentry *dentry, bool impurity);
 u64 ovl_dentry_version_get(struct dentry *dentry);
 bool ovl_is_whiteout(struct dentry *dentry);
 struct file *ovl_path_open(struct path *path, int flags);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 6f1078028c66..30a05d1d679d 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -333,7 +333,7 @@ void ovl_inode_update(struct inode *inode, struct dentry *upperdentry)
 	}
 }
 
-void ovl_dentry_version_inc(struct dentry *dentry, bool impurity)
+static void ovl_dentry_version_inc(struct dentry *dentry, bool impurity)
 {
 	struct inode *inode = d_inode(dentry);
 
@@ -348,6 +348,14 @@ void ovl_dentry_version_inc(struct dentry *dentry, bool impurity)
 		OVL_I(inode)->version++;
 }
 
+void ovl_dir_modified(struct dentry *dentry, bool impurity)
+{
+	/* Copy mtime/ctime */
+	ovl_copyattr(d_inode(ovl_dentry_upper(dentry)), d_inode(dentry));
+
+	ovl_dentry_version_inc(dentry, impurity);
+}
+
 u64 ovl_dentry_version_get(struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
-- 
2.14.3


* [PATCH 10/39] ovl: copy up inode flags
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (8 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 09/39] ovl: copy up times Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 11/39] Revert "Revert "ovl: get_write_access() in truncate"" Miklos Szeredi
                   ` (28 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

On inode creation copy certain inode flags from the underlying real inode
to the overlay inode.

This is in preparation for moving overlay functionality out of the VFS.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/overlayfs.h | 7 +++++++
 fs/overlayfs/util.c      | 1 +
 2 files changed, 8 insertions(+)

diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index eeaad0710704..e9dab319c8b2 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -350,6 +350,13 @@ static inline void ovl_copyattr(struct inode *from, struct inode *to)
 	to->i_ctime = from->i_ctime;
 }
 
+static inline void ovl_copyflags(struct inode *from, struct inode *to)
+{
+	unsigned int mask = S_SYNC | S_IMMUTABLE | S_APPEND | S_NOATIME;
+
+	inode_set_flags(to, from->i_flags & mask, mask);
+}
+
 /* dir.c */
 extern const struct inode_operations ovl_dir_inode_operations;
 int ovl_cleanup_and_whiteout(struct dentry *workdir, struct inode *dir,
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 30a05d1d679d..25d202b47326 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -310,6 +310,7 @@ void ovl_inode_init(struct inode *inode, struct dentry *upperdentry,
 		OVL_I(inode)->lower = igrab(d_inode(lowerdentry));
 
 	ovl_copyattr(realinode, inode);
+	ovl_copyflags(realinode, inode);
 	if (!inode->i_ino)
 		inode->i_ino = realinode->i_ino;
 }
-- 
2.14.3


* [PATCH 11/39] Revert "Revert "ovl: get_write_access() in truncate""
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (9 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 10/39] ovl: copy up inode flags Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 12/39] ovl: copy up file size as well Miklos Szeredi
                   ` (27 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This reverts commit 31c3a7069593b072bd57192b63b62f9a7e994e9a.

Re-add functionality dealing with i_writecount on truncate to overlayfs.
This patch shouldn't have any observable effects, since we just re-assert
the writecount that vfs_truncate() already got for us.
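
For readers unfamiliar with the i_writecount convention, the following is a
rough userspace model of it, assuming the usual semantics: a negative count
means writes are denied (for example while the file is being executed), and
a writer may only take a reference when the count is not negative.  The
toy_* names are hypothetical; this is a sketch of the idea using C11
atomics, not the kernel implementation.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the i_writecount field of an inode. */
struct toy_inode {
	atomic_int writecount;
};

/*
 * Take a write reference unless writes are currently denied
 * (count < 0).  Mirrors the "increment unless negative" idea.
 */
static int toy_get_write_access(struct toy_inode *inode)
{
	int old = atomic_load(&inode->writecount);

	do {
		if (old < 0)
			return -ETXTBSY;
		/* on failure, 'old' is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(&inode->writecount,
					       &old, old + 1));

	return 0;
}

/* Drop a write reference taken by toy_get_write_access(). */
static void toy_put_write_access(struct toy_inode *inode)
{
	atomic_fetch_sub(&inode->writecount, 1);
}
```

In the patch below, the early atomic_read() check plays the role of the
"count < 0" test on the real inode before copy-up, and the get/put pair
brackets the truncate on the upper inode.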

This is in preparation for moving overlay functionality out of the VFS.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/inode.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 24fc27683a57..0116ec12451d 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -39,10 +39,27 @@ int ovl_setattr(struct dentry *dentry, struct iattr *attr)
 	if (err)
 		goto out;
 
+	if (attr->ia_valid & ATTR_SIZE) {
+		struct inode *realinode = d_inode(ovl_dentry_real(dentry));
+
+		err = -ETXTBSY;
+		if (atomic_read(&realinode->i_writecount) < 0)
+			goto out_drop_write;
+	}
+
 	err = ovl_copy_up(dentry);
 	if (!err) {
+		struct inode *winode = NULL;
+
 		upperdentry = ovl_dentry_upper(dentry);
 
+		if (attr->ia_valid & ATTR_SIZE) {
+			winode = d_inode(upperdentry);
+			err = get_write_access(winode);
+			if (err)
+				goto out_drop_write;
+		}
+
 		if (attr->ia_valid & (ATTR_KILL_SUID|ATTR_KILL_SGID))
 			attr->ia_valid &= ~ATTR_MODE;
 
@@ -53,7 +70,11 @@ int ovl_setattr(struct dentry *dentry, struct iattr *attr)
 		if (!err)
 			ovl_copyattr(upperdentry->d_inode, dentry->d_inode);
 		inode_unlock(upperdentry->d_inode);
+
+		if (winode)
+			put_write_access(winode);
 	}
+out_drop_write:
 	ovl_drop_write(dentry);
 out:
 	return err;
-- 
2.14.3


* [PATCH 12/39] ovl: copy up file size as well
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (10 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 11/39] Revert "Revert "ovl: get_write_access() in truncate"" Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 13/39] ovl: deal with overlay files in ovl_d_real() Miklos Szeredi
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Copy i_size of the underlying inode to the overlay inode in ovl_copyattr().

This is in preparation for stacking I/O operations on overlay files.

This patch shouldn't have any observable effect.

Remove stale comment from ovl_setattr() [spotted by Vivek Goyal].

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/inode.c     | 9 ---------
 fs/overlayfs/overlayfs.h | 2 ++
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 0116ec12451d..6682ea63c4fd 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -22,15 +22,6 @@ int ovl_setattr(struct dentry *dentry, struct iattr *attr)
 	struct dentry *upperdentry;
 	const struct cred *old_cred;
 
-	/*
-	 * Check for permissions before trying to copy-up.  This is redundant
-	 * since it will be rechecked later by ->setattr() on upper dentry.  But
-	 * without this, copy-up can be triggered by just about anybody.
-	 *
-	 * We don't initialize inode->size, which just means that
-	 * inode_newsize_ok() will always check against MAX_LFS_FILESIZE and not
-	 * check for a swapfile (which this won't be anyway).
-	 */
 	err = setattr_prepare(dentry, attr);
 	if (err)
 		return err;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index e9dab319c8b2..419c80a0024e 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -9,6 +9,7 @@
 
 #include <linux/kernel.h>
 #include <linux/uuid.h>
+#include <linux/fs.h>
 #include "ovl_entry.h"
 
 enum ovl_path_type {
@@ -348,6 +349,7 @@ static inline void ovl_copyattr(struct inode *from, struct inode *to)
 	to->i_atime = from->i_atime;
 	to->i_mtime = from->i_mtime;
 	to->i_ctime = from->i_ctime;
+	i_size_write(to, i_size_read(from));
 }
 
 static inline void ovl_copyflags(struct inode *from, struct inode *to)
-- 
2.14.3


* [PATCH 13/39] ovl: deal with overlay files in ovl_d_real()
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (11 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 12/39] ovl: copy up file size as well Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 14/39] ovl: stack file ops Miklos Szeredi
                   ` (25 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/super.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 704b37311467..211975921a90 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -97,6 +97,10 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 	struct dentry *real;
 	int err;
 
+	/* It's an overlay file */
+	if (inode && d_inode(dentry) == inode)
+		return dentry;
+
 	if (flags & D_REAL_UPPER)
 		return ovl_dentry_upper(dentry);
 
-- 
2.14.3


* [PATCH 14/39] ovl: stack file ops
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (12 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 13/39] ovl: deal with overlay files in ovl_d_real() Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-06-10  4:13   ` Al Viro
  2018-05-29 14:43 ` [PATCH 15/39] ovl: add helper to return real file Miklos Szeredi
                   ` (24 subsequent siblings)
  38 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Implement file operations on a regular overlay file.  The underlying file
is opened separately and cached in ->private_data.

It might be worth making an exception for such files when accounting in
nr_file to conform to userspace expectations.  We are only adding a small
overhead (248 bytes for the struct file) since the real inode and dentry are
pinned by overlayfs anyway.

This patch doesn't have any effect, since the vfs will use d_real() to find
the real underlying file to open.  The patch at the end of the series will
actually enable this functionality.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/Makefile    |  4 +--
 fs/overlayfs/file.c      | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/inode.c     |  1 +
 fs/overlayfs/overlayfs.h |  3 ++
 4 files changed, 82 insertions(+), 2 deletions(-)
 create mode 100644 fs/overlayfs/file.c

diff --git a/fs/overlayfs/Makefile b/fs/overlayfs/Makefile
index 30802347a020..46e1ff8ac056 100644
--- a/fs/overlayfs/Makefile
+++ b/fs/overlayfs/Makefile
@@ -4,5 +4,5 @@
 
 obj-$(CONFIG_OVERLAY_FS) += overlay.o
 
-overlay-objs := super.o namei.o util.o inode.o dir.o readdir.o copy_up.o \
-		export.o
+overlay-objs := super.o namei.o util.o inode.o file.o dir.o readdir.o \
+		copy_up.o export.o
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
new file mode 100644
index 000000000000..a0b606885c41
--- /dev/null
+++ b/fs/overlayfs/file.c
@@ -0,0 +1,76 @@
+/*
+ * Copyright (C) 2017 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/cred.h>
+#include <linux/file.h>
+#include <linux/xattr.h>
+#include "overlayfs.h"
+
+static struct file *ovl_open_realfile(const struct file *file)
+{
+	struct inode *inode = file_inode(file);
+	struct inode *upperinode = ovl_inode_upper(inode);
+	struct inode *realinode = upperinode ?: ovl_inode_lower(inode);
+	struct file *realfile;
+	const struct cred *old_cred;
+
+	old_cred = ovl_override_creds(inode->i_sb);
+	realfile = path_open(&file->f_path, file->f_flags | O_NOATIME,
+			     realinode, current_cred(), false);
+	revert_creds(old_cred);
+
+	pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n",
+		 file, file, upperinode ? 'u' : 'l', file->f_flags,
+		 realfile, IS_ERR(realfile) ? 0 : realfile->f_flags);
+
+	return realfile;
+}
+
+static int ovl_open(struct inode *inode, struct file *file)
+{
+	struct dentry *dentry = file_dentry(file);
+	struct file *realfile;
+	int err;
+
+	err = ovl_open_maybe_copy_up(dentry, file->f_flags);
+	if (err)
+		return err;
+
+	/* No longer need these flags, so don't pass them on to underlying fs */
+	file->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
+
+	realfile = ovl_open_realfile(file);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
+
+	file->private_data = realfile;
+
+	return 0;
+}
+
+static int ovl_release(struct inode *inode, struct file *file)
+{
+	fput(file->private_data);
+
+	return 0;
+}
+
+static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
+{
+	struct inode *realinode = ovl_inode_real(file_inode(file));
+
+	return generic_file_llseek_size(file, offset, whence,
+					realinode->i_sb->s_maxbytes,
+					i_size_read(realinode));
+}
+
+const struct file_operations ovl_file_operations = {
+	.open		= ovl_open,
+	.release	= ovl_release,
+	.llseek		= ovl_llseek,
+};
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 6682ea63c4fd..d6a13da0740f 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -535,6 +535,7 @@ static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev,
 	switch (mode & S_IFMT) {
 	case S_IFREG:
 		inode->i_op = &ovl_file_inode_operations;
+		inode->i_fop = &ovl_file_operations;
 		break;
 
 	case S_IFDIR:
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 419c80a0024e..3f6e39a2f51e 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -377,6 +377,9 @@ struct dentry *ovl_create_real(struct inode *dir, struct dentry *newdentry,
 int ovl_cleanup(struct inode *dir, struct dentry *dentry);
 struct dentry *ovl_create_temp(struct dentry *workdir, struct ovl_cattr *attr);
 
+/* file.c */
+extern const struct file_operations ovl_file_operations;
+
 /* copy_up.c */
 int ovl_copy_up(struct dentry *dentry);
 int ovl_copy_up_flags(struct dentry *dentry, int flags);
-- 
2.14.3


* [PATCH 15/39] ovl: add helper to return real file
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (13 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 14/39] ovl: stack file ops Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-06-10  5:42   ` Al Viro
  2018-05-29 14:43 ` [PATCH 16/39] ovl: add ovl_read_iter() Miklos Szeredi
                   ` (23 subsequent siblings)
  38 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

In the common case we can just use the real file cached in
file->private_data.  There are two exceptions:

1) File has been copied up since open: in this unlikely corner case just
use a throwaway real file for the operation.  If this ever becomes a
performance problem (very unlikely, since overlayfs has been doing mostly
fine without correctly handling this case at all), then we can deal with
that by updating the cached real file.

2) File's f_flags have changed since open: no need to reopen the cached
real file, we can just change the flags there as well.
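
To make case 2) concrete, here is a small userspace sketch of the flag
propagation idea: only a fixed set of "settable" bits may differ between
the overlay file and the cached real file, and those bits are merged into
the real file's flags in place.  The TOY_O_* bit values and
toy_change_flags() are hypothetical and chosen for illustration only; they
are not the kernel's O_* values, and the sketch omits the permission and
direct-I/O checks a real implementation needs.

```c
#include <errno.h>

/* Illustrative bit values only, not the real O_* constants. */
#define TOY_O_RDWR     (1u << 0)
#define TOY_O_APPEND   (1u << 1)
#define TOY_O_NONBLOCK (1u << 2)
#define TOY_O_DIRECT   (1u << 3)
#define TOY_O_NOATIME  (1u << 4)

/* Bits that may legitimately change after open. */
#define TOY_SETFL_MASK (TOY_O_APPEND | TOY_O_NONBLOCK | TOY_O_DIRECT)

/*
 * Bring the cached real file's flags in line with the flags wanted by
 * the overlay file.  Returns -EIO if a bit outside the settable mask
 * differs, since that cannot be fixed without reopening.
 */
static int toy_change_flags(unsigned int *real_flags, unsigned int want)
{
	/* the real file is always opened without atime updates */
	want |= TOY_O_NOATIME;

	if ((*real_flags ^ want) & ~TOY_SETFL_MASK)
		return -EIO;

	*real_flags = (*real_flags & ~TOY_SETFL_MASK) |
		      (want & TOY_SETFL_MASK);
	return 0;
}
```

The merge expression keeps every bit outside the mask untouched, which is
why no reopen of the cached real file is needed in this case.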

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/file.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index a0b606885c41..db8778e7c37a 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -31,6 +31,66 @@ static struct file *ovl_open_realfile(const struct file *file)
 	return realfile;
 }
 
+#define OVL_SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT)
+
+static int ovl_change_flags(struct file *file, unsigned int flags)
+{
+	struct inode *inode = file_inode(file);
+	int err;
+
+	/* No atime modificaton on underlying */
+	flags |= O_NOATIME;
+
+	/* If some flag changed that cannot be changed then something's amiss */
+	if (WARN_ON((file->f_flags ^ flags) & ~OVL_SETFL_MASK))
+		return -EIO;
+
+	flags &= OVL_SETFL_MASK;
+
+	if (((flags ^ file->f_flags) & O_APPEND) && IS_APPEND(inode))
+		return -EPERM;
+
+	if (flags & O_DIRECT) {
+		if (!file->f_mapping->a_ops ||
+		    !file->f_mapping->a_ops->direct_IO)
+			return -EINVAL;
+	}
+
+	if (file->f_op->check_flags) {
+		err = file->f_op->check_flags(flags);
+		if (err)
+			return err;
+	}
+
+	spin_lock(&file->f_lock);
+	file->f_flags = (file->f_flags & ~OVL_SETFL_MASK) | flags;
+	spin_unlock(&file->f_lock);
+
+	return 0;
+}
+
+static int ovl_real_fdget(const struct file *file, struct fd *real)
+{
+	struct inode *inode = file_inode(file);
+
+	real->flags = 0;
+	real->file = file->private_data;
+
+	/* Has it been copied up since we'd opened it? */
+	if (unlikely(file_inode(real->file) != ovl_inode_real(inode))) {
+		real->flags = FDPUT_FPUT;
+		real->file = ovl_open_realfile(file);
+
+		return PTR_ERR_OR_ZERO(real->file);
+	}
+
+	/* Did the flags change since open? */
+	if (unlikely((file->f_flags ^ real->file->f_flags) & ~O_NOATIME))
+		return ovl_change_flags(real->file, file->f_flags);
+
+	return 0;
+}
+
 static int ovl_open(struct inode *inode, struct file *file)
 {
 	struct dentry *dentry = file_dentry(file);
-- 
2.14.3


* [PATCH 16/39] ovl: add ovl_read_iter()
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (14 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 15/39] ovl: add helper to return real file Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 17/39] ovl: add ovl_write_iter() Miklos Szeredi
                   ` (22 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Implement stacked reading.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/file.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index db8778e7c37a..bbc40a14acf8 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -9,6 +9,7 @@
 #include <linux/cred.h>
 #include <linux/file.h>
 #include <linux/xattr.h>
+#include <linux/uio.h>
 #include "overlayfs.h"
 
 static struct file *ovl_open_realfile(const struct file *file)
@@ -129,8 +130,74 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 					i_size_read(realinode));
 }
 
+static void ovl_file_accessed(struct file *file)
+{
+	struct inode *inode, *upperinode;
+
+	if (file->f_flags & O_NOATIME)
+		return;
+
+	inode = file_inode(file);
+	upperinode = ovl_inode_upper(inode);
+
+	if (!upperinode)
+		return;
+
+	if ((!timespec_equal(&inode->i_mtime, &upperinode->i_mtime) ||
+	     !timespec_equal(&inode->i_ctime, &upperinode->i_ctime))) {
+		inode->i_mtime = upperinode->i_mtime;
+		inode->i_ctime = upperinode->i_ctime;
+	}
+
+	touch_atime(&file->f_path);
+}
+
+static rwf_t ovl_iocb_to_rwf(struct kiocb *iocb)
+{
+	int ifl = iocb->ki_flags;
+	rwf_t flags = 0;
+
+	if (ifl & IOCB_NOWAIT)
+		flags |= RWF_NOWAIT;
+	if (ifl & IOCB_HIPRI)
+		flags |= RWF_HIPRI;
+	if (ifl & IOCB_DSYNC)
+		flags |= RWF_DSYNC;
+	if (ifl & IOCB_SYNC)
+		flags |= RWF_SYNC;
+
+	return flags;
+}
+
+static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct file *file = iocb->ki_filp;
+	struct fd real;
+	const struct cred *old_cred;
+	ssize_t ret;
+
+	if (!iov_iter_count(iter))
+		return 0;
+
+	ret = ovl_real_fdget(file, &real);
+	if (ret)
+		return ret;
+
+	old_cred = ovl_override_creds(file_inode(file)->i_sb);
+	ret = vfs_iter_read(real.file, iter, &iocb->ki_pos,
+			    ovl_iocb_to_rwf(iocb));
+	revert_creds(old_cred);
+
+	ovl_file_accessed(file);
+
+	fdput(real);
+
+	return ret;
+}
+
 const struct file_operations ovl_file_operations = {
 	.open		= ovl_open,
 	.release	= ovl_release,
 	.llseek		= ovl_llseek,
+	.read_iter	= ovl_read_iter,
 };
-- 
2.14.3


* [PATCH 17/39] ovl: add ovl_write_iter()
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (15 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 16/39] ovl: add ovl_read_iter() Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 18/39] ovl: add ovl_fsync() Miklos Szeredi
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Implement stacked writes.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/file.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index bbc40a14acf8..a7af56861aa5 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -195,9 +195,48 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	return ret;
 }
 
+static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file);
+	struct fd real;
+	const struct cred *old_cred;
+	ssize_t ret;
+
+	if (!iov_iter_count(iter))
+		return 0;
+
+	inode_lock(inode);
+	/* Update mode */
+	ovl_copyattr(ovl_inode_real(inode), inode);
+	ret = file_remove_privs(file);
+	if (ret)
+		goto out_unlock;
+
+	ret = ovl_real_fdget(file, &real);
+	if (ret)
+		goto out_unlock;
+
+	old_cred = ovl_override_creds(file_inode(file)->i_sb);
+	ret = vfs_iter_write(real.file, iter, &iocb->ki_pos,
+			     ovl_iocb_to_rwf(iocb));
+	revert_creds(old_cred);
+
+	/* Update size */
+	ovl_copyattr(ovl_inode_real(inode), inode);
+
+	fdput(real);
+
+out_unlock:
+	inode_unlock(inode);
+
+	return ret;
+}
+
 const struct file_operations ovl_file_operations = {
 	.open		= ovl_open,
 	.release	= ovl_release,
 	.llseek		= ovl_llseek,
 	.read_iter	= ovl_read_iter,
+	.write_iter	= ovl_write_iter,
 };
-- 
2.14.3


* [PATCH 18/39] ovl: add ovl_fsync()
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (16 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 17/39] ovl: add ovl_write_iter() Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 19/39] ovl: add ovl_mmap() Miklos Szeredi
                   ` (20 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Implement stacked fsync().

Don't sync if lower (noticed by Amir Goldstein).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/file.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index a7af56861aa5..7b47dce4b072 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -233,10 +233,33 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	return ret;
 }
 
+static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
+{
+	struct fd real;
+	const struct cred *old_cred;
+	int ret;
+
+	ret = ovl_real_fdget(file, &real);
+	if (ret)
+		return ret;
+
+	/* Don't sync lower file for fear of receiving EROFS error */
+	if (file_inode(real.file) == ovl_inode_upper(file_inode(file))) {
+		old_cred = ovl_override_creds(file_inode(file)->i_sb);
+		ret = vfs_fsync_range(real.file, start, end, datasync);
+		revert_creds(old_cred);
+	}
+
+	fdput(real);
+
+	return ret;
+}
+
 const struct file_operations ovl_file_operations = {
 	.open		= ovl_open,
 	.release	= ovl_release,
 	.llseek		= ovl_llseek,
 	.read_iter	= ovl_read_iter,
 	.write_iter	= ovl_write_iter,
+	.fsync		= ovl_fsync,
 };
-- 
2.14.3

* [PATCH 19/39] ovl: add ovl_mmap()
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (17 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 18/39] ovl: add ovl_fsync() Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-06-10  5:24   ` Al Viro
  2018-05-29 14:43 ` [PATCH 20/39] ovl: add ovl_fallocate() Miklos Szeredi
                   ` (19 subsequent siblings)
  38 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Implement stacked mmap.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/file.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 7b47dce4b072..4057bbf2e141 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -255,6 +255,33 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	return ret;
 }
 
+static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct fd real;
+	const struct cred *old_cred;
+	int ret;
+
+	ret = ovl_real_fdget(file, &real);
+	if (ret)
+		return ret;
+
+	/* transfer ref: */
+	fput(vma->vm_file);
+	vma->vm_file = get_file(real.file);
+	fdput(real);
+
+	if (!vma->vm_file->f_op->mmap)
+		return -ENODEV;
+
+	old_cred = ovl_override_creds(file_inode(file)->i_sb);
+	ret = call_mmap(vma->vm_file, vma);
+	revert_creds(old_cred);
+
+	ovl_file_accessed(file);
+
+	return ret;
+}
+
 const struct file_operations ovl_file_operations = {
 	.open		= ovl_open,
 	.release	= ovl_release,
@@ -262,4 +289,5 @@ const struct file_operations ovl_file_operations = {
 	.read_iter	= ovl_read_iter,
 	.write_iter	= ovl_write_iter,
 	.fsync		= ovl_fsync,
+	.mmap		= ovl_mmap,
 };
-- 
2.14.3

* [PATCH 20/39] ovl: add ovl_fallocate()
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (18 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 19/39] ovl: add ovl_mmap() Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 21/39] ovl: add lsattr/chattr support Miklos Szeredi
                   ` (18 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Implement stacked fallocate.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
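Note, not part of the patch: a rough user-space sketch of what the
"Update size" step should make observable, namely that the size reported
by fstat() reflects the allocation right after it completes.  It uses
posix_fallocate() instead of the Linux-specific fallocate(2) to stay
portable, which is an assumption of this sketch; size_after_fallocate() is
a made-up name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical check: create or truncate `path`, preallocate `len` bytes
 * from offset 0 and return the size reported by fstat() afterwards, or
 * -errno on failure.
 */
static long long size_after_fallocate(const char *path, off_t len)
{
	struct stat st;
	long long ret;
	int err;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -errno;

	/* posix_fallocate() returns the error number directly, not via errno */
	err = posix_fallocate(fd, 0, len);
	if (err)
		ret = -err;
	else if (fstat(fd, &st))
		ret = -errno;
	else
		ret = st.st_size;

	close(fd);
	return ret;
}
```

On an overlay mount the interesting case is a path inside the overlay;
without the attribute copy, the overlay inode could keep reporting the old
size even though the real file has grown.
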
 fs/overlayfs/file.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 4057bbf2e141..069599d53511 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -282,6 +282,29 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 	return ret;
 }
 
+static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
+{
+	struct inode *inode = file_inode(file);
+	struct fd real;
+	const struct cred *old_cred;
+	int ret;
+
+	ret = ovl_real_fdget(file, &real);
+	if (ret)
+		return ret;
+
+	old_cred = ovl_override_creds(file_inode(file)->i_sb);
+	ret = vfs_fallocate(real.file, mode, offset, len);
+	revert_creds(old_cred);
+
+	/* Update size */
+	ovl_copyattr(ovl_inode_real(inode), inode);
+
+	fdput(real);
+
+	return ret;
+}
+
 const struct file_operations ovl_file_operations = {
 	.open		= ovl_open,
 	.release	= ovl_release,
@@ -290,4 +313,5 @@ const struct file_operations ovl_file_operations = {
 	.write_iter	= ovl_write_iter,
 	.fsync		= ovl_fsync,
 	.mmap		= ovl_mmap,
+	.fallocate	= ovl_fallocate,
 };
-- 
2.14.3

* [PATCH 21/39] ovl: add lsattr/chattr support
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (19 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 20/39] ovl: add ovl_fallocate() Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 22/39] ovl: add ovl_fiemap() Miklos Szeredi
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Implement FS_IOC_GETFLAGS and FS_IOC_SETFLAGS.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/file.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 069599d53511..3f610a5b38e4 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -8,6 +8,7 @@
 
 #include <linux/cred.h>
 #include <linux/file.h>
+#include <linux/mount.h>
 #include <linux/xattr.h>
 #include <linux/uio.h>
 #include "overlayfs.h"
@@ -305,6 +306,82 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 	return ret;
 }
 
+static long ovl_real_ioctl(struct file *file, unsigned int cmd,
+			   unsigned long arg)
+{
+	struct fd real;
+	const struct cred *old_cred;
+	long ret;
+
+	ret = ovl_real_fdget(file, &real);
+	if (ret)
+		return ret;
+
+	old_cred = ovl_override_creds(file_inode(file)->i_sb);
+	ret = vfs_ioctl(real.file, cmd, arg);
+	revert_creds(old_cred);
+
+	fdput(real);
+
+	return ret;
+}
+
+static long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	long ret;
+	struct inode *inode = file_inode(file);
+
+	switch (cmd) {
+	case FS_IOC_GETFLAGS:
+		ret = ovl_real_ioctl(file, cmd, arg);
+		break;
+
+	case FS_IOC_SETFLAGS:
+		if (!inode_owner_or_capable(inode))
+			return -EACCES;
+
+		ret = mnt_want_write_file(file);
+		if (ret)
+			return ret;
+
+		ret = ovl_copy_up(file_dentry(file));
+		if (!ret) {
+			ret = ovl_real_ioctl(file, cmd, arg);
+
+			inode_lock(inode);
+			ovl_copyflags(ovl_inode_real(inode), inode);
+			inode_unlock(inode);
+		}
+
+		mnt_drop_write_file(file);
+		break;
+
+	default:
+		ret = -ENOTTY;
+	}
+
+	return ret;
+}
+
+static long ovl_compat_ioctl(struct file *file, unsigned int cmd,
+			     unsigned long arg)
+{
+	switch (cmd) {
+	case FS_IOC32_GETFLAGS:
+		cmd = FS_IOC_GETFLAGS;
+		break;
+
+	case FS_IOC32_SETFLAGS:
+		cmd = FS_IOC_SETFLAGS;
+		break;
+
+	default:
+		return -ENOIOCTLCMD;
+	}
+
+	return ovl_ioctl(file, cmd, arg);
+}
+
 const struct file_operations ovl_file_operations = {
 	.open		= ovl_open,
 	.release	= ovl_release,
@@ -314,4 +391,6 @@ const struct file_operations ovl_file_operations = {
 	.fsync		= ovl_fsync,
 	.mmap		= ovl_mmap,
 	.fallocate	= ovl_fallocate,
+	.unlocked_ioctl	= ovl_ioctl,
+	.compat_ioctl	= ovl_compat_ioctl,
 };
-- 
2.14.3

* [PATCH 22/39] ovl: add ovl_fiemap()
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (20 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 21/39] ovl: add lsattr/chattr support Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 23/39] ovl: add O_DIRECT support Miklos Szeredi
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Implement stacked fiemap().

The inode operations need to be split between regular files (which have
fiemap) and special files (which don't).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/inode.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index d6a13da0740f..cd46dd8e7e54 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -448,6 +448,23 @@ int ovl_update_time(struct inode *inode, struct timespec *ts, int flags)
 	return 0;
 }
 
+static int ovl_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
+		      u64 start, u64 len)
+{
+	int err;
+	struct inode *realinode = ovl_inode_real(inode);
+	const struct cred *old_cred;
+
+	if (!realinode->i_op->fiemap)
+		return -EOPNOTSUPP;
+
+	old_cred = ovl_override_creds(inode->i_sb);
+	err = realinode->i_op->fiemap(realinode, fieinfo, start, len);
+	revert_creds(old_cred);
+
+	return err;
+}
+
 static const struct inode_operations ovl_file_inode_operations = {
 	.setattr	= ovl_setattr,
 	.permission	= ovl_permission,
@@ -455,6 +472,7 @@ static const struct inode_operations ovl_file_inode_operations = {
 	.listxattr	= ovl_listxattr,
 	.get_acl	= ovl_get_acl,
 	.update_time	= ovl_update_time,
+	.fiemap		= ovl_fiemap,
 };
 
 static const struct inode_operations ovl_symlink_inode_operations = {
@@ -465,6 +483,15 @@ static const struct inode_operations ovl_symlink_inode_operations = {
 	.update_time	= ovl_update_time,
 };
 
+static const struct inode_operations ovl_special_inode_operations = {
+	.setattr	= ovl_setattr,
+	.permission	= ovl_permission,
+	.getattr	= ovl_getattr,
+	.listxattr	= ovl_listxattr,
+	.get_acl	= ovl_get_acl,
+	.update_time	= ovl_update_time,
+};
+
 /*
  * It is possible to stack overlayfs instance on top of another
  * overlayfs instance as lower layer. We need to annonate the
@@ -548,7 +575,7 @@ static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev,
 		break;
 
 	default:
-		inode->i_op = &ovl_file_inode_operations;
+		inode->i_op = &ovl_special_inode_operations;
 		init_special_inode(inode, mode, rdev);
 		break;
 	}
-- 
2.14.3

* [PATCH 23/39] ovl: add O_DIRECT support
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (21 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 22/39] ovl: add ovl_fiemap() Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-06-10  5:31   ` Al Viro
  2018-05-29 14:43 ` [PATCH 24/39] ovl: add reflink/copyfile/dedup support Miklos Szeredi
                   ` (15 subsequent siblings)
  38 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 3f610a5b38e4..e5e7ccaaf9ec 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -110,6 +110,9 @@ static int ovl_open(struct inode *inode, struct file *file)
 	if (IS_ERR(realfile))
 		return PTR_ERR(realfile);
 
+	/* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO */
+	file->f_mapping = realfile->f_mapping;
+
 	file->private_data = realfile;
 
 	return 0;
-- 
2.14.3

* [PATCH 24/39] ovl: add reflink/copyfile/dedup support
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (22 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 23/39] ovl: add O_DIRECT support Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 25/39] vfs: don't open real Miklos Szeredi
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Since the sets of arguments are so similar, handle copy, clone and dedupe
in a common helper.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/file.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index e5e7ccaaf9ec..ef4bcc80572f 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -385,6 +385,90 @@ static long ovl_compat_ioctl(struct file *file, unsigned int cmd,
 	return ovl_ioctl(file, cmd, arg);
 }
 
+enum ovl_copyop {
+	OVL_COPY,
+	OVL_CLONE,
+	OVL_DEDUPE,
+};
+
+static s64 ovl_copyfile(struct file *file_in, loff_t pos_in,
+			struct file *file_out, loff_t pos_out,
+			u64 len, unsigned int flags, enum ovl_copyop op)
+{
+	struct inode *inode_out = file_inode(file_out);
+	struct fd real_in, real_out;
+	const struct cred *old_cred;
+	s64 ret;
+
+	ret = ovl_real_fdget(file_out, &real_out);
+	if (ret)
+		return ret;
+
+	ret = ovl_real_fdget(file_in, &real_in);
+	if (ret) {
+		fdput(real_out);
+		return ret;
+	}
+
+	old_cred = ovl_override_creds(file_inode(file_out)->i_sb);
+	switch (op) {
+	case OVL_COPY:
+		ret = vfs_copy_file_range(real_in.file, pos_in,
+					  real_out.file, pos_out, len, flags);
+		break;
+
+	case OVL_CLONE:
+		ret = vfs_clone_file_range(real_in.file, pos_in,
+					   real_out.file, pos_out, len);
+		break;
+
+	case OVL_DEDUPE:
+		ret = vfs_dedupe_file_range_one(real_in.file, pos_in,
+						real_out.file, pos_out, len);
+		break;
+	}
+	revert_creds(old_cred);
+
+	/* Update size */
+	ovl_copyattr(ovl_inode_real(inode_out), inode_out);
+
+	fdput(real_in);
+	fdput(real_out);
+
+	return ret;
+}
+
+static ssize_t ovl_copy_file_range(struct file *file_in, loff_t pos_in,
+				   struct file *file_out, loff_t pos_out,
+				   size_t len, unsigned int flags)
+{
+	return ovl_copyfile(file_in, pos_in, file_out, pos_out, len, flags,
+			    OVL_COPY);
+}
+
+static int ovl_clone_file_range(struct file *file_in, loff_t pos_in,
+				struct file *file_out, loff_t pos_out, u64 len)
+{
+	return ovl_copyfile(file_in, pos_in, file_out, pos_out, len, 0,
+			    OVL_CLONE);
+}
+
+static loff_t ovl_dedupe_file_range(struct file *file_in, loff_t pos_in,
+				    struct file *file_out, loff_t pos_out,
+				    loff_t len)
+{
+	/*
+	 * Don't copy up because of a dedupe request, this wouldn't make sense
+	 * most of the time (data would be duplicated instead of deduplicated).
+	 */
+	if (!ovl_inode_upper(file_inode(file_in)) ||
+	    !ovl_inode_upper(file_inode(file_out)))
+		return -EPERM;
+
+	return ovl_copyfile(file_in, pos_in, file_out, pos_out, len, 0,
+			    OVL_DEDUPE);
+}
+
 const struct file_operations ovl_file_operations = {
 	.open		= ovl_open,
 	.release	= ovl_release,
@@ -396,4 +480,8 @@ const struct file_operations ovl_file_operations = {
 	.fallocate	= ovl_fallocate,
 	.unlocked_ioctl	= ovl_ioctl,
 	.compat_ioctl	= ovl_compat_ioctl,
+
+	.copy_file_range	= ovl_copy_file_range,
+	.clone_file_range	= ovl_clone_file_range,
+	.dedupe_file_range	= ovl_dedupe_file_range,
 };
-- 
2.14.3

* [PATCH 25/39] vfs: don't open real
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (23 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 24/39] ovl: add reflink/copyfile/dedup support Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 26/39] ovl: copy-up on MAP_SHARED Miklos Szeredi
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Let overlayfs do its thing when opening a file.

This enables stacking and fixes the corner case when a file is opened for
read, modified through a writable open, and data is read from the read-only
file.  After this patch the read-only open will not return stale data even
in this case.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
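Note, not part of the patch: a minimal user-space sketch of the corner
case described above.  It assumes only POSIX calls; write_file() and
ro_fd_sees_write() are made-up helper names.  On an overlay the file
should initially exist only on the lower layer, so that opening it for
write triggers a copy-up while the read-only descriptor is still open.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create or truncate `path` and fill it with `data`. */
static int write_file(const char *path, const char *data)
{
	size_t len = strlen(data);
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	ssize_t n;

	if (fd < 0)
		return -errno;
	n = write(fd, data, len);
	close(fd);
	return n == (ssize_t)len ? 0 : -EIO;
}

/*
 * Hypothetical check: returns 1 if data written through a second, writable
 * descriptor is visible through a descriptor that was opened read-only
 * beforehand, 0 if the read-only descriptor returns different data, and
 * -errno on error.
 */
static int ro_fd_sees_write(const char *path, const char *new_data)
{
	char buf[64];
	size_t len = strlen(new_data);
	int ro_fd, wr_fd, ret;
	ssize_t n;

	if (len == 0 || len > sizeof(buf))
		return -EINVAL;

	/* Opened first; on overlayfs this may refer to the lower file */
	ro_fd = open(path, O_RDONLY);
	if (ro_fd < 0)
		return -errno;

	/* Opened second; on overlayfs this is what triggers the copy-up */
	wr_fd = open(path, O_WRONLY);
	if (wr_fd < 0) {
		ret = -errno;
		close(ro_fd);
		return ret;
	}

	if (pwrite(wr_fd, new_data, len, 0) != (ssize_t)len) {
		ret = -EIO;
	} else {
		n = pread(ro_fd, buf, len, 0);
		if (n < 0)
			ret = -errno;
		else
			ret = (n == (ssize_t)len &&
			       memcmp(buf, new_data, len) == 0);
	}

	close(wr_fd);
	close(ro_fd);
	return ret;
}
```

A return value of 0 on an overlay path would indicate the stale-data
behavior this patch is meant to remove; on an ordinary filesystem the
check is expected to return 1.
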
 fs/open.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 6e52fd6fea7c..244cd2ecfefd 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -897,13 +897,8 @@ EXPORT_SYMBOL(file_path);
 int vfs_open(const struct path *path, struct file *file,
 	     const struct cred *cred)
 {
-	struct dentry *dentry = d_real(path->dentry, NULL, file->f_flags, 0);
-
-	if (IS_ERR(dentry))
-		return PTR_ERR(dentry);
-
 	file->f_path = *path;
-	return do_dentry_open(file, d_backing_inode(dentry), NULL, cred);
+	return do_dentry_open(file, d_backing_inode(path->dentry), NULL, cred);
 }
 
 /**
-- 
2.14.3

* [PATCH 26/39] ovl: copy-up on MAP_SHARED
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (24 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 25/39] vfs: don't open real Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 27/39] ovl: obsolete "check_copy_up" module option Miklos Szeredi
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

A corner case of a corner case is when

 - file opened for O_RDONLY
 - which is then memory mapped SHARED
 - file opened for O_WRONLY
 - contents modified
 - contents read back through the shared mapping

Unfortunately it looks very difficult to do anything about the established
shared map after the file is copied up.

Instead, when a read-only file is mapped shared, copy up the file before
actually doing the map.  This may result in unnecessary copy-ups (but so
may copy-up on open(O_RDWR) for example).

We can revisit this later if it turns out to be a performance problem in
real life.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
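Note, not part of the patch: a rough user-space sketch of the sequence
listed above.  It assumes only POSIX calls; write_file() and
shared_map_sees_write() are made-up helper names.  On an overlay the file
should initially exist only on the lower layer and already hold at least
as many bytes as are written.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create or truncate `path` and fill it with `data`. */
static int write_file(const char *path, const char *data)
{
	size_t len = strlen(data);
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	ssize_t n;

	if (fd < 0)
		return -errno;
	n = write(fd, data, len);
	close(fd);
	return n == (ssize_t)len ? 0 : -EIO;
}

/*
 * Hypothetical check: map the file MAP_SHARED through a read-only
 * descriptor, modify it through a separate write-only descriptor, then
 * compare the mapping with the new data.  Returns 1 if the mapping shows
 * the new data, 0 if it does not, and -errno on error.
 */
static int shared_map_sees_write(const char *path, const char *new_data)
{
	size_t len = strlen(new_data);
	struct stat st;
	int ro_fd, wr_fd, ret;
	char *map;

	if (len == 0)
		return -EINVAL;

	ro_fd = open(path, O_RDONLY);
	if (ro_fd < 0)
		return -errno;

	/* Refuse short files: touching a mapping beyond EOF would fault */
	if (fstat(ro_fd, &st)) {
		ret = -errno;
		goto out_close;
	}
	if ((size_t)st.st_size < len) {
		ret = -EINVAL;
		goto out_close;
	}

	map = mmap(NULL, len, PROT_READ, MAP_SHARED, ro_fd, 0);
	if (map == MAP_FAILED) {
		ret = -errno;
		goto out_close;
	}

	wr_fd = open(path, O_WRONLY);
	if (wr_fd < 0) {
		ret = -errno;
	} else {
		if (pwrite(wr_fd, new_data, len, 0) != (ssize_t)len)
			ret = -EIO;
		else
			ret = memcmp(map, new_data, len) == 0;
		close(wr_fd);
	}

	munmap(map, len);
out_close:
	close(ro_fd);
	return ret;
}
```

With copy_up_shared enabled the check would be expected to return 1 on an
overlay path, because the copy-up happens before the mapping is set up.
With it disabled, a return value of 0 is the stale mapping described in
the commit message.
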
 fs/overlayfs/Kconfig     | 21 +++++++++++++++++++++
 fs/overlayfs/file.c      | 22 ++++++++++++++++++++++
 fs/overlayfs/overlayfs.h |  7 +++++++
 fs/overlayfs/ovl_entry.h |  1 +
 fs/overlayfs/super.c     | 22 ++++++++++++++++++++++
 5 files changed, 73 insertions(+)

diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index 17032631c5cf..5d1d40d745c5 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -103,3 +103,24 @@ config OVERLAY_FS_XINO_AUTO
 	  For more information, see Documentation/filesystems/overlayfs.txt
 
 	  If unsure, say N.
+
+config OVERLAY_FS_COPY_UP_SHARED
+	bool "Overlayfs: copy up when mapping a file shared"
+	default n
+	depends on OVERLAY_FS
+	help
+	  If this option is enabled then on mapping a file with MAP_SHARED
+	  overlayfs copies up the file in anticipation of it being modified
+	  (just like we copy up the file on O_WRONLY and O_RDWR in anticipation
+	  of modification).  This does not interfere with shared library
+	  loading, as that uses MAP_PRIVATE.  But there might be use cases out
+	  there where this impacts performance and disk usage.
+
+	  This just selects the default, the feature can also be enabled or
+	  disabled in the running kernel or individually on each overlay mount.
+
+	  To get maximally standard compliant behavior, enable this option.
+
+	  To get a maximally backward compatible kernel, disable this option.
+
+	  If unsure, say N.
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index ef4bcc80572f..266692ce9a9a 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -10,6 +10,7 @@
 #include <linux/file.h>
 #include <linux/mount.h>
 #include <linux/xattr.h>
+#include <linux/mman.h>
 #include <linux/uio.h>
 #include "overlayfs.h"
 
@@ -259,6 +260,26 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	return ret;
 }
 
+static int ovl_pre_mmap(struct file *file, unsigned long prot,
+			unsigned long flag)
+{
+	int err = 0;
+
+	/*
+	 * Take MAP_SHARED as hint about future writes to the file (through
+	 * another file descriptor).  Caller might not have had such an intent,
+	 * but we hope MAP_PRIVATE will be used in most such cases.
+	 *
+	 * If we don't copy up now and the file is modified, it becomes really
+	 * difficult to change the mapping to match that of the file's content
+	 * later.
+	 */
+	if ((flag & MAP_SHARED) && ovl_copy_up_shared(file_inode(file)->i_sb))
+		err = ovl_copy_up(file_dentry(file));
+
+	return err;
+}
+
 static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct fd real;
@@ -476,6 +497,7 @@ const struct file_operations ovl_file_operations = {
 	.read_iter	= ovl_read_iter,
 	.write_iter	= ovl_write_iter,
 	.fsync		= ovl_fsync,
+	.pre_mmap	= ovl_pre_mmap,
 	.mmap		= ovl_mmap,
 	.fallocate	= ovl_fallocate,
 	.unlocked_ioctl	= ovl_ioctl,
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 3f6e39a2f51e..be4f1664f662 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -266,6 +266,13 @@ static inline unsigned int ovl_xino_bits(struct super_block *sb)
 	return ofs->xino_bits;
 }
 
+static inline bool ovl_copy_up_shared(struct super_block *sb)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+
+	return !(sb->s_flags & SB_RDONLY) && ofs->config.copy_up_shared;
+}
+
 
 /* namei.c */
 int ovl_check_fh_len(struct ovl_fh *fh, int fh_len);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 41655a7d6894..3bea47c63fd9 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -18,6 +18,7 @@ struct ovl_config {
 	const char *redirect_mode;
 	bool index;
 	bool nfs_export;
+	bool copy_up_shared;
 	int xino;
 };
 
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 211975921a90..900ed4c39919 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -56,6 +56,12 @@ module_param_named(xino_auto, ovl_xino_auto_def, bool, 0644);
 MODULE_PARM_DESC(ovl_xino_auto_def,
 		 "Auto enable xino feature");
 
+static bool ovl_copy_up_shared_def =
+	IS_ENABLED(CONFIG_OVERLAY_FS_COPY_UP_SHARED);
+module_param_named(copy_up_shared, ovl_copy_up_shared_def, bool, 0644);
+MODULE_PARM_DESC(ovl_copy_up_shared_def,
+		 "Copy up when mapping a file shared");
+
 static void ovl_entry_stack_free(struct ovl_entry *oe)
 {
 	unsigned int i;
@@ -380,6 +386,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
 						"on" : "off");
 	if (ofs->config.xino != ovl_xino_def())
 		seq_printf(m, ",xino=%s", ovl_xino_str[ofs->config.xino]);
+	if (ofs->config.copy_up_shared != ovl_copy_up_shared_def)
+		seq_printf(m, ",copy_up_shared=%s",
+			   ofs->config.copy_up_shared ? "on" : "off");
 	return 0;
 }
 
@@ -417,6 +426,8 @@ enum {
 	OPT_XINO_ON,
 	OPT_XINO_OFF,
 	OPT_XINO_AUTO,
+	OPT_COPY_UP_SHARED_ON,
+	OPT_COPY_UP_SHARED_OFF,
 	OPT_ERR,
 };
 
@@ -433,6 +444,8 @@ static const match_table_t ovl_tokens = {
 	{OPT_XINO_ON,			"xino=on"},
 	{OPT_XINO_OFF,			"xino=off"},
 	{OPT_XINO_AUTO,			"xino=auto"},
+	{OPT_COPY_UP_SHARED_ON,		"copy_up_shared=on"},
+	{OPT_COPY_UP_SHARED_OFF,	"copy_up_shared=off"},
 	{OPT_ERR,			NULL}
 };
 
@@ -559,6 +572,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 			config->xino = OVL_XINO_AUTO;
 			break;
 
+		case OPT_COPY_UP_SHARED_ON:
+			config->copy_up_shared = true;
+			break;
+
+		case OPT_COPY_UP_SHARED_OFF:
+			config->copy_up_shared = false;
+			break;
+
 		default:
 			pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
 			return -EINVAL;
@@ -1379,6 +1400,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	ofs->config.index = ovl_index_def;
 	ofs->config.nfs_export = ovl_nfs_export_def;
 	ofs->config.xino = ovl_xino_def();
+	ofs->config.copy_up_shared = ovl_copy_up_shared_def;
 	err = ovl_parse_opt((char *) data, &ofs->config);
 	if (err)
 		goto out_err;
-- 
2.14.3

* [PATCH 27/39] ovl: obsolete "check_copy_up" module option
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (25 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 26/39] ovl: copy-up on MAP_SHARED Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 15:13   ` Amir Goldstein
  2018-05-29 14:43 ` [PATCH 28/39] ovl: fix documentation of non-standard behavior Miklos Szeredi
                   ` (11 subsequent siblings)
  38 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This was provided for debugging the ro/rw inconsistency.  The inconsistency
is now gone so this option is obsolete.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/copy_up.c | 30 +++++++-----------------------
 1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index ddaddb4ce4c3..e675e8349e71 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -25,35 +25,20 @@
 
 #define OVL_COPY_UP_CHUNK_SIZE (1 << 20)
 
-static bool __read_mostly ovl_check_copy_up;
-module_param_named(check_copy_up, ovl_check_copy_up, bool,
-		   S_IWUSR | S_IRUGO);
-MODULE_PARM_DESC(ovl_check_copy_up,
-		 "Warn on copy-up when causing process also has a R/O fd open");
-
-static int ovl_check_fd(const void *data, struct file *f, unsigned int fd)
+static int ovl_ccup_set(const char *buf, const struct kernel_param *param)
 {
-	const struct dentry *dentry = data;
-
-	if (file_inode(f) == d_inode(dentry))
-		pr_warn_ratelimited("overlayfs: Warning: Copying up %pD, but open R/O on fd %u which will cease to be coherent [pid=%d %s]\n",
-				    f, fd, current->pid, current->comm);
+	WARN(1, "overlayfs: \"check_copy_up\" module option is obsolete\n");
 	return 0;
 }
 
-/*
- * Check the fds open by this process and warn if something like the following
- * scenario is about to occur:
- *
- *	fd1 = open("foo", O_RDONLY);
- *	fd2 = open("foo", O_RDWR);
- */
-static void ovl_do_check_copy_up(struct dentry *dentry)
+static int ovl_ccup_get(char *buf, const struct kernel_param *param)
 {
-	if (ovl_check_copy_up)
-		iterate_fd(current->files, 0, ovl_check_fd, dentry);
+	return sprintf(buf, "N\n");
 }
 
+module_param_call(check_copy_up, ovl_ccup_set, ovl_ccup_get, NULL, 0644);
+MODULE_PARM_DESC(ovl_check_copy_up, "Obsolete; does nothing");
+
 int ovl_copy_xattr(struct dentry *old, struct dentry *new)
 {
 	ssize_t list_size, size, value_size = 0;
@@ -719,7 +704,6 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 		if (IS_ERR(ctx.link))
 			return PTR_ERR(ctx.link);
 	}
-	ovl_do_check_copy_up(ctx.lowerpath.dentry);
 
 	err = ovl_copy_up_start(dentry);
 	/* err < 0: interrupted, err > 0: raced with another copy-up */
-- 
2.14.3

* [PATCH 28/39] ovl: fix documentation of non-standard behavior
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (26 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 27/39] ovl: obsolete "check_copy_up" module option Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 29/39] vfs: simplify dentry_open() Miklos Szeredi
                   ` (10 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

We can now drop description of the ro/rw inconsistency from the
documentation.

Also clarify that fully standards-compliant behavior can now be enabled
with kernel/module/mount options.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 Documentation/filesystems/overlayfs.txt | 60 +++++++++++++++++++++------------
 1 file changed, 39 insertions(+), 21 deletions(-)

diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
index 72615a2c0752..f087bc40c6a5 100644
--- a/Documentation/filesystems/overlayfs.txt
+++ b/Documentation/filesystems/overlayfs.txt
@@ -10,10 +10,6 @@ union-filesystems).  An overlay-filesystem tries to present a
 filesystem which is the result over overlaying one filesystem on top
 of the other.
 
-The result will inevitably fail to look exactly like a normal
-filesystem for various technical reasons.  The expectation is that
-many use cases will be able to ignore these differences.
-
 
 Overlay objects
 ---------------
@@ -306,27 +302,49 @@ the copied layers will fail the verification of the lower root file handle.
 Non-standard behavior
 ---------------------
 
-The copy_up operation essentially creates a new, identical file and
-moves it over to the old name.  Any open files referring to this inode
-will access the old data.
+Overlayfs can now act as a POSIX compliant filesystem with the following
+features turned on:
+
+1) "redirect_dir"
+
+Enabled with the mount option or module option: "redirect_dir=on" or with
+the kernel config option CONFIG_OVERLAY_FS_REDIRECT_DIR=y.
+
+If this feature is disabled, then rename(2) on a lower or merged directory
+will fail with EXDEV ("Invalid cross-device link").
+
+2) "inode index"
+
+Enabled with the mount option or module option "index=on" or with the
+kernel config option CONFIG_OVERLAY_FS_INDEX=y.
+
+If this feature is disabled and a file with multiple hard links is copied
+up, then this will "break" the link.  Changes will not be propagated to
+other names referring to the same inode.
+
+3) "xino"
+
+Enabled with the mount option "xino=auto" or "xino=on", with the module
+option "xino_auto=on" or with the kernel config option
+CONFIG_OVERLAY_FS_XINO_AUTO=y.  Also implicitly enabled by using the same
+underlying filesystem for all layers making up the overlay.
 
-The new file may be on a different filesystem, so both st_dev and st_ino
-of the real file may change.  The values of st_dev and st_ino returned by
-stat(2) on an overlay object are often not the same as the real file
-stat(2) values to prevent the values from changing on copy_up.
+If this feature is disabled or the underlying filesystem doesn't have
+enough free bits in the inode number, then overlayfs will not be able to
+guarantee that the values of st_ino and st_dev returned by stat(2) and the
+value of d_ino returned by readdir(3) will act like on a normal filesystem.
+E.g. the value of st_dev may be different for two objects in the same
+overlay filesystem and the value of st_ino for directory objects may not be
+persistent and could change even while the overlay filesystem is mounted.
 
-Unless "xino" feature is enabled, when overlay layers are not all on the
-same underlying filesystem, the value of st_dev may be different for two
-non-directory objects in the same overlay filesystem and the value of
-st_ino for directory objects may be non persistent and could change even
-while the overlay filesystem is still mounted.
+4) "copy_up_shared"
 
-Unless "inode index" feature is enabled, if a file with multiple hard
-links is copied up, then this will "break" the link.  Changes will not be
-propagated to other names referring to the same inode.
+Enabled with the mount option or module option "copy_up_shared=on" or with
+the kernel config option CONFIG_OVERLAY_FS_COPY_UP_SHARED=y.
 
-Unless "redirect_dir" feature is enabled, rename(2) on a lower or merged
-directory will fail with EXDEV.
+If this feature is disabled, then a memory mapping created with MAP_SHARED
+might contain stale data if the file has been copied up and modified in the
+meantime.
 
 
 Changes to underlying filesystems
-- 
2.14.3

* [PATCH 29/39] vfs: simplify dentry_open()
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (27 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 28/39] ovl: fix documentation of non-standard behavior Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 30/39] Revert "ovl: fix may_write_real() for overlayfs directories" Miklos Szeredi
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

dentry_open() can now just call path_open().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/open.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 244cd2ecfefd..1d4bc541c619 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -935,25 +935,12 @@ EXPORT_SYMBOL(path_open);
 struct file *dentry_open(const struct path *path, int flags,
 			 const struct cred *cred)
 {
-	int error;
-	struct file *f;
-
 	validate_creds(cred);
 
 	/* We must always pass in a valid mount pointer. */
 	BUG_ON(!path->mnt);
 
-	f = get_empty_filp();
-	if (IS_ERR(f))
-		return f;
-
-	f->f_flags = flags;
-	error = vfs_open(path, f, cred);
-	if (error) {
-		put_filp(f);
-		return ERR_PTR(error);
-	}
-	return f;
+	return path_open(path, flags, d_backing_inode(path->dentry), cred, true);
 }
 EXPORT_SYMBOL(dentry_open);
 
-- 
2.14.3

* [PATCH 30/39] Revert "ovl: fix may_write_real() for overlayfs directories"
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (28 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 29/39] vfs: simplify dentry_open() Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 31/39] Revert "ovl: don't allow writing ioctl on lower layer" Miklos Szeredi
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This reverts commit 954c736f865d6c0c68ae4263a2f3502ee7c447a3.

Overlayfs no longer relies on the vfs for checking writability of files.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/namespace.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5f75969adff1..c3f7152a8419 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -468,9 +468,7 @@ static inline int may_write_real(struct file *file)
 
 	/* File refers to upper, writable layer? */
 	upperdentry = d_real(dentry, NULL, 0, D_REAL_UPPER);
-	if (upperdentry &&
-	    (file_inode(file) == d_inode(upperdentry) ||
-	     file_inode(file) == d_inode(dentry)))
+	if (upperdentry && file_inode(file) == d_inode(upperdentry))
 		return 0;
 
 	/* Lower layer: can't write to real file, sorry... */
-- 
2.14.3

* [PATCH 31/39] Revert "ovl: don't allow writing ioctl on lower layer"
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (29 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 30/39] Revert "ovl: fix may_write_real() for overlayfs directories" Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 32/39] vfs: fix freeze protection in mnt_want_write_file() for overlayfs Miklos Szeredi
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This reverts commit 7c6893e3c9abf6a9676e060a1e35e5caca673d57.

Overlayfs no longer relies on the vfs for checking writability of files.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/internal.h  |  2 --
 fs/namespace.c | 64 +++-------------------------------------------------------
 fs/open.c      |  4 ++--
 fs/xattr.c     |  9 ++++-----
 4 files changed, 9 insertions(+), 70 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 6821cf475fc6..29c9a2fab592 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -80,10 +80,8 @@ extern void __init mnt_init(void);
 
 extern int __mnt_want_write(struct vfsmount *);
 extern int __mnt_want_write_file(struct file *);
-extern int mnt_want_write_file_path(struct file *);
 extern void __mnt_drop_write(struct vfsmount *);
 extern void __mnt_drop_write_file(struct file *);
-extern void mnt_drop_write_file_path(struct file *);
 
 /*
  * fs_struct.c
diff --git a/fs/namespace.c b/fs/namespace.c
index c3f7152a8419..5286c5313e67 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -431,18 +431,13 @@ int __mnt_want_write_file(struct file *file)
 }
 
 /**
- * mnt_want_write_file_path - get write access to a file's mount
+ * mnt_want_write_file - get write access to a file's mount
  * @file: the file who's mount on which to take a write
  *
  * This is like mnt_want_write, but it takes a file and can
  * do some optimisations if the file is open for write already
- *
- * Called by the vfs for cases when we have an open file at hand, but will do an
- * inode operation on it (important distinction for files opened on overlayfs,
- * since the file operations will come from the real underlying file, while
- * inode operations come from the overlay).
  */
-int mnt_want_write_file_path(struct file *file)
+int mnt_want_write_file(struct file *file)
 {
 	int ret;
 
@@ -452,53 +447,6 @@ int mnt_want_write_file_path(struct file *file)
 		sb_end_write(file->f_path.mnt->mnt_sb);
 	return ret;
 }
-
-static inline int may_write_real(struct file *file)
-{
-	struct dentry *dentry = file->f_path.dentry;
-	struct dentry *upperdentry;
-
-	/* Writable file? */
-	if (file->f_mode & FMODE_WRITER)
-		return 0;
-
-	/* Not overlayfs? */
-	if (likely(!(dentry->d_flags & DCACHE_OP_REAL)))
-		return 0;
-
-	/* File refers to upper, writable layer? */
-	upperdentry = d_real(dentry, NULL, 0, D_REAL_UPPER);
-	if (upperdentry && file_inode(file) == d_inode(upperdentry))
-		return 0;
-
-	/* Lower layer: can't write to real file, sorry... */
-	return -EPERM;
-}
-
-/**
- * mnt_want_write_file - get write access to a file's mount
- * @file: the file who's mount on which to take a write
- *
- * This is like mnt_want_write, but it takes a file and can
- * do some optimisations if the file is open for write already
- *
- * Mostly called by filesystems from their ioctl operation before performing
- * modification.  On overlayfs this needs to check if the file is on a read-only
- * lower layer and deny access in that case.
- */
-int mnt_want_write_file(struct file *file)
-{
-	int ret;
-
-	ret = may_write_real(file);
-	if (!ret) {
-		sb_start_write(file_inode(file)->i_sb);
-		ret = __mnt_want_write_file(file);
-		if (ret)
-			sb_end_write(file_inode(file)->i_sb);
-	}
-	return ret;
-}
 EXPORT_SYMBOL_GPL(mnt_want_write_file);
 
 /**
@@ -536,15 +484,9 @@ void __mnt_drop_write_file(struct file *file)
 	__mnt_drop_write(file->f_path.mnt);
 }
 
-void mnt_drop_write_file_path(struct file *file)
-{
-	mnt_drop_write(file->f_path.mnt);
-}
-
 void mnt_drop_write_file(struct file *file)
 {
-	__mnt_drop_write(file->f_path.mnt);
-	sb_end_write(file_inode(file)->i_sb);
+	mnt_drop_write(file->f_path.mnt);
 }
 EXPORT_SYMBOL(mnt_drop_write_file);
 
diff --git a/fs/open.c b/fs/open.c
index 1d4bc541c619..2db39216c393 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -707,12 +707,12 @@ int ksys_fchown(unsigned int fd, uid_t user, gid_t group)
 	if (!f.file)
 		goto out;
 
-	error = mnt_want_write_file_path(f.file);
+	error = mnt_want_write_file(f.file);
 	if (error)
 		goto out_fput;
 	audit_file(f.file);
 	error = chown_common(&f.file->f_path, user, group);
-	mnt_drop_write_file_path(f.file);
+	mnt_drop_write_file(f.file);
 out_fput:
 	fdput(f);
 out:
diff --git a/fs/xattr.c b/fs/xattr.c
index 61cd28ba25f3..78eaffbdbee0 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -23,7 +23,6 @@
 #include <linux/posix_acl_xattr.h>
 
 #include <linux/uaccess.h>
-#include "internal.h"
 
 static const char *
 strcmp_prefix(const char *a, const char *a_prefix)
@@ -503,10 +502,10 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
 	if (!f.file)
 		return error;
 	audit_file(f.file);
-	error = mnt_want_write_file_path(f.file);
+	error = mnt_want_write_file(f.file);
 	if (!error) {
 		error = setxattr(f.file->f_path.dentry, name, value, size, flags);
-		mnt_drop_write_file_path(f.file);
+		mnt_drop_write_file(f.file);
 	}
 	fdput(f);
 	return error;
@@ -735,10 +734,10 @@ SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
 	if (!f.file)
 		return error;
 	audit_file(f.file);
-	error = mnt_want_write_file_path(f.file);
+	error = mnt_want_write_file(f.file);
 	if (!error) {
 		error = removexattr(f.file->f_path.dentry, name);
-		mnt_drop_write_file_path(f.file);
+		mnt_drop_write_file(f.file);
 	}
 	fdput(f);
 	return error;
-- 
2.14.3

* [PATCH 32/39] vfs: fix freeze protection in mnt_want_write_file() for overlayfs
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (30 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 31/39] Revert "ovl: don't allow writing ioctl on lower layer" Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-06-04  8:50   ` Christoph Hellwig
  2018-05-29 14:43 ` [PATCH 33/39] Revert "ovl: fix relatime for directories" Miklos Szeredi
                   ` (6 subsequent siblings)
  38 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

The underlying real file used by overlayfs still contains the overlay path.
This results in mnt_want_write_file() calls by the filesystem taking
freeze protection on the wrong superblock (the overlayfs one instead of the
real one).

Fix by using file_inode(file)->i_sb instead of file->f_path.mnt->mnt_sb.

Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
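Note (illustration only, not part of the patch): the mismatch described in
the changelog can be modeled in userspace.  The sketch below assumes
simplified stand-in structures; "f_mnt" stands in for file->f_path.mnt, and
the helpers freeze_sb_old() and freeze_sb_new() are hypothetical names that
only show which superblock each variant would pick.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel structures; only the fields used here. */
struct super_block { const char *name; int writers; };
struct inode { struct super_block *i_sb; };
struct vfsmount { struct super_block *mnt_sb; };
struct file {
	struct vfsmount *f_mnt;	/* stands in for file->f_path.mnt */
	struct inode *f_inode;	/* what file_inode(file) would return */
};

/* Toy freeze protection: just count writers per superblock. */
static void sb_start_write(struct super_block *sb)
{
	sb->writers++;
}

static void sb_end_write(struct super_block *sb)
{
	assert(sb->writers > 0);
	sb->writers--;
}

/* Before the patch: superblock taken from the mount in the file's path. */
static struct super_block *freeze_sb_old(const struct file *file)
{
	return file->f_mnt->mnt_sb;
}

/* After the patch: superblock taken from the file's inode. */
static struct super_block *freeze_sb_new(const struct file *file)
{
	return file->f_inode->i_sb;
}
```

For a real file whose path still points at the overlay mount, the old
variant picks the overlay superblock and the new one picks the superblock
of the underlying filesystem, which is the one that actually gets written.
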
 fs/namespace.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5286c5313e67..0d9023a9af4f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -441,10 +441,10 @@ int mnt_want_write_file(struct file *file)
 {
 	int ret;
 
-	sb_start_write(file->f_path.mnt->mnt_sb);
+	sb_start_write(file_inode(file)->i_sb);
 	ret = __mnt_want_write_file(file);
 	if (ret)
-		sb_end_write(file->f_path.mnt->mnt_sb);
+		sb_end_write(file_inode(file)->i_sb);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(mnt_want_write_file);
@@ -486,7 +486,8 @@ void __mnt_drop_write_file(struct file *file)
 
 void mnt_drop_write_file(struct file *file)
 {
-	mnt_drop_write(file->f_path.mnt);
+	__mnt_drop_write_file(file);
+	sb_end_write(file_inode(file)->i_sb);
 }
 EXPORT_SYMBOL(mnt_drop_write_file);
 
-- 
2.14.3

* [PATCH 33/39] Revert "ovl: fix relatime for directories"
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (31 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 32/39] vfs: fix freeze protection in mnt_want_write_file() for overlayfs Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 34/39] Revert "vfs: update ovl inode before relatime check" Miklos Szeredi
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This reverts commit cd91304e7190b4c4802f8e413ab2214b233e0260.

Overlayfs no longer relies on the vfs for correct atime handling.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/inode.c             | 21 ++++-----------------
 fs/overlayfs/super.c   |  3 ---
 include/linux/dcache.h |  3 ---
 3 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 1bea65d37afe..ad0259bafe2d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1559,24 +1559,11 @@ EXPORT_SYMBOL(bmap);
 static void update_ovl_inode_times(struct dentry *dentry, struct inode *inode,
 			       bool rcu)
 {
-	struct dentry *upperdentry;
+	if (!rcu) {
+		struct inode *realinode = d_real_inode(dentry);
 
-	/*
-	 * Nothing to do if in rcu or if non-overlayfs
-	 */
-	if (rcu || likely(!(dentry->d_flags & DCACHE_OP_REAL)))
-		return;
-
-	upperdentry = d_real(dentry, NULL, 0, D_REAL_UPPER);
-
-	/*
-	 * If file is on lower then we can't update atime, so no worries about
-	 * stale mtime/ctime.
-	 */
-	if (upperdentry) {
-		struct inode *realinode = d_inode(upperdentry);
-
-		if ((!timespec_equal(&inode->i_mtime, &realinode->i_mtime) ||
+		if (unlikely(inode != realinode) &&
+		    (!timespec_equal(&inode->i_mtime, &realinode->i_mtime) ||
 		     !timespec_equal(&inode->i_ctime, &realinode->i_ctime))) {
 			inode->i_mtime = realinode->i_mtime;
 			inode->i_ctime = realinode->i_ctime;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 900ed4c39919..65ec661c60e6 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -107,9 +107,6 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 	if (inode && d_inode(dentry) == inode)
 		return dentry;
 
-	if (flags & D_REAL_UPPER)
-		return ovl_dentry_upper(dentry);
-
 	if (!d_is_reg(dentry)) {
 		if (!inode || inode == d_inode(dentry))
 			return dentry;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 66c6e17e61e5..ddae4103d324 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -564,9 +564,6 @@ static inline struct dentry *d_backing_dentry(struct dentry *upper)
 	return upper;
 }
 
-/* d_real() flags */
-#define D_REAL_UPPER	0x2	/* return upper dentry or NULL if non-upper */
-
 /**
  * d_real - Return the real dentry
  * @dentry: the dentry to query
-- 
2.14.3

* [PATCH 34/39] Revert "vfs: update ovl inode before relatime check"
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (32 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 33/39] Revert "ovl: fix relatime for directories" Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 35/39] Revert "vfs: add flags to d_real()" Miklos Szeredi
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This reverts commit 598e3c8f72f5b77c84d2cb26cfd936ffb3cfdbaa.

Overlayfs no longer relies on the vfs for correct atime handling.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
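Note (illustration only, not part of the patch): the relatime rule that
relatime_need_update() applies once the overlay special case is gone can be
sketched in userspace.  This is a simplified model under stated
assumptions: timestamps are whole seconds, "struct times" and
relatime_wants_update() are hypothetical names, and equal timestamps are
treated as needing an update.

```c
#include <stdbool.h>

/* Hypothetical, simplified inode times in whole seconds. */
struct times {
	long long atime;
	long long mtime;
	long long ctime;
};

#define DAY_SECONDS (24LL * 60 * 60)

/* Returns true when relatime would allow an atime update at time "now". */
static bool relatime_wants_update(const struct times *t, long long now)
{
	/* Modified since (or at) the last access? */
	if (t->mtime >= t->atime)
		return true;
	/* Inode changed since (or at) the last access? */
	if (t->ctime >= t->atime)
		return true;
	/* Last recorded access is at least a day old? */
	if (now - t->atime >= DAY_SECONDS)
		return true;
	return false;
}
```

With stacked file operations the inode passed in is already the one whose
times matter, so no copying of mtime/ctime from a "real" inode is modeled.
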
 fs/inode.c         | 33 ++++++---------------------------
 fs/internal.h      |  7 -------
 fs/namei.c         |  2 +-
 include/linux/fs.h |  1 +
 4 files changed, 8 insertions(+), 35 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index ad0259bafe2d..348e93f468cc 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1553,37 +1553,17 @@ sector_t bmap(struct inode *inode, sector_t block)
 }
 EXPORT_SYMBOL(bmap);
 
-/*
- * Update times in overlayed inode from underlying real inode
- */
-static void update_ovl_inode_times(struct dentry *dentry, struct inode *inode,
-			       bool rcu)
-{
-	if (!rcu) {
-		struct inode *realinode = d_real_inode(dentry);
-
-		if (unlikely(inode != realinode) &&
-		    (!timespec_equal(&inode->i_mtime, &realinode->i_mtime) ||
-		     !timespec_equal(&inode->i_ctime, &realinode->i_ctime))) {
-			inode->i_mtime = realinode->i_mtime;
-			inode->i_ctime = realinode->i_ctime;
-		}
-	}
-}
-
 /*
  * With relative atime, only update atime if the previous atime is
  * earlier than either the ctime or mtime or if at least a day has
  * passed since the last atime update.
  */
-static int relatime_need_update(const struct path *path, struct inode *inode,
-				struct timespec now, bool rcu)
+static int relatime_need_update(struct vfsmount *mnt, struct inode *inode,
+			     struct timespec now)
 {
 
-	if (!(path->mnt->mnt_flags & MNT_RELATIME))
+	if (!(mnt->mnt_flags & MNT_RELATIME))
 		return 1;
-
-	update_ovl_inode_times(path->dentry, inode, rcu);
 	/*
 	 * Is mtime younger than atime? If yes, update atime:
 	 */
@@ -1654,8 +1634,7 @@ static int update_time(struct inode *inode, struct timespec *time, int flags)
  *	This function automatically handles read only file systems and media,
  *	as well as the "noatime" flag and inode specific "noatime" markers.
  */
-bool __atime_needs_update(const struct path *path, struct inode *inode,
-			  bool rcu)
+bool atime_needs_update(const struct path *path, struct inode *inode)
 {
 	struct vfsmount *mnt = path->mnt;
 	struct timespec now;
@@ -1681,7 +1660,7 @@ bool __atime_needs_update(const struct path *path, struct inode *inode,
 
 	now = current_time(inode);
 
-	if (!relatime_need_update(path, inode, now, rcu))
+	if (!relatime_need_update(mnt, inode, now))
 		return false;
 
 	if (timespec_equal(&inode->i_atime, &now))
@@ -1696,7 +1675,7 @@ void touch_atime(const struct path *path)
 	struct inode *inode = d_inode(path->dentry);
 	struct timespec now;
 
-	if (!__atime_needs_update(path, inode, false))
+	if (!atime_needs_update(path, inode))
 		return;
 
 	if (!sb_start_write_trylock(inode->i_sb))
diff --git a/fs/internal.h b/fs/internal.h
index 29c9a2fab592..6ada1f356da6 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -138,13 +138,6 @@ extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
 extern void inode_add_lru(struct inode *inode);
 extern int dentry_needs_remove_privs(struct dentry *dentry);
 
-extern bool __atime_needs_update(const struct path *, struct inode *, bool);
-static inline bool atime_needs_update_rcu(const struct path *path,
-					  struct inode *inode)
-{
-	return __atime_needs_update(path, inode, true);
-}
-
 /*
  * fs-writeback.c
  */
diff --git a/fs/namei.c b/fs/namei.c
index 186bd2464fd5..54ab8ccb8d6d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1013,7 +1013,7 @@ const char *get_link(struct nameidata *nd)
 	if (!(nd->flags & LOOKUP_RCU)) {
 		touch_atime(&last->link);
 		cond_resched();
-	} else if (atime_needs_update_rcu(&last->link, inode)) {
+	} else if (atime_needs_update(&last->link, inode)) {
 		if (unlikely(unlazy_walk(nd)))
 			return ERR_PTR(-ECHILD);
 		touch_atime(&last->link);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6961feda6915..96a4b9fd1f5f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2066,6 +2066,7 @@ enum file_time_flags {
 	S_VERSION = 8,
 };
 
+extern bool atime_needs_update(const struct path *, struct inode *);
 extern void touch_atime(const struct path *);
 static inline void file_accessed(struct file *file)
 {
-- 
2.14.3

* [PATCH 35/39] Revert "vfs: add flags to d_real()"
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (33 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 34/39] Revert "vfs: update ovl inode before relatime check" Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 36/39] Revert "vfs: do get_write_access() on upper layer of overlayfs" Miklos Szeredi
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This reverts commit 495e642939114478a5237a7d91661ba93b76f15a.

No users of the "flags" argument of d_real() remain.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
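Note (illustration only, not part of the patch): the three-argument
d_real() dispatch that remains after this revert can be modeled in
userspace.  All "_m" names are hypothetical stand-ins, the DCACHE_OP_REAL
value is a placeholder rather than the kernel's, and stack_d_real() is a
made-up callback, not ovl_d_real().

```c
#include <stddef.h>

#define DCACHE_OP_REAL 0x1u	/* placeholder value for this sketch */

struct inode_m { int ino; };
struct dentry_m;

struct dentry_ops_m {
	struct dentry_m *(*d_real)(struct dentry_m *, const struct inode_m *,
				   unsigned int);
};

struct dentry_m {
	unsigned int d_flags;
	const struct dentry_ops_m *d_op;
	struct dentry_m *lower;	/* hypothetical link to the underlying dentry */
};

/* Identity for ordinary dentries, callback for stacked ones. */
static struct dentry_m *d_real_m(struct dentry_m *dentry,
				 const struct inode_m *inode,
				 unsigned int flags)
{
	if (dentry->d_flags & DCACHE_OP_REAL)
		return dentry->d_op->d_real(dentry, inode, flags);
	return dentry;
}

/* Hypothetical stacking callback: always hand back the underlying dentry. */
static struct dentry_m *stack_d_real(struct dentry_m *dentry,
				     const struct inode_m *inode,
				     unsigned int flags)
{
	(void)inode;
	(void)flags;
	return dentry->lower;
}
```

Callers such as d_real_inode() and file_dentry() in the diff below simply
stop passing the trailing zero.
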
 Documentation/filesystems/Locking |  2 +-
 Documentation/filesystems/vfs.txt |  2 +-
 fs/open.c                         |  2 +-
 fs/overlayfs/super.c              |  4 ++--
 include/linux/dcache.h            | 11 +++++------
 include/linux/fs.h                |  2 +-
 6 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 60e76060baff..a4afe96f0112 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -22,7 +22,7 @@ prototypes:
 	struct vfsmount *(*d_automount)(struct path *path);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, const struct inode *,
-				 unsigned int, unsigned int);
+				 unsigned int);
 
 locking rules:
 		rename_lock	->d_lock	may block	rcu-walk
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 2bc77ea8aef4..af54d3651ff8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -991,7 +991,7 @@ struct dentry_operations {
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, const struct inode *,
-				 unsigned int, unsigned int);
+				 unsigned int);
 };
 
   d_revalidate: called when the VFS needs to revalidate a dentry. This
diff --git a/fs/open.c b/fs/open.c
index 2db39216c393..127b49819afb 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -96,7 +96,7 @@ long vfs_truncate(const struct path *path, loff_t length)
 	 * write access on the upper inode, not on the overlay inode.  For
 	 * non-overlay filesystems d_real() is an identity function.
 	 */
-	upperdentry = d_real(path->dentry, NULL, O_WRONLY, 0);
+	upperdentry = d_real(path->dentry, NULL, O_WRONLY);
 	error = PTR_ERR(upperdentry);
 	if (IS_ERR(upperdentry))
 		goto mnt_drop_write_and_out;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 65ec661c60e6..4bca84a17c43 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -98,7 +98,7 @@ static int ovl_check_append_only(struct inode *inode, int flag)
 
 static struct dentry *ovl_d_real(struct dentry *dentry,
 				 const struct inode *inode,
-				 unsigned int open_flags, unsigned int flags)
+				 unsigned int open_flags)
 {
 	struct dentry *real;
 	int err;
@@ -134,7 +134,7 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 		goto bug;
 
 	/* Handle recursion */
-	real = d_real(real, inode, open_flags, 0);
+	real = d_real(real, inode, open_flags);
 
 	if (!inode || inode == d_inode(real))
 		return real;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index ddae4103d324..8fe4efa94af6 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -146,7 +146,7 @@ struct dentry_operations {
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, const struct inode *,
-				 unsigned int, unsigned int);
+				 unsigned int);
 } ____cacheline_aligned;
 
 /*
@@ -568,8 +568,7 @@ static inline struct dentry *d_backing_dentry(struct dentry *upper)
  * d_real - Return the real dentry
  * @dentry: the dentry to query
  * @inode: inode to select the dentry from multiple layers (can be NULL)
- * @open_flags: open flags to control copy-up behavior
- * @flags: flags to control what is returned by this function
+ * @flags: open flags to control copy-up behavior
  *
  * If dentry is on a union/overlay, then return the underlying, real dentry.
  * Otherwise return the dentry itself.
@@ -578,10 +577,10 @@ static inline struct dentry *d_backing_dentry(struct dentry *upper)
  */
 static inline struct dentry *d_real(struct dentry *dentry,
 				    const struct inode *inode,
-				    unsigned int open_flags, unsigned int flags)
+				    unsigned int flags)
 {
 	if (unlikely(dentry->d_flags & DCACHE_OP_REAL))
-		return dentry->d_op->d_real(dentry, inode, open_flags, flags);
+		return dentry->d_op->d_real(dentry, inode, flags);
 	else
 		return dentry;
 }
@@ -596,7 +595,7 @@ static inline struct dentry *d_real(struct dentry *dentry,
 static inline struct inode *d_real_inode(const struct dentry *dentry)
 {
 	/* This usage of d_real() results in const dentry */
-	return d_backing_inode(d_real((struct dentry *) dentry, NULL, 0, 0));
+	return d_backing_inode(d_real((struct dentry *) dentry, NULL, 0));
 }
 
 struct name_snapshot {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 96a4b9fd1f5f..797d6f28a8f0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1244,7 +1244,7 @@ static inline struct inode *file_inode(const struct file *f)
 
 static inline struct dentry *file_dentry(const struct file *file)
 {
-	return d_real(file->f_path.dentry, file_inode(file), 0, 0);
+	return d_real(file->f_path.dentry, file_inode(file), 0);
 }
 
 static inline int locks_lock_file_wait(struct file *filp, struct file_lock *fl)
-- 
2.14.3

* [PATCH 36/39] Revert "vfs: do get_write_access() on upper layer of overlayfs"
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (34 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 35/39] Revert "vfs: add flags to d_real()" Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 37/39] Partially revert "locks: fix file locking on overlayfs" Miklos Szeredi
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This reverts commit 4d0c5ba2ff79ef9f5188998b29fd28fcb05f3667.

We now get write access on both overlay and underlying layers so the
reverted commit is no longer needed for correct operation.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
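Note (illustration only, not part of the patch): after this revert both
vfs_truncate() and check_conflicting_open() look at i_writecount of the
same inode.  The userspace sketch below models that counter under
simplifying assumptions: no atomics, and "struct inode_model" and
read_lease_conflict() are hypothetical names.

```c
#include <errno.h>

/* Hypothetical stand-in holding only the write count of an inode. */
struct inode_model {
	int i_writecount;	/* >0: writers present, <0: writes denied */
};

/* Register a writer; refused while writes are denied. */
static int get_write_access(struct inode_model *inode)
{
	if (inode->i_writecount < 0)
		return -ETXTBSY;
	inode->i_writecount++;
	return 0;
}

/* Drop a writer registered with get_write_access(). */
static void put_write_access(struct inode_model *inode)
{
	inode->i_writecount--;
}

/* A read lease is refused while anybody holds write access. */
static int read_lease_conflict(const struct inode_model *inode)
{
	return inode->i_writecount > 0 ? -EAGAIN : 0;
}
```

Because the writer and the lease check consult one counter, a truncate in
progress and an F_RDLCK lease request see each other.
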
 fs/locks.c |  3 +--
 fs/open.c  | 15 ++-------------
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 62bbe8b31f26..9c0e5f3da66c 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1654,8 +1654,7 @@ check_conflicting_open(const struct dentry *dentry, const long arg, int flags)
 	if (flags & FL_LAYOUT)
 		return 0;
 
-	if ((arg == F_RDLCK) &&
-	    (atomic_read(&d_real_inode(dentry)->i_writecount) > 0))
+	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
 		return -EAGAIN;
 
 	if ((arg == F_WRLCK) && ((d_count(dentry) > 1) ||
diff --git a/fs/open.c b/fs/open.c
index 127b49819afb..0d63b57c7f89 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -68,7 +68,6 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
 long vfs_truncate(const struct path *path, loff_t length)
 {
 	struct inode *inode;
-	struct dentry *upperdentry;
 	long error;
 
 	inode = path->dentry->d_inode;
@@ -91,17 +90,7 @@ long vfs_truncate(const struct path *path, loff_t length)
 	if (IS_APPEND(inode))
 		goto mnt_drop_write_and_out;
 
-	/*
-	 * If this is an overlayfs then do as if opening the file so we get
-	 * write access on the upper inode, not on the overlay inode.  For
-	 * non-overlay filesystems d_real() is an identity function.
-	 */
-	upperdentry = d_real(path->dentry, NULL, O_WRONLY);
-	error = PTR_ERR(upperdentry);
-	if (IS_ERR(upperdentry))
-		goto mnt_drop_write_and_out;
-
-	error = get_write_access(upperdentry->d_inode);
+	error = get_write_access(inode);
 	if (error)
 		goto mnt_drop_write_and_out;
 
@@ -120,7 +109,7 @@ long vfs_truncate(const struct path *path, loff_t length)
 		error = do_truncate(path->dentry, length, 0, NULL);
 
 put_write_and_out:
-	put_write_access(upperdentry->d_inode);
+	put_write_access(inode);
 mnt_drop_write_and_out:
 	mnt_drop_write(path->mnt);
 out:
-- 
2.14.3

* [PATCH 37/39] Partially revert "locks: fix file locking on overlayfs"
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (35 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 36/39] Revert "vfs: do get_write_access() on upper layer of overlayfs" Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 38/39] Revert "fsnotify: support overlayfs" Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 39/39] vfs: remove open_flags from d_real() Miklos Szeredi
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This partially reverts commit c568d68341be7030f5647def68851e469b21ca11.

Overlayfs files will now automatically get the correct locks, so there is
no need to hack overlay support into the VFS.

It is a partial revert, because it leaves the locks_inode() calls in place
and defines locks_inode() to file_inode().  We could revert those as well,
but it would be unnecessary code churn and it makes sense to document that
we are getting the inode for locking purposes.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
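Note (illustration only, not part of the patch): why locks_inode() can
become file_inode() is sketched below in userspace.  The "_l" types and the
two helper names are hypothetical, and the sketch assumes that with stacked
file operations a file opened on the overlay reports the overlay inode from
file_inode().

```c
/* Hypothetical stand-ins; only the fields needed for the comparison. */
struct inode_l { int id; };
struct dentry_l { struct inode_l *d_inode; };
struct file_l {
	struct dentry_l *f_dentry;	/* stands in for f->f_path.dentry */
	struct inode_l *f_inode;	/* what file_inode(f) would return */
};

/* Old definition: always the inode of the dentry in the file's path. */
static struct inode_l *locks_inode_old(const struct file_l *f)
{
	return f->f_dentry->d_inode;
}

/* New definition: whatever file_inode() reports. */
static struct inode_l *locks_inode_new(const struct file_l *f)
{
	return f->f_inode;
}
```

For an unstacked overlay file the two differ (overlay dentry, real inode);
for a stacked file both name the overlay inode, so the macro is sufficient.
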
 fs/locks.c              | 17 ++++++-----------
 fs/overlayfs/super.c    |  2 +-
 include/linux/fs.h      | 13 +------------
 include/uapi/linux/fs.h |  1 -
 4 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 9c0e5f3da66c..40bcbaaa3f52 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -139,11 +139,6 @@
 #define IS_OFDLCK(fl)	(fl->fl_flags & FL_OFDLCK)
 #define IS_REMOTELCK(fl)	(fl->fl_pid <= 0)
 
-static inline bool is_remote_lock(struct file *filp)
-{
-	return likely(!(filp->f_path.dentry->d_sb->s_flags & SB_NOREMOTELOCK));
-}
-
 static bool lease_breaking(struct file_lock *fl)
 {
 	return fl->fl_flags & (FL_UNLOCK_PENDING | FL_DOWNGRADE_PENDING);
@@ -1875,7 +1870,7 @@ EXPORT_SYMBOL(generic_setlease);
 int
 vfs_setlease(struct file *filp, long arg, struct file_lock **lease, void **priv)
 {
-	if (filp->f_op->setlease && is_remote_lock(filp))
+	if (filp->f_op->setlease)
 		return filp->f_op->setlease(filp, arg, lease, priv);
 	else
 		return generic_setlease(filp, arg, lease, priv);
@@ -2022,7 +2017,7 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
 	if (error)
 		goto out_free;
 
-	if (f.file->f_op->flock && is_remote_lock(f.file))
+	if (f.file->f_op->flock)
 		error = f.file->f_op->flock(f.file,
 					  (can_sleep) ? F_SETLKW : F_SETLK,
 					  lock);
@@ -2048,7 +2043,7 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
  */
 int vfs_test_lock(struct file *filp, struct file_lock *fl)
 {
-	if (filp->f_op->lock && is_remote_lock(filp))
+	if (filp->f_op->lock)
 		return filp->f_op->lock(filp, F_GETLK, fl);
 	posix_test_lock(filp, fl);
 	return 0;
@@ -2191,7 +2186,7 @@ int fcntl_getlk(struct file *filp, unsigned int cmd, struct flock *flock)
  */
 int vfs_lock_file(struct file *filp, unsigned int cmd, struct file_lock *fl, struct file_lock *conf)
 {
-	if (filp->f_op->lock && is_remote_lock(filp))
+	if (filp->f_op->lock)
 		return filp->f_op->lock(filp, cmd, fl);
 	else
 		return posix_lock_file(filp, fl, conf);
@@ -2513,7 +2508,7 @@ locks_remove_flock(struct file *filp, struct file_lock_context *flctx)
 	if (list_empty(&flctx->flc_flock))
 		return;
 
-	if (filp->f_op->flock && is_remote_lock(filp))
+	if (filp->f_op->flock)
 		filp->f_op->flock(filp, F_SETLKW, &fl);
 	else
 		flock_lock_inode(inode, &fl);
@@ -2600,7 +2595,7 @@ EXPORT_SYMBOL(posix_unblock_lock);
  */
 int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
 {
-	if (filp->f_op->lock && is_remote_lock(filp))
+	if (filp->f_op->lock)
 		return filp->f_op->lock(filp, F_CANCELLK, fl);
 	return 0;
 }
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 4bca84a17c43..d7df69e5b674 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1478,7 +1478,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_op = &ovl_super_operations;
 	sb->s_xattr = ovl_xattr_handlers;
 	sb->s_fs_info = ofs;
-	sb->s_flags |= SB_POSIXACL | SB_NOREMOTELOCK;
+	sb->s_flags |= SB_POSIXACL;
 
 	err = -ENOMEM;
 	root_dentry = d_make_root(ovl_new_inode(sb, S_IFDIR, 0));
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 797d6f28a8f0..7471a4208fdc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1050,17 +1050,7 @@ struct file_lock_context {
 
 extern void send_sigio(struct fown_struct *fown, int fd, int band);
 
-/*
- * Return the inode to use for locking
- *
- * For overlayfs this should be the overlay inode, not the real inode returned
- * by file_inode().  For any other fs file_inode(filp) and locks_inode(filp) are
- * equal.
- */
-static inline struct inode *locks_inode(const struct file *f)
-{
-	return f->f_path.dentry->d_inode;
-}
+#define locks_inode(f) file_inode(f)
 
 #ifdef CONFIG_FILE_LOCKING
 extern int fcntl_getlk(struct file *, unsigned int, struct flock *);
@@ -1300,7 +1290,6 @@ extern int send_sigurg(struct fown_struct *fown);
 
 /* These sb flags are internal to the kernel */
 #define SB_SUBMOUNT     (1<<26)
-#define SB_NOREMOTELOCK	(1<<27)
 #define SB_NOSEC	(1<<28)
 #define SB_BORN		(1<<29)
 #define SB_ACTIVE	(1<<30)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index d2a8313fabd7..2840ddcece73 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -134,7 +134,6 @@ struct inodes_stat_t {
 
 /* These sb flags are internal to the kernel */
 #define MS_SUBMOUNT     (1<<26)
-#define MS_NOREMOTELOCK	(1<<27)
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
-- 
2.14.3

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH 38/39] Revert "fsnotify: support overlayfs"
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (36 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 37/39] Partially revert "locks: fix file locking on overlayfs" Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  2018-05-29 14:43 ` [PATCH 39/39] vfs: remove open_flags from d_real() Miklos Szeredi
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

This reverts commit f3fbbb079263bd29ae592478de6808db7e708267.

Overlayfs now works correctly without adding hacks to fsnotify.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 include/linux/fsnotify.h | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index bdaf22582f6e..fd1ce10553bf 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -30,11 +30,7 @@ static inline int fsnotify_parent(const struct path *path, struct dentry *dentry
 static inline int fsnotify_perm(struct file *file, int mask)
 {
 	const struct path *path = &file->f_path;
-	/*
-	 * Do not use file_inode() here or anywhere in this file to get the
-	 * inode.  That would break *notity on overlayfs.
-	 */
-	struct inode *inode = path->dentry->d_inode;
+	struct inode *inode = file_inode(file);
 	__u32 fsnotify_mask = 0;
 	int ret;
 
@@ -178,7 +174,7 @@ static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry)
 static inline void fsnotify_access(struct file *file)
 {
 	const struct path *path = &file->f_path;
-	struct inode *inode = path->dentry->d_inode;
+	struct inode *inode = file_inode(file);
 	__u32 mask = FS_ACCESS;
 
 	if (S_ISDIR(inode->i_mode))
@@ -196,7 +192,7 @@ static inline void fsnotify_access(struct file *file)
 static inline void fsnotify_modify(struct file *file)
 {
 	const struct path *path = &file->f_path;
-	struct inode *inode = path->dentry->d_inode;
+	struct inode *inode = file_inode(file);
 	__u32 mask = FS_MODIFY;
 
 	if (S_ISDIR(inode->i_mode))
@@ -214,7 +210,7 @@ static inline void fsnotify_modify(struct file *file)
 static inline void fsnotify_open(struct file *file)
 {
 	const struct path *path = &file->f_path;
-	struct inode *inode = path->dentry->d_inode;
+	struct inode *inode = file_inode(file);
 	__u32 mask = FS_OPEN;
 
 	if (S_ISDIR(inode->i_mode))
@@ -230,7 +226,7 @@ static inline void fsnotify_open(struct file *file)
 static inline void fsnotify_close(struct file *file)
 {
 	const struct path *path = &file->f_path;
-	struct inode *inode = path->dentry->d_inode;
+	struct inode *inode = file_inode(file);
 	fmode_t mode = file->f_mode;
 	__u32 mask = (mode & FMODE_WRITE) ? FS_CLOSE_WRITE : FS_CLOSE_NOWRITE;
 
-- 
2.14.3

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH 39/39] vfs: remove open_flags from d_real()
  2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
                   ` (37 preceding siblings ...)
  2018-05-29 14:43 ` [PATCH 38/39] Revert "fsnotify: support overlayfs" Miklos Szeredi
@ 2018-05-29 14:43 ` Miklos Szeredi
  38 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-29 14:43 UTC (permalink / raw)
  To: linux-unionfs; +Cc: linux-fsdevel, linux-kernel

Opening regular files on overlayfs is now handled via ovl_open().  Remove
the now unused "open_flags" argument from d_op->d_real() and the d_real()
helper.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 Documentation/filesystems/Locking |  3 +--
 Documentation/filesystems/vfs.txt | 16 ++++------------
 fs/overlayfs/super.c              | 36 +++---------------------------------
 include/linux/dcache.h            | 11 ++++-------
 include/linux/fs.h                |  2 +-
 5 files changed, 13 insertions(+), 55 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index a4afe96f0112..e1d7e43d302c 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -21,8 +21,7 @@ prototypes:
 	char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen);
 	struct vfsmount *(*d_automount)(struct path *path);
 	int (*d_manage)(const struct path *, bool);
-	struct dentry *(*d_real)(struct dentry *, const struct inode *,
-				 unsigned int);
+	struct dentry *(*d_real)(struct dentry *, const struct inode *);
 
 locking rules:
 		rename_lock	->d_lock	may block	rcu-walk
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index af54d3651ff8..8b03c5e675bf 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -990,8 +990,7 @@ struct dentry_operations {
 	char *(*d_dname)(struct dentry *, char *, int);
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
-	struct dentry *(*d_real)(struct dentry *, const struct inode *,
-				 unsigned int);
+	struct dentry *(*d_real)(struct dentry *, const struct inode *);
 };
 
   d_revalidate: called when the VFS needs to revalidate a dentry. This
@@ -1125,22 +1124,15 @@ struct dentry_operations {
 	dentry being transited from.
 
   d_real: overlay/union type filesystems implement this method to return one of
-	the underlying dentries hidden by the overlay.  It is used in three
+	the underlying dentries hidden by the overlay.  It is used in two
 	different modes:
 
-	Called from open it may need to copy-up the file depending on the
-	supplied open flags.  This mode is selected with a non-zero flags
-	argument.  In this mode the d_real method can return an error.
-
 	Called from file_dentry() it returns the real dentry matching the inode
 	argument.  The real dentry may be from a lower layer already copied up,
 	but still referenced from the file.  This mode is selected with a
-	non-NULL inode argument.  This will always succeed.
-
-	With NULL inode and zero flags the topmost real underlying dentry is
-	returned.  This will always succeed.
+	non-NULL inode argument.
 
-	This method is never called with both non-NULL inode and non-zero flags.
+	With NULL inode the topmost real underlying dentry is returned.
 
 Each dentry has a pointer to its parent dentry, as well as a hash list
 of child dentries. Child dentries are basically like files in a
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index d7df69e5b674..cd5c82f105d6 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -80,28 +80,10 @@ static void ovl_dentry_release(struct dentry *dentry)
 	}
 }
 
-static int ovl_check_append_only(struct inode *inode, int flag)
-{
-	/*
-	 * This test was moot in vfs may_open() because overlay inode does
-	 * not have the S_APPEND flag, so re-check on real upper inode
-	 */
-	if (IS_APPEND(inode)) {
-		if  ((flag & O_ACCMODE) != O_RDONLY && !(flag & O_APPEND))
-			return -EPERM;
-		if (flag & O_TRUNC)
-			return -EPERM;
-	}
-
-	return 0;
-}
-
 static struct dentry *ovl_d_real(struct dentry *dentry,
-				 const struct inode *inode,
-				 unsigned int open_flags)
+				 const struct inode *inode)
 {
 	struct dentry *real;
-	int err;
 
 	/* It's an overlay file */
 	if (inode && d_inode(dentry) == inode)
@@ -113,28 +95,16 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 		goto bug;
 	}
 
-	if (open_flags) {
-		err = ovl_open_maybe_copy_up(dentry, open_flags);
-		if (err)
-			return ERR_PTR(err);
-	}
-
 	real = ovl_dentry_upper(dentry);
-	if (real && (!inode || inode == d_inode(real))) {
-		if (!inode) {
-			err = ovl_check_append_only(d_inode(real), open_flags);
-			if (err)
-				return ERR_PTR(err);
-		}
+	if (real && (!inode || inode == d_inode(real)))
 		return real;
-	}
 
 	real = ovl_dentry_lower(dentry);
 	if (!real)
 		goto bug;
 
 	/* Handle recursion */
-	real = d_real(real, inode, open_flags);
+	real = d_real(real, inode);
 
 	if (!inode || inode == d_inode(real))
 		return real;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 8fe4efa94af6..78cea80423a3 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -145,8 +145,7 @@ struct dentry_operations {
 	char *(*d_dname)(struct dentry *, char *, int);
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
-	struct dentry *(*d_real)(struct dentry *, const struct inode *,
-				 unsigned int);
+	struct dentry *(*d_real)(struct dentry *, const struct inode *);
 } ____cacheline_aligned;
 
 /*
@@ -568,7 +567,6 @@ static inline struct dentry *d_backing_dentry(struct dentry *upper)
  * d_real - Return the real dentry
  * @dentry: the dentry to query
  * @inode: inode to select the dentry from multiple layers (can be NULL)
- * @flags: open flags to control copy-up behavior
  *
  * If dentry is on a union/overlay, then return the underlying, real dentry.
  * Otherwise return the dentry itself.
@@ -576,11 +574,10 @@ static inline struct dentry *d_backing_dentry(struct dentry *upper)
  * See also: Documentation/filesystems/vfs.txt
  */
 static inline struct dentry *d_real(struct dentry *dentry,
-				    const struct inode *inode,
-				    unsigned int flags)
+				    const struct inode *inode)
 {
 	if (unlikely(dentry->d_flags & DCACHE_OP_REAL))
-		return dentry->d_op->d_real(dentry, inode, flags);
+		return dentry->d_op->d_real(dentry, inode);
 	else
 		return dentry;
 }
@@ -595,7 +592,7 @@ static inline struct dentry *d_real(struct dentry *dentry,
 static inline struct inode *d_real_inode(const struct dentry *dentry)
 {
 	/* This usage of d_real() results in const dentry */
-	return d_backing_inode(d_real((struct dentry *) dentry, NULL, 0));
+	return d_backing_inode(d_real((struct dentry *) dentry, NULL));
 }
 
 struct name_snapshot {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7471a4208fdc..eb4c189d34ba 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1234,7 +1234,7 @@ static inline struct inode *file_inode(const struct file *f)
 
 static inline struct dentry *file_dentry(const struct file *file)
 {
-	return d_real(file->f_path.dentry, file_inode(file), 0);
+	return d_real(file->f_path.dentry, file_inode(file));
 }
 
 static inline int locks_lock_file_wait(struct file *filp, struct file_lock *fl)
-- 
2.14.3
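
For readers following the vfs.txt hunk above, here is a minimal userspace sketch of the two d_real() modes that remain once open_flags is gone.  It is only a model: struct inode, struct dentry and model_d_real() below are hypothetical stand-ins rather than the kernel definitions, and falling back to the overlay dentry on the "bug" path is an assumption, since that part of ovl_d_real() is not shown in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; the kernel structures are far richer. */
struct inode {
	int ino;
};

struct dentry {
	struct inode *d_inode;
	struct dentry *upper;	/* real upper dentry, NULL if not copied up */
	struct dentry *lower;	/* real lower dentry, NULL if there is none */
};

/* Model of d_real(): inode selects a layer, NULL means "topmost real". */
static struct dentry *model_d_real(struct dentry *dentry,
				   const struct inode *inode)
{
	struct dentry *real;

	assert(dentry != NULL);

	/* Not on an overlay: the dentry is already the real one. */
	if (!dentry->upper && !dentry->lower)
		return dentry;

	/* The caller asked for the overlay inode itself. */
	if (inode && dentry->d_inode == inode)
		return dentry;

	/* Prefer the upper layer when it matches (or when no inode is given). */
	real = dentry->upper;
	if (real && (!inode || inode == real->d_inode))
		return real;

	real = dentry->lower;
	if (!real)
		return dentry;	/* assumed fallback for the "bug" path */

	/* Handle recursion: the lower dentry may itself be on an overlay. */
	real = model_d_real(real, inode);
	if (!inode || inode == real->d_inode)
		return real;

	return dentry;		/* assumed fallback for the "bug" path */
}
```

With a NULL inode the model returns the topmost real dentry (upper if present, otherwise lower).  With a non-NULL inode it returns whichever layer's dentry owns that inode, which covers the file_dentry() case of a lower file that has since been copied up but is still referenced from an open file.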

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 27/39] ovl: obsolete "check_copy_up" module option
  2018-05-29 14:43 ` [PATCH 27/39] ovl: obsolete "check_copy_up" module option Miklos Szeredi
@ 2018-05-29 15:13   ` Amir Goldstein
  2018-05-30  8:26     ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Amir Goldstein @ 2018-05-29 15:13 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 5:43 PM, Miklos Szeredi <mszeredi@redhat.com> wrote:
> This was provided for debugging the ro/rw inconsistency.  The inconsistency
> is now gone, so this option is obsolete.
>
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  fs/overlayfs/copy_up.c | 30 +++++++-----------------------
>  1 file changed, 7 insertions(+), 23 deletions(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index ddaddb4ce4c3..e675e8349e71 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -25,35 +25,20 @@
>
>  #define OVL_COPY_UP_CHUNK_SIZE (1 << 20)
>
> -static bool __read_mostly ovl_check_copy_up;
> -module_param_named(check_copy_up, ovl_check_copy_up, bool,
> -                  S_IWUSR | S_IRUGO);
> -MODULE_PARM_DESC(ovl_check_copy_up,
> -                "Warn on copy-up when causing process also has a R/O fd open");
> -
> -static int ovl_check_fd(const void *data, struct file *f, unsigned int fd)
> +static int ovl_ccup_set(const char *buf, const struct kernel_param *param)
>  {
> -       const struct dentry *dentry = data;
> -
> -       if (file_inode(f) == d_inode(dentry))
> -               pr_warn_ratelimited("overlayfs: Warning: Copying up %pD, but open R/O on fd %u which will cease to be coherent [pid=%d %s]\n",
> -                                   f, fd, current->pid, current->comm);
> +       WARN(1, "overlayfs: \"check_copy_up\" module option is obsolete\n");

I was under the impression that user-controlled input should not be generating
WARNings... did you mean pr_warn?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 03/39] vfs: dedupe: extract helper for a single dedup
  2018-05-29 14:43 ` [PATCH 03/39] vfs: dedupe: extract helper for a single dedup Miklos Szeredi
@ 2018-05-29 15:41   ` Amir Goldstein
  2018-05-29 16:04     ` Amir Goldstein
  2018-06-04  8:44   ` Christoph Hellwig
  1 sibling, 1 reply; 83+ messages in thread
From: Amir Goldstein @ 2018-05-29 15:41 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 5:43 PM, Miklos Szeredi <mszeredi@redhat.com> wrote:
> Extract vfs_dedupe_file_range_one() helper to deal with a single dedup
> request.
>
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  fs/read_write.c | 89 +++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 49 insertions(+), 40 deletions(-)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 1818581cadf6..82a53c44c0aa 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1964,6 +1964,44 @@ int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>  }
>  EXPORT_SYMBOL(vfs_dedupe_file_range_compare);
>
> +static s64 vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
> +                                    struct file *dst_file, loff_t dst_pos,
> +                                    u64 len)
> +{
> +       s64 ret;
> +
> +       ret = mnt_want_write_file(dst_file);
> +       if (ret)
> +               return ret;
> +
> +       ret = clone_verify_area(dst_file, dst_pos, len, true);
> +       if (ret < 0)
> +               goto out_drop_write;
> +
> +       ret = -EINVAL;
> +       if (!(capable(CAP_SYS_ADMIN) || (dst_file->f_mode & FMODE_WRITE)))
> +               goto out_drop_write;
> +
> +       ret = -EXDEV;
> +       if (src_file->f_path.mnt != dst_file->f_path.mnt)
> +               goto out_drop_write;
> +
> +       ret = -EISDIR;
> +       if (S_ISDIR(file_inode(dst_file)->i_mode))
> +               goto out_drop_write;
> +
> +       ret = -EINVAL;
> +       if (!dst_file->f_op->dedupe_file_range)
> +               goto out_drop_write;
> +
> +       ret = dst_file->f_op->dedupe_file_range(src_file, src_pos,
> +                                               dst_file, dst_pos, len);
> +out_drop_write:
> +       mnt_drop_write_file(dst_file);
> +
> +       return ret;
> +}
> +
>  int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
>  {
>         struct file_dedupe_range_info *info;
> @@ -1972,10 +2010,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
>         u64 len;
>         int i;
>         int ret;
> -       bool is_admin = capable(CAP_SYS_ADMIN);
>         u16 count = same->dest_count;
> -       struct file *dst_file;
> -       loff_t dst_off;
>         loff_t deduped;
>
>         if (!(file->f_mode & FMODE_READ))
> @@ -2010,54 +2045,28 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
>         }
>
>         for (i = 0, info = same->info; i < count; i++, info++) {
> -               struct inode *dst;
>                 struct fd dst_fd = fdget(info->dest_fd);
> +               struct file *dst_file = dst_fd.file;
>
> -               dst_file = dst_fd.file;
>                 if (!dst_file) {
>                         info->status = -EBADF;
>                         goto next_loop;
>                 }
> -               dst = file_inode(dst_file);
> -
> -               ret = mnt_want_write_file(dst_file);
> -               if (ret) {
> -                       info->status = ret;
> -                       goto next_loop;
> -               }
> -
> -               dst_off = info->dest_offset;
> -               ret = clone_verify_area(dst_file, dst_off, len, true);
> -               if (ret < 0) {
> -                       info->status = ret;
> -                       goto next_file;
> -               }
> -               ret = 0;
>
>                 if (info->reserved) {
>                         info->status = -EINVAL;
> -               } else if (!(is_admin || (dst_file->f_mode & FMODE_WRITE))) {
> -                       info->status = -EINVAL;
> -               } else if (file->f_path.mnt != dst_file->f_path.mnt) {
> -                       info->status = -EXDEV;
> -               } else if (S_ISDIR(dst->i_mode)) {
> -                       info->status = -EISDIR;
> -               } else if (dst_file->f_op->dedupe_file_range == NULL) {
> -                       info->status = -EINVAL;
> -               } else {
> -                       deduped = dst_file->f_op->dedupe_file_range(file, off,
> -                                                       dst_file,
> -                                                       info->dest_offset, len);
> -                       if (deduped == -EBADE)
> -                               info->status = FILE_DEDUPE_RANGE_DIFFERS;
> -                       else if (deduped < 0)
> -                               info->status = deduped;
> -                       else
> -                               info->bytes_deduped += deduped;
> +                       goto next_loop;
>                 }
>
> -next_file:
> -               mnt_drop_write_file(dst_file);
> +               deduped = vfs_dedupe_file_range_one(file, off, dst_file,
> +                                                   info->dest_offset, len);
> +               if (deduped == -EBADE)
> +                       info->status = FILE_DEDUPE_RANGE_DIFFERS;
> +               else if (deduped < 0)
> +                       info->status = deduped;
> +               else
> +                       info->bytes_deduped += deduped;
> +
>  next_loop:
>                 fdput(dst_fd);
>

Please note that this patch conflicts with, but is also an alternative to, commit
227627114799 ("fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()")
on Al's fixes => for-next branch.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 03/39] vfs: dedupe: extract helper for a single dedup
  2018-05-29 15:41   ` Amir Goldstein
@ 2018-05-29 16:04     ` Amir Goldstein
  0 siblings, 0 replies; 83+ messages in thread
From: Amir Goldstein @ 2018-05-29 16:04 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 6:41 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, May 29, 2018 at 5:43 PM, Miklos Szeredi <mszeredi@redhat.com> wrote:
>> Extract vfs_dedupe_file_range_one() helper to deal with a single dedup
>> request.
>>
>> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
>> ---
>>  fs/read_write.c | 89 +++++++++++++++++++++++++++++++--------------------------
>>  1 file changed, 49 insertions(+), 40 deletions(-)
>>
>> diff --git a/fs/read_write.c b/fs/read_write.c
>> index 1818581cadf6..82a53c44c0aa 100644
>> --- a/fs/read_write.c
>> +++ b/fs/read_write.c
>> @@ -1964,6 +1964,44 @@ int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>>  }
>>  EXPORT_SYMBOL(vfs_dedupe_file_range_compare);
>>
>> +static s64 vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
>> +                                    struct file *dst_file, loff_t dst_pos,
>> +                                    u64 len)
>> +{
>> +       s64 ret;
>> +
>> +       ret = mnt_want_write_file(dst_file);
>> +       if (ret)
>> +               return ret;
>> +
>> +       ret = clone_verify_area(dst_file, dst_pos, len, true);
>> +       if (ret < 0)
>> +               goto out_drop_write;
>> +
>> +       ret = -EINVAL;
>> +       if (!(capable(CAP_SYS_ADMIN) || (dst_file->f_mode & FMODE_WRITE)))
>> +               goto out_drop_write;
>> +
>> +       ret = -EXDEV;
>> +       if (src_file->f_path.mnt != dst_file->f_path.mnt)
>> +               goto out_drop_write;
>> +
>> +       ret = -EISDIR;
>> +       if (S_ISDIR(file_inode(dst_file)->i_mode))
>> +               goto out_drop_write;
>> +
>> +       ret = -EINVAL;
>> +       if (!dst_file->f_op->dedupe_file_range)
>> +               goto out_drop_write;
>> +
>> +       ret = dst_file->f_op->dedupe_file_range(src_file, src_pos,
>> +                                               dst_file, dst_pos, len);
>> +out_drop_write:
>> +       mnt_drop_write_file(dst_file);
>> +
>> +       return ret;
>> +}
>> +
>>  int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
>>  {
>>         struct file_dedupe_range_info *info;
>> @@ -1972,10 +2010,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
>>         u64 len;
>>         int i;
>>         int ret;
>> -       bool is_admin = capable(CAP_SYS_ADMIN);
>>         u16 count = same->dest_count;
>> -       struct file *dst_file;
>> -       loff_t dst_off;
>>         loff_t deduped;
>>
>>         if (!(file->f_mode & FMODE_READ))
>> @@ -2010,54 +2045,28 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
>>         }
>>
>>         for (i = 0, info = same->info; i < count; i++, info++) {
>> -               struct inode *dst;
>>                 struct fd dst_fd = fdget(info->dest_fd);
>> +               struct file *dst_file = dst_fd.file;
>>
>> -               dst_file = dst_fd.file;
>>                 if (!dst_file) {
>>                         info->status = -EBADF;
>>                         goto next_loop;
>>                 }
>> -               dst = file_inode(dst_file);
>> -
>> -               ret = mnt_want_write_file(dst_file);
>> -               if (ret) {
>> -                       info->status = ret;
>> -                       goto next_loop;
>> -               }
>> -
>> -               dst_off = info->dest_offset;
>> -               ret = clone_verify_area(dst_file, dst_off, len, true);
>> -               if (ret < 0) {
>> -                       info->status = ret;
>> -                       goto next_file;
>> -               }
>> -               ret = 0;
>>
>>                 if (info->reserved) {
>>                         info->status = -EINVAL;
>> -               } else if (!(is_admin || (dst_file->f_mode & FMODE_WRITE))) {
>> -                       info->status = -EINVAL;
>> -               } else if (file->f_path.mnt != dst_file->f_path.mnt) {
>> -                       info->status = -EXDEV;
>> -               } else if (S_ISDIR(dst->i_mode)) {
>> -                       info->status = -EISDIR;
>> -               } else if (dst_file->f_op->dedupe_file_range == NULL) {
>> -                       info->status = -EINVAL;
>> -               } else {
>> -                       deduped = dst_file->f_op->dedupe_file_range(file, off,
>> -                                                       dst_file,
>> -                                                       info->dest_offset, len);
>> -                       if (deduped == -EBADE)
>> -                               info->status = FILE_DEDUPE_RANGE_DIFFERS;
>> -                       else if (deduped < 0)
>> -                               info->status = deduped;
>> -                       else
>> -                               info->bytes_deduped += deduped;
>> +                       goto next_loop;
>>                 }
>>
>> -next_file:
>> -               mnt_drop_write_file(dst_file);
>> +               deduped = vfs_dedupe_file_range_one(file, off, dst_file,
>> +                                                   info->dest_offset, len);
>> +               if (deduped == -EBADE)
>> +                       info->status = FILE_DEDUPE_RANGE_DIFFERS;
>> +               else if (deduped < 0)
>> +                       info->status = deduped;
>> +               else
>> +                       info->bytes_deduped += deduped;
>> +
>>  next_loop:
>>                 fdput(dst_fd);
>>
>
> Please note that this patch conflicts with, but is also an alternative to, commit
> 227627114799 ("fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()")
> on Al's fixes => for-next branch.
>

Sorry, that's a conflict, and a rather trivial one, but Miklos' patch is not
an alternative fix.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 27/39] ovl: obsolete "check_copy_up" module option
  2018-05-29 15:13   ` Amir Goldstein
@ 2018-05-30  8:26     ` Miklos Szeredi
  0 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-05-30  8:26 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 5:13 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, May 29, 2018 at 5:43 PM, Miklos Szeredi <mszeredi@redhat.com> wrote:

>> +       WARN(1, "overlayfs: \"check_copy_up\" module option is obsolete\n");
>
> I was under the impression that user-controlled input should not be generating
> WARNings... did you mean pr_warn?

Okay.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/39] vfs: dedpue: return loff_t
  2018-05-29 14:43 ` [PATCH 01/39] vfs: dedpue: return loff_t Miklos Szeredi
@ 2018-06-04  8:43   ` Christoph Hellwig
  2018-06-05  8:33     ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2018-06-04  8:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-unionfs, linux-fsdevel, linux-kernel, linux-xfs, ocfs2-devel

On Tue, May 29, 2018 at 04:43:01PM +0200, Miklos Szeredi wrote:
> f_op->dedupe_file_range() gets a u64 length to dedup and returns an ssize_t
> actual length deduped.  This breaks badly on 32bit archs since the returned
> length will be truncated and possibly overflow into the sign bit (xfs and
> ocfs2 are affected, btrfs limits actual length to 16MiB).

Can we just make it return 0 vs errno?  The only time we return
a different length than the one passed in is due to the btrfs cap.

Given that this API started out on btrfs we should just do the cap
everywhere to not confuse userspace.
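
To make the 32-bit problem from the quoted commit message concrete, here is a small userspace sketch.  ssize32_t and the three function names are hypothetical; int32_t merely stands in for ssize_t on a 32-bit architecture, and the 16MiB cap is the btrfs figure quoted above.  Converting an out-of-range value to a signed type is implementation-defined in C; the sketch assumes the usual two's-complement wraparound.

```c
#include <stdint.h>

/* Hypothetical stand-in for ssize_t on a 32-bit architecture. */
typedef int32_t ssize32_t;

/* Cap quoted in the thread for btrfs: 16MiB per dedupe request. */
#define DEDUPE_CAP_BYTES (16ULL << 20)

/* Returning a u64 length through a 32-bit signed type drops the top bits. */
static ssize32_t len_as_ssize32(uint64_t len)
{
	/* Implementation-defined conversion; wraps on common compilers. */
	return (ssize32_t)(uint32_t)len;
}

/* Returning it through a 64-bit signed type (loff_t-like) keeps the value. */
static int64_t len_as_loff(uint64_t len)
{
	return (int64_t)len;
}

/* Capping the length first keeps any return value inside 32-bit range. */
static uint64_t cap_len(uint64_t len)
{
	return len < DEDUPE_CAP_BYTES ? len : DEDUPE_CAP_BYTES;
}
```

Under these assumptions a 2GiB length comes back negative from len_as_ssize32() and so looks like an error, a 4GiB length comes back as 0, and a capped length always survives the 32-bit return.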

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 03/39] vfs: dedupe: extract helper for a single dedup
  2018-05-29 14:43 ` [PATCH 03/39] vfs: dedupe: extract helper for a single dedup Miklos Szeredi
  2018-05-29 15:41   ` Amir Goldstein
@ 2018-06-04  8:44   ` Christoph Hellwig
  1 sibling, 0 replies; 83+ messages in thread
From: Christoph Hellwig @ 2018-06-04  8:44 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel, linux-kernel

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 04/39] vfs: add path_open()
  2018-05-29 14:43 ` [PATCH 04/39] vfs: add path_open() Miklos Szeredi
@ 2018-06-04  8:46   ` Christoph Hellwig
  2018-06-10  4:36     ` Al Viro
  0 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2018-06-04  8:46 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel, linux-kernel

> +EXPORT_SYMBOL(path_open);

EXPORT_SYMBOL_GPL, please.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 05/39] vfs: optionally don't account file in nr_files
  2018-05-29 14:43 ` [PATCH 05/39] vfs: optionally don't account file in nr_files Miklos Szeredi
@ 2018-06-04  8:47   ` Christoph Hellwig
  2018-06-04  8:57     ` Miklos Szeredi
  2018-06-10  4:41   ` Al Viro
  1 sibling, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2018-06-04  8:47 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 04:43:05PM +0200, Miklos Szeredi wrote:
> Stacking file operations in overlay will store an extra open file for each
> overlay file opened.
> 
> The overhead is just that of "struct file", which is about 256 bytes, because
> overlay already pins an extra dentry and inode when the file is open, which
> add up to a much larger overhead.

But that overhead is exactly what nr_files accounts for, so this looks
bogus to me.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 06/39] vfs: add f_op->pre_mmap()
  2018-05-29 14:43 ` [PATCH 06/39] vfs: add f_op->pre_mmap() Miklos Szeredi
@ 2018-06-04  8:48   ` Christoph Hellwig
  2018-06-05 11:36     ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2018-06-04  8:48 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 04:43:06PM +0200, Miklos Szeredi wrote:
> This is needed by overlayfs to be able to copy up a file from a read-only
> lower layer to a writable layer when being mapped shared.  When copying up,
> overlayfs takes VFS locks that would violate locking order when nested
> inside mmap_sem.
> 
> Add a new f_op->pre_mmap method, which is called before taking mmap_sem.

NAK.  We really should not add multiple methods for mmap, and every time
this has come up we found a better way to solve the problem instead.  The most
recent example was the socket zero copy receive code.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 07/39] vfs: export vfs_ioctl() to modules
  2018-05-29 14:43 ` [PATCH 07/39] vfs: export vfs_ioctl() to modules Miklos Szeredi
@ 2018-06-04  8:49   ` Christoph Hellwig
  2018-06-10  4:57     ` Al Viro
  0 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2018-06-04  8:49 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 04:43:07PM +0200, Miklos Szeredi wrote:
> This is needed by the stacked ioctl implementation in overlayfs.

EXPORT_SYMBOL_GPL for exporting random internals, please.  Same
for any following patches.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 32/39] vfs: fix freeze protection in mnt_want_write_file() for overlayfs
  2018-05-29 14:43 ` [PATCH 32/39] vfs: fix freeze protection in mnt_want_write_file() for overlayfs Miklos Szeredi
@ 2018-06-04  8:50   ` Christoph Hellwig
  0 siblings, 0 replies; 83+ messages in thread
From: Christoph Hellwig @ 2018-06-04  8:50 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 04:43:32PM +0200, Miklos Szeredi wrote:
> The underlying real file used by overlayfs still contains the overlay path.
> This results in mnt_want_write_file() calls by the filesystem getting
> freeze protection on the wrong inode (the overlayfs one instead of the real
> one).
> 
> Fix by using file_inode(file)->i_sb instead of file->f_path.mnt->mnt_sb.

Looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 05/39] vfs: optionally don't account file in nr_files
  2018-06-04  8:47   ` Christoph Hellwig
@ 2018-06-04  8:57     ` Miklos Szeredi
  0 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-04  8:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel

On Mon, Jun 4, 2018 at 10:47 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, May 29, 2018 at 04:43:05PM +0200, Miklos Szeredi wrote:
>> Stacking file operations in overlay will store an extra open file for each
>> overlay file opened.
>>
>> The overhead is just that of "struct file" which is about 256bytes, because
>> overlay already pins an extra dentry and inode when the file is open, which
>> add up to a much larger overhead.
>
> But that overhead is exactly what nr_files accounts for, so this looks
> bogus to me.

According to the comment above files_maxfiles_init(), one open file uses
roughly 1k, which is the total from struct file + pinned dentry +
pinned inode.  The actual struct file is just a quarter of that.

So while overlayfs does currently pin almost 2k per file and,
according to that calculation, should already be using two nr_file
slots, it isn't.  And switching to using two slots means current
setups might well have regressions due to that.

I'm not against switching to two slots, but it's something that would
need to come with backward compatibility guarantees (e.g. explicitly
enabled with boot option, or whatever) and I don't think it's worth
the trouble.

Maintaining the two versions of overlayfs (with and without stacked
fops) also makes little sense.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/39] vfs: dedpue: return loff_t
  2018-06-04  8:43   ` Christoph Hellwig
@ 2018-06-05  8:33     ` Miklos Szeredi
  2018-06-06 15:09       ` Darrick J. Wong
  0 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-05  8:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel,
	linux-xfs, ocfs2-devel

On Mon, Jun 4, 2018 at 10:43 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, May 29, 2018 at 04:43:01PM +0200, Miklos Szeredi wrote:
>> f_op->dedupe_file_range() gets a u64 length to dedup and returns an ssize_t
>> actual length deduped.  This breaks badly on 32bit archs since the returned
>> length will be truncated and possibly overflow into the sign bit (xfs and
>> ocfs2 are affected, btrfs limits actual length to 16MiB).
>
> Can we just make it return 0 vs errno?  The only time we return
> a different length than the one passed in is due to the btrfs cap.
>
> Given that this API started out on btrfs we should just do the cap
> everywhere to not confuse userspace.

And that's a completely arbitrary cap; sure btrfs started out with
that, but there's no fundamental reason for that becoming the global
limit.  Xfs now added a different, larger limit, so based on what
reason should that limit be reduced?

I don't care either way, but at this stage I'm not going to change
this patch, unless there's a very good reason to do so, because if I
do someone will come and suggest another improvement, ad-infinitum...
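
A minimal userspace sketch of the 32-bit problem described in the quoted
patch description, assuming ssize_t is a 32-bit signed type on the affected
architectures.  The names ssize32_t, dedupe_len_as_ssize32() and
dedupe_len_as_loff() are hypothetical and exist only for this illustration;
they are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for ssize_t on a 32-bit architecture (assumption). */
typedef int32_t ssize32_t;

/* Models returning a u64 deduped length through a 32-bit signed type. */
static ssize32_t dedupe_len_as_ssize32(uint64_t len)
{
	/* Keep only the low 32 bits, as a truncating return would. */
	uint32_t low = (uint32_t)len;

	/* Reinterpret bit 31 as the sign bit, spelled out explicitly so the
	 * result does not depend on implementation-defined conversions. */
	if (low & 0x80000000u)
		return (ssize32_t)(-(int64_t)(0x100000000ull - low));
	return (ssize32_t)low;
}

/* Models the 64-bit signed (loff_t-like) return type from the patch. */
static int64_t dedupe_len_as_loff(uint64_t len)
{
	assert(len <= (uint64_t)INT64_MAX);
	return (int64_t)len;
}
```

With these helpers a 3GiB length comes back negative through the 32-bit
type and a 4GiB length comes back as 0, while the 64-bit return preserves
both.  A 16MiB length (the btrfs cap) survives either way.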

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 06/39] vfs: add f_op->pre_mmap()
  2018-06-04  8:48   ` Christoph Hellwig
@ 2018-06-05 11:36     ` Miklos Szeredi
  0 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-05 11:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel

On Mon, Jun 4, 2018 at 10:48 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, May 29, 2018 at 04:43:06PM +0200, Miklos Szeredi wrote:
>> This is needed by overlayfs to be able to copy up a file from a read-only
>> lower layer to a writable layer when being mapped shared.  When copying up,
>> overlayfs takes VFS locks that would violate locking order when nested
>> inside mmap_sem.
>>
>> Add a new f_op->pre_mmap method, which is called before taking mmap_sem.
>
> NAK.  We really should not add multiple methods for mmap, and every time
> this came up we found a better way to solve the problem instead.  Most
> recent example was the socket zero copy receive code.

Okay, I'll drop this.

Not sure if it's better, but I have an idea for solving this without pre_mmap():

 - Private maps of lower files continue to use the underlying fs'
mapping.  This keeps the nice page sharing properties of overlays for
shared libraries, executables and most read-only uses.

 - Shared maps of lower file and all maps of upper files go to
overlayfs's own page cache.  In these cases we can't have shared
mappings, so it basically doesn't matter if the cache resides in the
underlying inode or the overlay inode.
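
The split proposed in the two points above can be sketched as a small
decision function.  This is only an illustration of the idea as stated in
this message; the enum and the function name sk_pick_mapping() are
hypothetical and do not exist in overlayfs.

```c
#include <assert.h>
#include <stdbool.h>

/* Which page cache backs the mapping (hypothetical names). */
enum sk_mapping_owner {
	SK_UNDERLYING_FS,	/* share pages with the real lower inode */
	SK_OVERLAY,		/* overlayfs's own page cache */
};

static enum sk_mapping_owner sk_pick_mapping(bool on_upper, bool map_shared)
{
	/* Private maps of lower files keep using the underlying fs'
	 * mapping, preserving page sharing between overlays. */
	if (!on_upper && !map_shared)
		return SK_UNDERLYING_FS;

	/* Shared maps of lower files and all maps of upper files go to
	 * the overlay's own page cache. */
	return SK_OVERLAY;
}
```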

The implementation is certainly going to be more complex, since we'll
have to add address space ops to overlayfs.  The advantage will be
that we won't actually have to do the copy up when a lower file is
mapped with MAP_SHARED.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 02/39] vfs: dedupe: rationalize args
  2018-05-29 14:43 ` [PATCH 02/39] vfs: dedupe: rationalize args Miklos Szeredi
@ 2018-06-06 15:02   ` Darrick J. Wong
  0 siblings, 0 replies; 83+ messages in thread
From: Darrick J. Wong @ 2018-06-06 15:02 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 04:43:02PM +0200, Miklos Szeredi wrote:
> Clean up f_op->dedupe_file_range() interface.
> 
> 1) Use loff_t for offsets and length instead of u64
> 2) Order the arguments the same way as {copy|clone}_file_range().
> 
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  fs/btrfs/ctree.h   | 5 +++--
>  fs/btrfs/ioctl.c   | 5 +++--
>  fs/ocfs2/file.c    | 6 +++---
>  fs/read_write.c    | 4 ++--
>  fs/xfs/xfs_file.c  | 6 +++---
>  include/linux/fs.h | 4 ++--
>  6 files changed, 16 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 990e011c9f0c..5968ba5aa0d1 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3271,8 +3271,9 @@ void btrfs_get_block_group_info(struct list_head *groups_list,
>  				struct btrfs_ioctl_space_info *space);
>  void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
>  			       struct btrfs_ioctl_balance_args *bargs);
> -loff_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
> -			    struct file *dst_file, u64 dst_loff);
> +loff_t btrfs_dedupe_file_range(struct file *src_file, loff_t loff,
> +			       struct file *dst_file, loff_t dst_loff,
> +			       loff_t olen);
>  
>  /* file.c */
>  int __init btrfs_auto_defrag_init(void);
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 1b5cc5fd4868..70eac76804df 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3194,8 +3194,9 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
>  
>  #define BTRFS_MAX_DEDUPE_LEN	SZ_16M
>  
> -loff_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
> -			       struct file *dst_file, u64 dst_loff)
> +loff_t btrfs_dedupe_file_range(struct file *src_file, loff_t loff,
> +			       struct file *dst_file, loff_t dst_loff,
> +			       loff_t olen)
>  {
>  	struct inode *src = file_inode(src_file);
>  	struct inode *dst = file_inode(dst_file);
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 4a81d82ab7f6..a024715cd227 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -2538,10 +2538,10 @@ static int ocfs2_file_clone_range(struct file *file_in,
>  }
>  
>  static loff_t ocfs2_file_dedupe_range(struct file *src_file,
> -				      u64 loff,
> -				      u64 len,
> +				      loff_t loff,
>  				      struct file *dst_file,
> -				      u64 dst_loff)
> +				      loff_t dst_loff,
> +				      loff_t len)
>  {
>  	int error;
>  
> diff --git a/fs/read_write.c b/fs/read_write.c
> index c41e2a1eb7c7..1818581cadf6 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -2046,8 +2046,8 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
>  			info->status = -EINVAL;
>  		} else {
>  			deduped = dst_file->f_op->dedupe_file_range(file, off,
> -							len, dst_file,
> -							info->dest_offset);
> +							dst_file,
> +							info->dest_offset, len);
>  			if (deduped == -EBADE)
>  				info->status = FILE_DEDUPE_RANGE_DIFFERS;
>  			else if (deduped < 0)
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index cf51d47efdb6..75704edfba82 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -875,10 +875,10 @@ xfs_file_clone_range(
>  STATIC loff_t
>  xfs_file_dedupe_range(
>  	struct file	*src_file,
> -	u64		loff,
> -	u64		len,
> +	loff_t		loff,
>  	struct file	*dst_file,
> -	u64		dst_loff)
> +	loff_t		dst_loff,
> +	loff_t		len)
>  {
>  	struct inode	*srci = file_inode(src_file);
>  	u64		max_dedupe;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 8e49defc7aab..b0f290944220 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1738,8 +1738,8 @@ struct file_operations {
>  			loff_t, size_t, unsigned int);
>  	int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t,
>  			u64);
> -	loff_t (*dedupe_file_range)(struct file *, u64, u64, struct file *,
> -			u64);
> +	loff_t (*dedupe_file_range)(struct file *, loff_t,
> +				    struct file *, loff_t, loff_t);
>  } __randomize_layout;

XFS/vfs parts look ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

>  
>  struct inode_operations {
> -- 
> 2.14.3
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/39] vfs: dedpue: return loff_t
  2018-06-05  8:33     ` Miklos Szeredi
@ 2018-06-06 15:09       ` Darrick J. Wong
  2018-06-18 20:08         ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Darrick J. Wong @ 2018-06-06 15:09 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christoph Hellwig, Miklos Szeredi, overlayfs, linux-fsdevel,
	linux-kernel, linux-xfs, ocfs2-devel

On Tue, Jun 05, 2018 at 10:33:22AM +0200, Miklos Szeredi wrote:
> On Mon, Jun 4, 2018 at 10:43 AM, Christoph Hellwig <hch@infradead.org> wrote:
> > On Tue, May 29, 2018 at 04:43:01PM +0200, Miklos Szeredi wrote:
> >> f_op->dedupe_file_range() gets a u64 length to dedup and returns an ssize_t
> >> actual length deduped.  This breaks badly on 32bit archs since the returned
> >> length will be truncated and possibly overflow into the sign bit (xfs and
> >> ocfs2 are affected, btrfs limits actual length to 16MiB).
> >
> > Can we just make it return 0 vs errno?  The only time we return
> > a different length than the one passed in is due to the btrfs cap.
> >
> > Given that this API started out on btrfs we should just do the cap
> > everywhere to not confuse userspace.
> 
> And that's a completely arbitrary cap; sure btrfs started out with
> that, but there's no fundamental reason for that becoming the global
> limit.  Xfs now added a different, larger limit, so based on what
> reason should that limit be reduced?
> 
> I don't care either way, but at this stage I'm not going to change
> this patch, unless there's a very good reason to do so, because if I
> do someone will come and suggest another improvement, ad-infinitum...

I think we should hoist the MAX_RW_COUNT/2 limit to the VFS helpers
since afaict we generally cap max IO per call at MAX_RW_COUNT.  (I
probably should've capped ocfs2 back when I did xfs, but forgot).  If
btrfs wants to keep their lower (16M) limit then they're free to do so;
the interface documentation allows for this.  One of the btrfs
developers seems to be working on a patch series to raise the limit[1]
anyway.
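
A rough userspace sketch of such a hoisted cap, assuming (as I understand
the kernel headers) that MAX_RW_COUNT is INT_MAX rounded down to a page
boundary.  The 4096-byte page size and the SKETCH_* and clamp_dedupe_len()
names are assumptions made for this example, not existing VFS interfaces.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096LL	/* assumed page size */
#define SKETCH_MAX_RW_COUNT	((int64_t)INT_MAX & ~(SKETCH_PAGE_SIZE - 1))

/*
 * Clamp a requested dedupe length to MAX_RW_COUNT/2, rounded down to the
 * filesystem block size, so that reading both ranges for comparison does
 * not exceed MAX_RW_COUNT of I/O per call.
 */
static int64_t clamp_dedupe_len(int64_t len, int64_t blocksize)
{
	int64_t max_dedupe;

	/* Block size must be a power of two for the mask below to work. */
	assert(blocksize > 0 && (blocksize & (blocksize - 1)) == 0);

	max_dedupe = (SKETCH_MAX_RW_COUNT >> 1) & ~(blocksize - 1);
	return len > max_dedupe ? max_dedupe : len;
}
```

A filesystem that wants a lower limit, such as the 16M btrfs cap, would
simply clamp again on top of this.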

--D

[1] https://www.spinics.net/lists/linux-btrfs/msg78392.html

> 
> Thanks,
> Miklos

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 14/39] ovl: stack file ops
  2018-05-29 14:43 ` [PATCH 14/39] ovl: stack file ops Miklos Szeredi
@ 2018-06-10  4:13   ` Al Viro
  2018-06-11  7:09     ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Al Viro @ 2018-06-10  4:13 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 04:43:14PM +0200, Miklos Szeredi wrote:
> Implement file operations on a regular overlay file.  The underlying file
> is opened separately and cached in ->private_data.
> 
> It might be worth making an exception for such files when accounting in
> nr_file to conform to userspace expectations.  We are only adding a small
> overhead (248bytes for the struct file) since the real inode and dentry are
> pinned by overlayfs anyway.
> 
> This patch doesn't have any effect, since the vfs will use d_real() to find
> the real underlying file to open.  The patch at the end of the series will
> actually enable this functionality.

> +static struct file *ovl_open_realfile(const struct file *file)
> +{
> +	struct inode *inode = file_inode(file);
> +	struct inode *upperinode = ovl_inode_upper(inode);
> +	struct inode *realinode = upperinode ?: ovl_inode_lower(inode);
> +	struct file *realfile;
> +	const struct cred *old_cred;
> +
> +	old_cred = ovl_override_creds(inode->i_sb);
> +	realfile = path_open(&file->f_path, file->f_flags | O_NOATIME,
> +			     realinode, current_cred(), false);
> +	revert_creds(old_cred);
> +
> +	pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n",
> +		 file, file, upperinode ? 'u' : 'l', file->f_flags,
> +		 realfile, IS_ERR(realfile) ? 0 : realfile->f_flags);
> +
> +	return realfile;
> +}

IDGI.  OK, you open a file in the layer you want; good, but why the hell do you
*not* use the dentry/vfsmount from the same layer?

IOW, why does your path_open() get an explicit inode argument at all?  With the
rest of the work done in that series it looks like you should be able to use
vfs_open() instead...  Sure, for ovlfs file you want ->f_path on overlayfs and
not in a layer, but why do the same for those?

And why bother with override_creds at all?  What's wrong with simply passing
->creator_cred to path_open()/vfs_open()/whatnot?

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 04/39] vfs: add path_open()
  2018-06-04  8:46   ` Christoph Hellwig
@ 2018-06-10  4:36     ` Al Viro
  0 siblings, 0 replies; 83+ messages in thread
From: Al Viro @ 2018-06-10  4:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miklos Szeredi, linux-unionfs, linux-fsdevel, linux-kernel

On Mon, Jun 04, 2018 at 01:46:09AM -0700, Christoph Hellwig wrote:
> > +EXPORT_SYMBOL(path_open);
> 
> EXPORT_SYMBOL_GPL, please.

	No.

	If interface makes sense, export it.  If it doesn't, don't.
Don't mix "it's a shit API, but we need it for some in-kernel module"
with "out-of-tree code should be GPL, especially if it uses this".
For non-trivial work I will, teeth gritting, accept that kind of
stuff.  For anything as trivial as this - fuck, no.

	In this particular case, it *is* a dubious API - AFAICS,
ovl_open_realfile() could just pass vfsmount/dentry from the right
layer to vfs_open().  We might or might not need path_open() for the
duration of the series (I hadn't looked into the PITA it would be
to reorder), but it really looks like it could disappear by the end
of it, along with the temporary export.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 05/39] vfs: optionally don't account file in nr_files
  2018-05-29 14:43 ` [PATCH 05/39] vfs: optionally don't account file in nr_files Miklos Szeredi
  2018-06-04  8:47   ` Christoph Hellwig
@ 2018-06-10  4:41   ` Al Viro
  1 sibling, 0 replies; 83+ messages in thread
From: Al Viro @ 2018-06-10  4:41 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 04:43:05PM +0200, Miklos Szeredi wrote:
> +++ b/fs/open.c
> @@ -732,8 +732,8 @@ static int do_dentry_open(struct file *f,
>  	static const struct file_operations empty_fops = {};
>  	int error;
>  
> -	f->f_mode = OPEN_FMODE(f->f_flags) | FMODE_LSEEK |
> -				FMODE_PREAD | FMODE_PWRITE;
> +	f->f_mode = (f->f_mode & FMODE_NOACCOUNT) | OPEN_FMODE(f->f_flags) |
> +		FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE;

Why bother with this complexity?  I mean, why not simply
	f->f_mode |= OPEN_FMODE(f->f_flags) | FMODE_LSEEK |
				FMODE_PREAD | FMODE_PWRITE;

and be done with that...
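
To illustrate the point: the two forms give the same result as long as
f_mode carries nothing but the "noaccount" bit on entry.  The sketch below
is a userspace model; the SK_* flag values are placeholders chosen for the
example (in particular the value for the NOACCOUNT bit is an assumption),
and the function names are hypothetical.

```c
#include <assert.h>

#define SK_FMODE_READ		0x1u
#define SK_FMODE_WRITE		0x2u
#define SK_FMODE_LSEEK		0x4u
#define SK_FMODE_PREAD		0x8u
#define SK_FMODE_PWRITE		0x10u
#define SK_FMODE_NOACCOUNT	0x20000000u	/* placeholder value */

#define SK_FMODE_DEFAULTS (SK_FMODE_LSEEK | SK_FMODE_PREAD | SK_FMODE_PWRITE)

/* The form in the patch: keep only NOACCOUNT from the old f_mode. */
static unsigned int sk_f_mode_masked(unsigned int f_mode, unsigned int open_fmode)
{
	assert((open_fmode & ~(SK_FMODE_READ | SK_FMODE_WRITE)) == 0);
	return (f_mode & SK_FMODE_NOACCOUNT) | open_fmode | SK_FMODE_DEFAULTS;
}

/* The simpler form: OR the new bits into whatever is already there. */
static unsigned int sk_f_mode_ored(unsigned int f_mode, unsigned int open_fmode)
{
	assert((open_fmode & ~(SK_FMODE_READ | SK_FMODE_WRITE)) == 0);
	return f_mode | open_fmode | SK_FMODE_DEFAULTS;
}
```

The two only diverge if some other stale bit is already set in f_mode
before do_dentry_open() runs, which a freshly allocated struct file
should not have.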

> @@ -743,7 +743,7 @@ static int do_dentry_open(struct file *f,
>  	f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
>  
>  	if (unlikely(f->f_flags & O_PATH)) {
> -		f->f_mode = FMODE_PATH;
> +		f->f_mode = (f->f_mode & FMODE_NOACCOUNT) | FMODE_PATH;

That makes no sense at all.  What would ever pass O_PATH opens
from "noaccount" call site?

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 07/39] vfs: export vfs_ioctl() to modules
  2018-06-04  8:49   ` Christoph Hellwig
@ 2018-06-10  4:57     ` Al Viro
  2018-06-11  7:19       ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Al Viro @ 2018-06-10  4:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miklos Szeredi, linux-unionfs, linux-fsdevel, linux-kernel

On Mon, Jun 04, 2018 at 01:49:04AM -0700, Christoph Hellwig wrote:
> On Tue, May 29, 2018 at 04:43:07PM +0200, Miklos Szeredi wrote:
> > This is needed by the stacked ioctl implementation in overlayfs.
> 
> EXPORT_SYMBOL_GPL for exporting random internals, please.  Same
> for any following patches.

*blink*

Christoph, get real and RTFS - vfs_ioctl() simply calls ->unlocked_ioctl();
all there is to it.

This isn't even a case of "using that function establishes that the
caller is a derived work" - *anyone* who can see definition of
file_operations can bloody well open-code it.  There isn't anything
establishing derivation here.

Hell, it could've been a static inline in include/linux/fs.h and it would
neither differ from many other inlines in there nor need an export at all.
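
For reference, a userspace model of how little there is to open-code.  The
struct layouts are cut down to the single member that matters, all sk_*
names are hypothetical, and the ENOIOCTLCMD-to-ENOTTY translation is
included on the assumption that it matches what fs/ioctl.c does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SK_ENOIOCTLCMD 515	/* kernel-internal "no ioctl command" code */

struct sk_file;

struct sk_file_operations {
	long (*unlocked_ioctl)(struct sk_file *, unsigned int, unsigned long);
};

struct sk_file {
	const struct sk_file_operations *f_op;
};

/* Model of vfs_ioctl(): call ->unlocked_ioctl() if there is one. */
static long sk_vfs_ioctl(struct sk_file *filp, unsigned int cmd,
			 unsigned long arg)
{
	long error = -ENOTTY;

	assert(filp && filp->f_op);
	if (!filp->f_op->unlocked_ioctl)
		return error;

	error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
	if (error == -SK_ENOIOCTLCMD)
		error = -ENOTTY;
	return error;
}

/* Two toy handlers so the model can be exercised. */
static long sk_echo_ioctl(struct sk_file *f, unsigned int cmd, unsigned long arg)
{
	(void)f; (void)arg;
	return (long)cmd;
}

static long sk_unknown_ioctl(struct sk_file *f, unsigned int cmd, unsigned long arg)
{
	(void)f; (void)cmd; (void)arg;
	return -SK_ENOIOCTLCMD;
}

static const struct sk_file_operations sk_echo_ops = { .unlocked_ioctl = sk_echo_ioctl };
static const struct sk_file_operations sk_unknown_ops = { .unlocked_ioctl = sk_unknown_ioctl };
static const struct sk_file_operations sk_empty_ops = { .unlocked_ioctl = NULL };
```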

This is really getting close to lxo-worthy levels of bogosity...

More interesting question is why do we want to pass those ioctls to layers
in the first place, especially if it's something with different availability
(or, worse yet, argument layouts) before and after copyup.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 19/39] ovl: add ovl_mmap()
  2018-05-29 14:43 ` [PATCH 19/39] ovl: add ovl_mmap() Miklos Szeredi
@ 2018-06-10  5:24   ` Al Viro
  2018-06-11  7:58     ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Al Viro @ 2018-06-10  5:24 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 04:43:19PM +0200, Miklos Szeredi wrote:
> Implement stacked mmap.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  fs/overlayfs/file.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 7b47dce4b072..4057bbf2e141 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -255,6 +255,33 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>  	return ret;
>  }
>  
> +static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct fd real;
> +	const struct cred *old_cred;
> +	int ret;
> +
> +	ret = ovl_real_fdget(file, &real);
> +	if (ret)
> +		return ret;
> +
> +	/* transfer ref: */
> +	fput(vma->vm_file);
> +	vma->vm_file = get_file(real.file);
> +	fdput(real);
> +
> +	if (!vma->vm_file->f_op->mmap)
> +		return -ENODEV;

That's broken.  ->mmap() failure will fput(file), not fput(vma->vm_file).
What's more, _here_ your "corner case" is a huge DoS - open file r/o,
then have somebody else trigger copyup, then do tons of MAP_PRIVATE
mmaps on the r/o descriptor.  *EACH* *OF* *THEM* will open a separate
struct file and stash it into new vmas.

NAK with extreme prejudice, sensu PTerry...

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 23/39] ovl: add O_DIRECT support
  2018-05-29 14:43 ` [PATCH 23/39] ovl: add O_DIRECT support Miklos Szeredi
@ 2018-06-10  5:31   ` Al Viro
  2018-06-11  8:08     ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Al Viro @ 2018-06-10  5:31 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 04:43:23PM +0200, Miklos Szeredi wrote:
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  fs/overlayfs/file.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 3f610a5b38e4..e5e7ccaaf9ec 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -110,6 +110,9 @@ static int ovl_open(struct inode *inode, struct file *file)
>  	if (IS_ERR(realfile))
>  		return PTR_ERR(realfile);
>  
> +	/* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO */
> +	file->f_mapping = realfile->f_mapping;

Umm...  What happens if upper layer doesn't allow O_DIRECT, while the lower one does?

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 15/39] ovl: add helper to return real file
  2018-05-29 14:43 ` [PATCH 15/39] ovl: add helper to return real file Miklos Szeredi
@ 2018-06-10  5:42   ` Al Viro
  2018-06-11  8:11     ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Al Viro @ 2018-06-10  5:42 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs, linux-fsdevel, linux-kernel

On Tue, May 29, 2018 at 04:43:15PM +0200, Miklos Szeredi wrote:
> In the common case we can just use the real file cached in
> file->private_data.  There are two exceptions:
> 
> 1) File has been copied up since open: in this unlikely corner case just
> use a throwaway real file for the operation.  If ever this becomes a
> performance problem (very unlikely, since overlayfs has been doing mostly fine
> without correctly handling this case at all), then we can deal with that by
> updating the cached real file.

See the ovl_mmap() problem.  FWIW, I would probably suggest something along
the lines of
	->private_data either points to struct file, or is 1 | address of
2-element array of struct file *
	odd value => mask bit 0 away, cast to struct file ** and dereference
	even value and it's still in the right layer => use that
	even value and it is in the wrong layer =>
		allocate a two-pointer array
		open in the right layer
		stick that into array[0] and original - into array[1]
		cmpxchg array | 1 into ->private_data
		if that succeeds
			return array[0]
		else
			fput array[0], free array, then use the value returned
			by cmpxchg - mask bit 0 away, cast and dereference

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 14/39] ovl: stack file ops
  2018-06-10  4:13   ` Al Viro
@ 2018-06-11  7:09     ` Miklos Szeredi
  2018-06-12  2:29       ` Al Viro
  0 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-11  7:09 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel

On Sun, Jun 10, 2018 at 6:13 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, May 29, 2018 at 04:43:14PM +0200, Miklos Szeredi wrote:
>> Implement file operations on a regular overlay file.  The underlying file
>> is opened separately and cached in ->private_data.
>>
>> It might be worth making an exception for such files when accounting in
>> nr_file to conform to userspace expectations.  We are only adding a small
>> overhead (248bytes for the struct file) since the real inode and dentry are
>> pinned by overlayfs anyway.
>>
>> This patch doesn't have any effect, since the vfs will use d_real() to find
>> the real underlying file to open.  The patch at the end of the series will
>> actually enable this functionality.
>
>> +static struct file *ovl_open_realfile(const struct file *file)
>> +{
>> +     struct inode *inode = file_inode(file);
>> +     struct inode *upperinode = ovl_inode_upper(inode);
>> +     struct inode *realinode = upperinode ?: ovl_inode_lower(inode);
>> +     struct file *realfile;
>> +     const struct cred *old_cred;
>> +
>> +     old_cred = ovl_override_creds(inode->i_sb);
>> +     realfile = path_open(&file->f_path, file->f_flags | O_NOATIME,
>> +                          realinode, current_cred(), false);
>> +     revert_creds(old_cred);
>> +
>> +     pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n",
>> +              file, file, upperinode ? 'u' : 'l', file->f_flags,
>> +              realfile, IS_ERR(realfile) ? 0 : realfile->f_flags);
>> +
>> +     return realfile;
>> +}
>
> IDGI.  OK, you open a file in the layer you want; good, but why the hell do you
> *not* use the dentry/vfsmount from the same layer?
>
> IOW, why does your path_open() get an explicit inode argument at all?  With the
> rest of the work done in that series it looks like you should be able to use
> vfs_open() instead...  Sure, for ovlfs file you want ->f_path on overlayfs and
> not in a layer, but why do the same for those?

I'd really like to get there some time but...

List of basic requirements:

 - Private mmap of overlay file shares page cache with lower file (and
hence with all other overlays using the same lower file).

 - /proc/PID/maps shows correct path.

Thought about setting f_mapping/i_mapping of overlay file to that of
underlying file.  But that breaks when doing a copy-up.  We can't just
go and change those mapping pointers, assumption is that those remain
constant (we'd need READ_ONCE() for all cases where we use the mapping
more than once).  It's probably doable, but it's a large and fragile
change.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 07/39] vfs: export vfs_ioctl() to modules
  2018-06-10  4:57     ` Al Viro
@ 2018-06-11  7:19       ` Miklos Szeredi
  2018-06-11 16:24         ` Christoph Hellwig
  0 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-11  7:19 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Miklos Szeredi, overlayfs, linux-fsdevel,
	linux-kernel

On Sun, Jun 10, 2018 at 6:57 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Jun 04, 2018 at 01:49:04AM -0700, Christoph Hellwig wrote:
>> On Tue, May 29, 2018 at 04:43:07PM +0200, Miklos Szeredi wrote:
>> > This is needed by the stacked ioctl implementation in overlayfs.
>>
>> EXPORT_SYMBOL_GPL for exporting random internals, please.  Same
>> for any following patches.
>
> *blink*
>
> Christoph, get real and RTFS - vfs_ioctl() simply calls ->unlocked_ioctl();
> all there is to it.
>
> This isn't even a case of "using that function establishes that the
> caller is a derived work" - *anyone* who can see definition of
> file_operations can bloody well open-code it.  There isn't anything
> establishing derivation here.
>
> Hell, it could've been a static inline in include/linux/fs.h and it would
> neither differ from many other inlines in there nor need an export at all.
>
> This is really getting close to lxo-worthy levels of bogosity...
>
> More interesting question is why do we want to pass those ioctls to layers
> in the first place, especially if it's something with different availability
> (or, worse yet, argument layouts) before and after copyup.

We don't.  Obviously need to make sure to only ever do ioctl's in
overlayfs that have a common definition across filesystems.  Not a lot
of those, luckily...

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 19/39] ovl: add ovl_mmap()
  2018-06-10  5:24   ` Al Viro
@ 2018-06-11  7:58     ` Miklos Szeredi
  0 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-11  7:58 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, linux-unionfs, linux-fsdevel, linux-kernel

On Sun, Jun 10, 2018 at 06:24:59AM +0100, Al Viro wrote:
> On Tue, May 29, 2018 at 04:43:19PM +0200, Miklos Szeredi wrote:
> > Implement stacked mmap.
> > 
> > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > ---
> >  fs/overlayfs/file.c | 28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> > 
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index 7b47dce4b072..4057bbf2e141 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -255,6 +255,33 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> >  	return ret;
> >  }
> >  
> > +static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +	struct fd real;
> > +	const struct cred *old_cred;
> > +	int ret;
> > +
> > +	ret = ovl_real_fdget(file, &real);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* transfer ref: */
> > +	fput(vma->vm_file);
> > +	vma->vm_file = get_file(real.file);
> > +	fdput(real);
> > +
> > +	if (!vma->vm_file->f_op->mmap)
> > +		return -ENODEV;
> 
> That's broken.  ->mmap() failure will fput(file), not fput(vma->vm_file).
> What's more, _here_ your "corner case" is a huge DoS - open file r/o,
> then have somebody else trigger copyup, then do tons of MAP_PRIVATE
> mmaps on the r/o descriptor.  *EACH* *OF* *THEM* will open a separate
> struct file and stash it into new vmas.
> 
> NAK with extreme prejudice, sensu PTerry...

Okay, okay, got it now.  Incremental below.  It's a step back (mmap after
copy-up will get old data), but not a regression from current state.  Obviously
need to fix properly and I think that's doable together with dealing with shared
map coherency.


diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index f801e1175a0b..b5a6bcc1bcfa 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -282,26 +282,30 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 
 static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 {
-	struct fd real;
+	struct file *realfile = file->private_data;
 	const struct cred *old_cred;
 	int ret;
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
-		return ret;
+	if (!realfile->f_op->mmap)
+		return -ENODEV;
 
-	/* transfer ref: */
-	fput(vma->vm_file);
-	vma->vm_file = get_file(real.file);
-	fdput(real);
+	if (WARN_ON(file != vma->vm_file))
+		return -EIO;
 
-	if (!vma->vm_file->f_op->mmap)
-		return -ENODEV;
+	vma->vm_file = get_file(realfile);
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = call_mmap(vma->vm_file, vma);
 	revert_creds(old_cred);
 
+	if (ret) {
+		/* Drop reference count from new vm_file value */
+		fput(realfile);
+	} else {
+		/* Drop reference count from previous vm_file value */
+		fput(file);
+	}
+
 	ovl_file_accessed(file);
 
 	return ret;
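
The reference handling in the incremental above can be modelled in
userspace as follows.  This is an illustrative sketch only: struct sk_file
carries a plain counter instead of a real file refcount, mmap_ret is a
made-up field that fakes the result of the underlying ->mmap(), and all
sk_* names are hypothetical.

```c
#include <assert.h>

struct sk_file {
	int refs;	/* models the file reference count */
	int mmap_ret;	/* faked return value of the real ->mmap() */
};

struct sk_vma {
	struct sk_file *vm_file;
};

/*
 * On success the vma ends up holding a reference on realfile and the
 * reference it held on the overlay file is dropped.  On failure the
 * extra realfile reference is dropped and the overlay file's reference
 * is left for the caller to release.
 */
static int sk_ovl_mmap(struct sk_file *file, struct sk_file *realfile,
		       struct sk_vma *vma)
{
	int ret;

	assert(vma->vm_file == file);

	realfile->refs++;		/* get_file(realfile) */
	vma->vm_file = realfile;

	ret = realfile->mmap_ret;	/* call_mmap(vma->vm_file, vma) */

	if (ret)
		realfile->refs--;	/* drop ref from new vm_file value */
	else
		file->refs--;		/* drop ref from previous vm_file value */

	return ret;
}
```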

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 23/39] ovl: add O_DIRECT support
  2018-06-10  5:31   ` Al Viro
@ 2018-06-11  8:08     ` Miklos Szeredi
  0 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-11  8:08 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel

On Sun, Jun 10, 2018 at 7:31 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, May 29, 2018 at 04:43:23PM +0200, Miklos Szeredi wrote:
>> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
>> ---
>>  fs/overlayfs/file.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
>> index 3f610a5b38e4..e5e7ccaaf9ec 100644
>> --- a/fs/overlayfs/file.c
>> +++ b/fs/overlayfs/file.c
>> @@ -110,6 +110,9 @@ static int ovl_open(struct inode *inode, struct file *file)
>>       if (IS_ERR(realfile))
>>               return PTR_ERR(realfile);
>>
>> +     /* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO */
>> +     file->f_mapping = realfile->f_mapping;
>
> Umm...  What happens if upper layer doesn't allow O_DIRECT, while the lower one does?

Will get EINVAL on read(2) after copy up.  Not sure if it can be
called a regression, since it's a corner case of a corner case.

I think proper solution is to support O_DIRECT unconditionally on
upper (and for the likes of shmfs, just fall back to "cached" I/O).

Thanks,
Miklos


* Re: [PATCH 15/39] ovl: add helper to return real file
  2018-06-10  5:42   ` Al Viro
@ 2018-06-11  8:11     ` Miklos Szeredi
  0 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-11  8:11 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel

On Sun, Jun 10, 2018 at 7:42 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, May 29, 2018 at 04:43:15PM +0200, Miklos Szeredi wrote:
>> In the common case we can just use the real file cached in
>> file->private_data.  There are two exceptions:
>>
>> 1) File has been copied up since open: in this unlikely corner case just
>> use a throwaway real file for the operation.  If ever this becomes a
>> performance problem (very unlikely, since overlayfs has been doing mostly fine
>> without correctly handling this case at all), then we can deal with that by
>> updating the cached real file.
>
> See the ovl_mmap() problem.  FWIW, I would probably suggest something along
> the lines of
>         ->private_data either points to struct file, or is 1 | address of
> 2-element array of struct file *
>         odd value => mask bit 0 away, cast to struct file ** and dereference
>         even value and it's still in the right layer => use that
>         even value and it is in the wrong layer =>
>                 allocate a two-pointer array
>                 open in the right layer
>                 stick that into array[0] and original - into array[1]
>                 cmpxchg array | 1 into ->private_data
>                 if that succeeds
>                         return array[0]
>                 else
>                         fput array[0], free array, then use the value returned
>                         by cmpxchg - mask bit 0 away, cast and dereference

Iff we really need that complexity, then yes, that's a nice solution.
But I think we don't:  see incremental posted for ->mmap() issue + for
plain I/O we don't really care about this case, since it happens so
rarely.  Maybe later...
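
For readers following the quoted proposal, here is a user-space sketch of the tagged-pointer scheme using C11 atomics in place of the kernel's cmpxchg(). Everything named `model_*` is hypothetical: `model_file` carries only a layer number, `model_open()` stands in for opening the file in the right layer, and `free()` stands in for `fput()`. It relies on `malloc()` returning memory aligned to at least two bytes so that bit 0 is free for the tag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct file: records which layer it is in. */
struct model_file {
	int layer;
};

/* Models ->private_data: a file pointer, or (address of 2-element array | 1). */
struct model_ovl_file {
	_Atomic uintptr_t private_data;
};

struct model_file *model_open(int layer)
{
	struct model_file *f = malloc(sizeof(*f));

	if (f)
		f->layer = layer;
	return f;
}

struct model_file *model_real_file(struct model_ovl_file *of, int want_layer)
{
	uintptr_t val = atomic_load(&of->private_data);
	struct model_file **pair;
	struct model_file *orig;

	/* Odd value: mask bit 0 away, treat as array, use element 0. */
	if (val & 1)
		return ((struct model_file **)(val & ~(uintptr_t)1))[0];

	/* Even value in the right layer: use it directly. */
	orig = (struct model_file *)val;
	if (orig->layer == want_layer)
		return orig;

	/* Even value in the wrong layer: build the pair and try to publish it. */
	pair = malloc(2 * sizeof(*pair));
	if (!pair)
		return NULL;
	pair[0] = model_open(want_layer);
	if (!pair[0]) {
		free(pair);
		return NULL;
	}
	pair[1] = orig;

	if (atomic_compare_exchange_strong(&of->private_data, &val,
					   (uintptr_t)pair | 1))
		return pair[0];

	/* Lost the race: val now holds the winner's tagged value. */
	free(pair[0]);
	free(pair);
	return ((struct model_file **)(val & ~(uintptr_t)1))[0];
}
```

The only transition is from an even value to an odd one, so a failed exchange always observes a tagged array published by another caller.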

Thanks,
Miklos


* Re: [PATCH 07/39] vfs: export vfs_ioctl() to modules
  2018-06-11  7:19       ` Miklos Szeredi
@ 2018-06-11 16:24         ` Christoph Hellwig
  2018-06-19 14:04           ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2018-06-11 16:24 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, Christoph Hellwig, Miklos Szeredi, overlayfs,
	linux-fsdevel, linux-kernel

On Mon, Jun 11, 2018 at 09:19:01AM +0200, Miklos Szeredi wrote:
> We don't.  Obviously need to make sure to only ever do ioctl's in
> overlayfs that have a common definition across filesystems.  Not a lot
> of those, luckily...

Which are those?  If they are common and possibly called from kernel
code they should probably be made into methods instead.


* Re: [PATCH 14/39] ovl: stack file ops
  2018-06-11  7:09     ` Miklos Szeredi
@ 2018-06-12  2:29       ` Al Viro
  2018-06-12  2:40         ` Al Viro
  0 siblings, 1 reply; 83+ messages in thread
From: Al Viro @ 2018-06-12  2:29 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel, Linus Torvalds

On Mon, Jun 11, 2018 at 09:09:04AM +0200, Miklos Szeredi wrote:

[context: opening files in layers with unholy mix of overlayfs
->f_path and layer's ->f_inode/->f_op]

> I'd really like to get there some time but...
> 
> List of basic requirements:
> 
>  - Private mmap of overlay file shares page cache with lower file (and
> hence with all other overlays using the same lower file).
> 
>  - /proc/PID/maps shows correct path.
> 
> Thought about setting f_mapping/i_mapping of overlay file to that of
> underlying file.  But that breaks when doing a copy-up.  We can't just
> go and change those mapping pointers, assumption is that those remain
> constant (we'd need READ_ONCE() for all cases where we use the mapping
> more than once).  It's probably doable, but it's a large and fragile
> change.

We are really asking for trouble here - anything with e.g. ->read_iter()
using dentry will get in trouble with that kind of games.  Consider something
like

foo_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
	struct file *file = iocb->ki_filp;
	struct foo_data *p = file->f_path.dentry->d_fsdata;
	...
}

which will work just fine for files on foofs, where we have ->d_fsdata set
on lookup.  Now, try to use foofs as a layer; suddenly, you get foofs
files with ->f_path.dentry being *overlayfs* dentry, with ->d_fsdata
being nothing like struct foo_data *.

Better yet, consider

foo_open(struct inode *inode, struct file *file)
{
	struct dentry *dentry = file->f_path.dentry;
	...
	foo_add_splat(dentry, splat);
	...
}
where foo_add_splat() inserts struct foo_splat into an hlist starting
in dentry->d_fsdata.  That's not a pure theory - we *do* have ->open()
instances doing things of that sort.  That'll bugger overlayfs quite
badly, not to mention that foofs methods won't be happy with overlayfs
dentries.
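
A user-space sketch of the mismatch described above, under loud assumptions: `model_dentry`, `model_super`, `model_file` and `model_foo_data` are hypothetical stand-ins, `real` models the link from an overlay dentry to the underlying one, and `model_file_dentry()` is a simplified picture of what a file_dentry()-style helper resolves, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a superblock; identity is all that matters. */
struct model_super {
	int id;
};

/* Private per-dentry data that the hypothetical foofs sets on lookup. */
struct model_foo_data {
	int value;
};

struct model_dentry {
	const struct model_super *d_sb;	/* owning filesystem */
	void *d_fsdata;			/* meaning depends on the owner */
	struct model_dentry *real;	/* set only on overlay dentries */
};

struct model_file {
	struct model_dentry *f_path_dentry;	/* models file->f_path.dentry */
};

/* Simplified stand-in for file_dentry(): resolve to the underlying dentry. */
struct model_dentry *model_file_dentry(const struct model_file *file)
{
	struct model_dentry *dentry = file->f_path_dentry;

	return dentry->real ? dentry->real : dentry;
}

/*
 * What a careful foofs method would have to do before touching d_fsdata:
 * refuse dentries that belong to another filesystem.
 */
struct model_foo_data *model_foo_data_of(const struct model_dentry *dentry,
					 const struct model_super *foo_sb)
{
	if (dentry->d_sb != foo_sb)
		return NULL;	/* foreign dentry: d_fsdata is not foofs data */
	return dentry->d_fsdata;
}
```

With a file whose `f_path_dentry` is an overlay dentry, looking at that dentry directly yields nothing usable for foofs, while going through `model_file_dentry()` reaches the dentry that foofs set up on lookup.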

It might (or might not) work for the filesystems you'd been testing
on, but it's a lot of trouble waiting to happen.  Hell, try and use
ecryptfs as lower layer, see how fast it'll blow up.  Sure, it's
a dumb testcase, but I don't see how to check if something more
realistic is trouble-free.

I'd been trying to come up with some way to salvage that kludge of yours,
but I don't see any solutions.  We don't have good proxies for "this
filesystem might be unsafe as lower layer" ;-/

Frankly, it might be saner and safer to teach procfs (and similar
places) to do more than just use ->vm_file->f_path.  _That_ at least
is much more local in impact.


* Re: [PATCH 14/39] ovl: stack file ops
  2018-06-12  2:29       ` Al Viro
@ 2018-06-12  2:40         ` Al Viro
  2018-06-12  9:24           ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Al Viro @ 2018-06-12  2:40 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel, Linus Torvalds

On Tue, Jun 12, 2018 at 03:29:26AM +0100, Al Viro wrote:

> It might (or might not) work for the filesystems you'd been testing
> on, but it's a lot of trouble waiting to happen.  Hell, try and use
> ecryptfs as lower layer, see how fast it'll blow up.  Sure, it's
> a dumb testcase, but I don't see how to check if something more
> realistic is trouble-free.
> 
> I'd been trying to come up with some way to salvage that kludge of yours,
> but I don't see any solutions.  We don't have good proxies for "this
> filesystem might be unsafe as lower layer" ;-/

Note that anything that uses file_dentry() anywhere near ->open(),
->read_iter() or ->write_iter() is an instant trouble with your scheme.
Such as
int nfs_open(struct inode *inode, struct file *filp)
{
        struct nfs_open_context *ctx;

        ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp);
        if (IS_ERR(ctx)) 
                return PTR_ERR(ctx);
        nfs_file_set_open_context(filp, ctx);
        put_nfs_open_context(ctx);
        nfs_fscache_open_file(inode, filp);
        return 0;
}

You do want to support NFS for lower layers, right?


* Re: [PATCH 14/39] ovl: stack file ops
  2018-06-12  2:40         ` Al Viro
@ 2018-06-12  9:24           ` Miklos Szeredi
  2018-06-12 18:24             ` Al Viro
  0 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-12  9:24 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel, Linus Torvalds

On Tue, Jun 12, 2018 at 4:40 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Jun 12, 2018 at 03:29:26AM +0100, Al Viro wrote:
>
>> It might (or might not) work for the filesystems you'd been testing
>> on, but it's a lot of trouble waiting to happen.  Hell, try and use
>> ecryptfs as lower layer, see how fast it'll blow up.  Sure, it's
>> a dumb testcase, but I don't see how to check if something more
>> realistic is trouble-free.

That's funny, because when dhowells added the patch to make f_path
point to the overlay, I was fighting tooth and claw against that
change on the grounds of being unsafe, but it went through regardless
(and was in fact one of the biggest headaches in overlay/vfs
interaction).

So you might be right that there are bugs in the handling of ecryptfs,
etc, however the patchset is guaranteed not to cause regressions in
this area.

And yes, it would be best to get rid of that kludge once and for all.

>>
>> I'd been trying to come up with some way to salvage that kludge of yours,
>> but I don't see any solutions.  We don't have good proxies for "this
>> filesystem might be unsafe as lower layer" ;-/
>
> Note that anything that uses file_dentry() anywhere near ->open(),
> ->read_iter() or ->write_iter() is an instant trouble with your scheme.
> Such as
> int nfs_open(struct inode *inode, struct file *filp)
> {
>         struct nfs_open_context *ctx;
>
>         ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp);
>         if (IS_ERR(ctx))
>                 return PTR_ERR(ctx);
>         nfs_file_set_open_context(filp, ctx);
>         put_nfs_open_context(ctx);
>         nfs_fscache_open_file(inode, filp);
>         return 0;
> }
>
> You do want to support NFS for lower layers, right?

There's no change regarding how file_dentry() works.  We've just
pushed these weird files (f_path points to overlay, f_inode points to
underlay) down into the guts of overlayfs and are not directly
referenced from the file table anymore.  That shouldn't make *any*
difference from the lower fs's pov.

The only difference is that now the real file has creds inherited from
mounter task.  If lower filesystem's a_ops did some permission
checking based on that, then that might make a difference in behavior.
But I guess that difference would be in the positive direction, making
behavior more consistent.

Thanks,
Miklos


* Re: [PATCH 14/39] ovl: stack file ops
  2018-06-12  9:24           ` Miklos Szeredi
@ 2018-06-12 18:24             ` Al Viro
  2018-06-12 18:31               ` Al Viro
  2018-06-13 11:56               ` J. R. Okajima
  0 siblings, 2 replies; 83+ messages in thread
From: Al Viro @ 2018-06-12 18:24 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel, Linus Torvalds

On Tue, Jun 12, 2018 at 11:24:39AM +0200, Miklos Szeredi wrote:

> > Note that anything that uses file_dentry() anywhere near ->open(),
> > ->read_iter() or ->write_iter() is an instant trouble with your scheme.
> > Such as
> > int nfs_open(struct inode *inode, struct file *filp)
> > {
> >         struct nfs_open_context *ctx;
> >
> >         ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp);
> >         if (IS_ERR(ctx))
> >                 return PTR_ERR(ctx);
> >         nfs_file_set_open_context(filp, ctx);
> >         put_nfs_open_context(ctx);
> >         nfs_fscache_open_file(inode, filp);
> >         return 0;
> > }
> >
> > You do want to support NFS for lower layers, right?
> 
> There's no change regarding how file_dentry() works.  We've just
> pushed these weird files (f_path points to overlay, f_inode points to
> underlay) down into the guts of overlayfs and are not directly
> referenced from the file table anymore.  That shouldn't make *any*
> difference from the lower fs's pov.

*owwww*
I'd managed to push that particular nest of horrors out of mind ;-/
Having dug out my notes from back then and grepped around...  The real
mess is not even /proc/*/maps - it's /proc/*/map_files/* and yes, the
reasons for that kludge are still valid ;-/

Fuck.  OK, so we want to get rid of ->f_path.dentry accesses and see
that they don't come back.  Leaving them around due to "it won't come
anywhere near overlayfs" was a mistake of the same kind as leaving
d_add() in ->lookup() instances where we'd been certain that filesystem
would never get exported over NFS.  Just as we'd got open-by-handle for
e.g. NFS, we'd got nothing to prevent ecryptfs as lower layer in
overlayfs...

I hate it, but... consider path_open() objections withdrawn for now.
Uses of ->vm_file (and rules for those) are too convoluted to untangle
at the moment.  I still would love to get that straightened out, but
it's not this cycle fodder, more's the pity...


* Re: [PATCH 14/39] ovl: stack file ops
  2018-06-12 18:24             ` Al Viro
@ 2018-06-12 18:31               ` Al Viro
  2018-06-13  9:21                 ` Miklos Szeredi
  2018-06-13 11:56               ` J. R. Okajima
  1 sibling, 1 reply; 83+ messages in thread
From: Al Viro @ 2018-06-12 18:31 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel, Linus Torvalds

On Tue, Jun 12, 2018 at 07:24:23PM +0100, Al Viro wrote:
> On Tue, Jun 12, 2018 at 11:24:39AM +0200, Miklos Szeredi wrote:
> 
> > > Note that anything that uses file_dentry() anywhere near ->open(),
> > > ->read_iter() or ->write_iter() is an instant trouble with your scheme.
> > > Such as
> > > int nfs_open(struct inode *inode, struct file *filp)
> > > {
> > >         struct nfs_open_context *ctx;
> > >
> > >         ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp);
> > >         if (IS_ERR(ctx))
> > >                 return PTR_ERR(ctx);
> > >         nfs_file_set_open_context(filp, ctx);
> > >         put_nfs_open_context(ctx);
> > >         nfs_fscache_open_file(inode, filp);
> > >         return 0;
> > > }
> > >
> > > You do want to support NFS for lower layers, right?
> > 
> > There's no change regarding how file_dentry() works.  We've just
> > pushed these weird files (f_path points to overlay, f_inode points to
> > underlay) down into the guts of overlayfs and are not directly
> > referenced from the file table anymore.  That shouldn't make *any*
> > difference from the lower fs's pov.
> 
> *owwww*
> I'd managed to push that particular nest of horrors out of mind ;-/
> Having dug out my notes from back then and grepped around...  The real
> mess is not even /proc/*/maps - it's /proc/*/map_files/* and yes, the
> reasons for that kludge are still valid ;-/
> 
> Fuck.  OK, so we want to get rid of ->f_path.dentry accesses and see
> that they don't come back.  Leaving them around due to "it won't come
> anywhere near overlayfs" was a mistake of the same kind as leaving
> d_add() in ->lookup() instances where we'd been certain that filesystem
> would never get exported over NFS.  Just as we'd got open-by-handle for
> e.g. NFS, we'd got nothing to prevent ecryptfs as lower layer in
> overlayfs...
> 
> I hate it, but... consider path_open() objections withdrawn for now.
> Uses of ->vm_file (and rules for those) are too convoluted to untangle
> at the moment.  I still would love to get that straightened out, but
> it's not this cycle fodder, more's the pity...

PS: conversion of ->f_path.dentry is easy and that can probably go this
cycle - it's a fairly trivial change, with no functional changes unless
overlayfs is used with <filesystem>, fixing really bad shit if it ever
gets used thus.  I'm not asking to put that into overlayfs pull *and*
it's independent from the "want to kill that fucking kludge" stuff.
The latter is too hard for this cycle, unfortunately.


* Re: [PATCH 14/39] ovl: stack file ops
  2018-06-12 18:31               ` Al Viro
@ 2018-06-13  9:21                 ` Miklos Szeredi
  2018-06-15  5:47                   ` Al Viro
  0 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-13  9:21 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel, Linus Torvalds

On Tue, Jun 12, 2018 at 8:31 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Jun 12, 2018 at 07:24:23PM +0100, Al Viro wrote:

>> I hate it, but... consider path_open() objections withdrawn for now.

Is that an ACK for the pull if I follow up with fixes for mmap botch, etc?

>> Uses of ->vm_file (and rules for those) are too convoluted to untangle
>> at the moment.  I still would love to get that straightened out, but
>> it's not this cycle fodder, more's the pity...

Looked at some other options...  What coda mmap does looks very
dubious.  It only sets f_mapping, not vm_file.  That's going to get
into all sorts of trouble when underlying fs tries to look at
file_inode() or worse, ->private_data.  Looks like that should be
converted to what overlayfs does, to have a remote chance of actually
not crashing on most filesystems.  Does anybody actually use coda
still?

> PS: conversion of ->f_path.dentry is easy and that can probably go this
> cycle - it's a fairly trivial change, with no functional changes unless
> overlayfs is used with <filesystem>, fixing really bad shit if it ever
> gets used thus.  I'm not asking to put that into overlayfs pull *and*
> it's independent from the "want to kill that fucking kludge" stuff.
> The latter is too hard for this cycle, unfortunately.

So this is about adding a file_dentry_check() (or whatever we want to
call it) helper to be used by all filesystems when dereferencing
f_path.dentry, right?

Thanks,
Miklos


* Re: [PATCH 14/39] ovl: stack file ops
  2018-06-12 18:24             ` Al Viro
  2018-06-12 18:31               ` Al Viro
@ 2018-06-13 11:56               ` J. R. Okajima
  1 sibling, 0 replies; 83+ messages in thread
From: J. R. Okajima @ 2018-06-13 11:56 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, Miklos Szeredi, overlayfs, linux-fsdevel,
	linux-kernel, Linus Torvalds

Al Viro:
> I'd managed to push that particular nest of horrors out of mind ;-/
> Having dug out my notes from back then and grepped around...  The real
> mess is not even /proc/*/maps - it's /proc/*/map_files/* and yes, the
> reasons for that kludge are still valid ;-/
	:::
> Uses of ->vm_file (and rules for those) are too convoluted to untangle
> at the moment.  I still would love to get that straightened out, but
> it's not this cycle fodder, more's the pity...

I haven't fully read this thread, but is the discussion about the
file path printed in /proc/$$/maps?  If so, just for your
information, here is the approach that aufs took.

In the linux-v2.6 era, aufs tried implementing mmap by customizing
address_space ops, but that did not work well and I failed to complete
the implementation.
Like overlayfs, aufs has two struct file objects for a single
regular file.  One is for the virtual aufs entry, and the other is for
the real layer's entry.  When a user issues mmap(2) on the virtual file,
aufs internally redirects the request to the real file on the layer.  So
vm_file points to the real file, which means /proc/$$/maps prints an
unexpected file path.

Aufs added another struct file *vm_prfile to struct vma.  It points to
the virtual aufs file, and /proc/$$/maps prints vm_prfile instead of
vm_file.  Of course, maintaining vm_prfile is important since a vma may be
merged or split.
I still don't like this approach, but I don't have a better idea, and
it has worked for many years.  You can get the patch from
aufs4-standalone.git on sourceforge if you want.
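
The two-pointer idea described above can be sketched in user space as follows. This is not the aufs patch: `model_file`, `model_vma`, `model_maps_path()` and `model_split_vma()` are hypothetical names, reference counting is omitted, and only the "display one file, operate on the other" rule and its preservation across a split are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct file: only the path shown to users. */
struct model_file {
	const char *path;
};

struct model_vma {
	unsigned long start, end;
	struct model_file *vm_file;	/* real file on the layer */
	struct model_file *vm_prfile;	/* virtual file, used only for display */
};

/* Path a maps-style listing would print: prefer vm_prfile when it is set. */
const char *model_maps_path(const struct model_vma *vma)
{
	const struct model_file *f = vma->vm_prfile ? vma->vm_prfile
						     : vma->vm_file;

	return f ? f->path : "";
}

/*
 * Split [start, end) at addr and return the tail.  Both halves must keep
 * both file pointers, otherwise the displayed path changes after a split.
 */
struct model_vma model_split_vma(struct model_vma *vma, unsigned long addr)
{
	struct model_vma tail = *vma;

	assert(addr > vma->start && addr < vma->end);
	vma->end = addr;
	tail.start = addr;
	return tail;
}
```

A mapping without `vm_prfile` falls back to `vm_file`, so filesystems that never set the extra pointer would see no change in this model.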


J. R. Okajima


* Re: [PATCH 14/39] ovl: stack file ops
  2018-06-13  9:21                 ` Miklos Szeredi
@ 2018-06-15  5:47                   ` Al Viro
  2018-06-18 11:50                     ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Al Viro @ 2018-06-15  5:47 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel, Linus Torvalds

On Wed, Jun 13, 2018 at 11:21:30AM +0200, Miklos Szeredi wrote:
> On Tue, Jun 12, 2018 at 8:31 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Tue, Jun 12, 2018 at 07:24:23PM +0100, Al Viro wrote:
> 
> >> I hate it, but... consider path_open() objections withdrawn for now.
> 
> Is that an ACK for the pull if I follow up with fixes for mmap botch, etc?

Yes.

> >> Uses of ->vm_file (and rules for those) are too convoluted to untangle
> >> at the moment.  I still would love to get that straightened out, but
> >> it's not this cycle fodder, more's the pity...
> 
> Looked at some other options...  What coda mmap does looks very
> dubious.  It only sets f_mapping, not vm_file.  That's going to get
> into all sorts of trouble when underlying fs tries to look at
> file_inode() or worse, ->private_data.  Looks like that should be
> converted to what overlayfs does, to have a remote chance of actually
> not crashing on most filesystems.  Does anybody actually use coda
> still?

Keep in mind that coda is using the local fs only as cache; IOW, its needs
are much more limited than those of overlayfs - local r/w filesystem,
disk-backed or tmpfs, used pretty much as a scratch space.

> > PS: conversion of ->f_path.dentry is easy and that can probably go this
> > cycle - it's a fairly trivial change, with no functional changes unless
> > overlayfs is used with <filesystem>, fixing really bad shit if it ever
> > gets used thus.  I'm not asking to put that into overlayfs pull *and*
> > it's independent from the "want to kill that fucking kludge" stuff.
> > The latter is too hard for this cycle, unfortunately.
> 
> So this is about adding a file_dentry_check() (or whatever we want to
> > call it) helper to be used by all filesystems when dereferencing
> f_path.dentry, right?

file_dentry(), and some of the users should be converted to file_inode().
There's also a missing helper for debugfs uses - more or less a combination
of file_dentry() and debugfs_file_get() (if not a conversion of
debugfs_file_get() to taking struct file - almost all users are of that
form, if not entirely all of them).  I've some of that done in local
branch...


* Re: [PATCH 14/39] ovl: stack file ops
  2018-06-15  5:47                   ` Al Viro
@ 2018-06-18 11:50                     ` Miklos Szeredi
  0 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-18 11:50 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel, Linus Torvalds

On Fri, Jun 15, 2018 at 06:47:17AM +0100, Al Viro wrote:
> On Wed, Jun 13, 2018 at 11:21:30AM +0200, Miklos Szeredi wrote:

> > Looked at some other options...  What coda mmap does looks very
> > dubious.  It only sets f_mapping, not vm_file.  That's going to get
> > into all sorts of trouble when underlying fs tries to look at
> > file_inode() or worse, ->private_data.  Looks like that should be
> > converted to what overlayfs does, to have a remote chance of actually
> > not crashing on most filesystems.  Does anybody actually use coda
> > still?
> 
> Keep in mind that coda is using the local fs only as cache; IOW, its needs
> are much more limited than those of overlayfs - local r/w filesystem,
> disk-backed or tmpfs, used pretty much as a scratch space.

Look:

coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
{
[...]
	coda_file->f_mapping = host_file->f_mapping;
[...]
	return call_mmap(host_file, vma);
}

So that'll end up with vma->vm_file pointing to coda file, coda_file->f_mapping
pointing to host mapping.  Hence vm_ops and a_ops are going to come from host
file, but they'll be getting a "foreign" file with ->private_data and ->f_inode
pointing to coda structures.

For example:

int ext4_filemap_fault(struct vm_fault *vmf)
{
	struct inode *inode = file_inode(vmf->vma->vm_file);
	int err;

	down_read(&EXT4_I(inode)->i_mmap_sem);
[...]

There you have it: coda inode being interpreted as ext4 inode.  How is that
supposed to work?  How is it not blowing up?  What am I missing?
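
To make the concern concrete, here is a user-space sketch of why a container_of()-style accessor is only valid on inodes embedded in the matching container. The struct layouts are hypothetical (`model_ext4_inode_info` and `model_coda_inode_info` are not the kernel definitions), and the computed address is only compared, never dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct inode. */
struct model_inode {
	unsigned long i_ino;
};

/* Hypothetical ext4-like container: private fields precede the inode. */
struct model_ext4_inode_info {
	long i_mmap_sem;		/* stand-in for the semaphore */
	struct model_inode vfs_inode;
};

/* Hypothetical coda-like container with a different layout. */
struct model_coda_inode_info {
	struct model_inode vfs_inode;
	int c_flags;
};

/* Address an EXT4_I()-style container_of() would compute for this inode. */
uintptr_t model_ext4_i_addr(const struct model_inode *inode)
{
	return (uintptr_t)inode -
	       offsetof(struct model_ext4_inode_info, vfs_inode);
}

/* Does the computed container lie entirely inside [obj, obj + size)? */
int model_container_in_bounds(const struct model_inode *inode,
			      const void *obj, size_t size)
{
	uintptr_t start = model_ext4_i_addr(inode);
	uintptr_t lo = (uintptr_t)obj;

	return start >= lo &&
	       start + sizeof(struct model_ext4_inode_info) <= lo + size;
}
```

For an inode embedded in the ext4-like container the computed range matches the object exactly; for one embedded in the coda-like container it starts before the object, which is the kind of misinterpretation the question above is about.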

Thanks,
Miklos


* Re: [PATCH 01/39] vfs: dedpue: return loff_t
  2018-06-06 15:09       ` Darrick J. Wong
@ 2018-06-18 20:08         ` Miklos Szeredi
  0 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-18 20:08 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Miklos Szeredi, overlayfs, linux-fsdevel,
	linux-kernel, linux-xfs, ocfs2-devel

On Wed, Jun 6, 2018 at 5:09 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> On Tue, Jun 05, 2018 at 10:33:22AM +0200, Miklos Szeredi wrote:
>> On Mon, Jun 4, 2018 at 10:43 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> > On Tue, May 29, 2018 at 04:43:01PM +0200, Miklos Szeredi wrote:
>> >> f_op->dedupe_file_range() gets a u64 length to dedup and returns an ssize_t
>> >> actual length deduped.  This breaks badly on 32bit archs since the returned
>> >> length will be truncated and possibly overflow into the sign bit (xfs and
>> >> ocfs2 are affected, btrfs limits actual length to 16MiB).
>> >
>> > Can we just make it return 0 vs errno?  The only time we return
>> > a different length than the one passed in is due to the btrfs cap.
>> >
>> > Given that this API started out on btrfs we should just do the cap
>> > everywhere to not confuse userspace.
>>
>> And that's a completely arbitrary cap; sure btrfs started out with
>> that, but there's no fundamental reason for that becoming the global
>> limit.  Xfs now added a different, larger limit, so based on what
>> reason should that limit be reduced?
>>
>> I don't care either way, but at this stage I'm not going to change
>> this patch, unless there's a very good reason to do so, because if I
>> do someone will come and suggest another improvement, ad-infinitum...
>
> I think we should hoist the MAX_RW_COUNT/2 limit to the VFS helpers
> since afaict we generally cap max IO per call at MAX_RW_COUNT.

I don't quite get it.   That MAX_RW_COUNT is to protect against
overflows in signed int.

Here we have a 64bit interface, so that's irrelevant, we can invent
any cap we want.  Let's choose our favorite bike shed size.  Mine is
1G.  But if that turns out too limiting it can be raised arbitrarily
later.
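
The 32-bit truncation that motivated the patch can be illustrated with fixed-width types. This is a sketch only: the `model_*` helpers are hypothetical, and a 32-bit `ssize_t` is modelled by looking at the low 32 bits of the 64-bit length rather than by performing an out-of-range signed conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of a 64-bit length: what survives in a 32-bit return value. */
uint32_t model_low32(uint64_t len)
{
	return (uint32_t)len;
}

/* 1 if the length is representable in a 32-bit signed return value. */
int model_fits_ssize32(uint64_t len)
{
	return len <= (uint64_t)INT32_MAX;
}

/* 1 if truncation leaves bit 31 set, which reads back as a negative value. */
int model_looks_negative32(uint64_t len)
{
	return (model_low32(len) >> 31) != 0;
}
```

A 16 MiB length fits; 3 GiB does not fit and lands in the sign bit; 5 GiB does not fit either and silently reads back as 1 GiB.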

>  (I
> probably should've capped ocfs2 back when I did xfs, but forgot).  If
> btrfs wants to keep their lower (16M) limit then they're free to do so;
> the interface documentation allows for this.  One of the btrfs
> developers seems to be working on a patch series to raise the limit[1]
> anyway.

Yep, that got upstreamed now.  Which is good, we can just return zero
or error from ->dedupe_file_range() and be done with that.

Thanks,
Miklos


* Re: [PATCH 07/39] vfs: export vfs_ioctl() to modules
  2018-06-11 16:24         ` Christoph Hellwig
@ 2018-06-19 14:04           ` Miklos Szeredi
  2018-06-19 14:24             ` Christoph Hellwig
  0 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-19 14:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel

On Mon, Jun 11, 2018 at 6:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Jun 11, 2018 at 09:19:01AM +0200, Miklos Szeredi wrote:
>> We don't.  Obviously need to make sure to only ever do ioctl's in
>> overlayfs that have a common definition across filesystems.  Not a lot
>> of those, luckily...
>
> Which are those?  If they are common and possibly called from kernel
> code they should probably be made into methods instead.

FS_IOC*

Haven't looked deeply.  For now overlayfs just implements
FS_IOC_{GET|SET}FLAGS because some of these flags are quite generic
and implementing them on the overlay is easy.
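
For reference, this is roughly what the FS_IOC_GETFLAGS call looks like from user space on Linux. A minimal sketch: `model_get_inode_flags()` is a hypothetical wrapper name, the argument is an `int` as described in the ioctl_iflags(2) manual page, and whether a given filesystem supports the ioctl is not something this code can know in advance.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/* Returns 0 and fills *flags on success, or a negative errno value. */
int model_get_inode_flags(const char *path, int *flags)
{
	int fd = open(path, O_RDONLY);
	int ret = 0;

	if (fd < 0)
		return -errno;

	/* Filesystems without support typically fail here, e.g. with ENOTTY. */
	if (ioctl(fd, FS_IOC_GETFLAGS, flags) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

On success a caller can test generic bits such as `FS_IMMUTABLE_FL` or `FS_APPEND_FL` in the returned value.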

Yes, turning into a method makes sense.

Thanks,
Miklos


* Re: [PATCH 07/39] vfs: export vfs_ioctl() to modules
  2018-06-19 14:04           ` Miklos Szeredi
@ 2018-06-19 14:24             ` Christoph Hellwig
  2018-06-19 14:34               ` Miklos Szeredi
  0 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2018-06-19 14:24 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christoph Hellwig, Al Viro, Miklos Szeredi, overlayfs,
	linux-fsdevel, linux-kernel

On Tue, Jun 19, 2018 at 04:04:41PM +0200, Miklos Szeredi wrote:
> FS_IOC*
> 
> Haven't looked deeply.  For now overlayfs just implements
> FS_IOC_{GET|SET}FLAGS because some of these flags are quite generic
> and implementing them on the overlay is easy.
> 
> Yes, turning into a method makes sense.

Do you want to do this or should I send a patch?


* Re: [PATCH 07/39] vfs: export vfs_ioctl() to modules
  2018-06-19 14:24             ` Christoph Hellwig
@ 2018-06-19 14:34               ` Miklos Szeredi
  2018-06-19 14:54                 ` Al Viro
  0 siblings, 1 reply; 83+ messages in thread
From: Miklos Szeredi @ 2018-06-19 14:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Miklos Szeredi, overlayfs, linux-fsdevel, linux-kernel

On Tue, Jun 19, 2018 at 4:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Jun 19, 2018 at 04:04:41PM +0200, Miklos Szeredi wrote:
>> FS_IOC*
>>
>> Haven't looked deeply.  For now overlayfs just implements
>> FS_IOC_{GET|SET}FLAGS because some of these flags are quite generic
>> and implementing them on the overlay is easy.
>>
>> Yes, turning into a method makes sense.
>
> Do you want to do this or should I send a patch?

Do it.  You are much more familiar with regular fs that implement
these ioctls.  Untangling overlap between FS_IOC_...FLAGS and
FS_IOC_...XATTR looks "interesting".

Thanks,
Miklos


* Re: [PATCH 07/39] vfs: export vfs_ioctl() to modules
  2018-06-19 14:34               ` Miklos Szeredi
@ 2018-06-19 14:54                 ` Al Viro
  0 siblings, 0 replies; 83+ messages in thread
From: Al Viro @ 2018-06-19 14:54 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christoph Hellwig, Miklos Szeredi, overlayfs, linux-fsdevel,
	linux-kernel

On Tue, Jun 19, 2018 at 04:34:33PM +0200, Miklos Szeredi wrote:
> On Tue, Jun 19, 2018 at 4:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
> > On Tue, Jun 19, 2018 at 04:04:41PM +0200, Miklos Szeredi wrote:
> >> FS_IOC*
> >>
> >> Haven't looked deeply.  For now overlayfs just implements
> >> FS_IOC_{GET|SET}FLAGS because some of these flags are quite generic
> >> and implementing them on the overlay is easy.
> >>
> >> Yes, turning into a method makes sense.
> >
> > Do you want to do this or should I send a patch?
> 
> Do it.  You are much more familiar with regular fs that implement
> these ioctls.  Untangling overlap between FS_IOC_...FLAGS and
> FS_IOC_...XATTR looks "interesting".

Suggestion: have that go through ->setattr(); that's what
ATTR_ATTR_FLAG was supposed to be for, IIRC.


Thread overview: 83+ messages
2018-05-29 14:43 [PATCH 00/39] overlayfs: stack file operations Miklos Szeredi
2018-05-29 14:43 ` [PATCH 01/39] vfs: dedpue: return loff_t Miklos Szeredi
2018-06-04  8:43   ` Christoph Hellwig
2018-06-05  8:33     ` Miklos Szeredi
2018-06-06 15:09       ` Darrick J. Wong
2018-06-18 20:08         ` Miklos Szeredi
2018-05-29 14:43 ` [PATCH 02/39] vfs: dedupe: rationalize args Miklos Szeredi
2018-06-06 15:02   ` Darrick J. Wong
2018-05-29 14:43 ` [PATCH 03/39] vfs: dedupe: extract helper for a single dedup Miklos Szeredi
2018-05-29 15:41   ` Amir Goldstein
2018-05-29 16:04     ` Amir Goldstein
2018-06-04  8:44   ` Christoph Hellwig
2018-05-29 14:43 ` [PATCH 04/39] vfs: add path_open() Miklos Szeredi
2018-06-04  8:46   ` Christoph Hellwig
2018-06-10  4:36     ` Al Viro
2018-05-29 14:43 ` [PATCH 05/39] vfs: optionally don't account file in nr_files Miklos Szeredi
2018-06-04  8:47   ` Christoph Hellwig
2018-06-04  8:57     ` Miklos Szeredi
2018-06-10  4:41   ` Al Viro
2018-05-29 14:43 ` [PATCH 06/39] vfs: add f_op->pre_mmap() Miklos Szeredi
2018-06-04  8:48   ` Christoph Hellwig
2018-06-05 11:36     ` Miklos Szeredi
2018-05-29 14:43 ` [PATCH 07/39] vfs: export vfs_ioctl() to modules Miklos Szeredi
2018-06-04  8:49   ` Christoph Hellwig
2018-06-10  4:57     ` Al Viro
2018-06-11  7:19       ` Miklos Szeredi
2018-06-11 16:24         ` Christoph Hellwig
2018-06-19 14:04           ` Miklos Szeredi
2018-06-19 14:24             ` Christoph Hellwig
2018-06-19 14:34               ` Miklos Szeredi
2018-06-19 14:54                 ` Al Viro
2018-05-29 14:43 ` [PATCH 08/39] vfs: export vfs_dedupe_file_range_one() " Miklos Szeredi
2018-05-29 14:43 ` [PATCH 09/39] ovl: copy up times Miklos Szeredi
2018-05-29 14:43 ` [PATCH 10/39] ovl: copy up inode flags Miklos Szeredi
2018-05-29 14:43 ` [PATCH 11/39] Revert "Revert "ovl: get_write_access() in truncate"" Miklos Szeredi
2018-05-29 14:43 ` [PATCH 12/39] ovl: copy up file size as well Miklos Szeredi
2018-05-29 14:43 ` [PATCH 13/39] ovl: deal with overlay files in ovl_d_real() Miklos Szeredi
2018-05-29 14:43 ` [PATCH 14/39] ovl: stack file ops Miklos Szeredi
2018-06-10  4:13   ` Al Viro
2018-06-11  7:09     ` Miklos Szeredi
2018-06-12  2:29       ` Al Viro
2018-06-12  2:40         ` Al Viro
2018-06-12  9:24           ` Miklos Szeredi
2018-06-12 18:24             ` Al Viro
2018-06-12 18:31               ` Al Viro
2018-06-13  9:21                 ` Miklos Szeredi
2018-06-15  5:47                   ` Al Viro
2018-06-18 11:50                     ` Miklos Szeredi
2018-06-13 11:56               ` J. R. Okajima
2018-05-29 14:43 ` [PATCH 15/39] ovl: add helper to return real file Miklos Szeredi
2018-06-10  5:42   ` Al Viro
2018-06-11  8:11     ` Miklos Szeredi
2018-05-29 14:43 ` [PATCH 16/39] ovl: add ovl_read_iter() Miklos Szeredi
2018-05-29 14:43 ` [PATCH 17/39] ovl: add ovl_write_iter() Miklos Szeredi
2018-05-29 14:43 ` [PATCH 18/39] ovl: add ovl_fsync() Miklos Szeredi
2018-05-29 14:43 ` [PATCH 19/39] ovl: add ovl_mmap() Miklos Szeredi
2018-06-10  5:24   ` Al Viro
2018-06-11  7:58     ` Miklos Szeredi
2018-05-29 14:43 ` [PATCH 20/39] ovl: add ovl_fallocate() Miklos Szeredi
2018-05-29 14:43 ` [PATCH 21/39] ovl: add lsattr/chattr support Miklos Szeredi
2018-05-29 14:43 ` [PATCH 22/39] ovl: add ovl_fiemap() Miklos Szeredi
2018-05-29 14:43 ` [PATCH 23/39] ovl: add O_DIRECT support Miklos Szeredi
2018-06-10  5:31   ` Al Viro
2018-06-11  8:08     ` Miklos Szeredi
2018-05-29 14:43 ` [PATCH 24/39] ovl: add reflink/copyfile/dedup support Miklos Szeredi
2018-05-29 14:43 ` [PATCH 25/39] vfs: don't open real Miklos Szeredi
2018-05-29 14:43 ` [PATCH 26/39] ovl: copy-up on MAP_SHARED Miklos Szeredi
2018-05-29 14:43 ` [PATCH 27/39] ovl: obsolete "check_copy_up" module option Miklos Szeredi
2018-05-29 15:13   ` Amir Goldstein
2018-05-30  8:26     ` Miklos Szeredi
2018-05-29 14:43 ` [PATCH 28/39] ovl: fix documentation of non-standard behavior Miklos Szeredi
2018-05-29 14:43 ` [PATCH 29/39] vfs: simplify dentry_open() Miklos Szeredi
2018-05-29 14:43 ` [PATCH 30/39] Revert "ovl: fix may_write_real() for overlayfs directories" Miklos Szeredi
2018-05-29 14:43 ` [PATCH 31/39] Revert "ovl: don't allow writing ioctl on lower layer" Miklos Szeredi
2018-05-29 14:43 ` [PATCH 32/39] vfs: fix freeze protection in mnt_want_write_file() for overlayfs Miklos Szeredi
2018-06-04  8:50   ` Christoph Hellwig
2018-05-29 14:43 ` [PATCH 33/39] Revert "ovl: fix relatime for directories" Miklos Szeredi
2018-05-29 14:43 ` [PATCH 34/39] Revert "vfs: update ovl inode before relatime check" Miklos Szeredi
2018-05-29 14:43 ` [PATCH 35/39] Revert "vfs: add flags to d_real()" Miklos Szeredi
2018-05-29 14:43 ` [PATCH 36/39] Revert "vfs: do get_write_access() on upper layer of overlayfs" Miklos Szeredi
2018-05-29 14:43 ` [PATCH 37/39] Partially revert "locks: fix file locking on overlayfs" Miklos Szeredi
2018-05-29 14:43 ` [PATCH 38/39] Revert "fsnotify: support overlayfs" Miklos Szeredi
2018-05-29 14:43 ` [PATCH 39/39] vfs: remove open_flags from d_real() Miklos Szeredi
