linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] overlayfs update for 4.10
@ 2016-12-10 20:49 Miklos Szeredi
  2016-12-11  2:12 ` Al Viro
  0 siblings, 1 reply; 5+ messages in thread
From: Miklos Szeredi @ 2016-12-10 20:49 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, linux-fsdevel, linux-unionfs

Hi Al,

I usually send overlayfs pulls directly to Linus, but it it suits you, please
feel free to pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-linus

This update contains:

 - try to clone on copy-up;
 - allow renaming a directory;
 - fix data inconsistency of read-only fds after copy up;
 - misc cleanups and fixes.

Thanks,
Miklos

---
Al Viro (1):
      ovl: clean up kstat usage

Amir Goldstein (9):
      vfs: allow vfs_clone_file_range() across mount points
      vfs: call vfs_clone_file_range() under freeze protection
      vfs: fix vfs_clone_file_range() for overlayfs files
      ovl: use vfs_clone_file_range() for copy up if possible
      ovl: fix nested overlayfs mount
      ovl: check for emptiness of redirect dir
      ovl: show redirect_dir mount option
      ovl: create directories inside merged parent opaque
      ovl: fold ovl_copy_up_truncate() into ovl_copy_up()

Geliang Tang (1):
      ovl: fix return value of ovl_fill_super

Miklos Szeredi (32):
      Revert "af_unix: fix hard linked sockets on overlay"
      Revert "vfs: rename: check backing inode being equal"
      vfs: no mnt_want_write_file() in vfs_{copy,clone}_file_range()
      vfs: allow overlayfs to intercept file ops
      vfs: export filp_clone_open()
      mm: ovl: copy-up on MAP_SHARED
      ovl: update doc
      Revert "ovl: get_write_access() in truncate"
      ovl: rename ovl_rename2() to ovl_rename()
      ovl: treat special files like a regular fs
      ovl: don't check rename to self
      ovl: don't check sticky
      ovl: add ovl_dentry_is_whiteout()
      ovl: check lower existence when removing
      ovl: get rid of PURE type
      ovl: rename: simplify handling of lower/merged directory
      ovl: check lower existence of rename target
      ovl: simplify lookup
      ovl: use d_is_dir()
      ovl: split super.c
      ovl: check namelen
      ovl: consolidate lookup for underlying layers
      ovl: lookup redirects
      ovl: redirect on rename-dir
      ovl: allow redirect_dir to default to "on"
      ovl: allow setting max size of redirect
      ovl: opaque cleanup
      ovl: add infrastructure for intercepting file ops
      ovl: intercept read_iter
      ovl: intercept mmap
      ovl: intercept fsync
      Revert "ovl: Warn on copy up if a process has a R/O fd open to the lower file"

---
 Documentation/filesystems/overlayfs.txt |  23 +-
 fs/internal.h                           |   1 -
 fs/ioctl.c                              |   6 +-
 fs/namei.c                              |   6 +-
 fs/nfsd/vfs.c                           |   3 +-
 fs/open.c                               |   2 +-
 fs/overlayfs/Kconfig                    |  14 +
 fs/overlayfs/Makefile                   |   2 +-
 fs/overlayfs/copy_up.c                  |  95 ++----
 fs/overlayfs/dir.c                      | 375 ++++++++++++---------
 fs/overlayfs/inode.c                    | 305 +++++++++++++----
 fs/overlayfs/namei.c                    | 401 +++++++++++++++++++++++
 fs/overlayfs/overlayfs.h                |  64 ++--
 fs/overlayfs/ovl_entry.h                |  53 +++
 fs/overlayfs/super.c                    | 558 ++++----------------------------
 fs/overlayfs/util.c                     | 265 +++++++++++++++
 fs/read_write.c                         |  23 +-
 include/linux/fs.h                      |  14 +
 mm/util.c                               |  22 ++
 net/unix/af_unix.c                      |   6 +-
 20 files changed, 1391 insertions(+), 847 deletions(-)
 create mode 100644 fs/overlayfs/namei.c
 create mode 100644 fs/overlayfs/ovl_entry.h
 create mode 100644 fs/overlayfs/util.c

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GIT PULL] overlayfs update for 4.10
  2016-12-10 20:49 [GIT PULL] overlayfs update for 4.10 Miklos Szeredi
@ 2016-12-11  2:12 ` Al Viro
  2016-12-11 13:51   ` Miklos Szeredi
  0 siblings, 1 reply; 5+ messages in thread
From: Al Viro @ 2016-12-11  2:12 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel, linux-unionfs

On Sat, Dec 10, 2016 at 09:49:26PM +0100, Miklos Szeredi wrote:
> Hi Al,
> 
> I usually send overlayfs pulls directly to Linus, but it it suits you, please
> feel free to pull from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-linus
> 
> This update contains:
> 
>  - try to clone on copy-up;
>  - allow renaming a directory;
>  - fix data inconsistency of read-only fds after copy up;
>  - misc cleanups and fixes.

	Miklos, I'm very tempted to just let Linus do the... explaining
why "ovl: add infrastructure for intercepting file ops" is not nicely done.
It relies upon so damn many subtle things that result is a minefield for
any later work.  If nothing else, you've just created a magical place that
will have to be modified every time somebody adds a method.  Moreover, ->open()
instances have every right to expect that nothing will change ->f_op after
they return, period.  That includes things like later comparisons of ->f_op
with known pointers, etc.

Worse, there's nothing to prohibit embedding file_operations into an object
with lifetime shorter than that of a module.  Your approach will blow up on
those.  Sure, at the moment all of them live on weird filesystems that will be
(hopefully) rejected before you get to that point.  With no promise whatsoever
that this situation will persist.

overlayfs is already one hell of a special snowflake, but this is just plain
ridiculous - that sticks its fingers into so many places that making sure they
don't get squashed will be very hard.  IMO that kind of stuff is on the
"this should be handled by VFS or not at all" side of things, and I'm not
at all sure that doing that anywhere is a good idea.

PS: macros like
+#define OVL_CALL_REAL_FOP(file, call) \
+       ({ struct ovl_fops *__ofop =                                     \
+                       container_of(file->f_op, struct ovl_fops, fops); \
+          WARN_ON(__ofop->magic != OVL_FOPS_MAGIC) ? -EIO :             \
+                  __ofop->orig_fops->call;                              \
+       })

with uses along the lines of
+               return OVL_CALL_REAL_FOP(file,
+                                        fsync(file, start, end, datasync));
make some things (like, you know, "find all places where a method could
be called") harder for no good reason.

While we are at it,
+               module_put(ofop->owner);
+               fops_put(ofop->orig_fops);
is wrong - if that was the last reference to a module, your fops_put()
might very well try and access a vfree'd area...

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GIT PULL] overlayfs update for 4.10
  2016-12-11  2:12 ` Al Viro
@ 2016-12-11 13:51   ` Miklos Szeredi
  2016-12-11 21:32     ` Miklos Szeredi
  0 siblings, 1 reply; 5+ messages in thread
From: Miklos Szeredi @ 2016-12-11 13:51 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, linux-fsdevel, linux-unionfs

On Sun, Dec 11, 2016 at 3:12 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sat, Dec 10, 2016 at 09:49:26PM +0100, Miklos Szeredi wrote:
>> Hi Al,
>>
>> I usually send overlayfs pulls directly to Linus, but it it suits you, please
>> feel free to pull from:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-linus
>>
>> This update contains:
>>
>>  - try to clone on copy-up;
>>  - allow renaming a directory;
>>  - fix data inconsistency of read-only fds after copy up;
>>  - misc cleanups and fixes.
>
>         Miklos, I'm very tempted to just let Linus do the... explaining
> why "ovl: add infrastructure for intercepting file ops" is not nicely done.
> It relies upon so damn many subtle things that result is a minefield for
> any later work.  If nothing else, you've just created a magical place that
> will have to be modified every time somebody adds a method.  Moreover, ->open()
> instances have every right to expect that nothing will change ->f_op after
> they return, period.  That includes things like later comparisons of ->f_op
> with known pointers, etc.
>
> Worse, there's nothing to prohibit embedding file_operations into an object
> with lifetime shorter than that of a module.  Your approach will blow up on
> those.  Sure, at the moment all of them live on weird filesystems that will be
> (hopefully) rejected before you get to that point.  With no promise whatsoever
> that this situation will persist.
>
> overlayfs is already one hell of a special snowflake, but this is just plain
> ridiculous - that sticks its fingers into so many places that making sure they
> don't get squashed will be very hard.  IMO that kind of stuff is on the
> "this should be handled by VFS or not at all" side of things, and I'm not
> at all sure that doing that anywhere is a good idea.

Let me just argue back with what happened with f_path.  We've seen the
breakage, and still nothing guarantees that filesystems won't assume
f_path.dentry isn't theirs.   This isn't much different IMO, except I
suspect the fallout from this will be much much smaller than from the
f_path change.  Having said that, I can try fixing in the VFS but I
suspect you won't like it much better.

And I tend to agree with you about the usefulness of this whole
change.  However (intelligent) people will argue about not building on
overlayfs because it's "not a POSIX fs" having quirks like this.  So
it's really the perception that needs to be fixed, and AFAICS the only
way to fix that is to fix the quirks.


> PS: macros like
> +#define OVL_CALL_REAL_FOP(file, call) \
> +       ({ struct ovl_fops *__ofop =                                     \
> +                       container_of(file->f_op, struct ovl_fops, fops); \
> +          WARN_ON(__ofop->magic != OVL_FOPS_MAGIC) ? -EIO :             \
> +                  __ofop->orig_fops->call;                              \
> +       })
>
> with uses along the lines of
> +               return OVL_CALL_REAL_FOP(file,
> +                                        fsync(file, start, end, datasync));
> make some things (like, you know, "find all places where a method could
> be called") harder for no good reason.

Makes sense.  I can expand them inline.

>
> While we are at it,
> +               module_put(ofop->owner);
> +               fops_put(ofop->orig_fops);
> is wrong - if that was the last reference to a module, your fops_put()
> might very well try and access a vfree'd area...

Yeah the order is wrong.  Will fix.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GIT PULL] overlayfs update for 4.10
  2016-12-11 13:51   ` Miklos Szeredi
@ 2016-12-11 21:32     ` Miklos Szeredi
  0 siblings, 0 replies; 5+ messages in thread
From: Miklos Szeredi @ 2016-12-11 21:32 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, linux-fsdevel, linux-unionfs

On Sun, Dec 11, 2016 at 02:51:15PM +0100, Miklos Szeredi wrote:
> On Sun, Dec 11, 2016 at 3:12 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Sat, Dec 10, 2016 at 09:49:26PM +0100, Miklos Szeredi wrote:
> >> Hi Al,
> >>
> >> I usually send overlayfs pulls directly to Linus, but it it suits you, please
> >> feel free to pull from:
> >>
> >>   git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-linus
> >>

Force pushed updated patchset.  Changes:

 - fix module put order;
 - don't hide file op calls in a macro;
 - don't intercept ops by default (no action required on addition of new ones).

Incremental diff attached.

Please let me know if this will fly or not.  If not, I'll drop this part and do
one which handles this within the VFS.

Thanks,
Miklos

---
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 2dc252e262c1..7512e3e4bd1f 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -326,31 +326,37 @@ void ovl_cleanup_fops_htable(void)
 	struct ovl_fops *ofop;
 
 	hash_for_each_safe(ovl_fops_htable, bkt, tmp, ofop, entry) {
-		module_put(ofop->owner);
 		fops_put(ofop->orig_fops);
+		module_put(ofop->owner);
 		kfree(ofop);
 	}
 }
 
-#define OVL_CALL_REAL_FOP(file, call) \
-	({ struct ovl_fops *__ofop =					 \
-			container_of(file->f_op, struct ovl_fops, fops); \
-	   WARN_ON(__ofop->magic != OVL_FOPS_MAGIC) ? -EIO :		 \
-		   __ofop->orig_fops->call;				 \
-	})
-
 static bool ovl_file_is_lower(struct file *file)
 {
 	return !OVL_TYPE_UPPER(ovl_path_type(file->f_path.dentry));
 }
 
+static const struct file_operations *ovl_orig_fops(struct file *file)
+{
+	struct ovl_fops *ofop = container_of(file->f_op, struct ovl_fops, fops);
+
+	if (WARN_ON(ofop->magic != OVL_FOPS_MAGIC))
+		return NULL;
+
+	return ofop->orig_fops;
+}
+
 static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
 	struct file *file = iocb->ki_filp;
 	ssize_t ret;
 
-	if (likely(ovl_file_is_lower(file)))
-		return OVL_CALL_REAL_FOP(file, read_iter(iocb, to));
+	if (likely(ovl_file_is_lower(file))) {
+		const struct file_operations *f_op = ovl_orig_fops(file);
+
+		return f_op ? f_op->read_iter(iocb, to) : -EIO;
+	}
 
 	file = filp_clone_open(file);
 	if (IS_ERR(file))
@@ -364,8 +370,11 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *to)
 
 static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 {
-	if (likely(ovl_file_is_lower(file)))
-		return OVL_CALL_REAL_FOP(file, mmap(file, vma));
+	if (likely(ovl_file_is_lower(file))) {
+		const struct file_operations *f_op = ovl_orig_fops(file);
+
+		return f_op ? f_op->mmap(file, vma) : -EIO;
+	}
 
 	file = filp_clone_open(file);
 	if (IS_ERR(file))
@@ -386,8 +395,9 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	int ret;
 
 	if (likely(ovl_file_is_lower(file))) {
-		return OVL_CALL_REAL_FOP(file,
-					 fsync(file, start, end, datasync));
+		const struct file_operations *f_op = ovl_orig_fops(file);
+
+		return f_op ? f_op->fsync(file, start, end, datasync) : -EIO;
 	}
 	file = filp_clone_open(file);
 	if (IS_ERR(file))
@@ -437,6 +447,9 @@ static struct ovl_fops *ovl_fops_get(struct file *file)
 	ofop->magic = OVL_FOPS_MAGIC;
 	ofop->orig_fops = fops_get(orig);
 
+	/* By default don't intercept: */
+	ofop->fops = *orig;
+
 	/* Intercept these: */
 	if (orig->read_iter)
 		ofop->fops.read_iter = ovl_read_iter;
@@ -447,24 +460,39 @@ static struct ovl_fops *ovl_fops_get(struct file *file)
 
 	/*
 	 * These should be intercepted, but they are very unlikely to be
-	 * a problem in practice.  Leave them alone for now.
+	 * a problem in practice.  Leave them alone for now:
+	 *
+	 * - copy_file_range
+	 * - clone_file_range
+	 * - dedupe_file_range
+	 *
+	 * Don't intercept these:
+	 *
+	 * - llseek
+	 * - unlocked_ioctl
+	 * - compat_ioctl
+	 * - flush
+	 * - release
+	 * - get_unmapped_area
+	 * - check_flags
+	 *
+	 * These will never be called on R/O file descriptors:
+	 *
+	 * - write
+	 * - write_iter
+	 * - splice_write
+	 * - sendpage
+	 * - fallocate
+	 *
+	 * Locking operations are already intercepted by vfs for ovl:
+	 *
+	 * - lock
+	 * - flock
+	 * - setlease
 	 */
-	ofop->fops.copy_file_range = orig->copy_file_range;
-	ofop->fops.clone_file_range = orig->clone_file_range;
-	ofop->fops.dedupe_file_range = orig->dedupe_file_range;
-
-	/* Don't intercept these: */
-	ofop->fops.llseek = orig->llseek;
-	ofop->fops.unlocked_ioctl = orig->unlocked_ioctl;
-	ofop->fops.compat_ioctl = orig->compat_ioctl;
-	ofop->fops.flush = orig->flush;
-	ofop->fops.release = orig->release;
-	ofop->fops.get_unmapped_area = orig->get_unmapped_area;
-	ofop->fops.check_flags = orig->check_flags;
 
 	/* splice_read should be generic_file_splice_read */
 	WARN_ON(orig->splice_read != generic_file_splice_read);
-	ofop->fops.splice_read = generic_file_splice_read;
 
 	/* These make no sense for "normal" files: */
 	WARN_ON(orig->read);
@@ -474,22 +502,6 @@ static struct ovl_fops *ovl_fops_get(struct file *file)
 	WARN_ON(orig->fasync);
 	WARN_ON(orig->show_fdinfo);
 
-	/*
-	 * Don't add those which are unneeded for O_RDONLY:
-	 *
-	 *  write
-	 *  write_iter
-	 *  splice_write
-	 *  sendpage
-	 *  fallocate
-	 *
-	 * Locking operations are already intercepted by vfs for ovl:
-	 *
-	 *  lock
-	 *  flock
-	 *  setlease
-	 */
-
 	hash_add_rcu(ovl_fops_htable, &ofop->entry, (long) orig);
 
 out_unlock:

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [GIT PULL] overlayfs update for 4.10
@ 2016-12-16 17:32 Miklos Szeredi
  0 siblings, 0 replies; 5+ messages in thread
From: Miklos Szeredi @ 2016-12-16 17:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, linux-kernel, linux-fsdevel, linux-unionfs

Hi Linus,

Please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-linus

This update contains:

 - try to clone on copy-up;
 - allow renaming a directory;
 - split source into managable chunks;
 - misc cleanups and fixes.

It does not contain the read-only fd data inconsistency fix, which Al didn't
like.  I'll leave that to the next year...

Thanks,
Miklos

---
Al Viro (1):
      ovl: clean up kstat usage

Amir Goldstein (10):
      vfs: allow vfs_clone_file_range() across mount points
      vfs: call vfs_clone_file_range() under freeze protection
      vfs: fix vfs_clone_file_range() for overlayfs files
      ovl: use vfs_clone_file_range() for copy up if possible
      ovl: fix nested overlayfs mount
      ovl: check for emptiness of redirect dir
      ovl: show redirect_dir mount option
      ovl: create directories inside merged parent opaque
      ovl: fold ovl_copy_up_truncate() into ovl_copy_up()
      ovl: fix reStructuredText syntax errors in documentation

Geliang Tang (1):
      ovl: fix return value of ovl_fill_super

Miklos Szeredi (24):
      Revert "af_unix: fix hard linked sockets on overlay"
      Revert "vfs: rename: check backing inode being equal"
      vfs: no mnt_want_write_file() in vfs_{copy,clone}_file_range()
      ovl: update doc
      Revert "ovl: get_write_access() in truncate"
      ovl: rename ovl_rename2() to ovl_rename()
      ovl: treat special files like a regular fs
      ovl: don't check rename to self
      ovl: don't check sticky
      ovl: add ovl_dentry_is_whiteout()
      ovl: check lower existence when removing
      ovl: get rid of PURE type
      ovl: rename: simplify handling of lower/merged directory
      ovl: check lower existence of rename target
      ovl: simplify lookup
      ovl: use d_is_dir()
      ovl: split super.c
      ovl: check namelen
      ovl: consolidate lookup for underlying layers
      ovl: lookup redirects
      ovl: redirect on rename-dir
      ovl: allow redirect_dir to default to "on"
      ovl: allow setting max size of redirect
      ovl: opaque cleanup

---
 Documentation/filesystems/overlayfs.txt |  26 +-
 fs/ioctl.c                              |   6 +-
 fs/namei.c                              |   6 +-
 fs/nfsd/vfs.c                           |   3 +-
 fs/overlayfs/Kconfig                    |  14 +
 fs/overlayfs/Makefile                   |   2 +-
 fs/overlayfs/copy_up.c                  |  61 ++--
 fs/overlayfs/dir.c                      | 375 ++++++++++++---------
 fs/overlayfs/inode.c                    |  78 +----
 fs/overlayfs/namei.c                    | 401 +++++++++++++++++++++++
 fs/overlayfs/overlayfs.h                |  62 ++--
 fs/overlayfs/ovl_entry.h                |  53 +++
 fs/overlayfs/super.c                    | 557 ++++----------------------------
 fs/overlayfs/util.c                     | 265 +++++++++++++++
 fs/read_write.c                         |  23 +-
 include/linux/fs.h                      |  13 +
 net/unix/af_unix.c                      |   6 +-
 17 files changed, 1139 insertions(+), 812 deletions(-)
 create mode 100644 fs/overlayfs/namei.c
 create mode 100644 fs/overlayfs/ovl_entry.h
 create mode 100644 fs/overlayfs/util.c

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-12-16 17:32 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-10 20:49 [GIT PULL] overlayfs update for 4.10 Miklos Szeredi
2016-12-11  2:12 ` Al Viro
2016-12-11 13:51   ` Miklos Szeredi
2016-12-11 21:32     ` Miklos Szeredi
2016-12-16 17:32 Miklos Szeredi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).