linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [git pull] new mount API
@ 2018-08-23 22:31 Al Viro
  2018-08-23 23:24 ` Andy Lutomirski
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Al Viro @ 2018-08-23 22:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dhowells, linux-fsdevel, linux-api

new mount API series from Dave Howells

To quote his cover letter,
	Here are a set of patches to create a filesystem context prior to setting
	up a new mount, populating it with the parsed options/binary data, creating
	the superblock and then effecting the mount.  This is also used for remount
	since much of the parsing stuff is common in many filesystems.

	This allows namespaces and other information to be conveyed through the
	mount procedure.

	This also allows Mikl�s Szeredi's idea of doing:

		fd = fsopen("nfs");
		fsconfig(fd, FSCONFIG_SET_STRING, "option", "val", 0);
		fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
		mfd = fsmount(fd, MS_NODEV);
		move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

	that he presented at LSF-2017 to be implemented (see the relevant patches
	in the series).

	I didn't use netlink as that would make the core kernel depend on
	CONFIG_NET and CONFIG_NETLINK and would introduce network namespacing
	issues.

	I've implemented filesystem context handling for procfs, nfs, mqueue,
	cpuset, kernfs, sysfs, cgroup and afs filesystems.

	Unconverted filesystems are handled by a legacy filesystem wrapper.

One trivial conflict in fs/file_table.c:__fput(); resolved as
	if (unlikely(mode & FMODE_NEED_UNMOUNT))
		dissolve_on_fput(mnt);
	dput(dentry);
	mntput(mnt);
out:
	file_free(file);

The following changes since commit 32206ab36553be8714d9253ce33f4085681d369c:

  x86/intel_rdt: Provide pseudo-locking hooks within rdt_mount (2018-06-23 12:53:19 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git work.mount

for you to fetch changes up to 294fb3407bf29abf653fbe169af4bbb38a146db5:

  proc: Set correct userns for new proc super created by a new pid_namespace (2018-08-21 09:28:43 +0100)

----------------------------------------------------------------
Al Viro (2):
      vfs: syscall: Add open_tree(2) to reference or clone a mount
      teach move_mount(2) to work with OPEN_TREE_CLONE

Andrei Vagin (1):
      proc: Set correct userns for new proc super created by a new pid_namespace

David Howells (42):
      vfs: Require specification of size of mount data for internal mounts
      vfs: syscall: Add move_mount(2) to move mounts around
      vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled
      vfs: Introduce the basic header for the new mount API's filesystem context
      vfs: Introduce logging functions
      vfs: Add configuration parser helpers
      vfs: Add LSM hooks for the new mount API
      selinux: Implement the new mount API LSM hooks
      smack: Implement filesystem context security hooks
      apparmor: Implement security hooks for the new mount API
      tomoyo: Implement security hooks for the new mount API
      vfs: Separate changing mount flags full remount
      vfs: Implement a filesystem superblock creation/configuration context
      vfs: Remove unused code after filesystem context changes
      procfs: Move proc_fill_super() to fs/proc/root.c
      proc: Add fs_context support to procfs
      ipc: Convert mqueue fs to fs_context
      cpuset: Use fs_context
      kernfs, sysfs, cgroup, intel_rdt: Support fs_context
      hugetlbfs: Convert to fs_context
      vfs: Remove kern_mount_data()
      vfs: Provide documentation for new mount API
      Make anon_inodes unconditional
      vfs: syscall: Add fsopen() to prepare for superblock creation
      vfs: Implement logging through fs_context
      vfs: Add some logging to the core users of the fs_context log
      vfs: syscall: Add fsconfig() for configuring and managing a context
      vfs: syscall: Add fsmount() to create a mount for a superblock
      vfs: syscall: Add fspick() to select a superblock for reconfiguration
      afs: Add fs_context support
      afs: Use fs_context to pass parameters over automount
      vfs: Add a sample program for the new mount API
      vfs: syscall: Add fsinfo() to query filesystem information
      afs: Add fsinfo support
      vfs: Allow fsinfo() to query what's in an fs_context
      vfs: Allow fsinfo() to be used to query an fs parameter description
      vfs: Implement parameter value retrieval with fsinfo()
      vfs: Fix vfs_dup_fs_context()
      vfs: Fix fs_context logging when there's no log
      afs: Move the source fs parameter to the first position
      vfs: Pass path info fsinfo and rename get_fsinfo sb op to fsinfo
      vfs: Adjust fsinfo sample output

 Documentation/filesystems/mount_api.txt   | 706 ++++++++++++++++++++++++
 arch/arc/kernel/setup.c                   |   1 +
 arch/arm/kernel/atags_parse.c             |   1 +
 arch/arm/kvm/Kconfig                      |   1 -
 arch/arm64/kvm/Kconfig                    |   1 -
 arch/ia64/kernel/perfmon.c                |   3 +-
 arch/mips/kvm/Kconfig                     |   1 -
 arch/powerpc/kvm/Kconfig                  |   1 -
 arch/powerpc/platforms/cell/spufs/inode.c |   6 +-
 arch/s390/hypfs/inode.c                   |   7 +-
 arch/s390/kvm/Kconfig                     |   1 -
 arch/sh/kernel/setup.c                    |   1 +
 arch/sparc/kernel/setup_32.c              |   1 +
 arch/sparc/kernel/setup_64.c              |   1 +
 arch/x86/Kconfig                          |   1 -
 arch/x86/entry/syscalls/syscall_32.tbl    |   7 +
 arch/x86/entry/syscalls/syscall_64.tbl    |   7 +
 arch/x86/kernel/cpu/intel_rdt.h           |  15 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c  | 184 ++++---
 arch/x86/kernel/setup.c                   |   1 +
 arch/x86/kvm/Kconfig                      |   1 -
 drivers/base/Kconfig                      |   1 -
 drivers/base/devtmpfs.c                   |   7 +-
 drivers/char/tpm/Kconfig                  |   1 -
 drivers/dax/super.c                       |   2 +-
 drivers/dma-buf/Kconfig                   |   1 -
 drivers/gpio/Kconfig                      |   1 -
 drivers/gpu/drm/drm_drv.c                 |   3 +-
 drivers/gpu/drm/i915/i915_gemfs.c         |   2 +-
 drivers/iio/Kconfig                       |   1 -
 drivers/infiniband/Kconfig                |   1 -
 drivers/infiniband/hw/qib/qib_fs.c        |   7 +-
 drivers/misc/cxl/api.c                    |   3 +-
 drivers/misc/ibmasm/ibmasmfs.c            |  11 +-
 drivers/mtd/mtdsuper.c                    |  26 +-
 drivers/oprofile/oprofilefs.c             |   8 +-
 drivers/scsi/cxlflash/ocxl_hw.c           |   2 +-
 drivers/usb/gadget/function/f_fs.c        |   7 +-
 drivers/usb/gadget/legacy/inode.c         |   7 +-
 drivers/vfio/Kconfig                      |   1 -
 drivers/virtio/virtio_balloon.c           |   2 +-
 drivers/xen/xenfs/super.c                 |   7 +-
 fs/9p/vfs_super.c                         |   2 +-
 fs/Kconfig                                |   7 +
 fs/Makefile                               |   5 +-
 fs/adfs/super.c                           |   9 +-
 fs/affs/super.c                           |  13 +-
 fs/afs/internal.h                         |  10 +-
 fs/afs/mntpt.c                            | 147 ++---
 fs/afs/super.c                            | 634 +++++++++++++--------
 fs/afs/volume.c                           |   4 +-
 fs/aio.c                                  |   3 +-
 fs/anon_inodes.c                          |   3 +-
 fs/autofs/autofs_i.h                      |   2 +-
 fs/autofs/init.c                          |   4 +-
 fs/autofs/inode.c                         |   3 +-
 fs/befs/linuxvfs.c                        |  11 +-
 fs/bfs/inode.c                            |   8 +-
 fs/binfmt_misc.c                          |   7 +-
 fs/block_dev.c                            |   2 +-
 fs/btrfs/super.c                          |  30 +-
 fs/btrfs/tests/btrfs-tests.c              |   2 +-
 fs/ceph/super.c                           |   3 +-
 fs/cifs/cifs_dfs_ref.c                    |   3 +-
 fs/cifs/cifsfs.c                          |  18 +-
 fs/coda/inode.c                           |  11 +-
 fs/configfs/mount.c                       |   7 +-
 fs/cramfs/inode.c                         |  17 +-
 fs/debugfs/inode.c                        |  14 +-
 fs/devpts/inode.c                         |  10 +-
 fs/ecryptfs/main.c                        |   2 +-
 fs/efivarfs/super.c                       |   9 +-
 fs/efs/super.c                            |  14 +-
 fs/exofs/super.c                          |   7 +-
 fs/ext2/super.c                           |  14 +-
 fs/ext4/super.c                           |  16 +-
 fs/f2fs/super.c                           |  13 +-
 fs/fat/inode.c                            |   3 +-
 fs/fat/namei_msdos.c                      |   8 +-
 fs/fat/namei_vfat.c                       |   8 +-
 fs/file_table.c                           |   9 +-
 fs/filesystems.c                          |   4 +
 fs/freevxfs/vxfs_super.c                  |  12 +-
 fs/fs_context.c                           | 779 ++++++++++++++++++++++++++
 fs/fs_parser.c                            | 483 ++++++++++++++++
 fs/fsopen.c                               | 491 +++++++++++++++++
 fs/fuse/control.c                         |   9 +-
 fs/fuse/inode.c                           |  16 +-
 fs/gfs2/ops_fstype.c                      |   6 +-
 fs/gfs2/super.c                           |   4 +-
 fs/hfs/super.c                            |  12 +-
 fs/hfsplus/super.c                        |  12 +-
 fs/hostfs/hostfs_kern.c                   |   7 +-
 fs/hpfs/super.c                           |  11 +-
 fs/hugetlbfs/inode.c                      | 455 ++++++++++------
 fs/internal.h                             |  13 +-
 fs/isofs/inode.c                          |  11 +-
 fs/jffs2/super.c                          |  10 +-
 fs/jfs/super.c                            |  11 +-
 fs/kernfs/mount.c                         | 102 ++--
 fs/libfs.c                                |  19 +-
 fs/minix/inode.c                          |  14 +-
 fs/namei.c                                |   4 +-
 fs/namespace.c                            | 879 ++++++++++++++++++++++--------
 fs/nfs/internal.h                         |   4 +-
 fs/nfs/namespace.c                        |   3 +-
 fs/nfs/nfs4namespace.c                    |   3 +-
 fs/nfs/nfs4super.c                        |  27 +-
 fs/nfs/super.c                            |  22 +-
 fs/nfsd/nfsctl.c                          |   8 +-
 fs/nilfs2/super.c                         |  10 +-
 fs/notify/fanotify/Kconfig                |   1 -
 fs/notify/inotify/Kconfig                 |   1 -
 fs/nsfs.c                                 |   3 +-
 fs/ntfs/super.c                           |  13 +-
 fs/ocfs2/dlmfs/dlmfs.c                    |   5 +-
 fs/ocfs2/super.c                          |  14 +-
 fs/omfs/inode.c                           |   9 +-
 fs/openpromfs/inode.c                     |  11 +-
 fs/orangefs/orangefs-kernel.h             |   2 +-
 fs/orangefs/super.c                       |   5 +-
 fs/overlayfs/super.c                      |  11 +-
 fs/pipe.c                                 |   3 +-
 fs/pnode.c                                |   1 +
 fs/proc/inode.c                           |  50 +-
 fs/proc/internal.h                        |   6 +-
 fs/proc/root.c                            | 247 ++++++---
 fs/pstore/inode.c                         |  10 +-
 fs/qnx4/inode.c                           |  14 +-
 fs/qnx6/inode.c                           |  14 +-
 fs/ramfs/inode.c                          |   6 +-
 fs/reiserfs/super.c                       |  14 +-
 fs/romfs/super.c                          |  13 +-
 fs/squashfs/super.c                       |  12 +-
 fs/statfs.c                               | 571 +++++++++++++++++++
 fs/super.c                                | 397 +++++++++++---
 fs/sysfs/mount.c                          |  67 ++-
 fs/sysv/inode.c                           |   3 +-
 fs/sysv/super.c                           |  16 +-
 fs/tracefs/inode.c                        |  10 +-
 fs/ubifs/super.c                          |   5 +-
 fs/udf/super.c                            |  16 +-
 fs/ufs/super.c                            |  11 +-
 fs/xfs/xfs_super.c                        |  10 +-
 include/linux/cgroup.h                    |   3 +-
 include/linux/debugfs.h                   |   8 +-
 include/linux/fs.h                        |  50 +-
 include/linux/fs_context.h                | 208 +++++++
 include/linux/fs_parser.h                 | 117 ++++
 include/linux/fsinfo.h                    |  41 ++
 include/linux/kernfs.h                    |  41 +-
 include/linux/lsm_hooks.h                 |  79 ++-
 include/linux/module.h                    |   6 +
 include/linux/mount.h                     |  10 +-
 include/linux/mtd/super.h                 |   4 +-
 include/linux/ramfs.h                     |   4 +-
 include/linux/security.h                  |  72 ++-
 include/linux/shmem_fs.h                  |   3 +-
 include/linux/syscalls.h                  |  13 +
 include/uapi/linux/fcntl.h                |   2 +
 include/uapi/linux/fs.h                   |  82 ++-
 include/uapi/linux/fsinfo.h               | 302 ++++++++++
 include/uapi/linux/mount.h                |  75 +++
 init/Kconfig                              |  10 -
 init/do_mounts.c                          |   5 +-
 init/do_mounts_initrd.c                   |   1 +
 ipc/mqueue.c                              | 120 +++-
 kernel/bpf/inode.c                        |   7 +-
 kernel/cgroup/cgroup-internal.h           |  50 +-
 kernel/cgroup/cgroup-v1.c                 | 415 ++++++++------
 kernel/cgroup/cgroup.c                    | 285 +++++++---
 kernel/cgroup/cpuset.c                    |  67 ++-
 kernel/trace/trace.c                      |   7 +-
 mm/shmem.c                                |  10 +-
 mm/zsmalloc.c                             |   3 +-
 net/socket.c                              |   3 +-
 net/sunrpc/rpc_pipe.c                     |   7 +-
 samples/Kconfig                           |   6 +
 samples/Makefile                          |   2 +-
 samples/mount_api/Makefile                |   7 +
 samples/mount_api/test-fsmount.c          | 118 ++++
 samples/statx/Makefile                    |   7 +-
 samples/statx/test-fs-query.c             | 137 +++++
 samples/statx/test-fsinfo.c               | 584 ++++++++++++++++++++
 security/apparmor/apparmorfs.c            |   8 +-
 security/apparmor/include/mount.h         |  11 +-
 security/apparmor/lsm.c                   | 111 +++-
 security/apparmor/mount.c                 |  47 ++
 security/inode.c                          |   7 +-
 security/security.c                       |  65 ++-
 security/selinux/hooks.c                  | 320 ++++++++++-
 security/selinux/selinuxfs.c              |   8 +-
 security/smack/smack.h                    |  11 +-
 security/smack/smack_lsm.c                | 373 +++++++++++--
 security/smack/smackfs.c                  |   9 +-
 security/tomoyo/common.h                  |   3 +
 security/tomoyo/mount.c                   |  46 ++
 security/tomoyo/tomoyo.c                  |  19 +-
 198 files changed, 9288 insertions(+), 1874 deletions(-)
 create mode 100644 Documentation/filesystems/mount_api.txt
 create mode 100644 fs/fs_context.c
 create mode 100644 fs/fs_parser.c
 create mode 100644 fs/fsopen.c
 create mode 100644 include/linux/fs_context.h
 create mode 100644 include/linux/fs_parser.h
 create mode 100644 include/linux/fsinfo.h
 create mode 100644 include/uapi/linux/fsinfo.h
 create mode 100644 include/uapi/linux/mount.h
 create mode 100644 samples/mount_api/Makefile
 create mode 100644 samples/mount_api/test-fsmount.c
 create mode 100644 samples/statx/test-fs-query.c
 create mode 100644 samples/statx/test-fsinfo.c

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-23 22:31 [git pull] new mount API Al Viro
@ 2018-08-23 23:24 ` Andy Lutomirski
  2018-08-24  0:08 ` David Howells
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2018-08-23 23:24 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, David Howells, Linux FS Devel, Linux API

On Thu, Aug 23, 2018 at 3:31 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> new mount API series from Dave Howells
>
> To quote his cover letter,
>         Here are a set of patches to create a filesystem context prior to setting
>         up a new mount, populating it with the parsed options/binary data, creating
>         the superblock and then effecting the mount.  This is also used for remount
>         since much of the parsing stuff is common in many filesystems.
>
>         This allows namespaces and other information to be conveyed through the
>         mount procedure.
>
>         This also allows Miklós Szeredi's idea of doing:
>
>                 fd = fsopen("nfs");
>                 fsconfig(fd, FSCONFIG_SET_STRING, "option", "val", 0);
>                 fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
>                 mfd = fsmount(fd, MS_NODEV);
>                 move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
>
>         that he presented at LSF-2017 to be implemented (see the relevant patches
>         in the series).
>
>         I didn't use netlink as that would make the core kernel depend on
>         CONFIG_NET and CONFIG_NETLINK and would introduce network namespacing
>         issues.
>
>         I've implemented filesystem context handling for procfs, nfs, mqueue,
>         cpuset, kernfs, sysfs, cgroup and afs filesystems.
>
>         Unconverted filesystems are handled by a legacy filesystem wrapper.
>
> One trivial conflict in fs/file_table.c:__fput(); resolved as
>         if (unlikely(mode & FMODE_NEED_UNMOUNT))
>                 dissolve_on_fput(mnt);
>         dput(dentry);
>         mntput(mnt);
> out:
>         file_free(file);
>

Has anything been done to ensure that the behavior when doing
FSCONFIG_CMD_CREATE against an already-mounted block device is
reasonable?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-23 22:31 [git pull] new mount API Al Viro
  2018-08-23 23:24 ` Andy Lutomirski
@ 2018-08-24  0:08 ` David Howells
  2018-08-24  0:16   ` Andy Lutomirski
  2018-08-26  3:22 ` Eric W. Biederman
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: David Howells @ 2018-08-24  0:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: dhowells, Al Viro, Linus Torvalds, Linux FS Devel, Linux API

Andy Lutomirski <luto@amacapital.net> wrote:

> Has anything been done to ensure that the behavior when doing
> FSCONFIG_CMD_CREATE against an already-mounted block device is
> reasonable?

For the moment, I've left it as the same behaviour as for mount(2) since
mount(2) now uses the same mechanism internally and we aren't permitted to
break userspace.

I would like to add at least one flag to stipulate that, in the case of an
incompatible collision, you can get a failure - but defining what is meant by
incompatible isn't necessarily trivial, and would vary by filesystem *and* the
LSM.

However, I don't want to start reengineering everything this close to the
merge window and we don't really need it immediately.

David

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  0:08 ` David Howells
@ 2018-08-24  0:16   ` Andy Lutomirski
  2018-08-24  0:31     ` Al Viro
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2018-08-24  0:16 UTC (permalink / raw)
  To: David Howells; +Cc: Al Viro, Linus Torvalds, Linux FS Devel, Linux API

On Thu, Aug 23, 2018 at 5:08 PM, David Howells <dhowells@redhat.com> wrote:
> Andy Lutomirski <luto@amacapital.net> wrote:
>
>> Has anything been done to ensure that the behavior when doing
>> FSCONFIG_CMD_CREATE against an already-mounted block device is
>> reasonable?
>
> For the moment, I've left it as the same behaviour as for mount(2) since
> mount(2) now uses the same mechanism internally and we aren't permitted to
> break userspace.
>
> I would like to add at least one flag to stipulate that, in the case of an
> incompatible collision, you can get a failure - but defining what is meant by
> incompatible isn't necessarily trivial, and would vary by filesystem *and* the
> LSM.
>
> However, I don't want to start reengineering everything this close to the
> merge window and we don't really need it immediately.
>

The problem is that, once this ends up merged, then we're kind of
stuck with it, too.  It would be a bit sad if your better proposal for
handling nfs and instantiation of filesystems in general were added in
the next release and then we end up with the current
FSCONFIG_CMD_CREATE as a permanently supported but non-preferred
option.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  0:16   ` Andy Lutomirski
@ 2018-08-24  0:31     ` Al Viro
  2018-08-24  2:36       ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Al Viro @ 2018-08-24  0:31 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: David Howells, Linus Torvalds, Linux FS Devel, Linux API

On Thu, Aug 23, 2018 at 05:16:53PM -0700, Andy Lutomirski wrote:
> On Thu, Aug 23, 2018 at 5:08 PM, David Howells <dhowells@redhat.com> wrote:
> > Andy Lutomirski <luto@amacapital.net> wrote:
> >
> >> Has anything been done to ensure that the behavior when doing
> >> FSCONFIG_CMD_CREATE against an already-mounted block device is
> >> reasonable?
> >
> > For the moment, I've left it as the same behaviour as for mount(2) since
> > mount(2) now uses the same mechanism internally and we aren't permitted to
> > break userspace.
> >
> > I would like to add at least one flag to stipulate that, in the case of an
> > incompatible collision, you can get a failure - but defining what is meant by
> > incompatible isn't necessarily trivial, and would vary by filesystem *and* the
> > LSM.
> >
> > However, I don't want to start reengineering everything this close to the
> > merge window and we don't really need it immediately.
> >
> 
> The problem is that, once this ends up merged, then we're kind of
> stuck with it, too.  It would be a bit sad if your better proposal for
> handling nfs and instantiation of filesystems in general were added in
> the next release and then we end up with the current
> FSCONFIG_CMD_CREATE as a permanently supported but non-preferred
> option.

For fuck sake, mount(2) is a permanently supported option!  Folks, get over it -
you are mixing entirely different issues.  You know, I know and everyone even
remotely sane knows that mount(2) *IS* *NOT* *GOING* *AWAY*.  And it's not
changing semantics either.  So bemoaning the "permanently supported non-preferred
options" is utter lunacy.  It's there and it will remain there, period.

We do not break userland.  And "their local scripts would break terribly inside
userns container" does *NOT* render those second-class in any sense.  So you
can curse the current behaviour of mount(2) and I even agree with some of that,
but we are not going to be able to remove that.  Yes, it would've been nice if,
etc., but it's not going to happen.  Not now, not for many years.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  0:31     ` Al Viro
@ 2018-08-24  2:36       ` Andy Lutomirski
  2018-08-24  3:13         ` Al Viro
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2018-08-24  2:36 UTC (permalink / raw)
  To: Al Viro; +Cc: David Howells, Linus Torvalds, Linux FS Devel, Linux API


> On Aug 23, 2018, at 5:31 PM, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
>> On Thu, Aug 23, 2018 at 05:16:53PM -0700, Andy Lutomirski wrote:
>>> On Thu, Aug 23, 2018 at 5:08 PM, David Howells <dhowells@redhat.com> wrote:
>>> Andy Lutomirski <luto@amacapital.net> wrote:
>>> 
>>>> Has anything been done to ensure that the behavior when doing
>>>> FSCONFIG_CMD_CREATE against an already-mounted block device is
>>>> reasonable?
>>> 
>>> For the moment, I've left it as the same behaviour as for mount(2) since
>>> mount(2) now uses the same mechanism internally and we aren't permitted to
>>> break userspace.
>>> 
>>> I would like to add at least one flag to stipulate that, in the case of an
>>> incompatible collision, you can get a failure - but defining what is meant by
>>> incompatible isn't necessarily trivial, and would vary by filesystem *and* the
>>> LSM.
>>> 
>>> However, I don't want to start reengineering everything this close to the
>>> merge window and we don't really need it immediately.
>>> 
>> 
>> The problem is that, once this ends up merged, then we're kind of
>> stuck with it, too.  It would be a bit sad if your better proposal for
>> handling nfs and instantiation of filesystems in general were added in
>> the next release and then we end up with the current
>> FSCONFIG_CMD_CREATE as a permanently supported but non-preferred
>> option.
> 
> For fuck sake, mount(2) is a permanently supported option!  

Exactly. mount(2) has broken semantics and it’s permanently supported.

If this merge request gets pulled, then FSCONFIG_CMD_CREATE will *also* be a permanently supported API with broken semantics.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  2:36       ` Andy Lutomirski
@ 2018-08-24  3:13         ` Al Viro
  2018-08-24  4:51           ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Al Viro @ 2018-08-24  3:13 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: David Howells, Linus Torvalds, Linux FS Devel, Linux API

On Thu, Aug 23, 2018 at 07:36:15PM -0700, Andy Lutomirski wrote:

> > For fuck sake, mount(2) is a permanently supported option!  
> 
> Exactly. mount(2) has broken semantics and it’s permanently supported.
> 
> If this merge request gets pulled, then FSCONFIG_CMD_CREATE will *also* be a permanently supported API with broken semantics.

Oh no - mount(2) behaviour can be expressed that way!  The horror...

Andy, this is bullshit.  You are saying that dealing with mount(2)
mess of ABI (badly unorthogonal set of operations, overloading from
hell, etc.) must be tied with massive rework of fs drivers.  Why?

It's a simple enough question and pardon me, but your "it's broken"
doesn't inspire any confidence that you even understand what the
current behaviour is and what its problems (they are real enough)
are.  I have seen a lot of handwaving from you in these threads,
combined with "surely, it must work thus", followed with "it's
a special case/class/whatnot" or "surely, it must be broken" every
time you find something that doesn't "work thus".

Bugger it; explain why we must combine untangling the existing ABI
(and we *will* have to keep the existing semantics possible to
express for sys_mount() sake) with this, or with anything else,
for that matter.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  3:13         ` Al Viro
@ 2018-08-24  4:51           ` Andy Lutomirski
  2018-08-24  6:05             ` Theodore Y. Ts'o
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2018-08-24  4:51 UTC (permalink / raw)
  To: Al Viro; +Cc: David Howells, Linus Torvalds, Linux FS Devel, Linux API

On Thu, Aug 23, 2018 at 8:13 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Aug 23, 2018 at 07:36:15PM -0700, Andy Lutomirski wrote:
>
>> > For fuck sake, mount(2) is a permanently supported option!
>>
>> Exactly. mount(2) has broken semantics and it’s permanently supported.
>>
>> If this merge request gets pulled, then FSCONFIG_CMD_CREATE will *also* be a permanently supported API with broken semantics.
>
> Oh no - mount(2) behaviour can be expressed that way!  The horror...
>
> Andy, this is bullshit.  You are saying that dealing with mount(2)
> mess of ABI (badly unorthogonal set of operations, overloading from
> hell, etc.) must be tied with massive rework of fs drivers.  Why?
>

When this was reviewed earlier, a problem was identified.  I asked if
it had been addressed.  I did *not* say that it was mandatory to
address it, nor did I say anything about reworking fs drivers.

A reasonable answer might have been "avoiding this pitfall in the new
API would involve a large amount of reworking of existing filesystem
drivers.  I think that the new API, as is, has enough benefits that it
makes sense to merge it even with this pitfall, and, if needed, we can
introduce an improved version down the road."

I had at least hoped for a better answer than "bugger it."

--Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  4:51           ` Andy Lutomirski
@ 2018-08-24  6:05             ` Theodore Y. Ts'o
  2018-08-24  8:38               ` Miklos Szeredi
  2018-08-24  9:45               ` David Howells
  0 siblings, 2 replies; 32+ messages in thread
From: Theodore Y. Ts'o @ 2018-08-24  6:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Al Viro, David Howells, Linus Torvalds, Linux FS Devel, Linux API

On Thu, Aug 23, 2018 at 09:51:00PM -0700, Andy Lutomirski wrote:
> When this was reviewed earlier, a problem was identified.  I asked if
> it had been addressed.  I did *not* say that it was mandatory to
> address it, nor did I say anything about reworking fs drivers.
> 
> A reasonable answer might have been "avoiding this pitfall in the new
> API would involve a large amount of reworking of existing filesystem
> drivers.  I think that the new API, as is, has enough benefits that it
> makes sense to merge it even with this pitfall, and, if needed, we can
> introduce an improved version down the road."

It's also not clear that an API that you think is "cleaner" would
actually be more usable.  In fact, I believe it's going to be a sh*t
show for userspace, because it won't be obvious what will work, and
what will cause an error of the form, "sorry we can't do this cleaner
thing that some people think is better".  Which means a huge amount of
special casing in the program, or a lot of very surprising failures
that will then get exposed to the system administrator, many of whom
haven't really had much of a problem with the existing mount(8) user
interface.

It may very well be that the "cleaner" approach will be hellacious
from a human usability perspective.  Figuring all of this out could
take months and months (stalling the new mount API for a long time),
and we may never know for sure until we implement a full prototype of
kernel changes, massive rewrite of the file systems, and userspace
programs --- and then see what is the best way to expose the radically
different semantics depending on what individual file systems can and
can not do to the poor human system administrator who is just trying
to mount the d*mned file system.

	      	     	  	    - Ted

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  6:05             ` Theodore Y. Ts'o
@ 2018-08-24  8:38               ` Miklos Szeredi
  2018-08-24  8:56                 ` Miklos Szeredi
  2018-08-24  9:45               ` David Howells
  1 sibling, 1 reply; 32+ messages in thread
From: Miklos Szeredi @ 2018-08-24  8:38 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Andy Lutomirski, Al Viro, David Howells, Linus Torvalds,
	Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 8:05 AM, Theodore Y. Ts'o <tytso@mit.edu> wrote:
> On Thu, Aug 23, 2018 at 09:51:00PM -0700, Andy Lutomirski wrote:
>> When this was reviewed earlier, a problem was identified.  I asked if
>> it had been addressed.  I did *not* say that it was mandatory to
>> address it, nor did I say anything about reworking fs drivers.
>>
>> A reasonable answer might have been "avoiding this pitfall in the new
>> API would involve a large amount of reworking of existing filesystem
>> drivers.  I think that the new API, as is, has enough benefits that it
>> makes sense to merge it even with this pitfall, and, if needed, we can
>> introduce an improved version down the road."
>
> It's also not clear that an API that you think is "cleaner" would
> actually be more usable.  In fact, I believe it's going to be a sh*t
> show for userspace, because it won't be obvious what will work, and
> what will cause an error of the form, "sorry we can't do this cleaner
> thing that some people think is better".  Which means a huge amount of
> special casing in the program, or a lot of very surprising failures
> that will then get exposed to the system administrator, many of whom
> haven't really had much of a problem with the existing mount(8) user
> interface.

In what way is the kernel better suited to read the mind of the poor
sysadmin, than a userspace helper program?

No, I don't think this argument is about the mount(8) interface, which
is what the sysadmin is using and can continue to use for the
foreseeable future, with basically the same feature set

It's about new uses of the mount(2) API  that *would* be possible if
it was a saner interface.   And so doing the new mount API radically
different than what current mount(2) is doing is quite all right in my
opinion.

> It may very well be that the "cleaner" approach will be hellacious
> from a human usability perspective.  Figuring all of this out could
> take months and months (stalling the new mount API for a long time),
> and we may never know for sure until we implement a full prototype of
> kernel changes, massive rewrite of the file systems, and userspace
> programs --- and then see what is the best way to expose the radically
> different semantics depending on what individual file systems can and
> can not do to the poor human system administrator who is just trying
> to mount the d*mned file system.

I'm not against merging this patchset (have nits about resuing the
MS_* constants for the new API that I've complained about, but that's
really easy to fix before -final), but it may make sense to
differentiate the legacy behavior from a saner one from the very
start.  I.e. rename FSCONFIG_CMD_CREATE to something implying it's
actually *not* a create in exclusive sense that one would first imply
(and yeah, we have creat() and O_CREAT, which don't imply exclusivity,
yet they at least have clear semantics that current super block
creation does definitely not).

Also, we don't have a massive conversion of filesystems yet to the new
internal API, so we could require anything converted to support the
exclusive create semantics.

None of this requires months of figuring out, just a bit of tweaking
and acknowledging that the current semantics are weird, to say the
least.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  8:38               ` Miklos Szeredi
@ 2018-08-24  8:56                 ` Miklos Szeredi
  2018-08-24  9:29                   ` Miklos Szeredi
  0 siblings, 1 reply; 32+ messages in thread
From: Miklos Szeredi @ 2018-08-24  8:56 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Andy Lutomirski, Al Viro, David Howells, Linus Torvalds,
	Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 10:38 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Aug 24, 2018 at 8:05 AM, Theodore Y. Ts'o <tytso@mit.edu> wrote:
>> On Thu, Aug 23, 2018 at 09:51:00PM -0700, Andy Lutomirski wrote:
>>> When this was reviewed earlier, a problem was identified.  I asked if
>>> it had been addressed.  I did *not* say that it was mandatory to
>>> address it, nor did I say anything about reworking fs drivers.
>>>
>>> A reasonable answer might have been "avoiding this pitfall in the new
>>> API would involve a large amount of reworking of existing filesystem
>>> drivers.  I think that the new API, as is, has enough benefits that it
>>> makes sense to merge it even with this pitfall, and, if needed, we can
>>> introduce an improved version down the road."
>>
>> It's also not clear that an API that you think is "cleaner" would
>> actually be more usable.  In fact, I believe it's going to be a sh*t
>> show for userspace, because it won't be obvious what will work, and
>> what will cause an error of the form, "sorry we can't do this cleaner
>> thing that some people think is better".  Which means a huge amount of
>> special casing in the program, or a lot of very surprising failures
>> that will then get exposed to the system administrator, many of whom
>> haven't really had much of a problem with the existing mount(8) user
>> interface.
>
> In what way is the kernel better suited to read the mind of the poor
> sysadmin, than a userspace helper program?

I have a concrete example:  mount -oloop.  You can leave it to
mount(8) to automagically find an existing loop device or setup a new
one, or you can do the low level thing and set up your own loop device
and mount it.  Same story as NFS and friends, except it's not the
kernel that does the magic "same source -> same sb" policy but the
mount(8) utility.  Ever notice the difference?  See?   And yeah, there
are races involved, and userspace is perfectly suited to deal with
them.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  8:56                 ` Miklos Szeredi
@ 2018-08-24  9:29                   ` Miklos Szeredi
  0 siblings, 0 replies; 32+ messages in thread
From: Miklos Szeredi @ 2018-08-24  9:29 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Andy Lutomirski, Al Viro, David Howells, Linus Torvalds,
	Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 10:56 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Aug 24, 2018 at 10:38 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Fri, Aug 24, 2018 at 8:05 AM, Theodore Y. Ts'o <tytso@mit.edu> wrote:
>>> On Thu, Aug 23, 2018 at 09:51:00PM -0700, Andy Lutomirski wrote:
>>>> When this was reviewed earlier, a problem was identified.  I asked if
>>>> it had been addressed.  I did *not* say that it was mandatory to
>>>> address it, nor did I say anything about reworking fs drivers.
>>>>
>>>> A reasonable answer might have been "avoiding this pitfall in the new
>>>> API would involve a large amount of reworking of existing filesystem
>>>> drivers.  I think that the new API, as is, has enough benefits that it
>>>> makes sense to merge it even with this pitfall, and, if needed, we can
>>>> introduce an improved version down the road."
>>>
>>> It's also not clear that an API that you think is "cleaner" would
>>> actually be more usable.  In fact, I believe it's going to be a sh*t
>>> show for userspace, because it won't be obvious what will work, and
>>> what will cause an error of the form, "sorry we can't do this cleaner
>>> thing that some people think is better".  Which means a huge amount of
>>> special casing in the program, or a lot of very surprising failures
>>> that will then get exposed to the system administrator, many of whom
>>> haven't really had much of a problem with the existing mount(8) user
>>> interface.
>>
>> In what way is the kernel better suited to read the mind of the poor
>> sysadmin, than a userspace helper program?
>
> I have a concrete example:  mount -oloop.  You can leave it to
> mount(8) to automagically find an existing loop device or setup a new
> one, or you can do the low level thing and set up your own loop device
> and mount it.  Same story as NFS and friends, except it's not the
> kernel that does the magic "same source -> same sb" policy but the
> mount(8) utility.  Ever notice the difference?  See?   And yeah, there
> are races involved, and userspace is perfectly suited to deal with
> them.

And to continue from that thought, a namespace for filesystem
instances could, for example, live under /dev/fs/$FSTYPE/$INSTANCE

Naming the $INSTANCE could be done by the one creating that instance,
or in case of legacy mounts could have a fixed prefix + sequence
number, or something similar.

All this could come later, let's just not exclude the possibility of
using the new API in this way.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  6:05             ` Theodore Y. Ts'o
  2018-08-24  8:38               ` Miklos Szeredi
@ 2018-08-24  9:45               ` David Howells
  2018-08-24 10:06                 ` Miklos Szeredi
                                   ` (3 more replies)
  1 sibling, 4 replies; 32+ messages in thread
From: David Howells @ 2018-08-24  9:45 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Theodore Y. Ts'o, Andy Lutomirski, Al Viro,
	Linus Torvalds, Linux FS Devel, Linux API

Miklos Szeredi <miklos@szeredi.hu> wrote:

> I'm not against merging this patchset (have nits about resuing the
> MS_* constants for the new API that I've complained about, but that's
> really easy to fix before -final),

Can you send me a patch that does what you want here?  They're only used by
fsmount() for creating a vfsmount and not used for the superblock creation or
reconfiguration.

> but it may make sense to differentiate the legacy behavior from a saner one
> from the very start.  I.e. rename FSCONFIG_CMD_CREATE

I would suggest leaving it as-is and add an FSCONFIG_CMD_CREATE_EXCLUSIVE.

> to something implying it's actually *not* a create in exclusive sense that
> one would first imply (and yeah, we have creat() and O_CREAT, which don't
> imply exclusivity, yet they at least have clear semantics that current super
> block creation does definitely not).

The problem is that "exclusivity" isn't necessarily an easy thing to define.
Take nfs4 and btrfs for example.  They creating a backing superblock that the
actual node is derived from (though in different ways).  How do you define
what "exclusive" means in their case?

David

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  9:45               ` David Howells
@ 2018-08-24 10:06                 ` Miklos Szeredi
  2018-08-24 14:18                 ` Miklos Szeredi
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Miklos Szeredi @ 2018-08-24 10:06 UTC (permalink / raw)
  To: David Howells
  Cc: Theodore Y. Ts'o, Andy Lutomirski, Al Viro, Linus Torvalds,
	Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 11:45 AM, David Howells <dhowells@redhat.com> wrote:
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>> I'm not against merging this patchset (have nits about resuing the
>> MS_* constants for the new API that I've complained about, but that's
>> really easy to fix before -final),
>
> Can you send me a patch that does what you want here?  They're only used by
> fsmount() for creating a vfsmount and not used for the superblock creation or
> reconfiguration.
>
>> but it may make sense to differentiate the legacy behavior from a saner one
>> from the very start.  I.e. rename FSCONFIG_CMD_CREATE
>
> I would suggest leaving it as-is and add an FSCONFIG_CMD_CREATE_EXCLUSIVE.
>
>> to something implying it's actually *not* a create in exclusive sense that
>> one would first imply (and yeah, we have creat() and O_CREAT, which don't
>> imply exclusivity, yet they at least have clear semantics that current super
>> block creation does definitely not).
>
> The problem is that "exclusivity" isn't necessarily an easy thing to define.
> Take nfs4 and btrfs for example.  They creating a backing superblock that the
> actual node is derived from (though in different ways).  How do you define
> what "exclusive" means in their case?

Exclusive creation of a filesystem instance is just that:  we promise
that no one else is using that filesystem instance, and we can set
options as we like, guaranteed that no bother to any other uses.

Now, with block filesystems that can only be done once, obviously, so
if someone already created a btrfs instance using /dev/foobar then
trying to create another instance will fail with EEXIST, EBUSY or
whaterver.

For NFS it's a different story, it *is* possible to create several
instances of the filesystem even if connecting to the same server,
except then they will not share caches, will not have local coherency,
etc..

Regardless, I think a "filesystem instance" is a pretty clear concept,
that we can base a future API on and it's not just an internal
implementation detail that the current mount(2) API makes look like.
It *is* in fact the only concept that makes FSCONFIG_CMD_RECONFIGURE
have sane meaning (i.e. I want to change *that* instance, and am fully
aware what affect that will have on mounts of that instance).

Compare that to "mount -o remount path"; that one would imply we are
changing something for a mount that is at "path", but if there are
more than one users of that instance (which is based on magic
algorithm inside the filesystem) than that might have unwanted side
effects.  It's what Eric's complaint is about, and it's I think the
root of the issue at hand.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  9:45               ` David Howells
  2018-08-24 10:06                 ` Miklos Szeredi
@ 2018-08-24 14:18                 ` Miklos Szeredi
  2018-08-24 14:26                   ` Karel Zak
  2018-08-24 14:26                 ` David Howells
  2018-08-24 14:49                 ` Andy Lutomirski
  3 siblings, 1 reply; 32+ messages in thread
From: Miklos Szeredi @ 2018-08-24 14:18 UTC (permalink / raw)
  To: David Howells
  Cc: Theodore Y. Ts'o, Andy Lutomirski, Al Viro, Linus Torvalds,
	Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 10:45:31AM +0100, David Howells wrote:
> Miklos Szeredi <miklos@szeredi.hu> wrote:
> 
> > I'm not against merging this patchset (have nits about resuing the
> > MS_* constants for the new API that I've complained about, but that's
> > really easy to fix before -final),
> 
> Can you send me a patch that does what you want here?  They're only used by
> fsmount() for creating a vfsmount and not used for the superblock creation or
> reconfiguration.

I'm bothered by the fact that we use the same MS_ prefix in the old mount(2) api
and the new fsmount(2) api.  What happens if we introduce new flags for
fsmount(2) and are already out of flags for mount(2)?  I see a big mess that
way.

Also notice, how the old code just totally ignored MS_RELATIME?  Bit of rot in
the new interface already.

So here's a patch. The MNT_ prefix is already used by libmount, and we don't
want any confusion arising from that.  So how about M_*?  Short and sweet, just
like O_* for open(2).

Reusing the same values as MNT_ isn't important, but it does reduce the code
size a bit.  Especially if we'd switch to MNT_STRICTATIME internally as well.


Thanks,
Miklos
---

diff --git a/fs/namespace.c b/fs/namespace.c
index e34e3fd064b0..a4b8503f80b8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3351,7 +3351,7 @@ EXPORT_SYMBOL_GPL(kern_mount);
  * Create a kernel mount representation for a new, prepared superblock
  * (specified by fs_fd) and attach to an open_tree-like file descriptor.
  */
-SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags, unsigned int, ms_flags)
+SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags, unsigned int, m_flags)
 {
 	struct fs_context *fc;
 	struct file *file;
@@ -3366,27 +3366,21 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags, unsigned int, ms_flags
 	if ((flags & ~(FSMOUNT_CLOEXEC)) != 0)
 		return -EINVAL;
 
-	if (ms_flags & ~(MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC |
-			 MS_NOATIME | MS_NODIRATIME | MS_RELATIME |
-			 MS_STRICTATIME))
+	if (m_flags & ~(M_NOSUID | M_NODEV | M_NOEXEC |	M_NOATIME |
+			M_NODIRATIME | M_STRICTATIME | M_RDONLY))
 		return -EINVAL;
 
-	if (ms_flags & MS_RDONLY)
-		mnt_flags |= MNT_READONLY;
-	if (ms_flags & MS_NOSUID)
-		mnt_flags |= MNT_NOSUID;
-	if (ms_flags & MS_NODEV)
-		mnt_flags |= MNT_NODEV;
-	if (ms_flags & MS_NOEXEC)
-		mnt_flags |= MNT_NOEXEC;
-	if (ms_flags & MS_NODIRATIME)
-		mnt_flags |= MNT_NODIRATIME;
+	BUILD_BUG_ON(M_NOSUID != MNT_NOSUID || M_NODEV != MNT_NODEV ||
+		     M_NOEXEC != MNT_NOEXEC || M_NOATIME != MNT_NOATIME ||
+		     M_NODIRATIME != MNT_NODIRATIME ||
+		     M_STRICTATIME != MNT_RELATIME || M_RDONLY != MNT_READONLY);
+
+	/* Relatime is the default, but internally that's what we flag */
+	mnt_flags = m_flags & ~M_STRICTATIME;
 
-	if (ms_flags & MS_STRICTATIME) {
-		if (ms_flags & MS_NOATIME)
+	if (m_flags & M_STRICTATIME) {
+		if (m_flags & M_NOATIME)
 			return -EINVAL;
-	} else if (ms_flags & MS_NOATIME) {
-		mnt_flags |= MNT_NOATIME;
 	} else {
 		mnt_flags |= MNT_RELATIME;
 	}
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 3634e065836c..331e69f5a7bf 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -1,6 +1,15 @@
 #ifndef _UAPI_LINUX_MOUNT_H
 #define _UAPI_LINUX_MOUNT_H
 
+/* Mount flags passed to fsmount(2) */
+#define M_NOSUID	0x01
+#define M_NODEV		0x02
+#define M_NOEXEC	0x04
+#define M_NOATIME	0x08
+#define M_NODIRATIME	0x10
+#define M_STRICTATIME	0x20
+#define M_RDONLY	0x40
+
 /*
  * These are the fs-independent mount-flags: up to 32 flags are supported
  *

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  9:45               ` David Howells
  2018-08-24 10:06                 ` Miklos Szeredi
  2018-08-24 14:18                 ` Miklos Szeredi
@ 2018-08-24 14:26                 ` David Howells
  2018-08-24 14:30                   ` Miklos Szeredi
  2018-08-24 14:49                 ` Andy Lutomirski
  3 siblings, 1 reply; 32+ messages in thread
From: David Howells @ 2018-08-24 14:26 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Theodore Y. Ts'o, Andy Lutomirski, Al Viro,
	Linus Torvalds, Linux FS Devel, Linux API

Miklos Szeredi <miklos@szeredi.hu> wrote:

> +/* Mount flags passed to fsmount(2) */
> +#define M_NOSUID	0x01
> +#define M_NODEV		0x02
> +#define M_NOEXEC	0x04
> +#define M_NOATIME	0x08
> +#define M_NODIRATIME	0x10
> +#define M_STRICTATIME	0x20
> +#define M_RDONLY	0x40

If we're going to do this, I would suggest a longer prefix than just 'M' and
renumber them to put *_RDONLY first.

> +	BUILD_BUG_ON(M_NOSUID != MNT_NOSUID || M_NODEV != MNT_NODEV ||
> +		     M_NOEXEC != MNT_NOEXEC || M_NOATIME != MNT_NOATIME ||
> +		     M_NODIRATIME != MNT_NODIRATIME ||
> +		     M_STRICTATIME != MNT_RELATIME || M_RDONLY != MNT_READONLY);

Please don't, please do:

	if (ms_flags & M_RDONLY)
		mnt_flags |= MNT_READONLY;

Yes, and at some point I'd also like to compress the numbering on the SB_*
constants and break the identity as Christoph suggested.

David

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24 14:18                 ` Miklos Szeredi
@ 2018-08-24 14:26                   ` Karel Zak
  0 siblings, 0 replies; 32+ messages in thread
From: Karel Zak @ 2018-08-24 14:26 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: David Howells, Theodore Y. Ts'o, Andy Lutomirski, Al Viro,
	Linus Torvalds, Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 04:18:10PM +0200, Miklos Szeredi wrote:
> I'm bothered by the fact that we use the same MS_ prefix in the old mount(2) api
> and the new fsmount(2) api.  What happens if we introduce new flags for
> fsmount(2) and are already out of flags for mount(2)?  I see a big mess that
> way.
> 
> Also notice, how the old code just totally ignored MS_RELATIME?  Bit of rot in
> the new interface already.
> 
> So here's a patch. The MNT_ prefix is already used by libmount, and we don't
> want any confusion arising from that.  So how about M_*?  Short and sweet, just
> like O_* for open(2).

+1 Please.

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24 14:26                 ` David Howells
@ 2018-08-24 14:30                   ` Miklos Szeredi
  0 siblings, 0 replies; 32+ messages in thread
From: Miklos Szeredi @ 2018-08-24 14:30 UTC (permalink / raw)
  To: David Howells
  Cc: Theodore Y. Ts'o, Andy Lutomirski, Al Viro, Linus Torvalds,
	Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 4:26 PM, David Howells <dhowells@redhat.com> wrote:
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>> +/* Mount flags passed to fsmount(2) */
>> +#define M_NOSUID     0x01
>> +#define M_NODEV              0x02
>> +#define M_NOEXEC     0x04
>> +#define M_NOATIME    0x08
>> +#define M_NODIRATIME 0x10
>> +#define M_STRICTATIME        0x20
>> +#define M_RDONLY     0x40
>
> If we're going to do this, I would suggest a longer prefix than just 'M'

Any suggestions?

We could do "MOUNT_", but that sounds tad too long.  But hey, that
doesn't matter much either.   Let me know your preference.

>  and
> renumber them to put *_RDONLY first.
>
>> +     BUILD_BUG_ON(M_NOSUID != MNT_NOSUID || M_NODEV != MNT_NODEV ||
>> +                  M_NOEXEC != MNT_NOEXEC || M_NOATIME != MNT_NOATIME ||
>> +                  M_NODIRATIME != MNT_NODIRATIME ||
>> +                  M_STRICTATIME != MNT_RELATIME || M_RDONLY != MNT_READONLY);
>
> Please don't, please do:
>
>         if (ms_flags & M_RDONLY)
>                 mnt_flags |= MNT_READONLY;

As I said, I don't really care either way.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24  9:45               ` David Howells
                                   ` (2 preceding siblings ...)
  2018-08-24 14:26                 ` David Howells
@ 2018-08-24 14:49                 ` Andy Lutomirski
  2018-08-24 15:02                   ` Miklos Szeredi
  3 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2018-08-24 14:49 UTC (permalink / raw)
  To: David Howells
  Cc: Miklos Szeredi, Theodore Y. Ts'o, Al Viro, Linus Torvalds,
	Linux FS Devel, Linux API



> On Aug 24, 2018, at 2:45 AM, David Howells <dhowells@redhat.com> wrote:

> 
> 
> The problem is that "exclusivity" isn't necessarily an easy thing to define.
> Take nfs4 and btrfs for example.  They creating a backing superblock that the
> actual node is derived from (though in different ways).  How do you define
> what "exclusive" means in their case?
> 

I would argue that “exclusive” means “semantically equivalent to getting a fully independent instance.”  This means, at least, that all options need to be respected as specified (or fail if they can’t be) and future reconfigure/remount actions only affect *this* instance and not others.

NFS might already meet these requirements, although I’m not sure about the latter one.

But it might be that adding CREATE_EXCLUSIVE is not all that useful and it would be better to have an API like you sketched where you first explicitly instantiate the driver and then you create however many trees (a la fsmount()) you want, where each tree has its own set of per-tree options like “subvol” for btrfs.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24 14:49                 ` Andy Lutomirski
@ 2018-08-24 15:02                   ` Miklos Szeredi
  2018-08-24 15:09                     ` Andy Lutomirski
  2018-08-24 17:10                     ` David Howells
  0 siblings, 2 replies; 32+ messages in thread
From: Miklos Szeredi @ 2018-08-24 15:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Howells, Theodore Y. Ts'o, Al Viro, Linus Torvalds,
	Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 4:49 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
>
>> On Aug 24, 2018, at 2:45 AM, David Howells <dhowells@redhat.com> wrote:
>
>>
>>
>> The problem is that "exclusivity" isn't necessarily an easy thing to define.
>> Take nfs4 and btrfs for example.  They creating a backing superblock that the
>> actual node is derived from (though in different ways).  How do you define
>> what "exclusive" means in their case?
>>
>
> I would argue that “exclusive” means “semantically equivalent to getting a fully independent instance.”

The only way to get semantically equivalent instance is to create a
fully independent instance.  See reconfigure (nee remount).

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24 15:02                   ` Miklos Szeredi
@ 2018-08-24 15:09                     ` Andy Lutomirski
  2018-08-24 17:08                       ` Miklos Szeredi
  2018-08-24 17:10                     ` David Howells
  1 sibling, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2018-08-24 15:09 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: David Howells, Theodore Y. Ts'o, Al Viro, Linus Torvalds,
	Linux FS Devel, Linux API


> On Aug 24, 2018, at 8:02 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> 
>> On Fri, Aug 24, 2018 at 4:49 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> 
>> 
>>>> On Aug 24, 2018, at 2:45 AM, David Howells <dhowells@redhat.com> wrote:
>>> 
>>> 
>>> 
>>> The problem is that "exclusivity" isn't necessarily an easy thing to define.
>>> Take nfs4 and btrfs for example.  They creating a backing superblock that the
>>> actual node is derived from (though in different ways).  How do you define
>>> what "exclusive" means in their case?
>>> 
>> 
>> I would argue that “exclusive” means “semantically equivalent to getting a fully independent instance.”
> 
> The only way to get semantically equivalent instance is to create a
> fully independent instance.  See reconfigure (nee remount).
> 
> 

Hmm. Is it that case in the current patchset that you can do CMD_CREATE and reconfigure the result and some *other* existing mount will change?  If so, that’s rather unfriendly to users.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24 15:09                     ` Andy Lutomirski
@ 2018-08-24 17:08                       ` Miklos Szeredi
  0 siblings, 0 replies; 32+ messages in thread
From: Miklos Szeredi @ 2018-08-24 17:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Howells, Theodore Y. Ts'o, Al Viro, Linus Torvalds,
	Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 5:09 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
>> On Aug 24, 2018, at 8:02 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>>> On Fri, Aug 24, 2018 at 4:49 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>
>>>
>>>>> On Aug 24, 2018, at 2:45 AM, David Howells <dhowells@redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> The problem is that "exclusivity" isn't necessarily an easy thing to define.
>>>> Take nfs4 and btrfs for example.  They creating a backing superblock that the
>>>> actual node is derived from (though in different ways).  How do you define
>>>> what "exclusive" means in their case?
>>>>
>>>
>>> I would argue that “exclusive” means “semantically equivalent to getting a fully independent instance.”
>>
>> The only way to get semantically equivalent instance is to create a
>> fully independent instance.  See reconfigure (nee remount).
>>
>>
>
> Hmm. Is it that case in the current patchset that you can do CMD_CREATE and reconfigure the result and some *other* existing mount will change?  If so, that’s rather unfriendly to users.

Exactly.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24 15:02                   ` Miklos Szeredi
  2018-08-24 15:09                     ` Andy Lutomirski
@ 2018-08-24 17:10                     ` David Howells
  2018-08-24 17:43                       ` Andy Lutomirski
  1 sibling, 1 reply; 32+ messages in thread
From: David Howells @ 2018-08-24 17:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: dhowells, Miklos Szeredi, Theodore Y. Ts'o, Al Viro,
	Linus Torvalds, Linux FS Devel, Linux API

Andy Lutomirski <luto@amacapital.net> wrote:

> Hmm. Is it that case in the current patchset that you can do CMD_CREATE and
> reconfigure the result and some *other* existing mount will change?  If so,
> that’s rather unfriendly to users.

The default behaviour has to be the same as mount(2).

David

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24 17:10                     ` David Howells
@ 2018-08-24 17:43                       ` Andy Lutomirski
  2018-08-24 19:25                         ` Miklos Szeredi
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2018-08-24 17:43 UTC (permalink / raw)
  To: David Howells
  Cc: Miklos Szeredi, Theodore Y. Ts'o, Al Viro, Linus Torvalds,
	Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 10:10 AM, David Howells <dhowells@redhat.com> wrote:
> Andy Lutomirski <luto@amacapital.net> wrote:
>
>> Hmm. Is it that case in the current patchset that you can do CMD_CREATE and
>> reconfigure the result and some *other* existing mount will change?  If so,
>> that’s rather unfriendly to users.
>
> The default behaviour has to be the same as mount(2).
>

Why?

Obviously mount(2) needs to keep working the way it does now.  Most
likely mount(8) also needs to retain its current behavior.  But I
don't see why the new API needs be bug-compatible with mount(2).

--Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24 17:43                       ` Andy Lutomirski
@ 2018-08-24 19:25                         ` Miklos Szeredi
  2018-08-24 19:51                           ` Al Viro
  0 siblings, 1 reply; 32+ messages in thread
From: Miklos Szeredi @ 2018-08-24 19:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Howells, Theodore Y. Ts'o, Al Viro, Linus Torvalds,
	Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 7:43 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Fri, Aug 24, 2018 at 10:10 AM, David Howells <dhowells@redhat.com> wrote:
>> Andy Lutomirski <luto@amacapital.net> wrote:
>>
>>> Hmm. Is it that case in the current patchset that you can do CMD_CREATE and
>>> reconfigure the result and some *other* existing mount will change?  If so,
>>> that’s rather unfriendly to users.
>>
>> The default behaviour has to be the same as mount(2).

This is rubbish.  Anyone wanting the mount(2) behavior can use mount(2).

About exclusive create: can't we just look at the active reference
count of the superblock returned by ->get_tree() (if it's one, we are
the only users, i.e. the create was exclusive)?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24 19:25                         ` Miklos Szeredi
@ 2018-08-24 19:51                           ` Al Viro
  2018-08-29 12:32                             ` Miklos Szeredi
  0 siblings, 1 reply; 32+ messages in thread
From: Al Viro @ 2018-08-24 19:51 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Andy Lutomirski, David Howells, Theodore Y. Ts'o,
	Linus Torvalds, Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 09:25:58PM +0200, Miklos Szeredi wrote:
> On Fri, Aug 24, 2018 at 7:43 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > On Fri, Aug 24, 2018 at 10:10 AM, David Howells <dhowells@redhat.com> wrote:
> >> Andy Lutomirski <luto@amacapital.net> wrote:
> >>
> >>> Hmm. Is it that case in the current patchset that you can do CMD_CREATE and
> >>> reconfigure the result and some *other* existing mount will change?  If so,
> >>> that’s rather unfriendly to users.
> >>
> >> The default behaviour has to be the same as mount(2).
> 
> This is rubbish.  Anyone wanting the mount(2) behavior can use mount(2).
> 
> About exclusive create: can't we just look at the active reference
> count of the superblock returned by ->get_tree() (if it's one, we are
> the only users, i.e. the create was exclusive)?

Let me get it straight - your default behaviour would routinely refuse NFS mount
simply because somebody has already mounted from the same server?

The main reason for keeping existing semantics is very, very simple: nobody
has offered a sane replacement.  For all warts (and I admit that policy
re sharing turned out to be rather bad for any kind of situation with
non-cooperative admins - in partial defense, back in 2000/2001 when it was
done anybody talking about something like userns would've gotten laughed at,
for a lot of good reasons) mount(2) semantics is defined *and* needs to be
supported anyway.

All suggested "better replacements" were bloody problematic.  Sure, we'll need
to sort that out, but again, why the hell tie that to untangling the sodding
mount(2) ABI?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-23 22:31 [git pull] new mount API Al Viro
  2018-08-23 23:24 ` Andy Lutomirski
  2018-08-24  0:08 ` David Howells
@ 2018-08-26  3:22 ` Eric W. Biederman
  2018-08-26 20:42 ` David Howells
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Eric W. Biederman @ 2018-08-26  3:22 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, dhowells, linux-fsdevel, linux-api


Sigh.

I just noticed that this tree reintroduces bugs that were fixed with
cfb2f6f6e0ba ("Revert "mqueue: switch to on-demand creation of internal mount"")

So in addition to ongoing discussion about what the new mount api should
look like we have bugs in the implementation.  This really does not look
like this branch is ready to be merged yet.

Eric

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-23 22:31 [git pull] new mount API Al Viro
                   ` (2 preceding siblings ...)
  2018-08-26  3:22 ` Eric W. Biederman
@ 2018-08-26 20:42 ` David Howells
  2018-08-26 20:46 ` David Howells
  2018-08-26 21:03 ` [PATCH] mqueue: Fix bug from mount API conversion David Howells
  5 siblings, 0 replies; 32+ messages in thread
From: David Howells @ 2018-08-26 20:42 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: dhowells, Al Viro, Linus Torvalds, linux-fsdevel, linux-api

Eric W. Biederman <ebiederm@xmission.com> wrote:

> I just noticed that this tree reintroduces bugs that were fixed with
> cfb2f6f6e0ba ("Revert "mqueue: switch to on-demand creation of internal
> mount"")
> 
> So in addition to ongoing discussion about what the new mount api should
> look like ...

Which don't need to be solved *before* the current patches go in.  No strong
reason has been given why the desired changes cannot wait one cycle.

The core needs to be upstream before we can start porting most of the
filesystems to it.

> ... we have bugs in the implementation.

Allegedly.  It would be useful if you could've pointed out what it was that
you think you see.  I don't see offhand what it reintroduces.

And how come this hasn't been noticed since the patches have been sat in
linux-next for a while?

> This really does not look like this branch is ready to be merged yet.

I disagree (probably obviously).

David

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-23 22:31 [git pull] new mount API Al Viro
                   ` (3 preceding siblings ...)
  2018-08-26 20:42 ` David Howells
@ 2018-08-26 20:46 ` David Howells
  2018-08-26 21:03 ` [PATCH] mqueue: Fix bug from mount API conversion David Howells
  5 siblings, 0 replies; 32+ messages in thread
From: David Howells @ 2018-08-26 20:46 UTC (permalink / raw)
  Cc: dhowells, Eric W. Biederman, Al Viro, Linus Torvalds,
	linux-fsdevel, linux-api

David Howells <dhowells@redhat.com> wrote:

> > ... we have bugs in the implementation.
> 
> Allegedly.  It would be useful if you could've pointed out what it was that
> you think you see.  I don't see offhand what it reintroduces.

This?

-	ns->mq_mnt = kern_mount_data(&mqueue_fs_type, ns, 0);
-	if (IS_ERR(ns->mq_mnt)) {
-		int err = PTR_ERR(ns->mq_mnt);
-		ns->mq_mnt = NULL;
-		return err;
-	}
+	m = mq_create_mount(&init_ipc_ns);
+	if (IS_ERR(m))
+		return PTR_ERR(m);
+	ns->mq_mnt = m;

Should be passing ns into mq_create_mount() rather than init_ipc_ns maybe?

David

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH] mqueue: Fix bug from mount API conversion
  2018-08-23 22:31 [git pull] new mount API Al Viro
                   ` (4 preceding siblings ...)
  2018-08-26 20:46 ` David Howells
@ 2018-08-26 21:03 ` David Howells
  2018-08-26 21:22   ` Al Viro
  5 siblings, 1 reply; 32+ messages in thread
From: David Howells @ 2018-08-26 21:03 UTC (permalink / raw)
  To: Al Viro; +Cc: dhowells, Linus Torvalds, linux-fsdevel, linux-api

Hi Al, Linus,

I *think* the attached patch is needed with the new mount API.

David
---
mqueue: Fix bug from mount API conversion

Fix a possible bug from the mount API conversion of the mqueue filesystem
whereby mq_init_ns() passes the root IPC namespace to mq_create_mount()
rather than the namespace passed in.

I don't know if this is the bug Eric is reporting since he didn't give any
details.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 0f102210f89e..4fbbd6f5b588 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1618,7 +1618,7 @@ int mq_init_ns(struct ipc_namespace *ns)
 	ns->mq_msg_default   = DFLT_MSG;
 	ns->mq_msgsize_default  = DFLT_MSGSIZE;
 
-	m = mq_create_mount(&init_ipc_ns);
+	m = mq_create_mount(ns);
 	if (IS_ERR(m))
 		return PTR_ERR(m);
 	ns->mq_mnt = m;

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] mqueue: Fix bug from mount API conversion
  2018-08-26 21:03 ` [PATCH] mqueue: Fix bug from mount API conversion David Howells
@ 2018-08-26 21:22   ` Al Viro
  0 siblings, 0 replies; 32+ messages in thread
From: Al Viro @ 2018-08-26 21:22 UTC (permalink / raw)
  To: David Howells; +Cc: Linus Torvalds, linux-fsdevel, linux-api

On Sun, Aug 26, 2018 at 10:03:48PM +0100, David Howells wrote:
> Hi Al, Linus,
> 
> I *think* the attached patch is needed with the new mount API.
> 
> David
> ---
> mqueue: Fix bug from mount API conversion
> 
> Fix a possible bug from the mount API conversion of the mqueue filesystem
> whereby mq_init_ns() passes the root IPC namespace to mq_create_mount()
> rather than the namespace passed in.
> 
> I don't know if this is the bug Eric is reporting since he didn't give any
> details.

Applied and pushed.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [git pull] new mount API
  2018-08-24 19:51                           ` Al Viro
@ 2018-08-29 12:32                             ` Miklos Szeredi
  0 siblings, 0 replies; 32+ messages in thread
From: Miklos Szeredi @ 2018-08-29 12:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Andy Lutomirski, David Howells, Theodore Y. Ts'o,
	Linus Torvalds, Linux FS Devel, Linux API

On Fri, Aug 24, 2018 at 9:51 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Aug 24, 2018 at 09:25:58PM +0200, Miklos Szeredi wrote:
>> On Fri, Aug 24, 2018 at 7:43 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> > On Fri, Aug 24, 2018 at 10:10 AM, David Howells <dhowells@redhat.com> wrote:
>> >> Andy Lutomirski <luto@amacapital.net> wrote:
>> >>
>> >>> Hmm. Is it that case in the current patchset that you can do CMD_CREATE and
>> >>> reconfigure the result and some *other* existing mount will change?  If so,
>> >>> that’s rather unfriendly to users.
>> >>
>> >> The default behaviour has to be the same as mount(2).
>>
>> This is rubbish.  Anyone wanting the mount(2) behavior can use mount(2).
>>
>> About exclusive create: can't we just look at the active reference
>> count of the superblock returned by ->get_tree() (if it's one, we are
>> the only users, i.e. the create was exclusive)?
>
> Let me get it straight - your default behaviour would routinely refuse NFS mount
> simply because somebody has already mounted from the same server?

Yeah, bad idea.

> The main reason for keeping existing semantics is very, very simple: nobody
> has offered a sane replacement.  For all warts (and I admit that policy
> re sharing turned out to be rather bad for any kind of situation with
> non-cooperative admins - in partial defense, back in 2000/2001 when it was
> done anybody talking about something like userns would've gotten laughed at,
> for a lot of good reasons) mount(2) semantics is defined *and* needs to be
> supported anyway.
>
> All suggested "better replacements" were bloody problematic.  Sure, we'll need
> to sort that out, but again, why the hell tie that to untangling the sodding
> mount(2) ABI?

What I was primarily suggesting is pretty simple, yet nobody seems to
get it: name the current behavior what it is: legacy. Call this op
e.g. CMD_LEGACY_FIND_OR_CREATE.  It means "you're using the shiny new
API, but be warned: we'll still use the old and stinky way to get the
super block".

We'll hopefully untangle it, sooner rather than later, and then have
ops with less scary names.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-08-29 16:29 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-23 22:31 [git pull] new mount API Al Viro
2018-08-23 23:24 ` Andy Lutomirski
2018-08-24  0:08 ` David Howells
2018-08-24  0:16   ` Andy Lutomirski
2018-08-24  0:31     ` Al Viro
2018-08-24  2:36       ` Andy Lutomirski
2018-08-24  3:13         ` Al Viro
2018-08-24  4:51           ` Andy Lutomirski
2018-08-24  6:05             ` Theodore Y. Ts'o
2018-08-24  8:38               ` Miklos Szeredi
2018-08-24  8:56                 ` Miklos Szeredi
2018-08-24  9:29                   ` Miklos Szeredi
2018-08-24  9:45               ` David Howells
2018-08-24 10:06                 ` Miklos Szeredi
2018-08-24 14:18                 ` Miklos Szeredi
2018-08-24 14:26                   ` Karel Zak
2018-08-24 14:26                 ` David Howells
2018-08-24 14:30                   ` Miklos Szeredi
2018-08-24 14:49                 ` Andy Lutomirski
2018-08-24 15:02                   ` Miklos Szeredi
2018-08-24 15:09                     ` Andy Lutomirski
2018-08-24 17:08                       ` Miklos Szeredi
2018-08-24 17:10                     ` David Howells
2018-08-24 17:43                       ` Andy Lutomirski
2018-08-24 19:25                         ` Miklos Szeredi
2018-08-24 19:51                           ` Al Viro
2018-08-29 12:32                             ` Miklos Szeredi
2018-08-26  3:22 ` Eric W. Biederman
2018-08-26 20:42 ` David Howells
2018-08-26 20:46 ` David Howells
2018-08-26 21:03 ` [PATCH] mqueue: Fix bug from mount API conversion David Howells
2018-08-26 21:22   ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).