* [git pull] new mount API @ 2018-08-23 22:31 Al Viro 2018-08-23 23:24 ` Andy Lutomirski ` (5 more replies) 0 siblings, 6 replies; 32+ messages in thread From: Al Viro @ 2018-08-23 22:31 UTC (permalink / raw) To: Linus Torvalds; +Cc: dhowells, linux-fsdevel, linux-api new mount API series from Dave Howells To quote his cover letter, Here are a set of patches to create a filesystem context prior to setting up a new mount, populating it with the parsed options/binary data, creating the superblock and then effecting the mount. This is also used for remount since much of the parsing stuff is common in many filesystems. This allows namespaces and other information to be conveyed through the mount procedure. This also allows Mikl�s Szeredi's idea of doing: fd = fsopen("nfs"); fsconfig(fd, FSCONFIG_SET_STRING, "option", "val", 0); fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); mfd = fsmount(fd, MS_NODEV); move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH); that he presented at LSF-2017 to be implemented (see the relevant patches in the series). I didn't use netlink as that would make the core kernel depend on CONFIG_NET and CONFIG_NETLINK and would introduce network namespacing issues. I've implemented filesystem context handling for procfs, nfs, mqueue, cpuset, kernfs, sysfs, cgroup and afs filesystems. Unconverted filesystems are handled by a legacy filesystem wrapper. One trivial conflict in fs/file_table.c:__fput(); resolved as if (unlikely(mode & FMODE_NEED_UNMOUNT)) dissolve_on_fput(mnt); dput(dentry); mntput(mnt); out: file_free(file); The following changes since commit 32206ab36553be8714d9253ce33f4085681d369c: x86/intel_rdt: Provide pseudo-locking hooks within rdt_mount (2018-06-23 12:53:19 +0200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git work.mount for you to fetch changes up to 294fb3407bf29abf653fbe169af4bbb38a146db5: proc: Set correct userns for new proc super created by a new pid_namespace (2018-08-21 09:28:43 +0100) ---------------------------------------------------------------- Al Viro (2): vfs: syscall: Add open_tree(2) to reference or clone a mount teach move_mount(2) to work with OPEN_TREE_CLONE Andrei Vagin (1): proc: Set correct userns for new proc super created by a new pid_namespace David Howells (42): vfs: Require specification of size of mount data for internal mounts vfs: syscall: Add move_mount(2) to move mounts around vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled vfs: Introduce the basic header for the new mount API's filesystem context vfs: Introduce logging functions vfs: Add configuration parser helpers vfs: Add LSM hooks for the new mount API selinux: Implement the new mount API LSM hooks smack: Implement filesystem context security hooks apparmor: Implement security hooks for the new mount API tomoyo: Implement security hooks for the new mount API vfs: Separate changing mount flags full remount vfs: Implement a filesystem superblock creation/configuration context vfs: Remove unused code after filesystem context changes procfs: Move proc_fill_super() to fs/proc/root.c proc: Add fs_context support to procfs ipc: Convert mqueue fs to fs_context cpuset: Use fs_context kernfs, sysfs, cgroup, intel_rdt: Support fs_context hugetlbfs: Convert to fs_context vfs: Remove kern_mount_data() vfs: Provide documentation for new mount API Make anon_inodes unconditional vfs: syscall: Add fsopen() to prepare for superblock creation vfs: Implement logging through fs_context vfs: Add some logging to the core users of the fs_context log vfs: syscall: Add fsconfig() for configuring and managing a context vfs: syscall: Add fsmount() to create a mount for a superblock vfs: syscall: Add fspick() to select a superblock for reconfiguration afs: Add fs_context support afs: Use fs_context to pass parameters over automount vfs: Add a sample program for the new mount API vfs: syscall: Add fsinfo() to query filesystem information afs: Add fsinfo support vfs: Allow fsinfo() to query what's in an fs_context vfs: Allow fsinfo() to be used to query an fs parameter description vfs: Implement parameter value retrieval with fsinfo() vfs: Fix vfs_dup_fs_context() vfs: Fix fs_context logging when there's no log afs: Move the source fs parameter to the first position vfs: Pass path info fsinfo and rename get_fsinfo sb op to fsinfo vfs: Adjust fsinfo sample output Documentation/filesystems/mount_api.txt | 706 ++++++++++++++++++++++++ arch/arc/kernel/setup.c | 1 + arch/arm/kernel/atags_parse.c | 1 + arch/arm/kvm/Kconfig | 1 - arch/arm64/kvm/Kconfig | 1 - arch/ia64/kernel/perfmon.c | 3 +- arch/mips/kvm/Kconfig | 1 - arch/powerpc/kvm/Kconfig | 1 - arch/powerpc/platforms/cell/spufs/inode.c | 6 +- arch/s390/hypfs/inode.c | 7 +- arch/s390/kvm/Kconfig | 1 - arch/sh/kernel/setup.c | 1 + arch/sparc/kernel/setup_32.c | 1 + arch/sparc/kernel/setup_64.c | 1 + arch/x86/Kconfig | 1 - arch/x86/entry/syscalls/syscall_32.tbl | 7 + arch/x86/entry/syscalls/syscall_64.tbl | 7 + arch/x86/kernel/cpu/intel_rdt.h | 15 + arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 184 ++++--- arch/x86/kernel/setup.c | 1 + arch/x86/kvm/Kconfig | 1 - drivers/base/Kconfig | 1 - drivers/base/devtmpfs.c | 7 +- drivers/char/tpm/Kconfig | 1 - drivers/dax/super.c | 2 +- drivers/dma-buf/Kconfig | 1 - drivers/gpio/Kconfig | 1 - drivers/gpu/drm/drm_drv.c | 3 +- drivers/gpu/drm/i915/i915_gemfs.c | 2 +- drivers/iio/Kconfig | 1 - drivers/infiniband/Kconfig | 1 - drivers/infiniband/hw/qib/qib_fs.c | 7 +- drivers/misc/cxl/api.c | 3 +- drivers/misc/ibmasm/ibmasmfs.c | 11 +- drivers/mtd/mtdsuper.c | 26 +- drivers/oprofile/oprofilefs.c | 8 +- drivers/scsi/cxlflash/ocxl_hw.c | 2 +- drivers/usb/gadget/function/f_fs.c | 7 +- drivers/usb/gadget/legacy/inode.c | 7 +- drivers/vfio/Kconfig | 1 - drivers/virtio/virtio_balloon.c | 2 +- drivers/xen/xenfs/super.c | 7 +- fs/9p/vfs_super.c | 2 +- fs/Kconfig | 7 + fs/Makefile | 5 +- fs/adfs/super.c | 9 +- fs/affs/super.c | 13 +- fs/afs/internal.h | 10 +- fs/afs/mntpt.c | 147 ++--- fs/afs/super.c | 634 +++++++++++++-------- fs/afs/volume.c | 4 +- fs/aio.c | 3 +- fs/anon_inodes.c | 3 +- fs/autofs/autofs_i.h | 2 +- fs/autofs/init.c | 4 +- fs/autofs/inode.c | 3 +- fs/befs/linuxvfs.c | 11 +- fs/bfs/inode.c | 8 +- fs/binfmt_misc.c | 7 +- fs/block_dev.c | 2 +- fs/btrfs/super.c | 30 +- fs/btrfs/tests/btrfs-tests.c | 2 +- fs/ceph/super.c | 3 +- fs/cifs/cifs_dfs_ref.c | 3 +- fs/cifs/cifsfs.c | 18 +- fs/coda/inode.c | 11 +- fs/configfs/mount.c | 7 +- fs/cramfs/inode.c | 17 +- fs/debugfs/inode.c | 14 +- fs/devpts/inode.c | 10 +- fs/ecryptfs/main.c | 2 +- fs/efivarfs/super.c | 9 +- fs/efs/super.c | 14 +- fs/exofs/super.c | 7 +- fs/ext2/super.c | 14 +- fs/ext4/super.c | 16 +- fs/f2fs/super.c | 13 +- fs/fat/inode.c | 3 +- fs/fat/namei_msdos.c | 8 +- fs/fat/namei_vfat.c | 8 +- fs/file_table.c | 9 +- fs/filesystems.c | 4 + fs/freevxfs/vxfs_super.c | 12 +- fs/fs_context.c | 779 ++++++++++++++++++++++++++ fs/fs_parser.c | 483 ++++++++++++++++ fs/fsopen.c | 491 +++++++++++++++++ fs/fuse/control.c | 9 +- fs/fuse/inode.c | 16 +- fs/gfs2/ops_fstype.c | 6 +- fs/gfs2/super.c | 4 +- fs/hfs/super.c | 12 +- fs/hfsplus/super.c | 12 +- fs/hostfs/hostfs_kern.c | 7 +- fs/hpfs/super.c | 11 +- fs/hugetlbfs/inode.c | 455 ++++++++++------ fs/internal.h | 13 +- fs/isofs/inode.c | 11 +- fs/jffs2/super.c | 10 +- fs/jfs/super.c | 11 +- fs/kernfs/mount.c | 102 ++-- fs/libfs.c | 19 +- fs/minix/inode.c | 14 +- fs/namei.c | 4 +- fs/namespace.c | 879 ++++++++++++++++++++++-------- fs/nfs/internal.h | 4 +- fs/nfs/namespace.c | 3 +- fs/nfs/nfs4namespace.c | 3 +- fs/nfs/nfs4super.c | 27 +- fs/nfs/super.c | 22 +- fs/nfsd/nfsctl.c | 8 +- fs/nilfs2/super.c | 10 +- fs/notify/fanotify/Kconfig | 1 - fs/notify/inotify/Kconfig | 1 - fs/nsfs.c | 3 +- fs/ntfs/super.c | 13 +- fs/ocfs2/dlmfs/dlmfs.c | 5 +- fs/ocfs2/super.c | 14 +- fs/omfs/inode.c | 9 +- fs/openpromfs/inode.c | 11 +- fs/orangefs/orangefs-kernel.h | 2 +- fs/orangefs/super.c | 5 +- fs/overlayfs/super.c | 11 +- fs/pipe.c | 3 +- fs/pnode.c | 1 + fs/proc/inode.c | 50 +- fs/proc/internal.h | 6 +- fs/proc/root.c | 247 ++++++--- fs/pstore/inode.c | 10 +- fs/qnx4/inode.c | 14 +- fs/qnx6/inode.c | 14 +- fs/ramfs/inode.c | 6 +- fs/reiserfs/super.c | 14 +- fs/romfs/super.c | 13 +- fs/squashfs/super.c | 12 +- fs/statfs.c | 571 +++++++++++++++++++ fs/super.c | 397 +++++++++++--- fs/sysfs/mount.c | 67 ++- fs/sysv/inode.c | 3 +- fs/sysv/super.c | 16 +- fs/tracefs/inode.c | 10 +- fs/ubifs/super.c | 5 +- fs/udf/super.c | 16 +- fs/ufs/super.c | 11 +- fs/xfs/xfs_super.c | 10 +- include/linux/cgroup.h | 3 +- include/linux/debugfs.h | 8 +- include/linux/fs.h | 50 +- include/linux/fs_context.h | 208 +++++++ include/linux/fs_parser.h | 117 ++++ include/linux/fsinfo.h | 41 ++ include/linux/kernfs.h | 41 +- include/linux/lsm_hooks.h | 79 ++- include/linux/module.h | 6 + include/linux/mount.h | 10 +- include/linux/mtd/super.h | 4 +- include/linux/ramfs.h | 4 +- include/linux/security.h | 72 ++- include/linux/shmem_fs.h | 3 +- include/linux/syscalls.h | 13 + include/uapi/linux/fcntl.h | 2 + include/uapi/linux/fs.h | 82 ++- include/uapi/linux/fsinfo.h | 302 ++++++++++ include/uapi/linux/mount.h | 75 +++ init/Kconfig | 10 - init/do_mounts.c | 5 +- init/do_mounts_initrd.c | 1 + ipc/mqueue.c | 120 +++- kernel/bpf/inode.c | 7 +- kernel/cgroup/cgroup-internal.h | 50 +- kernel/cgroup/cgroup-v1.c | 415 ++++++++------ kernel/cgroup/cgroup.c | 285 +++++++--- kernel/cgroup/cpuset.c | 67 ++- kernel/trace/trace.c | 7 +- mm/shmem.c | 10 +- mm/zsmalloc.c | 3 +- net/socket.c | 3 +- net/sunrpc/rpc_pipe.c | 7 +- samples/Kconfig | 6 + samples/Makefile | 2 +- samples/mount_api/Makefile | 7 + samples/mount_api/test-fsmount.c | 118 ++++ samples/statx/Makefile | 7 +- samples/statx/test-fs-query.c | 137 +++++ samples/statx/test-fsinfo.c | 584 ++++++++++++++++++++ security/apparmor/apparmorfs.c | 8 +- security/apparmor/include/mount.h | 11 +- security/apparmor/lsm.c | 111 +++- security/apparmor/mount.c | 47 ++ security/inode.c | 7 +- security/security.c | 65 ++- security/selinux/hooks.c | 320 ++++++++++- security/selinux/selinuxfs.c | 8 +- security/smack/smack.h | 11 +- security/smack/smack_lsm.c | 373 +++++++++++-- security/smack/smackfs.c | 9 +- security/tomoyo/common.h | 3 + security/tomoyo/mount.c | 46 ++ security/tomoyo/tomoyo.c | 19 +- 198 files changed, 9288 insertions(+), 1874 deletions(-) create mode 100644 Documentation/filesystems/mount_api.txt create mode 100644 fs/fs_context.c create mode 100644 fs/fs_parser.c create mode 100644 fs/fsopen.c create mode 100644 include/linux/fs_context.h create mode 100644 include/linux/fs_parser.h create mode 100644 include/linux/fsinfo.h create mode 100644 include/uapi/linux/fsinfo.h create mode 100644 include/uapi/linux/mount.h create mode 100644 samples/mount_api/Makefile create mode 100644 samples/mount_api/test-fsmount.c create mode 100644 samples/statx/test-fs-query.c create mode 100644 samples/statx/test-fsinfo.c ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-23 22:31 [git pull] new mount API Al Viro @ 2018-08-23 23:24 ` Andy Lutomirski 2018-08-24 0:08 ` David Howells ` (4 subsequent siblings) 5 siblings, 0 replies; 32+ messages in thread From: Andy Lutomirski @ 2018-08-23 23:24 UTC (permalink / raw) To: Al Viro; +Cc: Linus Torvalds, David Howells, Linux FS Devel, Linux API On Thu, Aug 23, 2018 at 3:31 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > new mount API series from Dave Howells > > To quote his cover letter, > Here are a set of patches to create a filesystem context prior to setting > up a new mount, populating it with the parsed options/binary data, creating > the superblock and then effecting the mount. This is also used for remount > since much of the parsing stuff is common in many filesystems. > > This allows namespaces and other information to be conveyed through the > mount procedure. > > This also allows Miklós Szeredi's idea of doing: > > fd = fsopen("nfs"); > fsconfig(fd, FSCONFIG_SET_STRING, "option", "val", 0); > fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); > mfd = fsmount(fd, MS_NODEV); > move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH); > > that he presented at LSF-2017 to be implemented (see the relevant patches > in the series). > > I didn't use netlink as that would make the core kernel depend on > CONFIG_NET and CONFIG_NETLINK and would introduce network namespacing > issues. > > I've implemented filesystem context handling for procfs, nfs, mqueue, > cpuset, kernfs, sysfs, cgroup and afs filesystems. > > Unconverted filesystems are handled by a legacy filesystem wrapper. > > One trivial conflict in fs/file_table.c:__fput(); resolved as > if (unlikely(mode & FMODE_NEED_UNMOUNT)) > dissolve_on_fput(mnt); > dput(dentry); > mntput(mnt); > out: > file_free(file); > Has anything been done to ensure that the behavior when doing FSCONFIG_CMD_CREATE against an already-mounted block device is reasonable? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-23 22:31 [git pull] new mount API Al Viro 2018-08-23 23:24 ` Andy Lutomirski @ 2018-08-24 0:08 ` David Howells 2018-08-24 0:16 ` Andy Lutomirski 2018-08-26 3:22 ` Eric W. Biederman ` (3 subsequent siblings) 5 siblings, 1 reply; 32+ messages in thread From: David Howells @ 2018-08-24 0:08 UTC (permalink / raw) To: Andy Lutomirski Cc: dhowells, Al Viro, Linus Torvalds, Linux FS Devel, Linux API Andy Lutomirski <luto@amacapital.net> wrote: > Has anything been done to ensure that the behavior when doing > FSCONFIG_CMD_CREATE against an already-mounted block device is > reasonable? For the moment, I've left it as the same behaviour as for mount(2) since mount(2) now uses the same mechanism internally and we aren't permitted to break userspace. I would like to add at least one flag to stipulate that, in the case of an incompatible collision, you can get a failure - but defining what is meant by incompatible isn't necessarily trivial, and would vary by filesystem *and* the LSM. However, I don't want to start reengineering everything this close to the merge window and we don't really need it immediately. David ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 0:08 ` David Howells @ 2018-08-24 0:16 ` Andy Lutomirski 2018-08-24 0:31 ` Al Viro 0 siblings, 1 reply; 32+ messages in thread From: Andy Lutomirski @ 2018-08-24 0:16 UTC (permalink / raw) To: David Howells; +Cc: Al Viro, Linus Torvalds, Linux FS Devel, Linux API On Thu, Aug 23, 2018 at 5:08 PM, David Howells <dhowells@redhat.com> wrote: > Andy Lutomirski <luto@amacapital.net> wrote: > >> Has anything been done to ensure that the behavior when doing >> FSCONFIG_CMD_CREATE against an already-mounted block device is >> reasonable? > > For the moment, I've left it as the same behaviour as for mount(2) since > mount(2) now uses the same mechanism internally and we aren't permitted to > break userspace. > > I would like to add at least one flag to stipulate that, in the case of an > incompatible collision, you can get a failure - but defining what is meant by > incompatible isn't necessarily trivial, and would vary by filesystem *and* the > LSM. > > However, I don't want to start reengineering everything this close to the > merge window and we don't really need it immediately. > The problem is that, once this ends up merged, then we're kind of stuck with it, too. It would be a bit sad if your better proposal for handling nfs and instantiation of filesystems in general were added in the next release and then we end up with the current FSCONFIG_CMD_CREATE as a permanently supported but non-preferred option. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 0:16 ` Andy Lutomirski @ 2018-08-24 0:31 ` Al Viro 2018-08-24 2:36 ` Andy Lutomirski 0 siblings, 1 reply; 32+ messages in thread From: Al Viro @ 2018-08-24 0:31 UTC (permalink / raw) To: Andy Lutomirski; +Cc: David Howells, Linus Torvalds, Linux FS Devel, Linux API On Thu, Aug 23, 2018 at 05:16:53PM -0700, Andy Lutomirski wrote: > On Thu, Aug 23, 2018 at 5:08 PM, David Howells <dhowells@redhat.com> wrote: > > Andy Lutomirski <luto@amacapital.net> wrote: > > > >> Has anything been done to ensure that the behavior when doing > >> FSCONFIG_CMD_CREATE against an already-mounted block device is > >> reasonable? > > > > For the moment, I've left it as the same behaviour as for mount(2) since > > mount(2) now uses the same mechanism internally and we aren't permitted to > > break userspace. > > > > I would like to add at least one flag to stipulate that, in the case of an > > incompatible collision, you can get a failure - but defining what is meant by > > incompatible isn't necessarily trivial, and would vary by filesystem *and* the > > LSM. > > > > However, I don't want to start reengineering everything this close to the > > merge window and we don't really need it immediately. > > > > The problem is that, once this ends up merged, then we're kind of > stuck with it, too. It would be a bit sad if your better proposal for > handling nfs and instantiation of filesystems in general were added in > the next release and then we end up with the current > FSCONFIG_CMD_CREATE as a permanently supported but non-preferred > option. For fuck sake, mount(2) is a permanently supported option! Folks, get over it - you are mixing entirely different issues. You know, I know and everyone even remotely sane knows that mount(2) *IS* *NOT* *GOING* *AWAY*. And it's not changing semantics either. So bemoaning the "permanently supported non-preferred options" is utter lunacy. It's there and it will remain there, period. We do not break userland. And "their local scripts would break terribly inside userns container" does *NOT* render those second-class in any sense. So you can curse the current behaviour of mount(2) and I even agree with some of that, but we are not going to be able to remove that. Yes, it would've been nice if, etc., but it's not going to happen. Not now, not for many years. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 0:31 ` Al Viro @ 2018-08-24 2:36 ` Andy Lutomirski 2018-08-24 3:13 ` Al Viro 0 siblings, 1 reply; 32+ messages in thread From: Andy Lutomirski @ 2018-08-24 2:36 UTC (permalink / raw) To: Al Viro; +Cc: David Howells, Linus Torvalds, Linux FS Devel, Linux API > On Aug 23, 2018, at 5:31 PM, Al Viro <viro@ZenIV.linux.org.uk> wrote: > >> On Thu, Aug 23, 2018 at 05:16:53PM -0700, Andy Lutomirski wrote: >>> On Thu, Aug 23, 2018 at 5:08 PM, David Howells <dhowells@redhat.com> wrote: >>> Andy Lutomirski <luto@amacapital.net> wrote: >>> >>>> Has anything been done to ensure that the behavior when doing >>>> FSCONFIG_CMD_CREATE against an already-mounted block device is >>>> reasonable? >>> >>> For the moment, I've left it as the same behaviour as for mount(2) since >>> mount(2) now uses the same mechanism internally and we aren't permitted to >>> break userspace. >>> >>> I would like to add at least one flag to stipulate that, in the case of an >>> incompatible collision, you can get a failure - but defining what is meant by >>> incompatible isn't necessarily trivial, and would vary by filesystem *and* the >>> LSM. >>> >>> However, I don't want to start reengineering everything this close to the >>> merge window and we don't really need it immediately. >>> >> >> The problem is that, once this ends up merged, then we're kind of >> stuck with it, too. It would be a bit sad if your better proposal for >> handling nfs and instantiation of filesystems in general were added in >> the next release and then we end up with the current >> FSCONFIG_CMD_CREATE as a permanently supported but non-preferred >> option. > > For fuck sake, mount(2) is a permanently supported option! Exactly. mount(2) has broken semantics and it’s permanently supported. If this merge request gets pulled, then FSCONFIG_CMD_CREATE will *also* be a permanently supported API with broken semantics. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 2:36 ` Andy Lutomirski @ 2018-08-24 3:13 ` Al Viro 2018-08-24 4:51 ` Andy Lutomirski 0 siblings, 1 reply; 32+ messages in thread From: Al Viro @ 2018-08-24 3:13 UTC (permalink / raw) To: Andy Lutomirski; +Cc: David Howells, Linus Torvalds, Linux FS Devel, Linux API On Thu, Aug 23, 2018 at 07:36:15PM -0700, Andy Lutomirski wrote: > > For fuck sake, mount(2) is a permanently supported option! > > Exactly. mount(2) has broken semantics and it’s permanently supported. > > If this merge request gets pulled, then FSCONFIG_CMD_CREATE will *also* be a permanently supported API with broken semantics. Oh no - mount(2) behaviour can be expressed that way! The horror... Andy, this is bullshit. You are saying that dealing with mount(2) mess of ABI (badly unorthogonal set of operations, overloading from hell, etc.) must be tied with massive rework of fs drivers. Why? It's a simple enough question and pardon me, but your "it's broken" doesn't inspire any confidence that you even understand what the current behaviour is and what its problems (they are real enough) are. I have seen a lot of handwaving from you in these threads, combined with "surely, it must work thus", followed with "it's a special case/class/whatnot" or "surely, it must be broken" every time you find something that doesn't "work thus". Bugger it; explain why we must combine untangling the existing ABI (and we *will* have to keep the existing semantics possible to express for sys_mount() sake) with this, or with anything else, for that matter. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 3:13 ` Al Viro @ 2018-08-24 4:51 ` Andy Lutomirski 2018-08-24 6:05 ` Theodore Y. Ts'o 0 siblings, 1 reply; 32+ messages in thread From: Andy Lutomirski @ 2018-08-24 4:51 UTC (permalink / raw) To: Al Viro; +Cc: David Howells, Linus Torvalds, Linux FS Devel, Linux API On Thu, Aug 23, 2018 at 8:13 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Thu, Aug 23, 2018 at 07:36:15PM -0700, Andy Lutomirski wrote: > >> > For fuck sake, mount(2) is a permanently supported option! >> >> Exactly. mount(2) has broken semantics and it’s permanently supported. >> >> If this merge request gets pulled, then FSCONFIG_CMD_CREATE will *also* be a permanently supported API with broken semantics. > > Oh no - mount(2) behaviour can be expressed that way! The horror... > > Andy, this is bullshit. You are saying that dealing with mount(2) > mess of ABI (badly unorthogonal set of operations, overloading from > hell, etc.) must be tied with massive rework of fs drivers. Why? > When this was reviewed earlier, a problem was identified. I asked if it had been addressed. I did *not* say that it was mandatory to address it, nor did I say anything about reworking fs drivers. A reasonable answer might have been "avoiding this pitfall in the new API would involve a large amount of reworking of existing filesystem drivers. I think that the new API, as is, has enough benefits that it makes sense to merge it even with this pitfall, and, if needed, we can introduce an improved version down the road." I had at least hoped for a better answer than "bugger it." --Andy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 4:51 ` Andy Lutomirski @ 2018-08-24 6:05 ` Theodore Y. Ts'o 2018-08-24 8:38 ` Miklos Szeredi 2018-08-24 9:45 ` David Howells 0 siblings, 2 replies; 32+ messages in thread From: Theodore Y. Ts'o @ 2018-08-24 6:05 UTC (permalink / raw) To: Andy Lutomirski Cc: Al Viro, David Howells, Linus Torvalds, Linux FS Devel, Linux API On Thu, Aug 23, 2018 at 09:51:00PM -0700, Andy Lutomirski wrote: > When this was reviewed earlier, a problem was identified. I asked if > it had been addressed. I did *not* say that it was mandatory to > address it, nor did I say anything about reworking fs drivers. > > A reasonable answer might have been "avoiding this pitfall in the new > API would involve a large amount of reworking of existing filesystem > drivers. I think that the new API, as is, has enough benefits that it > makes sense to merge it even with this pitfall, and, if needed, we can > introduce an improved version down the road." It's also not clear that an API that you think is "cleaner" would actually be more usable. In fact, I believe it's going to be a sh*t show for userspace, because it won't be obvious what will work, and what will cause an error of the form, "sorry we can't do this cleaner thing that some people think is better". Which means a huge amount of special casing in the program, or a lot of very surprising failures that will then get exposed to the system administrator, many of whom haven't really had much of a problem with the existing mount(8) user interface. It may very well be that the "cleaner" approach will be hellacious from a human usability perspective. Figuring all of this out could take months and months (stalling the new mount API for a long time), and we may never know for sure until we implement a full prototype of kernel changes, massive rewrite of the file systems, and userspace programs --- and then see what is the best way to expose the radically different semantics depending on what individual file systems can and can not do to the poor human system administrator who is just trying to mount the d*mned file system. - Ted ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 6:05 ` Theodore Y. Ts'o @ 2018-08-24 8:38 ` Miklos Szeredi 2018-08-24 8:56 ` Miklos Szeredi 2018-08-24 9:45 ` David Howells 1 sibling, 1 reply; 32+ messages in thread From: Miklos Szeredi @ 2018-08-24 8:38 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Andy Lutomirski, Al Viro, David Howells, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 8:05 AM, Theodore Y. Ts'o <tytso@mit.edu> wrote: > On Thu, Aug 23, 2018 at 09:51:00PM -0700, Andy Lutomirski wrote: >> When this was reviewed earlier, a problem was identified. I asked if >> it had been addressed. I did *not* say that it was mandatory to >> address it, nor did I say anything about reworking fs drivers. >> >> A reasonable answer might have been "avoiding this pitfall in the new >> API would involve a large amount of reworking of existing filesystem >> drivers. I think that the new API, as is, has enough benefits that it >> makes sense to merge it even with this pitfall, and, if needed, we can >> introduce an improved version down the road." > > It's also not clear that an API that you think is "cleaner" would > actually be more usable. In fact, I believe it's going to be a sh*t > show for userspace, because it won't be obvious what will work, and > what will cause an error of the form, "sorry we can't do this cleaner > thing that some people think is better". Which means a huge amount of > special casing in the program, or a lot of very surprising failures > that will then get exposed to the system administrator, many of whom > haven't really had much of a problem with the existing mount(8) user > interface. In what way is the kernel better suited to read the mind of the poor sysadmin, than a userspace helper program? No, I don't think this argument is about the mount(8) interface, which is what the sysadmin is using and can continue to use for the foreseeable future, with basically the same feature set It's about new uses of the mount(2) API that *would* be possible if it was a saner interface. And so doing the new mount API radically different than what current mount(2) is doing is quite all right in my opinion. > It may very well be that the "cleaner" approach will be hellacious > from a human usability perspective. Figuring all of this out could > take months and months (stalling the new mount API for a long time), > and we may never know for sure until we implement a full prototype of > kernel changes, massive rewrite of the file systems, and userspace > programs --- and then see what is the best way to expose the radically > different semantics depending on what individual file systems can and > can not do to the poor human system administrator who is just trying > to mount the d*mned file system. I'm not against merging this patchset (have nits about resuing the MS_* constants for the new API that I've complained about, but that's really easy to fix before -final), but it may make sense to differentiate the legacy behavior from a saner one from the very start. I.e. rename FSCONFIG_CMD_CREATE to something implying it's actually *not* a create in exclusive sense that one would first imply (and yeah, we have creat() and O_CREAT, which don't imply exclusivity, yet they at least have clear semantics that current super block creation does definitely not). Also, we don't have a massive conversion of filesystems yet to the new internal API, so we could require anything converted to support the exclusive create semantics. None of this requires months of figuring out, just a bit of tweaking and acknowledging that the current semantics are weird, to say the least. Thanks, Miklos ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 8:38 ` Miklos Szeredi @ 2018-08-24 8:56 ` Miklos Szeredi 2018-08-24 9:29 ` Miklos Szeredi 0 siblings, 1 reply; 32+ messages in thread From: Miklos Szeredi @ 2018-08-24 8:56 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Andy Lutomirski, Al Viro, David Howells, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 10:38 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Fri, Aug 24, 2018 at 8:05 AM, Theodore Y. Ts'o <tytso@mit.edu> wrote: >> On Thu, Aug 23, 2018 at 09:51:00PM -0700, Andy Lutomirski wrote: >>> When this was reviewed earlier, a problem was identified. I asked if >>> it had been addressed. I did *not* say that it was mandatory to >>> address it, nor did I say anything about reworking fs drivers. >>> >>> A reasonable answer might have been "avoiding this pitfall in the new >>> API would involve a large amount of reworking of existing filesystem >>> drivers. I think that the new API, as is, has enough benefits that it >>> makes sense to merge it even with this pitfall, and, if needed, we can >>> introduce an improved version down the road." >> >> It's also not clear that an API that you think is "cleaner" would >> actually be more usable. In fact, I believe it's going to be a sh*t >> show for userspace, because it won't be obvious what will work, and >> what will cause an error of the form, "sorry we can't do this cleaner >> thing that some people think is better". Which means a huge amount of >> special casing in the program, or a lot of very surprising failures >> that will then get exposed to the system administrator, many of whom >> haven't really had much of a problem with the existing mount(8) user >> interface. > > In what way is the kernel better suited to read the mind of the poor > sysadmin, than a userspace helper program? I have a concrete example: mount -oloop. You can leave it to mount(8) to automagically find an existing loop device or setup a new one, or you can do the low level thing and set up your own loop device and mount it. Same story as NFS and friends, except it's not the kernel that does the magic "same source -> same sb" policy but the mount(8) utility. Ever notice the difference? See? And yeah, there are races involved, and userspace is perfectly suited to deal with them. Thanks, Miklos ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 8:56 ` Miklos Szeredi @ 2018-08-24 9:29 ` Miklos Szeredi 0 siblings, 0 replies; 32+ messages in thread From: Miklos Szeredi @ 2018-08-24 9:29 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Andy Lutomirski, Al Viro, David Howells, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 10:56 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Fri, Aug 24, 2018 at 10:38 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: >> On Fri, Aug 24, 2018 at 8:05 AM, Theodore Y. Ts'o <tytso@mit.edu> wrote: >>> On Thu, Aug 23, 2018 at 09:51:00PM -0700, Andy Lutomirski wrote: >>>> When this was reviewed earlier, a problem was identified. I asked if >>>> it had been addressed. I did *not* say that it was mandatory to >>>> address it, nor did I say anything about reworking fs drivers. >>>> >>>> A reasonable answer might have been "avoiding this pitfall in the new >>>> API would involve a large amount of reworking of existing filesystem >>>> drivers. I think that the new API, as is, has enough benefits that it >>>> makes sense to merge it even with this pitfall, and, if needed, we can >>>> introduce an improved version down the road." >>> >>> It's also not clear that an API that you think is "cleaner" would >>> actually be more usable. In fact, I believe it's going to be a sh*t >>> show for userspace, because it won't be obvious what will work, and >>> what will cause an error of the form, "sorry we can't do this cleaner >>> thing that some people think is better". Which means a huge amount of >>> special casing in the program, or a lot of very surprising failures >>> that will then get exposed to the system administrator, many of whom >>> haven't really had much of a problem with the existing mount(8) user >>> interface. >> >> In what way is the kernel better suited to read the mind of the poor >> sysadmin, than a userspace helper program? > > I have a concrete example: mount -oloop. You can leave it to > mount(8) to automagically find an existing loop device or setup a new > one, or you can do the low level thing and set up your own loop device > and mount it. Same story as NFS and friends, except it's not the > kernel that does the magic "same source -> same sb" policy but the > mount(8) utility. Ever notice the difference? See? And yeah, there > are races involved, and userspace is perfectly suited to deal with > them. And to continue from that thought, a namespace for filesystem instances could, for example, live under /dev/fs/$FSTYPE/$INSTANCE Naming the $INSTANCE could be done by the one creating that instance, or in case of legacy mounts could have a fixed prefix + sequence number, or something similar. All this could come later, let's just not exclude the possibility of using the new API in this way. Thanks, Miklos ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 6:05 ` Theodore Y. Ts'o 2018-08-24 8:38 ` Miklos Szeredi @ 2018-08-24 9:45 ` David Howells 2018-08-24 10:06 ` Miklos Szeredi ` (3 more replies) 1 sibling, 4 replies; 32+ messages in thread From: David Howells @ 2018-08-24 9:45 UTC (permalink / raw) To: Miklos Szeredi Cc: dhowells, Theodore Y. Ts'o, Andy Lutomirski, Al Viro, Linus Torvalds, Linux FS Devel, Linux API Miklos Szeredi <miklos@szeredi.hu> wrote: > I'm not against merging this patchset (have nits about resuing the > MS_* constants for the new API that I've complained about, but that's > really easy to fix before -final), Can you send me a patch that does what you want here? They're only used by fsmount() for creating a vfsmount and not used for the superblock creation or reconfiguration. > but it may make sense to differentiate the legacy behavior from a saner one > from the very start. I.e. rename FSCONFIG_CMD_CREATE I would suggest leaving it as-is and add an FSCONFIG_CMD_CREATE_EXCLUSIVE. > to something implying it's actually *not* a create in exclusive sense that > one would first imply (and yeah, we have creat() and O_CREAT, which don't > imply exclusivity, yet they at least have clear semantics that current super > block creation does definitely not). The problem is that "exclusivity" isn't necessarily an easy thing to define. Take nfs4 and btrfs for example. They creating a backing superblock that the actual node is derived from (though in different ways). How do you define what "exclusive" means in their case? David ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 9:45 ` David Howells @ 2018-08-24 10:06 ` Miklos Szeredi 2018-08-24 14:18 ` Miklos Szeredi ` (2 subsequent siblings) 3 siblings, 0 replies; 32+ messages in thread From: Miklos Szeredi @ 2018-08-24 10:06 UTC (permalink / raw) To: David Howells Cc: Theodore Y. Ts'o, Andy Lutomirski, Al Viro, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 11:45 AM, David Howells <dhowells@redhat.com> wrote: > Miklos Szeredi <miklos@szeredi.hu> wrote: > >> I'm not against merging this patchset (have nits about resuing the >> MS_* constants for the new API that I've complained about, but that's >> really easy to fix before -final), > > Can you send me a patch that does what you want here? They're only used by > fsmount() for creating a vfsmount and not used for the superblock creation or > reconfiguration. > >> but it may make sense to differentiate the legacy behavior from a saner one >> from the very start. I.e. rename FSCONFIG_CMD_CREATE > > I would suggest leaving it as-is and add an FSCONFIG_CMD_CREATE_EXCLUSIVE. > >> to something implying it's actually *not* a create in exclusive sense that >> one would first imply (and yeah, we have creat() and O_CREAT, which don't >> imply exclusivity, yet they at least have clear semantics that current super >> block creation does definitely not). > > The problem is that "exclusivity" isn't necessarily an easy thing to define. > Take nfs4 and btrfs for example. They creating a backing superblock that the > actual node is derived from (though in different ways). How do you define > what "exclusive" means in their case? Exclusive creation of a filesystem instance is just that: we promise that no one else is using that filesystem instance, and we can set options as we like, guaranteed that no bother to any other uses. Now, with block filesystems that can only be done once, obviously, so if someone already created a btrfs instance using /dev/foobar then trying to create another instance will fail with EEXIST, EBUSY or whaterver. For NFS it's a different story, it *is* possible to create several instances of the filesystem even if connecting to the same server, except then they will not share caches, will not have local coherency, etc.. Regardless, I think a "filesystem instance" is a pretty clear concept, that we can base a future API on and it's not just an internal implementation detail that the current mount(2) API makes look like. It *is* in fact the only concept that makes FSCONFIG_CMD_RECONFIGURE have sane meaning (i.e. I want to change *that* instance, and am fully aware what affect that will have on mounts of that instance). Compare that to "mount -o remount path"; that one would imply we are changing something for a mount that is at "path", but if there are more than one users of that instance (which is based on magic algorithm inside the filesystem) than that might have unwanted side effects. It's what Eric's complaint is about, and it's I think the root of the issue at hand. Thanks, Miklos ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 9:45 ` David Howells 2018-08-24 10:06 ` Miklos Szeredi @ 2018-08-24 14:18 ` Miklos Szeredi 2018-08-24 14:26 ` Karel Zak 2018-08-24 14:26 ` David Howells 2018-08-24 14:49 ` Andy Lutomirski 3 siblings, 1 reply; 32+ messages in thread From: Miklos Szeredi @ 2018-08-24 14:18 UTC (permalink / raw) To: David Howells Cc: Theodore Y. Ts'o, Andy Lutomirski, Al Viro, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 10:45:31AM +0100, David Howells wrote: > Miklos Szeredi <miklos@szeredi.hu> wrote: > > > I'm not against merging this patchset (have nits about resuing the > > MS_* constants for the new API that I've complained about, but that's > > really easy to fix before -final), > > Can you send me a patch that does what you want here? They're only used by > fsmount() for creating a vfsmount and not used for the superblock creation or > reconfiguration. I'm bothered by the fact that we use the same MS_ prefix in the old mount(2) api and the new fsmount(2) api. What happens if we introduce new flags for fsmount(2) and are already out of flags for mount(2)? I see a big mess that way. Also notice, how the old code just totally ignored MS_RELATIME? Bit of rot in the new interface already. So here's a patch. The MNT_ prefix is already used by libmount, and we don't want any confusion arising from that. So how about M_*? Short and sweet, just like O_* for open(2). Reusing the same values as MNT_ isn't important, but it does reduce the code size a bit. Especially if we'd switch to MNT_STRICTATIME internally as well. Thanks, Miklos --- diff --git a/fs/namespace.c b/fs/namespace.c index e34e3fd064b0..a4b8503f80b8 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -3351,7 +3351,7 @@ EXPORT_SYMBOL_GPL(kern_mount); * Create a kernel mount representation for a new, prepared superblock * (specified by fs_fd) and attach to an open_tree-like file descriptor. */ -SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags, unsigned int, ms_flags) +SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags, unsigned int, m_flags) { struct fs_context *fc; struct file *file; @@ -3366,27 +3366,21 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags, unsigned int, ms_flags if ((flags & ~(FSMOUNT_CLOEXEC)) != 0) return -EINVAL; - if (ms_flags & ~(MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC | - MS_NOATIME | MS_NODIRATIME | MS_RELATIME | - MS_STRICTATIME)) + if (m_flags & ~(M_NOSUID | M_NODEV | M_NOEXEC | M_NOATIME | + M_NODIRATIME | M_STRICTATIME | M_RDONLY)) return -EINVAL; - if (ms_flags & MS_RDONLY) - mnt_flags |= MNT_READONLY; - if (ms_flags & MS_NOSUID) - mnt_flags |= MNT_NOSUID; - if (ms_flags & MS_NODEV) - mnt_flags |= MNT_NODEV; - if (ms_flags & MS_NOEXEC) - mnt_flags |= MNT_NOEXEC; - if (ms_flags & MS_NODIRATIME) - mnt_flags |= MNT_NODIRATIME; + BUILD_BUG_ON(M_NOSUID != MNT_NOSUID || M_NODEV != MNT_NODEV || + M_NOEXEC != MNT_NOEXEC || M_NOATIME != MNT_NOATIME || + M_NODIRATIME != MNT_NODIRATIME || + M_STRICTATIME != MNT_RELATIME || M_RDONLY != MNT_READONLY); + + /* Relatime is the default, but internally that's what we flag */ + mnt_flags = m_flags & ~M_STRICTATIME; - if (ms_flags & MS_STRICTATIME) { - if (ms_flags & MS_NOATIME) + if (m_flags & M_STRICTATIME) { + if (m_flags & M_NOATIME) return -EINVAL; - } else if (ms_flags & MS_NOATIME) { - mnt_flags |= MNT_NOATIME; } else { mnt_flags |= MNT_RELATIME; } diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h index 3634e065836c..331e69f5a7bf 100644 --- a/include/uapi/linux/mount.h +++ b/include/uapi/linux/mount.h @@ -1,6 +1,15 @@ #ifndef _UAPI_LINUX_MOUNT_H #define _UAPI_LINUX_MOUNT_H +/* Mount flags passed to fsmount(2) */ +#define M_NOSUID 0x01 +#define M_NODEV 0x02 +#define M_NOEXEC 0x04 +#define M_NOATIME 0x08 +#define M_NODIRATIME 0x10 +#define M_STRICTATIME 0x20 +#define M_RDONLY 0x40 + /* * These are the fs-independent mount-flags: up to 32 flags are supported * ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 14:18 ` Miklos Szeredi @ 2018-08-24 14:26 ` Karel Zak 0 siblings, 0 replies; 32+ messages in thread From: Karel Zak @ 2018-08-24 14:26 UTC (permalink / raw) To: Miklos Szeredi Cc: David Howells, Theodore Y. Ts'o, Andy Lutomirski, Al Viro, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 04:18:10PM +0200, Miklos Szeredi wrote: > I'm bothered by the fact that we use the same MS_ prefix in the old mount(2) api > and the new fsmount(2) api. What happens if we introduce new flags for > fsmount(2) and are already out of flags for mount(2)? I see a big mess that > way. > > Also notice, how the old code just totally ignored MS_RELATIME? Bit of rot in > the new interface already. > > So here's a patch. The MNT_ prefix is already used by libmount, and we don't > want any confusion arising from that. So how about M_*? Short and sweet, just > like O_* for open(2). +1 Please. -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 9:45 ` David Howells 2018-08-24 10:06 ` Miklos Szeredi 2018-08-24 14:18 ` Miklos Szeredi @ 2018-08-24 14:26 ` David Howells 2018-08-24 14:30 ` Miklos Szeredi 2018-08-24 14:49 ` Andy Lutomirski 3 siblings, 1 reply; 32+ messages in thread From: David Howells @ 2018-08-24 14:26 UTC (permalink / raw) To: Miklos Szeredi Cc: dhowells, Theodore Y. Ts'o, Andy Lutomirski, Al Viro, Linus Torvalds, Linux FS Devel, Linux API Miklos Szeredi <miklos@szeredi.hu> wrote: > +/* Mount flags passed to fsmount(2) */ > +#define M_NOSUID 0x01 > +#define M_NODEV 0x02 > +#define M_NOEXEC 0x04 > +#define M_NOATIME 0x08 > +#define M_NODIRATIME 0x10 > +#define M_STRICTATIME 0x20 > +#define M_RDONLY 0x40 If we're going to do this, I would suggest a longer prefix than just 'M' and renumber them to put *_RDONLY first. > + BUILD_BUG_ON(M_NOSUID != MNT_NOSUID || M_NODEV != MNT_NODEV || > + M_NOEXEC != MNT_NOEXEC || M_NOATIME != MNT_NOATIME || > + M_NODIRATIME != MNT_NODIRATIME || > + M_STRICTATIME != MNT_RELATIME || M_RDONLY != MNT_READONLY); Please don't, please do: if (ms_flags & M_RDONLY) mnt_flags |= MNT_READONLY; Yes, and at some point I'd also like to compress the numbering on the SB_* constants and break the identity as Christoph suggested. David ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 14:26 ` David Howells @ 2018-08-24 14:30 ` Miklos Szeredi 0 siblings, 0 replies; 32+ messages in thread From: Miklos Szeredi @ 2018-08-24 14:30 UTC (permalink / raw) To: David Howells Cc: Theodore Y. Ts'o, Andy Lutomirski, Al Viro, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 4:26 PM, David Howells <dhowells@redhat.com> wrote: > Miklos Szeredi <miklos@szeredi.hu> wrote: > >> +/* Mount flags passed to fsmount(2) */ >> +#define M_NOSUID 0x01 >> +#define M_NODEV 0x02 >> +#define M_NOEXEC 0x04 >> +#define M_NOATIME 0x08 >> +#define M_NODIRATIME 0x10 >> +#define M_STRICTATIME 0x20 >> +#define M_RDONLY 0x40 > > If we're going to do this, I would suggest a longer prefix than just 'M' Any suggestions? We could do "MOUNT_", but that sounds tad too long. But hey, that doesn't matter much either. Let me know your preference. > and > renumber them to put *_RDONLY first. > >> + BUILD_BUG_ON(M_NOSUID != MNT_NOSUID || M_NODEV != MNT_NODEV || >> + M_NOEXEC != MNT_NOEXEC || M_NOATIME != MNT_NOATIME || >> + M_NODIRATIME != MNT_NODIRATIME || >> + M_STRICTATIME != MNT_RELATIME || M_RDONLY != MNT_READONLY); > > Please don't, please do: > > if (ms_flags & M_RDONLY) > mnt_flags |= MNT_READONLY; As I said, I don't really care either way. Thanks, Miklos ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 9:45 ` David Howells ` (2 preceding siblings ...) 2018-08-24 14:26 ` David Howells @ 2018-08-24 14:49 ` Andy Lutomirski 2018-08-24 15:02 ` Miklos Szeredi 3 siblings, 1 reply; 32+ messages in thread From: Andy Lutomirski @ 2018-08-24 14:49 UTC (permalink / raw) To: David Howells Cc: Miklos Szeredi, Theodore Y. Ts'o, Al Viro, Linus Torvalds, Linux FS Devel, Linux API > On Aug 24, 2018, at 2:45 AM, David Howells <dhowells@redhat.com> wrote: > > > The problem is that "exclusivity" isn't necessarily an easy thing to define. > Take nfs4 and btrfs for example. They creating a backing superblock that the > actual node is derived from (though in different ways). How do you define > what "exclusive" means in their case? > I would argue that “exclusive” means “semantically equivalent to getting a fully independent instance.” This means, at least, that all options need to be respected as specified (or fail if they can’t be) and future reconfigure/remount actions only affect *this* instance and not others. NFS might already meet these requirements, although I’m not sure about the latter one. But it might be that adding CREATE_EXCLUSIVE is not all that useful and it would be better to have an API like you sketched where you first explicitly instantiate the driver and then you create however many trees (a la fsmount()) you want, where each tree has its own set of per-tree options like “subvol” for btrfs. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 14:49 ` Andy Lutomirski @ 2018-08-24 15:02 ` Miklos Szeredi 2018-08-24 15:09 ` Andy Lutomirski 2018-08-24 17:10 ` David Howells 0 siblings, 2 replies; 32+ messages in thread From: Miklos Szeredi @ 2018-08-24 15:02 UTC (permalink / raw) To: Andy Lutomirski Cc: David Howells, Theodore Y. Ts'o, Al Viro, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 4:49 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > >> On Aug 24, 2018, at 2:45 AM, David Howells <dhowells@redhat.com> wrote: > >> >> >> The problem is that "exclusivity" isn't necessarily an easy thing to define. >> Take nfs4 and btrfs for example. They creating a backing superblock that the >> actual node is derived from (though in different ways). How do you define >> what "exclusive" means in their case? >> > > I would argue that “exclusive” means “semantically equivalent to getting a fully independent instance.” The only way to get semantically equivalent instance is to create a fully independent instance. See reconfigure (nee remount). Thanks, Miklos ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 15:02 ` Miklos Szeredi @ 2018-08-24 15:09 ` Andy Lutomirski 2018-08-24 17:08 ` Miklos Szeredi 2018-08-24 17:10 ` David Howells 1 sibling, 1 reply; 32+ messages in thread From: Andy Lutomirski @ 2018-08-24 15:09 UTC (permalink / raw) To: Miklos Szeredi Cc: David Howells, Theodore Y. Ts'o, Al Viro, Linus Torvalds, Linux FS Devel, Linux API > On Aug 24, 2018, at 8:02 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: > >> On Fri, Aug 24, 2018 at 4:49 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> >> >>>> On Aug 24, 2018, at 2:45 AM, David Howells <dhowells@redhat.com> wrote: >>> >>> >>> >>> The problem is that "exclusivity" isn't necessarily an easy thing to define. >>> Take nfs4 and btrfs for example. They creating a backing superblock that the >>> actual node is derived from (though in different ways). How do you define >>> what "exclusive" means in their case? >>> >> >> I would argue that “exclusive” means “semantically equivalent to getting a fully independent instance.” > > The only way to get semantically equivalent instance is to create a > fully independent instance. See reconfigure (nee remount). > > Hmm. Is it that case in the current patchset that you can do CMD_CREATE and reconfigure the result and some *other* existing mount will change? If so, that’s rather unfriendly to users. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 15:09 ` Andy Lutomirski @ 2018-08-24 17:08 ` Miklos Szeredi 0 siblings, 0 replies; 32+ messages in thread From: Miklos Szeredi @ 2018-08-24 17:08 UTC (permalink / raw) To: Andy Lutomirski Cc: David Howells, Theodore Y. Ts'o, Al Viro, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 5:09 PM, Andy Lutomirski <luto@amacapital.net> wrote: > >> On Aug 24, 2018, at 8:02 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: >> >>> On Fri, Aug 24, 2018 at 4:49 PM, Andy Lutomirski <luto@amacapital.net> wrote: >>> >>> >>>>> On Aug 24, 2018, at 2:45 AM, David Howells <dhowells@redhat.com> wrote: >>>> >>>> >>>> >>>> The problem is that "exclusivity" isn't necessarily an easy thing to define. >>>> Take nfs4 and btrfs for example. They creating a backing superblock that the >>>> actual node is derived from (though in different ways). How do you define >>>> what "exclusive" means in their case? >>>> >>> >>> I would argue that “exclusive” means “semantically equivalent to getting a fully independent instance.” >> >> The only way to get semantically equivalent instance is to create a >> fully independent instance. See reconfigure (nee remount). >> >> > > Hmm. Is it that case in the current patchset that you can do CMD_CREATE and reconfigure the result and some *other* existing mount will change? If so, that’s rather unfriendly to users. Exactly. Thanks, Miklos ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 15:02 ` Miklos Szeredi 2018-08-24 15:09 ` Andy Lutomirski @ 2018-08-24 17:10 ` David Howells 2018-08-24 17:43 ` Andy Lutomirski 1 sibling, 1 reply; 32+ messages in thread From: David Howells @ 2018-08-24 17:10 UTC (permalink / raw) To: Andy Lutomirski Cc: dhowells, Miklos Szeredi, Theodore Y. Ts'o, Al Viro, Linus Torvalds, Linux FS Devel, Linux API Andy Lutomirski <luto@amacapital.net> wrote: > Hmm. Is it that case in the current patchset that you can do CMD_CREATE and > reconfigure the result and some *other* existing mount will change? If so, > that’s rather unfriendly to users. The default behaviour has to be the same as mount(2). David ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 17:10 ` David Howells @ 2018-08-24 17:43 ` Andy Lutomirski 2018-08-24 19:25 ` Miklos Szeredi 0 siblings, 1 reply; 32+ messages in thread From: Andy Lutomirski @ 2018-08-24 17:43 UTC (permalink / raw) To: David Howells Cc: Miklos Szeredi, Theodore Y. Ts'o, Al Viro, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 10:10 AM, David Howells <dhowells@redhat.com> wrote: > Andy Lutomirski <luto@amacapital.net> wrote: > >> Hmm. Is it that case in the current patchset that you can do CMD_CREATE and >> reconfigure the result and some *other* existing mount will change? If so, >> that’s rather unfriendly to users. > > The default behaviour has to be the same as mount(2). > Why? Obviously mount(2) needs to keep working the way it does now. Most likely mount(8) also needs to retain its current behavior. But I don't see why the new API needs be bug-compatible with mount(2). --Andy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 17:43 ` Andy Lutomirski @ 2018-08-24 19:25 ` Miklos Szeredi 2018-08-24 19:51 ` Al Viro 0 siblings, 1 reply; 32+ messages in thread From: Miklos Szeredi @ 2018-08-24 19:25 UTC (permalink / raw) To: Andy Lutomirski Cc: David Howells, Theodore Y. Ts'o, Al Viro, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 7:43 PM, Andy Lutomirski <luto@kernel.org> wrote: > On Fri, Aug 24, 2018 at 10:10 AM, David Howells <dhowells@redhat.com> wrote: >> Andy Lutomirski <luto@amacapital.net> wrote: >> >>> Hmm. Is it that case in the current patchset that you can do CMD_CREATE and >>> reconfigure the result and some *other* existing mount will change? If so, >>> that’s rather unfriendly to users. >> >> The default behaviour has to be the same as mount(2). This is rubbish. Anyone wanting the mount(2) behavior can use mount(2). About exclusive create: can't we just look at the active reference count of the superblock returned by ->get_tree() (if it's one, we are the only users, i.e. the create was exclusive)? Thanks, Miklos ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 19:25 ` Miklos Szeredi @ 2018-08-24 19:51 ` Al Viro 2018-08-29 12:32 ` Miklos Szeredi 0 siblings, 1 reply; 32+ messages in thread From: Al Viro @ 2018-08-24 19:51 UTC (permalink / raw) To: Miklos Szeredi Cc: Andy Lutomirski, David Howells, Theodore Y. Ts'o, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 09:25:58PM +0200, Miklos Szeredi wrote: > On Fri, Aug 24, 2018 at 7:43 PM, Andy Lutomirski <luto@kernel.org> wrote: > > On Fri, Aug 24, 2018 at 10:10 AM, David Howells <dhowells@redhat.com> wrote: > >> Andy Lutomirski <luto@amacapital.net> wrote: > >> > >>> Hmm. Is it that case in the current patchset that you can do CMD_CREATE and > >>> reconfigure the result and some *other* existing mount will change? If so, > >>> that’s rather unfriendly to users. > >> > >> The default behaviour has to be the same as mount(2). > > This is rubbish. Anyone wanting the mount(2) behavior can use mount(2). > > About exclusive create: can't we just look at the active reference > count of the superblock returned by ->get_tree() (if it's one, we are > the only users, i.e. the create was exclusive)? Let me get it straight - your default behaviour would routinely refuse NFS mount simply because somebody has already mounted from the same server? The main reason for keeping existing semantics is very, very simple: nobody has offered a sane replacement. For all warts (and I admit that policy re sharing turned out to be rather bad for any kind of situation with non-cooperative admins - in partial defense, back in 2000/2001 when it was done anybody talking about something like userns would've gotten laughed at, for a lot of good reasons) mount(2) semantics is defined *and* needs to be supported anyway. All suggested "better replacements" were bloody problematic. Sure, we'll need to sort that out, but again, why the hell tie that to untangling the sodding mount(2) ABI? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-24 19:51 ` Al Viro @ 2018-08-29 12:32 ` Miklos Szeredi 0 siblings, 0 replies; 32+ messages in thread From: Miklos Szeredi @ 2018-08-29 12:32 UTC (permalink / raw) To: Al Viro Cc: Andy Lutomirski, David Howells, Theodore Y. Ts'o, Linus Torvalds, Linux FS Devel, Linux API On Fri, Aug 24, 2018 at 9:51 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Fri, Aug 24, 2018 at 09:25:58PM +0200, Miklos Szeredi wrote: >> On Fri, Aug 24, 2018 at 7:43 PM, Andy Lutomirski <luto@kernel.org> wrote: >> > On Fri, Aug 24, 2018 at 10:10 AM, David Howells <dhowells@redhat.com> wrote: >> >> Andy Lutomirski <luto@amacapital.net> wrote: >> >> >> >>> Hmm. Is it that case in the current patchset that you can do CMD_CREATE and >> >>> reconfigure the result and some *other* existing mount will change? If so, >> >>> that’s rather unfriendly to users. >> >> >> >> The default behaviour has to be the same as mount(2). >> >> This is rubbish. Anyone wanting the mount(2) behavior can use mount(2). >> >> About exclusive create: can't we just look at the active reference >> count of the superblock returned by ->get_tree() (if it's one, we are >> the only users, i.e. the create was exclusive)? > > Let me get it straight - your default behaviour would routinely refuse NFS mount > simply because somebody has already mounted from the same server? Yeah, bad idea. > The main reason for keeping existing semantics is very, very simple: nobody > has offered a sane replacement. For all warts (and I admit that policy > re sharing turned out to be rather bad for any kind of situation with > non-cooperative admins - in partial defense, back in 2000/2001 when it was > done anybody talking about something like userns would've gotten laughed at, > for a lot of good reasons) mount(2) semantics is defined *and* needs to be > supported anyway. > > All suggested "better replacements" were bloody problematic. Sure, we'll need > to sort that out, but again, why the hell tie that to untangling the sodding > mount(2) ABI? What I was primarily suggesting is pretty simple, yet nobody seems to get it: name the current behavior what it is: legacy. Call this op e.g. CMD_LEGACY_FIND_OR_CREATE. It means "you're using the shiny new API, but be warned: we'll still use the old and stinky way to get the super block". We'll hopefully untangle it, sooner rather than later, and then have ops with less scary names. Thanks, Miklos ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-23 22:31 [git pull] new mount API Al Viro 2018-08-23 23:24 ` Andy Lutomirski 2018-08-24 0:08 ` David Howells @ 2018-08-26 3:22 ` Eric W. Biederman 2018-08-26 20:42 ` David Howells ` (2 subsequent siblings) 5 siblings, 0 replies; 32+ messages in thread From: Eric W. Biederman @ 2018-08-26 3:22 UTC (permalink / raw) To: Al Viro; +Cc: Linus Torvalds, dhowells, linux-fsdevel, linux-api Sigh. I just noticed that this tree reintroduces bugs that were fixed with cfb2f6f6e0ba ("Revert "mqueue: switch to on-demand creation of internal mount"") So in addition to ongoing discussion about what the new mount api should look like we have bugs in the implementation. This really does not look like this branch is ready to be merged yet. Eric ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-23 22:31 [git pull] new mount API Al Viro ` (2 preceding siblings ...) 2018-08-26 3:22 ` Eric W. Biederman @ 2018-08-26 20:42 ` David Howells 2018-08-26 20:46 ` David Howells 2018-08-26 21:03 ` [PATCH] mqueue: Fix bug from mount API conversion David Howells 5 siblings, 0 replies; 32+ messages in thread From: David Howells @ 2018-08-26 20:42 UTC (permalink / raw) To: Eric W. Biederman Cc: dhowells, Al Viro, Linus Torvalds, linux-fsdevel, linux-api Eric W. Biederman <ebiederm@xmission.com> wrote: > I just noticed that this tree reintroduces bugs that were fixed with > cfb2f6f6e0ba ("Revert "mqueue: switch to on-demand creation of internal > mount"") > > So in addition to ongoing discussion about what the new mount api should > look like ... Which don't need to be solved *before* the current patches go in. No strong reason has been given why the desired changes cannot wait one cycle. The core needs to be upstream before we can start porting most of the filesystems to it. > ... we have bugs in the implementation. Allegedly. It would be useful if you could've pointed out what it was that you think you see. I don't see offhand what it reintroduces. And how come this hasn't been noticed since the patches have been sat in linux-next for a while? > This really does not look like this branch is ready to be merged yet. I disagree (probably obviously). David ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [git pull] new mount API 2018-08-23 22:31 [git pull] new mount API Al Viro ` (3 preceding siblings ...) 2018-08-26 20:42 ` David Howells @ 2018-08-26 20:46 ` David Howells 2018-08-26 21:03 ` [PATCH] mqueue: Fix bug from mount API conversion David Howells 5 siblings, 0 replies; 32+ messages in thread From: David Howells @ 2018-08-26 20:46 UTC (permalink / raw) Cc: dhowells, Eric W. Biederman, Al Viro, Linus Torvalds, linux-fsdevel, linux-api David Howells <dhowells@redhat.com> wrote: > > ... we have bugs in the implementation. > > Allegedly. It would be useful if you could've pointed out what it was that > you think you see. I don't see offhand what it reintroduces. This? - ns->mq_mnt = kern_mount_data(&mqueue_fs_type, ns, 0); - if (IS_ERR(ns->mq_mnt)) { - int err = PTR_ERR(ns->mq_mnt); - ns->mq_mnt = NULL; - return err; - } + m = mq_create_mount(&init_ipc_ns); + if (IS_ERR(m)) + return PTR_ERR(m); + ns->mq_mnt = m; Should be passing ns into mq_create_mount() rather than init_ipc_ns maybe? David ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] mqueue: Fix bug from mount API conversion 2018-08-23 22:31 [git pull] new mount API Al Viro ` (4 preceding siblings ...) 2018-08-26 20:46 ` David Howells @ 2018-08-26 21:03 ` David Howells 2018-08-26 21:22 ` Al Viro 5 siblings, 1 reply; 32+ messages in thread From: David Howells @ 2018-08-26 21:03 UTC (permalink / raw) To: Al Viro; +Cc: dhowells, Linus Torvalds, linux-fsdevel, linux-api Hi Al, Linus, I *think* the attached patch is needed with the new mount API. David --- mqueue: Fix bug from mount API conversion Fix a possible bug from the mount API conversion of the mqueue filesystem whereby mq_init_ns() passes the root IPC namespace to mq_create_mount() rather than the namespace passed in. I don't know if this is the bug Eric is reporting since he didn't give any details. Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David Howells <dhowells@redhat.com> --- diff --git a/ipc/mqueue.c b/ipc/mqueue.c index 0f102210f89e..4fbbd6f5b588 100644 --- a/ipc/mqueue.c +++ b/ipc/mqueue.c @@ -1618,7 +1618,7 @@ int mq_init_ns(struct ipc_namespace *ns) ns->mq_msg_default = DFLT_MSG; ns->mq_msgsize_default = DFLT_MSGSIZE; - m = mq_create_mount(&init_ipc_ns); + m = mq_create_mount(ns); if (IS_ERR(m)) return PTR_ERR(m); ns->mq_mnt = m; ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH] mqueue: Fix bug from mount API conversion 2018-08-26 21:03 ` [PATCH] mqueue: Fix bug from mount API conversion David Howells @ 2018-08-26 21:22 ` Al Viro 0 siblings, 0 replies; 32+ messages in thread From: Al Viro @ 2018-08-26 21:22 UTC (permalink / raw) To: David Howells; +Cc: Linus Torvalds, linux-fsdevel, linux-api On Sun, Aug 26, 2018 at 10:03:48PM +0100, David Howells wrote: > Hi Al, Linus, > > I *think* the attached patch is needed with the new mount API. > > David > --- > mqueue: Fix bug from mount API conversion > > Fix a possible bug from the mount API conversion of the mqueue filesystem > whereby mq_init_ns() passes the root IPC namespace to mq_create_mount() > rather than the namespace passed in. > > I don't know if this is the bug Eric is reporting since he didn't give any > details. Applied and pushed. ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2018-08-29 16:29 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-08-23 22:31 [git pull] new mount API Al Viro 2018-08-23 23:24 ` Andy Lutomirski 2018-08-24 0:08 ` David Howells 2018-08-24 0:16 ` Andy Lutomirski 2018-08-24 0:31 ` Al Viro 2018-08-24 2:36 ` Andy Lutomirski 2018-08-24 3:13 ` Al Viro 2018-08-24 4:51 ` Andy Lutomirski 2018-08-24 6:05 ` Theodore Y. Ts'o 2018-08-24 8:38 ` Miklos Szeredi 2018-08-24 8:56 ` Miklos Szeredi 2018-08-24 9:29 ` Miklos Szeredi 2018-08-24 9:45 ` David Howells 2018-08-24 10:06 ` Miklos Szeredi 2018-08-24 14:18 ` Miklos Szeredi 2018-08-24 14:26 ` Karel Zak 2018-08-24 14:26 ` David Howells 2018-08-24 14:30 ` Miklos Szeredi 2018-08-24 14:49 ` Andy Lutomirski 2018-08-24 15:02 ` Miklos Szeredi 2018-08-24 15:09 ` Andy Lutomirski 2018-08-24 17:08 ` Miklos Szeredi 2018-08-24 17:10 ` David Howells 2018-08-24 17:43 ` Andy Lutomirski 2018-08-24 19:25 ` Miklos Szeredi 2018-08-24 19:51 ` Al Viro 2018-08-29 12:32 ` Miklos Szeredi 2018-08-26 3:22 ` Eric W. Biederman 2018-08-26 20:42 ` David Howells 2018-08-26 20:46 ` David Howells 2018-08-26 21:03 ` [PATCH] mqueue: Fix bug from mount API conversion David Howells 2018-08-26 21:22 ` Al Viro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).