* [RFC] Add option to mount only a pids subset [not found] ` <20170221145746.GA31914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2017-03-06 23:05 ` Alexey Gladkov 2017-03-07 16:24 ` Andy Lutomirski ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Alexey Gladkov @ 2017-03-06 23:05 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley After discussion with Oleg Nesterov I reimplement my patch as an additional option for /proc. This option affects the mountpoint. It means that in one pid namespace it possible to have both the whole traditional /proc and /proc with only pids subset. However, it remains an open question about overlayfs support in the /proc. == Overview == Some of the container virtualization systems are mounted /proc inside the container. This is done in most cases to operate with information about the processes. Knowing that /proc filesystem is not fully virtualized they are mounted on top of dangerous places empty files or directories (for exmaple /proc/sys, /proc/kcore, /sys/firmware, etc.). The structure of this filesystem is dynamic and any module can create a new object which will not necessarily be virtualized. There are proprietary modules that aren't in the mainline whose work we can not verify. This opens up a potential threat to the system. The developers of the virtualization system can't predict all dangerous places in /proc by definition. A more effective solution would be to mount into the container only what is necessary and ignore the rest. Right now there is the opportunity to pass in the container any port of the /proc filesystem using mount --bind expect the pids. This patch allows to mount only the part of /proc related to pids without rest objects. Since this is an option for /proc, flags applied to /proc have an effect on this subset of filesystem. Originally the idea was that the container will be mounted only pid sunset and additional required files will be mounted on top using the overlayfs. But I found out that /proc does not support overlayfs and does not allow to mount anything on top or under it. == TODO == There is still work to do: * Show pidonly via proc_show_options. * Add overlayfs support. --- fs/internal.h | 1 + fs/namespace.c | 9 ++++++++ fs/proc/generic.c | 4 ++++ fs/proc/inode.c | 7 +++++- fs/proc/internal.h | 8 +++++++ fs/proc/root.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++---- fs/stat.c | 4 ++++ fs/super.c | 20 +++++++++++++++++ include/linux/fs.h | 2 ++ include/linux/mount.h | 1 + 10 files changed, 113 insertions(+), 5 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index 4fcf517..cb44ca7 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -88,6 +88,7 @@ extern struct file *get_empty_filp(void); * super.c */ extern int do_remount_sb(struct super_block *, int, void *, int); +extern int do_mount_sb(struct vfsmount *mnt, int flags, void *data); extern bool trylock_super(struct super_block *sb); extern struct dentry *mount_fs(struct file_system_type *, int, const char *, void *); diff --git a/fs/namespace.c b/fs/namespace.c index e6c234b..6cb2bcb 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -938,6 +938,7 @@ static struct mount *skip_mnt_tree(struct mount *p) struct vfsmount * vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void *data) { + int err; struct mount *mnt; struct dentry *root; @@ -962,6 +963,14 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void mnt->mnt.mnt_sb = root->d_sb; mnt->mnt_mountpoint = mnt->mnt.mnt_root; mnt->mnt_parent = mnt; + + err = do_mount_sb(&mnt->mnt, flags, data); + if(err) { + mnt_free_id(mnt); + free_vfsmnt(mnt); + return ERR_PTR(err); + } + lock_mount_hash(); list_add_tail(&mnt->mnt_instance, &root->d_sb->s_mounts); unlock_mount_hash(); diff --git a/fs/proc/generic.c b/fs/proc/generic.c index 7eb3cef..dd3da60 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -306,6 +306,10 @@ int proc_readdir_de(struct proc_dir_entry *de, struct file *file, int proc_readdir(struct file *file, struct dir_context *ctx) { struct inode *inode = file_inode(file); + struct proc_options *opts = file->f_path.mnt->fs_data; + + if (opts && opts->pid_only) + return 1; return proc_readdir_de(PDE(inode), file, ctx); } diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 783bc19..51b1712 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -108,7 +108,6 @@ static int proc_show_options(struct seq_file *seq, struct dentry *root) seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, pid->pid_gid)); if (pid->hide_pid != 0) seq_printf(seq, ",hidepid=%u", pid->hide_pid); - return 0; } @@ -118,6 +117,8 @@ static const struct super_operations proc_sops = { .drop_inode = generic_delete_inode, .evict_inode = proc_evict_inode, .statfs = simple_statfs, + .getattr_fs = proc_getattrfs, + .mount_fs = proc_mount_cb, .remount_fs = proc_remount, .show_options = proc_show_options, }; @@ -323,6 +324,10 @@ static int proc_reg_open(struct inode *inode, struct file *file) int (*open)(struct inode *, struct file *); int (*release)(struct inode *, struct file *); struct pde_opener *pdeo; + struct proc_options *opts = file->f_path.mnt->fs_data; + + if (opts && opts->pid_only) + return -ENOENT; /* * Ensure that diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 2de5194..7fdbfee 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -267,11 +267,19 @@ static inline void proc_tty_init(void) {} /* * root.c */ +struct proc_options { + kgid_t pid_gid; + int hide_pid; + int pid_only; +}; + extern struct proc_dir_entry proc_root; extern int proc_parse_options(char *options, struct pid_namespace *pid); extern void proc_self_init(void); extern int proc_remount(struct super_block *, int *, char *); +extern int proc_mount_cb(struct vfsmount *mnt, int *flags, char *data); +extern int proc_getattrfs(struct vfsmount *, struct dentry *, struct kstat *); /* * task_[no]mmu.c diff --git a/fs/proc/root.c b/fs/proc/root.c index 4bd0373..e07f37a 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -14,6 +14,7 @@ #include <linux/stat.h> #include <linux/init.h> #include <linux/sched.h> +#include <linux/slab.h> #include <linux/module.h> #include <linux/bitops.h> #include <linux/user_namespace.h> @@ -24,16 +25,17 @@ #include "internal.h" enum { - Opt_gid, Opt_hidepid, Opt_err, + Opt_gid, Opt_hidepid, Opt_pidonly, Opt_err, }; static const match_table_t tokens = { {Opt_hidepid, "hidepid=%u"}, {Opt_gid, "gid=%u"}, + {Opt_pidonly, "pidonly"}, {Opt_err, NULL}, }; -int proc_parse_options(char *options, struct pid_namespace *pid) +static int proc_fill_options(char *options, struct proc_options *fs_opts) { char *p; substring_t args[MAX_OPT_ARGS]; @@ -53,7 +55,7 @@ int proc_parse_options(char *options, struct pid_namespace *pid) case Opt_gid: if (match_int(&args[0], &option)) return 0; - pid->pid_gid = make_kgid(current_user_ns(), option); + fs_opts->pid_gid = make_kgid(current_user_ns(), option); break; case Opt_hidepid: if (match_int(&args[0], &option)) @@ -62,7 +64,10 @@ int proc_parse_options(char *options, struct pid_namespace *pid) pr_err("proc: hidepid value must be between 0 and 2.\n"); return 0; } - pid->hide_pid = option; + fs_opts->hide_pid = option; + break; + case Opt_pidonly: + fs_opts->pid_only = 1; break; default: pr_err("proc: unrecognized mount option \"%s\" " @@ -74,6 +79,19 @@ int proc_parse_options(char *options, struct pid_namespace *pid) return 1; } +int proc_parse_options(char *options, struct pid_namespace *pid) +{ + struct proc_options opts; + + if (!proc_fill_options(options, &opts)) + return 0; + + pid->pid_gid = opts.pid_gid; + pid->hide_pid = opts.hide_pid; + + return 1; +} + int proc_remount(struct super_block *sb, int *flags, char *data) { struct pid_namespace *pid = sb->s_fs_info; @@ -82,6 +100,42 @@ int proc_remount(struct super_block *sb, int *flags, char *data) return !proc_parse_options(data, pid); } +int proc_getattrfs(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) +{ + struct inode *inode = d_inode(dentry); + struct pid *pid = proc_pid(dentry->d_inode); + struct proc_options *opts = mnt->fs_data; + + if (opts && opts->pid_only && mnt->mnt_root != dentry && !pid) + return -ENOENT; + + if (!inode->i_op->getattr) { + generic_fillattr(inode, stat); + return 0; + } + + return inode->i_op->getattr(mnt, dentry, stat); +} + +int proc_mount_cb(struct vfsmount *mnt, int *flags, char *data) +{ + struct proc_options *opts; + + if (!data || *flags & MS_KERNMOUNT) + return 0; + + opts = kzalloc(sizeof(struct proc_options), GFP_KERNEL); + if (!opts) + return -ENOMEM; + + if (!proc_fill_options(data, opts)) + return -EINVAL; + + mnt->fs_data = opts; + + return 0; +} + static struct dentry *proc_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { diff --git a/fs/stat.c b/fs/stat.c index bc045c7..1e26308 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -10,6 +10,7 @@ #include <linux/file.h> #include <linux/highuid.h> #include <linux/fs.h> +#include <linux/mount.h> #include <linux/namei.h> #include <linux/security.h> #include <linux/syscalls.h> @@ -53,6 +54,9 @@ int vfs_getattr_nosec(struct path *path, struct kstat *stat) { struct inode *inode = d_backing_inode(path->dentry); + if (path->mnt->mnt_sb->s_op->getattr_fs) + return path->mnt->mnt_sb->s_op->getattr_fs(path->mnt, path->dentry, stat); + if (inode->i_op->getattr) return inode->i_op->getattr(path->mnt, path->dentry, stat); diff --git a/fs/super.c b/fs/super.c index c183835..478dd5b 100644 --- a/fs/super.c +++ b/fs/super.c @@ -832,6 +832,26 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force) return retval; } +/** + * do_mount_sb - asks filesystem to finish mount with options. + * @mnt: vfsmount in question + * @flags: numeric part of options + * @data: the rest of options + * + * Alters the mount options of a mounted filesystem. + */ +int do_mount_sb(struct vfsmount *mnt, int flags, void *data) +{ + if (mnt->mnt_sb->s_writers.frozen != SB_UNFROZEN) + return -EBUSY; + + if (mnt->mnt_sb->s_op->mount_fs) { + return mnt->mnt_sb->s_op->mount_fs(mnt, &flags, data); + } + + return 0; +} + static void do_emergency_remount(struct work_struct *work) { struct super_block *sb, *p = NULL; diff --git a/include/linux/fs.h b/include/linux/fs.h index 83de8b6..5bd1b84 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1759,6 +1759,8 @@ struct super_operations { int (*thaw_super) (struct super_block *); int (*unfreeze_fs) (struct super_block *); int (*statfs) (struct dentry *, struct kstatfs *); + int (*getattr_fs) (struct vfsmount *, struct dentry *, struct kstat *); + int (*mount_fs) (struct vfsmount *, int *, char *); int (*remount_fs) (struct super_block *, int *, char *); void (*umount_begin) (struct super_block *); diff --git a/include/linux/mount.h b/include/linux/mount.h index 1172cce..4bd364e 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -67,6 +67,7 @@ struct vfsmount { struct dentry *mnt_root; /* root of the mounted tree */ struct super_block *mnt_sb; /* pointer to superblock */ int mnt_flags; + void *fs_data; /* fs-specific data */ }; struct file; /* forward dec */ -- 2.10.2 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [RFC] Add option to mount only a pids subset 2017-03-06 23:05 ` [RFC] Add option to mount only a pids subset Alexey Gladkov @ 2017-03-07 16:24 ` Andy Lutomirski 2017-03-09 11:26 ` Djalal Harouni [not found] ` <CALCETrVFnSoBP+LvyjN+1qUqrwgZan1nvscq5hV0Ujt_FF2e3g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2017-03-07 17:49 ` Oleg Nesterov 2017-03-12 1:54 ` Al Viro 2 siblings, 2 replies; 21+ messages in thread From: Andy Lutomirski @ 2017-03-07 16:24 UTC (permalink / raw) To: Alexey Gladkov, Djalal Harouni Cc: Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley On Mon, Mar 6, 2017 at 3:05 PM, Alexey Gladkov <gladkov.alexey-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > After discussion with Oleg Nesterov I reimplement my patch as an additional > option for /proc. This option affects the mountpoint. It means that in one > pid namespace it possible to have both the whole traditional /proc and > /proc with only pids subset. > I like this. I think you should split it into two patches, though: one that reworks how procfs gets mounted and one that makes adds the new functionality. Djajal had some concerns about the first part breaking applications that use stat and expect certain behavior. This should be manageable, though, but making stat work appropriately. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] Add option to mount only a pids subset 2017-03-07 16:24 ` Andy Lutomirski @ 2017-03-09 11:26 ` Djalal Harouni [not found] ` <CAEiveUczqzHZG7jcM72oWXAKYZSPJ0ywYEXGDV1sn_FAhr28pA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> [not found] ` <CALCETrVFnSoBP+LvyjN+1qUqrwgZan1nvscq5hV0Ujt_FF2e3g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 1 reply; 21+ messages in thread From: Djalal Harouni @ 2017-03-09 11:26 UTC (permalink / raw) To: Andy Lutomirski Cc: Alexey Gladkov, Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley On Tue, Mar 7, 2017 at 5:24 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > On Mon, Mar 6, 2017 at 3:05 PM, Alexey Gladkov <gladkov.alexey@gmail.com> wrote: > > > > After discussion with Oleg Nesterov I reimplement my patch as an additional > > option for /proc. This option affects the mountpoint. It means that in one > > pid namespace it possible to have both the whole traditional /proc and > > /proc with only pids subset. > > > > I like this. I think you should split it into two patches, though: > one that reworks how procfs gets mounted and one that makes adds the > new functionality. > > Djajal had some concerns about the first part breaking applications > that use stat and expect certain behavior. This should be manageable, > though, but making stat work appropriately. I'm bit lost in the two discussion, however the main concern I was discussing with Andy was if you have per superblock proc mounts then each mount will end up with its own device ID st_dev, right now they share the same ID if they are in the same pid namespace, but if we change that then we may break the following: http://man7.org/linux/man-pages/man7/namespaces.7.html Both new NS_GET_PARENT and NS_GET_USERNS ioctl() that return an fd, suggests to follow up with fstat() to identify the namespaces.. "By applying fstat(2) to the returned file descriptor, one obtains a stat structure whose st_dev (resident device) and st_ino (inode number) fields together identify the owning/parent namespace." Other /proc/self/ns/* comparison and stat() logic... Andy suggested that we may have the same st_dev for mounts in the same pid namespace... I'm not sure which side effect this may bring! Thanks! -- tixxdz ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <CAEiveUczqzHZG7jcM72oWXAKYZSPJ0ywYEXGDV1sn_FAhr28pA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [RFC] Add option to mount only a pids subset [not found] ` <CAEiveUczqzHZG7jcM72oWXAKYZSPJ0ywYEXGDV1sn_FAhr28pA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-03-09 20:52 ` Eric W. Biederman 2017-03-11 21:51 ` Alexey Gladkov 1 sibling, 0 replies; 21+ messages in thread From: Eric W. Biederman @ 2017-03-09 20:52 UTC (permalink / raw) To: Djalal Harouni Cc: Andy Lutomirski, Alexey Gladkov, Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Pavel Emelyanov, James Bottomley Djalal Harouni <tixxdz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes: > On Tue, Mar 7, 2017 at 5:24 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote: >> >> On Mon, Mar 6, 2017 at 3:05 PM, Alexey Gladkov <gladkov.alexey-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: >> > >> > After discussion with Oleg Nesterov I reimplement my patch as an additional >> > option for /proc. This option affects the mountpoint. It means that in one >> > pid namespace it possible to have both the whole traditional /proc and >> > /proc with only pids subset. >> > >> >> I like this. I think you should split it into two patches, though: >> one that reworks how procfs gets mounted and one that makes adds the >> new functionality. >> >> Djajal had some concerns about the first part breaking applications >> that use stat and expect certain behavior. This should be manageable, >> though, but making stat work appropriately. > > I'm bit lost in the two discussion, however the main concern I was > discussing with Andy was if you have per superblock proc mounts then > each mount will end up with its own device ID st_dev, right now they > share the same ID if they are in the same pid namespace, but if we > change that then we may break the following: > http://man7.org/linux/man-pages/man7/namespaces.7.html > > Both new NS_GET_PARENT and NS_GET_USERNS ioctl() that return an fd, > suggests to follow up with fstat() to identify the namespaces.. > "By applying fstat(2) to the returned file descriptor, one obtains a > stat structure whose st_dev (resident device) and st_ino (inode > number) fields together identify the owning/parent namespace." > > Other /proc/self/ns/* comparison and stat() logic... > > Andy suggested that we may have the same st_dev for mounts in the same > pid namespace... I'm not sure which side effect this may bring! Well right now it is less of an issue than you image. I suggest you stat /proc and /proc/self/ns/* and look at st_dev. That said anything that changes today's proc needs to be tested with a range of distro's especially those using selinux and apparmor as they tend to have policies that are very sensitive to the implementation details. Such that seemingly innocent changes can cause systems to stop booting. Eric ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] Add option to mount only a pids subset [not found] ` <CAEiveUczqzHZG7jcM72oWXAKYZSPJ0ywYEXGDV1sn_FAhr28pA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2017-03-09 20:52 ` Eric W. Biederman @ 2017-03-11 21:51 ` Alexey Gladkov 1 sibling, 0 replies; 21+ messages in thread From: Alexey Gladkov @ 2017-03-11 21:51 UTC (permalink / raw) To: Djalal Harouni Cc: Andy Lutomirski, Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley, Dmitry V. Levin On Thu, Mar 09, 2017 at 12:26:49PM +0100, Djalal Harouni wrote: > I'm bit lost in the two discussion, however the main concern I was > discussing with Andy was if you have per superblock proc mounts then > each mount will end up with its own device ID st_dev, right now they > share the same ID if they are in the same pid namespace, but if we > change that then we may break the following: > http://man7.org/linux/man-pages/man7/namespaces.7.html In fact, nothing has changed. I added a parameter that affects the mountpoint, not the entire pid namespace. The procfs will still be global. The device ID will be the same as before. > Both new NS_GET_PARENT and NS_GET_USERNS ioctl() that return an fd, > suggests to follow up with fstat() to identify the namespaces.. > "By applying fstat(2) to the returned file descriptor, one obtains a > stat structure whose st_dev (resident device) and st_ino (inode > number) fields together identify the owning/parent namespace." > > Other /proc/self/ns/* comparison and stat() logic... > > Andy suggested that we may have the same st_dev for mounts in the same > pid namespace... I'm not sure which side effect this may bring! Basically we have here a issue because other proc options (hidepid for example) affect the entire pid namespace, but, I guess, have to affect the mountpoint. # grep ^proc /proc/mounts proc /proc proc rw,relatime 0 0 # mount -t proc proc /tmp/proc # mount -o remount,hidepid=2 -t proc proc /tmp/proc # grep ^proc /proc/mounts proc /proc proc rw,relatime,hidepid=2 0 0 proc /tmp/proc proc rw,relatime,hidepid=2 0 0 -- Rgrds, legion ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <CALCETrVFnSoBP+LvyjN+1qUqrwgZan1nvscq5hV0Ujt_FF2e3g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [RFC] Add option to mount only a pids subset [not found] ` <CALCETrVFnSoBP+LvyjN+1qUqrwgZan1nvscq5hV0Ujt_FF2e3g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-03-11 0:05 ` Alexey Gladkov 0 siblings, 0 replies; 21+ messages in thread From: Alexey Gladkov @ 2017-03-11 0:05 UTC (permalink / raw) To: Andy Lutomirski Cc: Djalal Harouni, Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley, Dmitry V. Levin On Tue, Mar 07, 2017 at 08:24:12AM -0800, Andy Lutomirski wrote: > On Mon, Mar 6, 2017 at 3:05 PM, Alexey Gladkov <gladkov.alexey-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > > > After discussion with Oleg Nesterov I reimplement my patch as an additional > > option for /proc. This option affects the mountpoint. It means that in one > > pid namespace it possible to have both the whole traditional /proc and > > /proc with only pids subset. > > > > I like this. I think you should split it into two patches, though: > one that reworks how procfs gets mounted and one that makes adds the > new functionality. Sure, but first I wanted to discuss the idea. My patch isn't very useful without the ability to add additional files. I will split it into parts in the new version. -- Rgrds, legion ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] Add option to mount only a pids subset 2017-03-06 23:05 ` [RFC] Add option to mount only a pids subset Alexey Gladkov 2017-03-07 16:24 ` Andy Lutomirski @ 2017-03-07 17:49 ` Oleg Nesterov [not found] ` <20170307174909.GA24112-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2017-03-12 1:54 ` Al Viro 2 siblings, 1 reply; 21+ messages in thread From: Oleg Nesterov @ 2017-03-07 17:49 UTC (permalink / raw) To: Alexey Gladkov Cc: Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Pavel Emelyanov, James Bottomley I can't really review this... but in any case I think you should split this patch to separate the vfs and proc changes. On 03/07, Alexey Gladkov wrote: > > @@ -962,6 +963,14 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void > mnt->mnt.mnt_sb = root->d_sb; > mnt->mnt_mountpoint = mnt->mnt.mnt_root; > mnt->mnt_parent = mnt; > + > + err = do_mount_sb(&mnt->mnt, flags, data); > + if(err) { > + mnt_free_id(mnt); > + free_vfsmnt(mnt); > + return ERR_PTR(err); > + } This duplicates the error handling, we do the same if mount_fs() fails. Perhaps you should move these 2 lines into cleanup block and add goto's. > +int proc_getattrfs(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) > +{ > + struct inode *inode = d_inode(dentry); > + struct pid *pid = proc_pid(dentry->d_inode); > + struct proc_options *opts = mnt->fs_data; > + > + if (opts && opts->pid_only && mnt->mnt_root != dentry && !pid) > + return -ENOENT; Hmm. I don't quite understand why do we need this, and how this should work. Yes, "/bin/ls /pidonly-proc/sys" or opendir(/pidonly-proc/sys) should fail, but only because they both do stat() ? Afaics you still can do open("/pidonly-proc/sys") + getdents() and this should work ? I still think proc_dir_operations.open() makes more sense. Yes, as you pointed out we also need to update proc_sys_dir_file_operations too and may be something else... > + > + if (!inode->i_op->getattr) { > + generic_fillattr(inode, stat); > + return 0; > + } > + > + return inode->i_op->getattr(mnt, dentry, stat); > +} Oh, it would be nice to not duplicate the code from the caller, imo. Oleg. ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <20170307174909.GA24112-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC] Add option to mount only a pids subset [not found] ` <20170307174909.GA24112-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2017-03-10 23:46 ` Alexey Gladkov 0 siblings, 0 replies; 21+ messages in thread From: Alexey Gladkov @ 2017-03-10 23:46 UTC (permalink / raw) To: Oleg Nesterov Cc: Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Pavel Emelyanov, James Bottomley, Dmitry V. Levin On Tue, Mar 07, 2017 at 06:49:09PM +0100, Oleg Nesterov wrote: > I can't really review this... but in any case I think you should split > this patch to separate the vfs and proc changes. > > On 03/07, Alexey Gladkov wrote: > > > > @@ -962,6 +963,14 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void > > mnt->mnt.mnt_sb = root->d_sb; > > mnt->mnt_mountpoint = mnt->mnt.mnt_root; > > mnt->mnt_parent = mnt; > > + > > + err = do_mount_sb(&mnt->mnt, flags, data); > > + if(err) { > > + mnt_free_id(mnt); > > + free_vfsmnt(mnt); > > + return ERR_PTR(err); > > + } > > This duplicates the error handling, we do the same if mount_fs() fails. > Perhaps you should move these 2 lines into cleanup block and add goto's. > > > +int proc_getattrfs(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) > > +{ > > + struct inode *inode = d_inode(dentry); > > + struct pid *pid = proc_pid(dentry->d_inode); > > + struct proc_options *opts = mnt->fs_data; > > + > > + if (opts && opts->pid_only && mnt->mnt_root != dentry && !pid) > > + return -ENOENT; > > Hmm. I don't quite understand why do we need this, and how this should work. > > Yes, "/bin/ls /pidonly-proc/sys" or opendir(/pidonly-proc/sys) should fail, > but only because they both do stat() ? > > Afaics you still can do open("/pidonly-proc/sys") + getdents() and this should > work ? Yes, you're right! I thought that getattr is called always together with open(). I wanted to prevent all attempts open() for not-pid directories. > I still think proc_dir_operations.open() makes more sense. Yes, as you pointed > out we also need to update proc_sys_dir_file_operations too and may be something > else... My main task was to hide all possible direcitrices from the /proc (in pidonly mode)... even those which we do not know. In this case we can't rely on the fact that everyone will follow the rules and to properly handle open(). My current attempt was to force filesystem level check of mountpoint flag. This is necessary to avoid even the theoretical possibility of ignoring "pidonly" parameter. I guess I need to add callback to vfs_open or something to can be sure that we will not open the wrong file or directory in pidonly mode. -- Rgrds, legion ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] Add option to mount only a pids subset 2017-03-06 23:05 ` [RFC] Add option to mount only a pids subset Alexey Gladkov 2017-03-07 16:24 ` Andy Lutomirski 2017-03-07 17:49 ` Oleg Nesterov @ 2017-03-12 1:54 ` Al Viro [not found] ` <20170312015430.GO29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 2 siblings, 1 reply; 21+ messages in thread From: Al Viro @ 2017-03-12 1:54 UTC (permalink / raw) To: Alexey Gladkov Cc: Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley On Tue, Mar 07, 2017 at 12:05:16AM +0100, Alexey Gladkov wrote: > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 83de8b6..5bd1b84 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1759,6 +1759,8 @@ struct super_operations { > int (*thaw_super) (struct super_block *); > int (*unfreeze_fs) (struct super_block *); > int (*statfs) (struct dentry *, struct kstatfs *); > + int (*getattr_fs) (struct vfsmount *, struct dentry *, struct kstat *); > + int (*mount_fs) (struct vfsmount *, int *, char *); > int (*remount_fs) (struct super_block *, int *, char *); > void (*umount_begin) (struct super_block *); > > diff --git a/include/linux/mount.h b/include/linux/mount.h > index 1172cce..4bd364e 100644 > --- a/include/linux/mount.h > +++ b/include/linux/mount.h > @@ -67,6 +67,7 @@ struct vfsmount { > struct dentry *mnt_root; /* root of the mounted tree */ > struct super_block *mnt_sb; /* pointer to superblock */ > int mnt_flags; > + void *fs_data; /* fs-specific data */ No. This is really asking for trouble - you *can't* hang fs-specific data structures off vfsmount. Lifetimes make no sense whatsoever. BTW, your patch leaks supreblock reference on failure, and is kludgy as hell wrt to what it's doing in procfs itself, but that's secondary - the quoted part is enough for a hard NAK on architectural grounds. Don't go there. ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <20170312015430.GO29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>]
* Re: [RFC] Add option to mount only a pids subset [not found] ` <20170312015430.GO29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> @ 2017-03-12 2:13 ` Al Viro [not found] ` <20170312021257.GP29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 2017-03-20 12:58 ` [RFC] Add option to mount only a pids subset Alexey Gladkov 0 siblings, 2 replies; 21+ messages in thread From: Al Viro @ 2017-03-12 2:13 UTC (permalink / raw) To: Alexey Gladkov Cc: Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley On Sun, Mar 12, 2017 at 01:54:38AM +0000, Al Viro wrote: > On Tue, Mar 07, 2017 at 12:05:16AM +0100, Alexey Gladkov wrote: > > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > index 83de8b6..5bd1b84 100644 > > --- a/include/linux/fs.h > > +++ b/include/linux/fs.h > > @@ -1759,6 +1759,8 @@ struct super_operations { > > int (*thaw_super) (struct super_block *); > > int (*unfreeze_fs) (struct super_block *); > > int (*statfs) (struct dentry *, struct kstatfs *); > > + int (*getattr_fs) (struct vfsmount *, struct dentry *, struct kstat *); > > + int (*mount_fs) (struct vfsmount *, int *, char *); > > int (*remount_fs) (struct super_block *, int *, char *); > > void (*umount_begin) (struct super_block *); > > > > diff --git a/include/linux/mount.h b/include/linux/mount.h > > index 1172cce..4bd364e 100644 > > --- a/include/linux/mount.h > > +++ b/include/linux/mount.h > > @@ -67,6 +67,7 @@ struct vfsmount { > > struct dentry *mnt_root; /* root of the mounted tree */ > > struct super_block *mnt_sb; /* pointer to superblock */ > > int mnt_flags; > > + void *fs_data; /* fs-specific data */ > > No. This is really asking for trouble - you *can't* hang > fs-specific data structures off vfsmount. Lifetimes make no > sense whatsoever. > > BTW, your patch leaks supreblock reference on failure, and > is kludgy as hell wrt to what it's doing in procfs itself, but > that's secondary - the quoted part is enough for a hard NAK on > architectural grounds. Don't go there. PS: AFAICS, simple mount --bind of your pid-only mount will suddenly expose the full thing. And as for the lifetimes making no sense... note that you are simply not freeing these structures of yours. Try to handle that and you'll get a serious PITA all over the place. What are you trying to achieve, anyway? Why not add a second vfsmount pointer per pid_namespace and make it initialized on demand, at the first attempt of no-pid mount? Just have a separate no-pid instance created for those namespaces where it had been asked for, with separate superblock and dentry tree not containing anything other that pid-only parts + self + thread-self... ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <20170312021257.GP29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>]
* Re: [RFC] Add option to mount only a pids subset [not found] ` <20170312021257.GP29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> @ 2017-03-13 3:19 ` Andy Lutomirski [not found] ` <CALCETrVT5sfGhNomLKAephrSGj8fc81ZjGTN-Y6UwgAHngVRCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 21+ messages in thread From: Andy Lutomirski @ 2017-03-13 3:19 UTC (permalink / raw) To: Al Viro Cc: Alexey Gladkov, Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley On Sat, Mar 11, 2017 at 6:13 PM, Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org> wrote: > PS: AFAICS, simple mount --bind of your pid-only mount will suddenly > expose the full thing. And as for the lifetimes making no sense... > note that you are simply not freeing these structures of yours. > Try to handle that and you'll get a serious PITA all over the > place. > > What are you trying to achieve, anyway? Why not add a second vfsmount > pointer per pid_namespace and make it initialized on demand, at the > first attempt of no-pid mount? Just have a separate no-pid instance > created for those namespaces where it had been asked for, with > separate superblock and dentry tree not containing anything other > that pid-only parts + self + thread-self... Can't we just make procfs work like most other filesystems and have each mount have its own superblock? If we need to do something funky to stat() output to keep existing userspace working, I think that's okay. As far as I can tell, proc_mnt is very nearly useless -- it seems to be used for proc_flush_task (which claims to be purely an optimization and could be preserved in the common case where there's only one relevant mount) and for sysctl_binary. For the latter, we could create proc_mnt but make actual user-initiated mounts be new superblocks anyway. ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <CALCETrVT5sfGhNomLKAephrSGj8fc81ZjGTN-Y6UwgAHngVRCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [RFC] Add option to mount only a pids subset [not found] ` <CALCETrVT5sfGhNomLKAephrSGj8fc81ZjGTN-Y6UwgAHngVRCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-03-13 13:27 ` Al Viro [not found] ` <20170313132732.GR29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 0 siblings, 1 reply; 21+ messages in thread From: Al Viro @ 2017-03-13 13:27 UTC (permalink / raw) To: Andy Lutomirski Cc: Alexey Gladkov, Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley On Sun, Mar 12, 2017 at 08:19:33PM -0700, Andy Lutomirski wrote: > On Sat, Mar 11, 2017 at 6:13 PM, Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org> wrote: > > PS: AFAICS, simple mount --bind of your pid-only mount will suddenly > > expose the full thing. And as for the lifetimes making no sense... > > note that you are simply not freeing these structures of yours. > > Try to handle that and you'll get a serious PITA all over the > > place. > > > > What are you trying to achieve, anyway? Why not add a second vfsmount > > pointer per pid_namespace and make it initialized on demand, at the > > first attempt of no-pid mount? Just have a separate no-pid instance > > created for those namespaces where it had been asked for, with > > separate superblock and dentry tree not containing anything other > > that pid-only parts + self + thread-self... > > Can't we just make procfs work like most other filesystems and have > each mount have its own superblock? If we need to do something funky > to stat() output to keep existing userspace working, I think that's > okay. First of all, most of the filesystems do *NOT* guarantee anything of that sort. And what's the point of having more instances than necessary, anyway? > As far as I can tell, proc_mnt is very nearly useless -- it seems to > be used for proc_flush_task (which claims to be purely an optimization > and could be preserved in the common case where there's only one > relevant mount) and for sysctl_binary. For the latter, we could > create proc_mnt but make actual user-initiated mounts be new > superblocks anyway. Again, what for? It won't salvage that kludge... It's not as if it had been hard to have separate pid-only instance created when asked for (and reused every time when we are asked for pid-only). What's the point of ever having more than two instances per pidns? IDGI... Folks, there is no one-to-one correspondence between mountpoints and superblocks. Not since 2000 or so. Just don't try to shove your per-superblock stuff into vfsmount; it simply won't work. If you want a separate instance for that thing, then just go ahead and have ->mount() decide which one to use (and whether to create a new one). All there is to it... ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <20170313132732.GR29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>]
* Re: [RFC] Add option to mount only a pids subset [not found] ` <20170313132732.GR29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> @ 2017-03-13 15:24 ` Andy Lutomirski [not found] ` <CALCETrXqv8VUeO6MpKWDR6DFYBgmmT0nZVezBJsimtmmQgDksw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 21+ messages in thread From: Andy Lutomirski @ 2017-03-13 15:24 UTC (permalink / raw) To: Al Viro Cc: Alexey Gladkov, Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley On Mon, Mar 13, 2017 at 6:27 AM, Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org> wrote: > On Sun, Mar 12, 2017 at 08:19:33PM -0700, Andy Lutomirski wrote: >> On Sat, Mar 11, 2017 at 6:13 PM, Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org> wrote: >> > PS: AFAICS, simple mount --bind of your pid-only mount will suddenly >> > expose the full thing. And as for the lifetimes making no sense... >> > note that you are simply not freeing these structures of yours. >> > Try to handle that and you'll get a serious PITA all over the >> > place. >> > >> > What are you trying to achieve, anyway? Why not add a second vfsmount >> > pointer per pid_namespace and make it initialized on demand, at the >> > first attempt of no-pid mount? Just have a separate no-pid instance >> > created for those namespaces where it had been asked for, with >> > separate superblock and dentry tree not containing anything other >> > that pid-only parts + self + thread-self... >> >> Can't we just make procfs work like most other filesystems and have >> each mount have its own superblock? If we need to do something funky >> to stat() output to keep existing userspace working, I think that's >> okay. > > First of all, most of the filesystems do *NOT* guarantee anything of > that sort. And what's the point of having more instances than > necessary, anyway? I mean that, if I do: mount -t proc -o foobar none a mount -t proc -o baz none b Then I think that the second mount should create a whole new proc instance rather than just a new vfsmount. Then the options could differ, which would solve a bunch of problems. > > Again, what for? It won't salvage that kludge... It's not as if it > had been hard to have separate pid-only instance created when asked > for (and reused every time when we are asked for pid-only). What's > the point of ever having more than two instances per pidns? IDGI... I can easily procfs growing more than one interesting option. Pid-only and hidepid come to mind, and that's already six possible combinations. The current hidepid implementation is really awful. > > Folks, there is no one-to-one correspondence between mountpoints and > superblocks. Not since 2000 or so. Just don't try to shove your > per-superblock stuff into vfsmount; it simply won't work. If you > want a separate instance for that thing, then just go ahead and > have ->mount() decide which one to use (and whether to create a new > one). All there is to it... That's what I mean. I just don't see the point of going all-out in trying to reuse superblocks. ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <CALCETrXqv8VUeO6MpKWDR6DFYBgmmT0nZVezBJsimtmmQgDksw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* [PATCH] proc: allow to change proc mount options per mount [not found] ` <CALCETrXqv8VUeO6MpKWDR6DFYBgmmT0nZVezBJsimtmmQgDksw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-03-23 15:59 ` Djalal Harouni 0 siblings, 0 replies; 21+ messages in thread From: Djalal Harouni @ 2017-03-23 15:59 UTC (permalink / raw) To: Linux Kernel Mailing List, Andy Lutomirski, Alexey Gladkov Cc: Al Viro, Linux API, kirill-oKw7cIdHH8eLwutG50LtGA, Vasiliy Kulikov, ebiederm-aS9lmoZGLiVWk0Htik3J/w, Oleg Nesterov, Pavel Emelyanov, James Bottomley, Kees Cook, Dongsu Park, Djalal Harouni As per the discussion with Andy, and following what Al Viro suggested maybe this can work ? the patch is still buggy on top of Linus' tree 093b995e3b Currently hidepid mount option is propagated to all proc mounts that are in the same pid namespace. This patch make it possible to have proc mounts with different options inside the same pid namespace. Since this may break userspace or code that checks/expects some device identifiers, this mode is only supported when theo ption "version=2" is passed. [Buggy patch] [tixxdz@fedora-kvm linux]$ sudo mount -t proc none /test [tixxdz@fedora-kvm linux]$ sudo mount -t proc -o hidepid=2,version=2 none /test2 [tixxdz@fedora-kvm linux]$ ls -l /proc | head -6 total 0 dr-xr-xr-x. 9 root root 0 Mar 23 17:25 1 dr-xr-xr-x. 9 root root 0 Mar 23 17:25 10 dr-xr-xr-x. 9 root root 0 Mar 23 17:25 100 dr-xr-xr-x. 9 gdm gdm 0 Mar 23 17:25 1005 dr-xr-xr-x. 9 root root 0 Mar 23 17:25 101 [tixxdz@fedora-kvm linux]$ ls -l /test | head -16 total 0 dr-xr-xr-x. 9 root root 0 Mar 23 17:25 1 dr-xr-xr-x. 9 root root 0 Mar 23 17:25 10 dr-xr-xr-x. 9 root root 0 Mar 23 17:25 100 dr-xr-xr-x. 9 gdm gdm 0 Mar 23 17:25 1005 dr-xr-xr-x. 9 root root 0 Mar 23 17:25 101 [tixxdz@fedora-kvm linux]$ ls -l /test2 | head -6 total 0 dr-xr-xr-x. 9 tixxdz tixxdz 0 Mar 23 17:27 1182 dr-xr-xr-x. 9 tixxdz tixxdz 0 Mar 23 17:27 1197 dr-xr-xr-x. 9 tixxdz tixxdz 0 Mar 23 17:27 1199 dr-xr-xr-x. 9 tixxdz tixxdz 0 Mar 23 17:27 1222 dr-xr-xr-x. 9 tixxdz tixxdz 0 Mar 23 17:27 1225 Signed-off-by: Djalal Harouni <tixxdz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> --- fs/locks.c | 6 +- fs/proc/base.c | 51 +++++++++----- fs/proc/generic.c | 5 ++ fs/proc/inode.c | 10 +-- fs/proc/root.c | 181 ++++++++++++++++++++++++++++++++++++++++++++++-- fs/proc/self.c | 8 ++- fs/proc/thread_self.c | 6 +- fs/proc_namespace.c | 14 ++-- include/linux/proc_fs.h | 9 +++ 9 files changed, 248 insertions(+), 42 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 2681132..dab5058 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2617,7 +2617,8 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl, unsigned int fl_pid; if (fl->fl_nspid) { - struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info; + struct proc_fs_info *fs_info = proc_sb(file_inode(f->file)->i_sb); + struct pid_namespace *proc_pidns = fs_info->pid_ns; /* Don't let fl_pid change based on who is reading the file */ fl_pid = pid_nr_ns(fl->fl_nspid, proc_pidns); @@ -2701,7 +2702,8 @@ static int locks_show(struct seq_file *f, void *v) { struct locks_iterator *iter = f->private; struct file_lock *fl, *bfl; - struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info; + struct proc_fs_info *fs_info = proc_sb(file_inode(f->file)->i_sb); + struct pid_namespace *proc_pidns = fs_info->pid_ns; fl = hlist_entry(v, struct file_lock, fl_link); diff --git a/fs/proc/base.c b/fs/proc/base.c index c87b6b9..74b389d 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -681,13 +681,24 @@ int proc_setattr(struct dentry *dentry, struct iattr *attr) * May current process learn task's sched/cmdline info (for hide_pid_min=1) * or euid/egid (for hide_pid_min=2)? */ -static bool has_pid_permissions(struct pid_namespace *pid, +static bool has_pid_permissions(struct proc_fs_info *fs_info, struct task_struct *task, int hide_pid_min) { - if (pid->hide_pid < hide_pid_min) + int hide_pid; + kgid_t pid_gid; + int version = fs_info->version; + + if (version == 2) { + hide_pid = fs_info->hide_pid; + pid_gid = fs_info->pid_gid; + } else { + hide_pid = fs_info->pid_ns->hide_pid; + pid_gid = fs_info->pid_ns->pid_gid; + } + if (hide_pid < hide_pid_min) return true; - if (in_group_p(pid->pid_gid)) + if (in_group_p(pid_gid)) return true; return ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS); } @@ -695,14 +706,15 @@ static bool has_pid_permissions(struct pid_namespace *pid, static int proc_pid_permission(struct inode *inode, int mask) { - struct pid_namespace *pid = inode->i_sb->s_fs_info; + struct proc_fs_info *fs_info = proc_sb(inode->i_sb); + struct pid_namespace *pid = fs_info->pid_ns; struct task_struct *task; bool has_perms; task = get_proc_task(inode); if (!task) return -ESRCH; - has_perms = has_pid_permissions(pid, task, HIDEPID_NO_ACCESS); + has_perms = has_pid_permissions(fs_info, task, HIDEPID_NO_ACCESS); put_task_struct(task); if (!has_perms) { @@ -730,12 +742,12 @@ static const struct inode_operations proc_def_inode_operations = { static int proc_single_show(struct seq_file *m, void *v) { struct inode *inode = m->private; - struct pid_namespace *ns; struct pid *pid; struct task_struct *task; int ret; - ns = inode->i_sb->s_fs_info; + struct proc_fs_info *fs_info = proc_sb(inode->i_sb); + struct pid_namespace *ns = fs_info->pid_ns; pid = proc_pid(inode); task = get_pid_task(pid, PIDTYPE_PID); if (!task) @@ -1734,7 +1746,7 @@ int pid_getattr(const struct path *path, struct kstat *stat, { struct inode *inode = d_inode(path->dentry); struct task_struct *task; - struct pid_namespace *pid = path->dentry->d_sb->s_fs_info; + struct proc_fs_info *fs_info = proc_sb(inode->i_sb); generic_fillattr(inode, stat); @@ -1743,7 +1755,7 @@ int pid_getattr(const struct path *path, struct kstat *stat, stat->gid = GLOBAL_ROOT_GID; task = pid_task(proc_pid(inode), PIDTYPE_PID); if (task) { - if (!has_pid_permissions(pid, task, HIDEPID_INVISIBLE)) { + if (!has_pid_permissions(fs_info, task, HIDEPID_INVISIBLE)) { rcu_read_unlock(); /* * This doesn't prevent learning whether PID exists, @@ -2249,6 +2261,8 @@ static const struct seq_operations proc_timers_seq_ops = { static int proc_timers_open(struct inode *inode, struct file *file) { struct timers_private *tp; + struct proc_fs_info *fs_info = proc_sb(inode->i_sb); + struct pid_namespace *ns = fs_info->pid_ns; tp = __seq_open_private(file, &proc_timers_seq_ops, sizeof(struct timers_private)); @@ -2256,7 +2270,7 @@ static int proc_timers_open(struct inode *inode, struct file *file) return -ENOMEM; tp->pid = proc_pid(inode); - tp->ns = inode->i_sb->s_fs_info; + tp->ns = ns; return 0; } @@ -3077,13 +3091,13 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, unsign int result = -ENOENT; struct task_struct *task; unsigned tgid; - struct pid_namespace *ns; + struct proc_fs_info *fs_info = proc_sb(dir->i_sb); + struct pid_namespace *ns = fs_info->pid_ns; tgid = name_to_int(&dentry->d_name); if (tgid == ~0U) goto out; - ns = dentry->d_sb->s_fs_info; rcu_read_lock(); task = find_task_by_pid_ns(tgid, ns); if (task) @@ -3147,7 +3161,8 @@ static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter ite int proc_pid_readdir(struct file *file, struct dir_context *ctx) { struct tgid_iter iter; - struct pid_namespace *ns = file_inode(file)->i_sb->s_fs_info; + struct proc_fs_info *fs_info = proc_sb(file_inode(file)->i_sb); + struct pid_namespace *ns = fs_info->pid_ns; loff_t pos = ctx->pos; if (pos >= PID_MAX_LIMIT + TGID_OFFSET) @@ -3174,7 +3189,7 @@ int proc_pid_readdir(struct file *file, struct dir_context *ctx) int len; cond_resched(); - if (!has_pid_permissions(ns, iter.task, HIDEPID_INVISIBLE)) + if (!has_pid_permissions(fs_info, iter.task, HIDEPID_INVISIBLE)) continue; len = snprintf(name, sizeof(name), "%d", iter.tgid); @@ -3371,7 +3386,8 @@ static struct dentry *proc_task_lookup(struct inode *dir, struct dentry * dentry struct task_struct *task; struct task_struct *leader = get_proc_task(dir); unsigned tid; - struct pid_namespace *ns; + struct proc_fs_info *fs_info = proc_sb(dentry->d_sb); + struct pid_namespace *ns = fs_info->pid_ns; if (!leader) goto out_no_task; @@ -3380,7 +3396,6 @@ static struct dentry *proc_task_lookup(struct inode *dir, struct dentry * dentry if (tid == ~0U) goto out; - ns = dentry->d_sb->s_fs_info; rcu_read_lock(); task = find_task_by_pid_ns(tid, ns); if (task) @@ -3482,7 +3497,8 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx) { struct inode *inode = file_inode(file); struct task_struct *task; - struct pid_namespace *ns; + struct proc_fs_info *fs_info = proc_sb(inode->i_sb); + struct pid_namespace *ns = fs_info->pid_ns; int tid; if (proc_inode_is_dead(inode)) @@ -3494,7 +3510,6 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx) /* f_version caches the tgid value that the last readdir call couldn't * return. lseek aka telldir automagically resets f_version to 0. */ - ns = inode->i_sb->s_fs_info; tid = (int)file->f_version; file->f_version = 0; for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns); diff --git a/fs/proc/generic.c b/fs/proc/generic.c index ee27feb..49c8cb9 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -28,6 +28,11 @@ static DEFINE_RWLOCK(proc_subdir_lock); +struct proc_fs_info *proc_sb(struct super_block *sb) +{ + return sb->s_fs_info; +} + static int proc_match(unsigned int len, const char *name, struct proc_dir_entry *de) { if (len < de->namelen) diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 2cc7a80..4743943 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -103,7 +103,8 @@ void __init proc_init_inodecache(void) static int proc_show_options(struct seq_file *seq, struct dentry *root) { struct super_block *sb = root->d_sb; - struct pid_namespace *pid = sb->s_fs_info; + struct proc_fs_info *fs_info = proc_sb(sb); + struct pid_namespace *pid = fs_info->pid_ns; if (!gid_eq(pid->pid_gid, GLOBAL_ROOT_GID)) seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, pid->pid_gid)); @@ -473,11 +474,12 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de) int proc_fill_super(struct super_block *s, void *data, int silent) { - struct pid_namespace *ns = get_pid_ns(s->s_fs_info); + struct proc_fs_info *fs_info = proc_sb(s); + struct pid_namespace *ns = get_pid_ns(fs_info->pid_ns); struct inode *root_inode; int ret; - if (!proc_parse_options(data, ns)) + if (fs_info->version == 1 && !proc_parse_options(data, ns)) return -EINVAL; /* User space would break if executables or devices appear on proc */ @@ -495,7 +497,7 @@ int proc_fill_super(struct super_block *s, void *data, int silent) * top of it */ s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH; - + pde_get(&proc_root); root_inode = proc_get_inode(s, &proc_root); if (!root_inode) { diff --git a/fs/proc/root.c b/fs/proc/root.c index deecb39..d4047ef 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -15,6 +15,7 @@ #include <linux/init.h> #include <linux/sched.h> #include <linux/sched/stat.h> +#include <linux/slab.h> #include <linux/module.h> #include <linux/bitops.h> #include <linux/user_namespace.h> @@ -26,12 +27,13 @@ #include "internal.h" enum { - Opt_gid, Opt_hidepid, Opt_err, + Opt_gid, Opt_hidepid, Opt_version ,Opt_err, }; static const match_table_t tokens = { {Opt_hidepid, "hidepid=%u"}, {Opt_gid, "gid=%u"}, + {Opt_version, "version=%u"}, {Opt_err, NULL}, }; @@ -67,6 +69,8 @@ int proc_parse_options(char *options, struct pid_namespace *pid) } pid->hide_pid = option; break; + case Opt_version: + break; default: pr_err("proc: unrecognized mount option \"%s\" " "or missing value\n", p); @@ -77,40 +81,205 @@ int proc_parse_options(char *options, struct pid_namespace *pid) return 1; } +struct proc_options { + int version; /* version field auto set to 1 to not break userspace */ + kgid_t pid_gid; + int hide_pid; +}; + +int proc_parse_early_options(char *options, void *holder, + struct proc_options *fs_options) +{ + char *p, *opts, *orig; + substring_t args[MAX_OPT_ARGS]; + int option, ret = 0; + + if (!options) + return 0; + + opts = kstrdup(options, GFP_KERNEL); + if (!opts) + return -ENOMEM; + + orig = opts; + + while ((p = strsep(&opts, ",")) != NULL) { + int token; + if (!*p) + continue; + + token = match_token(p, tokens, args); + switch (token) { + case Opt_version: + if (match_int(&args[0], &option)) { + ret = -EINVAL; + goto out; + } + if (option < 1 || option > 2) { + pr_err("proc: version value must be 1 or 2.\n"); + ret = -EINVAL; + goto out; + } + fs_options->version = option; + break; + case Opt_gid: + if (match_int(&args[0], &option)) { + ret = -EINVAL; + goto out; + } + fs_options->pid_gid = make_kgid(current_user_ns(), option); + break; + case Opt_hidepid: + if (match_int(&args[0], &option)) { + ret = -EINVAL; + goto out; + } + if (option < 0 || option > 2) { + pr_err("proc: hidepid value must be between 0 and 2.\n"); + ret = -EINVAL; + goto out; + } + fs_options->hide_pid = option; + break; + case Opt_err: + /* + * pr_err("proc: unrecognized mount option \"%s\" ", p); + * ret = -EINVAL; + * goto out; + */ + default: + break; + } + } + +out: + kfree(orig); + return ret; +} + +static int proc_test_super(struct super_block *s, void *data) +{ + int ret = 0; + struct proc_fs_info *p = data; + struct proc_fs_info *fs_info = proc_sb(s); + + if (p->version == 1 && p->pid_ns == fs_info->pid_ns) + ret = 1; + /* + if (p->version == 2 && p->pid_ns == fs_info->pid_ns && + p->hide_pid == fs_info->hide_pid && + gid_eq(p->pid_gid, fs_info->pid_gid)) + ret = 1; + */ + return ret; +} + +static int proc_set_super(struct super_block *sb, void *data) +{ + sb->s_fs_info = data; + return set_anon_super(sb, NULL); +} + int proc_remount(struct super_block *sb, int *flags, char *data) { - struct pid_namespace *pid = sb->s_fs_info; + int error; + struct proc_fs_info *fs_info = proc_sb(sb); + struct pid_namespace *pid = fs_info->pid_ns; + struct proc_options fs_options = { 1, GLOBAL_ROOT_GID, 0 }; sync_filesystem(sb); - return !proc_parse_options(data, pid); + error = proc_parse_early_options(data, sb->s_type, &fs_options); + if (error < 0) + return error; + + if (fs_options.version == 1) { + error = proc_parse_options(data, pid); + if (!error) + return -EINVAL; + } + + fs_info->version = fs_options.version; + fs_info->pid_gid = fs_options.pid_gid; + fs_info->hide_pid = fs_options.hide_pid; + + return 0; } static struct dentry *proc_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { struct pid_namespace *ns; + struct super_block *sb; + struct proc_fs_info *fs_info = NULL; + struct proc_options fs_options = { 1, GLOBAL_ROOT_GID, 0 }; + int error = 0; + + if (!(flags & MS_KERNMOUNT)) { + if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN)) + return ERR_PTR(-EPERM); + + + error = proc_parse_early_options(data, fs_type, &fs_options); + if (error) + return ERR_PTR(error); + } + + fs_info = kzalloc(sizeof(struct proc_fs_info), GFP_NOFS); + if (!fs_info) + return ERR_PTR(-ENOMEM); + + fs_info->version = fs_options.version; + fs_info->pid_gid = fs_options.pid_gid; + fs_info->hide_pid = fs_options.hide_pid; if (flags & MS_KERNMOUNT) { ns = data; data = NULL; + fs_info->version = 1; /* Lets restore this */ } else { ns = task_active_pid_ns(current); } - return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super); + fs_info->pid_ns = ns; + + sb = sget_userns(fs_type, proc_test_super, proc_set_super, flags, + ns->user_ns, fs_info); + if (IS_ERR(sb)) { + error = PTR_ERR(sb); + goto error_fs_info; + } + + if (sb->s_root) { + kfree(fs_info); + } else { + error = proc_fill_super(sb, data, flags & MS_SILENT ? 1 : 0); + if (error) { + deactivate_locked_super(sb); + goto error_fs_info; + } + + sb->s_flags |= MS_ACTIVE; + } + + return dget(sb->s_root); + +error_fs_info: + kfree(fs_info); + return ERR_PTR(error); } static void proc_kill_sb(struct super_block *sb) { - struct pid_namespace *ns; + struct proc_fs_info *fs_info = proc_sb(sb); + struct pid_namespace *ns = (struct pid_namespace *)fs_info->pid_ns; - ns = (struct pid_namespace *)sb->s_fs_info; if (ns->proc_self) dput(ns->proc_self); if (ns->proc_thread_self) dput(ns->proc_thread_self); kill_anon_super(sb); put_pid_ns(ns); + kfree(fs_info); } static struct file_system_type proc_fs_type = { diff --git a/fs/proc/self.c b/fs/proc/self.c index 39857f6..9f95174 100644 --- a/fs/proc/self.c +++ b/fs/proc/self.c @@ -10,7 +10,8 @@ static const char *proc_self_get_link(struct dentry *dentry, struct inode *inode, struct delayed_call *done) { - struct pid_namespace *ns = inode->i_sb->s_fs_info; + struct proc_fs_info *fs_info = proc_sb(inode->i_sb); + struct pid_namespace *ns = fs_info->pid_ns; pid_t tgid = task_tgid_nr_ns(current, ns); char *name; @@ -34,9 +35,10 @@ static unsigned self_inum; int proc_setup_self(struct super_block *s) { struct inode *root_inode = d_inode(s->s_root); - struct pid_namespace *ns = s->s_fs_info; + struct proc_fs_info *fs_info = proc_sb(s); + struct pid_namespace *ns = fs_info->pid_ns; struct dentry *self; - + inode_lock(root_inode); self = d_alloc_name(s->s_root, "self"); if (self) { diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c index 20614b6..13d9aef 100644 --- a/fs/proc/thread_self.c +++ b/fs/proc/thread_self.c @@ -10,7 +10,8 @@ static const char *proc_thread_self_get_link(struct dentry *dentry, struct inode *inode, struct delayed_call *done) { - struct pid_namespace *ns = inode->i_sb->s_fs_info; + struct proc_fs_info *fs_info = proc_sb(inode->i_sb); + struct pid_namespace *ns = fs_info->pid_ns; pid_t tgid = task_tgid_nr_ns(current, ns); pid_t pid = task_pid_nr_ns(current, ns); char *name; @@ -34,8 +35,9 @@ static unsigned thread_self_inum; int proc_setup_thread_self(struct super_block *s) { + struct proc_fs_info *fs_info = proc_sb(s); + struct pid_namespace *ns = fs_info->pid_ns; struct inode *root_inode = d_inode(s->s_root); - struct pid_namespace *ns = s->s_fs_info; struct dentry *thread_self; inode_lock(root_inode); diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c index b5713fe..d0ae937 100644 --- a/fs/proc_namespace.c +++ b/fs/proc_namespace.c @@ -36,23 +36,23 @@ static unsigned mounts_poll(struct file *file, poll_table *wait) return res; } -struct proc_fs_info { +struct proc_fs_opts { int flag; const char *str; }; static int show_sb_opts(struct seq_file *m, struct super_block *sb) { - static const struct proc_fs_info fs_info[] = { + static const struct proc_fs_opts fs_opts[] = { { MS_SYNCHRONOUS, ",sync" }, { MS_DIRSYNC, ",dirsync" }, { MS_MANDLOCK, ",mand" }, { MS_LAZYTIME, ",lazytime" }, { 0, NULL } }; - const struct proc_fs_info *fs_infop; + const struct proc_fs_opts *fs_infop; - for (fs_infop = fs_info; fs_infop->flag; fs_infop++) { + for (fs_infop = fs_opts; fs_infop->flag; fs_infop++) { if (sb->s_flags & fs_infop->flag) seq_puts(m, fs_infop->str); } @@ -62,7 +62,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb) static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt) { - static const struct proc_fs_info mnt_info[] = { + static const struct proc_fs_opts mnt_opts[] = { { MNT_NOSUID, ",nosuid" }, { MNT_NODEV, ",nodev" }, { MNT_NOEXEC, ",noexec" }, @@ -71,9 +71,9 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt) { MNT_RELATIME, ",relatime" }, { 0, NULL } }; - const struct proc_fs_info *fs_infop; + const struct proc_fs_opts *fs_infop; - for (fs_infop = mnt_info; fs_infop->flag; fs_infop++) { + for (fs_infop = mnt_opts; fs_infop->flag; fs_infop++) { if (mnt->mnt_flags & fs_infop->flag) seq_puts(m, fs_infop->str); } diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index 2d2bf59..27a1e85 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -7,10 +7,18 @@ #include <linux/types.h> #include <linux/fs.h> +struct proc_fs_info { + struct pid_namespace *pid_ns; + int version; + kgid_t pid_gid; + int hide_pid; +}; + struct proc_dir_entry; #ifdef CONFIG_PROC_FS +extern struct proc_fs_info *proc_sb(struct super_block *sb); extern void proc_root_init(void); extern void proc_flush_task(struct task_struct *); @@ -53,6 +61,7 @@ static inline void proc_flush_task(struct task_struct *task) { } +extern inline struct proc_fs_info *proc_sb(struct super_block *sb) { return NULL;} static inline struct proc_dir_entry *proc_symlink(const char *name, struct proc_dir_entry *parent,const char *dest) { return NULL;} static inline struct proc_dir_entry *proc_mkdir(const char *name, -- 2.10.2 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [RFC] Add option to mount only a pids subset 2017-03-12 2:13 ` Al Viro [not found] ` <20170312021257.GP29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> @ 2017-03-20 12:58 ` Alexey Gladkov 2017-03-23 16:05 ` Oleg Nesterov 2017-03-23 16:06 ` Djalal Harouni 1 sibling, 2 replies; 21+ messages in thread From: Alexey Gladkov @ 2017-03-20 12:58 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley, Dmitry V. Levin Al Viro, this patch looks better ? == Overview == Some of the container virtualization systems are mounted /proc inside the container. This is done in most cases to operate with information about the processes. Knowing that /proc filesystem is not fully virtualized they are mounted on top of dangerous places empty files or directories (for exmaple /proc/sys, /proc/kcore, /sys/firmware, etc.). The structure of this filesystem is dynamic and any module can create a new object which will not necessarily be virtualized. There are proprietary modules that aren't in the mainline whose work we can not verify. This opens up a potential threat to the system. The developers of the virtualization system can't predict all dangerous places in /proc by definition. A more effective solution would be to mount into the container only what is necessary and ignore the rest. Right now there is the opportunity to pass in the container any port of the /proc filesystem using mount --bind expect the pids. This patch allows to mount only the part of /proc related to pids without rest objects. Since this is an option for /proc, flags applied to /proc have an effect on this subset of filesystem. Originally the idea was that the container will be mounted only pid sunset and additional required files will be mounted on top using the overlayfs. But I found out that /proc does not support overlayfs and does not allow to mount anything on top or under it. == TODO == There is still work to do: * Add overlayfs support. Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> --- fs/proc/generic.c | 5 ++ fs/proc/inode.c | 2 + fs/proc/internal.h | 7 +++ fs/proc/root.c | 113 ++++++++++++++++++++++++++++++++++++++++-- include/linux/pid_namespace.h | 1 + 5 files changed, 123 insertions(+), 5 deletions(-) diff --git a/fs/proc/generic.c b/fs/proc/generic.c index ee27feb..50bb1e9 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -23,6 +23,7 @@ #include <linux/spinlock.h> #include <linux/completion.h> #include <linux/uaccess.h> +#include <linux/pid_namespace.h> #include "internal.h" @@ -307,6 +308,10 @@ int proc_readdir_de(struct proc_dir_entry *de, struct file *file, int proc_readdir(struct file *file, struct dir_context *ctx) { struct inode *inode = file_inode(file); + struct pid_namespace *ns = inode->i_sb->s_fs_info; + + if (ns->pidfs && inode == d_inode(ns->pidfs)) + return 1; return proc_readdir_de(PDE(inode), file, ctx); } diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 2cc7a80..0c9be65 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -109,6 +109,8 @@ static int proc_show_options(struct seq_file *seq, struct dentry *root) seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, pid->pid_gid)); if (pid->hide_pid != HIDEPID_OFF) seq_printf(seq, ",hidepid=%u", pid->hide_pid); + if (root == pid->pidfs) + seq_printf(seq, ",pidonly"); return 0; } diff --git a/fs/proc/internal.h b/fs/proc/internal.h index c5ae09b..a5a4bf1 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -260,7 +260,14 @@ static inline void proc_tty_init(void) {} /* * root.c */ +struct proc_options { + kgid_t pid_gid; + int hide_pid; + int pid_only; +}; + extern struct proc_dir_entry proc_root; +extern struct proc_dir_entry pidfs_root; extern int proc_parse_options(char *options, struct pid_namespace *pid); extern void proc_self_init(void); diff --git a/fs/proc/root.c b/fs/proc/root.c index deecb39..c2443d5 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -26,16 +26,17 @@ #include "internal.h" enum { - Opt_gid, Opt_hidepid, Opt_err, + Opt_gid, Opt_hidepid, Opt_pidonly, Opt_err, }; static const match_table_t tokens = { {Opt_hidepid, "hidepid=%u"}, {Opt_gid, "gid=%u"}, + {Opt_pidonly, "pidonly"}, {Opt_err, NULL}, }; -int proc_parse_options(char *options, struct pid_namespace *pid) +static int proc_fill_options(char *options, struct proc_options *fs_opts) { char *p; substring_t args[MAX_OPT_ARGS]; @@ -55,7 +56,7 @@ int proc_parse_options(char *options, struct pid_namespace *pid) case Opt_gid: if (match_int(&args[0], &option)) return 0; - pid->pid_gid = make_kgid(current_user_ns(), option); + fs_opts->pid_gid = make_kgid(current_user_ns(), option); break; case Opt_hidepid: if (match_int(&args[0], &option)) @@ -65,7 +66,10 @@ int proc_parse_options(char *options, struct pid_namespace *pid) pr_err("proc: hidepid value must be between 0 and 2.\n"); return 0; } - pid->hide_pid = option; + fs_opts->hide_pid = option; + break; + case Opt_pidonly: + fs_opts->pid_only = 1; break; default: pr_err("proc: unrecognized mount option \"%s\" " @@ -77,6 +81,72 @@ int proc_parse_options(char *options, struct pid_namespace *pid) return 1; } +int proc_parse_options(char *options, struct pid_namespace *pid) +{ + struct proc_options opts = { 0 }; + + if (!proc_fill_options(options, &opts)) + return 0; + + pid->pid_gid = opts.pid_gid; + pid->hide_pid = opts.hide_pid; + + return 1; +} + +static int pidfs_register_dir(struct dentry *root, char *name, struct inode *inode) +{ + struct inode *root_inode = d_inode(root); + struct dentry *child; + + inode_lock(root_inode); + child = d_alloc_name(root, name); + if (child) { + d_add(child, inode); + } else { + child = ERR_PTR(-ENOMEM); + } + inode_unlock(root_inode); + if (IS_ERR(child)) { + pr_err("pidfs_register_dir: can't allocate /pidfs/%s\n", name); + return PTR_ERR(child); + } + return 0; +} + +static int fill_pidfs_root(struct super_block *s) +{ + struct pid_namespace *ns = get_pid_ns(s->s_fs_info); + struct inode *root_inode; + struct dentry *root; + int ret; + + pde_get(&pidfs_root); + root_inode = proc_get_inode(s, &pidfs_root); + if (!root_inode) { + pr_err("pidfs_fill_root: get root inode failed\n"); + return -ENOMEM; + } + + root = d_make_root(root_inode); + if (!root) { + pr_err("pidfs_fill_root: allocate dentry failed\n"); + return -ENOMEM; + } + + ret = pidfs_register_dir(root, "self", d_inode(ns->proc_self)); + if (ret) + return ret; + + ret = pidfs_register_dir(root, "thread-self", d_inode(ns->proc_thread_self)); + if (ret) + return ret; + + ns->pidfs = root; + + return 0; +} + int proc_remount(struct super_block *sb, int *flags, char *data) { struct pid_namespace *pid = sb->s_fs_info; @@ -89,6 +159,8 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { struct pid_namespace *ns; + static struct dentry *root; + struct proc_options opts = { 0 }; if (flags & MS_KERNMOUNT) { ns = data; @@ -97,7 +169,23 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, ns = task_active_pid_ns(current); } - return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super); + root = mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super); + + if (!IS_ERR(root)) { + if (!proc_fill_options(data, &opts)) + return ERR_PTR(-EINVAL); + + if (opts.pid_only) { + int ret; + + if (!ns->pidfs && (ret = fill_pidfs_root(root->d_sb))) + return ERR_PTR(ret); + + root = ns->pidfs; + } + } + + return root; } static void proc_kill_sb(struct super_block *sb) @@ -109,6 +197,8 @@ static void proc_kill_sb(struct super_block *sb) dput(ns->proc_self); if (ns->proc_thread_self) dput(ns->proc_thread_self); + if (ns->pidfs) + dput(ns->pidfs); kill_anon_super(sb); put_pid_ns(ns); } @@ -214,6 +304,19 @@ struct proc_dir_entry proc_root = { .name = "/proc", }; +struct proc_dir_entry pidfs_root = { + .low_ino = PROC_ROOT_INO, + .namelen = 5, + .mode = S_IFDIR | S_IRUGO | S_IXUGO, + .nlink = 2, + .count = ATOMIC_INIT(1), + .proc_iops = &proc_root_inode_operations, + .proc_fops = &proc_root_operations, + .parent = &pidfs_root, + .subdir = RB_ROOT, + .name = "/pidfs", +}; + int pid_ns_prepare_proc(struct pid_namespace *ns) { struct vfsmount *mnt; diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index c2a989d..e7b4d64 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -41,6 +41,7 @@ struct pid_namespace { struct vfsmount *proc_mnt; struct dentry *proc_self; struct dentry *proc_thread_self; + struct dentry *pidfs; #endif #ifdef CONFIG_BSD_PROCESS_ACCT struct fs_pin *bacct; -- 2.10.2 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [RFC] Add option to mount only a pids subset 2017-03-20 12:58 ` [RFC] Add option to mount only a pids subset Alexey Gladkov @ 2017-03-23 16:05 ` Oleg Nesterov [not found] ` <20170323160507.GA23135-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2017-03-23 16:06 ` Djalal Harouni 1 sibling, 1 reply; 21+ messages in thread From: Oleg Nesterov @ 2017-03-23 16:05 UTC (permalink / raw) To: Alexey Gladkov Cc: Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Pavel Emelyanov, James Bottomley, Dmitry V. Levin Again, I can't really review this, I know nothing about vfs, but since nobody else replied... On 03/20, Alexey Gladkov wrote: > > @@ -97,7 +169,23 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, > ns = task_active_pid_ns(current); > } > > - return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super); > + root = mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super); > + > + if (!IS_ERR(root)) { > + if (!proc_fill_options(data, &opts)) > + return ERR_PTR(-EINVAL); So we have to call proc_fill_options() twice, not good... Yes, I understand why, but perhaps we factor it out somehow, we can pack options + pid_ns into sb->s_fs_info. Nevermind, this is minor. > + if (opts.pid_only) { > + int ret; > + > + if (!ns->pidfs && (ret = fill_pidfs_root(root->d_sb))) > + return ERR_PTR(ret); > + > + root = ns->pidfs; Afaics this lacks dget(ns->pidfs) which should pair with dput(mnt.mnt_root) in cleanup_mnt(). IIUC otherwise ns->pidfs can go away after umount, OTOH, if we return ns->pidfs then dget(sb->s_root) in mount_ns() is not balanced. But this all is fixeable. So with this change "mount -opidonly" creates another IS_ROOT() dentry which is not equal to sb->s_root. I simply do not know if this is technically correct or not... but, say, the "Only bind mounts can have disconnected paths" comment in path_connected() makes me worry ;) And this obviously means that /path-to-pidonly-mnt/ won't share dentries with the normal /proc mount. Not really good imo even if not really wrong... Lets look at proc_flush_task(). The exiting task will flush its $pid dentries in /proc/ but not in /path-to-pidonly-mnt/ iiuc. Again, not really a bug, but still... Oleg. ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <20170323160507.GA23135-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC] Add option to mount only a pids subset [not found] ` <20170323160507.GA23135-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2017-03-23 22:57 ` Alexey Gladkov 0 siblings, 0 replies; 21+ messages in thread From: Alexey Gladkov @ 2017-03-23 22:57 UTC (permalink / raw) To: Oleg Nesterov Cc: Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Pavel Emelyanov, James Bottomley, Dmitry V. Levin On Thu, Mar 23, 2017 at 05:05:07PM +0100, Oleg Nesterov wrote: > Again, I can't really review this, I know nothing about vfs, but since > nobody else replied... Thanks anyway :) > On 03/20, Alexey Gladkov wrote: > > > > @@ -97,7 +169,23 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, > > ns = task_active_pid_ns(current); > > } > > > > - return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super); > > + root = mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super); > > + > > + if (!IS_ERR(root)) { > > + if (!proc_fill_options(data, &opts)) > > + return ERR_PTR(-EINVAL); > > So we have to call proc_fill_options() twice, not good... Yes, I understand > why, but perhaps we factor it out somehow, we can pack options + pid_ns into > sb->s_fs_info. Nevermind, this is minor. It happens only once, when we don't have the s_root yet. > > + if (opts.pid_only) { > > + int ret; > > + > > + if (!ns->pidfs && (ret = fill_pidfs_root(root->d_sb))) > > + return ERR_PTR(ret); > > + > > + root = ns->pidfs; > > Afaics this lacks dget(ns->pidfs) which should pair with dput(mnt.mnt_root) > in cleanup_mnt(). IIUC otherwise ns->pidfs can go away after umount, OTOH, > if we return ns->pidfs then dget(sb->s_root) in mount_ns() is not balanced. > But this all is fixeable. > > So with this change "mount -opidonly" creates another IS_ROOT() dentry which > is not equal to sb->s_root. I simply do not know if this is technically > correct or not... but, say, the "Only bind mounts can have disconnected paths" > comment in path_connected() makes me worry ;) > > And this obviously means that /path-to-pidonly-mnt/ won't share dentries with > the normal /proc mount. Not really good imo even if not really wrong... Lets > look at proc_flush_task(). The exiting task will flush its $pid dentries in > /proc/ but not in /path-to-pidonly-mnt/ iiuc. Again, not really a bug, but > still... I know that I'm cheater, but I did not start first :) -- Rgrds, legion ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] Add option to mount only a pids subset 2017-03-20 12:58 ` [RFC] Add option to mount only a pids subset Alexey Gladkov 2017-03-23 16:05 ` Oleg Nesterov @ 2017-03-23 16:06 ` Djalal Harouni 2017-03-23 22:07 ` Alexey Gladkov 1 sibling, 1 reply; 21+ messages in thread From: Djalal Harouni @ 2017-03-23 16:06 UTC (permalink / raw) To: Alexey Gladkov Cc: Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley, Dmitry V. Levin Hi Alexey, On Mon, Mar 20, 2017 at 1:58 PM, Alexey Gladkov <gladkov.alexey-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > > Al Viro, this patch looks better ? > > == Overview == > > Some of the container virtualization systems are mounted /proc inside > the container. This is done in most cases to operate with information > about the processes. Knowing that /proc filesystem is not fully > virtualized they are mounted on top of dangerous places empty files or > directories (for exmaple /proc/sys, /proc/kcore, /sys/firmware, etc.). > > The structure of this filesystem is dynamic and any module can create a > new object which will not necessarily be virtualized. There are > proprietary modules that aren't in the mainline whose work we can not > verify. > > This opens up a potential threat to the system. The developers of the > virtualization system can't predict all dangerous places in /proc by > definition. > > A more effective solution would be to mount into the container only what > is necessary and ignore the rest. > > Right now there is the opportunity to pass in the container any port of > the /proc filesystem using mount --bind expect the pids. > > This patch allows to mount only the part of /proc related to pids without > rest objects. Since this is an option for /proc, flags applied to /proc > have an effect on this subset of filesystem. I just sent a patch that also has to deal with proc hidepid here: https://lkml.org/lkml/2017/3/23/505 I'm not sure if that's the right approach, it is still buggy, however seems that your patch also stores the mount option inside the pid_namespace which may get propagated to all mounts inside same pidns ? I didn't have enough time but maybe if they are related we can work it out together ? Thank you! -- tixxdz ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] Add option to mount only a pids subset 2017-03-23 16:06 ` Djalal Harouni @ 2017-03-23 22:07 ` Alexey Gladkov 2017-03-26 7:03 ` Djalal Harouni 0 siblings, 1 reply; 21+ messages in thread From: Alexey Gladkov @ 2017-03-23 22:07 UTC (permalink / raw) To: Djalal Harouni Cc: Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley, Dmitry V. Levin On Thu, Mar 23, 2017 at 05:06:28PM +0100, Djalal Harouni wrote: > Hi Alexey, > > On Mon, Mar 20, 2017 at 1:58 PM, Alexey Gladkov > <gladkov.alexey@gmail.com> wrote: > > > > > > Al Viro, this patch looks better ? > > > > == Overview == > > > > Some of the container virtualization systems are mounted /proc inside > > the container. This is done in most cases to operate with information > > about the processes. Knowing that /proc filesystem is not fully > > virtualized they are mounted on top of dangerous places empty files or > > directories (for exmaple /proc/sys, /proc/kcore, /sys/firmware, etc.). > > > > The structure of this filesystem is dynamic and any module can create a > > new object which will not necessarily be virtualized. There are > > proprietary modules that aren't in the mainline whose work we can not > > verify. > > > > This opens up a potential threat to the system. The developers of the > > virtualization system can't predict all dangerous places in /proc by > > definition. > > > > A more effective solution would be to mount into the container only what > > is necessary and ignore the rest. > > > > Right now there is the opportunity to pass in the container any port of > > the /proc filesystem using mount --bind expect the pids. > > > > This patch allows to mount only the part of /proc related to pids without > > rest objects. Since this is an option for /proc, flags applied to /proc > > have an effect on this subset of filesystem. > > I just sent a patch that also has to deal with proc hidepid here: > https://lkml.org/lkml/2017/3/23/505 I completely agree with you that it looks wrong when options of one mountpoint affect the others mountpoints. > I'm not sure if that's the right approach, it is still buggy, however > seems that your patch also stores the mount option inside the > pid_namespace which may get propagated to all mounts inside same pidns > ? I don't store 'pidonly' option in my current patch. I mean in: https://lkml.org/lkml/2017/3/20/363 I parse options twice at first mount of procfs. It happens before the mounting /proc in userspace. I know it's bad, but I couldn't find place to store this option. > I didn't have enough time but maybe if they are related we can work it > out together ? I don't have enough experience in kenrel hacking, but I would happily do my best :) I also tring to mention it in every patch, as my changes almost completely useless without the ability to use the overlayfs. Now if you remove the restriction: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/proc/inode.c#n497 and mount procfs as lowerdir in overlayfs then you get NULL pointer dereference at: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/namei.c#n891 I got it when I tried to do `ls -la /overlay/proc/self/`. -- Rgrds, legion ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] Add option to mount only a pids subset 2017-03-23 22:07 ` Alexey Gladkov @ 2017-03-26 7:03 ` Djalal Harouni [not found] ` <CAEiveUe15YvZ4hMYSPgm586MkJ20PO515r9krXdjPCUmrG1wSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 21+ messages in thread From: Djalal Harouni @ 2017-03-26 7:03 UTC (permalink / raw) To: Alexey Gladkov Cc: Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley, Dmitry V. Levin, Jann Horn, Kees Cook, Andy Lutomirski, Miklos Szeredi (Cc'ed Kees and Jann for the procfs stacking issue) On Thu, Mar 23, 2017 at 11:07 PM, Alexey Gladkov <gladkov.alexey-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > On Thu, Mar 23, 2017 at 05:06:28PM +0100, Djalal Harouni wrote: >> Hi Alexey, >> >> On Mon, Mar 20, 2017 at 1:58 PM, Alexey Gladkov >> <gladkov.alexey-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: >> > >> > >> > Al Viro, this patch looks better ? >> > >> > == Overview == >> > >> > Some of the container virtualization systems are mounted /proc inside >> > the container. This is done in most cases to operate with information >> > about the processes. Knowing that /proc filesystem is not fully >> > virtualized they are mounted on top of dangerous places empty files or >> > directories (for exmaple /proc/sys, /proc/kcore, /sys/firmware, etc.). >> > >> > The structure of this filesystem is dynamic and any module can create a >> > new object which will not necessarily be virtualized. There are >> > proprietary modules that aren't in the mainline whose work we can not >> > verify. >> > >> > This opens up a potential threat to the system. The developers of the >> > virtualization system can't predict all dangerous places in /proc by >> > definition. >> > >> > A more effective solution would be to mount into the container only what >> > is necessary and ignore the rest. >> > >> > Right now there is the opportunity to pass in the container any port of >> > the /proc filesystem using mount --bind expect the pids. >> > >> > This patch allows to mount only the part of /proc related to pids without >> > rest objects. Since this is an option for /proc, flags applied to /proc >> > have an effect on this subset of filesystem. >> >> I just sent a patch that also has to deal with proc hidepid here: >> https://lkml.org/lkml/2017/3/23/505 > > I completely agree with you that it looks wrong when options of one > mountpoint affect the others mountpoints. > >> I'm not sure if that's the right approach, it is still buggy, however >> seems that your patch also stores the mount option inside the >> pid_namespace which may get propagated to all mounts inside same pidns >> ? > > I don't store 'pidonly' option in my current patch. I mean in: > https://lkml.org/lkml/2017/3/20/363 > > I parse options twice at first mount of procfs. It happens before > the mounting /proc in userspace. > > I know it's bad, but I couldn't find place to store this option. Ok, then maybe that approach of having a procfs struct holder instead of using pid namespace may help! >> I didn't have enough time but maybe if they are related we can work it >> out together ? > > I don't have enough experience in kenrel hacking, but I would happily do > my best :) OK, let's sync on this then. > I also tring to mention it in every patch, as my changes almost completely > useless without the ability to use the overlayfs. > > Now if you remove the restriction: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/proc/inode.c#n497 This protection was introduced in e54ad7f1ee263 That's a security fix, ecryptfs seems to schedule some code through kthreads which always run in the initial namespaces with full capabilities, bypassing all security checks. I don't know if overlayfs has something similar, if not then maybe if it's possible in ecryptfs to check the lower mount and the fs type and move this procfs there... Cc'ed Jann in case he may comment. > and mount procfs as lowerdir in overlayfs then you get NULL pointer > dereference at: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/namei.c#n891 > > I got it when I tried to do `ls -la /overlay/proc/self/`. > > -- > Rgrds, legion ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <CAEiveUe15YvZ4hMYSPgm586MkJ20PO515r9krXdjPCUmrG1wSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [RFC] Add option to mount only a pids subset [not found] ` <CAEiveUe15YvZ4hMYSPgm586MkJ20PO515r9krXdjPCUmrG1wSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-03-30 21:45 ` Alexey Gladkov 0 siblings, 0 replies; 21+ messages in thread From: Alexey Gladkov @ 2017-03-30 21:45 UTC (permalink / raw) To: Djalal Harouni Cc: Linux Kernel Mailing List, Linux API, Kirill A. Shutemov, Vasiliy Kulikov, Al Viro, Eric W. Biederman, Oleg Nesterov, Pavel Emelyanov, James Bottomley, Dmitry V. Levin, Jann Horn, Kees Cook, Andy Lutomirski, Miklos Szeredi On Sun, Mar 26, 2017 at 09:03:33AM +0200, Djalal Harouni wrote: > (Cc'ed Kees and Jann for the procfs stacking issue) > > > I completely agree with you that it looks wrong when options of one > > mountpoint affect the others mountpoints. > > > >> I'm not sure if that's the right approach, it is still buggy, however > >> seems that your patch also stores the mount option inside the > >> pid_namespace which may get propagated to all mounts inside same pidns > >> ? > > > > I don't store 'pidonly' option in my current patch. I mean in: > > https://lkml.org/lkml/2017/3/20/363 > > > > I parse options twice at first mount of procfs. It happens before > > the mounting /proc in userspace. > > > > I know it's bad, but I couldn't find place to store this option. > > Ok, then maybe that approach of having a procfs struct holder instead > of using pid namespace may help! I deside to stop doing my patch. I talked with a few people and found out that the overlayfs doesn't feel very well if on the lower level filesystem appear/disappear files. In addition with the pidfs isn't so simple. Separate the root will lead to a doubling of memory consumption and restrictions on the filesystem operations level can easily be skipped. It means that even I can do this pidfs (or pid subset in /proc), it will be pointless. -- Rgrds, legion ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2017-03-30 21:45 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20170221145746.GA31914@redhat.com> [not found] ` <20170221145746.GA31914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2017-03-06 23:05 ` [RFC] Add option to mount only a pids subset Alexey Gladkov 2017-03-07 16:24 ` Andy Lutomirski 2017-03-09 11:26 ` Djalal Harouni [not found] ` <CAEiveUczqzHZG7jcM72oWXAKYZSPJ0ywYEXGDV1sn_FAhr28pA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2017-03-09 20:52 ` Eric W. Biederman 2017-03-11 21:51 ` Alexey Gladkov [not found] ` <CALCETrVFnSoBP+LvyjN+1qUqrwgZan1nvscq5hV0Ujt_FF2e3g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2017-03-11 0:05 ` Alexey Gladkov 2017-03-07 17:49 ` Oleg Nesterov [not found] ` <20170307174909.GA24112-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2017-03-10 23:46 ` Alexey Gladkov 2017-03-12 1:54 ` Al Viro [not found] ` <20170312015430.GO29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 2017-03-12 2:13 ` Al Viro [not found] ` <20170312021257.GP29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 2017-03-13 3:19 ` Andy Lutomirski [not found] ` <CALCETrVT5sfGhNomLKAephrSGj8fc81ZjGTN-Y6UwgAHngVRCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2017-03-13 13:27 ` Al Viro [not found] ` <20170313132732.GR29622-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 2017-03-13 15:24 ` Andy Lutomirski [not found] ` <CALCETrXqv8VUeO6MpKWDR6DFYBgmmT0nZVezBJsimtmmQgDksw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2017-03-23 15:59 ` [PATCH] proc: allow to change proc mount options per mount Djalal Harouni 2017-03-20 12:58 ` [RFC] Add option to mount only a pids subset Alexey Gladkov 2017-03-23 16:05 ` Oleg Nesterov [not found] ` <20170323160507.GA23135-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2017-03-23 22:57 ` Alexey Gladkov 2017-03-23 16:06 ` Djalal Harouni 2017-03-23 22:07 ` Alexey Gladkov 2017-03-26 7:03 ` Djalal Harouni [not found] ` <CAEiveUe15YvZ4hMYSPgm586MkJ20PO515r9krXdjPCUmrG1wSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2017-03-30 21:45 ` Alexey Gladkov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).