From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: viro@zeniv.linux.org.uk, adobriyan@gmail.com,
davem@davemloft.net, akpm@linux-foundation.org,
christian.brauner@ubuntu.com, areber@redhat.com,
serge@hallyn.com, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org,
Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Subject: Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces linearly
Date: Thu, 30 Jul 2020 18:01:20 +0300
Message-ID: <56928404-f194-4194-5f2a-59acb15b1a04@virtuozzo.com>
In-Reply-To: <87k0yl5axy.fsf@x220.int.ebiederm.org>
On 30.07.2020 17:34, Eric W. Biederman wrote:
> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>
>> Currently, there is no way to list or iterate over all (or a subset) of the
>> namespaces in the system. Some namespaces are exposed in /proc/[pid]/ns/
>> directories, but others may exist only as open files that are not attached
>> to any process. When an open namespace fd is sent over a unix socket and
>> then closed, it is impossible to know whether the namespace exists or not.
>>
>> Also, even if a namespace is exposed as attached to a process or as an open
>> file, iterating over /proc/*/ns/* or /proc/*/fd/* is slow, because the cost
>> grows with the number of tasks and fds.
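To make the cost concrete, here is a minimal C sketch of that existing approach (collect_ns(), remember() and the fixed-size table are hypothetical names, for illustration only): it opens every /proc/[pid]/ns/ directory, readlink()s each entry and deduplicates the "type:[inum]" targets. The number of readlink() calls grows with the number of tasks, and a namespace pinned only by an fd, or by an fd in flight in a unix socket, is never found this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_NS 4096

/* Hypothetical fixed-size table of unique "type:[inum]" link targets. */
static char seen[MAX_NS][64];
static int nseen;

static void remember(const char *name)
{
	for (int i = 0; i < nseen; i++)
		if (strcmp(seen[i], name) == 0)
			return;			/* already recorded */
	if (nseen < MAX_NS) {
		snprintf(seen[nseen], sizeof(seen[nseen]), "%s", name);
		nseen++;
	}
}

/* Walk /proc/[pid]/ns/ for every task: O(tasks * ns types) readlink() calls. */
int collect_ns(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *pde;

	if (!proc)
		return -1;
	while ((pde = readdir(proc)) != NULL) {
		char path[300];
		DIR *nsdir;
		struct dirent *nde;

		if (!isdigit((unsigned char)pde->d_name[0]))
			continue;		/* not a pid directory */
		snprintf(path, sizeof(path), "/proc/%s/ns", pde->d_name);
		nsdir = opendir(path);
		if (!nsdir)
			continue;		/* task exited or no permission */
		while ((nde = readdir(nsdir)) != NULL) {
			char link[600], target[64];
			ssize_t len;

			if (nde->d_name[0] == '.')
				continue;	/* skip "." and ".." */
			snprintf(link, sizeof(link), "%s/%s", path, nde->d_name);
			len = readlink(link, target, sizeof(target) - 1);
			if (len < 0)
				continue;
			target[len] = '\0';	/* readlink() does not NUL-terminate */
			remember(target);
		}
		closedir(nsdir);
	}
	closedir(proc);
	return nseen;
}
```

Calling collect_ns() from a main() and printing seen[0..nseen-1] yields a deduplicated list in the same format as the listing in the cover letter, limited to the tasks the caller is allowed to inspect.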
>
> I am very dubious about this.
>
> I have been avoiding exactly this kind of interface because it can
> create rather fundamental problems with checkpoint restart.
restart/restore :)
> You do have some filtering, and the filtering is not based on current,
> which is good.
>
> A view that is relative to a user namespace might be ok. It almost
> certainly does better as its own little filesystem than as an extension
> to proc, though.
>
> The big thing we want to ensure is that if you migrate you can restore
> everything. I don't see how you will be able to restore these files
> after migration. Anything like this without having a complete
> checkpoint/restore story is a non-starter.
There is no difference between files in the /proc/namespaces/ directory and those in /proc/[pid]/ns/.
CRIU can restore open files from /proc/[pid]/ns, and the same will work for /proc/namespaces/ files.
As someone who worked extensively on pid_ns and user_ns support in CRIU, I don't see any
problem here.
If you have specific worries, let's discuss them.
CC: Pavel Tikhomirov, CRIU maintainer, who knows everything about namespace C/R.
> Further, by not going through the processes, it looks like you are
> bypassing the existing permission checks, which has the potential
> to allow someone to use a namespace they would not otherwise be able to use.
I agree, and I wrote to Christian that the permissions should be stricter.
This just needs to be formalized. Let's discuss it.
> So I think this goes one step too far but I am willing to be persuaded
> otherwise.
>
> Eric
>
>
>
>
>> This patchset introduces a new /proc/namespaces/ directory, which exposes
>> subset of permitted namespaces in linear view:
>>
>> # ls /proc/namespaces/ -l
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'cgroup:[4026531835]' -> 'cgroup:[4026531835]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'ipc:[4026531839]' -> 'ipc:[4026531839]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026531840]' -> 'mnt:[4026531840]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026531861]' -> 'mnt:[4026531861]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532133]' -> 'mnt:[4026532133]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532134]' -> 'mnt:[4026532134]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532135]' -> 'mnt:[4026532135]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532136]' -> 'mnt:[4026532136]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'net:[4026531993]' -> 'net:[4026531993]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'pid:[4026531836]' -> 'pid:[4026531836]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'time:[4026531834]' -> 'time:[4026531834]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'user:[4026531837]' -> 'user:[4026531837]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'uts:[4026531838]' -> 'uts:[4026531838]'
>>
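Each entry name (and its link target) in the listing above has the form type:[inum], the same format readlink() returns for /proc/[pid]/ns/* today. A tool consuming such a listing could split it as in this minimal sketch; parse_ns_name() is a hypothetical helper, not part of the patchset.

```c
#include <stdio.h>

/*
 * Split a namespace entry name such as "mnt:[4026531840]" into its
 * type ("mnt") and nsfs inode number (4026531840).
 * Returns 0 on success, -1 if the string is not in type:[inum] form.
 */
int parse_ns_name(const char *s, char type[16], unsigned long *inum)
{
	int end = -1;

	/* %n is only stored if the closing ']' matched. */
	if (sscanf(s, "%15[^:]:[%lu]%n", type, inum, &end) != 2 || end < 0)
		return -1;
	return s[end] == '\0' ? 0 : -1;	/* reject trailing characters */
}
```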
>> A namespace ns is exposed only if its owning user_ns is permitted from the
>> pid_ns of /proc. I.e., /proc is tied to a pid_ns, so /proc/namespaces shows
>> only the namespaces ns for which
>>
>> in_userns(pid_ns->user_ns, ns->user_ns).
>>
>> If ns is itself a user_ns:
>>
>> in_userns(pid_ns->user_ns, ns).
>>
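in_userns() is an existing kernel helper: it walks the child's parent chain up to the ancestor's level and checks whether it lands on the ancestor. The sketch below is a userspace model of that walk and of the two visibility rules above; struct user_ns, struct other_ns, ns_visible() and userns_visible() are hypothetical stand-ins, not the real kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a node in the user namespace tree. */
struct user_ns {
	const struct user_ns *parent;	/* NULL for the initial user_ns */
	int level;			/* 0 for the initial user_ns */
};

/* Hypothetical model of a non-user namespace: it is owned by a user_ns. */
struct other_ns {
	const struct user_ns *owner;
};

/* True if child is ancestor itself or one of its descendants. */
bool in_userns(const struct user_ns *ancestor, const struct user_ns *child)
{
	const struct user_ns *ns = child;

	/* Climb until we reach the ancestor's depth, then compare. */
	while (ns->level > ancestor->level)
		ns = ns->parent;
	return ns == ancestor;
}

/* Rule for a non-user namespace: in_userns(pid_ns->user_ns, ns->user_ns). */
bool ns_visible(const struct user_ns *pidns_owner, const struct other_ns *ns)
{
	return in_userns(pidns_owner, ns->owner);
}

/* Rule when ns is itself a user_ns: in_userns(pid_ns->user_ns, ns). */
bool userns_visible(const struct user_ns *pidns_owner, const struct user_ns *ns)
{
	return in_userns(pidns_owner, ns);
}
```

With this rule a /proc mounted from a container's pid_ns sees namespaces owned by the container's user_ns and its descendants, but not those of sibling containers or of the host.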
>> The patchset consists of the following steps:
>>
>> 1) A generic counter in ns_common is introduced instead of separate
>> counters for every ns type (net::count, uts_namespace::kref,
>> user_namespace::count, etc.). Patches [1-8];
>> 2) Patch [9] introduces an IDR to link and iterate over alive namespaces;
>> 3) Patch [10] is a refactoring;
>> 4) Patch [11] actually adds the /proc/namespaces directory and fs methods;
>> 5) Patches [12-23] make every namespace type use the added methods
>> and appear in the /proc/namespaces directory.
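A minimal single-threaded sketch of why steps 1) and 2) go together: a walker over a registry of namespaces must pin only entries whose counter is still non-zero (the refcount_inc_not_zero() pattern), and the final put removes the entry. All names here are hypothetical; ns_table is a plain array standing in for the IDR from patch [9], and the RCU-deferred freeing from patches [12] and [18] is not modeled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of ns_common carrying the generic counter. */
struct ns_common {
	atomic_int count;	/* one counter for every ns type */
	unsigned int inum;	/* nsfs inode number */
	const char *type;	/* "net", "uts", ... */
};

#define NS_MAX 64

/* Stand-in for ns_idr; the table itself does not hold a reference. */
static struct ns_common *ns_table[NS_MAX];

int ns_register(struct ns_common *ns)
{
	for (int i = 0; i < NS_MAX; i++) {
		if (!ns_table[i]) {
			ns_table[i] = ns;
			return i;
		}
	}
	return -1;	/* table full */
}

static void ns_unregister(struct ns_common *ns)
{
	for (int i = 0; i < NS_MAX; i++)
		if (ns_table[i] == ns)
			ns_table[i] = NULL;
}

/* Pin ns only if it is still alive, like refcount_inc_not_zero(). */
bool ns_tryget(struct ns_common *ns)
{
	int old = atomic_load(&ns->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ns->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the final put unlinks ns from the table. */
void ns_put(struct ns_common *ns)
{
	if (atomic_fetch_sub(&ns->count, 1) == 1)
		ns_unregister(ns);
}

/* Example callback: print an entry in the type:[inum] form. */
void print_ns(struct ns_common *ns, void *arg)
{
	(void)arg;
	printf("%s:[%u]\n", ns->type, ns->inum);
}

/* Visit every live namespace, holding a reference during the callback. */
int ns_for_each(void (*fn)(struct ns_common *, void *), void *arg)
{
	int visited = 0;

	for (int i = 0; i < NS_MAX; i++) {
		struct ns_common *ns = ns_table[i];

		if (ns && ns_tryget(ns)) {
			fn(ns, arg);
			ns_put(ns);
			visited++;
		}
	}
	return visited;
}
```

With one counter type shared by all namespaces, a single generic walker like ns_for_each() can serve every ns type. In the kernel, the lookup and the final free must additionally be ordered against concurrent puts, which this model does not show.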
>>
>> This may be useful for writing efficient debug utilities (say, for quickly
>> building a network topology) and checkpoint/restore software.
>> ---
>>
>> Kirill Tkhai (23):
>> ns: Add common refcount into ns_common and use it as counter for net_ns
>> uts: Use generic ns_common::count
>> ipc: Use generic ns_common::count
>> pid: Use generic ns_common::count
>> user: Use generic ns_common::count
>> mnt: Use generic ns_common::count
>> cgroup: Use generic ns_common::count
>> time: Use generic ns_common::count
>> ns: Introduce ns_idr to be able to iterate all allocated namespaces in the system
>> fs: Rename fs/proc/namespaces.c into fs/proc/task_namespaces.c
>> fs: Add /proc/namespaces/ directory
>> user: Free user_ns one RCU grace period after final counter put
>> user: Add user namespaces into ns_idr
>> net: Add net namespaces into ns_idr
>> pid: Extract child_reaper check from pidns_for_children_get()
>> proc_ns_operations: Add can_get method
>> pid: Add pid namespaces into ns_idr
>> uts: Free uts namespace one RCU grace period after final counter put
>> uts: Add uts namespaces into ns_idr
>> ipc: Add ipc namespaces into ns_idr
>> mnt: Add mount namespaces into ns_idr
>> cgroup: Add cgroup namespaces into ns_idr
>> time: Add time namespaces into ns_idr
>>
>>
>> fs/mount.h | 4
>> fs/namespace.c | 14 +
>> fs/nsfs.c | 78 ++++++++
>> fs/proc/Makefile | 1
>> fs/proc/internal.h | 18 +-
>> fs/proc/namespaces.c | 382 +++++++++++++++++++++++++++-------------
>> fs/proc/root.c | 17 ++
>> fs/proc/task_namespaces.c | 183 +++++++++++++++++++
>> include/linux/cgroup.h | 6 -
>> include/linux/ipc_namespace.h | 3
>> include/linux/ns_common.h | 11 +
>> include/linux/pid_namespace.h | 4
>> include/linux/proc_fs.h | 1
>> include/linux/proc_ns.h | 12 +
>> include/linux/time_namespace.h | 10 +
>> include/linux/user_namespace.h | 10 +
>> include/linux/utsname.h | 10 +
>> include/net/net_namespace.h | 11 -
>> init/version.c | 2
>> ipc/msgutil.c | 2
>> ipc/namespace.c | 17 +-
>> ipc/shm.c | 1
>> kernel/cgroup/cgroup.c | 2
>> kernel/cgroup/namespace.c | 25 ++-
>> kernel/pid.c | 2
>> kernel/pid_namespace.c | 46 +++--
>> kernel/time/namespace.c | 20 +-
>> kernel/user.c | 2
>> kernel/user_namespace.c | 23 ++
>> kernel/utsname.c | 23 ++
>> net/core/net-sysfs.c | 6 -
>> net/core/net_namespace.c | 18 +-
>> net/ipv4/inet_timewait_sock.c | 4
>> net/ipv4/tcp_metrics.c | 2
>> 34 files changed, 746 insertions(+), 224 deletions(-)
>> create mode 100644 fs/proc/task_namespaces.c
>>
>> --
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>