Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: viro@zeniv.linux.org.uk, adobriyan@gmail.com,
	davem@davemloft.net, akpm@linux-foundation.org,
	christian.brauner@ubuntu.com, areber@redhat.com,
	serge@hallyn.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Subject: Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary
Date: Thu, 30 Jul 2020 17:13:23 -0500
Message-ID: <875za43b3w.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <56928404-f194-4194-5f2a-59acb15b1a04@virtuozzo.com> (Kirill Tkhai's message of "Thu, 30 Jul 2020 18:01:20 +0300")

Kirill Tkhai <ktkhai@virtuozzo.com> writes:

> On 30.07.2020 17:34, Eric W. Biederman wrote:
>> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>> 
>>> Currently, there is no a way to list or iterate all or subset of namespaces
>>> in the system. Some namespaces are exposed in /proc/[pid]/ns/ directories,
>>> but some also may be as open files, which are not attached to a process.
>>> When a namespace open fd is sent over unix socket and then closed, it is
>>> impossible to know whether the namespace exists or not.
>>>
>>> Also, even if namespace is exposed as attached to a process or as open file,
>>> iteration over /proc/*/ns/* or /proc/*/fd/* namespaces is not fast, because
>>> this multiplies at tasks and fds number.
>> 
>> I am very dubious about this.
>> 
>> I have been avoiding exactly this kind of interface because it can
>> create rather fundamental problems with checkpoint restart.
>
> restart/restore :)
>
>> You do have some filtering and the filtering is not based on current.
>> Which is good.
>> 
>> A view that is relative to a user namespace might be ok.    It almost
>> certainly does better as it's own little filesystem than as an extension
>> to proc though.
>> 
>> The big thing we want to ensure is that if you migrate you can restore
>> everything.  I don't see how you will be able to restore these files
>> after migration.  Anything like this without having a complete
>> checkpoint/restore story is a non-starter.
>
> There is no difference between files in /proc/namespaces/ directory and /proc/[pid]/ns/.
>
> CRIU can restore open files in /proc/[pid]/ns, the same will be with /proc/namespaces/ files.
> As a person who worked deeply for pid_ns and user_ns support in CRIU, I don't see any
> problem here.

An obvious diffference is that you are adding the inode to the inode to
the file name.  Which means that now you really do have to preserve the
inode numbers during process migration.

Which means now we have to do all of the work to make inode number
restoration possible.  Which means now we need to have multiple
instances of nsfs so that we can restore inode numbers.

I think this is still possible but we have been delaying figuring out
how to restore inode numbers long enough that may be actual technical
problems making it happen.

Now maybe CRIU can handle the names of the files changing during
migration but you have just increased the level of difficulty for doing
that.

> If you have a specific worries about, let's discuss them.

I was asking and I am asking that it be described in the patch
description how a container using this feature can be migrated
from one machine to another.  This code is so close to being problematic
that we need be very careful we don't fundamentally break CRIU while
trying to make it's job simpler and easier.

> CC: Pavel Tikhomirov CRIU maintainer, who knows everything about namespaces C/R.
>  
>> Further by not going through the processes it looks like you are
>> bypassing the existing permission checks.  Which has the potential
>> to allow someone to use a namespace who would not be able to otherwise.
>
> I agree, and I wrote to Christian, that permissions should be more strict.
> This just should be formalized. Let's discuss this.
>
>> So I think this goes one step too far but I am willing to be persuaded
>> otherwise.
>> 

Eric


  reply index

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-30 11:59 Kirill Tkhai
2020-07-30 11:59 ` [PATCH 01/23] ns: Add common refcount into ns_common add use it as counter for net_ns Kirill Tkhai
2020-07-30 13:35   ` Christian Brauner
2020-07-30 14:07     ` Kirill Tkhai
2020-07-30 15:59       ` Christian Brauner
2020-07-30 14:30   ` Christian Brauner
2020-07-30 14:34     ` Kirill Tkhai
2020-07-30 14:39       ` Christian Brauner
2020-07-30 11:59 ` [PATCH 02/23] uts: Use generic ns_common::count Kirill Tkhai
2020-07-30 14:30   ` Christian Brauner
2020-07-30 11:59 ` [PATCH 03/23] ipc: " Kirill Tkhai
2020-07-30 14:32   ` Christian Brauner
2020-07-30 11:59 ` [PATCH 04/23] pid: " Kirill Tkhai
2020-07-30 14:37   ` Christian Brauner
2020-07-30 11:59 ` [PATCH 05/23] user: " Kirill Tkhai
2020-07-30 14:46   ` Christian Brauner
2020-07-30 11:59 ` [PATCH 06/23] mnt: " Kirill Tkhai
2020-07-30 14:49   ` Christian Brauner
2020-07-30 11:59 ` [PATCH 07/23] cgroup: " Kirill Tkhai
2020-07-30 14:50   ` Christian Brauner
2020-07-30 12:00 ` [PATCH 08/23] time: " Kirill Tkhai
2020-07-30 14:52   ` Christian Brauner
2020-07-30 12:00 ` [PATCH 09/23] ns: Introduce ns_idr to be able to iterate all allocated namespaces in the system Kirill Tkhai
2020-07-30 12:23   ` Matthew Wilcox
2020-07-30 13:32     ` Kirill Tkhai
2020-07-30 13:56       ` Matthew Wilcox
2020-07-30 14:12         ` Kirill Tkhai
2020-07-30 14:15           ` Matthew Wilcox
2020-07-30 14:20             ` Kirill Tkhai
2020-07-30 12:00 ` [PATCH 10/23] fs: Rename fs/proc/namespaces.c into fs/proc/task_namespaces.c Kirill Tkhai
2020-07-30 12:00 ` [PATCH 11/23] fs: Add /proc/namespaces/ directory Kirill Tkhai
2020-07-30 12:18   ` Alexey Dobriyan
2020-07-30 13:22     ` Kirill Tkhai
2020-07-30 13:26   ` Christian Brauner
2020-07-30 14:30     ` Kirill Tkhai
2020-07-30 20:47   ` kernel test robot
2020-07-30 22:20   ` kernel test robot
2020-08-05  8:17   ` kernel test robot
2020-08-05  8:17   ` [RFC PATCH] fs: namespaces_dentry_operations can be static kernel test robot
2020-07-30 12:00 ` [PATCH 12/23] user: Free user_ns one RCU grace period after final counter put Kirill Tkhai
2020-07-30 12:00 ` [PATCH 13/23] user: Add user namespaces into ns_idr Kirill Tkhai
2020-07-30 12:00 ` [PATCH 14/23] net: Add net " Kirill Tkhai
2020-07-30 12:00 ` [PATCH 15/23] pid: Eextract child_reaper check from pidns_for_children_get() Kirill Tkhai
2020-07-30 12:00 ` [PATCH 16/23] proc_ns_operations: Add can_get method Kirill Tkhai
2020-07-30 12:00 ` [PATCH 17/23] pid: Add pid namespaces into ns_idr Kirill Tkhai
2020-07-30 12:00 ` [PATCH 18/23] uts: Free uts namespace one RCU grace period after final counter put Kirill Tkhai
2020-07-30 12:01 ` [PATCH 19/23] uts: Add uts namespaces into ns_idr Kirill Tkhai
2020-07-30 12:01 ` [PATCH 20/23] ipc: Add ipc " Kirill Tkhai
2020-07-30 12:01 ` [PATCH 21/23] mnt: Add mount " Kirill Tkhai
2020-07-30 12:01 ` [PATCH 22/23] cgroup: Add cgroup " Kirill Tkhai
2020-07-30 12:01 ` [PATCH 23/23] time: Add time " Kirill Tkhai
2020-07-30 13:08 ` [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary Christian Brauner
2020-07-30 13:38   ` Christian Brauner
2020-07-30 14:34 ` Eric W. Biederman
2020-07-30 14:42   ` Christian Brauner
2020-07-30 15:01   ` Kirill Tkhai
2020-07-30 22:13     ` Eric W. Biederman [this message]
2020-07-31  8:48       ` Pavel Tikhomirov
2020-08-03 10:03       ` Kirill Tkhai
2020-08-03 10:51         ` Alexey Dobriyan
2020-08-06  8:05         ` Andrei Vagin
2020-08-07  8:47           ` Kirill Tkhai
2020-08-10 17:34             ` Andrei Vagin
2020-08-11 10:23               ` Kirill Tkhai
2020-08-12 17:53                 ` Andrei Vagin
2020-08-13  8:12                   ` Kirill Tkhai
2020-08-14  1:16                     ` Andrei Vagin
2020-08-14 15:11                       ` Kirill Tkhai
2020-08-14 19:21                         ` Andrei Vagin
2020-08-17 14:05                           ` Kirill Tkhai
2020-08-17 15:48                             ` Eric W. Biederman
2020-08-17 17:47                               ` Christian Brauner
2020-08-17 18:53                                 ` Eric W. Biederman
2020-08-04  5:43     ` Andrei Vagin
2020-08-04 12:11       ` Pavel Tikhomirov
2020-08-04 14:47       ` Kirill Tkhai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875za43b3w.fsf@x220.int.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=areber@redhat.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=davem@davemloft.net \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ptikhomirov@virtuozzo.com \
    --cc=serge@hallyn.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git