Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>,
	viro@zeniv.linux.org.uk, adobriyan@gmail.com
Cc: davem@davemloft.net, akpm@linux-foundation.org,
	christian.brauner@ubuntu.com, areber@redhat.com,
	serge@hallyn.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Subject: Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary
Date: Mon, 3 Aug 2020 13:03:17 +0300
Message-ID: <9ceb5049-6aea-1429-e35f-d86480f10d72@virtuozzo.com> (raw)
In-Reply-To: <875za43b3w.fsf@x220.int.ebiederm.org>

On 31.07.2020 01:13, Eric W. Biederman wrote:
> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
> 
>> On 30.07.2020 17:34, Eric W. Biederman wrote:
>>> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>>>
>>>> Currently, there is no a way to list or iterate all or subset of namespaces
>>>> in the system. Some namespaces are exposed in /proc/[pid]/ns/ directories,
>>>> but some also may be as open files, which are not attached to a process.
>>>> When a namespace open fd is sent over unix socket and then closed, it is
>>>> impossible to know whether the namespace exists or not.
>>>>
>>>> Also, even if namespace is exposed as attached to a process or as open file,
>>>> iteration over /proc/*/ns/* or /proc/*/fd/* namespaces is not fast, because
>>>> this multiplies at tasks and fds number.
>>>
>>> I am very dubious about this.
>>>
>>> I have been avoiding exactly this kind of interface because it can
>>> create rather fundamental problems with checkpoint restart.
>>
>> restart/restore :)
>>
>>> You do have some filtering and the filtering is not based on current.
>>> Which is good.
>>>
>>> A view that is relative to a user namespace might be ok.    It almost
>>> certainly does better as it's own little filesystem than as an extension
>>> to proc though.
>>>
>>> The big thing we want to ensure is that if you migrate you can restore
>>> everything.  I don't see how you will be able to restore these files
>>> after migration.  Anything like this without having a complete
>>> checkpoint/restore story is a non-starter.
>>
>> There is no difference between files in /proc/namespaces/ directory and /proc/[pid]/ns/.
>>
>> CRIU can restore open files in /proc/[pid]/ns, the same will be with /proc/namespaces/ files.
>> As a person who worked deeply for pid_ns and user_ns support in CRIU, I don't see any
>> problem here.
> 
> An obvious diffference is that you are adding the inode to the inode to
> the file name.  Which means that now you really do have to preserve the
> inode numbers during process migration.
>
> Which means now we have to do all of the work to make inode number
> restoration possible.  Which means now we need to have multiple
> instances of nsfs so that we can restore inode numbers.
> 
> I think this is still possible but we have been delaying figuring out
> how to restore inode numbers long enough that may be actual technical
> problems making it happen.

Yeah, this matters. But it looks like here is not a dead end. We just need
change the names the namespaces are exported to particular fs and to support
rename().

Before introduction a principally new filesystem type for this, can't
this be solved in current /proc?

Alexey, does rename() is prohibited for /proc fs?
 
> Now maybe CRIU can handle the names of the files changing during
> migration but you have just increased the level of difficulty for doing
> that.
> 
>> If you have a specific worries about, let's discuss them.
> 
> I was asking and I am asking that it be described in the patch
> description how a container using this feature can be migrated
> from one machine to another.  This code is so close to being problematic
> that we need be very careful we don't fundamentally break CRIU while
> trying to make it's job simpler and easier.
> 
>> CC: Pavel Tikhomirov CRIU maintainer, who knows everything about namespaces C/R.
>>  
>>> Further by not going through the processes it looks like you are
>>> bypassing the existing permission checks.  Which has the potential
>>> to allow someone to use a namespace who would not be able to otherwise.
>>
>> I agree, and I wrote to Christian, that permissions should be more strict.
>> This just should be formalized. Let's discuss this.
>>
>>> So I think this goes one step too far but I am willing to be persuaded
>>> otherwise.
>>>
> 
> Eric
> 


  parent reply index

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-30 11:59 Kirill Tkhai
2020-07-30 11:59 ` [PATCH 01/23] ns: Add common refcount into ns_common add use it as counter for net_ns Kirill Tkhai
2020-07-30 13:35   ` Christian Brauner
2020-07-30 14:07     ` Kirill Tkhai
2020-07-30 15:59       ` Christian Brauner
2020-07-30 14:30   ` Christian Brauner
2020-07-30 14:34     ` Kirill Tkhai
2020-07-30 14:39       ` Christian Brauner
2020-07-30 11:59 ` [PATCH 02/23] uts: Use generic ns_common::count Kirill Tkhai
2020-07-30 14:30   ` Christian Brauner
2020-07-30 11:59 ` [PATCH 03/23] ipc: " Kirill Tkhai
2020-07-30 14:32   ` Christian Brauner
2020-07-30 11:59 ` [PATCH 04/23] pid: " Kirill Tkhai
2020-07-30 14:37   ` Christian Brauner
2020-07-30 11:59 ` [PATCH 05/23] user: " Kirill Tkhai
2020-07-30 14:46   ` Christian Brauner
2020-07-30 11:59 ` [PATCH 06/23] mnt: " Kirill Tkhai
2020-07-30 14:49   ` Christian Brauner
2020-07-30 11:59 ` [PATCH 07/23] cgroup: " Kirill Tkhai
2020-07-30 14:50   ` Christian Brauner
2020-07-30 12:00 ` [PATCH 08/23] time: " Kirill Tkhai
2020-07-30 14:52   ` Christian Brauner
2020-07-30 12:00 ` [PATCH 09/23] ns: Introduce ns_idr to be able to iterate all allocated namespaces in the system Kirill Tkhai
2020-07-30 12:23   ` Matthew Wilcox
2020-07-30 13:32     ` Kirill Tkhai
2020-07-30 13:56       ` Matthew Wilcox
2020-07-30 14:12         ` Kirill Tkhai
2020-07-30 14:15           ` Matthew Wilcox
2020-07-30 14:20             ` Kirill Tkhai
2020-07-30 12:00 ` [PATCH 10/23] fs: Rename fs/proc/namespaces.c into fs/proc/task_namespaces.c Kirill Tkhai
2020-07-30 12:00 ` [PATCH 11/23] fs: Add /proc/namespaces/ directory Kirill Tkhai
2020-07-30 12:18   ` Alexey Dobriyan
2020-07-30 13:22     ` Kirill Tkhai
2020-07-30 13:26   ` Christian Brauner
2020-07-30 14:30     ` Kirill Tkhai
2020-07-30 20:47   ` kernel test robot
2020-07-30 22:20   ` kernel test robot
2020-08-05  8:17   ` kernel test robot
2020-08-05  8:17   ` [RFC PATCH] fs: namespaces_dentry_operations can be static kernel test robot
2020-07-30 12:00 ` [PATCH 12/23] user: Free user_ns one RCU grace period after final counter put Kirill Tkhai
2020-07-30 12:00 ` [PATCH 13/23] user: Add user namespaces into ns_idr Kirill Tkhai
2020-07-30 12:00 ` [PATCH 14/23] net: Add net " Kirill Tkhai
2020-07-30 12:00 ` [PATCH 15/23] pid: Eextract child_reaper check from pidns_for_children_get() Kirill Tkhai
2020-07-30 12:00 ` [PATCH 16/23] proc_ns_operations: Add can_get method Kirill Tkhai
2020-07-30 12:00 ` [PATCH 17/23] pid: Add pid namespaces into ns_idr Kirill Tkhai
2020-07-30 12:00 ` [PATCH 18/23] uts: Free uts namespace one RCU grace period after final counter put Kirill Tkhai
2020-07-30 12:01 ` [PATCH 19/23] uts: Add uts namespaces into ns_idr Kirill Tkhai
2020-07-30 12:01 ` [PATCH 20/23] ipc: Add ipc " Kirill Tkhai
2020-07-30 12:01 ` [PATCH 21/23] mnt: Add mount " Kirill Tkhai
2020-07-30 12:01 ` [PATCH 22/23] cgroup: Add cgroup " Kirill Tkhai
2020-07-30 12:01 ` [PATCH 23/23] time: Add time " Kirill Tkhai
2020-07-30 13:08 ` [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary Christian Brauner
2020-07-30 13:38   ` Christian Brauner
2020-07-30 14:34 ` Eric W. Biederman
2020-07-30 14:42   ` Christian Brauner
2020-07-30 15:01   ` Kirill Tkhai
2020-07-30 22:13     ` Eric W. Biederman
2020-07-31  8:48       ` Pavel Tikhomirov
2020-08-03 10:03       ` Kirill Tkhai [this message]
2020-08-03 10:51         ` Alexey Dobriyan
2020-08-06  8:05         ` Andrei Vagin
2020-08-07  8:47           ` Kirill Tkhai
2020-08-10 17:34             ` Andrei Vagin
2020-08-11 10:23               ` Kirill Tkhai
2020-08-12 17:53                 ` Andrei Vagin
2020-08-13  8:12                   ` Kirill Tkhai
2020-08-14  1:16                     ` Andrei Vagin
2020-08-14 15:11                       ` Kirill Tkhai
2020-08-14 19:21                         ` Andrei Vagin
2020-08-17 14:05                           ` Kirill Tkhai
2020-08-17 15:48                             ` Eric W. Biederman
2020-08-17 17:47                               ` Christian Brauner
2020-08-17 18:53                                 ` Eric W. Biederman
2020-08-04  5:43     ` Andrei Vagin
2020-08-04 12:11       ` Pavel Tikhomirov
2020-08-04 14:47       ` Kirill Tkhai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9ceb5049-6aea-1429-e35f-d86480f10d72@virtuozzo.com \
    --to=ktkhai@virtuozzo.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=areber@redhat.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=davem@davemloft.net \
    --cc=ebiederm@xmission.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ptikhomirov@virtuozzo.com \
    --cc=serge@hallyn.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git