LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: Aleksa Sarai <cyphar@cyphar.com>
Cc: Jann Horn <jannh@google.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	jlayton@kernel.org, Bruce Fields <bfields@fieldses.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Arnd Bergmann <arnd@arndb.de>,
	shuah@kernel.org, David Howells <dhowells@redhat.com>,
	Andy Lutomirski <luto@kernel.org>,
	christian@brauner.io, Tycho Andersen <tycho@tycho.ws>,
	kernel list <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org,
	linux-arch <linux-arch@vger.kernel.org>,
	linux-kselftest@vger.kernel.org, dev@opencontainers.org,
	containers@lists.linux-foundation.org,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH 2/3] namei: implement AT_THIS_ROOT chroot-like path resolution
Date: Mon, 1 Oct 2018 07:28:34 -0700
Message-ID: <C89D720F-3CC4-4FA9-9CBB-E41A67360A6B@amacapital.net> (raw)
In-Reply-To: <20181001054246.gfinmx3api7kjhmc@ryuk>


>>> Currently most container runtimes try to do this resolution in
>>> userspace[1], causing many potential race conditions. In addition, the
>>> "obvious" alternative (actually performing a {ch,pivot_}root(2))
>>> requires a fork+exec which is *very* costly if necessary for every
>>> filesystem operation involving a container.
>> 
>> Wait. fork() I understand, but why exec? And actually, you don't need
>> a full fork() either, clone() lets you do this with some process parts
>> shared. And then you also shouldn't need to use SCM_RIGHTS, just keep
>> the file descriptor table shared. And why chroot()/pivot_root(),
>> wouldn't you want to use setns()?
> 
> You're right about this -- for C runtimes. In Go we cannot do a raw
> clone() or fork() (if you do it manually with RawSyscall you'll end with
> broken runtime state). So you're forced to do fork+exec (which then
> means that you can't use CLONE_FILES and must use SCM_RIGHTS). Same goes
> for CLONE_VFORK.

I must admit that I’m not very sympathetic to the argument that “Go’s runtime model is incompatible with the simpler solution.”

Anyway, it occurs to me that the real problem is that setns() and chroot() are both overkill for this use case. What’s needed is to start your walk from /proc/pid-in-container/root, with two twists:

1. Do the walk as though rooted at a directory. This is basically just your AT_THIS_ROOT, but the footgun is avoided because the dirfd you use is from a foreign namespace, and, except for symlinks to absolute paths, no amount of .. racing is going to escape the *namespace*.

2. Avoid /proc. It’s not just the *links* — you really don’t want to walk into /proc/self. *Maybe* procfs is already careful enough when mounted in a namespace?

  parent reply index

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-29 10:34 [PATCH 0/3] namei: implement various scoping AT_* flags Aleksa Sarai
2018-09-29 10:34 ` [PATCH 1/3] namei: implement O_BENEATH-style " Aleksa Sarai
2018-09-29 14:48   ` Christian Brauner
2018-09-29 15:34     ` Aleksa Sarai
2018-09-30  4:38   ` Aleksa Sarai
2018-10-01 12:28   ` Jann Horn
2018-10-01 13:00     ` Christian Brauner
2018-10-01 16:04       ` Aleksa Sarai
2018-10-04 17:20         ` Christian Brauner
2018-09-29 13:15 ` [PATCH 2/3] namei: implement AT_THIS_ROOT chroot-like path resolution Aleksa Sarai
2018-09-29 13:15   ` [PATCH 3/3] selftests: vfs: add AT_* path resolution tests Aleksa Sarai
2018-09-29 16:35   ` [PATCH 2/3] namei: implement AT_THIS_ROOT chroot-like path resolution Jann Horn
2018-09-29 17:25     ` Andy Lutomirski
2018-10-01  9:46       ` Aleksa Sarai
2018-10-01  5:44     ` Aleksa Sarai
2018-10-01 10:13       ` Jann Horn
2018-10-01 16:18         ` Aleksa Sarai
2018-10-04 17:27           ` Christian Brauner
2018-10-01 10:42       ` Christian Brauner
2018-10-01 11:29         ` Jann Horn
2018-10-01 12:35           ` Christian Brauner
2018-10-01 13:55       ` Bruce Fields
2018-10-01 14:28       ` Andy Lutomirski [this message]
2018-10-02  7:32         ` Aleksa Sarai
2018-10-03 22:09           ` Andy Lutomirski
2018-10-06 20:56           ` Florian Weimer
2018-10-06 21:49             ` Christian Brauner
2018-10-01 14:00     ` Christian Brauner
2018-10-04 16:26     ` Aleksa Sarai
2018-10-04 17:31       ` Christian Brauner
2018-10-04 18:26       ` Jann Horn
2018-10-05 15:07         ` Aleksa Sarai
2018-10-05 15:55           ` Jann Horn
2018-10-06  2:10             ` Aleksa Sarai
2018-10-08 10:50               ` Jann Horn
2018-09-29 14:25 ` [PATCH 0/3] namei: implement various scoping AT_* flags Andy Lutomirski
2018-09-29 15:45   ` Aleksa Sarai
2018-09-29 16:34     ` Andy Lutomirski
2018-09-29 19:44       ` Matthew Wilcox
2018-09-29 14:38 ` Christian Brauner
2018-09-30  4:44   ` Aleksa Sarai
2018-09-30 13:54 ` Alban Crequy
2018-09-30 14:02   ` Christian Brauner
2018-09-30 19:45 ` Mickaël Salaün
2018-09-30 21:46   ` Jann Horn
2018-09-30 22:37     ` Mickaël Salaün
2018-10-01 20:14       ` James Morris
2018-10-01  4:08 ` Dave Chinner
2018-10-01  5:47   ` Aleksa Sarai
2018-10-01  6:14     ` Dave Chinner
2018-10-01 13:28 ` David Laight
2018-10-01 16:15   ` Aleksa Sarai
2018-10-03 13:21     ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=C89D720F-3CC4-4FA9-9CBB-E41A67360A6B@amacapital.net \
    --to=luto@amacapital.net \
    --cc=arnd@arndb.de \
    --cc=bfields@fieldses.org \
    --cc=christian@brauner.io \
    --cc=containers@lists.linux-foundation.org \
    --cc=cyphar@cyphar.com \
    --cc=dev@opencontainers.org \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=jlayton@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=shuah@kernel.org \
    --cc=tycho@tycho.ws \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git