All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
@ 2022-01-12  9:02 Andrey Zhadchenko
  2022-01-12 14:34 ` Aleksa Sarai
  2022-01-12 14:39 ` Christian Brauner
  0 siblings, 2 replies; 18+ messages in thread
From: Andrey Zhadchenko @ 2022-01-12  9:02 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: cyphar, viro, christian.brauner, ptikhomirov, linux-api,
	andrey.zhadchenko

If you have an opened O_PATH file, currently there is no way to re-open
it with other flags with openat/openat2. As a workaround it is possible
to open it via /proc/self/fd/<X>, however
1) You need to ensure that /proc exists
2) You cannot use O_NOFOLLOW flag

Both problems may look insignificant, but they are sensitive for CRIU.
First of all, procfs may not be mounted in the namespace where we are
restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
file with this flag during restore.

This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
struct open_how and changes getname() call to getname_flags() to avoid
ENOENT for empty filenames.

Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
---

Why does even CRIU needs to reopen O_PATH files?
Long story short: to support restoring opened files that are overmounted
with single file bindmounts.
In-depth explanation: when restoring mount tree, before doing mount()
call, CRIU opens mountpoint with O_PATH and saves this fd for later use
for each mount. If we need to restore overmounted file, we look at the
mount which overmounts file mount and use its saved mountpoint fd in
openat(<saved_fd>, <relative_path>, flags).
If we need to open an overmounted mountpoint directory itself, we can use
openat(<saved_fd>, ".", flags). However, if we have a bindmount, its
mountpoint is a regular file. Therefore to open it we need to be able to
reopen O_PATH descriptor. As I mentioned above, procfs workaround is
possible but imposes several restrictions. Not to mention a hussle with
/proc.

Important note: the algorithm above relies on Virtozzo CRIU "mount-v2"
engine, which is currently being prepared for mainstream CRIU.
This patch ensures that CRIU will support all kinds of overmounted files.

 fs/open.c                    | 4 +++-
 include/linux/fcntl.h        | 2 +-
 include/uapi/linux/openat2.h | 2 ++
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index f732fb9..cfde988 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1131,6 +1131,8 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 			return -EAGAIN;
 		lookup_flags |= LOOKUP_CACHED;
 	}
+	if (how->resolve & RESOLVE_EMPTY_PATH)
+		lookup_flags |= LOOKUP_EMPTY;
 
 	op->lookup_flags = lookup_flags;
 	return 0;
@@ -1203,7 +1205,7 @@ static long do_sys_openat2(int dfd, const char __user *filename,
 	if (fd)
 		return fd;
 
-	tmp = getname(filename);
+	tmp = getname_flags(filename, op.lookup_flags, 0);
 	if (IS_ERR(tmp))
 		return PTR_ERR(tmp);
 
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index a332e79..eabc7a8 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -15,7 +15,7 @@
 /* List of all valid flags for the how->resolve argument: */
 #define VALID_RESOLVE_FLAGS \
 	(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
-	 RESOLVE_BENEATH | RESOLVE_IN_ROOT | RESOLVE_CACHED)
+	 RESOLVE_BENEATH | RESOLVE_IN_ROOT | RESOLVE_CACHED | RESOLVE_EMPTY_PATH)
 
 /* List of all open_how "versions". */
 #define OPEN_HOW_SIZE_VER0	24 /* sizeof first published struct */
diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h
index a5feb76..a42cf88 100644
--- a/include/uapi/linux/openat2.h
+++ b/include/uapi/linux/openat2.h
@@ -39,5 +39,7 @@ struct open_how {
 					completed through cached lookup. May
 					return -EAGAIN if that's not
 					possible. */
+#define RESOLVE_EMPTY_PATH	0x40 /* If pathname is an empty string, open
+					the file referred by dirfd */
 
 #endif /* _UAPI_LINUX_OPENAT2_H */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-12  9:02 [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2 Andrey Zhadchenko
@ 2022-01-12 14:34 ` Aleksa Sarai
  2022-01-12 14:51   ` Christian Brauner
  2022-01-12 14:39 ` Christian Brauner
  1 sibling, 1 reply; 18+ messages in thread
From: Aleksa Sarai @ 2022-01-12 14:34 UTC (permalink / raw)
  To: Andrey Zhadchenko
  Cc: linux-fsdevel, viro, christian.brauner, ptikhomirov, linux-api

[-- Attachment #1: Type: text/plain, Size: 5420 bytes --]

On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> If you have an opened O_PATH file, currently there is no way to re-open
> it with other flags with openat/openat2. As a workaround it is possible
> to open it via /proc/self/fd/<X>, however
> 1) You need to ensure that /proc exists
> 2) You cannot use O_NOFOLLOW flag

There is also another issue -- you can mount on top of magic-links so if
you're a container runtime that has been tricked into creating bad
mounts of top of /proc/ subdirectories there's no way of detecting that
this has happened. (Though I think in the long-term we will need to
make it possible for unprivileged users to create a procfs mountfd if
they have hidepid=4,subset=pids set -- there are loads of things
containers need to touch in procfs which can be overmounted in malicious
ways.)

> Both problems may look insignificant, but they are sensitive for CRIU.
> First of all, procfs may not be mounted in the namespace where we are
> restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
> flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
> file with this flag during restore.
> 
> This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> struct open_how and changes getname() call to getname_flags() to avoid
> ENOENT for empty filenames.

This is something I've wanted to implement for a while, but from memory
we need to add some other protections in place before enabling this.

The main one is disallowing re-opening of a path when it was originally
opened with a different set of modes. [1] is the patch I originally
wrote as part of the openat2(2) (but I dropped it since I wasn't sure
whether it might break some systems in subtle ways -- though according
to my testing there wasn't an issue on any of my machines).

I'd can try to revive that patchset if you'd like. Being able to re-open
paths without going through procfs is something I've wanted to finish up
for a while, so thanks for sending this patch. :D

[1]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/

> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
> ---
> 
> Why does even CRIU needs to reopen O_PATH files?
> Long story short: to support restoring opened files that are overmounted
> with single file bindmounts.
> In-depth explanation: when restoring mount tree, before doing mount()
> call, CRIU opens mountpoint with O_PATH and saves this fd for later use
> for each mount. If we need to restore overmounted file, we look at the
> mount which overmounts file mount and use its saved mountpoint fd in
> openat(<saved_fd>, <relative_path>, flags).
> If we need to open an overmounted mountpoint directory itself, we can use
> openat(<saved_fd>, ".", flags). However, if we have a bindmount, its
> mountpoint is a regular file. Therefore to open it we need to be able to
> reopen O_PATH descriptor. As I mentioned above, procfs workaround is
> possible but imposes several restrictions. Not to mention a hussle with
> /proc.
> 
> Important note: the algorithm above relies on Virtozzo CRIU "mount-v2"
> engine, which is currently being prepared for mainstream CRIU.
> This patch ensures that CRIU will support all kinds of overmounted files.
> 
>  fs/open.c                    | 4 +++-
>  include/linux/fcntl.h        | 2 +-
>  include/uapi/linux/openat2.h | 2 ++
>  3 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index f732fb9..cfde988 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -1131,6 +1131,8 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
>  			return -EAGAIN;
>  		lookup_flags |= LOOKUP_CACHED;
>  	}
> +	if (how->resolve & RESOLVE_EMPTY_PATH)
> +		lookup_flags |= LOOKUP_EMPTY;
>  
>  	op->lookup_flags = lookup_flags;
>  	return 0;
> @@ -1203,7 +1205,7 @@ static long do_sys_openat2(int dfd, const char __user *filename,
>  	if (fd)
>  		return fd;
>  
> -	tmp = getname(filename);
> +	tmp = getname_flags(filename, op.lookup_flags, 0);
>  	if (IS_ERR(tmp))
>  		return PTR_ERR(tmp);
>  
> diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
> index a332e79..eabc7a8 100644
> --- a/include/linux/fcntl.h
> +++ b/include/linux/fcntl.h
> @@ -15,7 +15,7 @@
>  /* List of all valid flags for the how->resolve argument: */
>  #define VALID_RESOLVE_FLAGS \
>  	(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
> -	 RESOLVE_BENEATH | RESOLVE_IN_ROOT | RESOLVE_CACHED)
> +	 RESOLVE_BENEATH | RESOLVE_IN_ROOT | RESOLVE_CACHED | RESOLVE_EMPTY_PATH)
>  
>  /* List of all open_how "versions". */
>  #define OPEN_HOW_SIZE_VER0	24 /* sizeof first published struct */
> diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h
> index a5feb76..a42cf88 100644
> --- a/include/uapi/linux/openat2.h
> +++ b/include/uapi/linux/openat2.h
> @@ -39,5 +39,7 @@ struct open_how {
>  					completed through cached lookup. May
>  					return -EAGAIN if that's not
>  					possible. */
> +#define RESOLVE_EMPTY_PATH	0x40 /* If pathname is an empty string, open
> +					the file referred by dirfd */
>  
>  #endif /* _UAPI_LINUX_OPENAT2_H */
> -- 
> 1.8.3.1
> 

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-12  9:02 [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2 Andrey Zhadchenko
  2022-01-12 14:34 ` Aleksa Sarai
@ 2022-01-12 14:39 ` Christian Brauner
  2022-01-12 14:43   ` Aleksa Sarai
  1 sibling, 1 reply; 18+ messages in thread
From: Christian Brauner @ 2022-01-12 14:39 UTC (permalink / raw)
  To: Andrey Zhadchenko; +Cc: linux-fsdevel, cyphar, viro, ptikhomirov, linux-api

On Wed, Jan 12, 2022 at 12:02:17PM +0300, Andrey Zhadchenko wrote:
> If you have an opened O_PATH file, currently there is no way to re-open
> it with other flags with openat/openat2. As a workaround it is possible
> to open it via /proc/self/fd/<X>, however
> 1) You need to ensure that /proc exists
> 2) You cannot use O_NOFOLLOW flag
> 
> Both problems may look insignificant, but they are sensitive for CRIU.

Not just CRIU. It's also an issue for systemd, LXD, and other users.
(One old example is where we do need to sometimes stash an O_PATH fd to
a /dev/pts/ptmx device and to actually perform an open on the device we
reopen via /proc/<pid>/fd/<nr>.)

> First of all, procfs may not be mounted in the namespace where we are
> restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
> flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
> file with this flag during restore.
> 
> This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> struct open_how and changes getname() call to getname_flags() to avoid
> ENOENT for empty filenames.

From my perspective this makes sense and is something that would be
very useful instead of having to hack around this via procfs.

However, e should consider adding RESOLVE_EMPTY_PATH since we already
have AT_EMPTY_PATH. If we think this is workable we should try and reuse
AT_EMPTY_PATH that keeps the api consistent with linkat(), readlinkat(),
execveat(), statx(), open_tree(), mount_setattr() etc.

If AT_EMPTY_PATH doesn't conflict with another O_* flag one could make
openat() support it too?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-12 14:39 ` Christian Brauner
@ 2022-01-12 14:43   ` Aleksa Sarai
  2022-01-12 14:53     ` Christian Brauner
  0 siblings, 1 reply; 18+ messages in thread
From: Aleksa Sarai @ 2022-01-12 14:43 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrey Zhadchenko, linux-fsdevel, viro, ptikhomirov, linux-api

[-- Attachment #1: Type: text/plain, Size: 2070 bytes --]

On 2022-01-12, Christian Brauner <christian.brauner@ubuntu.com> wrote:
> On Wed, Jan 12, 2022 at 12:02:17PM +0300, Andrey Zhadchenko wrote:
> > If you have an opened O_PATH file, currently there is no way to re-open
> > it with other flags with openat/openat2. As a workaround it is possible
> > to open it via /proc/self/fd/<X>, however
> > 1) You need to ensure that /proc exists
> > 2) You cannot use O_NOFOLLOW flag
> > 
> > Both problems may look insignificant, but they are sensitive for CRIU.
> 
> Not just CRIU. It's also an issue for systemd, LXD, and other users.
> (One old example is where we do need to sometimes stash an O_PATH fd to
> a /dev/pts/ptmx device and to actually perform an open on the device we
> reopen via /proc/<pid>/fd/<nr>.)
> 
> > First of all, procfs may not be mounted in the namespace where we are
> > restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
> > flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
> > file with this flag during restore.
> > 
> > This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> > struct open_how and changes getname() call to getname_flags() to avoid
> > ENOENT for empty filenames.
> 
> From my perspective this makes sense and is something that would be
> very useful instead of having to hack around this via procfs.
> 
> However, e should consider adding RESOLVE_EMPTY_PATH since we already
> have AT_EMPTY_PATH. If we think this is workable we should try and reuse
> AT_EMPTY_PATH that keeps the api consistent with linkat(), readlinkat(),
> execveat(), statx(), open_tree(), mount_setattr() etc.
> 
> If AT_EMPTY_PATH doesn't conflict with another O_* flag one could make
> openat() support it too?

I would much prefer O_EMPTYPATH, in fact I think this is what I called
it in my first draft ages ago. RESOLVE_ is meant to be related to
resolution restrictions, not changing the opening mode.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-12 14:34 ` Aleksa Sarai
@ 2022-01-12 14:51   ` Christian Brauner
  2022-01-12 18:56     ` Andrey Zhadchenko
  2022-01-13  6:55     ` Aleksa Sarai
  0 siblings, 2 replies; 18+ messages in thread
From: Christian Brauner @ 2022-01-12 14:51 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Andrey Zhadchenko, linux-fsdevel, viro, ptikhomirov, linux-api

On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
> On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> > If you have an opened O_PATH file, currently there is no way to re-open
> > it with other flags with openat/openat2. As a workaround it is possible
> > to open it via /proc/self/fd/<X>, however
> > 1) You need to ensure that /proc exists
> > 2) You cannot use O_NOFOLLOW flag
> 
> There is also another issue -- you can mount on top of magic-links so if
> you're a container runtime that has been tricked into creating bad
> mounts of top of /proc/ subdirectories there's no way of detecting that
> this has happened. (Though I think in the long-term we will need to
> make it possible for unprivileged users to create a procfs mountfd if
> they have hidepid=4,subset=pids set -- there are loads of things
> containers need to touch in procfs which can be overmounted in malicious
> ways.)

Yeah, though I see this as a less pressing issue for now. I'd rather
postpone this and make userspace work a bit more. There are ways to
design programs so you know that the procfs instance you're interacting
with is the one you want to interact with without requiring unprivileged
mounting outside of a userns+pidns+mountns pair. ;)

> 
> > Both problems may look insignificant, but they are sensitive for CRIU.
> > First of all, procfs may not be mounted in the namespace where we are
> > restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
> > flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
> > file with this flag during restore.
> > 
> > This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> > struct open_how and changes getname() call to getname_flags() to avoid
> > ENOENT for empty filenames.
> 
> This is something I've wanted to implement for a while, but from memory
> we need to add some other protections in place before enabling this.
> 
> The main one is disallowing re-opening of a path when it was originally
> opened with a different set of modes. [1] is the patch I originally
> wrote as part of the openat2(2) (but I dropped it since I wasn't sure
> whether it might break some systems in subtle ways -- though according
> to my testing there wasn't an issue on any of my machines).

Oh this is the discussion we had around turning an opath fd into a say
O_RDWR fd, I think.
So yes, I think restricting fd reopening makes sense. However, going
from an O_PATH fd to e.g. an fd with O_RDWR does make sense and needs to
be the default anyway. So we would need to implement this as a denylist
anyway. The default is that opath fds can be reopened with whatever and
only if the opath creator has restricted reopening will it fail, i.e.
it's similar to a denylist.

But this patch wouldn't prevent that or hinder the upgrade mask
restriction afaict.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-12 14:43   ` Aleksa Sarai
@ 2022-01-12 14:53     ` Christian Brauner
  2022-01-12 17:45       ` Andrey Zhadchenko
  0 siblings, 1 reply; 18+ messages in thread
From: Christian Brauner @ 2022-01-12 14:53 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Andrey Zhadchenko, linux-fsdevel, viro, ptikhomirov, linux-api

On Thu, Jan 13, 2022 at 01:43:31AM +1100, Aleksa Sarai wrote:
> On 2022-01-12, Christian Brauner <christian.brauner@ubuntu.com> wrote:
> > On Wed, Jan 12, 2022 at 12:02:17PM +0300, Andrey Zhadchenko wrote:
> > > If you have an opened O_PATH file, currently there is no way to re-open
> > > it with other flags with openat/openat2. As a workaround it is possible
> > > to open it via /proc/self/fd/<X>, however
> > > 1) You need to ensure that /proc exists
> > > 2) You cannot use O_NOFOLLOW flag
> > > 
> > > Both problems may look insignificant, but they are sensitive for CRIU.
> > 
> > Not just CRIU. It's also an issue for systemd, LXD, and other users.
> > (One old example is where we do need to sometimes stash an O_PATH fd to
> > a /dev/pts/ptmx device and to actually perform an open on the device we
> > reopen via /proc/<pid>/fd/<nr>.)
> > 
> > > First of all, procfs may not be mounted in the namespace where we are
> > > restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
> > > flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
> > > file with this flag during restore.
> > > 
> > > This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> > > struct open_how and changes getname() call to getname_flags() to avoid
> > > ENOENT for empty filenames.
> > 
> > From my perspective this makes sense and is something that would be
> > very useful instead of having to hack around this via procfs.
> > 
> > However, e should consider adding RESOLVE_EMPTY_PATH since we already
> > have AT_EMPTY_PATH. If we think this is workable we should try and reuse
> > AT_EMPTY_PATH that keeps the api consistent with linkat(), readlinkat(),
> > execveat(), statx(), open_tree(), mount_setattr() etc.
> > 
> > If AT_EMPTY_PATH doesn't conflict with another O_* flag one could make
> > openat() support it too?
> 
> I would much prefer O_EMPTYPATH, in fact I think this is what I called
> it in my first draft ages ago. RESOLVE_ is meant to be related to
> resolution restrictions, not changing the opening mode.

That seems okay to me too. The advantage of AT_EMPTY_PATH is that we
don't double down on the naming confusion, imho.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-12 14:53     ` Christian Brauner
@ 2022-01-12 17:45       ` Andrey Zhadchenko
  2022-01-13  6:47         ` Aleksa Sarai
  0 siblings, 1 reply; 18+ messages in thread
From: Andrey Zhadchenko @ 2022-01-12 17:45 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Aleksa Sarai, linux-fsdevel, viro, ptikhomirov, linux-api



On 1/12/22 17:53, Christian Brauner wrote:
> On Thu, Jan 13, 2022 at 01:43:31AM +1100, Aleksa Sarai wrote:
>> On 2022-01-12, Christian Brauner <christian.brauner@ubuntu.com> wrote:
>>> On Wed, Jan 12, 2022 at 12:02:17PM +0300, Andrey Zhadchenko wrote:
>>>> If you have an opened O_PATH file, currently there is no way to re-open
>>>> it with other flags with openat/openat2. As a workaround it is possible
>>>> to open it via /proc/self/fd/<X>, however
>>>> 1) You need to ensure that /proc exists
>>>> 2) You cannot use O_NOFOLLOW flag
>>>>
>>>> Both problems may look insignificant, but they are sensitive for CRIU.
>>>
>>> Not just CRIU. It's also an issue for systemd, LXD, and other users.
>>> (One old example is where we do need to sometimes stash an O_PATH fd to
>>> a /dev/pts/ptmx device and to actually perform an open on the device we
>>> reopen via /proc/<pid>/fd/<nr>.)
>>>
>>>> First of all, procfs may not be mounted in the namespace where we are
>>>> restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
>>>> flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
>>>> file with this flag during restore.
>>>>
>>>> This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
>>>> struct open_how and changes getname() call to getname_flags() to avoid
>>>> ENOENT for empty filenames.
>>>
>>>  From my perspective this makes sense and is something that would be
>>> very useful instead of having to hack around this via procfs.
>>>
>>> However, e should consider adding RESOLVE_EMPTY_PATH since we already
>>> have AT_EMPTY_PATH. If we think this is workable we should try and reuse
>>> AT_EMPTY_PATH that keeps the api consistent with linkat(), readlinkat(),
>>> execveat(), statx(), open_tree(), mount_setattr() etc.
>>>
>>> If AT_EMPTY_PATH doesn't conflict with another O_* flag one could make
>>> openat() support it too?
>>
>> I would much prefer O_EMPTYPATH, in fact I think this is what I called
>> it in my first draft ages ago. RESOLVE_ is meant to be related to
>> resolution restrictions, not changing the opening mode.
> 
> That seems okay to me too. The advantage of AT_EMPTY_PATH is that we
> don't double down on the naming confusion, imho.
Unfortunately AT_EMPTY_PATH is 0x1000 which is O_DSYNC (octal 010000).
At first I thought to add new field in struct open_how for AT_* flags.
However most of them are irrelevant, except AT_SYMLINK_NOFOLLOW, which
duplicates RESOLVE flags, and maybe AT_NO_AUTOMOUNT.
O_EMPTYPATH idea seems cool

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-12 14:51   ` Christian Brauner
@ 2022-01-12 18:56     ` Andrey Zhadchenko
  2022-01-13  6:46       ` Aleksa Sarai
  2022-01-13  6:55     ` Aleksa Sarai
  1 sibling, 1 reply; 18+ messages in thread
From: Andrey Zhadchenko @ 2022-01-12 18:56 UTC (permalink / raw)
  To: Christian Brauner, Aleksa Sarai
  Cc: linux-fsdevel, viro, ptikhomirov, linux-api



On 1/12/22 17:51, Christian Brauner wrote:
> On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
>> On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
>>> If you have an opened O_PATH file, currently there is no way to re-open
>>> it with other flags with openat/openat2. As a workaround it is possible
>>> to open it via /proc/self/fd/<X>, however
>>> 1) You need to ensure that /proc exists
>>> 2) You cannot use O_NOFOLLOW flag
>>
>> There is also another issue -- you can mount on top of magic-links so if
>> you're a container runtime that has been tricked into creating bad
>> mounts of top of /proc/ subdirectories there's no way of detecting that
>> this has happened. (Though I think in the long-term we will need to
>> make it possible for unprivileged users to create a procfs mountfd if
>> they have hidepid=4,subset=pids set -- there are loads of things
>> containers need to touch in procfs which can be overmounted in malicious
>> ways.)
> 
> Yeah, though I see this as a less pressing issue for now. I'd rather
> postpone this and make userspace work a bit more. There are ways to
> design programs so you know that the procfs instance you're interacting
> with is the one you want to interact with without requiring unprivileged
> mounting outside of a userns+pidns+mountns pair. ;)
> 
>>
>>> Both problems may look insignificant, but they are sensitive for CRIU.
>>> First of all, procfs may not be mounted in the namespace where we are
>>> restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
>>> flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
>>> file with this flag during restore.
>>>
>>> This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
>>> struct open_how and changes getname() call to getname_flags() to avoid
>>> ENOENT for empty filenames.
>>
>> This is something I've wanted to implement for a while, but from memory
>> we need to add some other protections in place before enabling this.
>>
>> The main one is disallowing re-opening of a path when it was originally
>> opened with a different set of modes. [1] is the patch I originally
I looked at this patch. However I am not able to reproduce the problem.
For example, I can't open /proc/self/exe as RDWR with the following:
fd1 = open(/proc/self/exe, O_PATH)
fd2 = open(/proc/self/fd/3, O_RDWR) <- error
or open file with incorrect flags via O_PATH to O_PATH fd from proc
This is fixed or did I understand this problem wrong?
>> wrote as part of the openat2(2) (but I dropped it since I wasn't sure
>> whether it might break some systems in subtle ways -- though according
>> to my testing there wasn't an issue on any of my machines).
> 
> Oh this is the discussion we had around turning an opath fd into a say
> O_RDWR fd, I think.
> So yes, I think restricting fd reopening makes sense. However, going
> from an O_PATH fd to e.g. an fd with O_RDWR does make sense and needs to
> be the default anyway. So we would need to implement this as a denylist
> anyway. The default is that opath fds can be reopened with whatever and
> only if the opath creator has restricted reopening will it fail, i.e.
> it's similar to a denylist.
> 
> But this patch wouldn't prevent that or hinder the upgrade mask
> restriction afaict.

This issue is actually more complicated than I thought.

What do you think of the following:
1. Add new O_EMPTYPATH constant
2. When we open something with O_PATH, remember access flags (currently
we drop all flags in do_dentry_open() for O_PATH fds). This is similar
to Aleksa Sarai idea, but I do not think we should add some new fields,
because CRIU needs to be able to see it. Just leave access flags
untouched.
3. for openat(fd, "", O_EMPTYPATH | <access flags>) additionally check
access flags against the ones we remembered for O_PATH fd

This won't solve magiclinks problems but there at least will be API to
avoid procfs and which allow to add some restrictions.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-12 18:56     ` Andrey Zhadchenko
@ 2022-01-13  6:46       ` Aleksa Sarai
  2022-01-13  7:52         ` Andrey Zhadchenko
  2022-01-13 14:05         ` Christian Brauner
  0 siblings, 2 replies; 18+ messages in thread
From: Aleksa Sarai @ 2022-01-13  6:46 UTC (permalink / raw)
  To: Andrey Zhadchenko
  Cc: Christian Brauner, linux-fsdevel, viro, ptikhomirov, linux-api

[-- Attachment #1: Type: text/plain, Size: 6746 bytes --]

On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> On 1/12/22 17:51, Christian Brauner wrote:
> > On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
> > > On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> > > > If you have an opened O_PATH file, currently there is no way to re-open
> > > > it with other flags with openat/openat2. As a workaround it is possible
> > > > to open it via /proc/self/fd/<X>, however
> > > > 1) You need to ensure that /proc exists
> > > > 2) You cannot use O_NOFOLLOW flag
> > > 
> > > There is also another issue -- you can mount on top of magic-links so if
> > > you're a container runtime that has been tricked into creating bad
> > > mounts of top of /proc/ subdirectories there's no way of detecting that
> > > this has happened. (Though I think in the long-term we will need to
> > > make it possible for unprivileged users to create a procfs mountfd if
> > > they have hidepid=4,subset=pids set -- there are loads of things
> > > containers need to touch in procfs which can be overmounted in malicious
> > > ways.)
> > 
> > Yeah, though I see this as a less pressing issue for now. I'd rather
> > postpone this and make userspace work a bit more. There are ways to
> > design programs so you know that the procfs instance you're interacting
> > with is the one you want to interact with without requiring unprivileged
> > mounting outside of a userns+pidns+mountns pair. ;)
> > 
> > > 
> > > > Both problems may look insignificant, but they are sensitive for CRIU.
> > > > First of all, procfs may not be mounted in the namespace where we are
> > > > restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
> > > > flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
> > > > file with this flag during restore.
> > > > 
> > > > This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> > > > struct open_how and changes getname() call to getname_flags() to avoid
> > > > ENOENT for empty filenames.
> > > 
> > > This is something I've wanted to implement for a while, but from memory
> > > we need to add some other protections in place before enabling this.
> > > 
> > > The main one is disallowing re-opening of a path when it was originally
> > > opened with a different set of modes. [1] is the patch I originally
> I looked at this patch. However I am not able to reproduce the problem.
> For example, I can't open /proc/self/exe as RDWR with the following:
> fd1 = open(/proc/self/exe, O_PATH)
> fd2 = open(/proc/self/fd/3, O_RDWR) <- error
> or open file with incorrect flags via O_PATH to O_PATH fd from proc
> This is fixed or did I understand this problem wrong?

You will get -ETXTBSY because the /proc/self/exe is still a current->mm
of a process. What you need to do is have two processes (or fork+exec a
process and do this):

 1. Grab the /proc/$pid/exe handle of the target process.
 2. Wait for the target process to do an exec() of another program (or
    exit).
 3. *Then* re-open the fd with write permissions. This is allowed
    because the file is no longer being used as the current->mm of a
	process and thus is treated like a regular file handle even though
	it was only ever resolveable through /proc/self/exe which should
	(semantically) only ever be readable.

This attack was used against runc in 2016 and a similar attack was
possible with some later CVEs (I think there was also one against LXC at
some point but I might be mistaken). There were other bugs which lead to
this vector being usable, but my view is that this shouldn't have been
possible in the first place.

I can cook up a simple example if the above description isn't explaining
the issue thoroughly enough.

> > > wrote as part of the openat2(2) (but I dropped it since I wasn't sure
> > > whether it might break some systems in subtle ways -- though according
> > > to my testing there wasn't an issue on any of my machines).
> > 
> > Oh this is the discussion we had around turning an opath fd into a say
> > O_RDWR fd, I think.
> > So yes, I think restricting fd reopening makes sense. However, going
> > from an O_PATH fd to e.g. an fd with O_RDWR does make sense and needs to
> > be the default anyway. So we would need to implement this as a denylist
> > anyway. The default is that opath fds can be reopened with whatever and
> > only if the opath creator has restricted reopening will it fail, i.e.
> > it's similar to a denylist.
> > 
> > But this patch wouldn't prevent that or hinder the upgrade mask
> > restriction afaict.
> 
> This issue is actually more complicated than I thought.
> 
> What do you think of the following:
> 1. Add new O_EMPTYPATH constant
> 2. When we open something with O_PATH, remember access flags (currently
> we drop all flags in do_dentry_open() for O_PATH fds). This is similar
> to Aleksa Sarai idea, but I do not think we should add some new fields,
> because CRIU needs to be able to see it. Just leave access flags
> untouched.

There are two problems with this:

 * The problem with this is that O_PATH and O_PATH|O_RDONLY are
   identical. O_RDONLY is defined as 0. I guess by new fields you're
   referring to what you'd get from fcntl(F_GETFL)?

   What you're suggesting here is the openat2() O_PATH access mask
   stuff. That is a feature I think would be useful, but it's not
   necessary to get O_EMPTYPATH working.

   If you really need to be able to get the O_PATH re-opening mask of a
   file descriptor (which you probably do for CRIU) we can add that
   information to F_GETFL or some other such interface.

 * We need to make sure that the default access modes of O_PATH on
   magic links are correct. We can't simply allow any access mode in
   that case, because if we do then we haven't really solved the
   /proc/self/exe issue.

> 3. for openat(fd, "", O_EMPTYPATH | <access flags>) additionally check
> access flags against the ones we remembered for O_PATH fd

 * We also need to add the same restrictions for opening through
   /proc/self/fd/$n.

> This won't solve magiclinks problems but there at least will be API to
> avoid procfs and which allow to add some restrictions.

I think the magic link problems need to be solved if we're going to
enshrine this fd reopening behaviour by adding an O_* flag for it.
Though of course this is already an issue with /proc/self/fd/$n
re-opening.

However since I already have a patch which solves this issue, I can work
on reviving it and re-send it.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-12 17:45       ` Andrey Zhadchenko
@ 2022-01-13  6:47         ` Aleksa Sarai
  2022-01-13 10:33           ` Christian Brauner
  0 siblings, 1 reply; 18+ messages in thread
From: Aleksa Sarai @ 2022-01-13  6:47 UTC (permalink / raw)
  To: Andrey Zhadchenko
  Cc: Christian Brauner, linux-fsdevel, viro, ptikhomirov, linux-api

[-- Attachment #1: Type: text/plain, Size: 3141 bytes --]

On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> 
> 
> On 1/12/22 17:53, Christian Brauner wrote:
> > On Thu, Jan 13, 2022 at 01:43:31AM +1100, Aleksa Sarai wrote:
> > > On 2022-01-12, Christian Brauner <christian.brauner@ubuntu.com> wrote:
> > > > On Wed, Jan 12, 2022 at 12:02:17PM +0300, Andrey Zhadchenko wrote:
> > > > > If you have an opened O_PATH file, currently there is no way to re-open
> > > > > it with other flags with openat/openat2. As a workaround it is possible
> > > > > to open it via /proc/self/fd/<X>, however
> > > > > 1) You need to ensure that /proc exists
> > > > > 2) You cannot use O_NOFOLLOW flag
> > > > > 
> > > > > Both problems may look insignificant, but they are sensitive for CRIU.
> > > > 
> > > > Not just CRIU. It's also an issue for systemd, LXD, and other users.
> > > > (One old example is where we do need to sometimes stash an O_PATH fd to
> > > > a /dev/pts/ptmx device and to actually perform an open on the device we
> > > > reopen via /proc/<pid>/fd/<nr>.)
> > > > 
> > > > > First of all, procfs may not be mounted in the namespace where we are
> > > > > restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
> > > > > flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
> > > > > file with this flag during restore.
> > > > > 
> > > > > This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> > > > > struct open_how and changes getname() call to getname_flags() to avoid
> > > > > ENOENT for empty filenames.
> > > > 
> > > >  From my perspective this makes sense and is something that would be
> > > > very useful instead of having to hack around this via procfs.
> > > > 
> > > > However, e should consider adding RESOLVE_EMPTY_PATH since we already
> > > > have AT_EMPTY_PATH. If we think this is workable we should try and reuse
> > > > AT_EMPTY_PATH that keeps the api consistent with linkat(), readlinkat(),
> > > > execveat(), statx(), open_tree(), mount_setattr() etc.
> > > > 
> > > > If AT_EMPTY_PATH doesn't conflict with another O_* flag one could make
> > > > openat() support it too?
> > > 
> > > I would much prefer O_EMPTYPATH, in fact I think this is what I called
> > > it in my first draft ages ago. RESOLVE_ is meant to be related to
> > > resolution restrictions, not changing the opening mode.
> > 
> > That seems okay to me too. The advantage of AT_EMPTY_PATH is that we
> > don't double down on the naming confusion, imho.
> Unfortunately AT_EMPTY_PATH is 0x1000 which is O_DSYNC (octal 010000).
> At first I thought to add new field in struct open_how for AT_* flags.
> However most of them are irrelevant, except AT_SYMLINK_NOFOLLOW, which
> duplicates RESOLVE flags, and maybe AT_NO_AUTOMOUNT.
> O_EMPTYPATH idea seems cool

Yeah the issue is that openat/openat2 don't actually take AT_* flags and
all of the constants conflict. I would prefer not mixing O_ and AT_
flags in open (and I suspect Al would also prefer that).

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-12 14:51   ` Christian Brauner
  2022-01-12 18:56     ` Andrey Zhadchenko
@ 2022-01-13  6:55     ` Aleksa Sarai
  1 sibling, 0 replies; 18+ messages in thread
From: Aleksa Sarai @ 2022-01-13  6:55 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrey Zhadchenko, linux-fsdevel, viro, ptikhomirov, linux-api

[-- Attachment #1: Type: text/plain, Size: 4595 bytes --]

On 2022-01-12, Christian Brauner <christian.brauner@ubuntu.com> wrote:
> On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
> > On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> > > If you have an opened O_PATH file, currently there is no way to re-open
> > > it with other flags with openat/openat2. As a workaround it is possible
> > > to open it via /proc/self/fd/<X>, however
> > > 1) You need to ensure that /proc exists
> > > 2) You cannot use O_NOFOLLOW flag
> > 
> > There is also another issue -- you can mount on top of magic-links so if
> > you're a container runtime that has been tricked into creating bad
> > mounts of top of /proc/ subdirectories there's no way of detecting that
> > this has happened. (Though I think in the long-term we will need to
> > make it possible for unprivileged users to create a procfs mountfd if
> > they have hidepid=4,subset=pids set -- there are loads of things
> > containers need to touch in procfs which can be overmounted in malicious
> > ways.)
> 
> Yeah, though I see this as a less pressing issue for now. I'd rather
> postpone this and make userspace work a bit more. There are ways to
> design programs so you know that the procfs instance you're interacting
> with is the one you want to interact with without requiring unprivileged
> mounting outside of a userns+pidns+mountns pair. ;)

I'm not sure I agree. While with openat2(RESOLVE_NO_XDEV|RESOLVE_NO_SYMLINKS)
you can access the vast majority of procfs through a checked procfs
handle (fstatfs and the other checks you can do), you cannot jump
through magic links safely because RESOLVE_NO_XDEV blocks magic-link
jumps.

You can't even use a safe handle to /proc/self/fd and then just follow
the link because you can mount magiclinks on top of magiclinks, so even
a hypothetical RESOLVE_ONLY_MAGICLINKS wouldn't really work. Maybe a
RESOLVE_ONLY_MAGICLINKS that didn't cross mounts except if the crossing
was through a magiclink would work but I suspect implementing that would
be tricky (there's loads of places where you might trip LOOKUP_NO_XDEV).

O_EMPTYPATH would solve this issue for the /proc/self/fd/... magiclinks,
but /proc/self/{exe,cwd,root,ns/*} are all still susceptible to the same
issue. We use /proc/self/exe in runc, and everyone uses /proc/self/ns/*.

But yeah we can definitely solve this in a separate patchseries, and
O_EMPTYPATH is something we should have regardless of whether we solve
the procfs issue another way.

> > > Both problems may look insignificant, but they are sensitive for CRIU.
> > > First of all, procfs may not be mounted in the namespace where we are
> > > restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
> > > flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
> > > file with this flag during restore.
> > > 
> > > This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> > > struct open_how and changes getname() call to getname_flags() to avoid
> > > ENOENT for empty filenames.
> > 
> > This is something I've wanted to implement for a while, but from memory
> > we need to add some other protections in place before enabling this.
> > 
> > The main one is disallowing re-opening of a path when it was originally
> > opened with a different set of modes. [1] is the patch I originally
> > wrote as part of the openat2(2) (but I dropped it since I wasn't sure
> > whether it might break some systems in subtle ways -- though according
> > to my testing there wasn't an issue on any of my machines).
> 
> Oh this is the discussion we had around turning an opath fd into a say
> O_RDWR fd, I think.
> So yes, I think restricting fd reopening makes sense. However, going
> from an O_PATH fd to e.g. an fd with O_RDWR does make sense and needs to
> be the default anyway. So we would need to implement this as a denylist
> anyway. The default is that opath fds can be reopened with whatever and
> only if the opath creator has restricted reopening will it fail, i.e.
> it's similar to a denylist.
> 
> But this patch wouldn't prevent that or hinder the upgrade mask
> restriction afaict.

Yeah the patch I linked implements the behaviour you mentioned (O_PATH
of a regular path lets you re-open with any mode, O_PATH of an O_PATH
copies the permitted re-opening modes of the old O_PATH, and O_PATH of a
magic link copies the file mode of the magic link).

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-13  6:46       ` Aleksa Sarai
@ 2022-01-13  7:52         ` Andrey Zhadchenko
  2022-01-14  4:24           ` Andrey Zhadchenko
  2022-01-13 14:05         ` Christian Brauner
  1 sibling, 1 reply; 18+ messages in thread
From: Andrey Zhadchenko @ 2022-01-13  7:52 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Christian Brauner, linux-fsdevel, viro, ptikhomirov, linux-api



On 1/13/22 09:46, Aleksa Sarai wrote:
> On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
>> On 1/12/22 17:51, Christian Brauner wrote:
>>> On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
>>>> On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
>>>>> If you have an opened O_PATH file, currently there is no way to re-open
>>>>> it with other flags with openat/openat2. As a workaround it is possible
>>>>> to open it via /proc/self/fd/<X>, however
>>>>> 1) You need to ensure that /proc exists
>>>>> 2) You cannot use O_NOFOLLOW flag
>>>>
>>>> There is also another issue -- you can mount on top of magic-links so if
>>>> you're a container runtime that has been tricked into creating bad
>>>> mounts of top of /proc/ subdirectories there's no way of detecting that
>>>> this has happened. (Though I think in the long-term we will need to
>>>> make it possible for unprivileged users to create a procfs mountfd if
>>>> they have hidepid=4,subset=pids set -- there are loads of things
>>>> containers need to touch in procfs which can be overmounted in malicious
>>>> ways.)
>>>
>>> Yeah, though I see this as a less pressing issue for now. I'd rather
>>> postpone this and make userspace work a bit more. There are ways to
>>> design programs so you know that the procfs instance you're interacting
>>> with is the one you want to interact with without requiring unprivileged
>>> mounting outside of a userns+pidns+mountns pair. ;)
>>>
>>>>
>>>>> Both problems may look insignificant, but they are sensitive for CRIU.
>>>>> First of all, procfs may not be mounted in the namespace where we are
>>>>> restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
>>>>> flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
>>>>> file with this flag during restore.
>>>>>
>>>>> This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
>>>>> struct open_how and changes getname() call to getname_flags() to avoid
>>>>> ENOENT for empty filenames.
>>>>
>>>> This is something I've wanted to implement for a while, but from memory
>>>> we need to add some other protections in place before enabling this.
>>>>
>>>> The main one is disallowing re-opening of a path when it was originally
>>>> opened with a different set of modes. [1] is the patch I originally
>> I looked at this patch. However I am not able to reproduce the problem.
>> For example, I can't open /proc/self/exe as RDWR with the following:
>> fd1 = open(/proc/self/exe, O_PATH)
>> fd2 = open(/proc/self/fd/3, O_RDWR) <- error
>> or open file with incorrect flags via O_PATH to O_PATH fd from proc
>> This is fixed or did I understand this problem wrong?
> 
> You will get -ETXTBSY because the /proc/self/exe is still a current->mm
> of a process. What you need to do is have two processes (or fork+exec a
> process and do this):
> 
>   1. Grab the /proc/$pid/exe handle of the target process.
>   2. Wait for the target process to do an exec() of another program (or
>      exit).
>   3. *Then* re-open the fd with write permissions. This is allowed
>      because the file is no longer being used as the current->mm of a
> 	process and thus is treated like a regular file handle even though
> 	it was only ever resolveable through /proc/self/exe which should
> 	(semantically) only ever be readable.
> 
> This attack was used against runc in 2016 and a similar attack was
> possible with some later CVEs (I think there was also one against LXC at
> some point but I might be mistaken). There were other bugs which lead to
> this vector being usable, but my view is that this shouldn't have been
> possible in the first place.
> 
> I can cook up a simple example if the above description isn't explaining
> the issue thoroughly enough.
> 

Thanks for the explanation! I get it now

>>>> wrote as part of the openat2(2) (but I dropped it since I wasn't sure
>>>> whether it might break some systems in subtle ways -- though according
>>>> to my testing there wasn't an issue on any of my machines).
>>>
>>> Oh this is the discussion we had around turning an opath fd into a say
>>> O_RDWR fd, I think.
>>> So yes, I think restricting fd reopening makes sense. However, going
>>> from an O_PATH fd to e.g. an fd with O_RDWR does make sense and needs to
>>> be the default anyway. So we would need to implement this as a denylist
>>> anyway. The default is that opath fds can be reopened with whatever and
>>> only if the opath creator has restricted reopening will it fail, i.e.
>>> it's similar to a denylist.
>>>
>>> But this patch wouldn't prevent that or hinder the upgrade mask
>>> restriction afaict.
>>
>> This issue is actually more complicated than I thought.
>>
>> What do you think of the following:
>> 1. Add new O_EMPTYPATH constant
>> 2. When we open something with O_PATH, remember access flags (currently
>> we drop all flags in do_dentry_open() for O_PATH fds). This is similar
>> to Aleksa Sarai idea, but I do not think we should add some new fields,
>> because CRIU needs to be able to see it. Just leave access flags
>> untouched.
> 
> There are two problems with this:
> 
>   * The problem with this is that O_PATH and O_PATH|O_RDONLY are
>     identical. O_RDONLY is defined as 0. I guess by new fields you're

Yes, I didn't thought about that.

>     referring to what you'd get from fcntl(F_GETFL)?
> 
>     What you're suggesting here is the openat2() O_PATH access mask
>     stuff. That is a feature I think would be useful, but it's not
>     necessary to get O_EMPTYPATH working.
> 
>     If you really need to be able to get the O_PATH re-opening mask of a
>     file descriptor (which you probably do for CRIU) we can add that
>     information to F_GETFL or some other such interface.

That would be cool. In the patch I saw new FMODE_PATH_READ and
MODE_PATH_WRITE but there was no option to dump it.

> 
>   * We need to make sure that the default access modes of O_PATH on
>     magic links are correct. We can't simply allow any access mode in
>     that case, because if we do then we haven't really solved the
>     /proc/self/exe issue.
> 
>> 3. for openat(fd, "", O_EMPTYPATH | <access flags>) additionally check
>> access flags against the ones we remembered for O_PATH fd
> 
>   * We also need to add the same restrictions for opening through
>     /proc/self/fd/$n.
> 
>> This won't solve magiclinks problems but there at least will be API to
>> avoid procfs and which allow to add some restrictions.
> 
> I think the magic link problems need to be solved if we're going to
> enshrine this fd reopening behaviour by adding an O_* flag for it.
> Though of course this is already an issue with /proc/self/fd/$n
> re-opening.

I think these issues are close but still different. Probably we can make
three ideas from this discussion.
1. Add an O_EMPTYPATH flag to re-open O_PATH descriptor. This won't be
really a new feature (since we can already do it via /proc for most
cases). And also this won't break anything.
2. Add modes for magiclinks. This is more restrictive change. However I
don't think any non-malicious programs will do procfs shenanigans and
will be affected by this changes. This is the patch you sent some time
ago
3. Add an option to restrict O_PATH re-opening (maybe via fcntl?). And
make it apply if someone tries to do /proc workaround with this exact
O_PATH fd

> 
> However since I already have a patch which solves this issue, I can work
> on reviving it and re-send it.
>

Why not if it only makes it better

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-13  6:47         ` Aleksa Sarai
@ 2022-01-13 10:33           ` Christian Brauner
  0 siblings, 0 replies; 18+ messages in thread
From: Christian Brauner @ 2022-01-13 10:33 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Andrey Zhadchenko, Christian Brauner, linux-fsdevel, viro,
	ptikhomirov, linux-api

On Thu, Jan 13, 2022 at 05:47:51PM +1100, Aleksa Sarai wrote:
> On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> > 
> > 
> > On 1/12/22 17:53, Christian Brauner wrote:
> > > On Thu, Jan 13, 2022 at 01:43:31AM +1100, Aleksa Sarai wrote:
> > > > On 2022-01-12, Christian Brauner <christian.brauner@ubuntu.com> wrote:
> > > > > On Wed, Jan 12, 2022 at 12:02:17PM +0300, Andrey Zhadchenko wrote:
> > > > > > If you have an opened O_PATH file, currently there is no way to re-open
> > > > > > it with other flags with openat/openat2. As a workaround it is possible
> > > > > > to open it via /proc/self/fd/<X>, however
> > > > > > 1) You need to ensure that /proc exists
> > > > > > 2) You cannot use O_NOFOLLOW flag
> > > > > > 
> > > > > > Both problems may look insignificant, but they are sensitive for CRIU.
> > > > > 
> > > > > Not just CRIU. It's also an issue for systemd, LXD, and other users.
> > > > > (One old example is where we do need to sometimes stash an O_PATH fd to
> > > > > a /dev/pts/ptmx device and to actually perform an open on the device we
> > > > > reopen via /proc/<pid>/fd/<nr>.)
> > > > > 
> > > > > > First of all, procfs may not be mounted in the namespace where we are
> > > > > > restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
> > > > > > flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
> > > > > > file with this flag during restore.
> > > > > > 
> > > > > > This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> > > > > > struct open_how and changes getname() call to getname_flags() to avoid
> > > > > > ENOENT for empty filenames.
> > > > > 
> > > > >  From my perspective this makes sense and is something that would be
> > > > > very useful instead of having to hack around this via procfs.
> > > > > 
> > > > > However, e should consider adding RESOLVE_EMPTY_PATH since we already
> > > > > have AT_EMPTY_PATH. If we think this is workable we should try and reuse
> > > > > AT_EMPTY_PATH that keeps the api consistent with linkat(), readlinkat(),
> > > > > execveat(), statx(), open_tree(), mount_setattr() etc.
> > > > > 
> > > > > If AT_EMPTY_PATH doesn't conflict with another O_* flag one could make
> > > > > openat() support it too?
> > > > 
> > > > I would much prefer O_EMPTYPATH, in fact I think this is what I called
> > > > it in my first draft ages ago. RESOLVE_ is meant to be related to
> > > > resolution restrictions, not changing the opening mode.
> > > 
> > > That seems okay to me too. The advantage of AT_EMPTY_PATH is that we
> > > don't double down on the naming confusion, imho.
> > Unfortunately AT_EMPTY_PATH is 0x1000 which is O_DSYNC (octal 010000).
> > At first I thought to add new field in struct open_how for AT_* flags.
> > However most of them are irrelevant, except AT_SYMLINK_NOFOLLOW, which
> > duplicates RESOLVE flags, and maybe AT_NO_AUTOMOUNT.
> > O_EMPTYPATH idea seems cool
> 
> Yeah the issue is that openat/openat2 don't actually take AT_* flags and
> all of the constants conflict. I would prefer not mixing O_ and AT_
> flags in open (and I suspect Al would also prefer that).

If we can't reuse the value then it's not that important. But then we
should probably consider adding O_EMPTYPATH indeed. It doesn't make much
sense as a resolve flag (I think you mentioned that in an earlier mail
too.).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-13  6:46       ` Aleksa Sarai
  2022-01-13  7:52         ` Andrey Zhadchenko
@ 2022-01-13 14:05         ` Christian Brauner
  2022-01-13 14:44           ` Aleksa Sarai
  1 sibling, 1 reply; 18+ messages in thread
From: Christian Brauner @ 2022-01-13 14:05 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Andrey Zhadchenko, linux-fsdevel, viro, ptikhomirov, linux-api

On Thu, Jan 13, 2022 at 05:46:43PM +1100, Aleksa Sarai wrote:
> On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> > On 1/12/22 17:51, Christian Brauner wrote:
> > > On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
> > > > On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> > > > > If you have an opened O_PATH file, currently there is no way to re-open
> > > > > it with other flags with openat/openat2. As a workaround it is possible
> > > > > to open it via /proc/self/fd/<X>, however
> > > > > 1) You need to ensure that /proc exists
> > > > > 2) You cannot use O_NOFOLLOW flag
> > > > 
> > > > There is also another issue -- you can mount on top of magic-links so if
> > > > you're a container runtime that has been tricked into creating bad
> > > > mounts of top of /proc/ subdirectories there's no way of detecting that
> > > > this has happened. (Though I think in the long-term we will need to
> > > > make it possible for unprivileged users to create a procfs mountfd if
> > > > they have hidepid=4,subset=pids set -- there are loads of things
> > > > containers need to touch in procfs which can be overmounted in malicious
> > > > ways.)
> > > 
> > > Yeah, though I see this as a less pressing issue for now. I'd rather
> > > postpone this and make userspace work a bit more. There are ways to
> > > design programs so you know that the procfs instance you're interacting
> > > with is the one you want to interact with without requiring unprivileged
> > > mounting outside of a userns+pidns+mountns pair. ;)
> > > 
> > > > 
> > > > > Both problems may look insignificant, but they are sensitive for CRIU.
> > > > > First of all, procfs may not be mounted in the namespace where we are
> > > > > restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
> > > > > flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
> > > > > file with this flag during restore.
> > > > > 
> > > > > This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> > > > > struct open_how and changes getname() call to getname_flags() to avoid
> > > > > ENOENT for empty filenames.
> > > > 
> > > > This is something I've wanted to implement for a while, but from memory
> > > > we need to add some other protections in place before enabling this.
> > > > 
> > > > The main one is disallowing re-opening of a path when it was originally
> > > > opened with a different set of modes. [1] is the patch I originally
> > I looked at this patch. However I am not able to reproduce the problem.
> > For example, I can't open /proc/self/exe as RDWR with the following:
> > fd1 = open(/proc/self/exe, O_PATH)
> > fd2 = open(/proc/self/fd/3, O_RDWR) <- error
> > or open file with incorrect flags via O_PATH to O_PATH fd from proc
> > This is fixed or did I understand this problem wrong?
> 
> You will get -ETXTBSY because the /proc/self/exe is still a current->mm
> of a process. What you need to do is have two processes (or fork+exec a
> process and do this):

Note that not too long ago someone proposed to remove the -ETXTBSY
restriction and I argued against doing that in order to not make these
attacks easier.

> 
>  1. Grab the /proc/$pid/exe handle of the target process.
>  2. Wait for the target process to do an exec() of another program (or
>     exit).
>  3. *Then* re-open the fd with write permissions. This is allowed
>     because the file is no longer being used as the current->mm of a
> 	process and thus is treated like a regular file handle even though
> 	it was only ever resolveable through /proc/self/exe which should
> 	(semantically) only ever be readable.
> 
> This attack was used against runc in 2016 and a similar attack was
> possible with some later CVEs (I think there was also one against LXC at
> some point but I might be mistaken). There were other bugs which lead to

(IIrc, it only affects privileged containers and we did write the fix for this
together.)

> this vector being usable, but my view is that this shouldn't have been
> possible in the first place.
> 
> I can cook up a simple example if the above description isn't explaining
> the issue thoroughly enough.
> 
> > > > wrote as part of the openat2(2) (but I dropped it since I wasn't sure
> > > > whether it might break some systems in subtle ways -- though according
> > > > to my testing there wasn't an issue on any of my machines).
> > > 
> > > Oh this is the discussion we had around turning an opath fd into a say
> > > O_RDWR fd, I think.
> > > So yes, I think restricting fd reopening makes sense. However, going
> > > from an O_PATH fd to e.g. an fd with O_RDWR does make sense and needs to
> > > be the default anyway. So we would need to implement this as a denylist
> > > anyway. The default is that opath fds can be reopened with whatever and
> > > only if the opath creator has restricted reopening will it fail, i.e.
> > > it's similar to a denylist.
> > > 
> > > But this patch wouldn't prevent that or hinder the upgrade mask
> > > restriction afaict.
> > 
> > This issue is actually more complicated than I thought.
> > 
> > What do you think of the following:
> > 1. Add new O_EMPTYPATH constant
> > 2. When we open something with O_PATH, remember access flags (currently
> > we drop all flags in do_dentry_open() for O_PATH fds). This is similar
> > to Aleksa Sarai idea, but I do not think we should add some new fields,
> > because CRIU needs to be able to see it. Just leave access flags
> > untouched.
> 
> There are two problems with this:
> 
>  * The problem with this is that O_PATH and O_PATH|O_RDONLY are
>    identical. O_RDONLY is defined as 0. I guess by new fields you're
>    referring to what you'd get from fcntl(F_GETFL)?
> 
>    What you're suggesting here is the openat2() O_PATH access mask
>    stuff. That is a feature I think would be useful, but it's not
>    necessary to get O_EMPTYPATH working.

Yes, that's crucial to notice. I don't think we need to make the
patchsets dependent on each other which is what I mentioned in my
earlier mail.

> 
>    If you really need to be able to get the O_PATH re-opening mask of a
>    file descriptor (which you probably do for CRIU) we can add that
>    information to F_GETFL or some other such interface.

fcntl() would certainly be a sensible choice for that.

> 
>  * We need to make sure that the default access modes of O_PATH on
>    magic links are correct. We can't simply allow any access mode in
>    that case, because if we do then we haven't really solved the
>    /proc/self/exe issue.

Or alternatively we make O_EMPTYPATH not work on magic links.

> 
> > 3. for openat(fd, "", O_EMPTYPATH | <access flags>) additionally check
> > access flags against the ones we remembered for O_PATH fd
> 
>  * We also need to add the same restrictions for opening through
>    /proc/self/fd/$n.
> 
> > This won't solve magiclinks problems but there at least will be API to
> > avoid procfs and which allow to add some restrictions.
> 
> I think the magic link problems need to be solved if we're going to
> enshrine this fd reopening behaviour by adding an O_* flag for it.

As I understand it there are two naive options:
1. We do add O_EMPTYPATH before introducing upgrade masks. In this case
   O_EMPTYPATH would be unrestricted by default. Meaning, you can go
   from an O_PATH fd to an fd with any access mask.
   So after the introduction of upgrade masks, O_EMPTYPATH is restricted
   iff the O_PATH fd has opened with an upgrade mask.
2. We do introduce upgrade masks before introducing O_EMPTYPATH.
   Iiuc, we could then introduce O_EMPTYPATH in a more restricted
   manner such that O_EMPTYPATH will be restricted by default,
   i.e. it wouldn't allow reopening an O_PATH fd _unless_ an upgrade
   mask has been specified. The problem with this approach is that
   /proc/self/fd/$n would break that logic and we can't change that
   behavior without regressions.
   Additionally, this might  make O_EMPTYPATH less useful for some users
   in case they are not in control of the O_PATH fd handed to them
   and the opener has not opened the O_PATH fd with an upgrade mask.

So I think 2. is out of the question which means that we can add
O_EMPTYPATH first or add upgrade masks first or together; it wouldn't
really matter afaict.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-13 14:05         ` Christian Brauner
@ 2022-01-13 14:44           ` Aleksa Sarai
  0 siblings, 0 replies; 18+ messages in thread
From: Aleksa Sarai @ 2022-01-13 14:44 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrey Zhadchenko, linux-fsdevel, viro, ptikhomirov, linux-api

[-- Attachment #1: Type: text/plain, Size: 9411 bytes --]

On 2022-01-13, Christian Brauner <christian.brauner@ubuntu.com> wrote:
> On Thu, Jan 13, 2022 at 05:46:43PM +1100, Aleksa Sarai wrote:
> > On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> > > On 1/12/22 17:51, Christian Brauner wrote:
> > > > On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
> > > > > On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> > > > > > If you have an opened O_PATH file, currently there is no way to re-open
> > > > > > it with other flags with openat/openat2. As a workaround it is possible
> > > > > > to open it via /proc/self/fd/<X>, however
> > > > > > 1) You need to ensure that /proc exists
> > > > > > 2) You cannot use O_NOFOLLOW flag
> > > > > 
> > > > > There is also another issue -- you can mount on top of magic-links so if
> > > > > you're a container runtime that has been tricked into creating bad
> > > > > mounts of top of /proc/ subdirectories there's no way of detecting that
> > > > > this has happened. (Though I think in the long-term we will need to
> > > > > make it possible for unprivileged users to create a procfs mountfd if
> > > > > they have hidepid=4,subset=pids set -- there are loads of things
> > > > > containers need to touch in procfs which can be overmounted in malicious
> > > > > ways.)
> > > > 
> > > > Yeah, though I see this as a less pressing issue for now. I'd rather
> > > > postpone this and make userspace work a bit more. There are ways to
> > > > design programs so you know that the procfs instance you're interacting
> > > > with is the one you want to interact with without requiring unprivileged
> > > > mounting outside of a userns+pidns+mountns pair. ;)
> > > > 
> > > > > 
> > > > > > Both problems may look insignificant, but they are sensitive for CRIU.
> > > > > > First of all, procfs may not be mounted in the namespace where we are
> > > > > > restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
> > > > > > flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
> > > > > > file with this flag during restore.
> > > > > > 
> > > > > > This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> > > > > > struct open_how and changes getname() call to getname_flags() to avoid
> > > > > > ENOENT for empty filenames.
> > > > > 
> > > > > This is something I've wanted to implement for a while, but from memory
> > > > > we need to add some other protections in place before enabling this.
> > > > > 
> > > > > The main one is disallowing re-opening of a path when it was originally
> > > > > opened with a different set of modes. [1] is the patch I originally
> > > I looked at this patch. However I am not able to reproduce the problem.
> > > For example, I can't open /proc/self/exe as RDWR with the following:
> > > fd1 = open(/proc/self/exe, O_PATH)
> > > fd2 = open(/proc/self/fd/3, O_RDWR) <- error
> > > or open file with incorrect flags via O_PATH to O_PATH fd from proc
> > > This is fixed or did I understand this problem wrong?
> > 
> > You will get -ETXTBSY because the /proc/self/exe is still a current->mm
> > of a process. What you need to do is have two processes (or fork+exec a
> > process and do this):
> 
> Note that not too long ago someone proposed to remove the -ETXTBSY
> restriction and I argued against doing that in order to not make these
> attacks easier.
> 
> > 
> >  1. Grab the /proc/$pid/exe handle of the target process.
> >  2. Wait for the target process to do an exec() of another program (or
> >     exit).
> >  3. *Then* re-open the fd with write permissions. This is allowed
> >     because the file is no longer being used as the current->mm of a
> > 	process and thus is treated like a regular file handle even though
> > 	it was only ever resolveable through /proc/self/exe which should
> > 	(semantically) only ever be readable.
> > 
> > This attack was used against runc in 2016 and a similar attack was
> > possible with some later CVEs (I think there was also one against LXC at
> > some point but I might be mistaken). There were other bugs which lead to
> 
> (IIrc, it only affects privileged containers and we did write the fix for this
> together.)
> 
> > this vector being usable, but my view is that this shouldn't have been
> > possible in the first place.
> > 
> > I can cook up a simple example if the above description isn't explaining
> > the issue thoroughly enough.
> > 
> > > > > wrote as part of the openat2(2) (but I dropped it since I wasn't sure
> > > > > whether it might break some systems in subtle ways -- though according
> > > > > to my testing there wasn't an issue on any of my machines).
> > > > 
> > > > Oh this is the discussion we had around turning an opath fd into a say
> > > > O_RDWR fd, I think.
> > > > So yes, I think restricting fd reopening makes sense. However, going
> > > > from an O_PATH fd to e.g. an fd with O_RDWR does make sense and needs to
> > > > be the default anyway. So we would need to implement this as a denylist
> > > > anyway. The default is that opath fds can be reopened with whatever and
> > > > only if the opath creator has restricted reopening will it fail, i.e.
> > > > it's similar to a denylist.
> > > > 
> > > > But this patch wouldn't prevent that or hinder the upgrade mask
> > > > restriction afaict.
> > > 
> > > This issue is actually more complicated than I thought.
> > > 
> > > What do you think of the following:
> > > 1. Add new O_EMPTYPATH constant
> > > 2. When we open something with O_PATH, remember access flags (currently
> > > we drop all flags in do_dentry_open() for O_PATH fds). This is similar
> > > to Aleksa Sarai idea, but I do not think we should add some new fields,
> > > because CRIU needs to be able to see it. Just leave access flags
> > > untouched.
> > 
> > There are two problems with this:
> > 
> >  * The problem with this is that O_PATH and O_PATH|O_RDONLY are
> >    identical. O_RDONLY is defined as 0. I guess by new fields you're
> >    referring to what you'd get from fcntl(F_GETFL)?
> > 
> >    What you're suggesting here is the openat2() O_PATH access mask
> >    stuff. That is a feature I think would be useful, but it's not
> >    necessary to get O_EMPTYPATH working.
> 
> Yes, that's crucial to notice. I don't think we need to make the
> patchsets dependent on each other which is what I mentioned in my
> earlier mail.
> 
> > 
> >    If you really need to be able to get the O_PATH re-opening mask of a
> >    file descriptor (which you probably do for CRIU) we can add that
> >    information to F_GETFL or some other such interface.
> 
> fcntl() would certainly be a sensible choice for that.
> 
> > 
> >  * We need to make sure that the default access modes of O_PATH on
> >    magic links are correct. We can't simply allow any access mode in
> >    that case, because if we do then we haven't really solved the
> >    /proc/self/exe issue.
> 
> Or alternatively we make O_EMPTYPATH not work on magic links.
> 
> > 
> > > 3. for openat(fd, "", O_EMPTYPATH | <access flags>) additionally check
> > > access flags against the ones we remembered for O_PATH fd
> > 
> >  * We also need to add the same restrictions for opening through
> >    /proc/self/fd/$n.
> > 
> > > This won't solve magiclinks problems but there at least will be API to
> > > avoid procfs and which allow to add some restrictions.
> > 
> > I think the magic link problems need to be solved if we're going to
> > enshrine this fd reopening behaviour by adding an O_* flag for it.
> 
> As I understand it there are two naive options:
> 1. We do add O_EMPTYPATH before introducing upgrade masks. In this case
>    O_EMPTYPATH would be unrestricted by default. Meaning, you can go
>    from an O_PATH fd to an fd with any access mask.
>    So after the introduction of upgrade masks, O_EMPTYPATH is restricted
>    iff the O_PATH fd has opened with an upgrade mask.
> 2. We do introduce upgrade masks before introducing O_EMPTYPATH.
>    Iiuc, we could then introduce O_EMPTYPATH in a more restricted
>    manner such that O_EMPTYPATH will be restricted by default,
>    i.e. it wouldn't allow reopening an O_PATH fd _unless_ an upgrade
>    mask has been specified. The problem with this approach is that
>    /proc/self/fd/$n would break that logic and we can't change that
>    behavior without regressions.
>    Additionally, this might  make O_EMPTYPATH less useful for some users
>    in case they are not in control of the O_PATH fd handed to them
>    and the opener has not opened the O_PATH fd with an upgrade mask.
> 
> So I think 2. is out of the question which means that we can add
> O_EMPTYPATH first or add upgrade masks first or together; it wouldn't
> really matter afaict.

Yeah agreed. (1) was what I had planned originally. O_EMPTYPATH should
be identical semantically to /proc/self/fd/$n (both for the API's sanity
and our own). It also makes more sense to make upgrade_mask a deny list
because it makes it simpler to add new deny modes in the future (such as
denying exec permissions -- which isn't currently possible).

And yes, we should add upgrade_masks later.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-13  7:52         ` Andrey Zhadchenko
@ 2022-01-14  4:24           ` Andrey Zhadchenko
  2022-01-14  4:28             ` Aleksa Sarai
  0 siblings, 1 reply; 18+ messages in thread
From: Andrey Zhadchenko @ 2022-01-14  4:24 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Christian Brauner, linux-fsdevel, viro, ptikhomirov, linux-api



On 1/13/22 10:52, Andrey Zhadchenko wrote:
> 
> 
> On 1/13/22 09:46, Aleksa Sarai wrote:
>> On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
>>> On 1/12/22 17:51, Christian Brauner wrote:
>>>> On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
>>>>> On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> 
>>>>> wrote:
>>>>>> If you have an opened O_PATH file, currently there is no way to 
>>>>>> re-open
>>>>>> it with other flags with openat/openat2. As a workaround it is 
>>>>>> possible
>>>>>> to open it via /proc/self/fd/<X>, however
>>>>>> 1) You need to ensure that /proc exists
>>>>>> 2) You cannot use O_NOFOLLOW flag
>>>>>
>>>>> There is also another issue -- you can mount on top of magic-links 
>>>>> so if
>>>>> you're a container runtime that has been tricked into creating bad
>>>>> mounts of top of /proc/ subdirectories there's no way of detecting 
>>>>> that
>>>>> this has happened. (Though I think in the long-term we will need to
>>>>> make it possible for unprivileged users to create a procfs mountfd if
>>>>> they have hidepid=4,subset=pids set -- there are loads of things
>>>>> containers need to touch in procfs which can be overmounted in 
>>>>> malicious
>>>>> ways.)
>>>>
>>>> Yeah, though I see this as a less pressing issue for now. I'd rather
>>>> postpone this and make userspace work a bit more. There are ways to
>>>> design programs so you know that the procfs instance you're interacting
>>>> with is the one you want to interact with without requiring 
>>>> unprivileged
>>>> mounting outside of a userns+pidns+mountns pair. ;)
>>>>
>>>>>
>>>>>> Both problems may look insignificant, but they are sensitive for 
>>>>>> CRIU.
>>>>>> First of all, procfs may not be mounted in the namespace where we are
>>>>>> restoring the process. Secondly, if someone opens a file with 
>>>>>> O_NOFOLLOW
>>>>>> flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also 
>>>>>> open the
>>>>>> file with this flag during restore.
>>>>>>
>>>>>> This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
>>>>>> struct open_how and changes getname() call to getname_flags() to 
>>>>>> avoid
>>>>>> ENOENT for empty filenames.
>>>>>
>>>>> This is something I've wanted to implement for a while, but from 
>>>>> memory
>>>>> we need to add some other protections in place before enabling this.
>>>>>
>>>>> The main one is disallowing re-opening of a path when it was 
>>>>> originally
>>>>> opened with a different set of modes. [1] is the patch I originally
>>> I looked at this patch. However I am not able to reproduce the problem.
>>> For example, I can't open /proc/self/exe as RDWR with the following:
>>> fd1 = open(/proc/self/exe, O_PATH)
>>> fd2 = open(/proc/self/fd/3, O_RDWR) <- error
>>> or open file with incorrect flags via O_PATH to O_PATH fd from proc
>>> This is fixed or did I understand this problem wrong?
>>
>> You will get -ETXTBSY because the /proc/self/exe is still a current->mm
>> of a process. What you need to do is have two processes (or fork+exec a
>> process and do this):
>>
>>   1. Grab the /proc/$pid/exe handle of the target process.
>>   2. Wait for the target process to do an exec() of another program (or
>>      exit).
>>   3. *Then* re-open the fd with write permissions. This is allowed
>>      because the file is no longer being used as the current->mm of a
>>     process and thus is treated like a regular file handle even though
>>     it was only ever resolveable through /proc/self/exe which should
>>     (semantically) only ever be readable.
>>
>> This attack was used against runc in 2016 and a similar attack was
>> possible with some later CVEs (I think there was also one against LXC at
>> some point but I might be mistaken). There were other bugs which lead to
>> this vector being usable, but my view is that this shouldn't have been
>> possible in the first place.
>>
>> I can cook up a simple example if the above description isn't explaining
>> the issue thoroughly enough.
>>
> 
> Thanks for the explanation! I get it now
> 
>>>>> wrote as part of the openat2(2) (but I dropped it since I wasn't sure
>>>>> whether it might break some systems in subtle ways -- though according
>>>>> to my testing there wasn't an issue on any of my machines).
>>>>
>>>> Oh this is the discussion we had around turning an opath fd into a say
>>>> O_RDWR fd, I think.
>>>> So yes, I think restricting fd reopening makes sense. However, going
>>>> from an O_PATH fd to e.g. an fd with O_RDWR does make sense and 
>>>> needs to
>>>> be the default anyway. So we would need to implement this as a denylist
>>>> anyway. The default is that opath fds can be reopened with whatever and
>>>> only if the opath creator has restricted reopening will it fail, i.e.
>>>> it's similar to a denylist.
>>>>
>>>> But this patch wouldn't prevent that or hinder the upgrade mask
>>>> restriction afaict.
>>>
>>> This issue is actually more complicated than I thought.
>>>
>>> What do you think of the following:
>>> 1. Add new O_EMPTYPATH constant
>>> 2. When we open something with O_PATH, remember access flags (currently
>>> we drop all flags in do_dentry_open() for O_PATH fds). This is similar
>>> to Aleksa Sarai idea, but I do not think we should add some new fields,
>>> because CRIU needs to be able to see it. Just leave access flags
>>> untouched.
>>
>> There are two problems with this:
>>
>>   * The problem with this is that O_PATH and O_PATH|O_RDONLY are
>>     identical. O_RDONLY is defined as 0. I guess by new fields you're
> 
> Yes, I didn't thought about that.
> 
>>     referring to what you'd get from fcntl(F_GETFL)?
>>
>>     What you're suggesting here is the openat2() O_PATH access mask
>>     stuff. That is a feature I think would be useful, but it's not
>>     necessary to get O_EMPTYPATH working.
>>
>>     If you really need to be able to get the O_PATH re-opening mask of a
>>     file descriptor (which you probably do for CRIU) we can add that
>>     information to F_GETFL or some other such interface.
> 
> That would be cool. In the patch I saw new FMODE_PATH_READ and
> MODE_PATH_WRITE but there was no option to dump it.
> 
>>
>>   * We need to make sure that the default access modes of O_PATH on
>>     magic links are correct. We can't simply allow any access mode in
>>     that case, because if we do then we haven't really solved the
>>     /proc/self/exe issue.
>>
>>> 3. for openat(fd, "", O_EMPTYPATH | <access flags>) additionally check
>>> access flags against the ones we remembered for O_PATH fd
>>
>>   * We also need to add the same restrictions for opening through
>>     /proc/self/fd/$n.
>>
>>> This won't solve magiclinks problems but there at least will be API to
>>> avoid procfs and which allow to add some restrictions.
>>
>> I think the magic link problems need to be solved if we're going to
>> enshrine this fd reopening behaviour by adding an O_* flag for it.
>> Though of course this is already an issue with /proc/self/fd/$n
>> re-opening.
> 
> I think these issues are close but still different. Probably we can make
> three ideas from this discussion.
> 1. Add an O_EMPTYPATH flag to re-open O_PATH descriptor. This won't be
> really a new feature (since we can already do it via /proc for most
> cases). And also this won't break anything.
> 2. Add modes for magiclinks. This is more restrictive change. However I
> don't think any non-malicious programs will do procfs shenanigans and
> will be affected by this changes. This is the patch you sent some time
> ago

Oops, I didn't notice third patch in you series "open: O_EMPTYPATH: 
procfs-less file descriptor re-opening". This is exactly what I tried to
do.
It will be very cool if you resurrect and re-send magic-links
adjustments and O_EMPTYPATH.


> 3. Add an option to restrict O_PATH re-opening (maybe via fcntl?). And
> make it apply if someone tries to do /proc workaround with this exact
> O_PATH fd
> 
>>
>> However since I already have a patch which solves this issue, I can work
>> on reviving it and re-send it.
>>
> 
> Why not if it only makes it better

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-14  4:24           ` Andrey Zhadchenko
@ 2022-01-14  4:28             ` Aleksa Sarai
  2022-01-17  6:35               ` Andrey Zhadchenko
  0 siblings, 1 reply; 18+ messages in thread
From: Aleksa Sarai @ 2022-01-14  4:28 UTC (permalink / raw)
  To: Andrey Zhadchenko
  Cc: Christian Brauner, linux-fsdevel, viro, ptikhomirov, linux-api

[-- Attachment #1: Type: text/plain, Size: 9820 bytes --]

On 2022-01-14, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> 
> 
> On 1/13/22 10:52, Andrey Zhadchenko wrote:
> > 
> > 
> > On 1/13/22 09:46, Aleksa Sarai wrote:
> > > On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
> > > > On 1/12/22 17:51, Christian Brauner wrote:
> > > > > On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
> > > > > > On 2022-01-12, Andrey Zhadchenko
> > > > > > <andrey.zhadchenko@virtuozzo.com> wrote:
> > > > > > > If you have an opened O_PATH file, currently there
> > > > > > > is no way to re-open
> > > > > > > it with other flags with openat/openat2. As a
> > > > > > > workaround it is possible
> > > > > > > to open it via /proc/self/fd/<X>, however
> > > > > > > 1) You need to ensure that /proc exists
> > > > > > > 2) You cannot use O_NOFOLLOW flag
> > > > > > 
> > > > > > There is also another issue -- you can mount on top of
> > > > > > magic-links so if
> > > > > > you're a container runtime that has been tricked into creating bad
> > > > > > mounts of top of /proc/ subdirectories there's no way of
> > > > > > detecting that
> > > > > > this has happened. (Though I think in the long-term we will need to
> > > > > > make it possible for unprivileged users to create a procfs mountfd if
> > > > > > they have hidepid=4,subset=pids set -- there are loads of things
> > > > > > containers need to touch in procfs which can be
> > > > > > overmounted in malicious
> > > > > > ways.)
> > > > > 
> > > > > Yeah, though I see this as a less pressing issue for now. I'd rather
> > > > > postpone this and make userspace work a bit more. There are ways to
> > > > > design programs so you know that the procfs instance you're interacting
> > > > > with is the one you want to interact with without requiring
> > > > > unprivileged
> > > > > mounting outside of a userns+pidns+mountns pair. ;)
> > > > > 
> > > > > > 
> > > > > > > Both problems may look insignificant, but they are
> > > > > > > sensitive for CRIU.
> > > > > > > First of all, procfs may not be mounted in the namespace where we are
> > > > > > > restoring the process. Secondly, if someone opens a
> > > > > > > file with O_NOFOLLOW
> > > > > > > flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU
> > > > > > > must also open the
> > > > > > > file with this flag during restore.
> > > > > > > 
> > > > > > > This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> > > > > > > struct open_how and changes getname() call to
> > > > > > > getname_flags() to avoid
> > > > > > > ENOENT for empty filenames.
> > > > > > 
> > > > > > This is something I've wanted to implement for a while,
> > > > > > but from memory
> > > > > > we need to add some other protections in place before enabling this.
> > > > > > 
> > > > > > The main one is disallowing re-opening of a path when it
> > > > > > was originally
> > > > > > opened with a different set of modes. [1] is the patch I originally
> > > > I looked at this patch. However I am not able to reproduce the problem.
> > > > For example, I can't open /proc/self/exe as RDWR with the following:
> > > > fd1 = open(/proc/self/exe, O_PATH)
> > > > fd2 = open(/proc/self/fd/3, O_RDWR) <- error
> > > > or open file with incorrect flags via O_PATH to O_PATH fd from proc
> > > > This is fixed or did I understand this problem wrong?
> > > 
> > > You will get -ETXTBSY because the /proc/self/exe is still a current->mm
> > > of a process. What you need to do is have two processes (or fork+exec a
> > > process and do this):
> > > 
> > >   1. Grab the /proc/$pid/exe handle of the target process.
> > >   2. Wait for the target process to do an exec() of another program (or
> > >      exit).
> > >   3. *Then* re-open the fd with write permissions. This is allowed
> > >      because the file is no longer being used as the current->mm of a
> > >     process and thus is treated like a regular file handle even though
> > >     it was only ever resolveable through /proc/self/exe which should
> > >     (semantically) only ever be readable.
> > > 
> > > This attack was used against runc in 2016 and a similar attack was
> > > possible with some later CVEs (I think there was also one against LXC at
> > > some point but I might be mistaken). There were other bugs which lead to
> > > this vector being usable, but my view is that this shouldn't have been
> > > possible in the first place.
> > > 
> > > I can cook up a simple example if the above description isn't explaining
> > > the issue thoroughly enough.
> > > 
> > 
> > Thanks for the explanation! I get it now
> > 
> > > > > > wrote as part of the openat2(2) (but I dropped it since I wasn't sure
> > > > > > whether it might break some systems in subtle ways -- though according
> > > > > > to my testing there wasn't an issue on any of my machines).
> > > > > 
> > > > > Oh this is the discussion we had around turning an opath fd into a say
> > > > > O_RDWR fd, I think.
> > > > > So yes, I think restricting fd reopening makes sense. However, going
> > > > > from an O_PATH fd to e.g. an fd with O_RDWR does make sense
> > > > > and needs to
> > > > > be the default anyway. So we would need to implement this as a denylist
> > > > > anyway. The default is that opath fds can be reopened with whatever and
> > > > > only if the opath creator has restricted reopening will it fail, i.e.
> > > > > it's similar to a denylist.
> > > > > 
> > > > > But this patch wouldn't prevent that or hinder the upgrade mask
> > > > > restriction afaict.
> > > > 
> > > > This issue is actually more complicated than I thought.
> > > > 
> > > > What do you think of the following:
> > > > 1. Add new O_EMPTYPATH constant
> > > > 2. When we open something with O_PATH, remember access flags (currently
> > > > we drop all flags in do_dentry_open() for O_PATH fds). This is similar
> > > > to Aleksa Sarai idea, but I do not think we should add some new fields,
> > > > because CRIU needs to be able to see it. Just leave access flags
> > > > untouched.
> > > 
> > > There are two problems with this:
> > > 
> > >   * The problem with this is that O_PATH and O_PATH|O_RDONLY are
> > >     identical. O_RDONLY is defined as 0. I guess by new fields you're
> > 
> > Yes, I didn't thought about that.
> > 
> > >     referring to what you'd get from fcntl(F_GETFL)?
> > > 
> > >     What you're suggesting here is the openat2() O_PATH access mask
> > >     stuff. That is a feature I think would be useful, but it's not
> > >     necessary to get O_EMPTYPATH working.
> > > 
> > >     If you really need to be able to get the O_PATH re-opening mask of a
> > >     file descriptor (which you probably do for CRIU) we can add that
> > >     information to F_GETFL or some other such interface.
> > 
> > That would be cool. In the patch I saw new FMODE_PATH_READ and
> > MODE_PATH_WRITE but there was no option to dump it.
> > 
> > > 
> > >   * We need to make sure that the default access modes of O_PATH on
> > >     magic links are correct. We can't simply allow any access mode in
> > >     that case, because if we do then we haven't really solved the
> > >     /proc/self/exe issue.
> > > 
> > > > 3. for openat(fd, "", O_EMPTYPATH | <access flags>) additionally check
> > > > access flags against the ones we remembered for O_PATH fd
> > > 
> > >   * We also need to add the same restrictions for opening through
> > >     /proc/self/fd/$n.
> > > 
> > > > This won't solve magiclinks problems but there at least will be API to
> > > > avoid procfs and which allow to add some restrictions.
> > > 
> > > I think the magic link problems need to be solved if we're going to
> > > enshrine this fd reopening behaviour by adding an O_* flag for it.
> > > Though of course this is already an issue with /proc/self/fd/$n
> > > re-opening.
> > 
> > I think these issues are close but still different. Probably we can make
> > three ideas from this discussion.
> > 1. Add an O_EMPTYPATH flag to re-open O_PATH descriptor. This won't be
> > really a new feature (since we can already do it via /proc for most
> > cases). And also this won't break anything.
> > 2. Add modes for magiclinks. This is more restrictive change. However I
> > don't think any non-malicious programs will do procfs shenanigans and
> > will be affected by this changes. This is the patch you sent some time
> > ago
> 
> Oops, I didn't notice third patch in you series "open: O_EMPTYPATH:
> procfs-less file descriptor re-opening". This is exactly what I tried to
> do.
> It will be very cool if you resurrect and re-send magic-links
> adjustments and O_EMPTYPATH.

I'll rebase it (adding a way to dump the reopening mask for O_PATH
descriptors) and send it next week (assuming it doesn't require too
much tweaking).

It should be noted that on paper you can get the reopening mask with the
current version of the patchset (look at the mode of the magic link in
/proc/self/fd/$n) but that's obviously not a reasonable solution.

> > 3. Add an option to restrict O_PATH re-opening (maybe via fcntl?). And
> > make it apply if someone tries to do /proc workaround with this exact
> > O_PATH fd

I originally wanted to do this in openat2() since it feels analogous to
open modes for regular file descriptors (in fact I planned to make
how->mode a union with how->upgrade_mask) but I'll need to think about
how to expose that in fcntl().

> > > However since I already have a patch which solves this issue, I can work
> > > on reviving it and re-send it.
> > 
> > Why not if it only makes it better

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2
  2022-01-14  4:28             ` Aleksa Sarai
@ 2022-01-17  6:35               ` Andrey Zhadchenko
  0 siblings, 0 replies; 18+ messages in thread
From: Andrey Zhadchenko @ 2022-01-17  6:35 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Christian Brauner, linux-fsdevel, viro, ptikhomirov, linux-api



On 1/14/22 07:28, Aleksa Sarai wrote:
> On 2022-01-14, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
>>
>>
>> On 1/13/22 10:52, Andrey Zhadchenko wrote:
>>>
>>>
>>> On 1/13/22 09:46, Aleksa Sarai wrote:
>>>> On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> wrote:
>>>>> On 1/12/22 17:51, Christian Brauner wrote:
>>>>>> On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
>>>>>>> On 2022-01-12, Andrey Zhadchenko
>>>>>>> <andrey.zhadchenko@virtuozzo.com> wrote:
>>>>>>>> If you have an opened O_PATH file, currently there
>>>>>>>> is no way to re-open
>>>>>>>> it with other flags with openat/openat2. As a
>>>>>>>> workaround it is possible
>>>>>>>> to open it via /proc/self/fd/<X>, however
>>>>>>>> 1) You need to ensure that /proc exists
>>>>>>>> 2) You cannot use O_NOFOLLOW flag
>>>>>>>
>>>>>>> There is also another issue -- you can mount on top of
>>>>>>> magic-links so if
>>>>>>> you're a container runtime that has been tricked into creating bad
>>>>>>> mounts of top of /proc/ subdirectories there's no way of
>>>>>>> detecting that
>>>>>>> this has happened. (Though I think in the long-term we will need to
>>>>>>> make it possible for unprivileged users to create a procfs mountfd if
>>>>>>> they have hidepid=4,subset=pids set -- there are loads of things
>>>>>>> containers need to touch in procfs which can be
>>>>>>> overmounted in malicious
>>>>>>> ways.)
>>>>>>
>>>>>> Yeah, though I see this as a less pressing issue for now. I'd rather
>>>>>> postpone this and make userspace work a bit more. There are ways to
>>>>>> design programs so you know that the procfs instance you're interacting
>>>>>> with is the one you want to interact with without requiring
>>>>>> unprivileged
>>>>>> mounting outside of a userns+pidns+mountns pair. ;)
>>>>>>
>>>>>>>
>>>>>>>> Both problems may look insignificant, but they are
>>>>>>>> sensitive for CRIU.
>>>>>>>> First of all, procfs may not be mounted in the namespace where we are
>>>>>>>> restoring the process. Secondly, if someone opens a
>>>>>>>> file with O_NOFOLLOW
>>>>>>>> flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU
>>>>>>>> must also open the
>>>>>>>> file with this flag during restore.
>>>>>>>>
>>>>>>>> This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
>>>>>>>> struct open_how and changes getname() call to
>>>>>>>> getname_flags() to avoid
>>>>>>>> ENOENT for empty filenames.
>>>>>>>
>>>>>>> This is something I've wanted to implement for a while,
>>>>>>> but from memory
>>>>>>> we need to add some other protections in place before enabling this.
>>>>>>>
>>>>>>> The main one is disallowing re-opening of a path when it
>>>>>>> was originally
>>>>>>> opened with a different set of modes. [1] is the patch I originally
>>>>> I looked at this patch. However I am not able to reproduce the problem.
>>>>> For example, I can't open /proc/self/exe as RDWR with the following:
>>>>> fd1 = open(/proc/self/exe, O_PATH)
>>>>> fd2 = open(/proc/self/fd/3, O_RDWR) <- error
>>>>> or open file with incorrect flags via O_PATH to O_PATH fd from proc
>>>>> This is fixed or did I understand this problem wrong?
>>>>
>>>> You will get -ETXTBSY because the /proc/self/exe is still a current->mm
>>>> of a process. What you need to do is have two processes (or fork+exec a
>>>> process and do this):
>>>>
>>>>    1. Grab the /proc/$pid/exe handle of the target process.
>>>>    2. Wait for the target process to do an exec() of another program (or
>>>>       exit).
>>>>    3. *Then* re-open the fd with write permissions. This is allowed
>>>>       because the file is no longer being used as the current->mm of a
>>>>      process and thus is treated like a regular file handle even though
>>>>      it was only ever resolveable through /proc/self/exe which should
>>>>      (semantically) only ever be readable.
>>>>
>>>> This attack was used against runc in 2016 and a similar attack was
>>>> possible with some later CVEs (I think there was also one against LXC at
>>>> some point but I might be mistaken). There were other bugs which lead to
>>>> this vector being usable, but my view is that this shouldn't have been
>>>> possible in the first place.
>>>>
>>>> I can cook up a simple example if the above description isn't explaining
>>>> the issue thoroughly enough.
>>>>
>>>
>>> Thanks for the explanation! I get it now
>>>
>>>>>>> wrote as part of the openat2(2) (but I dropped it since I wasn't sure
>>>>>>> whether it might break some systems in subtle ways -- though according
>>>>>>> to my testing there wasn't an issue on any of my machines).
>>>>>>
>>>>>> Oh this is the discussion we had around turning an opath fd into a say
>>>>>> O_RDWR fd, I think.
>>>>>> So yes, I think restricting fd reopening makes sense. However, going
>>>>>> from an O_PATH fd to e.g. an fd with O_RDWR does make sense
>>>>>> and needs to
>>>>>> be the default anyway. So we would need to implement this as a denylist
>>>>>> anyway. The default is that opath fds can be reopened with whatever and
>>>>>> only if the opath creator has restricted reopening will it fail, i.e.
>>>>>> it's similar to a denylist.
>>>>>>
>>>>>> But this patch wouldn't prevent that or hinder the upgrade mask
>>>>>> restriction afaict.
>>>>>
>>>>> This issue is actually more complicated than I thought.
>>>>>
>>>>> What do you think of the following:
>>>>> 1. Add new O_EMPTYPATH constant
>>>>> 2. When we open something with O_PATH, remember access flags (currently
>>>>> we drop all flags in do_dentry_open() for O_PATH fds). This is similar
>>>>> to Aleksa Sarai idea, but I do not think we should add some new fields,
>>>>> because CRIU needs to be able to see it. Just leave access flags
>>>>> untouched.
>>>>
>>>> There are two problems with this:
>>>>
>>>>    * The problem with this is that O_PATH and O_PATH|O_RDONLY are
>>>>      identical. O_RDONLY is defined as 0. I guess by new fields you're
>>>
>>> Yes, I didn't thought about that.
>>>
>>>>      referring to what you'd get from fcntl(F_GETFL)?
>>>>
>>>>      What you're suggesting here is the openat2() O_PATH access mask
>>>>      stuff. That is a feature I think would be useful, but it's not
>>>>      necessary to get O_EMPTYPATH working.
>>>>
>>>>      If you really need to be able to get the O_PATH re-opening mask of a
>>>>      file descriptor (which you probably do for CRIU) we can add that
>>>>      information to F_GETFL or some other such interface.
>>>
>>> That would be cool. In the patch I saw new FMODE_PATH_READ and
>>> MODE_PATH_WRITE but there was no option to dump it.
>>>
>>>>
>>>>    * We need to make sure that the default access modes of O_PATH on
>>>>      magic links are correct. We can't simply allow any access mode in
>>>>      that case, because if we do then we haven't really solved the
>>>>      /proc/self/exe issue.
>>>>
>>>>> 3. for openat(fd, "", O_EMPTYPATH | <access flags>) additionally check
>>>>> access flags against the ones we remembered for O_PATH fd
>>>>
>>>>    * We also need to add the same restrictions for opening through
>>>>      /proc/self/fd/$n.
>>>>
>>>>> This won't solve magiclinks problems but there at least will be API to
>>>>> avoid procfs and which allow to add some restrictions.
>>>>
>>>> I think the magic link problems need to be solved if we're going to
>>>> enshrine this fd reopening behaviour by adding an O_* flag for it.
>>>> Though of course this is already an issue with /proc/self/fd/$n
>>>> re-opening.
>>>
>>> I think these issues are close but still different. Probably we can make
>>> three ideas from this discussion.
>>> 1. Add an O_EMPTYPATH flag to re-open O_PATH descriptor. This won't be
>>> really a new feature (since we can already do it via /proc for most
>>> cases). And also this won't break anything.
>>> 2. Add modes for magiclinks. This is more restrictive change. However I
>>> don't think any non-malicious programs will do procfs shenanigans and
>>> will be affected by this changes. This is the patch you sent some time
>>> ago
>>
>> Oops, I didn't notice third patch in you series "open: O_EMPTYPATH:
>> procfs-less file descriptor re-opening". This is exactly what I tried to
>> do.
>> It will be very cool if you resurrect and re-send magic-links
>> adjustments and O_EMPTYPATH.
> 
> I'll rebase it (adding a way to dump the reopening mask for O_PATH
> descriptors) and send it next week (assuming it doesn't require too
> much tweaking).

Thanks!

> 
> It should be noted that on paper you can get the reopening mask with the
> current version of the patchset (look at the mode of the magic link in
> /proc/self/fd/$n) but that's obviously not a reasonable solution.
> 
>>> 3. Add an option to restrict O_PATH re-opening (maybe via fcntl?). And
>>> make it apply if someone tries to do /proc workaround with this exact
>>> O_PATH fd
> 
> I originally wanted to do this in openat2() since it feels analogous to
> open modes for regular file descriptors (in fact I planned to make
> how->mode a union with how->upgrade_mask) but I'll need to think about
> how to expose that in fcntl().
> 

Probably new openat2() field is even better. We do not really need new
F_SETFL for O_PATH descriptors, because you would always re-open it with
O_EMPTYPATH again with stricter flags. As for exposing this flags for
CRIU, new line in /proc/<PID>/fdinfo/<N> is fine (and preferable even
if you add new fcntl())

>>>> However since I already have a patch which solves this issue, I can work
>>>> on reviving it and re-send it.
>>>
>>> Why not if it only makes it better
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-01-17  6:36 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-12  9:02 [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2 Andrey Zhadchenko
2022-01-12 14:34 ` Aleksa Sarai
2022-01-12 14:51   ` Christian Brauner
2022-01-12 18:56     ` Andrey Zhadchenko
2022-01-13  6:46       ` Aleksa Sarai
2022-01-13  7:52         ` Andrey Zhadchenko
2022-01-14  4:24           ` Andrey Zhadchenko
2022-01-14  4:28             ` Aleksa Sarai
2022-01-17  6:35               ` Andrey Zhadchenko
2022-01-13 14:05         ` Christian Brauner
2022-01-13 14:44           ` Aleksa Sarai
2022-01-13  6:55     ` Aleksa Sarai
2022-01-12 14:39 ` Christian Brauner
2022-01-12 14:43   ` Aleksa Sarai
2022-01-12 14:53     ` Christian Brauner
2022-01-12 17:45       ` Andrey Zhadchenko
2022-01-13  6:47         ` Aleksa Sarai
2022-01-13 10:33           ` Christian Brauner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.