From: Aleksa Sarai <cyphar@cyphar.com>
To: Giuseppe Scrivano <gscrivan@redhat.com>
Cc: linux-kernel@vger.kernel.org, keescook@chromium.org,
	bristot@redhat.com, ebiederm@xmission.com, brauner@kernel.org,
	viro@zeniv.linux.org.uk, alexl@redhat.com, peterz@infradead.org,
	bmasney@redhat.com
Subject: Re: [PATCH v3 1/2] exec: add PR_HIDE_SELF_EXE prctl
Date: Thu, 26 Jan 2023 02:28:47 +1100
Message-ID: <20230125152847.wr443tggzb3no6mg@senku>
In-Reply-To: <87h6wgcrv6.fsf@redhat.com>

On 2023-01-24, Giuseppe Scrivano <gscrivan@redhat.com> wrote:
> Aleksa Sarai <cyphar@cyphar.com> writes:
> 
> > On 2023-01-20, Giuseppe Scrivano <gscrivan@redhat.com> wrote:
> >> This patch adds a new prctl called PR_HIDE_SELF_EXE which allows
> >> processes to hide their own /proc/*/exe file. When this prctl is
> >> used, every access to /proc/*/exe for the calling process will
> >> fail with ENOENT.
> >> 
> >> This is useful for preventing issues like CVE-2019-5736, where an
> >> attacker can gain host root access by overwriting the binary
> >> in OCI runtimes through file-descriptor mishandling in containers.
> >> 
> >> The current fix for CVE-2019-5736 is to create a read-only copy or
> >> a bind-mount of the current executable, and then re-exec the current
> >> process.  With the new prctl, the read-only copy or bind-mount copy is
> >> not needed anymore.
> >> 
> >> While map_files/ might also contain symlinks to files on the host,
> >> the permission checks in proc_map_files_get_link() are already sufficient.
> >
> > I suspect this doesn't protect against the execve("/proc/self/exe")
> > tactic (because it clears the bit on execve), so I'm not sure this is
> > much safer than PR_SET_DUMPABLE (yeah, it stops root in the source
> > userns from accessing /proc/$pid/exe but the above attack makes that no
> > longer that important).
> 
> it protects against that attack too.  It clears the bit _after_ the
> execve() syscall is done.
> 
> If you attempt execve("/proc/self/exe") you still get ENOENT:
> 
> ```
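> #include <stdlib.h>
> #include <stdio.h>
> #include <sys/prctl.h>
> #include <unistd.h>
> 
> /* Value taken from this patch's include/uapi/linux/prctl.h hunk. */
> #ifndef PR_SET_HIDE_SELF_EXE
> #define PR_SET_HIDE_SELF_EXE 65
> #endif
> 
> int main(void)
> {
>         int ret;
> 
>         /* Hide /proc/self/exe for the calling process. */
>         ret = prctl(PR_SET_HIDE_SELF_EXE, 1, 0, 0, 0);
>         if (ret != 0)
>                 exit(1);
> 
>         /* Expected to fail with ENOENT, so exit(2) should be reached. */
>         execl("/proc/self/exe", "foo", (char *)NULL);
>         exit(2);
> }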
> ```
> 
> # strace -e prctl,execve ./hide-self-exe
> execve("./hide-self-exe", ["./hide-self-exe"], 0x7fff975a3690 /* 39 vars */) = 0
> prctl(0x41 /* PR_??? */, 0x1, 0, 0, 0)  = 0
> execve("/proc/self/exe", ["foo"], 0x7ffcf51868b8 /* 39 vars */) = -1 ENOENT (No such file or directory)
> +++ exited with 2 +++
> 
> I've also tried execv'ing with a script that uses "#!/proc/self/exe" and
> I get the same ENOENT.

Ah, you're right. As you mentioned, you could still do the attack
through /proc/self/map_files but that would require you to know where
the binary will be located (and being non-dumpable blocks container
processes from doing tricks to get the right path).

I wonder if we should somehow require (or auto-apply) SUID_DUMP_DISABLE
when setting this prctl, since it currently depends on that to be
properly secure...
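A minimal userspace sketch of the "require" variant, assuming a runtime makes itself non-dumpable before calling the new prctl. make_nondumpable() is a hypothetical helper; PR_SET_DUMPABLE and PR_GET_DUMPABLE are the existing prctl(2) options, where 0 corresponds to SUID_DUMP_DISABLE. It deliberately does not call the proposed hide prctl, since 65 is only this patch's proposed value and may mean something else on other kernels:

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: mark the calling process non-dumpable
 * (PR_SET_DUMPABLE with 0, i.e. SUID_DUMP_DISABLE) and confirm that the
 * kernel reports the new state. Returns 0 on success, -1 otherwise.
 */
int make_nondumpable(void)
{
        if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0)
                return -1;

        /* PR_GET_DUMPABLE returns the current value as its result. */
        return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 0 ? 0 : -1;
}
```

The auto-apply variant would presumably instead call set_dumpable() from the PR_SET_HIDE_SELF_EXE case in kernel/sys.c, so userspace could not forget the first step.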

> > I think the only way to fix this properly is by blocking re-opens of
> > magic links that have more permissions than they originally did. I just
> > got back from vacation, but I'm working on fixing up [1] so it's ready
> > to be an RFC so we can close this hole once and for all.
> 
> so that relies on the fact that opening /proc/self/exe with O_WRONLY
> fails with ETXTBSY?

Not quite. It relies on the fact that /proc/self/exe (and any other
magiclink to /proc/self/exe) does not have a write mode (semantically,
because of -ETXTBSY) and thus blocks any attempt to open it (or re-open
it) with a write mode. It also fixes some other possible issues and lets
you attach upgrade masks (a la capabilities) to file descriptors.

Ultimately I think having a complete "no really, nobody can touch this"
knob is also a good idea, and as this is much simpler we can get it in
much quicker than the magiclink stuff (which I still think is necessary
in general).

> > [1]: https://github.com/cyphar/linux/tree/magiclink/open_how-reopen
> >
> >> 
> >> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
> >> ---
> >> v2: https://lkml.org/lkml/2023/1/19/849
> >> 
> >> Differences from v2:
> >> 
> >> - fixed the test to check PR_SET_HIDE_SELF_EXE after fork
> >> 
> >> v1: https://lkml.org/lkml/2023/1/4/334
> >> 
> >> Differences from v1:
> >> 
> >> - amended more information in the commit message wrt map_files not
> >>   requiring the same protection.
> >> - changed the test to verify PR_HIDE_SELF_EXE cannot be unset after
> >>   a fork.
> >> 
> >>  fs/exec.c                        | 1 +
> >>  fs/proc/base.c                   | 8 +++++---
> >>  include/linux/sched.h            | 5 +++++
> >>  include/uapi/linux/prctl.h       | 3 +++
> >>  kernel/sys.c                     | 9 +++++++++
> >>  tools/include/uapi/linux/prctl.h | 3 +++
> >>  6 files changed, 26 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/fs/exec.c b/fs/exec.c
> >> index ab913243a367..5a5dd964c3a3 100644
> >> --- a/fs/exec.c
> >> +++ b/fs/exec.c
> >> @@ -1855,6 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm,
> >>  	/* execve succeeded */
> >>  	current->fs->in_exec = 0;
> >>  	current->in_execve = 0;
> >> +	task_clear_hide_self_exe(current);
> >>  	rseq_execve(current);
> >>  	acct_update_integrals(current);
> >>  	task_numa_free(current, false);
> >> diff --git a/fs/proc/base.c b/fs/proc/base.c
> >> index 9e479d7d202b..959968e2da0d 100644
> >> --- a/fs/proc/base.c
> >> +++ b/fs/proc/base.c
> >> @@ -1723,19 +1723,21 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
> >>  {
> >>  	struct task_struct *task;
> >>  	struct file *exe_file;
> >> +	long hide_self_exe;
> >>  
> >>  	task = get_proc_task(d_inode(dentry));
> >>  	if (!task)
> >>  		return -ENOENT;
> >>  	exe_file = get_task_exe_file(task);
> >> +	hide_self_exe = task_hide_self_exe(task);
> >>  	put_task_struct(task);
> >> -	if (exe_file) {
> >> +	if (exe_file && !hide_self_exe) {
> >>  		*exe_path = exe_file->f_path;
> >>  		path_get(&exe_file->f_path);
> >>  		fput(exe_file);
> >>  		return 0;
> >> -	} else
> >> -		return -ENOENT;
> >> +	}
> >> +	return -ENOENT;
> >>  }
> >>  
> >>  static const char *proc_pid_get_link(struct dentry *dentry,
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> index 853d08f7562b..8db32d5fc285 100644
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -1790,6 +1790,7 @@ static __always_inline bool is_percpu_thread(void)
> >>  #define PFA_SPEC_IB_DISABLE		5	/* Indirect branch speculation restricted */
> >>  #define PFA_SPEC_IB_FORCE_DISABLE	6	/* Indirect branch speculation permanently restricted */
> >>  #define PFA_SPEC_SSB_NOEXEC		7	/* Speculative Store Bypass clear on execve() */
> >> +#define PFA_HIDE_SELF_EXE		8	/* Hide /proc/self/exe for the process */
> >>  
> >>  #define TASK_PFA_TEST(name, func)					\
> >>  	static inline bool task_##func(struct task_struct *p)		\
> >> @@ -1832,6 +1833,10 @@ TASK_PFA_CLEAR(SPEC_IB_DISABLE, spec_ib_disable)
> >>  TASK_PFA_TEST(SPEC_IB_FORCE_DISABLE, spec_ib_force_disable)
> >>  TASK_PFA_SET(SPEC_IB_FORCE_DISABLE, spec_ib_force_disable)
> >>  
> >> +TASK_PFA_TEST(HIDE_SELF_EXE, hide_self_exe)
> >> +TASK_PFA_SET(HIDE_SELF_EXE, hide_self_exe)
> >> +TASK_PFA_CLEAR(HIDE_SELF_EXE, hide_self_exe)
> >> +
> >>  static inline void
> >>  current_restore_flags(unsigned long orig_flags, unsigned long flags)
> >>  {
> >> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> >> index a5e06dcbba13..f12f3df12468 100644
> >> --- a/include/uapi/linux/prctl.h
> >> +++ b/include/uapi/linux/prctl.h
> >> @@ -284,4 +284,7 @@ struct prctl_mm_map {
> >>  #define PR_SET_VMA		0x53564d41
> >>  # define PR_SET_VMA_ANON_NAME		0
> >>  
> >> +#define PR_SET_HIDE_SELF_EXE		65
> >> +#define PR_GET_HIDE_SELF_EXE		66
> >> +
> >>  #endif /* _LINUX_PRCTL_H */
> >> diff --git a/kernel/sys.c b/kernel/sys.c
> >> index 5fd54bf0e886..e992f1b72973 100644
> >> --- a/kernel/sys.c
> >> +++ b/kernel/sys.c
> >> @@ -2626,6 +2626,15 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >>  	case PR_SET_VMA:
> >>  		error = prctl_set_vma(arg2, arg3, arg4, arg5);
> >>  		break;
> >> +	case PR_SET_HIDE_SELF_EXE:
> >> +		if (arg2 != 1 || arg3 || arg4 || arg5)
> >> +			return -EINVAL;
> >> +		task_set_hide_self_exe(current);
> >> +		break;
> >> +	case PR_GET_HIDE_SELF_EXE:
> >> +		if (arg2 || arg3 || arg4 || arg5)
> >> +			return -EINVAL;
> >> +		return task_hide_self_exe(current) ? 1 : 0;
> >>  	default:
> >>  		error = -EINVAL;
> >>  		break;
> >> diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
> >> index a5e06dcbba13..f12f3df12468 100644
> >> --- a/tools/include/uapi/linux/prctl.h
> >> +++ b/tools/include/uapi/linux/prctl.h
> >> @@ -284,4 +284,7 @@ struct prctl_mm_map {
> >>  #define PR_SET_VMA		0x53564d41
> >>  # define PR_SET_VMA_ANON_NAME		0
> >>  
> >> +#define PR_SET_HIDE_SELF_EXE		65
> >> +#define PR_GET_HIDE_SELF_EXE		66
> >> +
> >>  #endif /* _LINUX_PRCTL_H */
> >> -- 
> >> 2.38.1
> >> 
> 
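To observe the proc_exe_link() change from userspace, a probe along these lines could be used (self_exe_hidden() is hypothetical and does not call the proposed prctl, since 65/66 are only this patch's values). With the patch, readlink(2) on /proc/self/exe for a task with PFA_HIDE_SELF_EXE set should fail with ENOENT; on a kernel without it the link resolves normally:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if /proc/self/exe cannot be resolved
 * (ENOENT, as proc_exe_link() reports for a hidden task under this
 * patch), 0 if it resolves, and -1 for any other error.
 */
int self_exe_hidden(void)
{
        char buf[4096]; /* target path is not needed, only success/failure */
        ssize_t len = readlink("/proc/self/exe", buf, sizeof(buf));

        if (len >= 0)
                return 0;
        return errno == ENOENT ? 1 : -1;
}
```

Note that proc_exe_link() already returned -ENOENT when a task has no exe_file, so a result of 1 is only meaningful for an ordinary userspace process.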

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>


