All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc
@ 2023-09-07 20:24 Guilherme G. Piccoli
  2023-09-07 20:24 ` [RFC PATCH 1/2] binfmt_misc, fork, proc: Introduce flag to expose the interpreted binary in procfs Guilherme G. Piccoli
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Guilherme G. Piccoli @ 2023-09-07 20:24 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel
  Cc: linux-mm, kernel-dev, kernel, keescook, ebiederm, oleg, yzaikin,
	mcgrof, akpm, brauner, viro, willy, david, dave, sonicadvance1,
	joshua, Guilherme G. Piccoli

Currently the kernel provides a symlink to the executable binary, in the
form of procfs file exe_file (/proc/self/exe_file for example). But what
happens in interpreted scenarios (like binfmt_misc) is that such link
always points to the *interpreter*. For cases of Linux binary emulators,
like FEX [0] for example, it's then necessary to somehow mask that and
emulate the true binary path.

We hereby propose a way to expose such interpreted binary as exe_file if
the flag 'I' is selected on binfmt_misc. When that flag is set, the file
/proc/self/exe_file points to the *interpreted* file, be it ELF or not.
In order to allow users to distinguish if such flag is used or not without
checking the binfmt_misc filesystem, we propose also the /proc/self/interpreter
file, which always points to the *interpreter* in scenarios where
interpretation is set, like binfmt_misc. This file is empty / points to
nothing in the case of regular ELF execution, though we could consider
implementing a way to point to the LD preloader if that makes sense...

This was sent as RFC because of course it's a very core change, affecting
multiple areas and there are design choices (and questions) in each patch
so we could discuss and check the best way to implement the solution as
well as the corner cases handling. This is a very useful feature for
emulators and such, like FEX and Wine, which usually need to circumvent
this kernel limitation in order to expose the true emulated file name
(more examples at [1][2][3]).

This patchset is based on the currently v6.6-rc1 candidate (Linus tree
from yesterday) and was tested under QEMU as well as using FEX.
Thanks in advance for comments, any feedback is greatly appreciated!
Cheers,

Guilherme


[0] https://github.com/FEX-Emu/FEX

[1] Using an environment variable trick to override exe_file:
https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/u_process.c#L209 

[2] https://github.com/baldurk/renderdoc/pull/2694

[3] FEX handling of the exe_file parsing:
https://github.com/FEX-Emu/FEX/blob/main/Source/Tools/FEXLoader/LinuxSyscalls/FileManagement.cpp#L499


Guilherme G. Piccoli (2):
  binfmt_misc, fork, proc: Introduce flag to expose the interpreted binary in procfs
  fork, procfs: Introduce /proc/self/interpreter symlink

 Documentation/admin-guide/binfmt-misc.rst |  11 ++
 arch/arc/kernel/troubleshoot.c            |   5 +
 fs/binfmt_elf.c                           |   7 ++
 fs/binfmt_misc.c                          |  11 ++
 fs/coredump.c                             |   5 +
 fs/exec.c                                 |  26 ++++-
 fs/proc/base.c                            |  48 +++++---
 include/linux/binfmts.h                   |   4 +
 include/linux/mm.h                        |   7 +-
 include/linux/mm_types.h                  |   2 +
 kernel/audit.c                            |   5 +
 kernel/audit_watch.c                      |   7 +-
 kernel/fork.c                             | 131 +++++++++++++++++-----
 kernel/signal.c                           |   7 +-
 kernel/sys.c                              |   5 +
 kernel/taskstats.c                        |   7 +-
 security/tomoyo/util.c                    |   5 +
 17 files changed, 241 insertions(+), 52 deletions(-)

-- 
2.42.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH 1/2] binfmt_misc, fork, proc: Introduce flag to expose the interpreted binary in procfs
  2023-09-07 20:24 [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc Guilherme G. Piccoli
@ 2023-09-07 20:24 ` Guilherme G. Piccoli
  2023-10-10  4:31   ` kernel test robot
  2023-09-07 20:24 ` [RFC PATCH 2/2] fork, procfs: Introduce /proc/self/interpreter symlink Guilherme G. Piccoli
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Guilherme G. Piccoli @ 2023-09-07 20:24 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel
  Cc: linux-mm, kernel-dev, kernel, keescook, ebiederm, oleg, yzaikin,
	mcgrof, akpm, brauner, viro, willy, david, dave, sonicadvance1,
	joshua, Guilherme G. Piccoli

The procfs symlink /proc/self/exe_file points to the executable binary
of the running process/thread. What happens though for binfmt_misc
interpreted cases is that it effectively points to the *interpreter*,
which is valid as the interpreter is in fact the one binary running.

But there are cases in which this is considered a limitation - see for
example the case of Linux architecture emulators, like FEX [0]. A binary
running under such emulator could check its own symlink, and that'd
be invalid, pointing instead to the emulator (its interpreter).
This adds overhead to the emulation process, that must trap accesses
to such symlink to guarantee it is exposed properly to the emulated binary.

Add hereby the flag 'I' to binfmt_misc to allow override this default
behavior of mapping the interpreter as /proc/self/exe - with this flag,
the *interpreted* file is exposed in procfs instead.

[0] https://github.com/FEX-Emu/FEX

Suggested-by: Ryan Houdek <sonicadvance1@gmail.com>
Tested-by: Ryan Houdek <sonicadvance1@gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---


Some design decisions / questions:

(a) The patch makes use of interpreted_file == NULL to diverge if
the flag is set or not. In other words: in any case but binfmt_misc
with flag 'I' set, this pointer is NULL and the real exe_file
is retrieved. This way, we don't need to propagate some flag from the
ends of binfmt_misc up to procfs code. Does it make sense?

(b) Of course there's various places affected by this change that
I'm not sure if we should care or not, i.e., if we need to somehow
change the behavior as well. Examples are audit code, tomoyo, etc.
Even worse is the case of users setting the exe_file through prctl;
what to do in these cases? I've marked them on code using comments
starting with FIXME (BINPRM_FLAGS_EXPOSE_INTERP) so we can debate.

(c) Keeping interpreted_file on mm_struct *seems* to make sense
to me, though I'm not really sure of the impact of this new
member there, for cache locality or anything else I'm not seeing.
An alternative would be a small struct exec_files{} that contains
both exe_file and interpreted_file, if that's somehow better...

(d) Naming: both "interpreted_file" and the flag "I" were simple
choices and subject to change to any community suggestion, I'm not
attached to them in any way.

Probably there's more implicit design decisions here and any feedback
is greatly appreciated! Thanks in advance,

Guilherme


 Documentation/admin-guide/binfmt-misc.rst |  11 +++
 arch/arc/kernel/troubleshoot.c            |   5 ++
 fs/binfmt_elf.c                           |   7 ++
 fs/binfmt_misc.c                          |  11 +++
 fs/coredump.c                             |   5 ++
 fs/exec.c                                 |  18 +++-
 fs/proc/base.c                            |   2 +-
 include/linux/binfmts.h                   |   3 +
 include/linux/mm.h                        |   6 +-
 include/linux/mm_types.h                  |   1 +
 kernel/audit.c                            |   5 ++
 kernel/audit_watch.c                      |   7 +-
 kernel/fork.c                             | 105 ++++++++++++++++------
 kernel/signal.c                           |   7 +-
 kernel/sys.c                              |   5 ++
 kernel/taskstats.c                        |   7 +-
 security/tomoyo/util.c                    |   5 ++
 17 files changed, 173 insertions(+), 37 deletions(-)

diff --git a/Documentation/admin-guide/binfmt-misc.rst b/Documentation/admin-guide/binfmt-misc.rst
index 59cd902e3549..175fca8439d6 100644
--- a/Documentation/admin-guide/binfmt-misc.rst
+++ b/Documentation/admin-guide/binfmt-misc.rst
@@ -88,6 +88,17 @@ Here is what the fields mean:
 	    emulation is installed and uses the opened image to spawn the
 	    emulator, meaning it is always available once installed,
 	    regardless of how the environment changes.
+      ``I`` - expose the interpreted file in the /proc/self/exe symlink
+            By default, binfmt_misc executing binaries expose their interpreter
+            as the /proc/self/exe file, which makes sense given that the actual
+            executable running is the interpreter indeed. But there are some
+            cases in which we want to change that behavior - imagine an emulator
+            of Linux binaries (of different architecture, for example) which
+            needs to deal with the different behaviors when running native - the
+            binary's symlink (/proc/self/exe) points to the binary itself - vs
+            the emulated case, whereas the link points to the interpreter. This
+            flag allows to change the default behavior and have the proc symlink
+            pointing to the **interpreted** file, not the interpreter.
 
 
 There are some restrictions:
diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
index d5b3ed2c58f5..d078af66f07b 100644
--- a/arch/arc/kernel/troubleshoot.c
+++ b/arch/arc/kernel/troubleshoot.c
@@ -62,6 +62,11 @@ static void print_task_path_n_nm(struct task_struct *tsk)
 	if (!mm)
 		goto done;
 
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): observe that if using binfmt_misc
+	 * with flag 'I' set, this functionality will diverge from the
+	 * /proc/self/exe symlink with regards of what executable is running.
+	 */
 	exe_file = get_mm_exe_file(mm);
 	mmput(mm);
 
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 7b3d2d491407..fb0c22fa3635 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1162,6 +1162,13 @@ static int load_elf_binary(struct linux_binprm *bprm)
 			}
 		}
 
+		/*
+		 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): here is one of our problems - bprm->file is
+		 * elf_mapped(), whereas our saved bprm->interpreted_file isn't. Now, why not just
+		 * map it, right? Because we're not sure how to (or if it's indeed necessary).
+		 * What if the interpreted file is not ELF? Could be anything that its interpreter
+		 * is able to read and execute...
+		 */
 		error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
 				elf_prot, elf_flags, total_size);
 		if (BAD_ADDR(error)) {
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index e0108d17b085..36350c3d73f5 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -48,6 +48,7 @@ enum {Enabled, Magic};
 #define MISC_FMT_OPEN_BINARY (1UL << 30)
 #define MISC_FMT_CREDENTIALS (1UL << 29)
 #define MISC_FMT_OPEN_FILE (1UL << 28)
+#define MISC_FMT_EXPOSE_INTERPRETED (1UL << 27)
 
 typedef struct {
 	struct list_head list;
@@ -181,6 +182,9 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (retval < 0)
 		goto ret;
 
+	if (fmt->flags & MISC_FMT_EXPOSE_INTERPRETED)
+		bprm->interp_flags |= BINPRM_FLAGS_EXPOSE_INTERP;
+
 	if (fmt->flags & MISC_FMT_OPEN_FILE) {
 		interp_file = file_clone_open(fmt->interp_file);
 		if (!IS_ERR(interp_file))
@@ -258,6 +262,11 @@ static char *check_special_flags(char *sfs, Node *e)
 			p++;
 			e->flags |= MISC_FMT_OPEN_FILE;
 			break;
+		case 'I':
+			pr_debug("register: flag: I: (expose interpreted binary)\n");
+			p++;
+			e->flags |= MISC_FMT_EXPOSE_INTERPRETED;
+			break;
 		default:
 			cont = 0;
 		}
@@ -524,6 +533,8 @@ static void entry_status(Node *e, char *page)
 		*dp++ = 'C';
 	if (e->flags & MISC_FMT_OPEN_FILE)
 		*dp++ = 'F';
+	if (e->flags & MISC_FMT_EXPOSE_INTERPRETED)
+		*dp++ = 'I';
 	*dp++ = '\n';
 
 	if (!test_bit(Magic, &e->flags)) {
diff --git a/fs/coredump.c b/fs/coredump.c
index 9d235fa14ab9..1a771c7cba67 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -164,6 +164,11 @@ static int cn_print_exe_file(struct core_name *cn, bool name_only)
 	char *pathbuf, *path, *ptr;
 	int ret;
 
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): observe that if using binfmt_misc
+	 * with flag 'I' set, this coredump functionality will diverge from the
+	 * /proc/self/exe symlink with regards of what executable is running.
+	 */
 	exe_file = get_mm_exe_file(current->mm);
 	if (!exe_file)
 		return cn_esc_printf(cn, "%s (path unknown)", current->comm);
diff --git a/fs/exec.c b/fs/exec.c
index 6518e33ea813..bb1574f37b67 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1280,7 +1280,8 @@ int begin_new_exec(struct linux_binprm * bprm)
 	 * not visible until then. Doing it here also ensures
 	 * we don't race against replace_mm_exe_file().
 	 */
-	retval = set_mm_exe_file(bprm->mm, bprm->file);
+	retval = set_mm_exe_file(bprm->mm, bprm->file,
+				 bprm->interpreted_file);
 	if (retval)
 		goto out;
 
@@ -1405,6 +1406,13 @@ int begin_new_exec(struct linux_binprm * bprm)
 		fd_install(retval, bprm->executable);
 		bprm->executable = NULL;
 		bprm->execfd = retval;
+		/*
+		 * Since bprm->interpreted_file points to bprm->executable and
+		 * fd_install() consumes its refcount, we need to bump the refcount
+		 * here to avoid warnings as "file count is 0" on kernel log.
+		 */
+		if (unlikely(bprm->interp_flags & BINPRM_FLAGS_EXPOSE_INTERP))
+			get_file(bprm->interpreted_file);
 	}
 	return 0;
 
@@ -1500,6 +1508,8 @@ static void free_bprm(struct linux_binprm *bprm)
 		allow_write_access(bprm->file);
 		fput(bprm->file);
 	}
+	if (bprm->interpreted_file)
+		fput(bprm->interpreted_file);
 	if (bprm->executable)
 		fput(bprm->executable);
 	/* If a binfmt changed the interp, free it. */
@@ -1789,6 +1799,9 @@ static int exec_binprm(struct linux_binprm *bprm)
 		bprm->interpreter = NULL;
 
 		allow_write_access(exec);
+		if (unlikely(bprm->interp_flags & BINPRM_FLAGS_EXPOSE_INTERP))
+			bprm->interpreted_file = exec;
+
 		if (unlikely(bprm->have_execfd)) {
 			if (bprm->executable) {
 				fput(exec);
@@ -1796,7 +1809,8 @@ static int exec_binprm(struct linux_binprm *bprm)
 			}
 			bprm->executable = exec;
 		} else
-			fput(exec);
+			if (!(bprm->interp_flags & BINPRM_FLAGS_EXPOSE_INTERP))
+				fput(exec);
 	}
 
 	audit_bprm(bprm);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ffd54617c354..a13fbfc46997 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1727,7 +1727,7 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
 	task = get_proc_task(d_inode(dentry));
 	if (!task)
 		return -ENOENT;
-	exe_file = get_task_exe_file(task);
+	exe_file = get_task_exe_file(task, true);
 	put_task_struct(task);
 	if (exe_file) {
 		*exe_path = exe_file->f_path;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 8d51f69f9f5e..5dde52de7877 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -45,6 +45,7 @@ struct linux_binprm {
 		point_of_no_return:1;
 	struct file *executable; /* Executable to pass to the interpreter */
 	struct file *interpreter;
+	struct file *interpreted_file; /* only for binfmt_misc with flag I */
 	struct file *file;
 	struct cred *cred;	/* new credentials */
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
@@ -75,6 +76,8 @@ struct linux_binprm {
 #define BINPRM_FLAGS_PRESERVE_ARGV0_BIT 3
 #define BINPRM_FLAGS_PRESERVE_ARGV0 (1 << BINPRM_FLAGS_PRESERVE_ARGV0_BIT)
 
+#define BINPRM_FLAGS_EXPOSE_INTERP_BIT 4
+#define BINPRM_FLAGS_EXPOSE_INTERP (1 << BINPRM_FLAGS_EXPOSE_INTERP_BIT)
 /*
  * This structure defines the functions that are used to load the binary formats that
  * linux accepts.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf5d0b1b16f4..a00b32906604 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3255,10 +3255,12 @@ static inline int check_data_rlimit(unsigned long rlim,
 extern int mm_take_all_locks(struct mm_struct *mm);
 extern void mm_drop_all_locks(struct mm_struct *mm);
 
-extern int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
+extern int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file,
+			   struct file *new_interpreted_file);
 extern int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
 extern struct file *get_mm_exe_file(struct mm_struct *mm);
-extern struct file *get_task_exe_file(struct task_struct *task);
+extern struct file *get_task_exe_file(struct task_struct *task,
+				      bool prefer_interpreted);
 
 extern bool may_expand_vm(struct mm_struct *, vm_flags_t, unsigned long npages);
 extern void vm_stat_account(struct mm_struct *, vm_flags_t, long npages);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 36c5b43999e6..346f81875f3e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -842,6 +842,7 @@ struct mm_struct {
 
 		/* store ref to file /proc/<pid>/exe symlink points to */
 		struct file __rcu *exe_file;
+		struct file __rcu *interpreted_file; /* see binfmt_misc flag I */
 #ifdef CONFIG_MMU_NOTIFIER
 		struct mmu_notifier_subscriptions *notifier_subscriptions;
 #endif
diff --git a/kernel/audit.c b/kernel/audit.c
index 16205dd29843..83c64c376c0c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2197,6 +2197,11 @@ void audit_log_d_path_exe(struct audit_buffer *ab,
 	if (!mm)
 		goto out_null;
 
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): observe that if using binfmt_misc
+	 * with flag 'I' set, this audit functionality will diverge from the
+	 * /proc/self/exe symlink with regards of what executable is running.
+	 */
 	exe_file = get_mm_exe_file(mm);
 	if (!exe_file)
 		goto out_null;
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 65075f1e4ac8..b8f947849fb2 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -527,7 +527,12 @@ int audit_exe_compare(struct task_struct *tsk, struct audit_fsnotify_mark *mark)
 	unsigned long ino;
 	dev_t dev;
 
-	exe_file = get_task_exe_file(tsk);
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): if using the binfmt_misc flag 'I', we diverge
+	 * here from proc_exe_link(), exposing the true exe_file (instead of the interpreted
+	 * binary as proc). Should we expose here the same exe_file as proc's one *always*?
+	 */
+	exe_file = get_task_exe_file(tsk, false);
 	if (!exe_file)
 		return 0;
 	ino = file_inode(exe_file)->i_ino;
diff --git a/kernel/fork.c b/kernel/fork.c
index 3b6d20dfb9a8..8c4824dcc433 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -628,12 +628,47 @@ void free_task(struct task_struct *tsk)
 }
 EXPORT_SYMBOL(free_task);
 
-static void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
+/**
+ * __get_mm_exe_or_interp_file - helper that acquires a reference to the mm's
+ * executable file, or if prefer_interp is set, go with mm->interpreted_file
+ * instead.
+ *
+ * Returns %NULL if mm has no associated executable/interpreted file.
+ * User must release file via fput().
+ */
+static inline struct file *__get_mm_exe_or_interp_file(struct mm_struct *mm,
+						       bool prefer_interp)
 {
 	struct file *exe_file;
 
+	rcu_read_lock();
+
+	if (unlikely(prefer_interp))
+		exe_file = rcu_dereference(mm->interpreted_file);
+	else
+		exe_file = rcu_dereference(mm->exe_file);
+
+	if (exe_file && !get_file_rcu(exe_file))
+		exe_file = NULL;
+	rcu_read_unlock();
+	return exe_file;
+}
+
+struct file *get_mm_exe_file(struct mm_struct *mm)
+{
+	return __get_mm_exe_or_interp_file(mm, false);
+}
+
+static void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
+{
+	struct file *exe_file, *interp_file;
+
 	exe_file = get_mm_exe_file(oldmm);
 	RCU_INIT_POINTER(mm->exe_file, exe_file);
+
+	interp_file = __get_mm_exe_or_interp_file(oldmm, true);
+	RCU_INIT_POINTER(mm->interpreted_file, interp_file);
+
 	/*
 	 * We depend on the oldmm having properly denied write access to the
 	 * exe_file already.
@@ -1279,6 +1314,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm_init_owner(mm, p);
 	mm_pasid_init(mm);
 	RCU_INIT_POINTER(mm->exe_file, NULL);
+	RCU_INIT_POINTER(mm->interpreted_file, NULL);
 	mmu_notifier_subscriptions_init(mm);
 	init_tlb_flush_pending(mm);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
@@ -1348,7 +1384,7 @@ static inline void __mmput(struct mm_struct *mm)
 	khugepaged_exit(mm); /* must run before exit_mmap */
 	exit_mmap(mm);
 	mm_put_huge_zero_page(mm);
-	set_mm_exe_file(mm, NULL);
+	set_mm_exe_file(mm, NULL, NULL);
 	if (!list_empty(&mm->mmlist)) {
 		spin_lock(&mmlist_lock);
 		list_del(&mm->mmlist);
@@ -1394,7 +1430,9 @@ EXPORT_SYMBOL_GPL(mmput_async);
 /**
  * set_mm_exe_file - change a reference to the mm's executable file
  *
- * This changes mm's executable file (shown as symlink /proc/[pid]/exe).
+ * This changes mm's executable file (shown as symlink /proc/[pid]/exe),
+ * and if new_interpreted_file != NULL, also sets this field (check the
+ * binfmt_misc documentation, flag 'I', for details about this).
  *
  * Main users are mmput() and sys_execve(). Callers prevent concurrent
  * invocations: in mmput() nobody alive left, in execve it happens before
@@ -1402,16 +1440,19 @@ EXPORT_SYMBOL_GPL(mmput_async);
  *
  * Can only fail if new_exe_file != NULL.
  */
-int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
+int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file,
+		    struct file *new_interpreted_file)
 {
 	struct file *old_exe_file;
+	struct file *old_interpreted_file;
 
 	/*
-	 * It is safe to dereference the exe_file without RCU as
-	 * this function is only called if nobody else can access
-	 * this mm -- see comment above for justification.
+	 * It is safe to dereference exe_file / interpreted_file
+	 * without RCU as this function is only called if nobody else
+	 * can access this mm -- see comment above for justification.
 	 */
 	old_exe_file = rcu_dereference_raw(mm->exe_file);
+	old_interpreted_file = rcu_dereference_raw(mm->interpreted_file);
 
 	if (new_exe_file) {
 		/*
@@ -1423,10 +1464,20 @@ int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 		get_file(new_exe_file);
 	}
 	rcu_assign_pointer(mm->exe_file, new_exe_file);
+
+	/* For this one we don't care about write access... */
+	if (new_interpreted_file)
+		get_file(new_interpreted_file);
+	rcu_assign_pointer(mm->interpreted_file, new_interpreted_file);
+
 	if (old_exe_file) {
 		allow_write_access(old_exe_file);
 		fput(old_exe_file);
 	}
+
+	if (old_interpreted_file)
+		fput(old_interpreted_file);
+
 	return 0;
 }
 
@@ -1436,6 +1487,12 @@ int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
  * This changes mm's executable file (shown as symlink /proc/[pid]/exe).
  *
  * Main user is sys_prctl(PR_SET_MM_MAP/EXE_FILE).
+ *
+ * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): imagine user performs the sys_prctl()
+ * aiming to change /proc/self/exe symlink - suppose user is interested
+ * in the executable path itself. With binfmt_misc flag 'I', this change
+ * **won't reflect** since procfs make use of interpreted_file. What to do
+ * in this case? Do we care?
  */
 int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 {
@@ -1482,31 +1539,15 @@ int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 }
 
 /**
- * get_mm_exe_file - acquire a reference to the mm's executable file
- *
- * Returns %NULL if mm has no associated executable file.
- * User must release file via fput().
- */
-struct file *get_mm_exe_file(struct mm_struct *mm)
-{
-	struct file *exe_file;
-
-	rcu_read_lock();
-	exe_file = rcu_dereference(mm->exe_file);
-	if (exe_file && !get_file_rcu(exe_file))
-		exe_file = NULL;
-	rcu_read_unlock();
-	return exe_file;
-}
-
-/**
- * get_task_exe_file - acquire a reference to the task's executable file
+ * get_task_exe_file - acquire a reference to the task's executable or
+ * interpreted file (only for procfs, when under binfmt_misc with flag 'I').
  *
  * Returns %NULL if task's mm (if any) has no associated executable file or
  * this is a kernel thread with borrowed mm (see the comment above get_task_mm).
  * User must release file via fput().
  */
-struct file *get_task_exe_file(struct task_struct *task)
+struct file *get_task_exe_file(struct task_struct *task,
+			       bool prefer_interpreted)
 {
 	struct file *exe_file = NULL;
 	struct mm_struct *mm;
@@ -1514,8 +1555,14 @@ struct file *get_task_exe_file(struct task_struct *task)
 	task_lock(task);
 	mm = task->mm;
 	if (mm) {
-		if (!(task->flags & PF_KTHREAD))
-			exe_file = get_mm_exe_file(mm);
+		if (!(task->flags & PF_KTHREAD)) {
+			if (unlikely(prefer_interpreted)) {
+				exe_file = __get_mm_exe_or_interp_file(mm, true);
+				if (!exe_file)
+					exe_file = get_mm_exe_file(mm);
+			} else
+				exe_file = get_mm_exe_file(mm);
+		}
 	}
 	task_unlock(task);
 	return exe_file;
diff --git a/kernel/signal.c b/kernel/signal.c
index 09019017d669..3a8d85a65c49 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1263,7 +1263,12 @@ static void print_fatal_signal(int signr)
 	struct pt_regs *regs = task_pt_regs(current);
 	struct file *exe_file;
 
-	exe_file = get_task_exe_file(current);
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): if using the binfmt_misc flag 'I', we diverge
+	 * here from proc_exe_link(), exposing the true exe_file (instead of the interpreted
+	 * binary as proc). Should we expose here the same exe_file as proc's one *always*?
+	 */
+	exe_file = get_task_exe_file(current, false);
 	if (exe_file) {
 		pr_info("%pD: %s: potentially unexpected fatal signal %d.\n",
 			exe_file, current->comm, signr);
diff --git a/kernel/sys.c b/kernel/sys.c
index 2410e3999ebe..17fab2f71443 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1912,6 +1912,11 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 	if (err)
 		goto exit;
 
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): please read the comment
+	 * on replace_mm_exe_file() to ponder about the divergence when
+	 * using binfmt_misc with flag 'I'.
+	 */
 	err = replace_mm_exe_file(mm, exe.file);
 exit:
 	fdput(exe);
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 8ce3fa0c19e2..a5d5afc1919a 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -157,7 +157,12 @@ static void send_cpu_listeners(struct sk_buff *skb,
 static void exe_add_tsk(struct taskstats *stats, struct task_struct *tsk)
 {
 	/* No idea if I'm allowed to access that here, now. */
-	struct file *exe_file = get_task_exe_file(tsk);
+	struct file *exe_file = get_task_exe_file(tsk, false);
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): if using the binfmt_misc flag 'I', we diverge
+	 * here from proc_exe_link(), exposing the true exe_file (instead of the interpreted
+	 * binary as proc). Should we expose here the same exe_file as proc's one *always*?
+	 */
 
 	if (exe_file) {
 		/* Following cp_new_stat64() in stat.c . */
diff --git a/security/tomoyo/util.c b/security/tomoyo/util.c
index 6799b1122c9d..844bdbd27240 100644
--- a/security/tomoyo/util.c
+++ b/security/tomoyo/util.c
@@ -971,6 +971,11 @@ const char *tomoyo_get_exe(void)
 
 	if (!mm)
 		return NULL;
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): observe that if using binfmt_misc
+	 * with flag 'I' set, this tomoyo functionality will diverge from the
+	 * /proc/self/exe symlink with regards of what executable is running.
+	 */
 	exe_file = get_mm_exe_file(mm);
 	if (!exe_file)
 		return NULL;
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 2/2] fork, procfs: Introduce /proc/self/interpreter symlink
  2023-09-07 20:24 [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc Guilherme G. Piccoli
  2023-09-07 20:24 ` [RFC PATCH 1/2] binfmt_misc, fork, proc: Introduce flag to expose the interpreted binary in procfs Guilherme G. Piccoli
@ 2023-09-07 20:24 ` Guilherme G. Piccoli
  2023-10-10  5:38   ` kernel test robot
  2023-10-06  7:51 ` [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc Guilherme G. Piccoli
  2023-10-06 12:07 ` David Hildenbrand
  3 siblings, 1 reply; 15+ messages in thread
From: Guilherme G. Piccoli @ 2023-09-07 20:24 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel
  Cc: linux-mm, kernel-dev, kernel, keescook, ebiederm, oleg, yzaikin,
	mcgrof, akpm, brauner, viro, willy, david, dave, sonicadvance1,
	joshua, Guilherme G. Piccoli

Currently we have the /proc/self/exe_file symlink, that points
to the executable binary of the running process - or to the
*interpreted* file if flag 'I' is set on binfmt_misc. In this
second case, we then lose the ability of having a symlink to
the interpreter.

Introduce hereby the /proc/self/interpreter symlink which always
points to the *interpreter* when we're under interpretation, like
on binfmt_misc use. We don't require to have a new file pointer
since mm_struct contains exe_file, which points to such interpreter.

Suggested-by: Ryan Houdek <sonicadvance1@gmail.com>
Tested-by: Ryan Houdek <sonicadvance1@gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---


Design choices / questions:

(a) The file /proc/self/interpreter is always present, and in
most cases, points to nothing (since we're running an ELF binary,
with no interpreter!). Is it OK? The implementation follows the
pattern of /proc/self/exe_file, returns -ENOENT when we have
no interpreter. I'm not sure if there's a way to implement this
as a procfs file that is not always visible, i.e., it would only
show up if such interpreter exists. If that would be possible,
is it better than the current implementation though? Also, we could
somehow make /proc/self/interpreter points to the LD preloader for
ELFs, if that's considered a better approach.

(b) I like xmacros to avoid repeating code, but I'm totally
fine changing that as well as naming of stuff.

(c) Should we extend functionality present on exe_file to
the interpreter, like prctl replacing mechanism, audit info
collection, etc?

Any feedback here is greatly appreciated - thanks in advance!
Cheers,

Guilherme


 fs/exec.c                |  8 +++++++
 fs/proc/base.c           | 48 ++++++++++++++++++++++++++--------------
 include/linux/binfmts.h  |  1 +
 include/linux/mm.h       |  1 +
 include/linux/mm_types.h |  1 +
 kernel/fork.c            | 26 ++++++++++++++++++++++
 6 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index bb1574f37b67..39f9c86d5ebc 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1285,6 +1285,13 @@ int begin_new_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
+	/*
+	 * In case we're in an interpreted scenario (like binfmt_misc,
+	 * for example), flag it on mm_struct in order to expose such
+	 * interpreter as the symlink /proc/self/interpreter.
+	 */
+	bprm->mm->has_interpreter = bprm->has_interpreter;
+
 	/* If the binary is not readable then enforce mm->dumpable=0 */
 	would_dump(bprm, bprm->file);
 	if (bprm->have_execfd)
@@ -1796,6 +1803,7 @@ static int exec_binprm(struct linux_binprm *bprm)
 
 		exec = bprm->file;
 		bprm->file = bprm->interpreter;
+		bprm->has_interpreter = true;
 		bprm->interpreter = NULL;
 
 		allow_write_access(exec);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index a13fbfc46997..ecd5ea05acb0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1719,25 +1719,39 @@ static const struct file_operations proc_pid_set_comm_operations = {
 	.release	= single_release,
 };
 
-static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
+static struct file *proc_get_task_exe_file(struct task_struct *task)
 {
-	struct task_struct *task;
-	struct file *exe_file;
-
-	task = get_proc_task(d_inode(dentry));
-	if (!task)
-		return -ENOENT;
-	exe_file = get_task_exe_file(task, true);
-	put_task_struct(task);
-	if (exe_file) {
-		*exe_path = exe_file->f_path;
-		path_get(&exe_file->f_path);
-		fput(exe_file);
-		return 0;
-	} else
-		return -ENOENT;
+	return get_task_exe_file(task, true);
 }
 
+static struct file *proc_get_task_interpreter_file(struct task_struct *task)
+{
+	return get_task_interpreter_file(task);
+}
+
+/* Definition of proc_exe_link and proc_interpreter_link. */
+#define PROC_GET_LINK_FUNC(type) \
+static int proc_##type##_link(struct dentry *dentry, struct path *path)	\
+{									\
+	struct task_struct *task;					\
+	struct file *file;						\
+									\
+	task = get_proc_task(d_inode(dentry));				\
+	if (!task)							\
+		return -ENOENT;						\
+	file = proc_get_task_##type##_file(task);			\
+	put_task_struct(task);						\
+	if (file) {							\
+		*path = file->f_path;					\
+		path_get(&file->f_path);				\
+		fput(file);						\
+		return 0;						\
+	} else								\
+		return -ENOENT;						\
+}
+PROC_GET_LINK_FUNC(exe);
+PROC_GET_LINK_FUNC(interpreter);
+
 static const char *proc_pid_get_link(struct dentry *dentry,
 				     struct inode *inode,
 				     struct delayed_call *done)
@@ -3276,6 +3290,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	LNK("cwd",        proc_cwd_link),
 	LNK("root",       proc_root_link),
 	LNK("exe",        proc_exe_link),
+	LNK("interpreter", proc_interpreter_link),
 	REG("mounts",     S_IRUGO, proc_mounts_operations),
 	REG("mountinfo",  S_IRUGO, proc_mountinfo_operations),
 	REG("mountstats", S_IRUSR, proc_mountstats_operations),
@@ -3626,6 +3641,7 @@ static const struct pid_entry tid_base_stuff[] = {
 	LNK("cwd",       proc_cwd_link),
 	LNK("root",      proc_root_link),
 	LNK("exe",       proc_exe_link),
+	LNK("interpreter", proc_interpreter_link),
 	REG("mounts",    S_IRUGO, proc_mounts_operations),
 	REG("mountinfo",  S_IRUGO, proc_mountinfo_operations),
 #ifdef CONFIG_PROC_PAGE_MONITOR
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 5dde52de7877..2362c6bc6ead 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -46,6 +46,7 @@ struct linux_binprm {
 	struct file *executable; /* Executable to pass to the interpreter */
 	struct file *interpreter;
 	struct file *interpreted_file; /* only for binfmt_misc with flag I */
+	bool has_interpreter; /* In order to expose /proc/self/interpreter */
 	struct file *file;
 	struct cred *cred;	/* new credentials */
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a00b32906604..e06b703db494 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3261,6 +3261,7 @@ extern int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
 extern struct file *get_mm_exe_file(struct mm_struct *mm);
 extern struct file *get_task_exe_file(struct task_struct *task,
 				      bool prefer_interpreted);
+extern struct file *get_task_interpreter_file(struct task_struct *task);
 
 extern bool may_expand_vm(struct mm_struct *, vm_flags_t, unsigned long npages);
 extern void vm_stat_account(struct mm_struct *, vm_flags_t, long npages);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 346f81875f3e..19a73c41991c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -843,6 +843,7 @@ struct mm_struct {
 		/* store ref to file /proc/<pid>/exe symlink points to */
 		struct file __rcu *exe_file;
 		struct file __rcu *interpreted_file; /* see binfmt_misc flag I */
+		bool has_interpreter; /* exposes (or not) /proc/self/interpreter */
 #ifdef CONFIG_MMU_NOTIFIER
 		struct mmu_notifier_subscriptions *notifier_subscriptions;
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c4824dcc433..5cb542f92d5e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1568,6 +1568,32 @@ struct file *get_task_exe_file(struct task_struct *task,
 	return exe_file;
 }
 
+/**
+ * get_task_interpreter_file - acquire a reference to the task's *interpreter*
+ * executable (which is in fact the exe_file on mm_struct!). This is used in
+ * order to expose /proc/self/interpreter, if we're under an interpreted
+ * scenario (like binfmt_misc).
+ *
+ * Returns %NULL if exe_file is not an interpreter (i.e., it is the truly
+ * running binary indeed).
+ */
+struct file *get_task_interpreter_file(struct task_struct *task)
+{
+	struct file *interpreter_file = NULL;
+	struct mm_struct *mm;
+
+	task_lock(task);
+
+	mm = task->mm;
+	if (mm && mm->has_interpreter) {
+		if (!(task->flags & PF_KTHREAD))
+			interpreter_file = get_mm_exe_file(mm);
+	}
+
+	task_unlock(task);
+	return interpreter_file;
+}
+
 /**
  * get_task_mm - acquire a reference to the task's mm
  *
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc
  2023-09-07 20:24 [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc Guilherme G. Piccoli
  2023-09-07 20:24 ` [RFC PATCH 1/2] binfmt_misc, fork, proc: Introduce flag to expose the interpreted binary in procfs Guilherme G. Piccoli
  2023-09-07 20:24 ` [RFC PATCH 2/2] fork, procfs: Introduce /proc/self/interpreter symlink Guilherme G. Piccoli
@ 2023-10-06  7:51 ` Guilherme G. Piccoli
  2023-10-06 12:07 ` David Hildenbrand
  3 siblings, 0 replies; 15+ messages in thread
From: Guilherme G. Piccoli @ 2023-10-06  7:51 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel
  Cc: linux-mm, kernel-dev, kernel, keescook, ebiederm, oleg, yzaikin,
	mcgrof, akpm, brauner, viro, willy, david, dave, sonicadvance1,
	joshua

On 07/09/2023 22:24, Guilherme G. Piccoli wrote:
> Currently the kernel provides a symlink to the executable binary, in the
> form of procfs file exe_file (/proc/self/exe_file for example). But what
> happens in interpreted scenarios (like binfmt_misc) is that such link
> always points to the *interpreter*. For cases of Linux binary emulators,
> like FEX [0] for example, it's then necessary to somehow mask that and
> emulate the true binary path.
> 
> We hereby propose a way to expose such interpreted binary as exe_file if
> the flag 'I' is selected on binfmt_misc. When that flag is set, the file
> /proc/self/exe_file points to the *interpreted* file, be it ELF or not.
> In order to allow users to distinguish if such flag is used or not without
> checking the binfmt_misc filesystem, we propose also the /proc/self/interpreter
> file, which always points to the *interpreter* in scenarios where
> interpretation is set, like binfmt_misc. This file is empty / points to
> nothing in the case of regular ELF execution, though we could consider
> implementing a way to point to the LD preloader if that makes sense...
> 
> This was sent as RFC because of course it's a very core change, affecting
> multiple areas and there are design choices (and questions) in each patch
> so we could discuss and check the best way to implement the solution as
> well as the corner cases handling. This is a very useful feature for
> emulators and such, like FEX and Wine, which usually need to circumvent
> this kernel limitation in order to expose the true emulated file name
> (more examples at [1][2][3]).
> 
> This patchset is based on the currently v6.6-rc1 candidate (Linus tree
> from yesterday) and was tested under QEMU as well as using FEX.
> Thanks in advance for comments, any feedback is greatly appreciated!
> Cheers,
> 
> Guilherme
> 
> 
> [0] https://github.com/FEX-Emu/FEX
> 
> [1] Using an environment variable trick to override exe_file:
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/u_process.c#L209 
> 
> [2] https://github.com/baldurk/renderdoc/pull/2694
> 
> [3] FEX handling of the exe_file parsing:
> https://github.com/FEX-Emu/FEX/blob/main/Source/Tools/FEXLoader/LinuxSyscalls/FileManagement.cpp#L499
> 
> 

Hi folks, gentle monthly ping.
Any opinions / suggestions on that?

Thanks in advance,


Guilherme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc
  2023-09-07 20:24 [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc Guilherme G. Piccoli
                   ` (2 preceding siblings ...)
  2023-10-06  7:51 ` [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc Guilherme G. Piccoli
@ 2023-10-06 12:07 ` David Hildenbrand
  2023-10-09 17:37   ` Kees Cook
  3 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2023-10-06 12:07 UTC (permalink / raw)
  To: Guilherme G. Piccoli, linux-kernel, linux-fsdevel
  Cc: linux-mm, kernel-dev, kernel, keescook, ebiederm, oleg, yzaikin,
	mcgrof, akpm, brauner, viro, willy, dave, sonicadvance1, joshua

On 07.09.23 22:24, Guilherme G. Piccoli wrote:
> Currently the kernel provides a symlink to the executable binary, in the
> form of procfs file exe_file (/proc/self/exe_file for example). But what
> happens in interpreted scenarios (like binfmt_misc) is that such link
> always points to the *interpreter*. For cases of Linux binary emulators,
> like FEX [0] for example, it's then necessary to somehow mask that and
> emulate the true binary path.

I'm absolutely no expert on that, but I'm wondering if, instead of 
modifying exe_file and adding an interpreter file, you'd want to leave 
exe_file alone and instead provide an easier way to obtain the 
interpreted file.

Can you maybe describe why modifying exe_file is desired (about which 
consumers are we worrying? ) and what exactly FEX does to handle that 
(how does it mask that?).

So a bit more background on the challenges without this change would be 
appreciated.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc
  2023-10-06 12:07 ` David Hildenbrand
@ 2023-10-09 17:37   ` Kees Cook
  2023-10-11 23:53     ` Ryan Houdek
  2023-11-13 17:33     ` Guilherme G. Piccoli
  0 siblings, 2 replies; 15+ messages in thread
From: Kees Cook @ 2023-10-09 17:37 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: David Hildenbrand, linux-kernel, linux-fsdevel, linux-mm,
	kernel-dev, kernel, ebiederm, oleg, yzaikin, mcgrof, akpm,
	brauner, viro, willy, dave, sonicadvance1, joshua

On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
> On 07.09.23 22:24, Guilherme G. Piccoli wrote:
> > Currently the kernel provides a symlink to the executable binary, in the
> > form of procfs file exe_file (/proc/self/exe_file for example). But what
> > happens in interpreted scenarios (like binfmt_misc) is that such link
> > always points to the *interpreter*. For cases of Linux binary emulators,
> > like FEX [0] for example, it's then necessary to somehow mask that and
> > emulate the true binary path.
> 
> I'm absolutely no expert on that, but I'm wondering if, instead of modifying
> exe_file and adding an interpreter file, you'd want to leave exe_file alone
> and instead provide an easier way to obtain the interpreted file.
> 
> Can you maybe describe why modifying exe_file is desired (about which
> consumers are we worrying? ) and what exactly FEX does to handle that (how
> does it mask that?).
> 
> So a bit more background on the challenges without this change would be
> appreciated.

Yeah, it sounds like you're dealing with a process that examines
/proc/self/exe_file for itself only to find the binfmt_misc interpreter
when it was run via binfmt_misc?

What actually breaks? Or rather, why does the process to examine
exe_file? I'm just trying to see if there are other solutions here that
would avoid creating an ambiguous interface...

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] binfmt_misc, fork, proc: Introduce flag to expose the interpreted binary in procfs
  2023-09-07 20:24 ` [RFC PATCH 1/2] binfmt_misc, fork, proc: Introduce flag to expose the interpreted binary in procfs Guilherme G. Piccoli
@ 2023-10-10  4:31   ` kernel test robot
  0 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2023-10-10  4:31 UTC (permalink / raw)
  To: Guilherme G. Piccoli; +Cc: oe-kbuild-all

Hi Guilherme,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on pcmoore-audit/next linus/master v6.6-rc5]
[cannot apply to kees/for-next/execve next-20231009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Guilherme-G-Piccoli/binfmt_misc-fork-proc-Introduce-flag-to-expose-the-interpreted-binary-in-procfs/20230908-044529
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230907204256.3700336-2-gpiccoli%40igalia.com
patch subject: [RFC PATCH 1/2] binfmt_misc, fork, proc: Introduce flag to expose the interpreted binary in procfs
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20231010/202310101213.5jXkSFZh-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231010/202310101213.5jXkSFZh-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310101213.5jXkSFZh-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/fork.c:641: warning: Function parameter or member 'mm' not described in '__get_mm_exe_or_interp_file'
>> kernel/fork.c:641: warning: Function parameter or member 'prefer_interp' not described in '__get_mm_exe_or_interp_file'
   kernel/fork.c:1445: warning: Function parameter or member 'mm' not described in 'set_mm_exe_file'
   kernel/fork.c:1445: warning: Function parameter or member 'new_exe_file' not described in 'set_mm_exe_file'
>> kernel/fork.c:1445: warning: Function parameter or member 'new_interpreted_file' not described in 'set_mm_exe_file'
   kernel/fork.c:1498: warning: Function parameter or member 'mm' not described in 'replace_mm_exe_file'
   kernel/fork.c:1498: warning: Function parameter or member 'new_exe_file' not described in 'replace_mm_exe_file'
   kernel/fork.c:1551: warning: Function parameter or member 'task' not described in 'get_task_exe_file'
>> kernel/fork.c:1551: warning: Function parameter or member 'prefer_interpreted' not described in 'get_task_exe_file'
   kernel/fork.c:1581: warning: Function parameter or member 'task' not described in 'get_task_mm'
   kernel/fork.c:2156: warning: bad line: 
   kernel/fork.c:2177: warning: Function parameter or member 'ret' not described in '__pidfd_prepare'
   kernel/fork.c:2177: warning: Excess function parameter 'pidfd' description in '__pidfd_prepare'
   kernel/fork.c:2226: warning: Function parameter or member 'ret' not described in 'pidfd_prepare'
   kernel/fork.c:2226: warning: Excess function parameter 'pidfd' description in 'pidfd_prepare'
   kernel/fork.c:3235: warning: expecting prototype for clone3(). Prototype was for sys_clone3() instead


vim +641 kernel/fork.c

^1da177e4c3f41 Linus Torvalds       2005-04-16  630  
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  631  /**
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  632   * __get_mm_exe_or_interp_file - helper that acquires a reference to the mm's
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  633   * executable file, or if prefer_interp is set, go with mm->interpreted_file
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  634   * instead.
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  635   *
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  636   * Returns %NULL if mm has no associated executable/interpreted file.
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  637   * User must release file via fput().
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  638   */
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  639  static inline struct file *__get_mm_exe_or_interp_file(struct mm_struct *mm,
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  640  						       bool prefer_interp)
fe69d560b5bd9e David Hildenbrand    2021-04-23 @641  {
fe69d560b5bd9e David Hildenbrand    2021-04-23  642  	struct file *exe_file;
fe69d560b5bd9e David Hildenbrand    2021-04-23  643  
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  644  	rcu_read_lock();
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  645  
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  646  	if (unlikely(prefer_interp))
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  647  		exe_file = rcu_dereference(mm->interpreted_file);
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  648  	else
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  649  		exe_file = rcu_dereference(mm->exe_file);
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  650  
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  651  	if (exe_file && !get_file_rcu(exe_file))
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  652  		exe_file = NULL;
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  653  	rcu_read_unlock();
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  654  	return exe_file;
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  655  }
b7ff0c42f1987b Guilherme G. Piccoli 2023-09-07  656  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 2/2] fork, procfs: Introduce /proc/self/interpreter symlink
  2023-09-07 20:24 ` [RFC PATCH 2/2] fork, procfs: Introduce /proc/self/interpreter symlink Guilherme G. Piccoli
@ 2023-10-10  5:38   ` kernel test robot
  0 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2023-10-10  5:38 UTC (permalink / raw)
  To: Guilherme G. Piccoli; +Cc: oe-kbuild-all

Hi Guilherme,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on pcmoore-audit/next linus/master v6.6-rc5]
[cannot apply to kees/for-next/execve next-20231009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Guilherme-G-Piccoli/binfmt_misc-fork-proc-Introduce-flag-to-expose-the-interpreted-binary-in-procfs/20230908-044529
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230907204256.3700336-3-gpiccoli%40igalia.com
patch subject: [RFC PATCH 2/2] fork, procfs: Introduce /proc/self/interpreter symlink
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20231010/202310101349.prZqmt5R-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231010/202310101349.prZqmt5R-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310101349.prZqmt5R-lkp@intel.com/

All warnings (new ones prefixed by >>):

   kernel/fork.c:641: warning: Function parameter or member 'mm' not described in '__get_mm_exe_or_interp_file'
   kernel/fork.c:641: warning: Function parameter or member 'prefer_interp' not described in '__get_mm_exe_or_interp_file'
   kernel/fork.c:1445: warning: Function parameter or member 'mm' not described in 'set_mm_exe_file'
   kernel/fork.c:1445: warning: Function parameter or member 'new_exe_file' not described in 'set_mm_exe_file'
   kernel/fork.c:1445: warning: Function parameter or member 'new_interpreted_file' not described in 'set_mm_exe_file'
   kernel/fork.c:1498: warning: Function parameter or member 'mm' not described in 'replace_mm_exe_file'
   kernel/fork.c:1498: warning: Function parameter or member 'new_exe_file' not described in 'replace_mm_exe_file'
   kernel/fork.c:1551: warning: Function parameter or member 'task' not described in 'get_task_exe_file'
   kernel/fork.c:1551: warning: Function parameter or member 'prefer_interpreted' not described in 'get_task_exe_file'
>> kernel/fork.c:1581: warning: Function parameter or member 'task' not described in 'get_task_interpreter_file'
   kernel/fork.c:1607: warning: Function parameter or member 'task' not described in 'get_task_mm'
   kernel/fork.c:2182: warning: bad line: 
   kernel/fork.c:2203: warning: Function parameter or member 'ret' not described in '__pidfd_prepare'
   kernel/fork.c:2203: warning: Excess function parameter 'pidfd' description in '__pidfd_prepare'
   kernel/fork.c:2252: warning: Function parameter or member 'ret' not described in 'pidfd_prepare'
   kernel/fork.c:2252: warning: Excess function parameter 'pidfd' description in 'pidfd_prepare'
   kernel/fork.c:3261: warning: expecting prototype for clone3(). Prototype was for sys_clone3() instead


vim +1581 kernel/fork.c

  1570	
  1571	/**
  1572	 * get_task_interpreter_file - acquire a reference to the task's *interpreter*
  1573	 * executable (which is in fact the exe_file on mm_struct!). This is used in
  1574	 * order to expose /proc/self/interpreter, if we're under an interpreted
  1575	 * scenario (like binfmt_misc).
  1576	 *
  1577	 * Returns %NULL if exe_file is not an interpreter (i.e., it is the truly
  1578	 * running binary indeed).
  1579	 */
  1580	struct file *get_task_interpreter_file(struct task_struct *task)
> 1581	{
  1582		struct file *interpreter_file = NULL;
  1583		struct mm_struct *mm;
  1584	
  1585		task_lock(task);
  1586	
  1587		mm = task->mm;
  1588		if (mm && mm->has_interpreter) {
  1589			if (!(task->flags & PF_KTHREAD))
  1590				interpreter_file = get_mm_exe_file(mm);
  1591		}
  1592	
  1593		task_unlock(task);
  1594		return interpreter_file;
  1595	}
  1596	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc
  2023-10-09 17:37   ` Kees Cook
@ 2023-10-11 23:53     ` Ryan Houdek
  2023-11-13 17:33     ` Guilherme G. Piccoli
  1 sibling, 0 replies; 15+ messages in thread
From: Ryan Houdek @ 2023-10-11 23:53 UTC (permalink / raw)
  To: Kees Cook
  Cc: Guilherme G. Piccoli, David Hildenbrand, linux-kernel,
	linux-fsdevel, linux-mm, kernel-dev, kernel, ebiederm, oleg,
	yzaikin, mcgrof, akpm, brauner, viro, willy, dave, joshua

On Mon, Oct 9, 2023 at 10:37 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
> > On 07.09.23 22:24, Guilherme G. Piccoli wrote:
> > > Currently the kernel provides a symlink to the executable binary, in the
> > > form of procfs file exe_file (/proc/self/exe_file for example). But what
> > > happens in interpreted scenarios (like binfmt_misc) is that such link
> > > always points to the *interpreter*. For cases of Linux binary emulators,
> > > like FEX [0] for example, it's then necessary to somehow mask that and
> > > emulate the true binary path.
> >
> > I'm absolutely no expert on that, but I'm wondering if, instead of modifying
> > exe_file and adding an interpreter file, you'd want to leave exe_file alone
> > and instead provide an easier way to obtain the interpreted file.
> >
> > Can you maybe describe why modifying exe_file is desired (about which
> > consumers are we worrying? ) and what exactly FEX does to handle that (how
> > does it mask that?).
> >
> > So a bit more background on the challenges without this change would be
> > appreciated.
>
> Yeah, it sounds like you're dealing with a process that examines
> /proc/self/exe_file for itself only to find the binfmt_misc interpreter
> when it was run via binfmt_misc?
>
> What actually breaks? Or rather, why does the process to examine
> exe_file? I'm just trying to see if there are other solutions here that
> would avoid creating an ambiguous interface...
>
> --
> Kees Cook

Hey there, FEX-Emu developer here. I can try and explain some of the issues.

First thing is that we should set the stage here that there is a
fundamental discrepancy
between how ELF interpreters are represented versus binfmt_misc
interpreters when it
comes to procfs exe. An ELF file today can either be static or dynamic, with the
dynamic ELF files having a program header called PT_INTERP which will tell the
kernel where its interpreter executable lives. In an x86-64 environment this
is likely to be something like /lib64/ld-linux-x86-64.so.2. Today, the Kernel
doesn't put the PT_INTERP handle into procfs exe, it instead uses the
dynamic ELF
that was originally launched.

In contrast to how this behaviour works, a binfmt_misc interpreter
file getting launched
through execve may or may not have ELF header sections. But it is left up to the
binfmt_misc handler to do whatever it may need. The kernel sets procfs
exe to the
binfmt_misc interpreter instead of the executable.

This is fundamentally the contrasting behaviour that is trying to be
improved. It seems
like the this behaviour is an oversight of the original binfmt_misc
implementation
rather than any sort of ambition to ensure there is a difference. It's
already ambiguous
that the interface changes when executing an executable through binfmt_misc.

Some simple ways applications break:
- Applications like chrome tend to relaunch themselves through execve
with `/proc/self/exe`
  - Chrome does this. I think Flatpaks or AppImage applications do this?
  - There are definitely more that do this that I have noticed.
- In the cover letter there was a link to Mesa, the OSS OpenGL/Vulkan
drivers using this
  - This library uses this interface to find out what application is
running for applying
     workarounds for application bugs. Plenty of historical
applications that use the API
     badly or incorrectly and need specific driver workarounds for them.
- Some applications may use this path to open their own executable path and then
   mmap back in for doing tricky memory mirroring or dynamic linking
of themselves.
   - Saw some old abandoned emulator software doing this.

There's likely more uses that I haven't noticed from software using
this interface.

Onward to what FEX-Emu is and how it tries working around the issue
with a fairly naive hack.
FEX-Emu is an x86 and x86-64 CPU emulator that gets installed as a
binfmt_misc interpreter.
It then executes x86 and x86-64 ELF files on an Arm64 device as
effectively a multi-arch
capable fashion. It's lightweight in that all application processes
and threads are just
regular Arm64 processes and threads. This is similar to how qemu-user operates.

When processing system calls, FEX will intercept any call that
consumes a pathname,
it will then inspect that path name and if it is one of the ways it is
possible to access
procfs/exe then it redirects to the true x86/x86-64 executable. This
is an attempt to behave
like how if the ELF was executed without a binfmt_misc handler.

Pathnames captured in FEX-Emu today:
- /proc/self/exe
- /proc/<pid>/exe
- /proc/thread-self/exe

This is very fragile and doesn't cover the full range of how
applications could access procfs.
Applications could end up using the *at variants of syscalls with an
FD that has /proc/self/
open. They could do simple tricks like `/proc/self/../self/exe` and it
would side-step this check.
It's a game of whack-a-mole and escalating overhead to try and close
the gap purely due
to, what appears to be, an oversight in how binfmt_misc and PT_INTERP
is handled.

Hopefully this explains why this is necessary and that reducing the
differences between
how PT_INTERP and binfmt_misc are represented is desired.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc
  2023-10-09 17:37   ` Kees Cook
  2023-10-11 23:53     ` Ryan Houdek
@ 2023-11-13 17:33     ` Guilherme G. Piccoli
  2023-11-13 18:29       ` Eric W. Biederman
  1 sibling, 1 reply; 15+ messages in thread
From: Guilherme G. Piccoli @ 2023-11-13 17:33 UTC (permalink / raw)
  To: Kees Cook, David Hildenbrand, sonicadvance1
  Cc: linux-kernel, linux-fsdevel, linux-mm, kernel-dev, kernel,
	ebiederm, oleg, yzaikin, mcgrof, akpm, brauner, viro, willy,
	dave, joshua

On 09/10/2023 14:37, Kees Cook wrote:
> On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
>> On 07.09.23 22:24, Guilherme G. Piccoli wrote:
>>> Currently the kernel provides a symlink to the executable binary, in the
>>> form of procfs file exe_file (/proc/self/exe_file for example). But what
>>> happens in interpreted scenarios (like binfmt_misc) is that such link
>>> always points to the *interpreter*. For cases of Linux binary emulators,
>>> like FEX [0] for example, it's then necessary to somehow mask that and
>>> emulate the true binary path.
>>
>> I'm absolutely no expert on that, but I'm wondering if, instead of modifying
>> exe_file and adding an interpreter file, you'd want to leave exe_file alone
>> and instead provide an easier way to obtain the interpreted file.
>>
>> Can you maybe describe why modifying exe_file is desired (about which
>> consumers are we worrying? ) and what exactly FEX does to handle that (how
>> does it mask that?).
>>
>> So a bit more background on the challenges without this change would be
>> appreciated.
> 
> Yeah, it sounds like you're dealing with a process that examines
> /proc/self/exe_file for itself only to find the binfmt_misc interpreter
> when it was run via binfmt_misc?
> 
> What actually breaks? Or rather, why does the process to examine
> exe_file? I'm just trying to see if there are other solutions here that
> would avoid creating an ambiguous interface...
> 

Thanks Kees and David! Did Ryan's thorough comment addressed your
questions? Do you have any take on the TODOs?

I can maybe rebase against 6.7-rc1 and resubmit , if that makes sense!
But would be better having the TODOs addressed, I guess.

Thanks in advance for reviews and feedback on this.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc
  2023-11-13 17:33     ` Guilherme G. Piccoli
@ 2023-11-13 18:29       ` Eric W. Biederman
  2023-11-13 19:16         ` David Hildenbrand
  2023-11-13 19:17         ` Guilherme G. Piccoli
  0 siblings, 2 replies; 15+ messages in thread
From: Eric W. Biederman @ 2023-11-13 18:29 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Kees Cook, David Hildenbrand, sonicadvance1, linux-kernel,
	linux-fsdevel, linux-mm, kernel-dev, kernel, oleg, yzaikin,
	mcgrof, akpm, brauner, viro, willy, dave, joshua

"Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:

> On 09/10/2023 14:37, Kees Cook wrote:
>> On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
>>> On 07.09.23 22:24, Guilherme G. Piccoli wrote:
>>>> Currently the kernel provides a symlink to the executable binary, in the
>>>> form of procfs file exe_file (/proc/self/exe_file for example). But what
>>>> happens in interpreted scenarios (like binfmt_misc) is that such link
>>>> always points to the *interpreter*. For cases of Linux binary emulators,
>>>> like FEX [0] for example, it's then necessary to somehow mask that and
>>>> emulate the true binary path.
>>>
>>> I'm absolutely no expert on that, but I'm wondering if, instead of modifying
>>> exe_file and adding an interpreter file, you'd want to leave exe_file alone
>>> and instead provide an easier way to obtain the interpreted file.
>>>
>>> Can you maybe describe why modifying exe_file is desired (about which
>>> consumers are we worrying? ) and what exactly FEX does to handle that (how
>>> does it mask that?).
>>>
>>> So a bit more background on the challenges without this change would be
>>> appreciated.
>> 
>> Yeah, it sounds like you're dealing with a process that examines
>> /proc/self/exe_file for itself only to find the binfmt_misc interpreter
>> when it was run via binfmt_misc?
>> 
>> What actually breaks? Or rather, why does the process to examine
>> exe_file? I'm just trying to see if there are other solutions here that
>> would avoid creating an ambiguous interface...
>> 
>
> Thanks Kees and David! Did Ryan's thorough comment addressed your
> questions? Do you have any take on the TODOs?
>
> I can maybe rebase against 6.7-rc1 and resubmit , if that makes sense!
> But would be better having the TODOs addressed, I guess.

Currently there is a mechanism in the kernel for changing
/proc/self/exe.  Would that be reasonable to use in this case?

It came from the checkpoint/restart work, but given that it is already
implemented it seems like the path of least resistance to get your
binfmt_misc that wants to look like binfmt_elf to use that mechanism.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc
  2023-11-13 18:29       ` Eric W. Biederman
@ 2023-11-13 19:16         ` David Hildenbrand
  2023-11-14 16:11           ` Eric W. Biederman
  2023-11-13 19:17         ` Guilherme G. Piccoli
  1 sibling, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2023-11-13 19:16 UTC (permalink / raw)
  To: Eric W. Biederman, Guilherme G. Piccoli
  Cc: Kees Cook, sonicadvance1, linux-kernel, linux-fsdevel, linux-mm,
	kernel-dev, kernel, oleg, yzaikin, mcgrof, akpm, brauner, viro,
	willy, dave, joshua

On 13.11.23 19:29, Eric W. Biederman wrote:
> "Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:
> 
>> On 09/10/2023 14:37, Kees Cook wrote:
>>> On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
>>>> On 07.09.23 22:24, Guilherme G. Piccoli wrote:
>>>>> Currently the kernel provides a symlink to the executable binary, in the
>>>>> form of procfs file exe_file (/proc/self/exe_file for example). But what
>>>>> happens in interpreted scenarios (like binfmt_misc) is that such link
>>>>> always points to the *interpreter*. For cases of Linux binary emulators,
>>>>> like FEX [0] for example, it's then necessary to somehow mask that and
>>>>> emulate the true binary path.
>>>>
>>>> I'm absolutely no expert on that, but I'm wondering if, instead of modifying
>>>> exe_file and adding an interpreter file, you'd want to leave exe_file alone
>>>> and instead provide an easier way to obtain the interpreted file.
>>>>
>>>> Can you maybe describe why modifying exe_file is desired (about which
>>>> consumers are we worrying? ) and what exactly FEX does to handle that (how
>>>> does it mask that?).
>>>>
>>>> So a bit more background on the challenges without this change would be
>>>> appreciated.
>>>
>>> Yeah, it sounds like you're dealing with a process that examines
>>> /proc/self/exe_file for itself only to find the binfmt_misc interpreter
>>> when it was run via binfmt_misc?
>>>
>>> What actually breaks? Or rather, why does the process to examine
>>> exe_file? I'm just trying to see if there are other solutions here that
>>> would avoid creating an ambiguous interface...
>>>
>>
>> Thanks Kees and David! Did Ryan's thorough comment addressed your
>> questions? Do you have any take on the TODOs?
>>
>> I can maybe rebase against 6.7-rc1 and resubmit , if that makes sense!
>> But would be better having the TODOs addressed, I guess.
> 
> Currently there is a mechanism in the kernel for changing
> /proc/self/exe.  Would that be reasonable to use in this case?
> 
> It came from the checkpoint/restart work, but given that it is already
> implemented it seems like the path of least resistance to get your
> binfmt_misc that wants to look like binfmt_elf to use that mechanism.

I had that in mind as well, but 
prctl_set_mm_exe_file()->replace_mm_exe_file() fails if the executable 
is still mmaped (due to denywrite handling); that should be the case for 
the emulator I strongly assume.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc
  2023-11-13 18:29       ` Eric W. Biederman
  2023-11-13 19:16         ` David Hildenbrand
@ 2023-11-13 19:17         ` Guilherme G. Piccoli
  1 sibling, 0 replies; 15+ messages in thread
From: Guilherme G. Piccoli @ 2023-11-13 19:17 UTC (permalink / raw)
  To: Eric W. Biederman, sonicadvance1
  Cc: Kees Cook, David Hildenbrand, linux-kernel, linux-fsdevel,
	linux-mm, kernel-dev, kernel, oleg, yzaikin, mcgrof, akpm,
	brauner, viro, willy, dave, joshua

On 13/11/2023 15:29, Eric W. Biederman wrote:
> [...]
> Currently there is a mechanism in the kernel for changing
> /proc/self/exe.  Would that be reasonable to use in this case?
> 
> It came from the checkpoint/restart work, but given that it is already
> implemented it seems like the path of least resistance to get your
> binfmt_misc that wants to look like binfmt_elf to use that mechanism.
> 
> Eric
> 

Thanks Eric! I'm curious on how that would work: we'd change the symlink
of the emulator? So, the *emulated* software, when reading that, would
see the correct symlink?

Also, just to fully clarify: are you suggesting we hook the new
binfmt_misc flag proposed here to the internal kernel way of changing
the proc/self/exe symlink, or are you suggesting we use the prctl() tune
from the emulator, like the userspace changing its own symlink?

One of the biggest concerns I have with this kind of approach is that
changing the symlink actually...changes it - the binary mapping itself,
I mean.
Whereas my way was a "fake" change, just expose one thing for the
emulated app, but changes nothing else...

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc
  2023-11-13 19:16         ` David Hildenbrand
@ 2023-11-14 16:11           ` Eric W. Biederman
  2023-11-14 16:14             ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2023-11-14 16:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Guilherme G. Piccoli, Kees Cook, sonicadvance1, linux-kernel,
	linux-fsdevel, linux-mm, kernel-dev, kernel, oleg, yzaikin,
	mcgrof, akpm, brauner, viro, willy, dave, joshua

David Hildenbrand <david@redhat.com> writes:

> On 13.11.23 19:29, Eric W. Biederman wrote:
>> "Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:
>> 
>>> On 09/10/2023 14:37, Kees Cook wrote:
>>>> On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
>>>>> On 07.09.23 22:24, Guilherme G. Piccoli wrote:
>>>>>> Currently the kernel provides a symlink to the executable binary, in the
>>>>>> form of procfs file exe_file (/proc/self/exe_file for example). But what
>>>>>> happens in interpreted scenarios (like binfmt_misc) is that such link
>>>>>> always points to the *interpreter*. For cases of Linux binary emulators,
>>>>>> like FEX [0] for example, it's then necessary to somehow mask that and
>>>>>> emulate the true binary path.
>>>>>
>>>>> I'm absolutely no expert on that, but I'm wondering if, instead of modifying
>>>>> exe_file and adding an interpreter file, you'd want to leave exe_file alone
>>>>> and instead provide an easier way to obtain the interpreted file.
>>>>>
>>>>> Can you maybe describe why modifying exe_file is desired (about which
>>>>> consumers are we worrying? ) and what exactly FEX does to handle that (how
>>>>> does it mask that?).
>>>>>
>>>>> So a bit more background on the challenges without this change would be
>>>>> appreciated.
>>>>
>>>> Yeah, it sounds like you're dealing with a process that examines
>>>> /proc/self/exe_file for itself only to find the binfmt_misc interpreter
>>>> when it was run via binfmt_misc?
>>>>
>>>> What actually breaks? Or rather, why does the process to examine
>>>> exe_file? I'm just trying to see if there are other solutions here that
>>>> would avoid creating an ambiguous interface...
>>>>
>>>
>>> Thanks Kees and David! Did Ryan's thorough comment addressed your
>>> questions? Do you have any take on the TODOs?
>>>
>>> I can maybe rebase against 6.7-rc1 and resubmit , if that makes sense!
>>> But would be better having the TODOs addressed, I guess.
>> Currently there is a mechanism in the kernel for changing
>> /proc/self/exe.  Would that be reasonable to use in this case?
>> It came from the checkpoint/restart work, but given that it is
>> already
>> implemented it seems like the path of least resistance to get your
>> binfmt_misc that wants to look like binfmt_elf to use that mechanism.
>
> I had that in mind as well, but
> prctl_set_mm_exe_file()->replace_mm_exe_file() fails if the executable
> is still mmaped (due to denywrite handling); that should be the case
> for the emulator I strongly assume.

Bah yes.  The sanity check that that the old executable is no longer
mapped does make it so that we can't trivially change the /proc/self/exe
using prctl(PR_SET_MM_EXE_FILE).

Eric


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc
  2023-11-14 16:11           ` Eric W. Biederman
@ 2023-11-14 16:14             ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2023-11-14 16:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Guilherme G. Piccoli, Kees Cook, sonicadvance1, linux-kernel,
	linux-fsdevel, linux-mm, kernel-dev, kernel, oleg, yzaikin,
	mcgrof, akpm, brauner, viro, willy, dave, joshua

On 14.11.23 17:11, Eric W. Biederman wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 13.11.23 19:29, Eric W. Biederman wrote:
>>> "Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:
>>>
>>>> On 09/10/2023 14:37, Kees Cook wrote:
>>>>> On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
>>>>>> On 07.09.23 22:24, Guilherme G. Piccoli wrote:
>>>>>>> Currently the kernel provides a symlink to the executable binary, in the
>>>>>>> form of procfs file exe_file (/proc/self/exe_file for example). But what
>>>>>>> happens in interpreted scenarios (like binfmt_misc) is that such link
>>>>>>> always points to the *interpreter*. For cases of Linux binary emulators,
>>>>>>> like FEX [0] for example, it's then necessary to somehow mask that and
>>>>>>> emulate the true binary path.
>>>>>>
>>>>>> I'm absolutely no expert on that, but I'm wondering if, instead of modifying
>>>>>> exe_file and adding an interpreter file, you'd want to leave exe_file alone
>>>>>> and instead provide an easier way to obtain the interpreted file.
>>>>>>
>>>>>> Can you maybe describe why modifying exe_file is desired (about which
>>>>>> consumers are we worrying? ) and what exactly FEX does to handle that (how
>>>>>> does it mask that?).
>>>>>>
>>>>>> So a bit more background on the challenges without this change would be
>>>>>> appreciated.
>>>>>
>>>>> Yeah, it sounds like you're dealing with a process that examines
>>>>> /proc/self/exe_file for itself only to find the binfmt_misc interpreter
>>>>> when it was run via binfmt_misc?
>>>>>
>>>>> What actually breaks? Or rather, why does the process to examine
>>>>> exe_file? I'm just trying to see if there are other solutions here that
>>>>> would avoid creating an ambiguous interface...
>>>>>
>>>>
>>>> Thanks Kees and David! Did Ryan's thorough comment addressed your
>>>> questions? Do you have any take on the TODOs?
>>>>
>>>> I can maybe rebase against 6.7-rc1 and resubmit , if that makes sense!
>>>> But would be better having the TODOs addressed, I guess.
>>> Currently there is a mechanism in the kernel for changing
>>> /proc/self/exe.  Would that be reasonable to use in this case?
>>> It came from the checkpoint/restart work, but given that it is
>>> already
>>> implemented it seems like the path of least resistance to get your
>>> binfmt_misc that wants to look like binfmt_elf to use that mechanism.
>>
>> I had that in mind as well, but
>> prctl_set_mm_exe_file()->replace_mm_exe_file() fails if the executable
>> is still mmaped (due to denywrite handling); that should be the case
>> for the emulator I strongly assume.
> 
> Bah yes.  The sanity check that that the old executable is no longer
> mapped does make it so that we can't trivially change the /proc/self/exe
> using prctl(PR_SET_MM_EXE_FILE).

I was wondering if we should have a new file (yet have to come up witha 
fitting name) that defaults to /proc/self/exe as long as that new file 
doesn't explicitly get set via  a prctl.

So /proc/self/exe would indeed always show the emulator (executable), 
but the new file could be adjusted to something that is being executed 
by the emulator.

Just a thought ... I'd rather leave /proc/self/exe alone.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-11-14 16:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-07 20:24 [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc Guilherme G. Piccoli
2023-09-07 20:24 ` [RFC PATCH 1/2] binfmt_misc, fork, proc: Introduce flag to expose the interpreted binary in procfs Guilherme G. Piccoli
2023-10-10  4:31   ` kernel test robot
2023-09-07 20:24 ` [RFC PATCH 2/2] fork, procfs: Introduce /proc/self/interpreter symlink Guilherme G. Piccoli
2023-10-10  5:38   ` kernel test robot
2023-10-06  7:51 ` [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc Guilherme G. Piccoli
2023-10-06 12:07 ` David Hildenbrand
2023-10-09 17:37   ` Kees Cook
2023-10-11 23:53     ` Ryan Houdek
2023-11-13 17:33     ` Guilherme G. Piccoli
2023-11-13 18:29       ` Eric W. Biederman
2023-11-13 19:16         ` David Hildenbrand
2023-11-14 16:11           ` Eric W. Biederman
2023-11-14 16:14             ` David Hildenbrand
2023-11-13 19:17         ` Guilherme G. Piccoli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.