All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/11] linux-user changes to run docker
@ 2021-05-31  5:50 YAMAMOTO Takashi
  2021-05-31  5:50 ` [PATCH v2 01/11] linux-user: handle /proc/self/exe for execve YAMAMOTO Takashi
                   ` (11 more replies)
  0 siblings, 12 replies; 20+ messages in thread
From: YAMAMOTO Takashi @ 2021-05-31  5:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: YAMAMOTO Takashi

These patches allowed me to run arm64 and armv7 version of
dind image on amd64.

This patchset includes a few patches marked [!MERGE],
which are not for the upsteam merge. They are included here just to
show the context to reviewers.

You can find my test setup here:
https://github.com/yamt/garbage/tree/master/binfmt-aarch64-install

YAMAMOTO Takashi (11):
  linux-user: handle /proc/self/exe for execve
  linux-user: Fix the execfd case of /proc/self/exe open
  linux-user: dup the execfd on start up
  linux-user: make exec_path realpath
  linux-user: Implement pivot_root
  linux-user: add get_exe_path
  linux-user: simplify is_proc_myself
  linux-user: Implement exec of /proc/$pid/exe of qemu process
  linux-user: Make the qemu detection for /proc/$pid/exe a bit
    conservative
  linux-user: a crude hack for libcontainer (CLONE_PARENT) [!MERGE]
  linux-user: always assume preserve_argv0 for now [!MERGE]

 linux-user/main.c    |  57 ++++++++++++++-
 linux-user/qemu.h    |   2 +
 linux-user/syscall.c | 171 ++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 219 insertions(+), 11 deletions(-)

-- 
2.21.1 (Apple Git-122.3)



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 01/11] linux-user: handle /proc/self/exe for execve
  2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
@ 2021-05-31  5:50 ` YAMAMOTO Takashi
  2021-06-20 14:14   ` Laurent Vivier
  2021-05-31  5:50 ` [PATCH v2 02/11] linux-user: Fix the execfd case of /proc/self/exe open YAMAMOTO Takashi
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: YAMAMOTO Takashi @ 2021-05-31  5:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: YAMAMOTO Takashi, Laurent Vivier

It seems somehow common to execve /proc/self/exe in docker
or golang community these days.
At least, moby "reexec" and runc "libcontainer" do that.

Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
---
 linux-user/syscall.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index c9f812091c..a2b03ecb8b 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8470,6 +8470,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
     case TARGET_NR_execve:
         {
+            const char *path;
             char **argp, **envp;
             int argc, envc;
             abi_ulong gp;
@@ -8537,7 +8538,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
              * before the execve completes and makes it the other
              * program's problem.
              */
-            ret = get_errno(safe_execve(p, argp, envp));
+            path = p;
+            if (is_proc_myself(path, "exe")) {
+                path = exec_path;
+            }
+            ret = get_errno(safe_execve(path, argp, envp));
             unlock_user(p, arg1, 0);
 
             goto execve_end;
-- 
2.21.1 (Apple Git-122.3)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 02/11] linux-user: Fix the execfd case of /proc/self/exe open
  2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
  2021-05-31  5:50 ` [PATCH v2 01/11] linux-user: handle /proc/self/exe for execve YAMAMOTO Takashi
@ 2021-05-31  5:50 ` YAMAMOTO Takashi
  2021-06-20 14:16   ` Laurent Vivier
  2021-05-31  5:50 ` [PATCH v2 03/11] linux-user: dup the execfd on start up YAMAMOTO Takashi
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: YAMAMOTO Takashi @ 2021-05-31  5:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: YAMAMOTO Takashi, Laurent Vivier

It's problematic to return AT_EXECFD as it is because the user app
would close it.
This patch opens it via /proc/self/fd instead.

Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
---
 linux-user/syscall.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index a2b03ecb8b..14a63518e2 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8118,7 +8118,17 @@ static int do_openat(void *cpu_env, int dirfd, const char *pathname, int flags,
 
     if (is_proc_myself(pathname, "exe")) {
         int execfd = qemu_getauxval(AT_EXECFD);
-        return execfd ? execfd : safe_openat(dirfd, exec_path, flags, mode);
+        if (execfd) {
+            char filename[PATH_MAX];
+            int ret;
+
+            snprintf(filename, sizeof(filename), "/proc/self/fd/%d", execfd);
+            ret = safe_openat(dirfd, filename, flags, mode);
+            if (ret != -1) {
+                return ret;
+            }
+        }
+        return safe_openat(dirfd, exec_path, flags, mode);
     }
 
     for (fake_open = fakes; fake_open->filename; fake_open++) {
-- 
2.21.1 (Apple Git-122.3)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 03/11] linux-user: dup the execfd on start up
  2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
  2021-05-31  5:50 ` [PATCH v2 01/11] linux-user: handle /proc/self/exe for execve YAMAMOTO Takashi
  2021-05-31  5:50 ` [PATCH v2 02/11] linux-user: Fix the execfd case of /proc/self/exe open YAMAMOTO Takashi
@ 2021-05-31  5:50 ` YAMAMOTO Takashi
  2021-05-31  5:50 ` [PATCH v2 04/11] linux-user: make exec_path realpath YAMAMOTO Takashi
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: YAMAMOTO Takashi @ 2021-05-31  5:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: YAMAMOTO Takashi, Laurent Vivier

So that it can be used for other purposes (e.g. syscall.c)
after the elf loader closed it.

Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
---
 linux-user/main.c    | 10 +++++++++-
 linux-user/qemu.h    |  2 ++
 linux-user/syscall.c |  5 ++---
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 4dfc47ad3b..a9d02f9583 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -55,6 +55,7 @@
 #endif
 
 char *exec_path;
+int exec_fd = -1;
 
 int singlestep;
 static const char *argv0;
@@ -693,7 +694,14 @@ int main(int argc, char **argv, char **envp)
      * Manage binfmt-misc open-binary flag
      */
     execfd = qemu_getauxval(AT_EXECFD);
-    if (execfd == 0) {
+    if (execfd > 0) {
+        /*
+         * dup execfd to a global so that it can be used after loader_exec
+         * closes it.
+         */
+
+        exec_fd = dup(execfd);
+    } else {
         execfd = open(exec_path, O_RDONLY);
         if (execfd < 0) {
             printf("Error while loading %s: %s\n", exec_path, strerror(errno));
diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 3b0b6b75fe..ee4e9a1779 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -160,6 +160,8 @@ typedef struct TaskState {
 } __attribute__((aligned(16))) TaskState;
 
 extern char *exec_path;
+extern int exec_fd;
+
 void init_task_state(TaskState *ts);
 void task_settid(TaskState *);
 void stop_all_tasks(void);
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 14a63518e2..2947e79dc0 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8117,12 +8117,11 @@ static int do_openat(void *cpu_env, int dirfd, const char *pathname, int flags,
     };
 
     if (is_proc_myself(pathname, "exe")) {
-        int execfd = qemu_getauxval(AT_EXECFD);
-        if (execfd) {
+        if (exec_fd != -1) {
             char filename[PATH_MAX];
             int ret;
 
-            snprintf(filename, sizeof(filename), "/proc/self/fd/%d", execfd);
+            snprintf(filename, sizeof(filename), "/proc/self/fd/%d", exec_fd);
             ret = safe_openat(dirfd, filename, flags, mode);
             if (ret != -1) {
                 return ret;
-- 
2.21.1 (Apple Git-122.3)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 04/11] linux-user: make exec_path realpath
  2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
                   ` (2 preceding siblings ...)
  2021-05-31  5:50 ` [PATCH v2 03/11] linux-user: dup the execfd on start up YAMAMOTO Takashi
@ 2021-05-31  5:50 ` YAMAMOTO Takashi
  2021-05-31  5:50 ` [PATCH v2 05/11] linux-user: Implement pivot_root YAMAMOTO Takashi
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: YAMAMOTO Takashi @ 2021-05-31  5:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: YAMAMOTO Takashi, Laurent Vivier

Otherwise, it can be easily fooled by the user app using chdir().

Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
---
 linux-user/main.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index a9d02f9583..be604a84f9 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -55,6 +55,7 @@
 #endif
 
 char *exec_path;
+char exec_path_store[PATH_MAX];
 int exec_fd = -1;
 
 int singlestep;
@@ -611,7 +612,20 @@ static int parse_args(int argc, char **argv)
         exit(EXIT_FAILURE);
     }
 
-    exec_path = argv[optind];
+    /*
+     * Try to get the realpath of the executable to avoid being
+     * fooled by chdir is the user app.
+     *
+     * Note: realpath here can fail for some use cases.
+     * For example, runc executes an unlinked binary via
+     * /proc/self/fd.
+     * It isn't fatal as far as we have an exec fd.
+     * (Otherwise, we will fail to load the binary.
+     */
+    exec_path = realpath(argv[optind], exec_path_store);
+    if (exec_path == NULL) {
+        exec_path = argv[optind];
+    }
 
     return optind;
 }
-- 
2.21.1 (Apple Git-122.3)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 05/11] linux-user: Implement pivot_root
  2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
                   ` (3 preceding siblings ...)
  2021-05-31  5:50 ` [PATCH v2 04/11] linux-user: make exec_path realpath YAMAMOTO Takashi
@ 2021-05-31  5:50 ` YAMAMOTO Takashi
  2021-06-20 14:02   ` Laurent Vivier
  2021-06-20 14:05   ` Laurent Vivier
  2021-05-31  5:50 ` [PATCH v2 06/11] linux-user: add get_exe_path YAMAMOTO Takashi
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 20+ messages in thread
From: YAMAMOTO Takashi @ 2021-05-31  5:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: YAMAMOTO Takashi, Laurent Vivier

Used by runc.

Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
---
 linux-user/syscall.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 2947e79dc0..51144c6d29 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8254,6 +8254,10 @@ static int host_to_target_cpu_mask(const unsigned long *host_mask,
     return 0;
 }
 
+#if defined(TARGET_NR_pivot_root) && defined(__NR_pivot_root)
+_syscall2(int, pivot_root, const char *, new_root, const char *, put_old)
+#endif
+
 /* This is an internal helper for do_syscall so that it is easier
  * to have a single return point, so that actions, such as logging
  * of syscall results, can be performed.
@@ -13222,6 +13226,23 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
         return ret;
 #endif
 
+#if defined(TARGET_NR_pivot_root)
+    case TARGET_NR_pivot_root:
+        {
+            void *p2;
+            p = lock_user_string(arg1); /* new_root */
+            p2 = lock_user_string(arg2); /* put_old */
+            if (!p || !p2) {
+                ret = -TARGET_EFAULT;
+            } else {
+                ret = get_errno(pivot_root(p, p2));
+            }
+            unlock_user(p2, arg2, 0);
+            unlock_user(p, arg1, 0);
+        }
+        return ret;
+#endif
+
     default:
         qemu_log_mask(LOG_UNIMP, "Unsupported syscall: %d\n", num);
         return -TARGET_ENOSYS;
-- 
2.21.1 (Apple Git-122.3)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 06/11] linux-user: add get_exe_path
  2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
                   ` (4 preceding siblings ...)
  2021-05-31  5:50 ` [PATCH v2 05/11] linux-user: Implement pivot_root YAMAMOTO Takashi
@ 2021-05-31  5:50 ` YAMAMOTO Takashi
  2021-05-31  5:50 ` [PATCH v2 07/11] linux-user: simplify is_proc_myself YAMAMOTO Takashi
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: YAMAMOTO Takashi @ 2021-05-31  5:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: YAMAMOTO Takashi, Laurent Vivier

Refactor to prepare the special cases for /proc/$pid/exe where
pid is not the calling process.

Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
---
 linux-user/syscall.c | 48 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 51144c6d29..999760448d 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7986,6 +7986,45 @@ static int open_self_auxv(void *cpu_env, int fd)
     return 0;
 }
 
+static const char *get_exe_path(int pid, char *buf, size_t bufsize)
+{
+    if (pid == getpid()) {
+        return exec_path;
+    }
+
+    return NULL;
+}
+
+static int is_proc_file(const char *filename, int *pidp, const char *entry)
+{
+    if (!strncmp(filename, "/proc/", strlen("/proc/"))) {
+        int pid;
+
+        filename += strlen("/proc/");
+        if (!strncmp(filename, "self/", strlen("self/"))) {
+            pid = getpid();
+            filename += strlen("self/");
+        } else if (*filename >= '1' && *filename <= '9') {
+            pid = 0;
+            while (*filename >= '0' && *filename <= '9') {
+                pid = pid * 10 + *filename - '0';
+                filename++;
+            }
+            if (*filename != '/') {
+                return 0;
+            }
+            filename++;
+        } else {
+            return 0;
+        }
+        if (!strcmp(filename, entry)) {
+            *pidp = pid;
+            return 1;
+        }
+    }
+    return 0;
+}
+
 static int is_proc_myself(const char *filename, const char *entry)
 {
     if (!strncmp(filename, "/proc/", strlen("/proc/"))) {
@@ -8492,6 +8531,8 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
             abi_ulong addr;
             char **q;
             int total_size = 0;
+            int pid;
+            char path_store[PATH_MAX];
 
             argc = 0;
             guest_argp = arg2;
@@ -8552,8 +8593,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
              * program's problem.
              */
             path = p;
-            if (is_proc_myself(path, "exe")) {
-                path = exec_path;
+            if (is_proc_file(path, &pid, "exe")) {
+                path = get_exe_path(pid, path_store, sizeof(path_store));
+                if (path == NULL) {
+                    path = p;
+                }
             }
             ret = get_errno(safe_execve(path, argp, envp));
             unlock_user(p, arg1, 0);
-- 
2.21.1 (Apple Git-122.3)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 07/11] linux-user: simplify is_proc_myself
  2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
                   ` (5 preceding siblings ...)
  2021-05-31  5:50 ` [PATCH v2 06/11] linux-user: add get_exe_path YAMAMOTO Takashi
@ 2021-05-31  5:50 ` YAMAMOTO Takashi
  2021-05-31  5:50 ` [PATCH v2 08/11] linux-user: Implement exec of /proc/$pid/exe of qemu process YAMAMOTO Takashi
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: YAMAMOTO Takashi @ 2021-05-31  5:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: YAMAMOTO Takashi, Laurent Vivier

Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
---
 linux-user/syscall.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 999760448d..86b12cc8b4 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8027,26 +8027,9 @@ static int is_proc_file(const char *filename, int *pidp, const char *entry)
 
 static int is_proc_myself(const char *filename, const char *entry)
 {
-    if (!strncmp(filename, "/proc/", strlen("/proc/"))) {
-        filename += strlen("/proc/");
-        if (!strncmp(filename, "self/", strlen("self/"))) {
-            filename += strlen("self/");
-        } else if (*filename >= '1' && *filename <= '9') {
-            char myself[80];
-            snprintf(myself, sizeof(myself), "%d/", getpid());
-            if (!strncmp(filename, myself, strlen(myself))) {
-                filename += strlen(myself);
-            } else {
-                return 0;
-            }
-        } else {
-            return 0;
-        }
-        if (!strcmp(filename, entry)) {
-            return 1;
-        }
-    }
-    return 0;
+    int pid;
+
+    return is_proc_file(filename, &pid, entry) && pid == getpid();
 }
 
 #if defined(HOST_WORDS_BIGENDIAN) != defined(TARGET_WORDS_BIGENDIAN) || \
-- 
2.21.1 (Apple Git-122.3)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 08/11] linux-user: Implement exec of /proc/$pid/exe of qemu process
  2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
                   ` (6 preceding siblings ...)
  2021-05-31  5:50 ` [PATCH v2 07/11] linux-user: simplify is_proc_myself YAMAMOTO Takashi
@ 2021-05-31  5:50 ` YAMAMOTO Takashi
  2021-05-31  5:50 ` [PATCH v2 09/11] linux-user: Make the qemu detection for /proc/$pid/exe a bit conservative YAMAMOTO Takashi
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: YAMAMOTO Takashi @ 2021-05-31  5:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: YAMAMOTO Takashi, Laurent Vivier

dockerd makes runc invoke dockerd using /proc/$pid/exe.
This commit makes it work when both of dockerd and runc are
emulated by qemu linux-user.
In that case, we (the qemu interpreting runc) need to invoke the
real executable (dockerd), where /proc/$pid/exe in question is
the qemu command interpreting dockerd.

Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
---
 linux-user/syscall.c | 59 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 86b12cc8b4..6f9161dbe4 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7988,10 +7988,69 @@ static int open_self_auxv(void *cpu_env, int fd)
 
 static const char *get_exe_path(int pid, char *buf, size_t bufsize)
 {
+    ssize_t ssz;
+    int fd;
+
     if (pid == getpid()) {
         return exec_path;
     }
 
+    /* dockerd makes runc invoke dockerd using "/proc/${dockerd_pid}/exe". */
+    snprintf(buf, bufsize, "/proc/%d/cmdline", pid);
+    fd = open(buf, O_RDONLY);
+    if (fd == -1) {
+        return NULL;
+    }
+    ssz = read(fd, buf, bufsize);
+    if (ssz != -1) {
+        const char *argv0;
+        const char *argv1;
+        const char *cp;
+        const char *ep;
+        const char *slash;
+
+        cp = buf;
+        ep = cp + ssz;
+
+        argv0 = cp;
+        while (*cp != 0) {
+            cp++;
+            if (cp >= ep) {
+                goto fail;
+            }
+        }
+
+        cp++;
+        if (cp >= ep) {
+            goto fail;
+        }
+
+        argv1 = cp;
+        while (*cp != 0) {
+            cp++;
+            if (cp >= ep) {
+                goto fail;
+            }
+        }
+
+        /*
+         * XXX a bit too loose detection of qemu.
+         * maybe we can compare /proc/$pid/exe with ours.
+         */
+        slash = strrchr(argv0, '/');
+        if (slash != NULL) {
+            argv0 = slash + 1; /* basename */
+        }
+        if (strncmp(argv0, "qemu-", sizeof("qemu-") - 1)) {
+            goto fail;
+        }
+
+        close(fd);
+        return argv1;
+    }
+fail:
+    close(fd);
+
     return NULL;
 }
 
-- 
2.21.1 (Apple Git-122.3)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 09/11] linux-user: Make the qemu detection for /proc/$pid/exe a bit conservative
  2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
                   ` (7 preceding siblings ...)
  2021-05-31  5:50 ` [PATCH v2 08/11] linux-user: Implement exec of /proc/$pid/exe of qemu process YAMAMOTO Takashi
@ 2021-05-31  5:50 ` YAMAMOTO Takashi
  2021-05-31  5:50 ` [PATCH v2 10/11] linux-user: a crude hack for libcontainer (CLONE_PARENT) [!MERGE] YAMAMOTO Takashi
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: YAMAMOTO Takashi @ 2021-05-31  5:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: YAMAMOTO Takashi, Laurent Vivier

Perform the qemu special case only when the binary seems the same as
our own executable.
This is enough for my use case (docker and runc) where the involved
qemu binaries are always for the same arch.

Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
---
 linux-user/syscall.c | 38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 6f9161dbe4..56a3c37d83 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7986,6 +7986,26 @@ static int open_self_auxv(void *cpu_env, int fd)
     return 0;
 }
 
+static int is_same_qemu(const char *their_exe)
+{
+    struct stat our_st;
+    struct stat their_st;
+
+    if (stat("/proc/self/exe", &our_st) != 0) {
+        return 0;
+    }
+    if (stat(their_exe, &their_st) != 0) {
+        return 0;
+    }
+    if (our_st.st_dev != their_st.st_dev) {
+        return 0;
+    }
+    if (our_st.st_ino != their_st.st_ino) {
+        return 0;
+    }
+    return 1;
+}
+
 static const char *get_exe_path(int pid, char *buf, size_t bufsize)
 {
     ssize_t ssz;
@@ -7996,6 +8016,20 @@ static const char *get_exe_path(int pid, char *buf, size_t bufsize)
     }
 
     /* dockerd makes runc invoke dockerd using "/proc/${dockerd_pid}/exe". */
+
+    /*
+     * Check that it's the same qemu binary as ours
+     * to avoid false positives.
+     *
+     * While ideally we want to allow different qemu binaries,
+     * (E.g. linux-user for a different arch)
+     * I can't think of any reliable way to detect the cases.
+     */
+    snprintf(buf, bufsize, "/proc/%d/exe", pid);
+    if (!is_same_qemu(buf)) {
+        return NULL;
+    }
+
     snprintf(buf, bufsize, "/proc/%d/cmdline", pid);
     fd = open(buf, O_RDONLY);
     if (fd == -1) {
@@ -8033,10 +8067,6 @@ static const char *get_exe_path(int pid, char *buf, size_t bufsize)
             }
         }
 
-        /*
-         * XXX a bit too loose detection of qemu.
-         * maybe we can compare /proc/$pid/exe with ours.
-         */
         slash = strrchr(argv0, '/');
         if (slash != NULL) {
             argv0 = slash + 1; /* basename */
-- 
2.21.1 (Apple Git-122.3)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 10/11] linux-user: a crude hack for libcontainer (CLONE_PARENT) [!MERGE]
  2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
                   ` (8 preceding siblings ...)
  2021-05-31  5:50 ` [PATCH v2 09/11] linux-user: Make the qemu detection for /proc/$pid/exe a bit conservative YAMAMOTO Takashi
@ 2021-05-31  5:50 ` YAMAMOTO Takashi
  2021-05-31  5:50 ` [PATCH v2 11/11] linux-user: always assume preserve_argv0 for now [!MERGE] YAMAMOTO Takashi
  2021-05-31  6:07 ` [PATCH v2 00/11] linux-user changes to run docker no-reply
  11 siblings, 0 replies; 20+ messages in thread
From: YAMAMOTO Takashi @ 2021-05-31  5:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: YAMAMOTO Takashi, Laurent Vivier

runc uses clone() with a combination of flags which we don't
support. This commit works it around by ignoring CLONE_PARENT.

[!MERGE] because this is just a crude hack for the very specific
application.

Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
---
 linux-user/syscall.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 56a3c37d83..7645ed36e4 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6553,6 +6553,8 @@ static int do_fork(CPUArchState *env, unsigned int flags, abi_ulong newsp,
         pthread_mutex_destroy(&info.mutex);
         pthread_mutex_unlock(&clone_lock);
     } else {
+        flags &= ~CLONE_PARENT; /* XXX crude hack for libcontainer. */
+
         /* if no CLONE_VM, we consider it is a fork */
         if (flags & CLONE_INVALID_FORK_FLAGS) {
             return -TARGET_EINVAL;
-- 
2.21.1 (Apple Git-122.3)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 11/11] linux-user: always assume preserve_argv0 for now [!MERGE]
  2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
                   ` (9 preceding siblings ...)
  2021-05-31  5:50 ` [PATCH v2 10/11] linux-user: a crude hack for libcontainer (CLONE_PARENT) [!MERGE] YAMAMOTO Takashi
@ 2021-05-31  5:50 ` YAMAMOTO Takashi
  2021-05-31  6:07 ` [PATCH v2 00/11] linux-user changes to run docker no-reply
  11 siblings, 0 replies; 20+ messages in thread
From: YAMAMOTO Takashi @ 2021-05-31  5:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: YAMAMOTO Takashi, Laurent Vivier

Just because the kernel I'm using is not new enough.

[!MERGE] because this is specific to my environment and
would break others.

Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
---
 linux-user/main.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/linux-user/main.c b/linux-user/main.c
index be604a84f9..a3d8b7788f 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -726,7 +726,38 @@ int main(int argc, char **argv, char **envp)
     /*
      * get binfmt_misc flags
      */
+#if 0
     preserve_argv0 = !!(qemu_getauxval(AT_FLAGS) & AT_FLAGS_PRESERVE_ARGV0);
+#else
+/*
+ * my kernel doesn't have the following commit. this is a crude workaroud.
+
+commit 2347961b11d4079deace3c81dceed460c08a8fc1
+Author: Laurent Vivier <laurent@vivier.eu>
+Date:   Tue Jan 28 14:25:39 2020 +0100
+
+    binfmt_misc: pass binfmt_misc flags to the interpreter
+
+    It can be useful to the interpreter to know which flags are in use.
+
+    For instance, knowing if the preserve-argv[0] is in use would
+    allow to skip the pathname argument.
+
+    This patch uses an unused auxiliary vector, AT_FLAGS, to add a
+    flag to inform interpreter if the preserve-argv[0] is enabled.
+
+    Note by Helge Deller:
+    The real-world user of this patch is qemu-user, which needs to know
+    if it has to preserve the argv[0]. See Debian bug #970460.
+
+    Signed-off-by: Laurent Vivier <laurent@vivier.eu>
+    Reviewed-by: YunQiang Su <ysu@wavecomp.com>
+    URL: http://bugs.debian.org/970460
+    Signed-off-by: Helge Deller <deller@gmx.de>
+
+ */
+    preserve_argv0 = true;
+#endif
 
     /*
      * Manage binfmt-misc preserve-arg[0] flag
-- 
2.21.1 (Apple Git-122.3)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 00/11] linux-user changes to run docker
  2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
                   ` (10 preceding siblings ...)
  2021-05-31  5:50 ` [PATCH v2 11/11] linux-user: always assume preserve_argv0 for now [!MERGE] YAMAMOTO Takashi
@ 2021-05-31  6:07 ` no-reply
  11 siblings, 0 replies; 20+ messages in thread
From: no-reply @ 2021-05-31  6:07 UTC (permalink / raw)
  To: yamamoto; +Cc: qemu-devel, yamamoto

Patchew URL: https://patchew.org/QEMU/20210531055019.10149-1-yamamoto@midokura.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210531055019.10149-1-yamamoto@midokura.com
Subject: [PATCH v2 00/11] linux-user changes to run docker

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
9266033 linux-user: always assume preserve_argv0 for now [!MERGE]
c47e5f7 linux-user: a crude hack for libcontainer (CLONE_PARENT) [!MERGE]
3918e1b linux-user: Make the qemu detection for /proc/$pid/exe a bit conservative
c085fdc linux-user: Implement exec of /proc/$pid/exe of qemu process
1183bf9 linux-user: simplify is_proc_myself
3d2e26d linux-user: add get_exe_path
4468f68 linux-user: Implement pivot_root
42242cc linux-user: make exec_path realpath
057e7e4 linux-user: dup the execfd on start up
187d6d2 linux-user: Fix the execfd case of /proc/self/exe open
02ef5e5 linux-user: handle /proc/self/exe for execve

=== OUTPUT BEGIN ===
1/11 Checking commit 02ef5e5d9e50 (linux-user: handle /proc/self/exe for execve)
2/11 Checking commit 187d6d29d488 (linux-user: Fix the execfd case of /proc/self/exe open)
3/11 Checking commit 057e7e49d4fe (linux-user: dup the execfd on start up)
4/11 Checking commit 42242cc0513b (linux-user: make exec_path realpath)
5/11 Checking commit 4468f68480f3 (linux-user: Implement pivot_root)
WARNING: architecture specific defines should be avoided
#23: FILE: linux-user/syscall.c:8257:
+#if defined(TARGET_NR_pivot_root) && defined(__NR_pivot_root)

total: 0 errors, 1 warnings, 33 lines checked

Patch 5/11 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
6/11 Checking commit 3d2e26daa82e (linux-user: add get_exe_path)
7/11 Checking commit 1183bf9702cb (linux-user: simplify is_proc_myself)
8/11 Checking commit c085fdc89c7c (linux-user: Implement exec of /proc/$pid/exe of qemu process)
9/11 Checking commit 3918e1b7eee5 (linux-user: Make the qemu detection for /proc/$pid/exe a bit conservative)
10/11 Checking commit c47e5f77ef07 (linux-user: a crude hack for libcontainer (CLONE_PARENT) [!MERGE])
11/11 Checking commit 9266033e47cb (linux-user: always assume preserve_argv0 for now [!MERGE])
ERROR: if this code is redundant consider removing it
#26: FILE: linux-user/main.c:729:
+#if 0

total: 1 errors, 0 warnings, 38 lines checked

Patch 11/11 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210531055019.10149-1-yamamoto@midokura.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 05/11] linux-user: Implement pivot_root
  2021-05-31  5:50 ` [PATCH v2 05/11] linux-user: Implement pivot_root YAMAMOTO Takashi
@ 2021-06-20 14:02   ` Laurent Vivier
  2021-06-20 14:05   ` Laurent Vivier
  1 sibling, 0 replies; 20+ messages in thread
From: Laurent Vivier @ 2021-06-20 14:02 UTC (permalink / raw)
  To: YAMAMOTO Takashi, qemu-devel

Le 31/05/2021 à 07:50, YAMAMOTO Takashi a écrit :
> Used by runc.
> 
> Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
> ---
>  linux-user/syscall.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 2947e79dc0..51144c6d29 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -8254,6 +8254,10 @@ static int host_to_target_cpu_mask(const unsigned long *host_mask,
>      return 0;
>  }
>  
> +#if defined(TARGET_NR_pivot_root) && defined(__NR_pivot_root)
> +_syscall2(int, pivot_root, const char *, new_root, const char *, put_old)
> +#endif
> +
>  /* This is an internal helper for do_syscall so that it is easier
>   * to have a single return point, so that actions, such as logging
>   * of syscall results, can be performed.
> @@ -13222,6 +13226,23 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
>          return ret;
>  #endif
>  
> +#if defined(TARGET_NR_pivot_root)
> +    case TARGET_NR_pivot_root:
> +        {
> +            void *p2;
> +            p = lock_user_string(arg1); /* new_root */
> +            p2 = lock_user_string(arg2); /* put_old */
> +            if (!p || !p2) {
> +                ret = -TARGET_EFAULT;
> +            } else {
> +                ret = get_errno(pivot_root(p, p2));
> +            }
> +            unlock_user(p2, arg2, 0);
> +            unlock_user(p, arg1, 0);
> +        }
> +        return ret;
> +#endif
> +
>      default:
>          qemu_log_mask(LOG_UNIMP, "Unsupported syscall: %d\n", num);
>          return -TARGET_ENOSYS;
> 

Reviewed-by: Laurent Vivier <laurent@vivier.eu>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 05/11] linux-user: Implement pivot_root
  2021-05-31  5:50 ` [PATCH v2 05/11] linux-user: Implement pivot_root YAMAMOTO Takashi
  2021-06-20 14:02   ` Laurent Vivier
@ 2021-06-20 14:05   ` Laurent Vivier
  1 sibling, 0 replies; 20+ messages in thread
From: Laurent Vivier @ 2021-06-20 14:05 UTC (permalink / raw)
  To: YAMAMOTO Takashi, qemu-devel

Le 31/05/2021 à 07:50, YAMAMOTO Takashi a écrit :
> Used by runc.
> 
> Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
> ---
>  linux-user/syscall.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 2947e79dc0..51144c6d29 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -8254,6 +8254,10 @@ static int host_to_target_cpu_mask(const unsigned long *host_mask,
>      return 0;
>  }
>  
> +#if defined(TARGET_NR_pivot_root) && defined(__NR_pivot_root)
> +_syscall2(int, pivot_root, const char *, new_root, const char *, put_old)
> +#endif
> +
>  /* This is an internal helper for do_syscall so that it is easier
>   * to have a single return point, so that actions, such as logging
>   * of syscall results, can be performed.
> @@ -13222,6 +13226,23 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
>          return ret;
>  #endif
>  
> +#if defined(TARGET_NR_pivot_root)
> +    case TARGET_NR_pivot_root:
> +        {
> +            void *p2;
> +            p = lock_user_string(arg1); /* new_root */
> +            p2 = lock_user_string(arg2); /* put_old */
> +            if (!p || !p2) {
> +                ret = -TARGET_EFAULT;
> +            } else {
> +                ret = get_errno(pivot_root(p, p2));
> +            }
> +            unlock_user(p2, arg2, 0);
> +            unlock_user(p, arg1, 0);
> +        }
> +        return ret;
> +#endif
> +
>      default:
>          qemu_log_mask(LOG_UNIMP, "Unsupported syscall: %d\n", num);
>          return -TARGET_ENOSYS;
> 

Applied to my linux-user-for-6.1 branch.

Thanks,
Laurent



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 01/11] linux-user: handle /proc/self/exe for execve
  2021-05-31  5:50 ` [PATCH v2 01/11] linux-user: handle /proc/self/exe for execve YAMAMOTO Takashi
@ 2021-06-20 14:14   ` Laurent Vivier
  2021-06-21  2:02     ` Takashi Yamamoto
  0 siblings, 1 reply; 20+ messages in thread
From: Laurent Vivier @ 2021-06-20 14:14 UTC (permalink / raw)
  To: YAMAMOTO Takashi, qemu-devel

Le 31/05/2021 à 07:50, YAMAMOTO Takashi a écrit :
> It seems somehow common to execve /proc/self/exe in docker
> or golang community these days.
> At least, moby "reexec" and runc "libcontainer" do that.
> 
> Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
> ---
>  linux-user/syscall.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index c9f812091c..a2b03ecb8b 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -8470,6 +8470,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
>  #endif
>      case TARGET_NR_execve:
>          {
> +            const char *path;
>              char **argp, **envp;
>              int argc, envc;
>              abi_ulong gp;
> @@ -8537,7 +8538,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
>               * before the execve completes and makes it the other
>               * program's problem.
>               */
> -            ret = get_errno(safe_execve(p, argp, envp));
> +            path = p;
> +            if (is_proc_myself(path, "exe")) {
> +                path = exec_path;
> +            }
> +            ret = get_errno(safe_execve(path, argp, envp));
>              unlock_user(p, arg1, 0);
>  
>              goto execve_end;
> 

The problem here is QEMU can fail to execute the file directly.

The binary can be launched with binfmt_misc and the 'O' flag:

     ``O`` - open-binary
            Legacy behavior of binfmt_misc is to pass the full path
            of the binary to the interpreter as an argument. When this flag is
            included, binfmt_misc will open the file for reading and pass its
            descriptor as an argument, instead of the full path, thus allowing
            the interpreter to execute non-readable binaries. This feature
            should be used with care - the interpreter has to be trusted not to
            emit the contents of the non-readable binary.

You should use do_openat() (that resolves the /proc/self/exe path) and fexecve().

Thanks,
Laurent


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 02/11] linux-user: Fix the execfd case of /proc/self/exe open
  2021-05-31  5:50 ` [PATCH v2 02/11] linux-user: Fix the execfd case of /proc/self/exe open YAMAMOTO Takashi
@ 2021-06-20 14:16   ` Laurent Vivier
  2021-06-21  1:19     ` Takashi Yamamoto
  0 siblings, 1 reply; 20+ messages in thread
From: Laurent Vivier @ 2021-06-20 14:16 UTC (permalink / raw)
  To: YAMAMOTO Takashi, qemu-devel

Le 31/05/2021 à 07:50, YAMAMOTO Takashi a écrit :
> It's problematic to return AT_EXECFD as it is because the user app
> would close it.
> This patch opens it via /proc/self/fd instead.
> 
> Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
> ---
>  linux-user/syscall.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index a2b03ecb8b..14a63518e2 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -8118,7 +8118,17 @@ static int do_openat(void *cpu_env, int dirfd, const char *pathname, int flags,
>  
>      if (is_proc_myself(pathname, "exe")) {
>          int execfd = qemu_getauxval(AT_EXECFD);
> -        return execfd ? execfd : safe_openat(dirfd, exec_path, flags, mode);
> +        if (execfd) {
> +            char filename[PATH_MAX];
> +            int ret;
> +
> +            snprintf(filename, sizeof(filename), "/proc/self/fd/%d", execfd);
> +            ret = safe_openat(dirfd, filename, flags, mode);
> +            if (ret != -1) {
> +                return ret;
> +            }
> +        }
> +        return safe_openat(dirfd, exec_path, flags, mode);
>      }
>  
>      for (fake_open = fakes; fake_open->filename; fake_open++) {
> 

I think a dup() would be more appropriate, or explain why not.

Thanks,
Laurent


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 02/11] linux-user: Fix the execfd case of /proc/self/exe open
  2021-06-20 14:16   ` Laurent Vivier
@ 2021-06-21  1:19     ` Takashi Yamamoto
  0 siblings, 0 replies; 20+ messages in thread
From: Takashi Yamamoto @ 2021-06-21  1:19 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel

On Sun, Jun 20, 2021 at 11:16 PM Laurent Vivier <laurent@vivier.eu> wrote:
>
> Le 31/05/2021 à 07:50, YAMAMOTO Takashi a écrit :
> > It's problematic to return AT_EXECFD as it is because the user app
> > would close it.
> > This patch opens it via /proc/self/fd instead.
> >
> > Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
> > ---
> >  linux-user/syscall.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> > index a2b03ecb8b..14a63518e2 100644
> > --- a/linux-user/syscall.c
> > +++ b/linux-user/syscall.c
> > @@ -8118,7 +8118,17 @@ static int do_openat(void *cpu_env, int dirfd, const char *pathname, int flags,
> >
> >      if (is_proc_myself(pathname, "exe")) {
> >          int execfd = qemu_getauxval(AT_EXECFD);
> > -        return execfd ? execfd : safe_openat(dirfd, exec_path, flags, mode);
> > +        if (execfd) {
> > +            char filename[PATH_MAX];
> > +            int ret;
> > +
> > +            snprintf(filename, sizeof(filename), "/proc/self/fd/%d", execfd);
> > +            ret = safe_openat(dirfd, filename, flags, mode);
> > +            if (ret != -1) {
> > +                return ret;
> > +            }
> > +        }
> > +        return safe_openat(dirfd, exec_path, flags, mode);
> >      }
> >
> >      for (fake_open = fakes; fake_open->filename; fake_open++) {
> >
>
> I think a dup() would be more appropriate, or explain why not.

i did this way because dup() doesn't deal with open flags.

>
> Thanks,
> Laurent


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 01/11] linux-user: handle /proc/self/exe for execve
  2021-06-20 14:14   ` Laurent Vivier
@ 2021-06-21  2:02     ` Takashi Yamamoto
  2021-06-22 13:47       ` Laurent Vivier
  0 siblings, 1 reply; 20+ messages in thread
From: Takashi Yamamoto @ 2021-06-21  2:02 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel

On Sun, Jun 20, 2021 at 11:14 PM Laurent Vivier <laurent@vivier.eu> wrote:
>
> Le 31/05/2021 à 07:50, YAMAMOTO Takashi a écrit :
> > It seems somehow common to execve /proc/self/exe in docker
> > or golang community these days.
> > At least, moby "reexec" and runc "libcontainer" do that.
> >
> > Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
> > ---
> >  linux-user/syscall.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> > index c9f812091c..a2b03ecb8b 100644
> > --- a/linux-user/syscall.c
> > +++ b/linux-user/syscall.c
> > @@ -8470,6 +8470,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
> >  #endif
> >      case TARGET_NR_execve:
> >          {
> > +            const char *path;
> >              char **argp, **envp;
> >              int argc, envc;
> >              abi_ulong gp;
> > @@ -8537,7 +8538,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
> >               * before the execve completes and makes it the other
> >               * program's problem.
> >               */
> > -            ret = get_errno(safe_execve(p, argp, envp));
> > +            path = p;
> > +            if (is_proc_myself(path, "exe")) {
> > +                path = exec_path;
> > +            }
> > +            ret = get_errno(safe_execve(path, argp, envp));
> >              unlock_user(p, arg1, 0);
> >
> >              goto execve_end;
> >
>
> The problem here is QEMU can fail to execute the file directly.

i don't understand this sentence. can you explain a bit?

>
> The binary can be launched with binfmt_misc and the 'O' flag:
>
>      ``O`` - open-binary
>             Legacy behavior of binfmt_misc is to pass the full path
>             of the binary to the interpreter as an argument. When this flag is
>             included, binfmt_misc will open the file for reading and pass its
>             descriptor as an argument, instead of the full path, thus allowing
>             the interpreter to execute non-readable binaries. This feature
>             should be used with care - the interpreter has to be trusted not to
>             emit the contents of the non-readable binary.
>
> You should use do_openat() (that resolves the /proc/self/exe path) and fexecve().

i thought there was an issue with the approach. but i don't remember
what it was.
maybe i will retry it.

>
> Thanks,
> Laurent


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 01/11] linux-user: handle /proc/self/exe for execve
  2021-06-21  2:02     ` Takashi Yamamoto
@ 2021-06-22 13:47       ` Laurent Vivier
  0 siblings, 0 replies; 20+ messages in thread
From: Laurent Vivier @ 2021-06-22 13:47 UTC (permalink / raw)
  To: Takashi Yamamoto; +Cc: qemu-devel

Le 21/06/2021 à 04:02, Takashi Yamamoto a écrit :
> On Sun, Jun 20, 2021 at 11:14 PM Laurent Vivier <laurent@vivier.eu> wrote:
>>
>> Le 31/05/2021 à 07:50, YAMAMOTO Takashi a écrit :
>>> It seems somehow common to execve /proc/self/exe in docker
>>> or golang community these days.
>>> At least, moby "reexec" and runc "libcontainer" do that.
>>>
>>> Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
>>> ---
>>>  linux-user/syscall.c | 7 ++++++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
>>> index c9f812091c..a2b03ecb8b 100644
>>> --- a/linux-user/syscall.c
>>> +++ b/linux-user/syscall.c
>>> @@ -8470,6 +8470,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
>>>  #endif
>>>      case TARGET_NR_execve:
>>>          {
>>> +            const char *path;
>>>              char **argp, **envp;
>>>              int argc, envc;
>>>              abi_ulong gp;
>>> @@ -8537,7 +8538,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
>>>               * before the execve completes and makes it the other
>>>               * program's problem.
>>>               */
>>> -            ret = get_errno(safe_execve(p, argp, envp));
>>> +            path = p;
>>> +            if (is_proc_myself(path, "exe")) {
>>> +                path = exec_path;
>>> +            }
>>> +            ret = get_errno(safe_execve(path, argp, envp));
>>>              unlock_user(p, arg1, 0);
>>>
>>>              goto execve_end;
>>>
>>
>> The problem here is QEMU can fail to execute the file directly.
> 
> i don't understand this sentence. can you explain a bit?

It's related to the text below.
The binary can be executable ('x') but not readable ('r'), so QEMU cannot load it.
It's the purpose of the 'O' flag: kernel opens the file and pass the FD to QEMU to execute it.

Thanks,
Laurent

> 
>>
>> The binary can be launched with binfmt_misc and the 'O' flag:
>>
>>      ``O`` - open-binary
>>             Legacy behavior of binfmt_misc is to pass the full path
>>             of the binary to the interpreter as an argument. When this flag is
>>             included, binfmt_misc will open the file for reading and pass its
>>             descriptor as an argument, instead of the full path, thus allowing
>>             the interpreter to execute non-readable binaries. This feature
>>             should be used with care - the interpreter has to be trusted not to
>>             emit the contents of the non-readable binary.
>>
>> You should use do_openat() (that resolves the /proc/self/exe path) and fexecve().
> 
> i thought there was an issue with the approach. but i don't remember
> what it was.
> maybe i will retry it.
> 
>>
>> Thanks,
>> Laurent



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-06-22 13:49 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-31  5:50 [PATCH v2 00/11] linux-user changes to run docker YAMAMOTO Takashi
2021-05-31  5:50 ` [PATCH v2 01/11] linux-user: handle /proc/self/exe for execve YAMAMOTO Takashi
2021-06-20 14:14   ` Laurent Vivier
2021-06-21  2:02     ` Takashi Yamamoto
2021-06-22 13:47       ` Laurent Vivier
2021-05-31  5:50 ` [PATCH v2 02/11] linux-user: Fix the execfd case of /proc/self/exe open YAMAMOTO Takashi
2021-06-20 14:16   ` Laurent Vivier
2021-06-21  1:19     ` Takashi Yamamoto
2021-05-31  5:50 ` [PATCH v2 03/11] linux-user: dup the execfd on start up YAMAMOTO Takashi
2021-05-31  5:50 ` [PATCH v2 04/11] linux-user: make exec_path realpath YAMAMOTO Takashi
2021-05-31  5:50 ` [PATCH v2 05/11] linux-user: Implement pivot_root YAMAMOTO Takashi
2021-06-20 14:02   ` Laurent Vivier
2021-06-20 14:05   ` Laurent Vivier
2021-05-31  5:50 ` [PATCH v2 06/11] linux-user: add get_exe_path YAMAMOTO Takashi
2021-05-31  5:50 ` [PATCH v2 07/11] linux-user: simplify is_proc_myself YAMAMOTO Takashi
2021-05-31  5:50 ` [PATCH v2 08/11] linux-user: Implement exec of /proc/$pid/exe of qemu process YAMAMOTO Takashi
2021-05-31  5:50 ` [PATCH v2 09/11] linux-user: Make the qemu detection for /proc/$pid/exe a bit conservative YAMAMOTO Takashi
2021-05-31  5:50 ` [PATCH v2 10/11] linux-user: a crude hack for libcontainer (CLONE_PARENT) [!MERGE] YAMAMOTO Takashi
2021-05-31  5:50 ` [PATCH v2 11/11] linux-user: always assume preserve_argv0 for now [!MERGE] YAMAMOTO Takashi
2021-05-31  6:07 ` [PATCH v2 00/11] linux-user changes to run docker no-reply

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.