linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ali Raza <aliraza@bu.edu>
To: linux-kernel@vger.kernel.org
Cc: corbet@lwn.net, masahiroy@kernel.org, michal.lkml@markovi.net,
	ndesaulniers@google.com, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	luto@kernel.org, ebiederm@xmission.com, keescook@chromium.org,
	peterz@infradead.org, viro@zeniv.linux.org.uk, arnd@arndb.de,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, pbonzini@redhat.com, jpoimboe@kernel.org,
	linux-doc@vger.kernel.org, linux-kbuild@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-arch@vger.kernel.org, x86@kernel.org, rjones@redhat.com,
	munsoner@bu.edu, tommyu@bu.edu, drepper@redhat.com,
	lwoodman@redhat.com, mboydmcse@gmail.com, okrieg@bu.edu,
	rmancuso@bu.edu, Ali Raza <aliraza@bu.edu>
Subject: [RFC UKL 08/10] exec: Make exec path for starting UKL application
Date: Mon,  3 Oct 2022 18:21:31 -0400	[thread overview]
Message-ID: <20221003222133.20948-9-aliraza@bu.edu> (raw)
In-Reply-To: <20221003222133.20948-1-aliraza@bu.edu>

The UKL application still relies on much of the setup done to start a
standard user space process, so we still need to use much of that path.
There are several areas that the UKL application doesn't need or want so we
bypass them in the case of UKL. These are: ELF loading, because it is part
of the kernel image; and segments register value initialization.  We need
to record a starting location for the application heap, this normally is
the end of the ELF binary, once loaded. We choose an arbitrary low address
because there is no binary to load. We also hardcode the entry point for
the application to ukl__start which is the entry point for glibc plus the
'ukl_' prefix.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>

Suggested-by: Thomas Unger <tommyu@bu.edu>
Signed-off-by: Ali Raza <aliraza@bu.edu>
---
 arch/x86/include/asm/elf.h   |  9 ++++--
 arch/x86/kernel/process.c    | 13 +++++++++
 arch/x86/kernel/process_64.c | 27 ++++++++++--------
 fs/binfmt_elf.c              | 28 ++++++++++++++++++
 fs/exec.c                    | 55 ++++++++++++++++++++++++++----------
 5 files changed, 103 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index cb0ff1055ab1..91b6efafb46f 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -6,6 +6,7 @@
  * ELF register definitions..
  */
 #include <linux/thread_info.h>
+#include <linux/sched.h>
 
 #include <asm/ptrace.h>
 #include <asm/user.h>
@@ -164,9 +165,11 @@ static inline void elf_common_init(struct thread_struct *t,
 	regs->si = regs->di = regs->bp = 0;
 	regs->r8 = regs->r9 = regs->r10 = regs->r11 = 0;
 	regs->r12 = regs->r13 = regs->r14 = regs->r15 = 0;
-	t->fsbase = t->gsbase = 0;
-	t->fsindex = t->gsindex = 0;
-	t->ds = t->es = ds;
+	if (!is_ukl_thread()) {
+		t->fsbase = t->gsbase = 0;
+		t->fsindex = t->gsindex = 0;
+		t->ds = t->es = ds;
+	}
 }
 
 #define ELF_PLAT_INIT(_r, load_addr)			\
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 58a6ea472db9..8395fc0c3398 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -192,6 +192,19 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	frame->bx = 0;
 	*childregs = *current_pt_regs();
 	childregs->ax = 0;
+
+#ifdef CONFIG_UNIKERNEL_LINUX
+	/*
+	 * UKL leaves return address and flags on user stack. This works
+	 * fine for clone (i.e., VM shared) but not for 'fork' style
+	 * clone (i.e., VM not shared). This is where we clean those extra
+	 * elements from user stack.
+	 */
+	if (is_ukl_thread() & !(clone_flags & CLONE_VM)) {
+		childregs->sp += 2*(sizeof(long));
+	}
+#endif
+
 	if (sp)
 		childregs->sp = sp;
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e9e4a2946452..cf007b95d684 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -530,21 +530,26 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
 {
 	WARN_ON_ONCE(regs != current_pt_regs());
 
-	if (static_cpu_has(X86_BUG_NULL_SEG)) {
-		/* Loading zero below won't clear the base. */
-		loadsegment(fs, __USER_DS);
-		load_gs_index(__USER_DS);
-	}
+	if (!is_ukl_thread()) {
+		if (static_cpu_has(X86_BUG_NULL_SEG)) {
+			/* Loading zero below won't clear the base. */
+			loadsegment(fs, __USER_DS);
+			load_gs_index(__USER_DS);
+		}
 
-	loadsegment(fs, 0);
-	loadsegment(es, _ds);
-	loadsegment(ds, _ds);
-	load_gs_index(0);
+		loadsegment(fs, 0);
+		loadsegment(es, _ds);
+		loadsegment(ds, _ds);
+		load_gs_index(0);
 
+		regs->cs		= _cs;
+		regs->ss		= _ss;
+	} else {
+		regs->cs		= __KERNEL_CS;
+		regs->ss		= __KERNEL_DS;
+	}
 	regs->ip		= new_ip;
 	regs->sp		= new_sp;
-	regs->cs		= _cs;
-	regs->ss		= _ss;
 	regs->flags		= X86_EFLAGS_IF;
 }
 
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 63c7ebb0da89..1c91f1179398 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -845,6 +845,10 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	struct pt_regs *regs;
 
 	retval = -ENOEXEC;
+
+	if (is_ukl_thread())
+		goto UKL_SKIP_READING_ELF;
+
 	/* First of all, some simple consistency checks */
 	if (memcmp(elf_ex->e_ident, ELFMAG, SELFMAG) != 0)
 		goto out;
@@ -998,6 +1002,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	if (retval)
 		goto out_free_dentry;
 
+UKL_SKIP_READING_ELF:
 	/* Flush all traces of the currently running executable */
 	retval = begin_new_exec(bprm);
 	if (retval)
@@ -1029,6 +1034,17 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	start_data = 0;
 	end_data = 0;
 
+	if (is_ukl_thread()) {
+		/*
+		 * load_bias needs to ensure that we push the heap start
+		 * past the end of the executable, but in this case, it is
+		 * already mapped with the kernel text.  So we select an
+		 * address that is "high enough"
+		 */
+		load_bias = 0x405000;
+		goto UKL_SKIP_LOADING_ELF;
+	}
+
 	/* Now we do a little grungy work by mmapping the ELF image into
 	   the correct location in memory. */
 	for(i = 0, elf_ppnt = elf_phdata;
@@ -1224,6 +1240,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		}
 	}
 
+UKL_SKIP_LOADING_ELF:
 	e_entry = elf_ex->e_entry + load_bias;
 	phdr_addr += load_bias;
 	elf_bss += load_bias;
@@ -1246,6 +1263,16 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		goto out_free_dentry;
 	}
 
+	if (is_ukl_thread()) {
+		/*
+		 * We know that this symbol exists and that it is the entry
+		 * point for the linked application.
+		 */
+		extern void ukl__start(void);
+		elf_entry = (unsigned long) ukl__start;
+		goto UKL_SKIP_FINDING_ELF_ENTRY;
+	}
+
 	if (interpreter) {
 		elf_entry = load_elf_interp(interp_elf_ex,
 					    interpreter,
@@ -1283,6 +1310,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 
 	set_binfmt(&elf_format);
 
+UKL_SKIP_FINDING_ELF_ENTRY:
 #ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
 	retval = ARCH_SETUP_ADDITIONAL_PAGES(bprm, elf_ex, !!interpreter);
 	if (retval < 0)
diff --git a/fs/exec.c b/fs/exec.c
index d046dbb9cbd0..4ae06fcf7436 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1246,9 +1246,11 @@ int begin_new_exec(struct linux_binprm * bprm)
 	int retval;
 
 	/* Once we are committed compute the creds */
-	retval = bprm_creds_from_file(bprm);
-	if (retval)
-		return retval;
+	if (!is_ukl_thread()) {
+		retval = bprm_creds_from_file(bprm);
+		if (retval)
+			return retval;
+	}
 
 	/*
 	 * Ensure all future errors are fatal.
@@ -1282,9 +1284,11 @@ int begin_new_exec(struct linux_binprm * bprm)
 		goto out;
 
 	/* If the binary is not readable then enforce mm->dumpable=0 */
-	would_dump(bprm, bprm->file);
-	if (bprm->have_execfd)
-		would_dump(bprm, bprm->executable);
+	if (!is_ukl_thread()) {
+		would_dump(bprm, bprm->file);
+		if (bprm->have_execfd)
+			would_dump(bprm, bprm->executable);
+	}
 
 	/*
 	 * Release all of the old mmap stuff
@@ -1509,6 +1513,11 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename)
 	if (!bprm)
 		goto out;
 
+	if (is_ukl_thread()) {
+		bprm->filename = "UKL";
+		goto out_ukl;
+	}
+
 	if (fd == AT_FDCWD || filename->name[0] == '/') {
 		bprm->filename = filename->name;
 	} else {
@@ -1522,6 +1531,8 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename)
 
 		bprm->filename = bprm->fdpath;
 	}
+
+out_ukl:
 	bprm->interp = bprm->filename;
 
 	retval = bprm_mm_init(bprm);
@@ -1708,6 +1719,15 @@ static int search_binary_handler(struct linux_binprm *bprm)
 	struct linux_binfmt *fmt;
 	int retval;
 
+	if (is_ukl_thread()) {
+		list_for_each_entry(fmt, &formats, lh) {
+			retval = fmt->load_binary(bprm);
+			if (retval == 0)
+				return retval;
+		}
+		goto out_ukl;
+	}
+
 	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		return retval;
@@ -1717,7 +1737,7 @@ static int search_binary_handler(struct linux_binprm *bprm)
 		return retval;
 
 	retval = -ENOENT;
- retry:
+retry:
 	read_lock(&binfmt_lock);
 	list_for_each_entry(fmt, &formats, lh) {
 		if (!try_module_get(fmt->module))
@@ -1745,6 +1765,7 @@ static int search_binary_handler(struct linux_binprm *bprm)
 		goto retry;
 	}
 
+out_ukl:
 	return retval;
 }
 
@@ -1799,7 +1820,7 @@ static int exec_binprm(struct linux_binprm *bprm)
 static int bprm_execve(struct linux_binprm *bprm,
 		       int fd, struct filename *filename, int flags)
 {
-	struct file *file;
+	struct file *file = NULL;
 	int retval;
 
 	retval = prepare_bprm_creds(bprm);
@@ -1809,10 +1830,12 @@ static int bprm_execve(struct linux_binprm *bprm,
 	check_unsafe_exec(bprm);
 	current->in_execve = 1;
 
-	file = do_open_execat(fd, filename, flags);
-	retval = PTR_ERR(file);
-	if (IS_ERR(file))
-		goto out_unmark;
+	if (!is_ukl_thread()) {
+		file = do_open_execat(fd, filename, flags);
+		retval = PTR_ERR(file);
+		if (IS_ERR(file))
+			goto out_unmark;
+	}
 
 	sched_exec();
 
@@ -1830,9 +1853,11 @@ static int bprm_execve(struct linux_binprm *bprm,
 		bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
 
 	/* Set the unchanging part of bprm->cred */
-	retval = security_bprm_creds_for_exec(bprm);
-	if (retval)
-		goto out;
+	if (!is_ukl_thread()) {
+		retval = security_bprm_creds_for_exec(bprm);
+		if (retval)
+			goto out;
+	}
 
 	retval = exec_binprm(bprm);
 	if (retval < 0)
-- 
2.21.3


  parent reply	other threads:[~2022-10-03 22:22 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-03 22:21 [RFC UKL 00/10] Unikernel Linux (UKL) Ali Raza
2022-10-03 22:21 ` [RFC UKL 01/10] kbuild: Add sections and symbols to linker script for UKL support Ali Raza
2022-10-03 22:21 ` [RFC UKL 02/10] x86/boot: Load the PT_TLS segment for Unikernel configs Ali Raza
2022-10-04 17:30   ` Andy Lutomirski
2022-10-06 21:00     ` Ali Raza
2022-10-03 22:21 ` [RFC UKL 03/10] sched: Add task_struct tracking of kernel or application execution Ali Raza
2022-10-03 22:21 ` [RFC UKL 04/10] x86/entry: Create alternate entry path for system calls Ali Raza
2022-10-04 17:43   ` Andy Lutomirski
2022-10-06 21:12     ` Ali Raza
2022-10-03 22:21 ` [RFC UKL 05/10] x86/uaccess: Make access_ok UKL aware Ali Raza
2022-10-04 17:36   ` Andy Lutomirski
2022-10-06 21:16     ` Ali Raza
2022-10-03 22:21 ` [RFC UKL 06/10] x86/fault: Skip checking kernel mode access to user address space for UKL Ali Raza
2022-10-03 22:21 ` [RFC UKL 07/10] x86/signal: Adjust signal handler register values and return frame Ali Raza
2022-10-04 17:34   ` Andy Lutomirski
2022-10-06 21:20     ` Ali Raza
2022-10-03 22:21 ` Ali Raza [this message]
2022-10-03 22:21 ` [RFC UKL 09/10] exec: Give userspace a method for starting UKL process Ali Raza
2022-10-04 17:35   ` Andy Lutomirski
2022-10-06 21:25     ` Ali Raza
2022-10-03 22:21 ` [RFC UKL 10/10] Kconfig: Add config option for enabling and sample for testing UKL Ali Raza
2022-10-04  2:11   ` Bagas Sanjaya
2022-10-06 21:28     ` Ali Raza
2022-10-07 10:21       ` Masahiro Yamada
2022-10-13 17:08         ` Ali Raza
2022-10-06 21:27 ` [RFC UKL 00/10] Unikernel Linux (UKL) H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221003222133.20948-9-aliraza@bu.edu \
    --to=aliraza@bu.edu \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=drepper@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=hpa@zytor.com \
    --cc=jpoimboe@kernel.org \
    --cc=juri.lelli@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kbuild@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=lwoodman@redhat.com \
    --cc=masahiroy@kernel.org \
    --cc=mboydmcse@gmail.com \
    --cc=mgorman@suse.de \
    --cc=michal.lkml@markovi.net \
    --cc=mingo@redhat.com \
    --cc=munsoner@bu.edu \
    --cc=ndesaulniers@google.com \
    --cc=okrieg@bu.edu \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rjones@redhat.com \
    --cc=rmancuso@bu.edu \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tommyu@bu.edu \
    --cc=vincent.guittot@linaro.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vschneid@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).