linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv7 0/3] syscalls,x86: Add execveat() system call
@ 2014-11-07 17:01 David Drysdale
  2014-11-07 17:01 ` [PATCHv7 1/3] syscalls,x86: implement " David Drysdale
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: David Drysdale @ 2014-11-07 17:01 UTC (permalink / raw)
  To: Eric W. Biederman, Andy Lutomirski, Alexander Viro,
	Meredydd Luff, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86,
	linux-arch, linux-api, David Drysdale


This patch set adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc
filesystem, at least for executables (rather than scripts).  The
current glibc version of fexecve(3) is implemented via /proc, which
causes problems in sandboxed or otherwise restricted environments.

Given the desire for a /proc-free fexecve() implementation, HPA
suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2)
syscall would be an appropriate generalization.

Also, having a new syscall means that it can take a flags argument
without back-compatibility concerns.  The current implementation just
defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other
flags could be added in future -- for example, flags for new namespaces
(as suggested at https://lkml.org/lkml/2006/7/11/474).

Related history:
 - https://lkml.org/lkml/2006/12/27/123 is an example of someone
   realizing that fexecve() is likely to fail in a chroot environment.
 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
   documenting the /proc requirement of fexecve(3) in its manpage, to
   "prevent other people from wasting their time".
 - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
   problem where a process that did setuid() could not fexecve()
   because it no longer had access to /proc/self/fd; this has since
   been fixed.


Changes since v6:
 - Remove special case for O_PATH file descriptors [Andy Lutomirski]
 - Use kasprintf rather than error-prone arithmetic [Kees Cook]
 - Add test for long name [Kees Cook]
 - Add test for non-executable O_PATH fd [Andy Lutomirski]

Changes since v5:
 - Set new flag in bprm->interp_flags for O_CLOEXEC fds, so that binfmts
   that invoke an interpreter fail the exec (as they will not be able
   to access the invoked file). [Andy Lutomirski]
 - Don't truncate long paths. [Andy Lutomirski]
 - Commonize code to open the executed file. [Eric W. Biederman]
 - Mark O_PATH file descriptors so they cannot be fexecve()ed.
 - Make self-test more helpful, and add additional cases:
     - file offset non-zero
     - binary file without execute bit
     - O_CLOEXEC fds

Changes since v4, suggested by Eric W. Biederman:
 - Use empty filename with AT_EMPTY_PATH flag rather than NULL
   pathname to request fexecve-like behaviour.
 - Build pathname as "/dev/fd/<fd>/<filename>" (or "/dev/fd/<fd>")
   rather than using d_path().
 - Patch against v3.17 (bfe01a5ba249)

Changes since Meredydd's v3 patch:
 - Added a selftest.
 - Added a man page.
 - Left open_exec() signature untouched to reduce patch impact
   elsewhere (as suggested by Al Viro).
 - Filled in bprm->filename with d_path() into a buffer, to avoid use
   of potentially-ephemeral dentry->d_name.
 - Patch against v3.14 (455c6fdbd21916).


David Drysdale (2):
  syscalls,x86: implement execveat() system call
  syscalls,x86: add selftest for execveat(2)

 arch/x86/ia32/audit.c                   |   1 +
 arch/x86/ia32/ia32entry.S               |   1 +
 arch/x86/kernel/audit_64.c              |   1 +
 arch/x86/kernel/entry_64.S              |  28 +++
 arch/x86/syscalls/syscall_32.tbl        |   1 +
 arch/x86/syscalls/syscall_64.tbl        |   2 +
 arch/x86/um/sys_call_table_64.c         |   1 +
 fs/binfmt_em86.c                        |   4 +
 fs/binfmt_misc.c                        |   4 +
 fs/binfmt_script.c                      |  10 +
 fs/exec.c                               | 110 +++++++--
 fs/namei.c                              |   2 +-
 include/linux/binfmts.h                 |   4 +
 include/linux/compat.h                  |   3 +
 include/linux/fs.h                      |   1 +
 include/linux/sched.h                   |   4 +
 include/linux/syscalls.h                |   4 +
 include/uapi/asm-generic/unistd.h       |   4 +-
 kernel/sys_ni.c                         |   3 +
 lib/audit.c                             |   3 +
 tools/testing/selftests/Makefile        |   1 +
 tools/testing/selftests/exec/.gitignore |   9 +
 tools/testing/selftests/exec/Makefile   |  25 ++
 tools/testing/selftests/exec/execveat.c | 397 ++++++++++++++++++++++++++++++++
 24 files changed, 608 insertions(+), 15 deletions(-)
 create mode 100644 tools/testing/selftests/exec/.gitignore
 create mode 100644 tools/testing/selftests/exec/Makefile
 create mode 100644 tools/testing/selftests/exec/execveat.c

--
2.1.0.rc2.206.gedb03e5

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCHv7 1/3] syscalls,x86: implement execveat() system call
  2014-11-07 17:01 [PATCHv7 0/3] syscalls,x86: Add execveat() system call David Drysdale
@ 2014-11-07 17:01 ` David Drysdale
  2014-11-07 17:01 ` [PATCHv7 2/3] syscalls,x86: add selftest for execveat(2) David Drysdale
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: David Drysdale @ 2014-11-07 17:01 UTC (permalink / raw)
  To: Eric W. Biederman, Andy Lutomirski, Alexander Viro,
	Meredydd Luff, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86,
	linux-arch, linux-api, David Drysdale

Add a new execveat(2) system call. execveat() is to execve() as
openat() is to open(): it takes a file descriptor that refers to a
directory, and resolves the filename relative to that.

In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers. This
replicates the functionality of fexecve(), which is a system call in
other UNIXen, but in Linux glibc it depends on opening
"/proc/self/fd/<fd>" (and so relies on /proc being mounted).

The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
(for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
reflecting how the executable was found.  This does however mean that
execution of a script in a /proc-less environment won't work; also,
script execution via an O_CLOEXEC file descriptor fails (as the file
will not be accessible after exec).

Only x86-64, i386 and x32 ABIs are supported in this patch.

Based on patches by Meredydd Luff <meredydd@senatehouse.org>

Signed-off-by: David Drysdale <drysdale@google.com>
---
 arch/x86/ia32/audit.c             |   1 +
 arch/x86/ia32/ia32entry.S         |   1 +
 arch/x86/kernel/audit_64.c        |   1 +
 arch/x86/kernel/entry_64.S        |  28 ++++++++++
 arch/x86/syscalls/syscall_32.tbl  |   1 +
 arch/x86/syscalls/syscall_64.tbl  |   2 +
 arch/x86/um/sys_call_table_64.c   |   1 +
 fs/binfmt_em86.c                  |   4 ++
 fs/binfmt_misc.c                  |   4 ++
 fs/binfmt_script.c                |  10 ++++
 fs/exec.c                         | 110 +++++++++++++++++++++++++++++++++-----
 fs/namei.c                        |   2 +-
 include/linux/binfmts.h           |   4 ++
 include/linux/compat.h            |   3 ++
 include/linux/fs.h                |   1 +
 include/linux/sched.h             |   4 ++
 include/linux/syscalls.h          |   4 ++
 include/uapi/asm-generic/unistd.h |   4 +-
 kernel/sys_ni.c                   |   3 ++
 lib/audit.c                       |   3 ++
 20 files changed, 176 insertions(+), 15 deletions(-)

diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c
index 5d7b381da692..2eccc8932ae6 100644
--- a/arch/x86/ia32/audit.c
+++ b/arch/x86/ia32/audit.c
@@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall)
 	case __NR_socketcall:
 		return 4;
 	case __NR_execve:
+	case __NR_execveat:
 		return 5;
 	default:
 		return 1;
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb05023c..2516c09743e0 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -464,6 +464,7 @@ GLOBAL(\label)
 	PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn
 	PTREGSCALL stub32_sigreturn, sys32_sigreturn
 	PTREGSCALL stub32_execve, compat_sys_execve
+	PTREGSCALL stub32_execveat, compat_sys_execveat
 	PTREGSCALL stub32_fork, sys_fork
 	PTREGSCALL stub32_vfork, sys_vfork

diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c
index 06d3e5a14d9d..f3672508b249 100644
--- a/arch/x86/kernel/audit_64.c
+++ b/arch/x86/kernel/audit_64.c
@@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall)
 	case __NR_openat:
 		return 3;
 	case __NR_execve:
+	case __NR_execveat:
 		return 5;
 	default:
 		return 0;
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 2fac1343a90b..00c4526e6ffe 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -665,6 +665,20 @@ ENTRY(stub_execve)
 	CFI_ENDPROC
 END(stub_execve)

+ENTRY(stub_execveat)
+	CFI_STARTPROC
+	addq $8, %rsp
+	PARTIAL_FRAME 0
+	SAVE_REST
+	FIXUP_TOP_OF_STACK %r11
+	call sys_execveat
+	RESTORE_TOP_OF_STACK %r11
+	movq %rax,RAX(%rsp)
+	RESTORE_REST
+	jmp int_ret_from_sys_call
+	CFI_ENDPROC
+END(stub_execveat)
+
 /*
  * sigreturn is special because it needs to restore all registers on return.
  * This cannot be done with SYSRET, so use the IRET return path instead.
@@ -710,6 +724,20 @@ ENTRY(stub_x32_execve)
 	CFI_ENDPROC
 END(stub_x32_execve)

+ENTRY(stub_x32_execveat)
+	CFI_STARTPROC
+	addq $8, %rsp
+	PARTIAL_FRAME 0
+	SAVE_REST
+	FIXUP_TOP_OF_STACK %r11
+	call compat_sys_execveat
+	RESTORE_TOP_OF_STACK %r11
+	movq %rax,RAX(%rsp)
+	RESTORE_REST
+	jmp int_ret_from_sys_call
+	CFI_ENDPROC
+END(stub_x32_execveat)
+
 #endif

 /*
diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 028b78168d85..2633e3195455 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -363,3 +363,4 @@
 354	i386	seccomp			sys_seccomp
 355	i386	getrandom		sys_getrandom
 356	i386	memfd_create		sys_memfd_create
+357	i386	execveat		sys_execveat			stub32_execveat
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 35dd922727b9..1af5badd159c 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -327,6 +327,7 @@
 318	common	getrandom		sys_getrandom
 319	common	memfd_create		sys_memfd_create
 320	common	kexec_file_load		sys_kexec_file_load
+321	64	execveat		stub_execveat

 #
 # x32-specific system call numbers start at 512 to avoid cache impact
@@ -365,3 +366,4 @@
 542	x32	getsockopt		compat_sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
 544	x32	io_submit		compat_sys_io_submit
+545	x32	execveat		stub_x32_execveat
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index f2f0723070ca..20c3649d0691 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -31,6 +31,7 @@
 #define stub_fork sys_fork
 #define stub_vfork sys_vfork
 #define stub_execve sys_execve
+#define stub_execveat sys_execveat
 #define stub_rt_sigreturn sys_rt_sigreturn

 #define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index f37b08cea1f7..490538536cb4 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -42,6 +42,10 @@ static int load_em86(struct linux_binprm *bprm)
 			return -ENOEXEC;
 	}

+	/* Need to be able to load the file after exec */
+	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+		return -ENOENT;
+
 	allow_write_access(bprm->file);
 	fput(bprm->file);
 	bprm->file = NULL;
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index b60500300dd7..e659f5562356 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -127,6 +127,10 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (!fmt)
 		goto _ret;

+	/* Need to be able to load the file after exec */
+	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+		return -ENOENT;
+
 	if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
 		retval = remove_arg_zero(bprm);
 		if (retval)
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 5027a3e14922..afdf4e3cafc2 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -24,6 +24,16 @@ static int load_script(struct linux_binprm *bprm)

 	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
 		return -ENOEXEC;
+
+	/*
+	 * If the script filename will be inaccessible after exec, typically
+	 * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
+	 * up now (on the assumption that the interpreter will want to load
+	 * this file).
+	 */
+	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+		return -ENOENT;
+
 	/*
 	 * This section does the #! interpretation.
 	 * Sorta complicated, but hopefully it will work.  -TYT
diff --git a/fs/exec.c b/fs/exec.c
index a2b42a98c743..500363723b57 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -747,18 +747,25 @@ EXPORT_SYMBOL(setup_arg_pages);

 #endif /* CONFIG_MMU */

-static struct file *do_open_exec(struct filename *name)
+static struct file *do_open_execat(int fd, struct filename *name, int flags)
 {
 	struct file *file;
 	int err;
-	static const struct open_flags open_exec_flags = {
+	struct open_flags open_exec_flags = {
 		.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
 		.acc_mode = MAY_EXEC | MAY_OPEN,
 		.intent = LOOKUP_OPEN,
 		.lookup_flags = LOOKUP_FOLLOW,
 	};

-	file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
+	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
+		return ERR_PTR(-EINVAL);
+	if (flags & AT_SYMLINK_NOFOLLOW)
+		open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
+	if (flags & AT_EMPTY_PATH)
+		open_exec_flags.lookup_flags |= LOOKUP_EMPTY;
+
+	file = do_filp_open(fd, name, &open_exec_flags);
 	if (IS_ERR(file))
 		goto out;

@@ -769,12 +776,13 @@ static struct file *do_open_exec(struct filename *name)
 	if (file->f_path.mnt->mnt_flags & MNT_NOEXEC)
 		goto exit;

-	fsnotify_open(file);
-
 	err = deny_write_access(file);
 	if (err)
 		goto exit;

+	if (name->name[0] != '\0')
+		fsnotify_open(file);
+
 out:
 	return file;

@@ -786,7 +794,7 @@ exit:
 struct file *open_exec(const char *name)
 {
 	struct filename tmp = { .name = name };
-	return do_open_exec(&tmp);
+	return do_open_execat(AT_FDCWD, &tmp, 0);
 }
 EXPORT_SYMBOL(open_exec);

@@ -1422,10 +1430,12 @@ static int exec_binprm(struct linux_binprm *bprm)
 /*
  * sys_execve() executes a new program.
  */
-static int do_execve_common(struct filename *filename,
-				struct user_arg_ptr argv,
-				struct user_arg_ptr envp)
+static int do_execveat_common(int fd, struct filename *filename,
+			      struct user_arg_ptr argv,
+			      struct user_arg_ptr envp,
+			      int flags)
 {
+	char *pathbuf = NULL;
 	struct linux_binprm *bprm;
 	struct file *file;
 	struct files_struct *displaced;
@@ -1466,7 +1476,7 @@ static int do_execve_common(struct filename *filename,
 	check_unsafe_exec(bprm);
 	current->in_execve = 1;

-	file = do_open_exec(filename);
+	file = do_open_execat(fd, filename, flags);
 	retval = PTR_ERR(file);
 	if (IS_ERR(file))
 		goto out_unmark;
@@ -1474,7 +1484,26 @@ static int do_execve_common(struct filename *filename,
 	sched_exec();

 	bprm->file = file;
-	bprm->filename = bprm->interp = filename->name;
+	if (fd == AT_FDCWD || filename->name[0] == '/') {
+		bprm->filename = filename->name;
+	} else {
+		if (filename->name[0] == '\0')
+			pathbuf = kasprintf(GFP_TEMPORARY, "/dev/fd/%d", fd);
+		else
+			pathbuf = kasprintf(GFP_TEMPORARY, "/dev/fd/%d/%s",
+					    fd, filename->name);
+		if (!pathbuf) {
+			retval = -ENOMEM;
+			goto out_unmark;
+		}
+		/* Record that a name derived from an O_CLOEXEC fd will be
+		 * inaccessible after exec. Relies on having exclusive access to
+		 * current->files (due to unshare_files above). */
+		if (close_on_exec(fd, current->files->fdt))
+			bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
+		bprm->filename = pathbuf;
+	}
+	bprm->interp = bprm->filename;

 	retval = bprm_mm_init(bprm);
 	if (retval)
@@ -1532,6 +1561,7 @@ out_unmark:

 out_free:
 	free_bprm(bprm);
+	kfree(pathbuf);

 out_files:
 	if (displaced)
@@ -1547,7 +1577,18 @@ int do_execve(struct filename *filename,
 {
 	struct user_arg_ptr argv = { .ptr.native = __argv };
 	struct user_arg_ptr envp = { .ptr.native = __envp };
-	return do_execve_common(filename, argv, envp);
+	return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
+}
+
+int do_execveat(int fd, struct filename *filename,
+		const char __user *const __user *__argv,
+		const char __user *const __user *__envp,
+		int flags)
+{
+	struct user_arg_ptr argv = { .ptr.native = __argv };
+	struct user_arg_ptr envp = { .ptr.native = __envp };
+
+	return do_execveat_common(fd, filename, argv, envp, flags);
 }

 #ifdef CONFIG_COMPAT
@@ -1563,7 +1604,23 @@ static int compat_do_execve(struct filename *filename,
 		.is_compat = true,
 		.ptr.compat = __envp,
 	};
-	return do_execve_common(filename, argv, envp);
+	return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
+}
+
+static int compat_do_execveat(int fd, struct filename *filename,
+			      const compat_uptr_t __user *__argv,
+			      const compat_uptr_t __user *__envp,
+			      int flags)
+{
+	struct user_arg_ptr argv = {
+		.is_compat = true,
+		.ptr.compat = __argv,
+	};
+	struct user_arg_ptr envp = {
+		.is_compat = true,
+		.ptr.compat = __envp,
+	};
+	return do_execveat_common(fd, filename, argv, envp, flags);
 }
 #endif

@@ -1603,6 +1660,20 @@ SYSCALL_DEFINE3(execve,
 {
 	return do_execve(getname(filename), argv, envp);
 }
+
+SYSCALL_DEFINE5(execveat,
+		int, fd, const char __user *, filename,
+		const char __user *const __user *, argv,
+		const char __user *const __user *, envp,
+		int, flags)
+{
+	int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+	return do_execveat(fd,
+			   getname_flags(filename, lookup_flags, NULL),
+			   argv, envp, flags);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
 	const compat_uptr_t __user *, argv,
@@ -1610,4 +1681,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
 {
 	return compat_do_execve(getname(filename), argv, envp);
 }
+
+COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
+		       const char __user *, filename,
+		       const compat_uptr_t __user *, argv,
+		       const compat_uptr_t __user *, envp,
+		       int,  flags)
+{
+	int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+	return compat_do_execveat(fd,
+				  getname_flags(filename, lookup_flags, NULL),
+				  argv, envp, flags);
+}
 #endif
diff --git a/fs/namei.c b/fs/namei.c
index a7b05bf82d31..553c84d3e0cc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -130,7 +130,7 @@ void final_putname(struct filename *name)

 #define EMBEDDED_NAME_MAX	(PATH_MAX - sizeof(struct filename))

-static struct filename *
+struct filename *
 getname_flags(const char __user *filename, int flags, int *empty)
 {
 	struct filename *result, *err;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 61f29e5ea840..576e4639ca60 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -53,6 +53,10 @@ struct linux_binprm {
 #define BINPRM_FLAGS_EXECFD_BIT 1
 #define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)

+/* filename of the binary will be inaccessible after exec */
+#define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2
+#define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT)
+
 /* Function parameter for binfmt->coredump */
 struct coredump_params {
 	const siginfo_t *siginfo;
diff --git a/include/linux/compat.h b/include/linux/compat.h
index e6494261eaff..7450ca2ac1fc 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);

 asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
 		     const compat_uptr_t __user *envp);
+asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
+		     const compat_uptr_t __user *argv,
+		     const compat_uptr_t __user *envp, int flags);

 asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp,
 		compat_ulong_t __user *outp, compat_ulong_t __user *exp,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 94187721ad41..e9818574d738 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2060,6 +2060,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *,
 extern struct file * dentry_open(const struct path *, int, const struct cred *);
 extern int filp_close(struct file *, fl_owner_t id);

+extern struct filename *getname_flags(const char __user *, int, int *);
 extern struct filename *getname(const char __user *);
 extern struct filename *getname_kernel(const char *);

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b867a4dab38a..33e056da7d33 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2430,6 +2430,10 @@ extern void do_group_exit(int);
 extern int do_execve(struct filename *,
 		     const char __user * const __user *,
 		     const char __user * const __user *);
+extern int do_execveat(int, struct filename *,
+		       const char __user * const __user *,
+		       const char __user * const __user *,
+		       int);
 extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
 struct task_struct *fork_idle(int);
 extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 0f86d85a9ce4..df5422294deb 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -876,4 +876,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
 asmlinkage long sys_getrandom(char __user *buf, size_t count,
 			      unsigned int flags);

+asmlinkage long sys_execveat(int dfd, const char __user *filename,
+			const char __user *const __user *argv,
+			const char __user *const __user *envp, int flags);
+
 #endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 11d11bc5c78f..feef07d29663 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp)
 __SYSCALL(__NR_getrandom, sys_getrandom)
 #define __NR_memfd_create 279
 __SYSCALL(__NR_memfd_create, sys_memfd_create)
+#define __NR_execveat 280
+__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)

 #undef __NR_syscalls
-#define __NR_syscalls 280
+#define __NR_syscalls 281

 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 391d4ddb6f4b..efb06058ad3e 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -218,3 +218,6 @@ cond_syscall(sys_kcmp);

 /* operate on Secure Computing state */
 cond_syscall(sys_seccomp);
+
+/* execveat */
+cond_syscall(sys_execveat);
diff --git a/lib/audit.c b/lib/audit.c
index 1d726a22565b..b8fb5ee81e26 100644
--- a/lib/audit.c
+++ b/lib/audit.c
@@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall)
 	case __NR_socketcall:
 		return 4;
 #endif
+#ifdef __NR_execveat
+	case __NR_execveat:
+#endif
 	case __NR_execve:
 		return 5;
 	default:
--
2.1.0.rc2.206.gedb03e5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCHv7 2/3] syscalls,x86: add selftest for execveat(2)
  2014-11-07 17:01 [PATCHv7 0/3] syscalls,x86: Add execveat() system call David Drysdale
  2014-11-07 17:01 ` [PATCHv7 1/3] syscalls,x86: implement " David Drysdale
@ 2014-11-07 17:01 ` David Drysdale
  2014-11-07 17:01 ` [PATCHv7 man-pages 3/3] execveat.2: initial man page " David Drysdale
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: David Drysdale @ 2014-11-07 17:01 UTC (permalink / raw)
  To: Eric W. Biederman, Andy Lutomirski, Alexander Viro,
	Meredydd Luff, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86,
	linux-arch, linux-api, David Drysdale

Signed-off-by: David Drysdale <drysdale@google.com>
---
 tools/testing/selftests/Makefile        |   1 +
 tools/testing/selftests/exec/.gitignore |   9 +
 tools/testing/selftests/exec/Makefile   |  25 ++
 tools/testing/selftests/exec/execveat.c | 397 ++++++++++++++++++++++++++++++++
 4 files changed, 432 insertions(+)
 create mode 100644 tools/testing/selftests/exec/.gitignore
 create mode 100644 tools/testing/selftests/exec/Makefile
 create mode 100644 tools/testing/selftests/exec/execveat.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 36ff2e4c7b6f..210cf68b3511 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -14,6 +14,7 @@ TARGETS += powerpc
 TARGETS += user
 TARGETS += sysctl
 TARGETS += firmware
+TARGETS += exec

 TARGETS_HOTPLUG = cpu-hotplug
 TARGETS_HOTPLUG += memory-hotplug
diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore
new file mode 100644
index 000000000000..64073e050c6a
--- /dev/null
+++ b/tools/testing/selftests/exec/.gitignore
@@ -0,0 +1,9 @@
+subdir*
+script*
+execveat
+execveat.symlink
+execveat.moved
+execveat.path.ephemeral
+execveat.ephemeral
+execveat.denatured
+xxxxxxxx*
\ No newline at end of file
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
new file mode 100644
index 000000000000..66dfc2ce1788
--- /dev/null
+++ b/tools/testing/selftests/exec/Makefile
@@ -0,0 +1,25 @@
+CC = $(CROSS_COMPILE)gcc
+CFLAGS = -Wall
+BINARIES = execveat
+DEPS = execveat.symlink execveat.denatured script subdir
+all: $(BINARIES) $(DEPS)
+
+subdir:
+	mkdir -p $@
+script:
+	echo '#!/bin/sh' > $@
+	echo 'exit $$*' >> $@
+	chmod +x $@
+execveat.symlink: execveat
+	ln -s -f $< $@
+execveat.denatured: execveat
+	cp $< $@
+	chmod -x $@
+%: %.c
+	$(CC) $(CFLAGS) -o $@ $^
+
+run_tests: all
+	./execveat
+
+clean:
+	rm -rf $(BINARIES) $(DEPS) subdir.moved execveat.moved xxxxx*
diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c
new file mode 100644
index 000000000000..33a5c06d95ca
--- /dev/null
+++ b/tools/testing/selftests/exec/execveat.c
@@ -0,0 +1,397 @@
+/*
+ * Copyright (c) 2014 Google, Inc.
+ *
+ * Licensed under the terms of the GNU GPL License version 2
+ *
+ * Selftests for execveat(2).
+ */
+
+#define _GNU_SOURCE  /* to get O_PATH, AT_EMPTY_PATH */
+#include <sys/sendfile.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+static char longpath[2 * PATH_MAX] = "";
+static char *envp[] = { "IN_TEST=yes", NULL, NULL };
+static char *argv[] = { "execveat", "99", NULL };
+
+static int execveat_(int fd, const char *path, char **argv, char **envp,
+		     int flags)
+{
+#ifdef __NR_execveat
+	return syscall(__NR_execveat, fd, path, argv, envp, flags);
+#else
+	errno = -ENOSYS;
+	return -1;
+#endif
+}
+
+#define check_execveat_fail(fd, path, flags, errno)	\
+	_check_execveat_fail(fd, path, flags, errno, #errno)
+static int _check_execveat_fail(int fd, const char *path, int flags,
+				int expected_errno, const char *errno_str)
+{
+	int rc;
+
+	errno = 0;
+	printf("Check failure of execveat(%d, '%s', %d) with %s... ",
+		fd, path?:"(null)", flags, errno_str);
+	rc = execveat_(fd, path, argv, envp, flags);
+
+	if (rc > 0) {
+		printf("[FAIL] (unexpected success from execveat(2))\n");
+		return 1;
+	}
+	if (errno != expected_errno) {
+		printf("[FAIL] (expected errno %d (%s) not %d (%s)\n",
+			expected_errno, strerror(expected_errno),
+			errno, strerror(errno));
+		return 1;
+	}
+	printf("[OK]\n");
+	return 0;
+}
+
+static int check_execveat_invoked_rc(int fd, const char *path, int flags,
+				     int expected_rc)
+{
+	int status;
+	int rc;
+	pid_t child;
+	int pathlen = path ? strlen(path) : 0;
+
+	if (pathlen > 40)
+		printf("Check success of execveat(%d, '%.20s...%s', %d)... ",
+			fd, path, (path + pathlen - 20), flags);
+	else
+		printf("Check success of execveat(%d, '%s', %d)... ",
+			fd, path?:"(null)", flags);
+	child = fork();
+	if (child < 0) {
+		printf("[FAIL] (fork() failed)\n");
+		return 1;
+	}
+	if (child == 0) {
+		/* Child: do execveat(). */
+		rc = execveat_(fd, path, argv, envp, flags);
+		printf("[FAIL]: execveat() failed, rc=%d errno=%d (%s)\n",
+			rc, errno, strerror(errno));
+		exit(1);  /* should not reach here */
+	}
+	/* Parent: wait for & check child's exit status. */
+	rc = waitpid(child, &status, 0);
+	if (rc != child) {
+		printf("[FAIL] (waitpid(%d,...) returned %d)\n", child, rc);
+		return 1;
+	}
+	if (!WIFEXITED(status)) {
+		printf("[FAIL] (child %d did not exit cleanly, status=%08x)\n",
+			child, status);
+		return 1;
+	}
+	if (WEXITSTATUS(status) != expected_rc) {
+		printf("[FAIL] (child %d exited with %d not %d)\n",
+			child, WEXITSTATUS(status), expected_rc);
+		return 1;
+	}
+	printf("[OK]\n");
+	return 0;
+}
+
+static int check_execveat(int fd, const char *path, int flags)
+{
+	return check_execveat_invoked_rc(fd, path, flags, 99);
+}
+
+static char *concat(const char *left, const char *right)
+{
+	char *result = malloc(strlen(left) + strlen(right) + 1);
+
+	strcpy(result, left);
+	strcat(result, right);
+	return result;
+}
+
+static int open_or_die(const char *filename, int flags)
+{
+	int fd = open(filename, flags);
+
+	if (fd < 0) {
+		printf("Failed to open '%s'; "
+			"check prerequisites are available\n", filename);
+		exit(1);
+	}
+	return fd;
+}
+
+static void exe_cp(const char *src, const char *dest)
+{
+	int in_fd = open_or_die(src, O_RDONLY);
+	int out_fd = open(dest, O_RDWR|O_CREAT|O_TRUNC, 0755);
+	struct stat info;
+
+	fstat(in_fd, &info);
+	sendfile(out_fd, in_fd, NULL, info.st_size);
+	close(in_fd);
+	close(out_fd);
+}
+
+#define XX_DIR_LEN 200
+static int check_execveat_pathmax(int dot_dfd, const char *src, int is_script)
+{
+	int fail = 0;
+	int ii, count, len;
+	char longname[XX_DIR_LEN + 1];
+	int fd;
+
+	if (*longpath == '\0') {
+		/* Create a filename close to PATH_MAX in length */
+		memset(longname, 'x', XX_DIR_LEN - 1);
+		longname[XX_DIR_LEN - 1] = '/';
+		longname[XX_DIR_LEN] = '\0';
+		count = (PATH_MAX - 3) / XX_DIR_LEN;
+		for (ii = 0; ii < count; ii++) {
+			strcat(longpath, longname);
+			mkdir(longpath, 0755);
+		}
+		len = (PATH_MAX - 3) - (count * XX_DIR_LEN);
+		if (len <= 0)
+			len = 1;
+		memset(longname, 'y', len);
+		longname[len] = '\0';
+		strcat(longpath, longname);
+	}
+	exe_cp(src, longpath);
+
+	/*
+	 * Execute as a pre-opened file descriptor, which works whether this is
+	 * a script or not (because the interpreter sees a filename like
+	 * "/dev/fd/20").
+	 */
+	fd = open(longpath, O_RDONLY);
+	if (fd > 0) {
+		printf("Invoke copy of '%s' via filename of length %lu:\n",
+			src, strlen(longpath));
+		fail += check_execveat(fd, "", AT_EMPTY_PATH);
+	} else {
+		printf("Failed to open length %lu filename, errno=%d (%s)\n",
+			strlen(longpath), errno, strerror(errno));
+		fail++;
+	}
+
+	/*
+	 * Execute as a long pathname relative to ".".  If this is a script,
+	 * the interpreter will launch but fail to open the script because its
+	 * name ("/dev/fd/5/xxx....") is bigger than PATH_MAX.
+	 */
+	if (is_script)
+		fail += check_execveat_invoked_rc(dot_dfd, longpath, 0, 127);
+	else
+		fail += check_execveat(dot_dfd, longpath, 0);
+
+	return fail;
+}
+
+static int run_tests(void)
+{
+	int fail = 0;
+	char *fullname = realpath("execveat", NULL);
+	char *fullname_script = realpath("script", NULL);
+	char *fullname_symlink = concat(fullname, ".symlink");
+	int subdir_dfd = open_or_die("subdir", O_DIRECTORY|O_RDONLY);
+	int subdir_dfd_ephemeral = open_or_die("subdir.ephemeral",
+					       O_DIRECTORY|O_RDONLY);
+	int dot_dfd = open_or_die(".", O_DIRECTORY|O_RDONLY);
+	int dot_dfd_path = open_or_die(".", O_DIRECTORY|O_RDONLY|O_PATH);
+	int dot_dfd_cloexec = open_or_die(".", O_DIRECTORY|O_RDONLY|O_CLOEXEC);
+	int fd = open_or_die("execveat", O_RDONLY);
+	int fd_path = open_or_die("execveat", O_RDONLY|O_PATH);
+	int fd_symlink = open_or_die("execveat.symlink", O_RDONLY);
+	int fd_denatured = open_or_die("execveat.denatured", O_RDONLY);
+	int fd_denatured_path = open_or_die("execveat.denatured",
+					    O_RDONLY|O_PATH);
+	int fd_script = open_or_die("script", O_RDONLY);
+	int fd_ephemeral = open_or_die("execveat.ephemeral", O_RDONLY);
+	int fd_ephemeral_path = open_or_die("execveat.path.ephemeral",
+					    O_RDONLY|O_PATH);
+	int fd_script_ephemeral = open_or_die("script.ephemeral", O_RDONLY);
+	int fd_cloexec = open_or_die("execveat", O_RDONLY|O_CLOEXEC);
+	int fd_script_cloexec = open_or_die("script", O_RDONLY|O_CLOEXEC);
+
+	/* Change file position to confirm it doesn't affect anything */
+	lseek(fd, 10, SEEK_SET);
+
+	/* Normal executable file: */
+	/*   dfd + path */
+	fail += check_execveat(subdir_dfd, "../execveat", 0);
+	fail += check_execveat(dot_dfd, "execveat", 0);
+	fail += check_execveat(dot_dfd_path, "execveat", 0);
+	/*   absolute path */
+	fail += check_execveat(AT_FDCWD, fullname, 0);
+	/*   absolute path with nonsense dfd */
+	fail += check_execveat(99, fullname, 0);
+	/*   fd + no path */
+	fail += check_execveat(fd, "", AT_EMPTY_PATH);
+	/*   O_CLOEXEC fd + no path */
+	fail += check_execveat(fd_cloexec, "", AT_EMPTY_PATH);
+	/*   O_PATH fd */
+	fail += check_execveat(fd_path, "", AT_EMPTY_PATH);
+
+	/* Mess with executable file that's already open: */
+	/*   fd + no path to a file that's been renamed */
+	rename("execveat.ephemeral", "execveat.moved");
+	fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH);
+	/*   fd + no path to a file that's been deleted */
+	unlink("execveat.moved"); /* remove the file now fd open */
+	fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH);
+
+	/* Mess with executable file that's already open with O_PATH */
+	/*   fd + no path to a file that's been deleted */
+	unlink("execveat.path.ephemeral");
+	fail += check_execveat(fd_ephemeral_path, "", AT_EMPTY_PATH);
+
+	/* Invalid argument failures */
+	fail += check_execveat_fail(fd, "", 0, ENOENT);
+	fail += check_execveat_fail(fd, NULL, AT_EMPTY_PATH, EFAULT);
+
+	/* Symlink to executable file: */
+	/*   dfd + path */
+	fail += check_execveat(dot_dfd, "execveat.symlink", 0);
+	fail += check_execveat(dot_dfd_path, "execveat.symlink", 0);
+	/*   absolute path */
+	fail += check_execveat(AT_FDCWD, fullname_symlink, 0);
+	/*   fd + no path, even with AT_SYMLINK_NOFOLLOW (already followed) */
+	fail += check_execveat(fd_symlink, "", AT_EMPTY_PATH);
+	fail += check_execveat(fd_symlink, "",
+			       AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW);
+
+	/* Symlink fails when AT_SYMLINK_NOFOLLOW set: */
+	/*   dfd + path */
+	fail += check_execveat_fail(dot_dfd, "execveat.symlink",
+				    AT_SYMLINK_NOFOLLOW, ELOOP);
+	fail += check_execveat_fail(dot_dfd_path, "execveat.symlink",
+				    AT_SYMLINK_NOFOLLOW, ELOOP);
+	/*   absolute path */
+	fail += check_execveat_fail(AT_FDCWD, fullname_symlink,
+				    AT_SYMLINK_NOFOLLOW, ELOOP);
+
+	/* Shell script wrapping executable file: */
+	/*   dfd + path */
+	fail += check_execveat(subdir_dfd, "../script", 0);
+	fail += check_execveat(dot_dfd, "script", 0);
+	fail += check_execveat(dot_dfd_path, "script", 0);
+	/*   absolute path */
+	fail += check_execveat(AT_FDCWD, fullname_script, 0);
+	/*   fd + no path */
+	fail += check_execveat(fd_script, "", AT_EMPTY_PATH);
+	fail += check_execveat(fd_script, "",
+			       AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW);
+	/*   O_CLOEXEC fd fails for a script (as script file inaccessible) */
+	fail += check_execveat_fail(fd_script_cloexec, "", AT_EMPTY_PATH,
+				    ENOENT);
+	fail += check_execveat_fail(dot_dfd_cloexec, "script", 0, ENOENT);
+
+	/* Mess with script file that's already open: */
+	/*   fd + no path to a file that's been renamed */
+	rename("script.ephemeral", "script.moved");
+	fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH);
+	/*   fd + no path to a file that's been deleted */
+	unlink("script.moved"); /* remove the file while fd open */
+	fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH);
+
+	/* Rename a subdirectory in the path: */
+	rename("subdir.ephemeral", "subdir.moved");
+	fail += check_execveat(subdir_dfd_ephemeral, "../script", 0);
+	fail += check_execveat(subdir_dfd_ephemeral, "script", 0);
+	/* Remove the subdir and its contents */
+	unlink("subdir.moved/script");
+	unlink("subdir.moved");
+	/* Shell loads via deleted subdir OK because name starts with .. */
+	fail += check_execveat(subdir_dfd_ephemeral, "../script", 0);
+	fail += check_execveat_fail(subdir_dfd_ephemeral, "script", 0, ENOENT);
+
+	/* Flag values other than AT_SYMLINK_NOFOLLOW => EINVAL */
+	fail += check_execveat_fail(dot_dfd, "execveat", 0xFFFF, EINVAL);
+	/* Invalid path => ENOENT */
+	fail += check_execveat_fail(dot_dfd, "no-such-file", 0, ENOENT);
+	fail += check_execveat_fail(dot_dfd_path, "no-such-file", 0, ENOENT);
+	fail += check_execveat_fail(AT_FDCWD, "no-such-file", 0, ENOENT);
+	/* Attempt to execute directory => EACCES */
+	fail += check_execveat_fail(dot_dfd, "", AT_EMPTY_PATH, EACCES);
+	/* Attempt to execute non-executable => EACCES */
+	fail += check_execveat_fail(dot_dfd, "Makefile", 0, EACCES);
+	fail += check_execveat_fail(fd_denatured, "", AT_EMPTY_PATH, EACCES);
+	fail += check_execveat_fail(fd_denatured_path, "", AT_EMPTY_PATH,
+				    EACCES);
+	/* Attempt to execute nonsense FD => EBADF */
+	fail += check_execveat_fail(99, "", AT_EMPTY_PATH, EBADF);
+	fail += check_execveat_fail(99, "execveat", 0, EBADF);
+	/* Attempt to execute relative to non-directory => ENOTDIR */
+	fail += check_execveat_fail(fd, "execveat", 0, ENOTDIR);
+
+	fail += check_execveat_pathmax(dot_dfd, "execveat", 0);
+	fail += check_execveat_pathmax(dot_dfd, "script", 1);
+	return fail;
+}
+
+static void prerequisites(void)
+{
+	int fd;
+	const char *script = "#!/bin/sh\nexit $*\n";
+
+	/* Create ephemeral copies of files */
+	exe_cp("execveat", "execveat.ephemeral");
+	exe_cp("execveat", "execveat.path.ephemeral");
+	exe_cp("script", "script.ephemeral");
+	mkdir("subdir.ephemeral", 0755);
+
+	fd = open("subdir.ephemeral/script", O_RDWR|O_CREAT|O_TRUNC, 0755);
+	write(fd, script, strlen(script));
+	close(fd);
+}
+
+int main(int argc, char **argv)
+{
+	int ii;
+	int rc;
+	const char *verbose = getenv("VERBOSE");
+
+	if (argc >= 2) {
+		/* If we are invoked with an argument, don't run tests. */
+		const char *in_test = getenv("IN_TEST");
+
+		if (verbose) {
+			printf("  invoked with:");
+			for (ii = 0; ii < argc; ii++)
+				printf(" [%d]='%s'", ii, argv[ii]);
+			printf("\n");
+		}
+
+		/* Check expected environment transferred. */
+		if (!in_test || strcmp(in_test, "yes") != 0) {
+			printf("[FAIL] (no IN_TEST=yes in env)\n");
+			return 1;
+		}
+
+		/* Use the final argument as an exit code. */
+		rc = atoi(argv[argc - 1]);
+		fflush(stdout);
+	} else {
+		prerequisites();
+		if (verbose)
+			envp[1] = "VERBOSE=1";
+		rc = run_tests();
+		if (rc > 0)
+			printf("%d tests failed\n", rc);
+	}
+	return rc;
+}
--
2.1.0.rc2.206.gedb03e5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCHv7 man-pages 3/3] execveat.2: initial man page for execveat(2)
  2014-11-07 17:01 [PATCHv7 0/3] syscalls,x86: Add execveat() system call David Drysdale
  2014-11-07 17:01 ` [PATCHv7 1/3] syscalls,x86: implement " David Drysdale
  2014-11-07 17:01 ` [PATCHv7 2/3] syscalls,x86: add selftest for execveat(2) David Drysdale
@ 2014-11-07 17:01 ` David Drysdale
  2014-11-12 21:50 ` [PATCHv7 0/3] syscalls,x86: Add execveat() system call Andrew Morton
  2014-11-12 22:08 ` Andrew Morton
  4 siblings, 0 replies; 9+ messages in thread
From: David Drysdale @ 2014-11-07 17:01 UTC (permalink / raw)
  To: Eric W. Biederman, Andy Lutomirski, Alexander Viro,
	Meredydd Luff, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86,
	linux-arch, linux-api, David Drysdale

Signed-off-by: David Drysdale <drysdale@google.com>
---
 man2/execveat.2 | 153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 153 insertions(+)
 create mode 100644 man2/execveat.2

diff --git a/man2/execveat.2 b/man2/execveat.2
new file mode 100644
index 000000000000..937d79e4c4f0
--- /dev/null
+++ b/man2/execveat.2
@@ -0,0 +1,153 @@
+.\" Copyright (c) 2014 Google, Inc.
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH EXECVEAT 2 2014-04-02 "Linux" "Linux Programmer's Manual"
+.SH NAME
+execveat \- execute program relative to a directory file descriptor
+.SH SYNOPSIS
+.B #include <unistd.h>
+.sp
+.BI "int execveat(int " fd ", const char *" pathname ","
+.br
+.BI "             char *const " argv "[],  char *const " envp "[],"
+.br
+.BI "             int " flags);
+.SH DESCRIPTION
+The
+.BR execveat ()
+system call executes the program pointed to by the combination of \fIfd\fP and \fIpathname\fP.
+The
+.BR execveat ()
+system call operates in exactly the same way as
+.BR execve (2),
+except for the differences described in this manual page.
+
+If the pathname given in
+.I pathname
+is relative, then it is interpreted relative to the directory
+referred to by the file descriptor
+.I fd
+(rather than relative to the current working directory of
+the calling process, as is done by
+.BR execve (2)
+for a relative pathname).
+
+If
+.I pathname
+is relative and
+.I fd
+is the special value
+.BR AT_FDCWD ,
+then
+.I pathname
+is interpreted relative to the current working
+directory of the calling process (like
+.BR execve (2)).
+
+If
+.I pathname
+is absolute, then
+.I fd
+is ignored.
+
+If
+.I pathname
+is an empty string and the
+.BR AT_EMPTY_PATH
+flag is specified, then the file descriptor
+.I fd
+specifies the file to be executed.
+
+.I flags
+can either be 0, or include the following flags:
+.TP
+.BR AT_EMPTY_PATH
+If
+.I pathname
+is an empty string, operate on the file referred to by
+.IR fd
+(which may have been obtained using the
+.BR open (2)
+.B O_PATH
+flag).
+.TP
+.B AT_SYMLINK_NOFOLLOW
+If the file identified by
+.I fd
+and a non-NULL
+.I pathname
+is a symbolic link, then the call fails with the error
+.BR EINVAL .
+.SH "RETURN VALUE"
+On success,
+.BR execveat ()
+does not return. On error \-1 is returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+The same errors that occur for
+.BR execve (2)
+can also occur for
+.BR execveat ().
+The following additional errors can occur for
+.BR execveat ():
+.TP
+.B EBADF
+.I fd
+is not a valid file descriptor.
+.TP
+.B ENOENT
+The program identified by \fIfd\fP and \fIpathname\fP requires the
+use of an interpreter program (such as a script starting with
+"#!") but the file descriptor
+.I fd
+was opened with the
+.B O_CLOEXEC
+flag and so the program file is inaccessible to the launched interpreter.
+.TP
+.B EINVAL
+Invalid flag specified in
+.IR flags .
+.TP
+.B ENOTDIR
+.I pathname
+is relative and
+.I fd
+is a file descriptor referring to a file other than a directory.
+.SH VERSIONS
+.BR execveat ()
+was added to Linux in kernel 3.???.
+.SH NOTES
+In addition to the reasons explained in
+.BR openat (2),
+the
+.BR execveat ()
+system call is also needed to allow
+.BR fexecve (3)
+to be implemented on systems that do not have the
+.I /proc
+filesystem mounted.
+.SH SEE ALSO
+.BR execve (2),
+.BR fexecve (3)
--
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCHv7 0/3] syscalls,x86: Add execveat() system call
  2014-11-07 17:01 [PATCHv7 0/3] syscalls,x86: Add execveat() system call David Drysdale
                   ` (2 preceding siblings ...)
  2014-11-07 17:01 ` [PATCHv7 man-pages 3/3] execveat.2: initial man page " David Drysdale
@ 2014-11-12 21:50 ` Andrew Morton
  2014-11-13 14:42   ` David Drysdale
  2014-11-12 22:08 ` Andrew Morton
  4 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2014-11-12 21:50 UTC (permalink / raw)
  To: David Drysdale
  Cc: Eric W. Biederman, Andy Lutomirski, Alexander Viro,
	Meredydd Luff, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Kees Cook, Arnd Bergmann, Rich Felker,
	Christoph Hellwig, x86, linux-arch, linux-api

On Fri,  7 Nov 2014 17:01:01 +0000 David Drysdale <drysdale@google.com> wrote:

> This patch set adds execveat(2) for x86, and is derived from Meredydd
> Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
> 
> The primary aim of adding an execveat syscall is to allow an
> implementation of fexecve(3) that does not rely on the /proc
> filesystem, at least for executables (rather than scripts).  The
> current glibc version of fexecve(3) is implemented via /proc, which
> causes problems in sandboxed or otherwise restricted environments.

Have the relevant glibc people seen/reviewed/liked this?  

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHv7 0/3] syscalls,x86: Add execveat() system call
  2014-11-07 17:01 [PATCHv7 0/3] syscalls,x86: Add execveat() system call David Drysdale
                   ` (3 preceding siblings ...)
  2014-11-12 21:50 ` [PATCHv7 0/3] syscalls,x86: Add execveat() system call Andrew Morton
@ 2014-11-12 22:08 ` Andrew Morton
  2014-11-13 15:51   ` David Drysdale
  4 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2014-11-12 22:08 UTC (permalink / raw)
  To: David Drysdale
  Cc: Eric W. Biederman, Andy Lutomirski, Alexander Viro,
	Meredydd Luff, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Kees Cook, Arnd Bergmann, Rich Felker,
	Christoph Hellwig, x86, linux-arch, linux-api, Michael Kerrisk

On Fri,  7 Nov 2014 17:01:01 +0000 David Drysdale <drysdale@google.com> wrote:

> This patch set adds execveat(2) for x86

I grabbed these.  If someone else was planning to do so, feel free to
shout at me.

I haven't been following the discussion closely so some reviewed-by's
and tested-by's would be nice.

Thanks for writing a manpage.  mtk.manpages@gmail.com should have been
cc'ed.  He is now ;)

Your syscall number was taken by sys_bpf.  I renumbered it
appropriately.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHv7 0/3] syscalls,x86: Add execveat() system call
  2014-11-12 21:50 ` [PATCHv7 0/3] syscalls,x86: Add execveat() system call Andrew Morton
@ 2014-11-13 14:42   ` David Drysdale
  2014-11-13 20:13     ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: David Drysdale @ 2014-11-13 14:42 UTC (permalink / raw)
  To: Andrew Morton, Christoph Hellwig, Rich Felker
  Cc: Eric W. Biederman, Andy Lutomirski, Alexander Viro,
	Meredydd Luff, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Kees Cook, Arnd Bergmann, X86 ML, linux-arch,
	Linux API

On Wed, Nov 12, 2014 at 9:50 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Fri,  7 Nov 2014 17:01:01 +0000 David Drysdale <drysdale@google.com> wrote:
>
>> This patch set adds execveat(2) for x86, and is derived from Meredydd
>> Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
>>
>> The primary aim of adding an execveat syscall is to allow an
>> implementation of fexecve(3) that does not rely on the /proc
>> filesystem, at least for executables (rather than scripts).  The
>> current glibc version of fexecve(3) is implemented via /proc, which
>> causes problems in sandboxed or otherwise restricted environments.
>
> Have the relevant glibc people seen/reviewed/liked this?

I think it's been mentioned in passing but not explicitly discussed over there
(https://sourceware.org/ml/libc-alpha/2014-10/msg00497.html,
https://sourceware.org/ml/libc-alpha/2014-10/msg00509.html)
and a couple of the participants in that thread (Christoph Hellwig, Rich Felker)
were also cc:ed here.

It sounded like execveat might be useful for another feature (O_EXEC) but
I'm not sure whether that amounts to the relevant glibc folk liking this...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHv7 0/3] syscalls,x86: Add execveat() system call
  2014-11-12 22:08 ` Andrew Morton
@ 2014-11-13 15:51   ` David Drysdale
  0 siblings, 0 replies; 9+ messages in thread
From: David Drysdale @ 2014-11-13 15:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, Andy Lutomirski, Alexander Viro,
	Meredydd Luff, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Kees Cook, Arnd Bergmann, Rich Felker,
	Christoph Hellwig, X86 ML, linux-arch, Linux API,
	Michael Kerrisk

On Wed, Nov 12, 2014 at 10:08 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Fri,  7 Nov 2014 17:01:01 +0000 David Drysdale <drysdale@google.com> wrote:
>
>> This patch set adds execveat(2) for x86
>
> I grabbed these.  If someone else was planning to do so, feel free to
> shout at me.
>
> I haven't been following the discussion closely so some reviewed-by's
> and tested-by's would be nice.

Yes please -- Andy, Eric, Al?

> Thanks for writing a manpage.  mtk.manpages@gmail.com should have been
> cc'ed.  He is now ;)

Thanks (for some reason I though cc'ing linux-api took care of Michael
seeing it).

> Your syscall number was taken by sys_bpf.  I renumbered it
> appropriately.

Looks like the robots have also found a few things that need fixing:
 1) Sparse warning on "close_on_exec(fd, current->files->fdt)": I guess
    that needs a rcu_dereference_raw() or some such around arg2.
 2) Syscall not implemented warning on sparc build: I could attempt to wire
    up the sparc syscall (although I can only cross-compile it, not run it),
    or is it best to leave it and ask sparclinux / David Miller nicely to take
    care of it?  Or have I just missed some bit of syscall machinery?
 3) Non-canonical comment format just before the close_on_exec() call.
(Well, that last wasn't found by a robot -- thanks Andrew!)

When I generate a new version, would it be easier for you if I also merged up
against v3.18-rc4?  That would at least take care of the syscall renumbering.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHv7 0/3] syscalls,x86: Add execveat() system call
  2014-11-13 14:42   ` David Drysdale
@ 2014-11-13 20:13     ` Andrew Morton
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2014-11-13 20:13 UTC (permalink / raw)
  To: David Drysdale
  Cc: Christoph Hellwig, Rich Felker, Eric W. Biederman,
	Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Kees Cook,
	Arnd Bergmann, X86 ML, linux-arch, Linux API

On Thu, 13 Nov 2014 14:42:58 +0000 David Drysdale <drysdale@google.com> wrote:

> On Wed, Nov 12, 2014 at 9:50 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Fri,  7 Nov 2014 17:01:01 +0000 David Drysdale <drysdale@google.com> wrote:
> >
> >> This patch set adds execveat(2) for x86, and is derived from Meredydd
> >> Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
> >>
> >> The primary aim of adding an execveat syscall is to allow an
> >> implementation of fexecve(3) that does not rely on the /proc
> >> filesystem, at least for executables (rather than scripts).  The
> >> current glibc version of fexecve(3) is implemented via /proc, which
> >> causes problems in sandboxed or otherwise restricted environments.
> >
> > Have the relevant glibc people seen/reviewed/liked this?
> 
> I think it's been mentioned in passing but not explicitly discussed over there
> (https://sourceware.org/ml/libc-alpha/2014-10/msg00497.html,
> https://sourceware.org/ml/libc-alpha/2014-10/msg00509.html)
> and a couple of the participants in that thread (Christoph Hellwig, Rich Felker)
> were also cc:ed here.
> 
> It sounded like execveat might be useful for another feature (O_EXEC) but
> I'm not sure whether that amounts to the relevant glibc folk liking this...

OK.  Could you please try to hunt down the appropriate people and give
them a poke?  We'd be in a mess if we merged this then glibc didn't use
it, or glibc developers required/suggested any interface modifications.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-11-13 20:13 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-07 17:01 [PATCHv7 0/3] syscalls,x86: Add execveat() system call David Drysdale
2014-11-07 17:01 ` [PATCHv7 1/3] syscalls,x86: implement " David Drysdale
2014-11-07 17:01 ` [PATCHv7 2/3] syscalls,x86: add selftest for execveat(2) David Drysdale
2014-11-07 17:01 ` [PATCHv7 man-pages 3/3] execveat.2: initial man page " David Drysdale
2014-11-12 21:50 ` [PATCHv7 0/3] syscalls,x86: Add execveat() system call Andrew Morton
2014-11-13 14:42   ` David Drysdale
2014-11-13 20:13     ` Andrew Morton
2014-11-12 22:08 ` Andrew Morton
2014-11-13 15:51   ` David Drysdale

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).