* [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)
@ 2018-03-15 19:04 Dominik Brodowski
  2018-03-15 19:04 ` [PATCH v2 01/36] syscalls: define goal to not call sys_xyzzy() from within the kernel Dominik Brodowski
                   ` (37 more replies)
  0 siblings, 38 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:04 UTC
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Here is a re-spin of the first set of patches which reduce the number of
syscall invocations from within the kernel; the RFC may be found at

The rationale for this change is described in patch 1 as follows:

	The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
	and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
	through kernel entry points, but not from the kernel itself. This
	will allow cleanups and optimizations to the entry paths *and* to
	the parts of the kernel code which currently need to pretend to be
	userspace in order to make use of syscalls.
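
A rough userspace sketch of that pattern follows. It is not taken from the
series: do_xyzzy(), xyzzy_entry() and xyzzy_default_entry() are hypothetical
names, and the helper body is a placeholder. The point is only that the logic
lives in a plain helper, while the entry points stay thin wrappers:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper holding the actual logic; callable from anywhere. */
static long do_xyzzy(int fd, unsigned int flags)
{
	if (fd < 0)
		return -EBADF;
	return (long)fd + (long)flags;	/* placeholder for the real work */
}

/* Stand-in for a SYSCALL_DEFINE2(xyzzy, ...) body: a thin wrapper only. */
static long xyzzy_entry(int fd, unsigned int flags)
{
	return do_xyzzy(fd, flags);
}

/* Stand-in for a second entry point which used to call the first one. */
static long xyzzy_default_entry(int fd)
{
	return do_xyzzy(fd, 0);	/* previously: xyzzy_entry(fd, 0) */
}
```

Most patches in this series follow this shape, with either a do_*() or a
ksys_*() helper taking the place of do_xyzzy().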

The whole series can be found at 

	https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-next

and will be submitted for merging for the v4.17-rc1 cycle, probably together
with another batch of related patches I hope to send out tomorrow as an RFC.

Changes since the RFC / v1:

- rebase to v4.16-rc5; sys_ioperm already got its SYSCALL_DEFINE3
- add ACKs
- CC: -> Cc: (suggested by Ingo Molnar)
- update comment in include/linux/syscalls.h (suggested by Ingo Molnar and
	Andy Lutomirski)
- separate declarations from definitions with newlines in
	include/linux/syscalls.h; add comment on ksys_close() (suggested by
	Ingo Molnar)
- expand commit messages (suggested by Christoph Hellwig)
- include patch 36:
	fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
- do not worry about the following archs, as they are going away:
	cris, frv, metag, mn10300, score, tile
	(solving conflicts in -next)
- fix builds with CONFIG_FUTEX=n, CONFIG_ADVISE_SYSCALLS=n (solving issues
	found by Stephen Rothwell)

Thanks,
	Dominik


Dominik Brodowski (36):
  syscalls: define goal to not call sys_xyzzy() from within the kernel
  kernel: use kernel_wait4() instead of sys_wait4()
  mm: use do_futex() instead of sys_futex() in mm_release()
  kernel: add do_getpgid() helper; remove internal call to sys_getpgid()
  fs: add do_readlinkat() helper; remove internal call to
    sys_readlinkat()
  fs: add do_pipe2() helper; remove internal call to sys_pipe2()
  fs: add do_renameat2() helper; remove internal call to sys_renameat2()
  fs: add do_futimesat() helper; remove internal call to sys_futimesat()
  syscalls: add do_epoll_*() helpers; remove internal calls to
    sys_epoll_*()
  fs: add do_signalfd4() helper; remove internal calls to
    sys_signalfd4()
  fs: add do_eventfd() helper; remove internal call to sys_eventfd()
  kernel: open-code sys_rt_sigpending() in sys_sigpending()
  x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to
    sys_ioperm()
  fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
  fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
  fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
  fs: add ksys_chroot() helper; remove in-kernel calls to sys_chroot()
  fs: add ksys_write() helper; remove in-kernel calls to sys_write()
  kernel: add ksys_unshare() helper; remove in-kernel calls to
    sys_unshare()
  mm: add ksys_fadvise64_64() helper; remove in-kernel call to
    sys_fadvise64_64()
  mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to
    sys_mmap_pgoff()
  fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()
  fs: add ksys_sync_file_range() helper; remove in-kernel calls to
    syscall
  fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
  hostfs: rename do_rmdir() to hostfs_do_rmdir()
  fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir()
  fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel
    calls to syscall
  fs: add do_symlinkat() helper and ksys_symlink() wrapper; remove
    in-kernel calls to syscall
  fs: add do_mknodat() helper and ksys_mknod() wrapper; remove in-kernel
    calls to syscall
  fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel
    calls to syscall
  fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod()
    wrapper; remove in-kernel calls to syscall
  fs: add do_faccessat() helper and ksys_access() wrapper; remove
    in-kernel calls to syscall
  fs: add ksys_ftruncate() wrapper; remove in-kernel calls to
    sys_ftruncate()
  fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown()
    wrappers
  fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()
  fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()

 Documentation/process/adding-syscalls.rst |  14 ---
 arch/alpha/kernel/osf_sys.c               |   2 +-
 arch/arm/kernel/sys_arm.c                 |   2 +-
 arch/arm64/kernel/sys.c                   |   2 +-
 arch/ia64/kernel/sys_ia64.c               |   4 +-
 arch/m68k/kernel/sys_m68k.c               |   2 +-
 arch/microblaze/kernel/sys_microblaze.c   |   6 +-
 arch/mips/kernel/linux32.c                |  10 +-
 arch/mips/kernel/syscall.c                |   6 +-
 arch/parisc/kernel/sys_parisc.c           |  14 +--
 arch/powerpc/kernel/sys_ppc32.c           |   8 +-
 arch/powerpc/kernel/syscalls.c            |   6 +-
 arch/riscv/kernel/sys_riscv.c             |   4 +-
 arch/s390/kernel/compat_linux.c           |  23 ++---
 arch/s390/kernel/sys_s390.c               |   2 +-
 arch/sh/kernel/sys_sh.c                   |   4 +-
 arch/sh/kernel/sys_sh32.c                 |   8 +-
 arch/sparc/kernel/sys_sparc32.c           |  14 +--
 arch/sparc/kernel/sys_sparc_32.c          |   6 +-
 arch/sparc/kernel/sys_sparc_64.c          |   2 +-
 arch/um/kernel/syscall.c                  |   2 +-
 arch/x86/ia32/sys_ia32.c                  |  22 ++---
 arch/x86/include/asm/syscalls.h           |   1 +
 arch/x86/kernel/ioport.c                  |   7 +-
 arch/x86/kernel/sys_x86_64.c              |   2 +-
 arch/xtensa/kernel/syscall.c              |   2 +-
 drivers/base/devtmpfs.c                   |  11 ++-
 drivers/tty/vt/vt_ioctl.c                 |   6 +-
 fs/autofs4/dev-ioctl.c                    |   2 +-
 fs/binfmt_misc.c                          |   2 +-
 fs/eventfd.c                              |   9 +-
 fs/eventpoll.c                            |  23 +++--
 fs/file.c                                 |  17 +++-
 fs/hostfs/hostfs.h                        |   2 +-
 fs/hostfs/hostfs_kern.c                   |   2 +-
 fs/hostfs/hostfs_user.c                   |   2 +-
 fs/internal.h                             |  14 +++
 fs/namei.c                                |  61 +++++++++----
 fs/namespace.c                            |  19 +++-
 fs/open.c                                 |  68 ++++++++++----
 fs/pipe.c                                 |   9 +-
 fs/read_write.c                           |   9 +-
 fs/signalfd.c                             |  14 ++-
 fs/stat.c                                 |  12 ++-
 fs/sync.c                                 |  12 ++-
 fs/utimes.c                               |  13 ++-
 include/linux/futex.h                     |  13 ++-
 include/linux/syscalls.h                  | 146 +++++++++++++++++++++++++++++-
 init/do_mounts.c                          |  16 ++--
 init/do_mounts.h                          |   4 +-
 init/do_mounts_initrd.c                   |  38 ++++----
 init/do_mounts_md.c                       |  14 +--
 init/do_mounts_rd.c                       |  18 ++--
 init/initramfs.c                          |  48 +++++-----
 init/main.c                               |   9 +-
 init/noinitramfs.c                        |   6 +-
 kernel/exit.c                             |   2 +-
 kernel/fork.c                             |  11 ++-
 kernel/pid_namespace.c                    |   6 +-
 kernel/signal.c                           |  15 ++-
 kernel/sys.c                              |   9 +-
 kernel/uid16.c                            |   6 +-
 kernel/umh.c                              |   2 +-
 mm/fadvise.c                              |  10 +-
 mm/mmap.c                                 |  17 +++-
 mm/nommu.c                                |  17 +++-
 66 files changed, 614 insertions(+), 275 deletions(-)

-- 
2.16.2

* [PATCH v2 01/36] syscalls: define goal to not call sys_xyzzy() from within the kernel
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
@ 2018-03-15 19:04 ` Dominik Brodowski
  2018-03-15 19:04 ` [PATCH v2 02/36] kernel: use kernel_wait4() instead of sys_wait4() Dominik Brodowski
                   ` (36 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:04 UTC
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
through kernel entry points, but not from the kernel itself. This
will allow cleanups and optimizations to the entry paths *and* to
the parts of the kernel code which currently need to pretend to be
userspace in order to make use of syscalls.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 Documentation/process/adding-syscalls.rst | 14 --------------
 include/linux/syscalls.h                  |  7 +++++++
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/Documentation/process/adding-syscalls.rst b/Documentation/process/adding-syscalls.rst
index 8cc25a06f353..35a14b730da9 100644
--- a/Documentation/process/adding-syscalls.rst
+++ b/Documentation/process/adding-syscalls.rst
@@ -201,12 +201,6 @@ followed by the (type, name) pairs for the parameters as arguments.  Using
 this macro allows metadata about the new system call to be made available for
 other tools.
 
-The new entry point also needs a corresponding function prototype, in
-``include/linux/syscalls.h``, marked as asmlinkage to match the way that system
-calls are invoked::
-
-    asmlinkage long sys_xyzzy(...);
-
 Some architectures (e.g. x86) have their own architecture-specific syscall
 tables, but several other architectures share a generic syscall table. Add your
 new system call to the generic list by adding an entry to the list in
@@ -240,7 +234,6 @@ To summarize, you need a commit that includes:
 
  - ``CONFIG`` option for the new function, normally in ``init/Kconfig``
  - ``SYSCALL_DEFINEn(xyzzy, ...)`` for the entry point
- - corresponding prototype in ``include/linux/syscalls.h``
  - generic table entry in ``include/uapi/asm-generic/unistd.h``
  - fallback stub in ``kernel/sys_ni.c``
 
@@ -302,12 +295,6 @@ needed to deal with them.  (Typically, the ``compat_sys_`` version converts the
 values to 64-bit versions and either calls on to the ``sys_`` version, or both of
 them call a common inner implementation function.)
 
-The compat entry point also needs a corresponding function prototype, in
-``include/linux/compat.h``, marked as asmlinkage to match the way that system
-calls are invoked::
-
-    asmlinkage long compat_sys_xyzzy(...);
-
 If the system call involves a structure that is laid out differently on 32-bit
 and 64-bit systems, say ``struct xyzzy_args``, then the include/linux/compat.h
 header file should also include a compat version of the structure (``struct
@@ -344,7 +331,6 @@ version; the entry in ``include/uapi/asm-generic/unistd.h`` should use
 To summarize, you need:
 
  - a ``COMPAT_SYSCALL_DEFINEn(xyzzy, ...)`` for the compat entry point
- - corresponding prototype in ``include/linux/compat.h``
  - (if needed) 32-bit mapping struct in ``include/linux/compat.h``
  - instance of ``__SC_COMP`` not ``__SYSCALL`` in
    ``include/uapi/asm-generic/unistd.h``
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a78186d826d7..0526286a0314 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -941,4 +941,11 @@ asmlinkage long sys_pkey_free(int pkey);
 asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
 
+
+/*
+ * Kernel code should not call syscalls (i.e., sys_xyzzy()) directly.
+ * Instead, use one of the functions which work equivalently, such as
+ * the ksys_xyzzy() functions prototyped below.
+ */
+
 #endif
-- 
2.16.2

* [PATCH v2 02/36] kernel: use kernel_wait4() instead of sys_wait4()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
  2018-03-15 19:04 ` [PATCH v2 01/36] syscalls: define goal to not call sys_xyzzy() from within the kernel Dominik Brodowski
@ 2018-03-15 19:04 ` Dominik Brodowski
  2018-03-16 16:58   ` Luis R. Rodriguez
  2018-03-15 19:04 ` [PATCH v2 03/36] mm: use do_futex() instead of sys_futex() in mm_release() Dominik Brodowski
                   ` (35 subsequent siblings)
  37 siblings, 1 reply; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:04 UTC
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd, Luis R . Rodriguez

All call sites of sys_wait4() set *rusage to NULL. Therefore, there is
no need for the copy_to_user() handling of *rusage, and we can use
kernel_wait4() directly.
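
A simplified userspace sketch of why this is safe follows. All names are
hypothetical stand-ins, and the copy to userspace is reduced to a plain
struct assignment. The syscall body only adds the rusage copy on top of
kernel_wait4(), so with a NULL rusage pointer both paths behave the same:

```c
#include <assert.h>
#include <stddef.h>

struct mock_rusage { long utime_sec; };

/* Stand-in for kernel_wait4(): fills *ru only if the caller asked for it. */
static long mock_kernel_wait4(int pid, int *stat, int options,
			      struct mock_rusage *ru)
{
	(void)options;
	if (stat)
		*stat = 0;
	if (ru)
		ru->utime_sec = 1;
	return pid;
}

/* Stand-in for the wait4 syscall body: kernel_wait4() plus the rusage copy. */
static long mock_sys_wait4(int pid, int *stat, int options,
			   struct mock_rusage *user_ru)
{
	struct mock_rusage r;
	long err = mock_kernel_wait4(pid, stat, options, user_ru ? &r : NULL);

	if (err > 0 && user_ru)
		*user_ru = r;	/* copy_to_user() in the real code */
	return err;
}
```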

Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 kernel/exit.c          | 2 +-
 kernel/pid_namespace.c | 6 +++---
 kernel/umh.c           | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 995453d9fb55..c3c7ac560114 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1691,7 +1691,7 @@ SYSCALL_DEFINE4(wait4, pid_t, upid, int __user *, stat_addr,
  */
 SYSCALL_DEFINE3(waitpid, pid_t, pid, int __user *, stat_addr, int, options)
 {
-	return sys_wait4(pid, stat_addr, options, NULL);
+	return kernel_wait4(pid, stat_addr, options, NULL);
 }
 
 #endif
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 0b53eef7d34b..93b57f026688 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -242,16 +242,16 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 
 	/*
 	 * Reap the EXIT_ZOMBIE children we had before we ignored SIGCHLD.
-	 * sys_wait4() will also block until our children traced from the
+	 * kernel_wait4() will also block until our children traced from the
 	 * parent namespace are detached and become EXIT_DEAD.
 	 */
 	do {
 		clear_thread_flag(TIF_SIGPENDING);
-		rc = sys_wait4(-1, NULL, __WALL, NULL);
+		rc = kernel_wait4(-1, NULL, __WALL, NULL);
 	} while (rc != -ECHILD);
 
 	/*
-	 * sys_wait4() above can't reap the EXIT_DEAD children but we do not
+	 * kernel_wait4() above can't reap the EXIT_DEAD children but we do not
 	 * really care, we could reparent them to the global init. We could
 	 * exit and reap ->child_reaper even if it is not the last thread in
 	 * this pid_ns, free_pid(pid_allocated == 0) calls proc_cleanup_work(),
diff --git a/kernel/umh.c b/kernel/umh.c
index 18e5fa4b0e71..f4b557cadf08 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -135,7 +135,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
 		 *
 		 * Thus the __user pointer cast is valid here.
 		 */
-		sys_wait4(pid, (int __user *)&ret, 0, NULL);
+		kernel_wait4(pid, (int __user *)&ret, 0, NULL);
 
 		/*
 		 * If ret is 0, either call_usermodehelper_exec_async failed and
-- 
2.16.2

* [PATCH v2 03/36] mm: use do_futex() instead of sys_futex() in mm_release()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
  2018-03-15 19:04 ` [PATCH v2 01/36] syscalls: define goal to not call sys_xyzzy() from within the kernel Dominik Brodowski
  2018-03-15 19:04 ` [PATCH v2 02/36] kernel: use kernel_wait4() instead of sys_wait4() Dominik Brodowski
@ 2018-03-15 19:04 ` Dominik Brodowski
  2018-03-16 11:58   ` Thomas Gleixner
  2018-03-16 18:43   ` Darren Hart
  2018-03-15 19:04 ` [PATCH v2 04/36] kernel: add do_getpgid() helper; remove internal call to sys_getpgid() Dominik Brodowski
                   ` (34 subsequent siblings)
  37 siblings, 2 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:04 UTC
  To: linux-kernel, torvalds, viro
  Cc: luto, mingo, akpm, arnd, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Darren Hart

sys_futex() is a wrapper to do_futex() which does not modify any
values here:

- uaddr, val and val3 are kept the same

- op is masked with FUTEX_CMD_MASK, but is always set to FUTEX_WAKE.
  Therefore, val2 is always 0.

- as utime is set to NULL, *timeout is NULL
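
The three points above can be sketched in userspace as follows. This is a
simplified model, not the kernel code: the MOCK_* values, struct mock_args and
mock_sys_futex_args() are hypothetical, and only one wait-type and two
requeue-type commands are modelled. For a wake command with a NULL utime, the
translation leaves no timeout and val2 == 0:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock command values for illustration only; not the uapi constants. */
enum { MOCK_WAIT, MOCK_WAKE, MOCK_REQUEUE, MOCK_WAKE_OP };
#define MOCK_CMD_MASK 0x7f

struct mock_args {
	int op;
	int has_timeout;
	uint32_t val2;
};

/* Stand-in for the argument translation done before calling do_futex(). */
static struct mock_args mock_sys_futex_args(int op, const void *utime)
{
	struct mock_args a = { .op = op, .has_timeout = 0, .val2 = 0 };
	int cmd = op & MOCK_CMD_MASK;

	/* A timeout is only built for wait-type commands with utime set. */
	if (utime && cmd == MOCK_WAIT)
		a.has_timeout = 1;
	/* utime is reinterpreted as val2 only for requeue/wake-op commands. */
	if (cmd == MOCK_REQUEUE || cmd == MOCK_WAKE_OP)
		a.val2 = (uint32_t)(uintptr_t)utime;
	return a;
}
```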

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 include/linux/futex.h | 13 ++++++++++---
 kernel/fork.c         |  4 ++--
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index c0fb9a24bbd2..821ae502d3d8 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -9,9 +9,6 @@ struct inode;
 struct mm_struct;
 struct task_struct;
 
-long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
-	      u32 __user *uaddr2, u32 val2, u32 val3);
-
 extern int
 handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int pi);
 
@@ -55,6 +52,9 @@ union futex_key {
 
 #ifdef CONFIG_FUTEX
 extern void exit_robust_list(struct task_struct *curr);
+
+long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
+	      u32 __user *uaddr2, u32 val2, u32 val3);
 #ifdef CONFIG_HAVE_FUTEX_CMPXCHG
 #define futex_cmpxchg_enabled 1
 #else
@@ -64,6 +64,13 @@ extern int futex_cmpxchg_enabled;
 static inline void exit_robust_list(struct task_struct *curr)
 {
 }
+
+static inline long do_futex(u32 __user *uaddr, int op, u32 val,
+			    ktime_t *timeout, u32 __user *uaddr2,
+			    u32 val2, u32 val3)
+{
+	return -EINVAL;
+}
 #endif
 
 #ifdef CONFIG_FUTEX_PI
diff --git a/kernel/fork.c b/kernel/fork.c
index e5d9d405ae4e..b1e031aac9db 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1198,8 +1198,8 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 			 * not set up a proper pointer then tough luck.
 			 */
 			put_user(0, tsk->clear_child_tid);
-			sys_futex(tsk->clear_child_tid, FUTEX_WAKE,
-					1, NULL, NULL, 0);
+			do_futex(tsk->clear_child_tid, FUTEX_WAKE,
+					1, NULL, NULL, 0, 0);
 		}
 		tsk->clear_child_tid = NULL;
 	}
-- 
2.16.2

* [PATCH v2 04/36] kernel: add do_getpgid() helper; remove internal call to sys_getpgid()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (2 preceding siblings ...)
  2018-03-15 19:04 ` [PATCH v2 03/36] mm: use do_futex() instead of sys_futex() in mm_release() Dominik Brodowski
@ 2018-03-15 19:04 ` Dominik Brodowski
  2018-03-15 19:04 ` [PATCH v2 05/36] fs: add do_readlinkat() helper; remove internal call to sys_readlinkat() Dominik Brodowski
                   ` (33 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:04 UTC
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the do_getpgid() helper removes an in-kernel call to the
sys_getpgid() syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 kernel/sys.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index f2289de20e19..ebb138b841c8 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1027,7 +1027,7 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
 	return err;
 }
 
-SYSCALL_DEFINE1(getpgid, pid_t, pid)
+static int do_getpgid(pid_t pid)
 {
 	struct task_struct *p;
 	struct pid *grp;
@@ -1055,11 +1055,16 @@ SYSCALL_DEFINE1(getpgid, pid_t, pid)
 	return retval;
 }
 
+SYSCALL_DEFINE1(getpgid, pid_t, pid)
+{
+	return do_getpgid(pid);
+}
+
 #ifdef __ARCH_WANT_SYS_GETPGRP
 
 SYSCALL_DEFINE0(getpgrp)
 {
-	return sys_getpgid(0);
+	return do_getpgid(0);
 }
 
 #endif
-- 
2.16.2

* [PATCH v2 05/36] fs: add do_readlinkat() helper; remove internal call to sys_readlinkat()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (3 preceding siblings ...)
  2018-03-15 19:04 ` [PATCH v2 04/36] kernel: add do_getpgid() helper; remove internal call to sys_getpgid() Dominik Brodowski
@ 2018-03-15 19:04 ` Dominik Brodowski
  2018-03-15 19:04 ` [PATCH v2 06/36] fs: add do_pipe2() helper; remove internal call to sys_pipe2() Dominik Brodowski
                   ` (32 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:04 UTC
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the do_readlinkat() helper removes an in-kernel call to the
sys_readlinkat() syscall.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/stat.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/stat.c b/fs/stat.c
index 873785dae022..f8e6fb2c3657 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -379,8 +379,8 @@ SYSCALL_DEFINE2(newfstat, unsigned int, fd, struct stat __user *, statbuf)
 	return error;
 }
 
-SYSCALL_DEFINE4(readlinkat, int, dfd, const char __user *, pathname,
-		char __user *, buf, int, bufsiz)
+static int do_readlinkat(int dfd, const char __user *pathname,
+			 char __user *buf, int bufsiz)
 {
 	struct path path;
 	int error;
@@ -415,10 +415,16 @@ SYSCALL_DEFINE4(readlinkat, int, dfd, const char __user *, pathname,
 	return error;
 }
 
+SYSCALL_DEFINE4(readlinkat, int, dfd, const char __user *, pathname,
+		char __user *, buf, int, bufsiz)
+{
+	return do_readlinkat(dfd, pathname, buf, bufsiz);
+}
+
 SYSCALL_DEFINE3(readlink, const char __user *, path, char __user *, buf,
 		int, bufsiz)
 {
-	return sys_readlinkat(AT_FDCWD, path, buf, bufsiz);
+	return do_readlinkat(AT_FDCWD, path, buf, bufsiz);
 }
 
 
-- 
2.16.2

* [PATCH v2 06/36] fs: add do_pipe2() helper; remove internal call to sys_pipe2()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (4 preceding siblings ...)
  2018-03-15 19:04 ` [PATCH v2 05/36] fs: add do_readlinkat() helper; remove internal call to sys_readlinkat() Dominik Brodowski
@ 2018-03-15 19:04 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 07/36] fs: add do_renameat2() helper; remove internal call to sys_renameat2() Dominik Brodowski
                   ` (31 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:04 UTC
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this helper removes an in-kernel call to the sys_pipe2() syscall.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/pipe.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 7b1954caf388..39d6f431da83 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -841,7 +841,7 @@ int do_pipe_flags(int *fd, int flags)
  * sys_pipe() is the normal C calling standard for creating
  * a pipe. It's not the way Unix traditionally does this, though.
  */
-SYSCALL_DEFINE2(pipe2, int __user *, fildes, int, flags)
+static int do_pipe2(int __user *fildes, int flags)
 {
 	struct file *files[2];
 	int fd[2];
@@ -863,9 +863,14 @@ SYSCALL_DEFINE2(pipe2, int __user *, fildes, int, flags)
 	return error;
 }
 
+SYSCALL_DEFINE2(pipe2, int __user *, fildes, int, flags)
+{
+	return do_pipe2(fildes, flags);
+}
+
 SYSCALL_DEFINE1(pipe, int __user *, fildes)
 {
-	return sys_pipe2(fildes, 0);
+	return do_pipe2(fildes, 0);
 }
 
 static int wait_for_partner(struct pipe_inode_info *pipe, unsigned int *cnt)
-- 
2.16.2

* [PATCH v2 07/36] fs: add do_renameat2() helper; remove internal call to sys_renameat2()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (5 preceding siblings ...)
  2018-03-15 19:04 ` [PATCH v2 06/36] fs: add do_pipe2() helper; remove internal call to sys_pipe2() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 08/36] fs: add do_futimesat() helper; remove internal call to sys_futimesat() Dominik Brodowski
                   ` (30 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this helper removes in-kernel calls to the sys_renameat2() syscall.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/namei.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 921ae32dbc80..524e829ffc7d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4478,8 +4478,8 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 }
 EXPORT_SYMBOL(vfs_rename);
 
-SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
-		int, newdfd, const char __user *, newname, unsigned int, flags)
+static int do_renameat2(int olddfd, const char __user *oldname, int newdfd,
+			const char __user *newname, unsigned int flags)
 {
 	struct dentry *old_dentry, *new_dentry;
 	struct dentry *trap;
@@ -4621,15 +4621,21 @@ SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
 	return error;
 }
 
+SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
+		int, newdfd, const char __user *, newname, unsigned int, flags)
+{
+	return do_renameat2(olddfd, oldname, newdfd, newname, flags);
+}
+
 SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 		int, newdfd, const char __user *, newname)
 {
-	return sys_renameat2(olddfd, oldname, newdfd, newname, 0);
+	return do_renameat2(olddfd, oldname, newdfd, newname, 0);
 }
 
 SYSCALL_DEFINE2(rename, const char __user *, oldname, const char __user *, newname)
 {
-	return sys_renameat2(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
+	return do_renameat2(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
 }
 
 int vfs_whiteout(struct inode *dir, struct dentry *dentry)
-- 
2.16.2

* [PATCH v2 08/36] fs: add do_futimesat() helper; remove internal call to sys_futimesat()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (6 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 07/36] fs: add do_renameat2() helper; remove internal call to sys_renameat2() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 09/36] syscalls: add do_epoll_*() helpers; remove internal calls to sys_epoll_*() Dominik Brodowski
                   ` (29 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this helper removes the in-kernel call to the sys_futimesat()
syscall.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/utimes.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/utimes.c b/fs/utimes.c
index e4b3d7c2c9f5..5be035ed26c0 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -184,8 +184,8 @@ SYSCALL_DEFINE4(utimensat, int, dfd, const char __user *, filename,
 	return do_utimes(dfd, filename, utimes ? tstimes : NULL, flags);
 }
 
-SYSCALL_DEFINE3(futimesat, int, dfd, const char __user *, filename,
-		struct timeval __user *, utimes)
+static long do_futimesat(int dfd, const char __user *filename,
+			 struct timeval __user *utimes)
 {
 	struct timeval times[2];
 	struct timespec64 tstimes[2];
@@ -212,10 +212,17 @@ SYSCALL_DEFINE3(futimesat, int, dfd, const char __user *, filename,
 	return do_utimes(dfd, filename, utimes ? tstimes : NULL, 0);
 }
 
+
+SYSCALL_DEFINE3(futimesat, int, dfd, const char __user *, filename,
+		struct timeval __user *, utimes)
+{
+	return do_futimesat(dfd, filename, utimes);
+}
+
 SYSCALL_DEFINE2(utimes, char __user *, filename,
 		struct timeval __user *, utimes)
 {
-	return sys_futimesat(AT_FDCWD, filename, utimes);
+	return do_futimesat(AT_FDCWD, filename, utimes);
 }
 
 #ifdef CONFIG_COMPAT
-- 
2.16.2

* [PATCH v2 09/36] syscalls: add do_epoll_*() helpers; remove internal calls to sys_epoll_*()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (7 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 08/36] fs: add do_futimesat() helper; remove internal call to sys_futimesat() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 10/36] fs: add do_signalfd4() helper; remove internal calls to sys_signalfd4() Dominik Brodowski
                   ` (28 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the helper functions do_epoll_create() and do_epoll_wait() allows us
to remove in-kernel calls to the related syscall functions.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/eventpoll.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0f3494ed3ed0..602ca4285b2e 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1936,7 +1936,7 @@ static void clear_tfile_check_list(void)
 /*
  * Open an eventpoll file descriptor.
  */
-SYSCALL_DEFINE1(epoll_create1, int, flags)
+static int do_epoll_create(int flags)
 {
 	int error, fd;
 	struct eventpoll *ep = NULL;
@@ -1979,12 +1979,17 @@ SYSCALL_DEFINE1(epoll_create1, int, flags)
 	return error;
 }
 
+SYSCALL_DEFINE1(epoll_create1, int, flags)
+{
+	return do_epoll_create(flags);
+}
+
 SYSCALL_DEFINE1(epoll_create, int, size)
 {
 	if (size <= 0)
 		return -EINVAL;
 
-	return sys_epoll_create1(0);
+	return do_epoll_create(0);
 }
 
 /*
@@ -2148,8 +2153,8 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
  * Implement the event wait interface for the eventpoll file. It is the kernel
  * part of the user space epoll_wait(2).
  */
-SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
-		int, maxevents, int, timeout)
+static int do_epoll_wait(int epfd, struct epoll_event __user *events,
+			 int maxevents, int timeout)
 {
 	int error;
 	struct fd f;
@@ -2190,6 +2195,12 @@ SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
 	return error;
 }
 
+SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
+		int, maxevents, int, timeout)
+{
+	return do_epoll_wait(epfd, events, maxevents, timeout);
+}
+
 /*
  * Implement the event wait interface for the eventpoll file. It is the kernel
  * part of the user space epoll_pwait(2).
@@ -2214,7 +2225,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
 		set_current_blocked(&ksigmask);
 	}
 
-	error = sys_epoll_wait(epfd, events, maxevents, timeout);
+	error = do_epoll_wait(epfd, events, maxevents, timeout);
 
 	/*
 	 * If we changed the signal mask, we need to restore the original one.
@@ -2257,7 +2268,7 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 		set_current_blocked(&ksigmask);
 	}
 
-	err = sys_epoll_wait(epfd, events, maxevents, timeout);
+	err = do_epoll_wait(epfd, events, maxevents, timeout);
 
 	/*
 	 * If we changed the signal mask, we need to restore the original one.
-- 
2.16.2
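
Illustration only, not part of the patch: a minimal userspace sketch of
the refactoring pattern used above, in which two entry points share one
static helper instead of one entry point calling the other. The names
do_create(), entry_create1() and entry_create() are hypothetical
stand-ins for do_epoll_create() and the two SYSCALL_DEFINE1() entry
points; the flag check and the returned value are invented for the
example and do not reflect what fs/eventpoll.c does.

```c
#include <errno.h>

/* Hypothetical stand-in for do_epoll_create(): the shared implementation. */
static int do_create(int flags)
{
	/* Toy validation: only flag bit 0 is accepted in this model. */
	if (flags & ~1)
		return -EINVAL;
	return 3 + flags;	/* pretend this is a new descriptor */
}

/* Stand-in for the epoll_create1 entry point: a thin wrapper. */
int entry_create1(int flags)
{
	return do_create(flags);
}

/* Stand-in for the legacy epoll_create entry point. */
int entry_create(int size)
{
	if (size <= 0)
		return -EINVAL;

	/* Calls the helper directly, not the other entry point. */
	return do_create(0);
}
```

With this shape, the entry points stay trivial and any other in-kernel
user would call the helper, never an entry point.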

* [PATCH v2 10/36] fs: add do_signalfd4() helper; remove internal calls to sys_signalfd4()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (8 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 09/36] syscalls: add do_epoll_*() helpers; remove internal calls to sys_epoll_*() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 11/36] fs: add do_eventfd() helper; remove internal call to sys_eventfd() Dominik Brodowski
                   ` (27 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this helper removes in-kernel calls to the sys_signalfd4() syscall
function.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/signalfd.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index 76bf9cc62074..501c41f3351f 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -256,8 +256,8 @@ static const struct file_operations signalfd_fops = {
 	.llseek		= noop_llseek,
 };
 
-SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
-		size_t, sizemask, int, flags)
+static int do_signalfd4(int ufd, sigset_t __user *user_mask, size_t sizemask,
+			int flags)
 {
 	sigset_t sigmask;
 	struct signalfd_ctx *ctx;
@@ -310,10 +310,16 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 	return ufd;
 }
 
+SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
+		size_t, sizemask, int, flags)
+{
+	return do_signalfd4(ufd, user_mask, sizemask, flags);
+}
+
 SYSCALL_DEFINE3(signalfd, int, ufd, sigset_t __user *, user_mask,
 		size_t, sizemask)
 {
-	return sys_signalfd4(ufd, user_mask, sizemask, 0);
+	return do_signalfd4(ufd, user_mask, sizemask, 0);
 }
 
 #ifdef CONFIG_COMPAT
@@ -333,7 +339,7 @@ COMPAT_SYSCALL_DEFINE4(signalfd4, int, ufd,
 	if (copy_to_user(ksigmask, &tmp, sizeof(sigset_t)))
 		return -EFAULT;
 
-	return sys_signalfd4(ufd, ksigmask, sizeof(sigset_t), flags);
+	return do_signalfd4(ufd, ksigmask, sizeof(sigset_t), flags);
 }
 
 COMPAT_SYSCALL_DEFINE3(signalfd, int, ufd,
-- 
2.16.2

* [PATCH v2 11/36] fs: add do_eventfd() helper; remove internal call to sys_eventfd()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (9 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 10/36] fs: add do_signalfd4() helper; remove internal calls to sys_signalfd4() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 12/36] kernel: open-code sys_rt_sigpending() in sys_sigpending() Dominik Brodowski
                   ` (26 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this helper removes an in-kernel call to the sys_eventfd2() syscall.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/eventfd.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 012f5bd46dfa..08d3bd602f73 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -380,7 +380,7 @@ struct eventfd_ctx *eventfd_ctx_fileget(struct file *file)
 }
 EXPORT_SYMBOL_GPL(eventfd_ctx_fileget);
 
-SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
+static int do_eventfd(unsigned int count, int flags)
 {
 	struct eventfd_ctx *ctx;
 	int fd;
@@ -409,8 +409,13 @@ SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 	return fd;
 }
 
+SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
+{
+	return do_eventfd(count, flags);
+}
+
 SYSCALL_DEFINE1(eventfd, unsigned int, count)
 {
-	return sys_eventfd2(count, 0);
+	return do_eventfd(count, 0);
 }
 
-- 
2.16.2

* [PATCH v2 12/36] kernel: open-code sys_rt_sigpending() in sys_sigpending()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (10 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 11/36] fs: add do_eventfd() helper; remove internal call to sys_eventfd() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 13/36] x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() Dominik Brodowski
                   ` (25 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

A similar but not fully equivalent code path is already open-coded
three times (in sys_rt_sigpending() and in the two compat stubs), so
open-code it a fourth time here instead of calling sys_rt_sigpending()
from sys_sigpending().

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 include/linux/syscalls.h |  2 +-
 kernel/signal.c          | 15 ++++++++++++---
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 0526286a0314..a63e21e7a3af 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -288,7 +288,7 @@ asmlinkage long sys_capset(cap_user_header_t header,
 				const cap_user_data_t data);
 asmlinkage long sys_personality(unsigned int personality);
 
-asmlinkage long sys_sigpending(old_sigset_t __user *set);
+asmlinkage long sys_sigpending(old_sigset_t __user *uset);
 asmlinkage long sys_sigprocmask(int how, old_sigset_t __user *set,
 				old_sigset_t __user *oset);
 asmlinkage long sys_sigaltstack(const struct sigaltstack __user *uss,
diff --git a/kernel/signal.c b/kernel/signal.c
index c6e4c83dc090..985c61749bcf 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3629,11 +3629,20 @@ int __compat_save_altstack(compat_stack_t __user *uss, unsigned long sp)
 
 /**
  *  sys_sigpending - examine pending signals
- *  @set: where mask of pending signal is returned
+ *  @uset: where mask of pending signal is returned
  */
-SYSCALL_DEFINE1(sigpending, old_sigset_t __user *, set)
+SYSCALL_DEFINE1(sigpending, old_sigset_t __user *, uset)
 {
-	return sys_rt_sigpending((sigset_t __user *)set, sizeof(old_sigset_t)); 
+	sigset_t set;
+	int err;
+
+	if (sizeof(old_sigset_t) > sizeof(*uset))
+		return -EINVAL;
+
+	err = do_sigpending(&set);
+	if (!err && copy_to_user(uset, &set, sizeof(old_sigset_t)))
+		err = -EFAULT;
+	return err;
 }
 
 #ifdef CONFIG_COMPAT
-- 
2.16.2
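
Illustration only, not part of the patch: a minimal userspace sketch of
the open-coded path above, which collects the full pending mask and then
copies out only as many bytes as the legacy mask type holds. The types
old_sigset and full_sigset and the function collect_pending() are
hypothetical stand-ins for old_sigset_t, sigset_t and do_sigpending();
memcpy() stands in for copy_to_user(), and the mask values are made up.

```c
#include <string.h>

/* Hypothetical legacy mask: a single word. */
typedef unsigned long old_sigset;

/* Hypothetical full mask: two words, the low word first. */
typedef struct {
	unsigned long sig[2];
} full_sigset;

/* Stand-in for do_sigpending(): fills in the full pending mask. */
static int collect_pending(full_sigset *set)
{
	set->sig[0] = 0x5UL;	/* made-up pending bits */
	set->sig[1] = 0x80UL;
	return 0;
}

/* Stand-in for the legacy entry point: truncating copy of the mask. */
int legacy_sigpending(old_sigset *uset)
{
	full_sigset set;
	int err;

	err = collect_pending(&set);
	if (!err)
		/* Copy only the first sizeof(old_sigset) bytes. */
		memcpy(uset, &set, sizeof(old_sigset));
	return err;
}
```

In this model the caller sees only the first word of the full mask; the
second word is silently dropped, as with the narrower legacy type.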

* [PATCH v2 13/36] x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (11 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 12/36] kernel: open-code sys_rt_sigpending() in sys_sigpending() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-16  8:43   ` Christoph Hellwig
  2018-03-16 12:00   ` Thomas Gleixner
  2018-03-15 19:05 ` [PATCH v2 14/36] fs: add ksys_mount() helper; remove in-kernel calls to sys_mount() Dominik Brodowski
                   ` (24 subsequent siblings)
  37 siblings, 2 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro
  Cc: luto, mingo, akpm, arnd, Thomas Gleixner, Ingo Molnar, Jiri Slaby, x86

Using this helper allows us to avoid the in-kernel calls to the sys_ioperm()
syscall.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: x86@kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 arch/x86/include/asm/syscalls.h | 1 +
 arch/x86/kernel/ioport.c        | 7 ++++++-
 drivers/tty/vt/vt_ioctl.c       | 6 +++---
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
index bad25bb80679..1c0bebbd039e 100644
--- a/arch/x86/include/asm/syscalls.h
+++ b/arch/x86/include/asm/syscalls.h
@@ -17,6 +17,7 @@
 
 /* Common in X86_32 and X86_64 */
 /* kernel/ioport.c */
+long ksys_ioperm(unsigned long from, unsigned long num, int turn_on);
 asmlinkage long sys_ioperm(unsigned long, unsigned long, int);
 asmlinkage long sys_iopl(unsigned int);
 
diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c
index 38deafebb21b..0fe1c8782208 100644
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -23,7 +23,7 @@
 /*
  * this changes the io permissions bitmap in the current task.
  */
-SYSCALL_DEFINE3(ioperm, unsigned long, from, unsigned long, num, int, turn_on)
+long ksys_ioperm(unsigned long from, unsigned long num, int turn_on)
 {
 	struct thread_struct *t = &current->thread;
 	struct tss_struct *tss;
@@ -96,6 +96,11 @@ SYSCALL_DEFINE3(ioperm, unsigned long, from, unsigned long, num, int, turn_on)
 	return 0;
 }
 
+SYSCALL_DEFINE3(ioperm, unsigned long, from, unsigned long, num, int, turn_on)
+{
+	return ksys_ioperm(from, num, turn_on);
+}
+
 /*
  * sys_iopl has to be used when you want to access the IO ports
  * beyond the 0x3ff range: to get the full 65536 ports bitmapped
diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c
index d61be307256a..a78ad10a119b 100644
--- a/drivers/tty/vt/vt_ioctl.c
+++ b/drivers/tty/vt/vt_ioctl.c
@@ -57,7 +57,7 @@ extern struct tty_driver *console_driver;
  */
 
 #ifdef CONFIG_X86
-#include <linux/syscalls.h>
+#include <asm/syscalls.h>
 #endif
 
 static void complete_change_console(struct vc_data *vc);
@@ -420,12 +420,12 @@ int vt_ioctl(struct tty_struct *tty,
 			ret = -EINVAL;
 			break;
 		}
-		ret = sys_ioperm(arg, 1, (cmd == KDADDIO)) ? -ENXIO : 0;
+		ret = ksys_ioperm(arg, 1, (cmd == KDADDIO)) ? -ENXIO : 0;
 		break;
 
 	case KDENABIO:
 	case KDDISABIO:
-		ret = sys_ioperm(GPFIRST, GPNUM,
+		ret = ksys_ioperm(GPFIRST, GPNUM,
 				  (cmd == KDENABIO)) ? -ENXIO : 0;
 		break;
 #endif
-- 
2.16.2

* [PATCH v2 14/36] fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (12 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 13/36] x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 20:11   ` Arnd Bergmann
  2018-03-15 19:05 ` [PATCH v2 15/36] fs: add ksys_umount() helper; remove in-kernel call to sys_umount() Dominik Brodowski
                   ` (23 subsequent siblings)
  37 siblings, 1 reply; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this helper allows us to avoid the in-kernel calls to the sys_mount()
syscall.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 drivers/base/devtmpfs.c  |  5 +++--
 fs/namespace.c           | 10 ++++++++--
 include/linux/syscalls.h |  3 +++
 init/do_mounts.c         |  4 ++--
 init/do_mounts_initrd.c  |  6 +++---
 5 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 50025d7959cb..4afb04686c8e 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -356,7 +356,8 @@ int devtmpfs_mount(const char *mntdir)
 	if (!thread)
 		return 0;
 
-	err = sys_mount("devtmpfs", (char *)mntdir, "devtmpfs", MS_SILENT, NULL);
+	err = ksys_mount("devtmpfs", (char *)mntdir, "devtmpfs", MS_SILENT,
+			 NULL);
 	if (err)
 		printk(KERN_INFO "devtmpfs: error mounting %i\n", err);
 	else
@@ -382,7 +383,7 @@ static int devtmpfsd(void *p)
 	*err = sys_unshare(CLONE_NEWNS);
 	if (*err)
 		goto out;
-	*err = sys_mount("devtmpfs", "/", "devtmpfs", MS_SILENT, options);
+	*err = ksys_mount("devtmpfs", "/", "devtmpfs", MS_SILENT, options);
 	if (*err)
 		goto out;
 	sys_chdir("/.."); /* will traverse into overmounted root */
diff --git a/fs/namespace.c b/fs/namespace.c
index 9d1374ab6e06..642b8b229944 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3032,8 +3032,8 @@ struct dentry *mount_subtree(struct vfsmount *mnt, const char *name)
 }
 EXPORT_SYMBOL(mount_subtree);
 
-SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
-		char __user *, type, unsigned long, flags, void __user *, data)
+int ksys_mount(char __user *dev_name, char __user *dir_name, char __user *type,
+	       unsigned long flags, void __user *data)
 {
 	int ret;
 	char *kernel_type;
@@ -3066,6 +3066,12 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
 	return ret;
 }
 
+SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
+		char __user *, type, unsigned long, flags, void __user *, data)
+{
+	return ksys_mount(dev_name, dir_name, type, flags, data);
+}
+
 /*
  * Return true if path is reachable from root
  *
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a63e21e7a3af..69899ffa03e9 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -948,4 +948,7 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
  * the ksys_xyzyyz() functions prototyped below.
  */
 
+int ksys_mount(char __user *dev_name, char __user *dir_name, char __user *type,
+	       unsigned long flags, void __user *data);
+
 #endif
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 7cf4f6dafd5f..eb768de43d84 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -363,7 +363,7 @@ static void __init get_fs_names(char *page)
 static int __init do_mount_root(char *name, char *fs, int flags, void *data)
 {
 	struct super_block *s;
-	int err = sys_mount(name, "/root", fs, flags, data);
+	int err = ksys_mount(name, "/root", fs, flags, data);
 	if (err)
 		return err;
 
@@ -599,7 +599,7 @@ void __init prepare_namespace(void)
 	mount_root();
 out:
 	devtmpfs_mount("dev");
-	sys_mount(".", "/", NULL, MS_MOVE, NULL);
+	ksys_mount(".", "/", NULL, MS_MOVE, NULL);
 	sys_chroot(".");
 }
 
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 53d4f0f326e7..7868a6039fb4 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -43,7 +43,7 @@ static int init_linuxrc(struct subprocess_info *info, struct cred *new)
 	sys_dup(0);
 	/* move initrd over / and chdir/chroot in initrd root */
 	sys_chdir("/root");
-	sys_mount(".", "/", NULL, MS_MOVE, NULL);
+	ksys_mount(".", "/", NULL, MS_MOVE, NULL);
 	sys_chroot(".");
 	sys_setsid();
 	return 0;
@@ -81,7 +81,7 @@ static void __init handle_initrd(void)
 	current->flags &= ~PF_FREEZER_SKIP;
 
 	/* move initrd to rootfs' /old */
-	sys_mount("..", ".", NULL, MS_MOVE, NULL);
+	ksys_mount("..", ".", NULL, MS_MOVE, NULL);
 	/* switch root and cwd back to / of rootfs */
 	sys_chroot("..");
 
@@ -95,7 +95,7 @@ static void __init handle_initrd(void)
 	mount_root();
 
 	printk(KERN_NOTICE "Trying to move old root to /initrd ... ");
-	error = sys_mount("/old", "/root/initrd", NULL, MS_MOVE, NULL);
+	error = ksys_mount("/old", "/root/initrd", NULL, MS_MOVE, NULL);
 	if (!error)
 		printk("okay\n");
 	else {
-- 
2.16.2

* [PATCH v2 15/36] fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (13 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 14/36] fs: add ksys_mount() helper; remove in-kernel calls to sys_mount() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-16  8:47   ` Christoph Hellwig
  2018-03-15 19:05 ` [PATCH v2 16/36] fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}() Dominik Brodowski
                   ` (22 subsequent siblings)
  37 siblings, 1 reply; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this helper allows us to avoid the in-kernel call to the sys_umount()
syscall.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/namespace.c           | 9 +++++++--
 include/linux/syscalls.h | 1 +
 init/do_mounts_initrd.c  | 2 +-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 642b8b229944..e398f32d7541 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1680,7 +1680,7 @@ static inline bool may_mandlock(void)
  * unixes. Our API is identical to OSF/1 to avoid making a mess of AMD
  */
 
-SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
+int ksys_umount(char __user *name, int flags)
 {
 	struct path path;
 	struct mount *mnt;
@@ -1720,6 +1720,11 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
 	return retval;
 }
 
+SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
+{
+	return ksys_umount(name, flags);
+}
+
 #ifdef __ARCH_WANT_SYS_OLDUMOUNT
 
 /*
@@ -1727,7 +1732,7 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
  */
 SYSCALL_DEFINE1(oldumount, char __user *, name)
 {
-	return sys_umount(name, 0);
+	return ksys_umount(name, 0);
 }
 
 #endif
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 69899ffa03e9..929dfc6c2906 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -950,5 +950,6 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 
 int ksys_mount(char __user *dev_name, char __user *dir_name, char __user *type,
 	       unsigned long flags, void __user *data);
+int ksys_umount(char __user *name, int flags);
 
 #endif
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 7868a6039fb4..1c4da8353332 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -105,7 +105,7 @@ static void __init handle_initrd(void)
 		else
 			printk("failed\n");
 		printk(KERN_NOTICE "Unmounting old root\n");
-		sys_umount("/old", MNT_DETACH);
+		ksys_umount("/old", MNT_DETACH);
 		printk(KERN_NOTICE "Trying to free ramdisk memory ... ");
 		if (fd < 0) {
 			error = fd;
-- 
2.16.2

* [PATCH v2 16/36] fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (14 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 15/36] fs: add ksys_umount() helper; remove in-kernel call to sys_umount() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-16  8:48   ` Christoph Hellwig
  2018-03-15 19:05 ` [PATCH v2 17/36] fs: add ksys_chroot() helper; remove in-kernel calls to sys_chroot() Dominik Brodowski
                   ` (21 subsequent siblings)
  37 siblings, 1 reply; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using ksys_dup() and ksys_dup3() as helper functions allows us to
avoid the in-kernel calls to the sys_dup() and sys_dup3() syscalls.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/file.c                | 16 +++++++++++++---
 include/linux/syscalls.h |  1 +
 init/do_mounts_initrd.c  |  4 ++--
 init/main.c              |  4 ++--
 4 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 42f0db4bd0fb..d304004f0b65 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -870,7 +870,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
 	return err;
 }
 
-SYSCALL_DEFINE3(dup3, unsigned int, oldfd, unsigned int, newfd, int, flags)
+static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
 {
 	int err = -EBADF;
 	struct file *file;
@@ -904,6 +904,11 @@ SYSCALL_DEFINE3(dup3, unsigned int, oldfd, unsigned int, newfd, int, flags)
 	return err;
 }
 
+SYSCALL_DEFINE3(dup3, unsigned int, oldfd, unsigned int, newfd, int, flags)
+{
+	return ksys_dup3(oldfd, newfd, flags);
+}
+
 SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd)
 {
 	if (unlikely(newfd == oldfd)) { /* corner case */
@@ -916,10 +921,10 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd)
 		rcu_read_unlock();
 		return retval;
 	}
-	return sys_dup3(oldfd, newfd, 0);
+	return ksys_dup3(oldfd, newfd, 0);
 }
 
-SYSCALL_DEFINE1(dup, unsigned int, fildes)
+int ksys_dup(unsigned int fildes)
 {
 	int ret = -EBADF;
 	struct file *file = fget_raw(fildes);
@@ -934,6 +939,11 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes)
 	return ret;
 }
 
+SYSCALL_DEFINE1(dup, unsigned int, fildes)
+{
+	return ksys_dup(fildes);
+}
+
 int f_dupfd(unsigned int from, struct file *file, unsigned flags)
 {
 	int err;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 929dfc6c2906..73f1889e73a5 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -951,5 +951,6 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 int ksys_mount(char __user *dev_name, char __user *dir_name, char __user *type,
 	       unsigned long flags, void __user *data);
 int ksys_umount(char __user *name, int flags);
+int ksys_dup(unsigned int fildes);
 
 #endif
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 1c4da8353332..e8573e1776f6 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -39,8 +39,8 @@ static int init_linuxrc(struct subprocess_info *info, struct cred *new)
 	sys_unshare(CLONE_FS | CLONE_FILES);
 	/* stdin/stdout/stderr for /linuxrc */
 	sys_open("/dev/console", O_RDWR, 0);
-	sys_dup(0);
-	sys_dup(0);
+	ksys_dup(0);
+	ksys_dup(0);
 	/* move initrd over / and chdir/chroot in initrd root */
 	sys_chdir("/root");
 	ksys_mount(".", "/", NULL, MS_MOVE, NULL);
diff --git a/init/main.c b/init/main.c
index 969eaf140ef0..b8649d1466e1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1077,8 +1077,8 @@ static noinline void __init kernel_init_freeable(void)
 	if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
 		pr_err("Warning: unable to open an initial console.\n");
 
-	(void) sys_dup(0);
-	(void) sys_dup(0);
+	(void) ksys_dup(0);
+	(void) ksys_dup(0);
 	/*
 	 * check if there is an early userspace init.  If yes, let it do all
 	 * the work
-- 
2.16.2
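
Illustration only, not part of the patch: a minimal userspace sketch of
the open-then-dup-twice idiom that the init code above uses to set up
stdin, stdout and stderr from one console descriptor. The functions
open_three() and same_file() are hypothetical; unlike the init code,
which relies on descriptors 0, 1 and 2 being free, this sketch returns
whatever descriptor numbers the process happens to get.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if both descriptors refer to the same file, -1 otherwise. */
int same_file(int a, int b)
{
	struct stat sa, sb;

	if (fstat(a, &sa) < 0 || fstat(b, &sb) < 0)
		return -1;
	return (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino) ? 0 : -1;
}

/* Opens path once and duplicates the descriptor twice. */
int open_three(const char *path, int fds[3])
{
	fds[0] = open(path, O_RDWR);
	if (fds[0] < 0)
		return -1;

	fds[1] = dup(fds[0]);
	fds[2] = dup(fds[0]);
	if (fds[1] < 0 || fds[2] < 0) {
		/* Undo whatever succeeded before reporting failure. */
		if (fds[1] >= 0)
			close(fds[1]);
		if (fds[2] >= 0)
			close(fds[2]);
		close(fds[0]);
		return -1;
	}
	return 0;
}
```

All three descriptors share one open file description, so they also
share the file offset and status flags.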

* [PATCH v2 17/36] fs: add ksys_chroot() helper; remove in-kernel calls to sys_chroot()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (15 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 16/36] fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 20:44   ` Arnd Bergmann
  2018-03-15 19:05 ` [PATCH v2 18/36] fs: add ksys_write() helper; remove in-kernel calls to sys_write() Dominik Brodowski
                   ` (20 subsequent siblings)
  37 siblings, 1 reply; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this helper allows us to avoid the in-kernel calls to the sys_chroot()
syscall.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 drivers/base/devtmpfs.c  | 2 +-
 fs/open.c                | 7 ++++++-
 include/linux/syscalls.h | 1 +
 init/do_mounts.c         | 2 +-
 init/do_mounts_initrd.c  | 4 ++--
 5 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 4afb04686c8e..5743f04014ca 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -387,7 +387,7 @@ static int devtmpfsd(void *p)
 	if (*err)
 		goto out;
 	sys_chdir("/.."); /* will traverse into overmounted root */
-	sys_chroot(".");
+	ksys_chroot(".");
 	complete(&setup_done);
 	while (1) {
 		spin_lock(&req_lock);
diff --git a/fs/open.c b/fs/open.c
index 7ea118471dce..7a475e8a2e41 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -479,7 +479,7 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd)
 	return error;
 }
 
-SYSCALL_DEFINE1(chroot, const char __user *, filename)
+int ksys_chroot(const char __user *filename)
 {
 	struct path path;
 	int error;
@@ -512,6 +512,11 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
 	return error;
 }
 
+SYSCALL_DEFINE1(chroot, const char __user *, filename)
+{
+	return ksys_chroot(filename);
+}
+
 static int chmod_common(const struct path *path, umode_t mode)
 {
 	struct inode *inode = path->dentry->d_inode;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 73f1889e73a5..13c7bc43b6ef 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -952,5 +952,6 @@ int ksys_mount(char __user *dev_name, char __user *dir_name, char __user *type,
 	       unsigned long flags, void __user *data);
 int ksys_umount(char __user *name, int flags);
 int ksys_dup(unsigned int fildes);
+int ksys_chroot(const char __user *filename);
 
 #endif
diff --git a/init/do_mounts.c b/init/do_mounts.c
index eb768de43d84..2f06f7827b0c 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -600,7 +600,7 @@ void __init prepare_namespace(void)
 out:
 	devtmpfs_mount("dev");
 	ksys_mount(".", "/", NULL, MS_MOVE, NULL);
-	sys_chroot(".");
+	ksys_chroot(".");
 }
 
 static bool is_tmpfs;
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index e8573e1776f6..71293265ac4b 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -44,7 +44,7 @@ static int init_linuxrc(struct subprocess_info *info, struct cred *new)
 	/* move initrd over / and chdir/chroot in initrd root */
 	sys_chdir("/root");
 	ksys_mount(".", "/", NULL, MS_MOVE, NULL);
-	sys_chroot(".");
+	ksys_chroot(".");
 	sys_setsid();
 	return 0;
 }
@@ -83,7 +83,7 @@ static void __init handle_initrd(void)
 	/* move initrd to rootfs' /old */
 	ksys_mount("..", ".", NULL, MS_MOVE, NULL);
 	/* switch root and cwd back to / of rootfs */
-	sys_chroot("..");
+	ksys_chroot("..");
 
 	if (new_decode_dev(real_root_dev) == Root_RAM0) {
 		sys_chdir("/old");
-- 
2.16.2

* [PATCH v2 18/36] fs: add ksys_write() helper; remove in-kernel calls to sys_write()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (16 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 17/36] fs: add ksys_chroot() helper; remove in-kernel calls to sys_chroot() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-16  8:52   ` Christoph Hellwig
  2018-03-15 19:05 ` [PATCH v2 19/36] kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare() Dominik Brodowski
                   ` (19 subsequent siblings)
  37 siblings, 1 reply; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd, linux-s390

Using this helper allows us to avoid the in-kernel calls to the sys_write()
syscall.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 arch/s390/kernel/compat_linux.c | 2 +-
 fs/read_write.c                 | 9 +++++++--
 include/linux/syscalls.h        | 1 +
 init/do_mounts_rd.c             | 4 ++--
 init/initramfs.c                | 2 +-
 5 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index 79b7a3438d54..5a9cfde5fc28 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -468,7 +468,7 @@ COMPAT_SYSCALL_DEFINE3(s390_write, unsigned int, fd, const char __user *, buf, c
 	if ((compat_ssize_t) count < 0)
 		return -EINVAL; 
 
-	return sys_write(fd, buf, count);
+	return ksys_write(fd, buf, count);
 }
 
 /*
diff --git a/fs/read_write.c b/fs/read_write.c
index f8547b82dfb3..8e8f0b4f52e2 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -578,8 +578,7 @@ SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
 	return ret;
 }
 
-SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
-		size_t, count)
+ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
@@ -595,6 +594,12 @@ SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
 	return ret;
 }
 
+SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
+		size_t, count)
+{
+	return ksys_write(fd, buf, count);
+}
+
 SYSCALL_DEFINE4(pread64, unsigned int, fd, char __user *, buf,
 			size_t, count, loff_t, pos)
 {
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 13c7bc43b6ef..c6aa44b6a0a2 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -953,5 +953,6 @@ int ksys_mount(char __user *dev_name, char __user *dir_name, char __user *type,
 int ksys_umount(char __user *name, int flags);
 int ksys_dup(unsigned int fildes);
 int ksys_chroot(const char __user *filename);
+ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count);
 
 #endif
diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
index 99e0b649fc0e..2d365c398ccc 100644
--- a/init/do_mounts_rd.c
+++ b/init/do_mounts_rd.c
@@ -270,7 +270,7 @@ int __init rd_load_image(char *from)
 			printk("Loading disk #%d... ", disk);
 		}
 		sys_read(in_fd, buf, BLOCK_SIZE);
-		sys_write(out_fd, buf, BLOCK_SIZE);
+		ksys_write(out_fd, buf, BLOCK_SIZE);
 #if !defined(CONFIG_S390)
 		if (!(i % 16)) {
 			pr_cont("%c\b", rotator[rotate & 0x3]);
@@ -317,7 +317,7 @@ static long __init compr_fill(void *buf, unsigned long len)
 
 static long __init compr_flush(void *window, unsigned long outcnt)
 {
-	long written = sys_write(crd_outfd, window, outcnt);
+	long written = ksys_write(crd_outfd, window, outcnt);
 	if (written != outcnt) {
 		if (decompress_error == 0)
 			printk(KERN_ERR
diff --git a/init/initramfs.c b/init/initramfs.c
index 7e99a0038942..6f972df15bf2 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -27,7 +27,7 @@ static ssize_t __init xwrite(int fd, const char *p, size_t count)
 
 	/* sys_write only can write MAX_RW_COUNT aka 2G-4K bytes at most */
 	while (count) {
-		ssize_t rv = sys_write(fd, p, count);
+		ssize_t rv = ksys_write(fd, p, count);
 
 		if (rv < 0) {
 			if (rv == -EINTR || rv == -EAGAIN)
-- 
2.16.2


* [PATCH v2 19/36] kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (17 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 18/36] fs: add ksys_write() helper; remove in-kernel calls to sys_write() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 20/36] mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() Dominik Brodowski
                   ` (18 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the ksys_unshare() helper allows us to avoid the in-kernel calls to
the sys_unshare() syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 drivers/base/devtmpfs.c  | 2 +-
 include/linux/syscalls.h | 1 +
 init/do_mounts_initrd.c  | 2 +-
 kernel/fork.c            | 7 ++++++-
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 5743f04014ca..d6f37537275c 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -380,7 +380,7 @@ static int devtmpfsd(void *p)
 {
 	char options[] = "mode=0755";
 	int *err = p;
-	*err = sys_unshare(CLONE_NEWNS);
+	*err = ksys_unshare(CLONE_NEWNS);
 	if (*err)
 		goto out;
 	*err = ksys_mount("devtmpfs", "/", "devtmpfs", MS_SILENT, options);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index c6aa44b6a0a2..2c0fabc7d19d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -954,5 +954,6 @@ int ksys_umount(char __user *name, int flags);
 int ksys_dup(unsigned int fildes);
 int ksys_chroot(const char __user *filename);
 ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count);
+int ksys_unshare(unsigned long unshare_flags);
 
 #endif
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 71293265ac4b..414284dadc64 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -36,7 +36,7 @@ __setup("noinitrd", no_initrd);
 
 static int init_linuxrc(struct subprocess_info *info, struct cred *new)
 {
-	sys_unshare(CLONE_FS | CLONE_FILES);
+	ksys_unshare(CLONE_FS | CLONE_FILES);
 	/* stdin/stdout/stderr for /linuxrc */
 	sys_open("/dev/console", O_RDWR, 0);
 	ksys_dup(0);
diff --git a/kernel/fork.c b/kernel/fork.c
index b1e031aac9db..f71b67dc156d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2354,7 +2354,7 @@ static int unshare_fd(unsigned long unshare_flags, struct files_struct **new_fdp
  * constructed. Here we are modifying the current, active,
  * task_struct.
  */
-SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
+int ksys_unshare(unsigned long unshare_flags)
 {
 	struct fs_struct *fs, *new_fs = NULL;
 	struct files_struct *fd, *new_fd = NULL;
@@ -2470,6 +2470,11 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
 	return err;
 }
 
+SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
+{
+	return ksys_unshare(unshare_flags);
+}
+
 /*
  *	Helper to unshare the files of the current task.
  *	We don't want to expose copy_files internals to
-- 
2.16.2
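
The shape of the change above (move the body into a ksys_*() helper and
leave the syscall definition as a thin forwarder) can be sketched in
plain userspace C. This is only an illustration under stated
assumptions: the demo_* names and the DEMO_FLAG_* values are
hypothetical and are not part of the kernel or of this patch.

```c
#include <errno.h>

/* Hypothetical flag values for this sketch; not the kernel's CLONE_* bits. */
#define DEMO_FLAG_A 0x1UL
#define DEMO_FLAG_B 0x2UL

/*
 * Helper holding the real logic. Internal callers use this directly,
 * in the way devtmpfsd() calls ksys_unshare() after this patch.
 */
int demo_ksys_unshare(unsigned long flags)
{
	if (flags & ~(DEMO_FLAG_A | DEMO_FLAG_B))
		return -EINVAL;	/* reject flags this sketch does not know */
	return 0;
}

/*
 * Stand-in for the SYSCALL_DEFINE1(unshare, ...) entry point:
 * it only forwards its argument to the helper.
 */
long demo_sys_unshare(unsigned long flags)
{
	return demo_ksys_unshare(flags);
}
```

Both entry points return the same result for the same flags, so
internal callers can switch to the helper without a change in
behaviour.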


* [PATCH v2 20/36] mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (18 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 19/36] kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 21/36] mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() Dominik Brodowski
                   ` (17 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd, linux-mm

Using the ksys_fadvise64_64() helper allows us to avoid the in-kernel
calls to the sys_fadvise64_64() syscall.

Some compat stubs called sys_fadvise64(), which then just passed
through the arguments to sys_fadvise64_64(). Get rid of this
indirection, and call ksys_fadvise64_64() directly.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 arch/arm/kernel/sys_arm.c       |  2 +-
 arch/mips/kernel/linux32.c      |  2 +-
 arch/parisc/kernel/sys_parisc.c |  2 +-
 arch/powerpc/kernel/sys_ppc32.c |  4 ++--
 arch/powerpc/kernel/syscalls.c  |  4 ++--
 arch/s390/kernel/compat_linux.c |  5 +++--
 arch/sh/kernel/sys_sh32.c       |  8 ++++----
 arch/sparc/kernel/sys_sparc32.c | 10 +++++-----
 arch/x86/ia32/sys_ia32.c        | 12 ++++++------
 arch/xtensa/kernel/syscall.c    |  2 +-
 include/linux/syscalls.h        |  9 +++++++++
 mm/fadvise.c                    | 10 ++++++++--
 12 files changed, 43 insertions(+), 27 deletions(-)

diff --git a/arch/arm/kernel/sys_arm.c b/arch/arm/kernel/sys_arm.c
index 3151f5623d0e..bdf7514204ab 100644
--- a/arch/arm/kernel/sys_arm.c
+++ b/arch/arm/kernel/sys_arm.c
@@ -35,5 +35,5 @@
 asmlinkage long sys_arm_fadvise64_64(int fd, int advice,
 				     loff_t offset, loff_t len)
 {
-	return sys_fadvise64_64(fd, offset, len, advice);
+	return ksys_fadvise64_64(fd, offset, len, advice);
 }
diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index b332f6fc1e72..b8a3cf5d5950 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -149,7 +149,7 @@ asmlinkage long sys32_fadvise64_64(int fd, int __pad,
 	unsigned long a4, unsigned long a5,
 	int flags)
 {
-	return sys_fadvise64_64(fd,
+	return ksys_fadvise64_64(fd,
 			merge_64(a2, a3), merge_64(a4, a5),
 			flags);
 }
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index 378a754ca186..da1c27ea8e1a 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -352,7 +352,7 @@ asmlinkage long parisc_fadvise64_64(int fd,
 			unsigned int high_off, unsigned int low_off,
 			unsigned int high_len, unsigned int low_len, int advice)
 {
-	return sys_fadvise64_64(fd, (loff_t)high_off << 32 | low_off,
+	return ksys_fadvise64_64(fd, (loff_t)high_off << 32 | low_off,
 			(loff_t)high_len << 32 | low_len, advice);
 }
 
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index 15f216d022e2..93df264ab76c 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -113,8 +113,8 @@ asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long h
 long ppc32_fadvise64(int fd, u32 unused, u32 offset_high, u32 offset_low,
 		     size_t len, int advice)
 {
-	return sys_fadvise64(fd, (u64)offset_high << 32 | offset_low, len,
-			     advice);
+	return ksys_fadvise64_64(fd, (u64)offset_high << 32 | offset_low, len,
+				 advice);
 }
 
 asmlinkage long compat_sys_sync_file_range2(int fd, unsigned int flags,
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index a877bf8269fe..ecb981eea74b 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -119,8 +119,8 @@ long ppc64_personality(unsigned long personality)
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
 		      u32 len_high, u32 len_low)
 {
-	return sys_fadvise64(fd, (u64)offset_high << 32 | offset_low,
-			     (u64)len_high << 32 | len_low, advice);
+	return ksys_fadvise64_64(fd, (u64)offset_high << 32 | offset_low,
+				 (u64)len_high << 32 | len_low, advice);
 }
 
 long sys_switch_endian(void)
diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index 5a9cfde5fc28..357a66934a98 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -483,7 +483,8 @@ COMPAT_SYSCALL_DEFINE5(s390_fadvise64, int, fd, u32, high, u32, low, compat_size
 		advise = POSIX_FADV_DONTNEED;
 	else if (advise == 5)
 		advise = POSIX_FADV_NOREUSE;
-	return sys_fadvise64(fd, (unsigned long)high << 32 | low, len, advise);
+	return ksys_fadvise64_64(fd, (unsigned long)high << 32 | low, len,
+				 advise);
 }
 
 struct fadvise64_64_args {
@@ -503,7 +504,7 @@ COMPAT_SYSCALL_DEFINE1(s390_fadvise64_64, struct fadvise64_64_args __user *, arg
 		a.advice = POSIX_FADV_DONTNEED;
 	else if (a.advice == 5)
 		a.advice = POSIX_FADV_NOREUSE;
-	return sys_fadvise64_64(a.fd, a.offset, a.len, a.advice);
+	return ksys_fadvise64_64(a.fd, a.offset, a.len, a.advice);
 }
 
 COMPAT_SYSCALL_DEFINE6(s390_sync_file_range, int, fd, u32, offhigh, u32, offlow,
diff --git a/arch/sh/kernel/sys_sh32.c b/arch/sh/kernel/sys_sh32.c
index f8dc8bfd4606..4d55318e0899 100644
--- a/arch/sh/kernel/sys_sh32.c
+++ b/arch/sh/kernel/sys_sh32.c
@@ -52,10 +52,10 @@ asmlinkage int sys_fadvise64_64_wrapper(int fd, u32 offset0, u32 offset1,
 				u32 len0, u32 len1, int advice)
 {
 #ifdef  __LITTLE_ENDIAN__
-	return sys_fadvise64_64(fd, (u64)offset1 << 32 | offset0,
-				(u64)len1 << 32 | len0,	advice);
+	return ksys_fadvise64_64(fd, (u64)offset1 << 32 | offset0,
+				 (u64)len1 << 32 | len0, advice);
 #else
-	return sys_fadvise64_64(fd, (u64)offset0 << 32 | offset1,
-				(u64)len0 << 32 | len1,	advice);
+	return ksys_fadvise64_64(fd, (u64)offset0 << 32 | offset1,
+				 (u64)len0 << 32 | len1, advice);
 #endif
 }
diff --git a/arch/sparc/kernel/sys_sparc32.c b/arch/sparc/kernel/sys_sparc32.c
index 6d964bdefbaa..08261bc15d30 100644
--- a/arch/sparc/kernel/sys_sparc32.c
+++ b/arch/sparc/kernel/sys_sparc32.c
@@ -225,7 +225,7 @@ long compat_sys_fadvise64(int fd,
 			  unsigned long offlo,
 			  compat_size_t len, int advice)
 {
-	return sys_fadvise64_64(fd, (offhi << 32) | offlo, len, advice);
+	return ksys_fadvise64_64(fd, (offhi << 32) | offlo, len, advice);
 }
 
 long compat_sys_fadvise64_64(int fd,
@@ -233,10 +233,10 @@ long compat_sys_fadvise64_64(int fd,
 			     unsigned long lenhi, unsigned long lenlo,
 			     int advice)
 {
-	return sys_fadvise64_64(fd,
-				(offhi << 32) | offlo,
-				(lenhi << 32) | lenlo,
-				advice);
+	return ksys_fadvise64_64(fd,
+				 (offhi << 32) | offlo,
+				 (lenhi << 32) | lenlo,
+				 advice);
 }
 
 long sys32_sync_file_range(unsigned int fd, unsigned long off_high, unsigned long off_low, unsigned long nb_high, unsigned long nb_low, unsigned int flags)
diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
index 6512498bbef6..2afd718e7422 100644
--- a/arch/x86/ia32/sys_ia32.c
+++ b/arch/x86/ia32/sys_ia32.c
@@ -198,10 +198,10 @@ COMPAT_SYSCALL_DEFINE6(x86_fadvise64_64, int, fd, __u32, offset_low,
 		       __u32, offset_high, __u32, len_low, __u32, len_high,
 		       int, advice)
 {
-	return sys_fadvise64_64(fd,
-			       (((u64)offset_high)<<32) | offset_low,
-			       (((u64)len_high)<<32) | len_low,
-				advice);
+	return ksys_fadvise64_64(fd,
+				 (((u64)offset_high)<<32) | offset_low,
+				 (((u64)len_high)<<32) | len_low,
+				 advice);
 }
 
 COMPAT_SYSCALL_DEFINE4(x86_readahead, int, fd, unsigned int, off_lo,
@@ -222,8 +222,8 @@ COMPAT_SYSCALL_DEFINE6(x86_sync_file_range, int, fd, unsigned int, off_low,
 COMPAT_SYSCALL_DEFINE5(x86_fadvise64, int, fd, unsigned int, offset_lo,
 		       unsigned int, offset_hi, size_t, len, int, advice)
 {
-	return sys_fadvise64_64(fd, ((u64)offset_hi << 32) | offset_lo,
-				len, advice);
+	return ksys_fadvise64_64(fd, ((u64)offset_hi << 32) | offset_lo,
+				 len, advice);
 }
 
 COMPAT_SYSCALL_DEFINE6(x86_fallocate, int, fd, int, mode,
diff --git a/arch/xtensa/kernel/syscall.c b/arch/xtensa/kernel/syscall.c
index 74afbf02d07e..8201748da05b 100644
--- a/arch/xtensa/kernel/syscall.c
+++ b/arch/xtensa/kernel/syscall.c
@@ -55,7 +55,7 @@ asmlinkage long xtensa_shmat(int shmid, char __user *shmaddr, int shmflg)
 asmlinkage long xtensa_fadvise64_64(int fd, int advice,
 		unsigned long long offset, unsigned long long len)
 {
-	return sys_fadvise64_64(fd, offset, len, advice);
+	return ksys_fadvise64_64(fd, offset, len, advice);
 }
 
 #ifdef CONFIG_MMU
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 2c0fabc7d19d..863ca7d6face 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -955,5 +955,14 @@ int ksys_dup(unsigned int fildes);
 int ksys_chroot(const char __user *filename);
 ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count);
 int ksys_unshare(unsigned long unshare_flags);
+#ifdef CONFIG_ADVISE_SYSCALLS
+int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice);
+#else
+static inline int ksys_fadvise64_64(int fd, loff_t offset, loff_t len,
+				    int advice)
+{
+	return -EINVAL;
+}
+#endif
 
 #endif
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 767887f5f3bf..afa41491d324 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -26,7 +26,8 @@
  * POSIX_FADV_WILLNEED could set PG_Referenced, and POSIX_FADV_NOREUSE could
  * deactivate the pages and clear PG_Referenced.
  */
-SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
+
+int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
 {
 	struct fd f = fdget(fd);
 	struct inode *inode;
@@ -185,11 +186,16 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
 	return ret;
 }
 
+SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
+{
+	return ksys_fadvise64_64(fd, offset, len, advice);
+}
+
 #ifdef __ARCH_WANT_SYS_FADVISE64
 
 SYSCALL_DEFINE4(fadvise64, int, fd, loff_t, offset, size_t, len, int, advice)
 {
-	return sys_fadvise64_64(fd, offset, len, advice);
+	return ksys_fadvise64_64(fd, offset, len, advice);
 }
 
 #endif
-- 
2.16.2
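
Several of the compat stubs touched above rebuild a 64-bit offset or
length from two 32-bit halves before calling ksys_fadvise64_64(). A
minimal userspace sketch of that step follows; demo_merge_64() is a
hypothetical name chosen for illustration and uses <stdint.h> types
rather than the kernel's u32/u64.

```c
#include <stdint.h>

/*
 * Combine the high and low 32-bit halves into one 64-bit value.
 * The cast has to happen before the shift: shifting a 32-bit value
 * left by 32 would not produce the intended upper half.
 */
uint64_t demo_merge_64(uint32_t high, uint32_t low)
{
	return ((uint64_t)high << 32) | low;
}
```

Which argument carries the high half depends on the architecture's
calling convention, which is why the sh stub in this patch swaps the
two under __LITTLE_ENDIAN__.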


* [PATCH v2 21/36] mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (19 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 20/36] mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 20:54   ` Arnd Bergmann
  2018-03-15 19:05 ` [PATCH v2 22/36] fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir() Dominik Brodowski
                   ` (16 subsequent siblings)
  37 siblings, 1 reply; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd, linux-mm

Using the ksys_mmap_pgoff() helper allows us to avoid the in-kernel calls
to the sys_mmap_pgoff() syscall.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 arch/alpha/kernel/osf_sys.c             |  2 +-
 arch/arm64/kernel/sys.c                 |  2 +-
 arch/ia64/kernel/sys_ia64.c             |  4 ++--
 arch/m68k/kernel/sys_m68k.c             |  2 +-
 arch/microblaze/kernel/sys_microblaze.c |  6 +++---
 arch/mips/kernel/linux32.c              |  4 ++--
 arch/mips/kernel/syscall.c              |  6 ++++--
 arch/parisc/kernel/sys_parisc.c         |  6 +++---
 arch/powerpc/kernel/syscalls.c          |  2 +-
 arch/riscv/kernel/sys_riscv.c           |  4 ++--
 arch/s390/kernel/compat_linux.c         |  6 +++---
 arch/s390/kernel/sys_s390.c             |  2 +-
 arch/sh/kernel/sys_sh.c                 |  4 ++--
 arch/sparc/kernel/sys_sparc_32.c        |  6 +++---
 arch/sparc/kernel/sys_sparc_64.c        |  2 +-
 arch/um/kernel/syscall.c                |  2 +-
 arch/x86/ia32/sys_ia32.c                |  2 +-
 arch/x86/kernel/sys_x86_64.c            |  2 +-
 include/linux/syscalls.h                |  3 +++
 mm/mmap.c                               | 17 ++++++++++++-----
 mm/nommu.c                              | 17 ++++++++++++-----
 21 files changed, 60 insertions(+), 41 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index fa1a392ca9a2..89faa6f4de47 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -189,7 +189,7 @@ SYSCALL_DEFINE6(osf_mmap, unsigned long, addr, unsigned long, len,
 		goto out;
 	if (off & ~PAGE_MASK)
 		goto out;
-	ret = sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+	ret = ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
  out:
 	return ret;
 }
diff --git a/arch/arm64/kernel/sys.c b/arch/arm64/kernel/sys.c
index 26fe8ea93ea2..72981bae10eb 100644
--- a/arch/arm64/kernel/sys.c
+++ b/arch/arm64/kernel/sys.c
@@ -34,7 +34,7 @@ asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
 	if (offset_in_page(off) != 0)
 		return -EINVAL;
 
-	return sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
 }
 
 SYSCALL_DEFINE1(arm64_personality, unsigned int, personality)
diff --git a/arch/ia64/kernel/sys_ia64.c b/arch/ia64/kernel/sys_ia64.c
index 085adfcc74a4..9ebe1d633abc 100644
--- a/arch/ia64/kernel/sys_ia64.c
+++ b/arch/ia64/kernel/sys_ia64.c
@@ -139,7 +139,7 @@ int ia64_mmap_check(unsigned long addr, unsigned long len,
 asmlinkage unsigned long
 sys_mmap2 (unsigned long addr, unsigned long len, int prot, int flags, int fd, long pgoff)
 {
-	addr = sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+	addr = ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
 	if (!IS_ERR((void *) addr))
 		force_successful_syscall_return();
 	return addr;
@@ -151,7 +151,7 @@ sys_mmap (unsigned long addr, unsigned long len, int prot, int flags, int fd, lo
 	if (offset_in_page(off) != 0)
 		return -EINVAL;
 
-	addr = sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+	addr = ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
 	if (!IS_ERR((void *) addr))
 		force_successful_syscall_return();
 	return addr;
diff --git a/arch/m68k/kernel/sys_m68k.c b/arch/m68k/kernel/sys_m68k.c
index 27e10af5153a..6363ec83a290 100644
--- a/arch/m68k/kernel/sys_m68k.c
+++ b/arch/m68k/kernel/sys_m68k.c
@@ -46,7 +46,7 @@ asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
 	 * so we need to shift the argument down by 1; m68k mmap64(3)
 	 * (in libc) expects the last argument of mmap2 in 4Kb units.
 	 */
-	return sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
 }
 
 /* Convert virtual (user) address VADDR to physical address PADDR */
diff --git a/arch/microblaze/kernel/sys_microblaze.c b/arch/microblaze/kernel/sys_microblaze.c
index f1e1f666ddde..ed9f34da1a2a 100644
--- a/arch/microblaze/kernel/sys_microblaze.c
+++ b/arch/microblaze/kernel/sys_microblaze.c
@@ -40,7 +40,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 	if (pgoff & ~PAGE_MASK)
 		return -EINVAL;
 
-	return sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff >> PAGE_SHIFT);
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff >> PAGE_SHIFT);
 }
 
 SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
@@ -50,6 +50,6 @@ SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
 	if (pgoff & (~PAGE_MASK >> 12))
 		return -EINVAL;
 
-	return sys_mmap_pgoff(addr, len, prot, flags, fd,
-			      pgoff >> (PAGE_SHIFT - 12));
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+			       pgoff >> (PAGE_SHIFT - 12));
 }
diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index b8a3cf5d5950..0ce4f7240f69 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -67,8 +67,8 @@ SYSCALL_DEFINE6(32_mmap2, unsigned long, addr, unsigned long, len,
 {
 	if (pgoff & (~PAGE_MASK >> 12))
 		return -EINVAL;
-	return sys_mmap_pgoff(addr, len, prot, flags, fd,
-			      pgoff >> (PAGE_SHIFT-12));
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+			       pgoff >> (PAGE_SHIFT-12));
 }
 
 #define RLIM_INFINITY32 0x7fffffff
diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c
index 58c6f634b550..69c17b549fd3 100644
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -63,7 +63,8 @@ SYSCALL_DEFINE6(mips_mmap, unsigned long, addr, unsigned long, len,
 {
 	if (offset & ~PAGE_MASK)
 		return -EINVAL;
-	return sys_mmap_pgoff(addr, len, prot, flags, fd, offset >> PAGE_SHIFT);
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+			       offset >> PAGE_SHIFT);
 }
 
 SYSCALL_DEFINE6(mips_mmap2, unsigned long, addr, unsigned long, len,
@@ -73,7 +74,8 @@ SYSCALL_DEFINE6(mips_mmap2, unsigned long, addr, unsigned long, len,
 	if (pgoff & (~PAGE_MASK >> 12))
 		return -EINVAL;
 
-	return sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff >> (PAGE_SHIFT-12));
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+			       pgoff >> (PAGE_SHIFT - 12));
 }
 
 save_static_function(sys_fork);
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index da1c27ea8e1a..572feeea834c 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -270,8 +270,8 @@ asmlinkage unsigned long sys_mmap2(unsigned long addr, unsigned long len,
 {
 	/* Make sure the shift for mmap2 is constant (12), no matter what PAGE_SIZE
 	   we have. */
-	return sys_mmap_pgoff(addr, len, prot, flags, fd,
-			      pgoff >> (PAGE_SHIFT - 12));
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+			       pgoff >> (PAGE_SHIFT - 12));
 }
 
 asmlinkage unsigned long sys_mmap(unsigned long addr, unsigned long len,
@@ -279,7 +279,7 @@ asmlinkage unsigned long sys_mmap(unsigned long addr, unsigned long len,
 		unsigned long offset)
 {
 	if (!(offset & ~PAGE_MASK)) {
-		return sys_mmap_pgoff(addr, len, prot, flags, fd,
+		return ksys_mmap_pgoff(addr, len, prot, flags, fd,
 					offset >> PAGE_SHIFT);
 	} else {
 		return -EINVAL;
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index ecb981eea74b..1ef3b80b62a6 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -57,7 +57,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
 		off >>= shift;
 	}
 
-	ret = sys_mmap_pgoff(addr, len, prot, flags, fd, off);
+	ret = ksys_mmap_pgoff(addr, len, prot, flags, fd, off);
 out:
 	return ret;
 }
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 79c78668258e..f7181ed8aafc 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -24,8 +24,8 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
 {
 	if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
 		return -EINVAL;
-	return sys_mmap_pgoff(addr, len, prot, flags, fd,
-			      offset >> (PAGE_SHIFT - page_shift_offset));
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+			       offset >> (PAGE_SHIFT - page_shift_offset));
 }
 
 #ifdef CONFIG_64BIT
diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index 357a66934a98..a47995a5174c 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -442,8 +442,8 @@ COMPAT_SYSCALL_DEFINE1(s390_old_mmap, struct mmap_arg_struct_emu31 __user *, arg
 		return -EFAULT;
 	if (a.offset & ~PAGE_MASK)
 		return -EINVAL;
-	return sys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
-			      a.offset >> PAGE_SHIFT);
+	return ksys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
+			       a.offset >> PAGE_SHIFT);
 }
 
 COMPAT_SYSCALL_DEFINE1(s390_mmap2, struct mmap_arg_struct_emu31 __user *, arg)
@@ -452,7 +452,7 @@ COMPAT_SYSCALL_DEFINE1(s390_mmap2, struct mmap_arg_struct_emu31 __user *, arg)
 
 	if (copy_from_user(&a, arg, sizeof(a)))
 		return -EFAULT;
-	return sys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd, a.offset);
+	return ksys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd, a.offset);
 }
 
 COMPAT_SYSCALL_DEFINE3(s390_read, unsigned int, fd, char __user *, buf, compat_size_t, count)
diff --git a/arch/s390/kernel/sys_s390.c b/arch/s390/kernel/sys_s390.c
index 0090037ab148..31cefe0c28c0 100644
--- a/arch/s390/kernel/sys_s390.c
+++ b/arch/s390/kernel/sys_s390.c
@@ -53,7 +53,7 @@ SYSCALL_DEFINE1(mmap2, struct s390_mmap_arg_struct __user *, arg)
 
 	if (copy_from_user(&a, arg, sizeof(a)))
 		goto out;
-	error = sys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd, a.offset);
+	error = ksys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd, a.offset);
 out:
 	return error;
 }
diff --git a/arch/sh/kernel/sys_sh.c b/arch/sh/kernel/sys_sh.c
index 724911c59e7d..f8afc014e084 100644
--- a/arch/sh/kernel/sys_sh.c
+++ b/arch/sh/kernel/sys_sh.c
@@ -35,7 +35,7 @@ asmlinkage int old_mmap(unsigned long addr, unsigned long len,
 {
 	if (off & ~PAGE_MASK)
 		return -EINVAL;
-	return sys_mmap_pgoff(addr, len, prot, flags, fd, off>>PAGE_SHIFT);
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, off>>PAGE_SHIFT);
 }
 
 asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
@@ -51,7 +51,7 @@ asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
 
 	pgoff >>= PAGE_SHIFT - 12;
 
-	return sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
 }
 
 /* sys_cacheflush -- flush (part of) the processor cache.  */
diff --git a/arch/sparc/kernel/sys_sparc_32.c b/arch/sparc/kernel/sys_sparc_32.c
index 990703b7cf4d..d980da4ffd7b 100644
--- a/arch/sparc/kernel/sys_sparc_32.c
+++ b/arch/sparc/kernel/sys_sparc_32.c
@@ -104,8 +104,8 @@ asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
 {
 	/* Make sure the shift for mmap2 is constant (12), no matter what PAGE_SIZE
 	   we have. */
-	return sys_mmap_pgoff(addr, len, prot, flags, fd,
-			      pgoff >> (PAGE_SHIFT - 12));
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+			       pgoff >> (PAGE_SHIFT - 12));
 }
 
 asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
@@ -113,7 +113,7 @@ asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
 	unsigned long off)
 {
 	/* no alignment check? */
-	return sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
 }
 
 long sparc_remap_file_pages(unsigned long start, unsigned long size,
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index 55416db482ad..ebb84dc8a5a7 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -458,7 +458,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 		goto out;
 	if (off & ~PAGE_MASK)
 		goto out;
-	retval = sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+	retval = ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
 out:
 	return retval;
 }
diff --git a/arch/um/kernel/syscall.c b/arch/um/kernel/syscall.c
index 6258676bed85..35f7047bdebc 100644
--- a/arch/um/kernel/syscall.c
+++ b/arch/um/kernel/syscall.c
@@ -22,7 +22,7 @@ long old_mmap(unsigned long addr, unsigned long len,
 	if (offset & ~PAGE_MASK)
 		goto out;
 
-	err = sys_mmap_pgoff(addr, len, prot, flags, fd, offset >> PAGE_SHIFT);
+	err = ksys_mmap_pgoff(addr, len, prot, flags, fd, offset >> PAGE_SHIFT);
  out:
 	return err;
 }
diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
index 2afd718e7422..ca28ee07a191 100644
--- a/arch/x86/ia32/sys_ia32.c
+++ b/arch/x86/ia32/sys_ia32.c
@@ -164,7 +164,7 @@ COMPAT_SYSCALL_DEFINE1(x86_mmap, struct mmap_arg_struct32 __user *, arg)
 	if (a.offset & ~PAGE_MASK)
 		return -EINVAL;
 
-	return sys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
+	return ksys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
 			       a.offset>>PAGE_SHIFT);
 }
 
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 676774b9bb8d..a3f15ed545b5 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -97,7 +97,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 	if (off & ~PAGE_MASK)
 		goto out;
 
-	error = sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+	error = ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
 out:
 	return error;
 }
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 863ca7d6face..37b311393bdc 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -964,5 +964,8 @@ static inline int ksys_fadvise64_64(int fd, loff_t offset, loff_t len,
 	return -EINVAL;
 }
 #endif
+unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
+			      unsigned long prot, unsigned long flags,
+			      unsigned long fd, unsigned long pgoff);
 
 #endif
diff --git a/mm/mmap.c b/mm/mmap.c
index 9efdc021ad22..aa0dc8231c0d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1488,9 +1488,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	return addr;
 }
 
-SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
-		unsigned long, prot, unsigned long, flags,
-		unsigned long, fd, unsigned long, pgoff)
+unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
+			      unsigned long prot, unsigned long flags,
+			      unsigned long fd, unsigned long pgoff)
 {
 	struct file *file = NULL;
 	unsigned long retval;
@@ -1537,6 +1537,13 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 	return retval;
 }
 
+SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
+		unsigned long, prot, unsigned long, flags,
+		unsigned long, fd, unsigned long, pgoff)
+{
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+}
+
 #ifdef __ARCH_WANT_SYS_OLD_MMAP
 struct mmap_arg_struct {
 	unsigned long addr;
@@ -1556,8 +1563,8 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg)
 	if (offset_in_page(a.offset))
 		return -EINVAL;
 
-	return sys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
-			      a.offset >> PAGE_SHIFT);
+	return ksys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
+			       a.offset >> PAGE_SHIFT);
 }
 #endif /* __ARCH_WANT_SYS_OLD_MMAP */
 
diff --git a/mm/nommu.c b/mm/nommu.c
index ebb6e618dade..cad329629530 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1423,9 +1423,9 @@ unsigned long do_mmap(struct file *file,
 	return -ENOMEM;
 }
 
-SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
-		unsigned long, prot, unsigned long, flags,
-		unsigned long, fd, unsigned long, pgoff)
+unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
+			      unsigned long prot, unsigned long flags,
+			      unsigned long fd, unsigned long pgoff)
 {
 	struct file *file = NULL;
 	unsigned long retval = -EBADF;
@@ -1447,6 +1447,13 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 	return retval;
 }
 
+SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
+		unsigned long, prot, unsigned long, flags,
+		unsigned long, fd, unsigned long, pgoff)
+{
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+}
+
 #ifdef __ARCH_WANT_SYS_OLD_MMAP
 struct mmap_arg_struct {
 	unsigned long addr;
@@ -1466,8 +1473,8 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg)
 	if (offset_in_page(a.offset))
 		return -EINVAL;
 
-	return sys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
-			      a.offset >> PAGE_SHIFT);
+	return ksys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
+			       a.offset >> PAGE_SHIFT);
 }
 #endif /* __ARCH_WANT_SYS_OLD_MMAP */
 
-- 
2.16.2
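
Most of the arch stubs converted above do the same arithmetic before
handing off to ksys_mmap_pgoff(): an mmap-style byte offset is checked
for page alignment and shifted by PAGE_SHIFT, while an mmap2-style
offset arrives in 4096-byte units and is shifted by (PAGE_SHIFT - 12).
A userspace sketch of both conversions, assuming a hypothetical 64 KiB
page size; the DEMO_* macros and demo_* functions are illustrative
names and not kernel symbols.

```c
#include <errno.h>

/* Assumed page size for this sketch: 64 KiB. */
#define DEMO_PAGE_SHIFT 16
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* mmap-style: byte offset in, page offset out. */
long demo_mmap_off_to_pgoff(unsigned long off)
{
	if (off & ~DEMO_PAGE_MASK)
		return -EINVAL;	/* not a multiple of the page size */
	return (long)(off >> DEMO_PAGE_SHIFT);
}

/* mmap2-style: offset in 4096-byte units in, page offset out. */
long demo_mmap2_to_pgoff(unsigned long pgoff_4k)
{
	if (pgoff_4k & (~DEMO_PAGE_MASK >> 12))
		return -EINVAL;	/* does not land on a page boundary */
	return (long)(pgoff_4k >> (DEMO_PAGE_SHIFT - 12));
}
```

Not every stub performs the alignment check (the sparc32 sys_mmap()
above carries a "no alignment check?" comment), so the sketch reflects
the common case rather than all of them.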


* [PATCH v2 22/36] fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (20 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 21/36] mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 23/36] fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall Dominik Brodowski
                   ` (15 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this helper allows us to avoid the in-kernel calls to the sys_chdir()
syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
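For illustration only (not part of the commit): a minimal userspace sketch
of the pattern this patch applies. The names demo_ksys_chdir(),
demo_sys_chdir() and demo_cwd are hypothetical, and the sketch does not use
the kernel's SYSCALL_DEFINEx() machinery. The logic moves into a plain C
helper, and the syscall entry point becomes a thin wrapper around it.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the process's current working directory. */
static char demo_cwd[256] = "/";

/* Helper holding the real logic; in-kernel callers would use this directly. */
int demo_ksys_chdir(const char *filename)
{
	if (filename == NULL || filename[0] == '\0')
		return -ENOENT;
	if (strlen(filename) >= sizeof(demo_cwd))
		return -ENAMETOOLONG;
	strcpy(demo_cwd, filename);
	return 0;
}

/* Stand-in for the syscall entry point: it only forwards to the helper. */
long demo_sys_chdir(const char *filename)
{
	return demo_ksys_chdir(filename);
}

/* Accessor so callers can observe the effect. */
const char *demo_getcwd(void)
{
	return demo_cwd;
}
```

Callers such as the init code would then call the helper, leaving the entry
point reachable only from the syscall table.
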
 drivers/base/devtmpfs.c  | 2 +-
 fs/open.c                | 7 ++++++-
 include/linux/syscalls.h | 1 +
 init/do_mounts.c         | 2 +-
 init/do_mounts_initrd.c  | 8 ++++----
 5 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index d6f37537275c..f7768077e817 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -386,7 +386,7 @@ static int devtmpfsd(void *p)
 	*err = ksys_mount("devtmpfs", "/", "devtmpfs", MS_SILENT, options);
 	if (*err)
 		goto out;
-	sys_chdir("/.."); /* will traverse into overmounted root */
+	ksys_chdir("/.."); /* will traverse into overmounted root */
 	ksys_chroot(".");
 	complete(&setup_done);
 	while (1) {
diff --git a/fs/open.c b/fs/open.c
index 7a475e8a2e41..a19b8277c439 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -431,7 +431,7 @@ SYSCALL_DEFINE2(access, const char __user *, filename, int, mode)
 	return sys_faccessat(AT_FDCWD, filename, mode);
 }
 
-SYSCALL_DEFINE1(chdir, const char __user *, filename)
+int ksys_chdir(const char __user *filename)
 {
 	struct path path;
 	int error;
@@ -457,6 +457,11 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
 	return error;
 }
 
+SYSCALL_DEFINE1(chdir, const char __user *, filename)
+{
+	return ksys_chdir(filename);
+}
+
 SYSCALL_DEFINE1(fchdir, unsigned int, fd)
 {
 	struct fd f = fdget_raw(fd);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 37b311393bdc..923a4d056137 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -967,5 +967,6 @@ static inline int ksys_fadvise64_64(int fd, loff_t offset, loff_t len,
 unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
 			      unsigned long prot, unsigned long flags,
 			      unsigned long fd, unsigned long pgoff);
+int ksys_chdir(const char __user *filename);
 
 #endif
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 2f06f7827b0c..89f18985fa90 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -367,7 +367,7 @@ static int __init do_mount_root(char *name, char *fs, int flags, void *data)
 	if (err)
 		return err;
 
-	sys_chdir("/root");
+	ksys_chdir("/root");
 	s = current->fs->pwd.dentry->d_sb;
 	ROOT_DEV = s->s_dev;
 	printk(KERN_INFO
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 414284dadc64..c19d9070134e 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -42,7 +42,7 @@ static int init_linuxrc(struct subprocess_info *info, struct cred *new)
 	ksys_dup(0);
 	ksys_dup(0);
 	/* move initrd over / and chdir/chroot in initrd root */
-	sys_chdir("/root");
+	ksys_chdir("/root");
 	ksys_mount(".", "/", NULL, MS_MOVE, NULL);
 	ksys_chroot(".");
 	sys_setsid();
@@ -61,7 +61,7 @@ static void __init handle_initrd(void)
 	/* mount initrd on rootfs' /root */
 	mount_block_root("/dev/root.old", root_mountflags & ~MS_RDONLY);
 	sys_mkdir("/old", 0700);
-	sys_chdir("/old");
+	ksys_chdir("/old");
 
 	/* try loading default modules from initrd */
 	load_default_modules();
@@ -86,11 +86,11 @@ static void __init handle_initrd(void)
 	ksys_chroot("..");
 
 	if (new_decode_dev(real_root_dev) == Root_RAM0) {
-		sys_chdir("/old");
+		ksys_chdir("/old");
 		return;
 	}
 
-	sys_chdir("/");
+	ksys_chdir("/");
 	ROOT_DEV = new_decode_dev(real_root_dev);
 	mount_root();
 
-- 
2.16.2

* [PATCH v2 23/36] fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (21 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 22/36] fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 24/36] fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink() Dominik Brodowski
                   ` (14 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this helper allows us to avoid the in-kernel calls to the
sys_sync_file_range() syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
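For illustration only (not part of the commit): each compat wrapper touched
below rebuilds the 64-bit offset and nbytes values from two 32-bit halves
before calling ksys_sync_file_range(). A minimal sketch of that step,
assuming a hypothetical helper demo_merge_64() (the definition of the MIPS
merge_64() macro is not shown in this patch):

```c
#include <stdint.h>

/*
 * Combine the high and low 32-bit halves of a 64-bit value.
 * The widening cast must come before the shift: shifting a 32-bit
 * operand by 32 bits is undefined behaviour in C.
 */
static inline int64_t demo_merge_64(uint32_t hi, uint32_t lo)
{
	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

Because the two halves occupy disjoint bits, combining them with `|` or with
`+` gives the same result, so the s390 wrapper's use of `+` is equivalent to
the `|` used elsewhere.
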
 arch/mips/kernel/linux32.c      |  2 +-
 arch/parisc/kernel/sys_parisc.c |  2 +-
 arch/powerpc/kernel/sys_ppc32.c |  2 +-
 arch/s390/kernel/compat_linux.c |  2 +-
 arch/sparc/kernel/sys_sparc32.c |  2 +-
 arch/x86/ia32/sys_ia32.c        |  6 +++---
 fs/sync.c                       | 12 +++++++++---
 include/linux/syscalls.h        |  2 ++
 8 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index 0ce4f7240f69..db895771ab06 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -139,7 +139,7 @@ asmlinkage long sys32_sync_file_range(int fd, int __pad,
 	unsigned long a4, unsigned long a5,
 	int flags)
 {
-	return sys_sync_file_range(fd,
+	return ksys_sync_file_range(fd,
 			merge_64(a2, a3), merge_64(a4, a5),
 			flags);
 }
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index 572feeea834c..0b0e1cfb4bd9 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -360,7 +360,7 @@ asmlinkage long parisc_sync_file_range(int fd,
 			u32 hi_off, u32 lo_off, u32 hi_nbytes, u32 lo_nbytes,
 			unsigned int flags)
 {
-	return sys_sync_file_range(fd, (loff_t)hi_off << 32 | lo_off,
+	return ksys_sync_file_range(fd, (loff_t)hi_off << 32 | lo_off,
 			(loff_t)hi_nbytes << 32 | lo_nbytes, flags);
 }
 
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index 93df264ab76c..3871aa9267e6 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -124,5 +124,5 @@ asmlinkage long compat_sys_sync_file_range2(int fd, unsigned int flags,
 	loff_t offset = ((loff_t)offset_hi << 32) | offset_lo;
 	loff_t nbytes = ((loff_t)nbytes_hi << 32) | nbytes_lo;
 
-	return sys_sync_file_range(fd, offset, nbytes, flags);
+	return ksys_sync_file_range(fd, offset, nbytes, flags);
 }
diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index a47995a5174c..82e99bf3b00e 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -510,7 +510,7 @@ COMPAT_SYSCALL_DEFINE1(s390_fadvise64_64, struct fadvise64_64_args __user *, arg
 COMPAT_SYSCALL_DEFINE6(s390_sync_file_range, int, fd, u32, offhigh, u32, offlow,
 		       u32, nhigh, u32, nlow, unsigned int, flags)
 {
-	return sys_sync_file_range(fd, ((loff_t)offhigh << 32) + offlow,
+	return ksys_sync_file_range(fd, ((loff_t)offhigh << 32) + offlow,
 				   ((u64)nhigh << 32) + nlow, flags);
 }
 
diff --git a/arch/sparc/kernel/sys_sparc32.c b/arch/sparc/kernel/sys_sparc32.c
index 08261bc15d30..c56f43893283 100644
--- a/arch/sparc/kernel/sys_sparc32.c
+++ b/arch/sparc/kernel/sys_sparc32.c
@@ -241,7 +241,7 @@ long compat_sys_fadvise64_64(int fd,
 
 long sys32_sync_file_range(unsigned int fd, unsigned long off_high, unsigned long off_low, unsigned long nb_high, unsigned long nb_low, unsigned int flags)
 {
-	return sys_sync_file_range(fd,
+	return ksys_sync_file_range(fd,
 				   (off_high << 32) | off_low,
 				   (nb_high << 32) | nb_low,
 				   flags);
diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
index ca28ee07a191..e5b053252a01 100644
--- a/arch/x86/ia32/sys_ia32.c
+++ b/arch/x86/ia32/sys_ia32.c
@@ -214,9 +214,9 @@ COMPAT_SYSCALL_DEFINE6(x86_sync_file_range, int, fd, unsigned int, off_low,
 		       unsigned int, off_hi, unsigned int, n_low,
 		       unsigned int, n_hi, int, flags)
 {
-	return sys_sync_file_range(fd,
-				   ((u64)off_hi << 32) | off_low,
-				   ((u64)n_hi << 32) | n_low, flags);
+	return ksys_sync_file_range(fd,
+				    ((u64)off_hi << 32) | off_low,
+				    ((u64)n_hi << 32) | n_low, flags);
 }
 
 COMPAT_SYSCALL_DEFINE5(x86_fadvise64, int, fd, unsigned int, offset_lo,
diff --git a/fs/sync.c b/fs/sync.c
index 6e0a2cbaf6de..ff947c30a6c0 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -280,8 +280,8 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
  * already-instantiated disk blocks, there are no guarantees here that the data
  * will be available after a crash.
  */
-SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
-				unsigned int, flags)
+int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
+			 unsigned int flags)
 {
 	int ret;
 	struct fd f;
@@ -359,10 +359,16 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
 	return ret;
 }
 
+SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
+				unsigned int, flags)
+{
+	return ksys_sync_file_range(fd, offset, nbytes, flags);
+}
+
 /* It would be nice if people remember that not all the world's an i386
    when they introduce new system calls */
 SYSCALL_DEFINE4(sync_file_range2, int, fd, unsigned int, flags,
 				 loff_t, offset, loff_t, nbytes)
 {
-	return sys_sync_file_range(fd, offset, nbytes, flags);
+	return ksys_sync_file_range(fd, offset, nbytes, flags);
 }
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 923a4d056137..8f0f99702e7a 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -968,5 +968,7 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
 			      unsigned long prot, unsigned long flags,
 			      unsigned long fd, unsigned long pgoff);
 int ksys_chdir(const char __user *filename);
+int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
+			 unsigned int flags);
 
 #endif
-- 
2.16.2

* [PATCH v2 24/36] fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (22 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 23/36] fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 20:21   ` Arnd Bergmann
  2018-03-15 19:05 ` [PATCH v2 25/36] hostfs: rename do_rmdir() to hostfs_do_rmdir() Dominik Brodowski
                   ` (13 subsequent siblings)
  37 siblings, 1 reply; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this wrapper allows us to avoid the in-kernel calls to the
sys_unlink() syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 include/linux/syscalls.h | 11 +++++++++++
 init/do_mounts.h         |  2 +-
 init/do_mounts_initrd.c  |  4 ++--
 init/do_mounts_rd.c      |  2 +-
 init/initramfs.c         |  4 ++--
 5 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 8f0f99702e7a..31aea3873de7 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -971,4 +971,15 @@ int ksys_chdir(const char __user *filename);
 int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
 			 unsigned int flags);
 
+/*
+ * The following kernel syscall equivalents are just wrappers to fs-internal
+ * functions. Therefore, provide stubs to be inlined at the callsites.
+ */
+extern long do_unlinkat(int dfd, struct filename *name);
+
+static inline long ksys_unlink(const char __user *pathname)
+{
+	return do_unlinkat(AT_FDCWD, getname(pathname));
+}
+
 #endif
diff --git a/init/do_mounts.h b/init/do_mounts.h
index 5b05c8f93f47..401f90ee1eeb 100644
--- a/init/do_mounts.h
+++ b/init/do_mounts.h
@@ -16,7 +16,7 @@ extern int root_mountflags;
 
 static inline int create_dev(char *name, dev_t dev)
 {
-	sys_unlink(name);
+	ksys_unlink(name);
 	return sys_mknod(name, S_IFBLK|0600, new_encode_dev(dev));
 }
 
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index c19d9070134e..784576b633fd 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -128,11 +128,11 @@ bool __init initrd_load(void)
 		 * mounted in the normal path.
 		 */
 		if (rd_load_image("/initrd.image") && ROOT_DEV != Root_RAM0) {
-			sys_unlink("/initrd.image");
+			ksys_unlink("/initrd.image");
 			handle_initrd();
 			return true;
 		}
 	}
-	sys_unlink("/initrd.image");
+	ksys_unlink("/initrd.image");
 	return false;
 }
diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
index 2d365c398ccc..5b69056f610a 100644
--- a/init/do_mounts_rd.c
+++ b/init/do_mounts_rd.c
@@ -288,7 +288,7 @@ int __init rd_load_image(char *from)
 	sys_close(out_fd);
 out:
 	kfree(buf);
-	sys_unlink("/dev/ram");
+	ksys_unlink("/dev/ram");
 	return res;
 }
 
diff --git a/init/initramfs.c b/init/initramfs.c
index 6f972df15bf2..08eb551168a8 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -319,7 +319,7 @@ static void __init clean_path(char *path, umode_t fmode)
 		if (S_ISDIR(st.mode))
 			sys_rmdir(path);
 		else
-			sys_unlink(path);
+			ksys_unlink(path);
 	}
 }
 
@@ -591,7 +591,7 @@ static void __init clean_rootfs(void)
 				if (S_ISDIR(st.mode))
 					sys_rmdir(dirp->d_name);
 				else
-					sys_unlink(dirp->d_name);
+					ksys_unlink(dirp->d_name);
 			}
 
 			num -= dirp->d_reclen;
-- 
2.16.2

* [PATCH v2 25/36] hostfs: rename do_rmdir() to hostfs_do_rmdir()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (23 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 24/36] fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 26/36] fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir() Dominik Brodowski
                   ` (12 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro
  Cc: luto, mingo, akpm, arnd, Jeff Dike, user-mode-linux-devel

do_rmdir() is used in the VFS layer at fs/namei.c, so use a different
name in hostfs.

Cc: Jeff Dike <jdike@addtoit.com>
Cc: user-mode-linux-devel@lists.sourceforge.net
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/hostfs/hostfs.h      | 2 +-
 fs/hostfs/hostfs_kern.c | 2 +-
 fs/hostfs/hostfs_user.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/hostfs/hostfs.h b/fs/hostfs/hostfs.h
index ffaec2e7526c..cb8374af08a6 100644
--- a/fs/hostfs/hostfs.h
+++ b/fs/hostfs/hostfs.h
@@ -84,7 +84,7 @@ extern int set_attr(const char *file, struct hostfs_iattr *attrs, int fd);
 extern int make_symlink(const char *from, const char *to);
 extern int unlink_file(const char *file);
 extern int do_mkdir(const char *file, int mode);
-extern int do_rmdir(const char *file);
+extern int hostfs_do_rmdir(const char *file);
 extern int do_mknod(const char *file, int mode, unsigned int major,
 		    unsigned int minor);
 extern int link_file(const char *from, const char *to);
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index c148e7f4f451..3cd85eb5bbb1 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -706,7 +706,7 @@ static int hostfs_rmdir(struct inode *ino, struct dentry *dentry)
 
 	if ((file = dentry_name(dentry)) == NULL)
 		return -ENOMEM;
-	err = do_rmdir(file);
+	err = hostfs_do_rmdir(file);
 	__putname(file);
 	return err;
 }
diff --git a/fs/hostfs/hostfs_user.c b/fs/hostfs/hostfs_user.c
index 9c1e0f019880..5ecc4706172b 100644
--- a/fs/hostfs/hostfs_user.c
+++ b/fs/hostfs/hostfs_user.c
@@ -304,7 +304,7 @@ int do_mkdir(const char *file, int mode)
 	return 0;
 }
 
-int do_rmdir(const char *file)
+int hostfs_do_rmdir(const char *file)
 {
 	int err;
 
-- 
2.16.2

* [PATCH v2 26/36] fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (24 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 25/36] hostfs: rename do_rmdir() to hostfs_do_rmdir() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 27/36] fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel calls to syscall Dominik Brodowski
                   ` (11 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this wrapper allows us to avoid the in-kernel calls to the
sys_rmdir() syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/internal.h            | 1 +
 fs/namei.c               | 2 +-
 include/linux/syscalls.h | 7 +++++++
 init/initramfs.c         | 4 ++--
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index df262f41a0ef..0eda35fa1743 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -55,6 +55,7 @@ extern void __init chrdev_init(void);
 extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
 extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
 			   const char *, unsigned int, struct path *);
+long do_rmdir(int dfd, const char __user *pathname);
 long do_unlinkat(int dfd, struct filename *name);
 
 /*
diff --git a/fs/namei.c b/fs/namei.c
index 524e829ffc7d..8545151f74e9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3872,7 +3872,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
 }
 EXPORT_SYMBOL(vfs_rmdir);
 
-static long do_rmdir(int dfd, const char __user *pathname)
+long do_rmdir(int dfd, const char __user *pathname)
 {
 	int error = 0;
 	struct filename *name;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 31aea3873de7..48fa98840b4e 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -982,4 +982,11 @@ static inline long ksys_unlink(const char __user *pathname)
 	return do_unlinkat(AT_FDCWD, getname(pathname));
 }
 
+extern long do_rmdir(int dfd, const char __user *pathname);
+
+static inline long ksys_rmdir(const char __user *pathname)
+{
+	return do_rmdir(AT_FDCWD, pathname);
+}
+
 #endif
diff --git a/init/initramfs.c b/init/initramfs.c
index 08eb551168a8..73bbb227f868 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -317,7 +317,7 @@ static void __init clean_path(char *path, umode_t fmode)
 
 	if (!vfs_lstat(path, &st) && (st.mode ^ fmode) & S_IFMT) {
 		if (S_ISDIR(st.mode))
-			sys_rmdir(path);
+			ksys_rmdir(path);
 		else
 			ksys_unlink(path);
 	}
@@ -589,7 +589,7 @@ static void __init clean_rootfs(void)
 			WARN_ON_ONCE(ret);
 			if (!ret) {
 				if (S_ISDIR(st.mode))
-					sys_rmdir(dirp->d_name);
+					ksys_rmdir(dirp->d_name);
 				else
 					ksys_unlink(dirp->d_name);
 			}
-- 
2.16.2

* [PATCH v2 27/36] fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel calls to syscall
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (25 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 26/36] fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 28/36] fs: add do_symlinkat() helper and ksys_symlink() " Dominik Brodowski
                   ` (10 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the fs-internal do_mkdirat() helper allows us to get rid of
fs-internal calls to the sys_mkdirat() syscall.

Introducing the ksys_mkdir() wrapper allows us to avoid the in-kernel
calls to the sys_mkdir() syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/internal.h            | 1 +
 fs/namei.c               | 9 +++++++--
 include/linux/syscalls.h | 7 +++++++
 init/do_mounts_initrd.c  | 2 +-
 init/initramfs.c         | 2 +-
 init/noinitramfs.c       | 4 ++--
 6 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 0eda35fa1743..53846bd4d9d7 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -55,6 +55,7 @@ extern void __init chrdev_init(void);
 extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
 extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
 			   const char *, unsigned int, struct path *);
+long do_mkdirat(int dfd, const char __user *pathname, umode_t mode);
 long do_rmdir(int dfd, const char __user *pathname);
 long do_unlinkat(int dfd, struct filename *name);
 
diff --git a/fs/namei.c b/fs/namei.c
index 8545151f74e9..dcf506227509 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3803,7 +3803,7 @@ int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 }
 EXPORT_SYMBOL(vfs_mkdir);
 
-SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
+long do_mkdirat(int dfd, const char __user *pathname, umode_t mode)
 {
 	struct dentry *dentry;
 	struct path path;
@@ -3828,9 +3828,14 @@ SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
 	return error;
 }
 
+SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
+{
+	return do_mkdirat(dfd, pathname, mode);
+}
+
 SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode)
 {
-	return sys_mkdirat(AT_FDCWD, pathname, mode);
+	return do_mkdirat(AT_FDCWD, pathname, mode);
 }
 
 int vfs_rmdir(struct inode *dir, struct dentry *dentry)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 48fa98840b4e..2abd968bbdfd 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -989,4 +989,11 @@ static inline long ksys_rmdir(const char __user *pathname)
 	return do_rmdir(AT_FDCWD, pathname);
 }
 
+extern long do_mkdirat(int dfd, const char __user *pathname, umode_t mode);
+
+static inline long ksys_mkdir(const char __user *pathname, umode_t mode)
+{
+	return do_mkdirat(AT_FDCWD, pathname, mode);
+}
+
 #endif
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 784576b633fd..d30db6bbf014 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -60,7 +60,7 @@ static void __init handle_initrd(void)
 	create_dev("/dev/root.old", Root_RAM0);
 	/* mount initrd on rootfs' /root */
 	mount_block_root("/dev/root.old", root_mountflags & ~MS_RDONLY);
-	sys_mkdir("/old", 0700);
+	ksys_mkdir("/old", 0700);
 	ksys_chdir("/old");
 
 	/* try loading default modules from initrd */
diff --git a/init/initramfs.c b/init/initramfs.c
index 73bbb227f868..ca538a5f9fa9 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -352,7 +352,7 @@ static int __init do_name(void)
 			}
 		}
 	} else if (S_ISDIR(mode)) {
-		sys_mkdir(collected, mode);
+		ksys_mkdir(collected, mode);
 		sys_chown(collected, uid, gid);
 		sys_chmod(collected, mode);
 		dir_add(collected, mtime);
diff --git a/init/noinitramfs.c b/init/noinitramfs.c
index 267739d85179..a08a9d937e60 100644
--- a/init/noinitramfs.c
+++ b/init/noinitramfs.c
@@ -29,7 +29,7 @@ static int __init default_rootfs(void)
 {
 	int err;
 
-	err = sys_mkdir((const char __user __force *) "/dev", 0755);
+	err = ksys_mkdir((const char __user __force *) "/dev", 0755);
 	if (err < 0)
 		goto out;
 
@@ -39,7 +39,7 @@ static int __init default_rootfs(void)
 	if (err < 0)
 		goto out;
 
-	err = sys_mkdir((const char __user __force *) "/root", 0700);
+	err = ksys_mkdir((const char __user __force *) "/root", 0700);
 	if (err < 0)
 		goto out;
 
-- 
2.16.2

* [PATCH v2 28/36] fs: add do_symlinkat() helper and ksys_symlink() wrapper; remove in-kernel calls to syscall
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (26 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 27/36] fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel calls to syscall Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 29/36] fs: add do_mknodat() helper and ksys_mknod() " Dominik Brodowski
                   ` (9 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the fs-internal do_symlinkat() helper allows us to get rid of
fs-internal calls to the sys_symlinkat() syscall.

Introducing the ksys_symlink() wrapper allows us to avoid the in-kernel
calls to the sys_symlink() syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/internal.h            |  2 ++
 fs/namei.c               | 12 +++++++++---
 include/linux/syscalls.h |  9 +++++++++
 init/initramfs.c         |  2 +-
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 53846bd4d9d7..a3f04ca2a08b 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -58,6 +58,8 @@ extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
 long do_mkdirat(int dfd, const char __user *pathname, umode_t mode);
 long do_rmdir(int dfd, const char __user *pathname);
 long do_unlinkat(int dfd, struct filename *name);
+long do_symlinkat(const char __user *oldname, int newdfd,
+		  const char __user *newname);
 
 /*
  * namespace.c
diff --git a/fs/namei.c b/fs/namei.c
index dcf506227509..e15da92209d5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4113,8 +4113,8 @@ int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname)
 }
 EXPORT_SYMBOL(vfs_symlink);
 
-SYSCALL_DEFINE3(symlinkat, const char __user *, oldname,
-		int, newdfd, const char __user *, newname)
+long do_symlinkat(const char __user *oldname, int newdfd,
+		  const char __user *newname)
 {
 	int error;
 	struct filename *from;
@@ -4144,9 +4144,15 @@ SYSCALL_DEFINE3(symlinkat, const char __user *, oldname,
 	return error;
 }
 
+SYSCALL_DEFINE3(symlinkat, const char __user *, oldname,
+		int, newdfd, const char __user *, newname)
+{
+	return do_symlinkat(oldname, newdfd, newname);
+}
+
 SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newname)
 {
-	return sys_symlinkat(oldname, AT_FDCWD, newname);
+	return do_symlinkat(oldname, AT_FDCWD, newname);
 }
 
 /**
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 2abd968bbdfd..882c7fad2b68 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -996,4 +996,13 @@ static inline long ksys_mkdir(const char __user *pathname, umode_t mode)
 	return do_mkdirat(AT_FDCWD, pathname, mode);
 }
 
+extern long do_symlinkat(const char __user *oldname, int newdfd,
+			 const char __user *newname);
+
+static inline long ksys_symlink(const char __user *oldname,
+				const char __user *newname)
+{
+	return do_symlinkat(oldname, AT_FDCWD, newname);
+}
+
 #endif
diff --git a/init/initramfs.c b/init/initramfs.c
index ca538a5f9fa9..cd9571a113b6 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -392,7 +392,7 @@ static int __init do_symlink(void)
 {
 	collected[N_ALIGN(name_len) + body_len] = '\0';
 	clean_path(collected, 0);
-	sys_symlink(collected + N_ALIGN(name_len), collected);
+	ksys_symlink(collected + N_ALIGN(name_len), collected);
 	sys_lchown(collected, uid, gid);
 	do_utime(collected, mtime);
 	state = SkipIt;
-- 
2.16.2

* [PATCH v2 29/36] fs: add do_mknodat() helper and ksys_mknod() wrapper; remove in-kernel calls to syscall
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (27 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 28/36] fs: add do_symlinkat() helper and ksys_symlink() " Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 30/36] fs: add do_linkat() helper and ksys_link() " Dominik Brodowski
                   ` (8 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the fs-internal do_mknodat() helper allows us to get rid of
fs-internal calls to the sys_mknodat() syscall.

Introducing the ksys_mknod() wrapper allows us to avoid the in-kernel
calls to the sys_mknod() syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
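For illustration only (not part of the commit): the callers converted below
pass new_encode_dev(dev) as the dev argument of ksys_mknod(). The sketch
mirrors, as an assumption, the encoding in include/linux/kdev_t.h (20 minor
bits internally; the userspace format keeps the low 8 minor bits, then the
major, then the remaining minor bits). All DEMO_/demo_ names are
hypothetical.

```c
#include <stdint.h>

#define DEMO_MINORBITS	20
#define DEMO_MINORMASK	((1U << DEMO_MINORBITS) - 1)
#define DEMO_MKDEV(ma, mi) (((uint32_t)(ma) << DEMO_MINORBITS) | (mi))

/* Convert the internal dev_t layout to the value handed to mknod. */
static inline uint32_t demo_new_encode_dev(uint32_t dev)
{
	uint32_t major = dev >> DEMO_MINORBITS;
	uint32_t minor = dev & DEMO_MINORMASK;

	return (minor & 0xff) | (major << 8) | ((minor & ~0xffU) << 12);
}
```

Under these assumptions, /dev/console (major 5, minor 1) encodes to 0x501.
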
 fs/internal.h            |  2 ++
 fs/namei.c               | 12 +++++++++---
 include/linux/syscalls.h |  9 +++++++++
 init/do_mounts.h         |  2 +-
 init/initramfs.c         |  2 +-
 init/noinitramfs.c       |  2 +-
 6 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index a3f04ca2a08b..4f0b67054c54 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -55,6 +55,8 @@ extern void __init chrdev_init(void);
 extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
 extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
 			   const char *, unsigned int, struct path *);
+long do_mknodat(int dfd, const char __user *filename, umode_t mode,
+		unsigned int dev);
 long do_mkdirat(int dfd, const char __user *pathname, umode_t mode);
 long do_rmdir(int dfd, const char __user *pathname);
 long do_unlinkat(int dfd, struct filename *name);
diff --git a/fs/namei.c b/fs/namei.c
index e15da92209d5..8459a18cdd18 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3728,8 +3728,8 @@ static int may_mknod(umode_t mode)
 	}
 }
 
-SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
-		unsigned, dev)
+long do_mknodat(int dfd, const char __user *filename, umode_t mode,
+		unsigned int dev)
 {
 	struct dentry *dentry;
 	struct path path;
@@ -3772,9 +3772,15 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
 	return error;
 }
 
+SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
+		unsigned int, dev)
+{
+	return do_mknodat(dfd, filename, mode, dev);
+}
+
 SYSCALL_DEFINE3(mknod, const char __user *, filename, umode_t, mode, unsigned, dev)
 {
-	return sys_mknodat(AT_FDCWD, filename, mode, dev);
+	return do_mknodat(AT_FDCWD, filename, mode, dev);
 }
 
 int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 882c7fad2b68..a003ffedff52 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1005,4 +1005,13 @@ static inline long ksys_symlink(const char __user *oldname,
 	return do_symlinkat(oldname, AT_FDCWD, newname);
 }
 
+extern long do_mknodat(int dfd, const char __user *filename, umode_t mode,
+		       unsigned int dev);
+
+static inline long ksys_mknod(const char __user *filename, umode_t mode,
+			      unsigned int dev)
+{
+	return do_mknodat(AT_FDCWD, filename, mode, dev);
+}
+
 #endif
diff --git a/init/do_mounts.h b/init/do_mounts.h
index 401f90ee1eeb..0bb0806de4ce 100644
--- a/init/do_mounts.h
+++ b/init/do_mounts.h
@@ -17,7 +17,7 @@ extern int root_mountflags;
 static inline int create_dev(char *name, dev_t dev)
 {
 	ksys_unlink(name);
-	return sys_mknod(name, S_IFBLK|0600, new_encode_dev(dev));
+	return ksys_mknod(name, S_IFBLK|0600, new_encode_dev(dev));
 }
 
 static inline u32 bstat(char *name)
diff --git a/init/initramfs.c b/init/initramfs.c
index cd9571a113b6..2972ed0ab399 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -359,7 +359,7 @@ static int __init do_name(void)
 	} else if (S_ISBLK(mode) || S_ISCHR(mode) ||
 		   S_ISFIFO(mode) || S_ISSOCK(mode)) {
 		if (maybe_link() == 0) {
-			sys_mknod(collected, mode, rdev);
+			ksys_mknod(collected, mode, rdev);
 			sys_chown(collected, uid, gid);
 			sys_chmod(collected, mode);
 			do_utime(collected, mtime);
diff --git a/init/noinitramfs.c b/init/noinitramfs.c
index a08a9d937e60..f4bad8436c93 100644
--- a/init/noinitramfs.c
+++ b/init/noinitramfs.c
@@ -33,7 +33,7 @@ static int __init default_rootfs(void)
 	if (err < 0)
 		goto out;
 
-	err = sys_mknod((const char __user __force *) "/dev/console",
+	err = ksys_mknod((const char __user __force *) "/dev/console",
 			S_IFCHR | S_IRUSR | S_IWUSR,
 			new_encode_dev(MKDEV(5, 1)));
 	if (err < 0)
-- 
2.16.2


* [PATCH v2 30/36] fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel calls to syscall
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (28 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 29/36] fs: add do_mknodat() helper and ksys_mknod() " Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 20:30   ` Arnd Bergmann
  2018-03-15 19:05 ` [PATCH v2 31/36] fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod() " Dominik Brodowski
                   ` (7 subsequent siblings)
  37 siblings, 1 reply; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the fs-internal do_linkat() helper allows us to get rid of
fs-internal calls to the sys_linkat() syscall.

Introducing the ksys_link() wrapper allows us to avoid the in-kernel
calls to the sys_link() syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/internal.h            |  2 ++
 fs/namei.c               | 12 +++++++++---
 include/linux/syscalls.h |  9 +++++++++
 init/initramfs.c         |  2 +-
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 4f0b67054c54..91e6fc93fcb5 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -62,6 +62,8 @@ long do_rmdir(int dfd, const char __user *pathname);
 long do_unlinkat(int dfd, struct filename *name);
 long do_symlinkat(const char __user *oldname, int newdfd,
 		  const char __user *newname);
+int do_linkat(int olddfd, const char __user *oldname, int newdfd,
+	      const char __user *newname, int flags);
 
 /*
  * namespace.c
diff --git a/fs/namei.c b/fs/namei.c
index 8459a18cdd18..10148235829f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4250,8 +4250,8 @@ EXPORT_SYMBOL(vfs_link);
  * with linux 2.0, and to avoid hard-linking to directories
  * and other special files.  --ADM
  */
-SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
-		int, newdfd, const char __user *, newname, int, flags)
+int do_linkat(int olddfd, const char __user *oldname, int newdfd,
+	      const char __user *newname, int flags)
 {
 	struct dentry *new_dentry;
 	struct path old_path, new_path;
@@ -4315,9 +4315,15 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 	return error;
 }
 
+SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
+		int, newdfd, const char __user *, newname, int, flags)
+{
+	return do_linkat(olddfd, oldname, newdfd, newname, flags);
+}
+
 SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname)
 {
-	return sys_linkat(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
+	return do_linkat(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
 }
 
 /**
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a003ffedff52..eb8745869833 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1014,4 +1014,13 @@ static inline long ksys_mknod(const char __user *filename, umode_t mode,
 	return do_mknodat(AT_FDCWD, filename, mode, dev);
 }
 
+extern int do_linkat(int olddfd, const char __user *oldname, int newdfd,
+		     const char __user *newname, int flags);
+
+static inline long ksys_link(const char __user *oldname,
+			     const char __user *newname)
+{
+	return do_linkat(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
+}
+
 #endif
diff --git a/init/initramfs.c b/init/initramfs.c
index 2972ed0ab399..5855ab632b4e 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -306,7 +306,7 @@ static int __init maybe_link(void)
 	if (nlink >= 2) {
 		char *old = find_link(major, minor, ino, mode, collected);
 		if (old)
-			return (sys_link(old, collected) < 0) ? -1 : 1;
+			return (ksys_link(old, collected) < 0) ? -1 : 1;
 	}
 	return 0;
 }
-- 
2.16.2


* [PATCH v2 31/36] fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod() wrapper; remove in-kernel calls to syscall
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (29 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 30/36] fs: add do_linkat() helper and ksys_link() " Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 32/36] fs: add do_faccessat() helper and ksys_access() " Dominik Brodowski
                   ` (6 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the fs-internal do_fchmodat() helper allows us to get rid of
fs-internal calls to the sys_fchmodat() syscall.

Introducing the ksys_fchmod() helper and the ksys_chmod() wrapper allows
us to avoid the in-kernel calls to the sys_fchmod() and sys_chmod()
syscalls.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/internal.h            |  2 ++
 fs/open.c                | 17 ++++++++++++++---
 include/linux/syscalls.h |  8 ++++++++
 init/initramfs.c         |  6 +++---
 4 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 91e6fc93fcb5..2474bf460f96 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -119,6 +119,8 @@ extern struct file *do_filp_open(int dfd, struct filename *pathname,
 extern struct file *do_file_open_root(struct dentry *, struct vfsmount *,
 		const char *, const struct open_flags *);
 
+int do_fchmodat(int dfd, const char __user *filename, umode_t mode);
+
 extern int open_check_o_direct(struct file *f);
 extern int vfs_open(const struct path *, struct file *, const struct cred *);
 extern struct file *filp_clone_open(struct file *);
diff --git a/fs/open.c b/fs/open.c
index a19b8277c439..6037f2bf418c 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -551,7 +551,7 @@ static int chmod_common(const struct path *path, umode_t mode)
 	return error;
 }
 
-SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
+int ksys_fchmod(unsigned int fd, umode_t mode)
 {
 	struct fd f = fdget(fd);
 	int err = -EBADF;
@@ -564,7 +564,12 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
 	return err;
 }
 
-SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, umode_t, mode)
+SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
+{
+	return ksys_fchmod(fd, mode);
+}
+
+int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
 {
 	struct path path;
 	int error;
@@ -582,9 +587,15 @@ SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, umode_t, mode
 	return error;
 }
 
+SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename,
+		umode_t, mode)
+{
+	return do_fchmodat(dfd, filename, mode);
+}
+
 SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
 {
-	return sys_fchmodat(AT_FDCWD, filename, mode);
+	return do_fchmodat(AT_FDCWD, filename, mode);
 }
 
 static int chown_common(const struct path *path, uid_t user, gid_t group)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index eb8745869833..f2b4858464e2 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -970,6 +970,7 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
 int ksys_chdir(const char __user *filename);
 int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
 			 unsigned int flags);
+int ksys_fchmod(unsigned int fd, umode_t mode);
 
 /*
  * The following kernel syscall equivalents are just wrappers to fs-internal
@@ -1023,4 +1024,11 @@ static inline long ksys_link(const char __user *oldname,
 	return do_linkat(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
 }
 
+extern int do_fchmodat(int dfd, const char __user *filename, umode_t mode);
+
+static inline int ksys_chmod(const char __user *filename, umode_t mode)
+{
+	return do_fchmodat(AT_FDCWD, filename, mode);
+}
+
 #endif
diff --git a/init/initramfs.c b/init/initramfs.c
index 5855ab632b4e..16c3c23076e2 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -344,7 +344,7 @@ static int __init do_name(void)
 
 			if (wfd >= 0) {
 				sys_fchown(wfd, uid, gid);
-				sys_fchmod(wfd, mode);
+				ksys_fchmod(wfd, mode);
 				if (body_len)
 					sys_ftruncate(wfd, body_len);
 				vcollected = kstrdup(collected, GFP_KERNEL);
@@ -354,14 +354,14 @@ static int __init do_name(void)
 	} else if (S_ISDIR(mode)) {
 		ksys_mkdir(collected, mode);
 		sys_chown(collected, uid, gid);
-		sys_chmod(collected, mode);
+		ksys_chmod(collected, mode);
 		dir_add(collected, mtime);
 	} else if (S_ISBLK(mode) || S_ISCHR(mode) ||
 		   S_ISFIFO(mode) || S_ISSOCK(mode)) {
 		if (maybe_link() == 0) {
 			ksys_mknod(collected, mode, rdev);
 			sys_chown(collected, uid, gid);
-			sys_chmod(collected, mode);
+			ksys_chmod(collected, mode);
 			do_utime(collected, mtime);
 		}
 	}
-- 
2.16.2


* [PATCH v2 32/36] fs: add do_faccessat() helper and ksys_access() wrapper; remove in-kernel calls to syscall
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (30 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 31/36] fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod() " Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 33/36] fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate() Dominik Brodowski
                   ` (5 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the fs-internal do_faccessat() helper allows us to get rid of
fs-internal calls to the sys_faccessat() syscall.

Introducing the ksys_access() wrapper allows us to avoid the in-kernel
calls to the sys_access() syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/internal.h            | 1 +
 fs/open.c                | 9 +++++++--
 include/linux/syscalls.h | 7 +++++++
 init/main.c              | 3 ++-
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 2474bf460f96..26f4f05b52ef 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -119,6 +119,7 @@ extern struct file *do_filp_open(int dfd, struct filename *pathname,
 extern struct file *do_file_open_root(struct dentry *, struct vfsmount *,
 		const char *, const struct open_flags *);
 
+long do_faccessat(int dfd, const char __user *filename, int mode);
 int do_fchmodat(int dfd, const char __user *filename, umode_t mode);
 
 extern int open_check_o_direct(struct file *f);
diff --git a/fs/open.c b/fs/open.c
index 6037f2bf418c..0fc8188be31a 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -350,7 +350,7 @@ SYSCALL_DEFINE4(fallocate, int, fd, int, mode, loff_t, offset, loff_t, len)
  * We do this by temporarily clearing all FS-related capabilities and
  * switching the fsuid/fsgid around to the real ones.
  */
-SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
+long do_faccessat(int dfd, const char __user *filename, int mode)
 {
 	const struct cred *old_cred;
 	struct cred *override_cred;
@@ -426,9 +426,14 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
 	return res;
 }
 
+SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
+{
+	return do_faccessat(dfd, filename, mode);
+}
+
 SYSCALL_DEFINE2(access, const char __user *, filename, int, mode)
 {
-	return sys_faccessat(AT_FDCWD, filename, mode);
+	return do_faccessat(AT_FDCWD, filename, mode);
 }
 
 int ksys_chdir(const char __user *filename)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index f2b4858464e2..c376ff14ce1c 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1031,4 +1031,11 @@ static inline int ksys_chmod(const char __user *filename, umode_t mode)
 	return do_fchmodat(AT_FDCWD, filename, mode);
 }
 
+extern long do_faccessat(int dfd, const char __user *filename, int mode);
+
+static inline long ksys_access(const char __user *filename, int mode)
+{
+	return do_faccessat(AT_FDCWD, filename, mode);
+}
+
 #endif
diff --git a/init/main.c b/init/main.c
index b8649d1466e1..d0ded4322c6b 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1087,7 +1087,8 @@ static noinline void __init kernel_init_freeable(void)
 	if (!ramdisk_execute_command)
 		ramdisk_execute_command = "/init";
 
-	if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
+	if (ksys_access((const char __user *)
+			ramdisk_execute_command, 0) != 0) {
 		ramdisk_execute_command = NULL;
 		prepare_namespace();
 	}
-- 
2.16.2


* [PATCH v2 33/36] fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (31 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 32/36] fs: add do_faccessat() helper and ksys_access() " Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 34/36] fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers Dominik Brodowski
                   ` (4 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the ksys_ftruncate() wrapper allows us to get rid of in-kernel
calls to the sys_ftruncate() syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 arch/mips/kernel/linux32.c      | 2 +-
 arch/parisc/kernel/sys_parisc.c | 4 ++--
 arch/powerpc/kernel/sys_ppc32.c | 2 +-
 arch/s390/kernel/compat_linux.c | 2 +-
 arch/sparc/kernel/sys_sparc32.c | 2 +-
 arch/x86/ia32/sys_ia32.c        | 2 +-
 fs/internal.h                   | 1 +
 fs/open.c                       | 2 +-
 include/linux/syscalls.h        | 7 +++++++
 init/initramfs.c                | 2 +-
 10 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index db895771ab06..740eee40c668 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -88,7 +88,7 @@ SYSCALL_DEFINE4(32_truncate64, const char __user *, path,
 SYSCALL_DEFINE4(32_ftruncate64, unsigned long, fd, unsigned long, __dummy,
 	unsigned long, a2, unsigned long, a3)
 {
-	return sys_ftruncate(fd, merge_64(a2, a3));
+	return ksys_ftruncate(fd, merge_64(a2, a3));
 }
 
 SYSCALL_DEFINE5(32_llseek, unsigned int, fd, unsigned int, offset_high,
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index 0b0e1cfb4bd9..59b315d6d194 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -298,7 +298,7 @@ asmlinkage long parisc_truncate64(const char __user * path,
 asmlinkage long parisc_ftruncate64(unsigned int fd,
 					unsigned int high, unsigned int low)
 {
-	return sys_ftruncate(fd, (long)high << 32 | low);
+	return ksys_ftruncate(fd, (long)high << 32 | low);
 }
 
 /* stubs for the benefit of the syscall_table since truncate64 and truncate 
@@ -309,7 +309,7 @@ asmlinkage long sys_truncate64(const char __user * path, unsigned long length)
 }
 asmlinkage long sys_ftruncate64(unsigned int fd, unsigned long length)
 {
-	return sys_ftruncate(fd, length);
+	return ksys_ftruncate(fd, length);
 }
 asmlinkage long sys_fcntl64(unsigned int fd, unsigned int cmd, unsigned long arg)
 {
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index 3871aa9267e6..f41cb34c84c8 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -107,7 +107,7 @@ asmlinkage long compat_sys_fallocate(int fd, int mode, u32 offhi, u32 offlo,
 asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long high,
 				 unsigned long low)
 {
-	return sys_ftruncate(fd, (high << 32) | low);
+	return ksys_ftruncate(fd, (high << 32) | low);
 }
 
 long ppc32_fadvise64(int fd, u32 unused, u32 offset_high, u32 offset_low,
diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index 82e99bf3b00e..572349852b75 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -307,7 +307,7 @@ COMPAT_SYSCALL_DEFINE3(s390_truncate64, const char __user *, path, u32, high, u3
 
 COMPAT_SYSCALL_DEFINE3(s390_ftruncate64, unsigned int, fd, u32, high, u32, low)
 {
-	return sys_ftruncate(fd, (unsigned long)high << 32 | low);
+	return ksys_ftruncate(fd, (unsigned long)high << 32 | low);
 }
 
 COMPAT_SYSCALL_DEFINE5(s390_pread64, unsigned int, fd, char __user *, ubuf,
diff --git a/arch/sparc/kernel/sys_sparc32.c b/arch/sparc/kernel/sys_sparc32.c
index c56f43893283..f8d357540748 100644
--- a/arch/sparc/kernel/sys_sparc32.c
+++ b/arch/sparc/kernel/sys_sparc32.c
@@ -65,7 +65,7 @@ asmlinkage long sys32_ftruncate64(unsigned int fd, unsigned long high, unsigned
 	if ((int)high < 0)
 		return -EINVAL;
 	else
-		return sys_ftruncate(fd, (high << 32) | low);
+		return ksys_ftruncate(fd, (high << 32) | low);
 }
 
 static int cp_compat_stat64(struct kstat *stat,
diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
index e5b053252a01..9f5c25093e7a 100644
--- a/arch/x86/ia32/sys_ia32.c
+++ b/arch/x86/ia32/sys_ia32.c
@@ -60,7 +60,7 @@ COMPAT_SYSCALL_DEFINE3(x86_truncate64, const char __user *, filename,
 COMPAT_SYSCALL_DEFINE3(x86_ftruncate64, unsigned int, fd,
 		       unsigned long, offset_low, unsigned long, offset_high)
 {
-       return sys_ftruncate(fd, ((loff_t) offset_high << 32) | offset_low);
+	return ksys_ftruncate(fd, ((loff_t) offset_high << 32) | offset_low);
 }
 
 /*
diff --git a/fs/internal.h b/fs/internal.h
index 26f4f05b52ef..49e0bf51576c 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -119,6 +119,7 @@ extern struct file *do_filp_open(int dfd, struct filename *pathname,
 extern struct file *do_file_open_root(struct dentry *, struct vfsmount *,
 		const char *, const struct open_flags *);
 
+long do_sys_ftruncate(unsigned int fd, loff_t length, int small);
 long do_faccessat(int dfd, const char __user *filename, int mode);
 int do_fchmodat(int dfd, const char __user *filename, umode_t mode);
 
diff --git a/fs/open.c b/fs/open.c
index 0fc8188be31a..77a4494f605d 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -162,7 +162,7 @@ COMPAT_SYSCALL_DEFINE2(truncate, const char __user *, path, compat_off_t, length
 }
 #endif
 
-static long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
+long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
 {
 	struct inode *inode;
 	struct dentry *dentry;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index c376ff14ce1c..1b453770cee8 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1038,4 +1038,11 @@ static inline long ksys_access(const char __user *filename, int mode)
 	return do_faccessat(AT_FDCWD, filename, mode);
 }
 
+extern long do_sys_ftruncate(unsigned int fd, loff_t length, int small);
+
+static inline long ksys_ftruncate(unsigned int fd, unsigned long length)
+{
+	return do_sys_ftruncate(fd, length, 1);
+}
+
 #endif
diff --git a/init/initramfs.c b/init/initramfs.c
index 16c3c23076e2..237a975738ba 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -346,7 +346,7 @@ static int __init do_name(void)
 				sys_fchown(wfd, uid, gid);
 				ksys_fchmod(wfd, mode);
 				if (body_len)
-					sys_ftruncate(wfd, body_len);
+					ksys_ftruncate(wfd, body_len);
 				vcollected = kstrdup(collected, GFP_KERNEL);
 				state = CopyFile;
 			}
-- 
2.16.2


* [PATCH v2 34/36] fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (32 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 33/36] fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 35/36] fs: add ksys_close() wrapper; remove in-kernel calls to sys_close() Dominik Brodowski
                   ` (3 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the fs-internal do_fchownat() helper allows us to get rid of
fs-internal calls to the sys_fchownat() syscall.

Introducing the ksys_fchown() helper and the ksys_{,l}chown() wrappers
allows us to avoid the in-kernel calls to the sys_{,l,f}chown() syscalls.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 arch/s390/kernel/compat_linux.c |  6 +++---
 fs/internal.h                   |  2 ++
 fs/open.c                       | 23 +++++++++++++++++------
 include/linux/syscalls.h        | 17 +++++++++++++++++
 init/initramfs.c                |  8 ++++----
 kernel/uid16.c                  |  6 +++---
 6 files changed, 46 insertions(+), 16 deletions(-)

diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index 572349852b75..a1fa8051fe63 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -89,18 +89,18 @@
 COMPAT_SYSCALL_DEFINE3(s390_chown16, const char __user *, filename,
 		       u16, user, u16, group)
 {
-	return sys_chown(filename, low2highuid(user), low2highgid(group));
+	return ksys_chown(filename, low2highuid(user), low2highgid(group));
 }
 
 COMPAT_SYSCALL_DEFINE3(s390_lchown16, const char __user *,
 		       filename, u16, user, u16, group)
 {
-	return sys_lchown(filename, low2highuid(user), low2highgid(group));
+	return ksys_lchown(filename, low2highuid(user), low2highgid(group));
 }
 
 COMPAT_SYSCALL_DEFINE3(s390_fchown16, unsigned int, fd, u16, user, u16, group)
 {
-	return sys_fchown(fd, low2highuid(user), low2highgid(group));
+	return ksys_fchown(fd, low2highuid(user), low2highgid(group));
 }
 
 COMPAT_SYSCALL_DEFINE2(s390_setregid16, u16, rgid, u16, egid)
diff --git a/fs/internal.h b/fs/internal.h
index 49e0bf51576c..980d005b21b4 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -122,6 +122,8 @@ extern struct file *do_file_open_root(struct dentry *, struct vfsmount *,
 long do_sys_ftruncate(unsigned int fd, loff_t length, int small);
 long do_faccessat(int dfd, const char __user *filename, int mode);
 int do_fchmodat(int dfd, const char __user *filename, umode_t mode);
+int do_fchownat(int dfd, const char __user *filename, uid_t user, gid_t group,
+		int flag);
 
 extern int open_check_o_direct(struct file *f);
 extern int vfs_open(const struct path *, struct file *, const struct cred *);
diff --git a/fs/open.c b/fs/open.c
index 77a4494f605d..b3f3b2cd9f19 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -645,8 +645,8 @@ static int chown_common(const struct path *path, uid_t user, gid_t group)
 	return error;
 }
 
-SYSCALL_DEFINE5(fchownat, int, dfd, const char __user *, filename, uid_t, user,
-		gid_t, group, int, flag)
+int do_fchownat(int dfd, const char __user *filename, uid_t user, gid_t group,
+		int flag)
 {
 	struct path path;
 	int error = -EINVAL;
@@ -677,18 +677,24 @@ SYSCALL_DEFINE5(fchownat, int, dfd, const char __user *, filename, uid_t, user,
 	return error;
 }
 
+SYSCALL_DEFINE5(fchownat, int, dfd, const char __user *, filename, uid_t, user,
+		gid_t, group, int, flag)
+{
+	return do_fchownat(dfd, filename, user, group, flag);
+}
+
 SYSCALL_DEFINE3(chown, const char __user *, filename, uid_t, user, gid_t, group)
 {
-	return sys_fchownat(AT_FDCWD, filename, user, group, 0);
+	return do_fchownat(AT_FDCWD, filename, user, group, 0);
 }
 
 SYSCALL_DEFINE3(lchown, const char __user *, filename, uid_t, user, gid_t, group)
 {
-	return sys_fchownat(AT_FDCWD, filename, user, group,
-			    AT_SYMLINK_NOFOLLOW);
+	return do_fchownat(AT_FDCWD, filename, user, group,
+			   AT_SYMLINK_NOFOLLOW);
 }
 
-SYSCALL_DEFINE3(fchown, unsigned int, fd, uid_t, user, gid_t, group)
+int ksys_fchown(unsigned int fd, uid_t user, gid_t group)
 {
 	struct fd f = fdget(fd);
 	int error = -EBADF;
@@ -708,6 +714,11 @@ SYSCALL_DEFINE3(fchown, unsigned int, fd, uid_t, user, gid_t, group)
 	return error;
 }
 
+SYSCALL_DEFINE3(fchown, unsigned int, fd, uid_t, user, gid_t, group)
+{
+	return ksys_fchown(fd, user, group);
+}
+
 int open_check_o_direct(struct file *f)
 {
 	/* NB: we're sure to have correct a_ops only after f_op->open */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 1b453770cee8..fe9bac39f3c2 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -971,6 +971,7 @@ int ksys_chdir(const char __user *filename);
 int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
 			 unsigned int flags);
 int ksys_fchmod(unsigned int fd, umode_t mode);
+int ksys_fchown(unsigned int fd, uid_t user, gid_t group);
 
 /*
  * The following kernel syscall equivalents are just wrappers to fs-internal
@@ -1045,4 +1046,20 @@ static inline long ksys_ftruncate(unsigned int fd, unsigned long length)
 	return do_sys_ftruncate(fd, length, 1);
 }
 
+extern int do_fchownat(int dfd, const char __user *filename, uid_t user,
+		       gid_t group, int flag);
+
+static inline long ksys_chown(const char __user *filename, uid_t user,
+			      gid_t group)
+{
+	return do_fchownat(AT_FDCWD, filename, user, group, 0);
+}
+
+static inline long ksys_lchown(const char __user *filename, uid_t user,
+			       gid_t group)
+{
+	return do_fchownat(AT_FDCWD, filename, user, group,
+			     AT_SYMLINK_NOFOLLOW);
+}
+
 #endif
diff --git a/init/initramfs.c b/init/initramfs.c
index 237a975738ba..0d3b001b0dc5 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -343,7 +343,7 @@ static int __init do_name(void)
 			wfd = sys_open(collected, openflags, mode);
 
 			if (wfd >= 0) {
-				sys_fchown(wfd, uid, gid);
+				ksys_fchown(wfd, uid, gid);
 				ksys_fchmod(wfd, mode);
 				if (body_len)
 					ksys_ftruncate(wfd, body_len);
@@ -353,14 +353,14 @@ static int __init do_name(void)
 		}
 	} else if (S_ISDIR(mode)) {
 		ksys_mkdir(collected, mode);
-		sys_chown(collected, uid, gid);
+		ksys_chown(collected, uid, gid);
 		ksys_chmod(collected, mode);
 		dir_add(collected, mtime);
 	} else if (S_ISBLK(mode) || S_ISCHR(mode) ||
 		   S_ISFIFO(mode) || S_ISSOCK(mode)) {
 		if (maybe_link() == 0) {
 			ksys_mknod(collected, mode, rdev);
-			sys_chown(collected, uid, gid);
+			ksys_chown(collected, uid, gid);
 			ksys_chmod(collected, mode);
 			do_utime(collected, mtime);
 		}
@@ -393,7 +393,7 @@ static int __init do_symlink(void)
 	collected[N_ALIGN(name_len) + body_len] = '\0';
 	clean_path(collected, 0);
 	ksys_symlink(collected + N_ALIGN(name_len), collected);
-	sys_lchown(collected, uid, gid);
+	ksys_lchown(collected, uid, gid);
 	do_utime(collected, mtime);
 	state = SkipIt;
 	next_state = Reset;
diff --git a/kernel/uid16.c b/kernel/uid16.c
index ef1da2a5f9bd..ea3cf87ff000 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -20,17 +20,17 @@
 
 SYSCALL_DEFINE3(chown16, const char __user *, filename, old_uid_t, user, old_gid_t, group)
 {
-	return sys_chown(filename, low2highuid(user), low2highgid(group));
+	return ksys_chown(filename, low2highuid(user), low2highgid(group));
 }
 
 SYSCALL_DEFINE3(lchown16, const char __user *, filename, old_uid_t, user, old_gid_t, group)
 {
-	return sys_lchown(filename, low2highuid(user), low2highgid(group));
+	return ksys_lchown(filename, low2highuid(user), low2highgid(group));
 }
 
 SYSCALL_DEFINE3(fchown16, unsigned int, fd, old_uid_t, user, old_gid_t, group)
 {
-	return sys_fchown(fd, low2highuid(user), low2highgid(group));
+	return ksys_fchown(fd, low2highuid(user), low2highgid(group));
 }
 
 SYSCALL_DEFINE2(setregid16, old_gid_t, rgid, old_gid_t, egid)
-- 
2.16.2


* [PATCH v2 35/36] fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (33 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 34/36] fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 19:05 ` [PATCH v2 36/36] fs: add ksys_open() wrapper; remove in-kernel calls to sys_open() Dominik Brodowski
                   ` (2 subsequent siblings)
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using the ksys_close() wrapper allows us to get rid of in-kernel calls
to the sys_close() syscall.

The few places which checked the return value did not care about the return
value re-writing in sys_close(), so simply use a wrapper around
__close_fd().

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/autofs4/dev-ioctl.c   |  2 +-
 fs/binfmt_misc.c         |  2 +-
 fs/file.c                |  1 +
 fs/open.c                |  1 -
 include/linux/syscalls.h | 12 ++++++++++++
 init/do_mounts.c         |  4 ++--
 init/do_mounts_initrd.c  |  2 +-
 init/do_mounts_md.c      |  8 ++++----
 init/do_mounts_rd.c      |  6 +++---
 init/initramfs.c         |  8 ++++----
 10 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index b7c816f39404..26f6b4f41ce6 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -310,7 +310,7 @@ static int autofs_dev_ioctl_closemount(struct file *fp,
 				       struct autofs_sb_info *sbi,
 				       struct autofs_dev_ioctl *param)
 {
-	return sys_close(param->ioctlfd);
+	return ksys_close(param->ioctlfd);
 }
 
 /*
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index a7c5a9861bef..a41b48f82a70 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -241,7 +241,7 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	return retval;
 error:
 	if (fd_binary > 0)
-		sys_close(fd_binary);
+		ksys_close(fd_binary);
 	bprm->interp_flags = 0;
 	bprm->interp_data = 0;
 	goto ret;
diff --git a/fs/file.c b/fs/file.c
index d304004f0b65..7ffd6e9d103d 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -638,6 +638,7 @@ int __close_fd(struct files_struct *files, unsigned fd)
 	spin_unlock(&files->file_lock);
 	return -EBADF;
 }
+EXPORT_SYMBOL(__close_fd); /* for ksys_close() */
 
 void do_close_on_exec(struct files_struct *files)
 {
diff --git a/fs/open.c b/fs/open.c
index b3f3b2cd9f19..710102fc262b 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1200,7 +1200,6 @@ SYSCALL_DEFINE1(close, unsigned int, fd)
 
 	return retval;
 }
-EXPORT_SYMBOL(sys_close);
 
 /*
  * This routine simulates a hangup on the tty, to arrange that users
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index fe9bac39f3c2..4ff0b01ab277 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1062,4 +1062,16 @@ static inline long ksys_lchown(const char __user *filename, uid_t user,
 			     AT_SYMLINK_NOFOLLOW);
 }
 
+extern int __close_fd(struct files_struct *files, unsigned int fd);
+
+/*
+ * In contrast to sys_close(), this stub does not check whether the syscall
+ * should or should not be restarted, but returns the raw error codes from
+ * __close_fd().
+ */
+static inline int ksys_close(unsigned int fd)
+{
+	return __close_fd(current->files, fd);
+}
+
 #endif
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 89f18985fa90..a28dd42d1f84 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -492,7 +492,7 @@ void __init change_floppy(char *fmt, ...)
 	fd = sys_open("/dev/root", O_RDWR | O_NDELAY, 0);
 	if (fd >= 0) {
 		sys_ioctl(fd, FDEJECT, 0);
-		sys_close(fd);
+		ksys_close(fd);
 	}
 	printk(KERN_NOTICE "VFS: Insert %s and press ENTER\n", buf);
 	fd = sys_open("/dev/console", O_RDWR, 0);
@@ -503,7 +503,7 @@ void __init change_floppy(char *fmt, ...)
 		sys_read(fd, &c, 1);
 		termios.c_lflag |= ICANON;
 		sys_ioctl(fd, TCSETSF, (long)&termios);
-		sys_close(fd);
+		ksys_close(fd);
 	}
 }
 #endif
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index d30db6bbf014..50decd9999b7 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -111,7 +111,7 @@ static void __init handle_initrd(void)
 			error = fd;
 		} else {
 			error = sys_ioctl(fd, BLKFLSBUF, 0);
-			sys_close(fd);
+			ksys_close(fd);
 		}
 		printk(!error ? "okay\n" : "failed\n");
 	}
diff --git a/init/do_mounts_md.c b/init/do_mounts_md.c
index 3f733c760a8c..ebd4013d589e 100644
--- a/init/do_mounts_md.c
+++ b/init/do_mounts_md.c
@@ -191,7 +191,7 @@ static void __init md_setup_drive(void)
 			printk(KERN_WARNING
 			       "md: Ignoring md=%d, already autodetected. (Use raid=noautodetect)\n",
 			       minor);
-			sys_close(fd);
+			ksys_close(fd);
 			continue;
 		}
 
@@ -243,11 +243,11 @@ static void __init md_setup_drive(void)
 			 * boot a kernel with devfs compiled in from partitioned md
 			 * array without it
 			 */
-			sys_close(fd);
+			ksys_close(fd);
 			fd = sys_open(name, 0, 0);
 			sys_ioctl(fd, BLKRRPART, 0);
 		}
-		sys_close(fd);
+		ksys_close(fd);
 	}
 }
 
@@ -297,7 +297,7 @@ static void __init autodetect_raid(void)
 	fd = sys_open("/dev/md0", 0, 0);
 	if (fd >= 0) {
 		sys_ioctl(fd, RAID_AUTORUN, raid_autopart);
-		sys_close(fd);
+		ksys_close(fd);
 	}
 }
 
diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
index 5b69056f610a..f1aa341862d3 100644
--- a/init/do_mounts_rd.c
+++ b/init/do_mounts_rd.c
@@ -257,7 +257,7 @@ int __init rd_load_image(char *from)
 		if (i && (i % devblocks == 0)) {
 			printk("done disk #%d.\n", disk++);
 			rotate = 0;
-			if (sys_close(in_fd)) {
+			if (ksys_close(in_fd)) {
 				printk("Error closing the disk.\n");
 				goto noclose_input;
 			}
@@ -283,9 +283,9 @@ int __init rd_load_image(char *from)
 successful_load:
 	res = 1;
 done:
-	sys_close(in_fd);
+	ksys_close(in_fd);
 noclose_input:
-	sys_close(out_fd);
+	ksys_close(out_fd);
 out:
 	kfree(buf);
 	ksys_unlink("/dev/ram");
diff --git a/init/initramfs.c b/init/initramfs.c
index 0d3b001b0dc5..ce2bcad97cdf 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -373,7 +373,7 @@ static int __init do_copy(void)
 	if (byte_count >= body_len) {
 		if (xwrite(wfd, victim, body_len) != body_len)
 			error("write error");
-		sys_close(wfd);
+		ksys_close(wfd);
 		do_utime(vcollected, mtime);
 		kfree(vcollected);
 		eat(body_len);
@@ -574,7 +574,7 @@ static void __init clean_rootfs(void)
 	buf = kzalloc(BUF_SIZE, GFP_KERNEL);
 	WARN_ON(!buf);
 	if (!buf) {
-		sys_close(fd);
+		ksys_close(fd);
 		return;
 	}
 
@@ -602,7 +602,7 @@ static void __init clean_rootfs(void)
 		num = sys_getdents64(fd, dirp, BUF_SIZE);
 	}
 
-	sys_close(fd);
+	ksys_close(fd);
 	kfree(buf);
 }
 #endif
@@ -639,7 +639,7 @@ static int __init populate_rootfs(void)
 				pr_err("/initrd.image: incomplete write (%zd != %ld)\n",
 				       written, initrd_end - initrd_start);
 
-			sys_close(fd);
+			ksys_close(fd);
 			free_initrd();
 		}
 	done:
-- 
2.16.2


* [PATCH v2 36/36] fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (34 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 35/36] fs: add ksys_close() wrapper; remove in-kernel calls to sys_close() Dominik Brodowski
@ 2018-03-15 19:05 ` Dominik Brodowski
  2018-03-15 21:02 ` [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Arnd Bergmann
  2018-03-16  9:01 ` Zhang, Ning A
  37 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-15 19:05 UTC (permalink / raw)
  To: linux-kernel, torvalds, viro; +Cc: luto, mingo, akpm, arnd

Using this wrapper allows us to avoid the in-kernel calls to the
sys_open() syscall.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 fs/open.c                |  2 +-
 include/linux/syscalls.h | 11 +++++++++++
 init/do_mounts.c         |  4 ++--
 init/do_mounts_initrd.c  |  4 ++--
 init/do_mounts_md.c      |  6 +++---
 init/do_mounts_rd.c      |  6 +++---
 init/initramfs.c         |  6 +++---
 init/main.c              |  2 +-
 8 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 710102fc262b..8a42a2961130 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1151,7 +1151,7 @@ COMPAT_SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, fla
  */
 SYSCALL_DEFINE2(creat, const char __user *, pathname, umode_t, mode)
 {
-	return sys_open(pathname, O_CREAT | O_WRONLY | O_TRUNC, mode);
+	return ksys_open(pathname, O_CREAT | O_WRONLY | O_TRUNC, mode);
 }
 
 #endif
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 4ff0b01ab277..6976c9e140db 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1074,4 +1074,15 @@ static inline int ksys_close(unsigned int fd)
 	return __close_fd(current->files, fd);
 }
 
+extern long do_sys_open(int dfd, const char __user *filename, int flags,
+			umode_t mode);
+
+static inline long ksys_open(const char __user *filename, int flags,
+			     umode_t mode)
+{
+	if (force_o_largefile())
+		flags |= O_LARGEFILE;
+	return do_sys_open(AT_FDCWD, filename, flags, mode);
+}
+
 #endif
diff --git a/init/do_mounts.c b/init/do_mounts.c
index a28dd42d1f84..cc1103477071 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -489,13 +489,13 @@ void __init change_floppy(char *fmt, ...)
 	va_start(args, fmt);
 	vsprintf(buf, fmt, args);
 	va_end(args);
-	fd = sys_open("/dev/root", O_RDWR | O_NDELAY, 0);
+	fd = ksys_open("/dev/root", O_RDWR | O_NDELAY, 0);
 	if (fd >= 0) {
 		sys_ioctl(fd, FDEJECT, 0);
 		ksys_close(fd);
 	}
 	printk(KERN_NOTICE "VFS: Insert %s and press ENTER\n", buf);
-	fd = sys_open("/dev/console", O_RDWR, 0);
+	fd = ksys_open("/dev/console", O_RDWR, 0);
 	if (fd >= 0) {
 		sys_ioctl(fd, TCGETS, (long)&termios);
 		termios.c_lflag &= ~ICANON;
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 50decd9999b7..1e3b469fb54b 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -38,7 +38,7 @@ static int init_linuxrc(struct subprocess_info *info, struct cred *new)
 {
 	ksys_unshare(CLONE_FS | CLONE_FILES);
 	/* stdin/stdout/stderr for /linuxrc */
-	sys_open("/dev/console", O_RDWR, 0);
+	ksys_open("/dev/console", O_RDWR, 0);
 	ksys_dup(0);
 	ksys_dup(0);
 	/* move initrd over / and chdir/chroot in initrd root */
@@ -99,7 +99,7 @@ static void __init handle_initrd(void)
 	if (!error)
 		printk("okay\n");
 	else {
-		int fd = sys_open("/dev/root.old", O_RDWR, 0);
+		int fd = ksys_open("/dev/root.old", O_RDWR, 0);
 		if (error == -ENOENT)
 			printk("/initrd does not exist. Ignored.\n");
 		else
diff --git a/init/do_mounts_md.c b/init/do_mounts_md.c
index ebd4013d589e..76dcfaada3ed 100644
--- a/init/do_mounts_md.c
+++ b/init/do_mounts_md.c
@@ -181,7 +181,7 @@ static void __init md_setup_drive(void)
 			partitioned ? "_d" : "", minor,
 			md_setup_args[ent].device_names);
 
-		fd = sys_open(name, 0, 0);
+		fd = ksys_open(name, 0, 0);
 		if (fd < 0) {
 			printk(KERN_ERR "md: open failed - cannot start "
 					"array %s\n", name);
@@ -244,7 +244,7 @@ static void __init md_setup_drive(void)
 			 * array without it
 			 */
 			ksys_close(fd);
-			fd = sys_open(name, 0, 0);
+			fd = ksys_open(name, 0, 0);
 			sys_ioctl(fd, BLKRRPART, 0);
 		}
 		ksys_close(fd);
@@ -294,7 +294,7 @@ static void __init autodetect_raid(void)
 
 	wait_for_device_probe();
 
-	fd = sys_open("/dev/md0", 0, 0);
+	fd = ksys_open("/dev/md0", 0, 0);
 	if (fd >= 0) {
 		sys_ioctl(fd, RAID_AUTORUN, raid_autopart);
 		ksys_close(fd);
diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
index f1aa341862d3..a6706314baa7 100644
--- a/init/do_mounts_rd.c
+++ b/init/do_mounts_rd.c
@@ -196,11 +196,11 @@ int __init rd_load_image(char *from)
 	char rotator[4] = { '|' , '/' , '-' , '\\' };
 #endif
 
-	out_fd = sys_open("/dev/ram", O_RDWR, 0);
+	out_fd = ksys_open("/dev/ram", O_RDWR, 0);
 	if (out_fd < 0)
 		goto out;
 
-	in_fd = sys_open(from, O_RDONLY, 0);
+	in_fd = ksys_open(from, O_RDONLY, 0);
 	if (in_fd < 0)
 		goto noclose_input;
 
@@ -262,7 +262,7 @@ int __init rd_load_image(char *from)
 				goto noclose_input;
 			}
 			change_floppy("disk #%d", disk);
-			in_fd = sys_open(from, O_RDONLY, 0);
+			in_fd = ksys_open(from, O_RDONLY, 0);
 			if (in_fd < 0)  {
 				printk("Error opening disk.\n");
 				goto noclose_input;
diff --git a/init/initramfs.c b/init/initramfs.c
index ce2bcad97cdf..5f2ff1d2370e 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -340,7 +340,7 @@ static int __init do_name(void)
 			int openflags = O_WRONLY|O_CREAT;
 			if (ml != 1)
 				openflags |= O_TRUNC;
-			wfd = sys_open(collected, openflags, mode);
+			wfd = ksys_open(collected, openflags, mode);
 
 			if (wfd >= 0) {
 				ksys_fchown(wfd, uid, gid);
@@ -567,7 +567,7 @@ static void __init clean_rootfs(void)
 	struct linux_dirent64 *dirp;
 	int num;
 
-	fd = sys_open("/", O_RDONLY, 0);
+	fd = ksys_open("/", O_RDONLY, 0);
 	WARN_ON(fd < 0);
 	if (fd < 0)
 		return;
@@ -629,7 +629,7 @@ static int __init populate_rootfs(void)
 		}
 		printk(KERN_INFO "rootfs image is not initramfs (%s)"
 				"; looks like an initrd\n", err);
-		fd = sys_open("/initrd.image",
+		fd = ksys_open("/initrd.image",
 			      O_WRONLY|O_CREAT, 0700);
 		if (fd >= 0) {
 			ssize_t written = xwrite(fd, (char *)initrd_start,
diff --git a/init/main.c b/init/main.c
index d0ded4322c6b..e77951ae2c19 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1074,7 +1074,7 @@ static noinline void __init kernel_init_freeable(void)
 	do_basic_setup();
 
 	/* Open the /dev/console on the rootfs, this should never fail */
-	if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
+	if (ksys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
 		pr_err("Warning: unable to open an initial console.\n");
 
 	(void) ksys_dup(0);
-- 
2.16.2


* Re: [PATCH v2 14/36] fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
  2018-03-15 19:05 ` [PATCH v2 14/36] fs: add ksys_mount() helper; remove in-kernel calls to sys_mount() Dominik Brodowski
@ 2018-03-15 20:11   ` Arnd Bergmann
  2018-03-16  8:46     ` Christoph Hellwig
                       ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Arnd Bergmann @ 2018-03-15 20:11 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: Linux Kernel Mailing List, Linus Torvalds, Al Viro,
	Andy Lutomirski, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 8:05 PM, Dominik Brodowski
<linux@dominikbrodowski.net> wrote:
> Using this helper allows us to avoid the in-kernel calls to the sys_mount()
> syscall.
>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>

> diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> index 50025d7959cb..4afb04686c8e 100644
> --- a/drivers/base/devtmpfs.c
> +++ b/drivers/base/devtmpfs.c
> @@ -356,7 +356,8 @@ int devtmpfs_mount(const char *mntdir)
>         if (!thread)
>                 return 0;
>
> -       err = sys_mount("devtmpfs", (char *)mntdir, "devtmpfs", MS_SILENT, NULL);
> +       err = ksys_mount("devtmpfs", (char *)mntdir, "devtmpfs", MS_SILENT,
> +                        NULL);
>         if (err)
>                 printk(KERN_INFO "devtmpfs: error mounting %i\n", err);
>         else
> @@ -382,7 +383,7 @@ static int devtmpfsd(void *p)
>         *err = sys_unshare(CLONE_NEWNS);
>         if (*err)
>                 goto out;
> -       *err = sys_mount("devtmpfs", "/", "devtmpfs", MS_SILENT, options);
> +       *err = ksys_mount("devtmpfs", "/", "devtmpfs", MS_SILENT, options);
>         if (*err)
>                 goto out;
>         sys_chdir("/.."); /* will traverse into overmounted root */

Shouldn't the callers of sys_mount just call do_mount() instead?

As I understand it, sys_mount is already a wrapper around do_mount()
that copies its arguments from user space, but we don't need that
when called from inside the kernel.
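
A rough userspace model of that layering, only to illustrate the point:
model_sys_mount(), model_do_mount() and model_copy() are made-up names,
the heap copy stands in for the copy from user space, and the real
signatures in fs/namespace.c differ.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical core helper: works on strings that are already "in the kernel". */
static int model_do_mount(const char *dev, const char *dir, const char *type)
{
	if (!dev || !dir || !type)
		return -EINVAL;
	/* Pretend that devtmpfs is the only filesystem this toy model knows. */
	return strcmp(type, "devtmpfs") == 0 ? 0 : -ENODEV;
}

/* Stand-in for copying a string in from user space; NULL on allocation failure. */
static char *model_copy(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/*
 * Hypothetical syscall-style wrapper: copies every argument first, then
 * calls the core helper on the copies.
 */
static int model_sys_mount(const char *udev, const char *udir, const char *utype)
{
	char *dev = model_copy(udev);
	char *dir = model_copy(udir);
	char *type = model_copy(utype);
	int ret = -ENOMEM;

	if (dev && dir && type)
		ret = model_do_mount(dev, dir, type);
	free(dev);
	free(dir);
	free(type);
	return ret;
}
```

A caller that already holds kernel strings gets the same result from the
core helper directly and skips the three copies.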

       Arnd


* Re: [PATCH v2 24/36] fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
  2018-03-15 19:05 ` [PATCH v2 24/36] fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink() Dominik Brodowski
@ 2018-03-15 20:21   ` Arnd Bergmann
  2018-03-17 17:09     ` Dominik Brodowski
  0 siblings, 1 reply; 76+ messages in thread
From: Arnd Bergmann @ 2018-03-15 20:21 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: Linux Kernel Mailing List, Linus Torvalds, Al Viro,
	Andy Lutomirski, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 8:05 PM, Dominik Brodowski
<linux@dominikbrodowski.net> wrote:
> Using this wrapper allows us to avoid the in-kernel calls to the
> sys_unlink() syscall.
>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
> ---
>  include/linux/syscalls.h | 11 +++++++++++
>  init/do_mounts.h         |  2 +-
>  init/do_mounts_initrd.c  |  4 ++--
>  init/do_mounts_rd.c      |  2 +-
>  init/initramfs.c         |  4 ++--
>  5 files changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 8f0f99702e7a..31aea3873de7 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -971,4 +971,15 @@ int ksys_chdir(const char __user *filename);
>  int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
>                          unsigned int flags);
>
> +/*
> + * The following kernel syscall equivalents are just wrappers to fs-internal
> + * functions. Therefore, provide stubs to be inlined at the callsites.
> + */
> +extern long do_unlinkat(int dfd, struct filename *name);
> +
> +static inline long ksys_unlink(const char __user *pathname)
> +{
> +       return do_unlinkat(AT_FDCWD, getname(pathname));
> +}

Why does this take a __user pointer?

>  static inline int create_dev(char *name, dev_t dev)
>  {
> -       sys_unlink(name);
> +       ksys_unlink(name);
>         return sys_mknod(name, S_IFBLK|0600, new_encode_dev(dev));
>  }
>
> diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
> index c19d9070134e..784576b633fd 100644
> --- a/init/do_mounts_initrd.c
> +++ b/init/do_mounts_initrd.c
> @@ -128,11 +128,11 @@ bool __init initrd_load(void)
>                  * mounted in the normal path.
>                  */
>                 if (rd_load_image("/initrd.image") && ROOT_DEV != Root_RAM0) {
> -                       sys_unlink("/initrd.image");
> +                       ksys_unlink("/initrd.image");
>                         handle_initrd();
>                         return true;
>                 }
>         }
> -       sys_unlink("/initrd.image");
> +       ksys_unlink("/initrd.image");
>         return false;

In all callers we seem to have regular kernel strings, so I think
you should skip the getname() and change the argument to
a regular pointer.

      Arnd


* Re: [PATCH v2 30/36] fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel calls to syscall
  2018-03-15 19:05 ` [PATCH v2 30/36] fs: add do_linkat() helper and ksys_link() " Dominik Brodowski
@ 2018-03-15 20:30   ` Arnd Bergmann
  2018-03-17 17:11     ` Dominik Brodowski
  0 siblings, 1 reply; 76+ messages in thread
From: Arnd Bergmann @ 2018-03-15 20:30 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: Linux Kernel Mailing List, Linus Torvalds, Al Viro,
	Andy Lutomirski, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 8:05 PM, Dominik Brodowski
<linux@dominikbrodowski.net> wrote:
>
>   */
> -SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
> -               int, newdfd, const char __user *, newname, int, flags)
> +int do_linkat(int olddfd, const char __user *oldname, int newdfd,
> +             const char __user *newname, int flags)
>  {
>         struct dentry *new_dentry;
>         struct path old_path, new_path;

For consistency with other do_*() functions, I think it would be nice
to make this one not take a __user pointer either. However, I
have no idea how to do that without making the common case worse.

> --- a/init/initramfs.c
> +++ b/init/initramfs.c
> @@ -306,7 +306,7 @@ static int __init maybe_link(void)
>         if (nlink >= 2) {
>                 char *old = find_link(major, minor, ino, mode, collected);
>                 if (old)
> -                       return (sys_link(old, collected) < 0) ? -1 : 1;
> +                       return (ksys_link(old, collected) < 0) ? -1 : 1;
>         }
>         return 0;
>  }

Since this is the only caller outside of fs/namei.c, maybe it can be
changed to use vfs_link() instead? That might still be a larger rework
than you want to do.

        Arnd


* Re: [PATCH v2 17/36] fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
  2018-03-15 19:05 ` [PATCH v2 17/36] fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot() Dominik Brodowski
@ 2018-03-15 20:44   ` Arnd Bergmann
  2018-03-16  8:49     ` Christoph Hellwig
  2018-03-17 17:04     ` Dominik Brodowski
  0 siblings, 2 replies; 76+ messages in thread
From: Arnd Bergmann @ 2018-03-15 20:44 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: Linux Kernel Mailing List, Linus Torvalds, Al Viro,
	Andy Lutomirski, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 8:05 PM, Dominik Brodowski
<linux@dominikbrodowski.net> wrote:
> Using this helper allows us to avoid the in-kernel calls to the sys_chroot()
> syscall.
>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
> ---
>  drivers/base/devtmpfs.c  | 2 +-
>  fs/open.c                | 7 ++++++-
>  include/linux/syscalls.h | 1 +
>  init/do_mounts.c         | 2 +-
>  init/do_mounts_initrd.c  | 4 ++--
>  5 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> index 4afb04686c8e..5743f04014ca 100644
> --- a/drivers/base/devtmpfs.c
> +++ b/drivers/base/devtmpfs.c
> @@ -387,7 +387,7 @@ static int devtmpfsd(void *p)
>         if (*err)
>                 goto out;
>         sys_chdir("/.."); /* will traverse into overmounted root */
> -       sys_chroot(".");
> +       ksys_chroot(".");
>         complete(&setup_done);
>         while (1) {
>                 spin_lock(&req_lock);

Could this be done using kern_path()/set_fs_root() instead so we
avoid the __user pointer?

       Arnd


* Re: [PATCH v2 21/36] mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
  2018-03-15 19:05 ` [PATCH v2 21/36] mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() Dominik Brodowski
@ 2018-03-15 20:54   ` Arnd Bergmann
  0 siblings, 0 replies; 76+ messages in thread
From: Arnd Bergmann @ 2018-03-15 20:54 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: Linux Kernel Mailing List, Linus Torvalds, Al Viro,
	Andy Lutomirski, Ingo Molnar, Andrew Morton, Linux-MM

On Thu, Mar 15, 2018 at 8:05 PM, Dominik Brodowski
<linux@dominikbrodowski.net> wrote:
> Using this helper allows us to avoid the in-kernel calls to the
> sys_mmap_pgoff() syscall.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>

It might be a good idea to clean up the sys_mmap2()/sys_mmap_pgoff()
distinction as well: From what I understand (I'm sure Al will correct me
if this is wrong), all 32-bit architectures have a sys_mmap2() syscall
that has a fixed bit shift value, possibly always 12.
sys_mmap_pgoff() is defined to have a shift of PAGE_SHIFT, which
may or may not depend on the kernel configuration.

If we replace the

+SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
+               unsigned long, prot, unsigned long, flags,
+               unsigned long, fd, unsigned long, pgoff)
+{
+       return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+}

with a corresponding sys_mmap2() definition, it seems we can
simplify a number of architectures that today need to define
sys_mmap2() as a wrapper around sys_mmap_pgoff().
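
As a sketch of the arithmetic such a wrapper has to do, assuming mmap2()
offsets always come in 4096-byte units (shift 12) and using the
hypothetical MODEL_PAGE_SHIFT in place of PAGE_SHIFT for a 64K-page
configuration:

```c
#include <errno.h>

#define MMAP2_SHIFT		12	/* assumed fixed unit of mmap2() offsets */
#define MODEL_PAGE_SHIFT	16	/* hypothetical 64K-page configuration */

/*
 * Convert an offset given in 4096-byte units into an offset in pages.
 * Returns the page offset, or -EINVAL if the offset is not aligned to
 * the (larger) page size.
 */
static long mmap2_off_to_pgoff(unsigned long off_4k)
{
	const unsigned int delta = MODEL_PAGE_SHIFT - MMAP2_SHIFT;

	if (off_4k & ((1UL << delta) - 1))
		return -EINVAL;
	return (long)(off_4k >> delta);
}
```

With MODEL_PAGE_SHIFT set to 12 the difference is zero, the alignment
check never fails and the conversion is the identity, which is the case
where the two syscalls behave the same.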

        Arnd


* Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (35 preceding siblings ...)
  2018-03-15 19:05 ` [PATCH v2 36/36] fs: add ksys_open() wrapper; remove in-kernel calls to sys_open() Dominik Brodowski
@ 2018-03-15 21:02 ` Arnd Bergmann
  2018-03-16  0:38   ` Andy Lutomirski
  2018-03-17 17:13   ` Dominik Brodowski
  2018-03-16  9:01 ` Zhang, Ning A
  37 siblings, 2 replies; 76+ messages in thread
From: Arnd Bergmann @ 2018-03-15 21:02 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: Linux Kernel Mailing List, Linus Torvalds, Al Viro,
	Andy Lutomirski, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 8:04 PM, Dominik Brodowski
<linux@dominikbrodowski.net> wrote:
> Here is a re-spin of the first set of patches which reduce the number of
> syscall invocations from within the kernel; the RFC may be found at
>
> The rationale for this change is described in patch 1 as follows:
>
>         The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
>         and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
>         through kernel entry points, but not from the kernel itself. This
>         will allow cleanups and optimizations to the entry paths *and* to
>         the parts of the kernel code which currently need to pretend to be
>         userspace in order to make use of syscalls.
>
> The whole series can be found at
>
>         https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-next
>
> and will be submitted for merging for the v4.17-rc1 cycle, probably together
> with another batch of related patches I hope to send out tomorrow as a RFC.

Nice work!

I've already commented on a few patches that now have a kernel-internal
helper function that takes a __user pointer. I think those are all only used
in the early boot code (initramfs etc) that runs before we set_fs() to the
user address space, but it also causes warnings with sparse. If we
can change all of them to take kernel pointers, that would let us avoid
the sparse warnings and start running with a normal user address space
view. Unfortunately, some of the syscalls seem to be harder to change to
that than others, so I'm not sure if it's worth the effort.
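
To make the sparse point concrete, here is a small standalone sketch. The
two annotation macros follow the kernel's definitions for sparse
(__CHECKER__); model_name_len() and model_boot_caller() are hypothetical
and only mimic the shape of a ksys_*() helper and an early-boot caller:

```c
#include <stddef.h>
#include <string.h>

/*
 * Under sparse (__CHECKER__), __user places a pointer in a separate
 * address space; for a normal compiler the annotations expand to nothing.
 */
#ifdef __CHECKER__
# define __user		__attribute__((noderef, address_space(1)))
# define __force	__attribute__((force))
#else
# define __user
# define __force
#endif

/* Hypothetical helper in the style of the ksys_*() wrappers. */
static size_t model_name_len(const char __user *name)
{
	/* A real kernel helper would copy the string in first; the toy just casts. */
	return strlen((const char __force *)name);
}

/* Hypothetical early-boot caller that passes a kernel string. */
static size_t model_boot_caller(void)
{
	/* Without the __force cast, sparse warns about the address space. */
	return model_name_len((const char __force __user *)"/dev/console");
}
```

A helper that took a plain const char * would need neither cast, which is
the cleanup suggested above.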

Another open question is what to do with the declarations in include/linux/syscalls.h.
These serve as a help for type-checking today, making sure that
each syscall we refer to from either the syscall table or called
by some kernel function uses the same prototype that matches
the syscall definition, which raises the question of whether we want
to keep the header around at all.

        Arnd


* Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)
  2018-03-15 21:02 ` [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Arnd Bergmann
@ 2018-03-16  0:38   ` Andy Lutomirski
  2018-03-16  0:54     ` Linus Torvalds
  2018-03-17 17:13   ` Dominik Brodowski
  1 sibling, 1 reply; 76+ messages in thread
From: Andy Lutomirski @ 2018-03-16  0:38 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Dominik Brodowski, Linux Kernel Mailing List, Linus Torvalds,
	Al Viro, Andy Lutomirski, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 9:02 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thu, Mar 15, 2018 at 8:04 PM, Dominik Brodowski
> <linux@dominikbrodowski.net> wrote:
>> Here is a re-spin of the first set of patches which reduce the number of
>> syscall invocations from within the kernel; the RFC may be found at
>>
>> The rationale for this change is described in patch 1 as follows:
>>
>>         The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
>>         and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
>>         through kernel entry points, but not from the kernel itself. This
>>         will allow cleanups and optimizations to the entry paths *and* to
>>         the parts of the kernel code which currently need to pretend to be
>>         userspace in order to make use of syscalls.
>>
>> The whole series can be found at
>>
>>         https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-next
>>
>> and will be submitted for merging for the v4.17-rc1 cycle, probably together
>> with another batch of related patches I hope to send out tomorrow as a RFC.
>
> Nice work!
>
> I've already commented on a few patches that now have a kernel-internal
> helper function that takes a __user pointer. I think those are all only used
> in the early boot code (initramfs etc) that runs before we set_fs() to the
> user address space, but it also causes warnings with sparse. If we
> can change all of them to take kernel pointers, that would let us avoid
> the sparse warnings and start running with a normal user address space
> view. Unfortunately, some of the syscalls seem to be harder to change to
> that than others, so I'm not sure if it's worth the effort.

It would be fantastic to get rid of set_fs() entirely and make it
impossible for get_user(), etc to ever access kernel memory.  And this
effort is necessary to ever achieve that.
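
As a toy model of why set_fs() is the problem (userspace C, all names and
limits are made up for illustration; this is not how any particular
architecture implements it):

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical address limits for the toy model. */
#define MODEL_USER_DS	UINT64_C(0x0000800000000000)
#define MODEL_KERNEL_DS	UINT64_MAX

/* Models the per-task address limit that set_fs() changes. */
static uint64_t model_addr_limit = MODEL_USER_DS;

static void model_set_fs(uint64_t limit)
{
	model_addr_limit = limit;
}

/* Accept a range only if it fits at or below the current limit. */
static int model_access_ok(uint64_t addr, uint64_t size)
{
	return size <= model_addr_limit && addr <= model_addr_limit - size;
}

/* Models get_user(): 0 if the one-byte access is allowed, -EFAULT otherwise. */
static int model_get_user(uint64_t addr)
{
	return model_access_ok(addr, 1) ? 0 : -EFAULT;
}
```

In this model a kernel-range address is rejected under MODEL_USER_DS but
accepted once model_set_fs(MODEL_KERNEL_DS) has been called; removing
set_fs() would remove that second case.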

I don't think this patch series should wait for any of these cleanups,
though.  We need these patches to change the x86_64 internal syscall
function signature, which we've been wanting to do for a little while.


* Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)
  2018-03-16  0:38   ` Andy Lutomirski
@ 2018-03-16  0:54     ` Linus Torvalds
  2018-03-16  8:54       ` Christoph Hellwig
  0 siblings, 1 reply; 76+ messages in thread
From: Linus Torvalds @ 2018-03-16  0:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Arnd Bergmann, Dominik Brodowski, Linux Kernel Mailing List,
	Al Viro, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 5:38 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> I don't think this patch series should wait for any of these cleanups,
> though.  We need these patches to change the x86_64 internal syscall
> function signature, which we've been wanting to do for a little while.

Yes. And honestly, I'd rather have these kinds of "just change the
calling convention" almost automated patches separately - and then the
cleanups later.

Mixing the calling convention change and the cleanup together is just
confusing and potentially causes subtle issues.

                         Linus


* Re: [PATCH v2 13/36] x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
  2018-03-15 19:05 ` [PATCH v2 13/36] x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() Dominik Brodowski
@ 2018-03-16  8:43   ` Christoph Hellwig
  2018-03-16 11:13     ` Dominik Brodowski
  2018-03-16 12:00   ` Thomas Gleixner
  1 sibling, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2018-03-16  8:43 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd,
	Thomas Gleixner, Ingo Molnar, Jiri Slaby, x86

On Thu, Mar 15, 2018 at 08:05:06PM +0100, Dominik Brodowski wrote:
> Using this helper allows us to avoid the in-kernel calls to the sys_ioperm()
> syscall.

Why not do_ioperm or kernel_ioperm as for most other syscalls?


* Re: [PATCH v2 14/36] fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
  2018-03-15 20:11   ` Arnd Bergmann
@ 2018-03-16  8:46     ` Christoph Hellwig
  2018-03-16 16:58     ` Linus Torvalds
  2018-03-17 16:52     ` Dominik Brodowski
  2 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2018-03-16  8:46 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Dominik Brodowski, Linux Kernel Mailing List, Linus Torvalds,
	Al Viro, Andy Lutomirski, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 09:11:27PM +0100, Arnd Bergmann wrote:
> Shouldn't the callers of sys_mount just call do_mount() instead?
> 
> As I understand it, sys_mount is already a wrapper around do_mount()
> that copies its arguments from user space, but we don't need that
> when called from inside the kernel.

In general yes.  do_mount.c has some really strange calling context
where it tries to operate on kernel and user pointers interchangeably,
but even with that just switching to do_mount seems like the right thing
to me.  In fact once we do that and take care of chdir/chroot we could
probably get rid of the sparse disable hack in favour of a few __force
casts in change_floppy and sort this mess out as well.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 15/36] fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
  2018-03-15 19:05 ` [PATCH v2 15/36] fs: add ksys_umount() helper; remove in-kernel call to sys_umount() Dominik Brodowski
@ 2018-03-16  8:47   ` Christoph Hellwig
  2018-03-17 16:58     ` Dominik Brodowski
  0 siblings, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2018-03-16  8:47 UTC (permalink / raw)
  To: Dominik Brodowski; +Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd

On Thu, Mar 15, 2018 at 08:05:08PM +0100, Dominik Brodowski wrote:
> Using this helper allows us to avoid the in-kernel call to the sys_umount()
> syscall.

kern_unmount, please.  And make it operate on kernel pointers please.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 16/36] fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
  2018-03-15 19:05 ` [PATCH v2 16/36] fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}() Dominik Brodowski
@ 2018-03-16  8:48   ` Christoph Hellwig
  2018-03-17 17:01     ` Dominik Brodowski
  0 siblings, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2018-03-16  8:48 UTC (permalink / raw)
  To: Dominik Brodowski; +Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd

On Thu, Mar 15, 2018 at 08:05:09PM +0100, Dominik Brodowski wrote:
> Using ksys_dup() and ksys_dup3() as helper functions allows us to
> avoid the in-kernel calls to the sys_dup() and sys_dup3() syscalls.

do_dup/dup3 or kern_dup/dup3, please.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 17/36] fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
  2018-03-15 20:44   ` Arnd Bergmann
@ 2018-03-16  8:49     ` Christoph Hellwig
  2018-03-17 17:04     ` Dominik Brodowski
  1 sibling, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2018-03-16  8:49 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Dominik Brodowski, Linux Kernel Mailing List, Linus Torvalds,
	Al Viro, Andy Lutomirski, Ingo Molnar, Andrew Morton

> > +       ksys_chroot(".");
> >         complete(&setup_done);
> >         while (1) {
> >                 spin_lock(&req_lock);
> 
> Could this be done using kern_path()/set_fs_root() instead so we
> avoid the __user pointer?

Agreed.  Especially as we don't need any of the permission checks here.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 18/36] fs: add ksys_write() helper; remove in-kernel calls to sys_write()
  2018-03-15 19:05 ` [PATCH v2 18/36] fs: add ksys_write() helper; remove in-kernel calls to sys_write() Dominik Brodowski
@ 2018-03-16  8:52   ` Christoph Hellwig
  2018-03-17 17:06     ` Dominik Brodowski
  0 siblings, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2018-03-16  8:52 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd, linux-s390

I really don't like this, as this is the wrong level of abstraction.

> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
> index 79b7a3438d54..5a9cfde5fc28 100644
> --- a/arch/s390/kernel/compat_linux.c
> +++ b/arch/s390/kernel/compat_linux.c
> @@ -468,7 +468,7 @@ COMPAT_SYSCALL_DEFINE3(s390_write, unsigned int, fd, const char __user *, buf, c
>  	if ((compat_ssize_t) count < 0)
>  		return -EINVAL; 
>  
> -	return sys_write(fd, buf, count);
> +	return ksys_write(fd, buf, count);
>  }

This looks bogus to me.  Why does s390 have its own compat version of
write but not any of the other read and write family calls?

> diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
> index 99e0b649fc0e..2d365c398ccc 100644
> --- a/init/do_mounts_rd.c
> +++ b/init/do_mounts_rd.c
> @@ -270,7 +270,7 @@ int __init rd_load_image(char *from)
>  			printk("Loading disk #%d... ", disk);
>  		}
>  		sys_read(in_fd, buf, BLOCK_SIZE);
> -		sys_write(out_fd, buf, BLOCK_SIZE);
> +		ksys_write(out_fd, buf, BLOCK_SIZE);
>  #if !defined(CONFIG_S390)
>  		if (!(i % 16)) {
>  			pr_cont("%c\b", rotator[rotate & 0x3]);

All the do_mounts / initramfs code should be rewritten to use filp_open
and vfs_read/vfs_write instead of adding hacks like this.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)
  2018-03-16  0:54     ` Linus Torvalds
@ 2018-03-16  8:54       ` Christoph Hellwig
  2018-03-16 14:20         ` Al Viro
  0 siblings, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2018-03-16  8:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Arnd Bergmann, Dominik Brodowski,
	Linux Kernel Mailing List, Al Viro, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 05:54:27PM -0700, Linus Torvalds wrote:
> Yes. And honestly, I'd rather have these kinds of "just change the
> calling convention" almost automated patches separately - and then the
> cleanups later.
> 
> Mixing the calling convention change and the cleanup together is just
> confusing and potentially causes subtle issues.

A lot of the issues here come from the initramfs / do_mount code being
written as if it was user space code, but in kernel space.  E.g.
using file descriptors etc.  I think doing one or a few patches
before this series to sort this out would really reduce the scope
of work and be the right thing.  For any additional minor cleanups
I agree that it might make sense to postpone them.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)
  2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
                   ` (36 preceding siblings ...)
  2018-03-15 21:02 ` [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Arnd Bergmann
@ 2018-03-16  9:01 ` Zhang, Ning A
  2018-03-16 10:25   ` Dominik Brodowski
  37 siblings, 1 reply; 76+ messages in thread
From: Zhang, Ning A @ 2018-03-16  9:01 UTC (permalink / raw)
  To: torvalds, linux, linux-kernel, viro; +Cc: mingo, luto, akpm, arnd

On Thu, 2018-03-15 at 20:04 +0100, Dominik Brodowski wrote:
> Here is a re-spin of the first set of patches which reduce the number of
> syscall invocations from within the kernel; the RFC may be found at
> 
> The rationale for this change is described in patch 1 as follows:
> 
> 	The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
> 	and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
> 	through kernel entry points, but not from the kernel itself. This
> 	will allow cleanups and optimizations to the entry paths *and* to
> 	the parts of the kernel code which currently need to pretend to be
> 	userspace in order to make use of syscalls.

I think it is really bad to change syscalls one by one to do_*.

Why not change SYSCALL_DEFINEx to define the kernel wrappers?


> 
> The whole series can be found at
> 
> 	https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-next
> 
> and will be submitted for merging for the v4.17-rc1 cycle, probably together
> with another batch of related patches I hope to send out tomorrow as a RFC.
> 
> Changes since the RFC / v1:
> 
> - rebase to v4.15-rc5; sys_ioperm already got its SYSCALL_DEFINE3
> - add ACKs
> - CC: -> Cc: (suggested by Ingo Molnar)
> - update comment in include/linux/syscalls.h (suggested by Ingo Molnar and
> 	Andy Lutomirski)
> - separate declarations from definitions with newlines in
> 	include/linux/syscalls.h; add comment on ksys_close() (suggested by
> 	Ingo Molnar)
> - expand commit messages (suggested by Christoph Hellwig)
> - include patch 36:
> 	fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
> - do not worry about the following archs, as they are going away:
> 	cris, frv, metag, mn10300, score, tile
> 	(solving conflicts in -next)
> - fix builds with CONFIG_FUTEX=n, CONFIG_ADVISE_SYSCALLS=n (solving issues
> 	found by Stephen Rothwell)
> 
> Thanks,
> 	Dominik
> 
> 
> Dominik Brodowski (36):
>   syscalls: define goal to not call sys_xyzzy() from within the kernel
>   kernel: use kernel_wait4() instead of sys_wait4()
>   mm: use do_futex() instead of sys_futex() in mm_release()
>   kernel: add do_getpgid() helper; remove internal call to sys_getpgid()
>   fs: add do_readlinkat() helper; remove internal call to
>     sys_readlinkat()
>   fs: add do_pipe2() helper; remove internal call to sys_pipe2()
>   fs: add do_renameat2() helper; remove internal call to sys_renameat2()
>   fs: add do_futimesat() helper; remove internal call to sys_futimesat()
>   syscalls: add do_epoll_*() helpers; remove internal calls to
>     sys_epoll_*()
>   fs: add do_signalfd4() helper; remove internal calls to
>     sys_signalfd4()
>   fs: add do_eventfd() helper; remove internal call to sys_eventfd()
>   kernel: open-code sys_rt_sigpending() in sys_sigpending()
>   x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to
>     sys_ioperm()
>   fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
>   fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
>   fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
>   fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
>   fs: add ksys_write() helper; remove in-kernel calls to sys_write()
>   kernel: add ksys_unshare() helper; remove in-kernel calls to
>     sys_unshare()
>   mm: add ksys_fadvise64_64() helper; remove in-kernel call to
>     sys_fadvise64_64()
>   mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to
>     sys_mmap_pgoff()
>   fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()
>   fs: add ksys_sync_file_range helper(); remove in-kernel calls to
>     syscall
>   fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
>   hostfs: rename do_rmdir() to hostfs_do_rmdir()
>   fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir()
>   fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel
>     calls to syscall
>   fs: add do_symlinkat() helper and ksys_symlink() wrapper; remove
>     in-kernel calls to syscall
>   fs: add do_mknodat() helper and ksys_mknod() wrapper; remove in-kernel
>     calls to syscall
>   fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel
>     calls to syscall
>   fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod()
>     wrapper; remove in-kernel calls to syscall
>   fs: add do_faccessat() helper and ksys_access() wrapper; remove
>     in-kernel calls to syscall
>   fs: add ksys_ftruncate() wrapper; remove in-kernel calls to
>     sys_ftruncate()
>   fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown()
>     wrappers
>   fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()
>   fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
> 
>  Documentation/process/adding-syscalls.rst |  14 ---
>  arch/alpha/kernel/osf_sys.c               |   2 +-
>  arch/arm/kernel/sys_arm.c                 |   2 +-
>  arch/arm64/kernel/sys.c                   |   2 +-
>  arch/ia64/kernel/sys_ia64.c               |   4 +-
>  arch/m68k/kernel/sys_m68k.c               |   2 +-
>  arch/microblaze/kernel/sys_microblaze.c   |   6 +-
>  arch/mips/kernel/linux32.c                |  10 +-
>  arch/mips/kernel/syscall.c                |   6 +-
>  arch/parisc/kernel/sys_parisc.c           |  14 +--
>  arch/powerpc/kernel/sys_ppc32.c           |   8 +-
>  arch/powerpc/kernel/syscalls.c            |   6 +-
>  arch/riscv/kernel/sys_riscv.c             |   4 +-
>  arch/s390/kernel/compat_linux.c           |  23 ++---
>  arch/s390/kernel/sys_s390.c               |   2 +-
>  arch/sh/kernel/sys_sh.c                   |   4 +-
>  arch/sh/kernel/sys_sh32.c                 |   8 +-
>  arch/sparc/kernel/sys_sparc32.c           |  14 +--
>  arch/sparc/kernel/sys_sparc_32.c          |   6 +-
>  arch/sparc/kernel/sys_sparc_64.c          |   2 +-
>  arch/um/kernel/syscall.c                  |   2 +-
>  arch/x86/ia32/sys_ia32.c                  |  22 ++---
>  arch/x86/include/asm/syscalls.h           |   1 +
>  arch/x86/kernel/ioport.c                  |   7 +-
>  arch/x86/kernel/sys_x86_64.c              |   2 +-
>  arch/xtensa/kernel/syscall.c              |   2 +-
>  drivers/base/devtmpfs.c                   |  11 ++-
>  drivers/tty/vt/vt_ioctl.c                 |   6 +-
>  fs/autofs4/dev-ioctl.c                    |   2 +-
>  fs/binfmt_misc.c                          |   2 +-
>  fs/eventfd.c                              |   9 +-
>  fs/eventpoll.c                            |  23 +++--
>  fs/file.c                                 |  17 +++-
>  fs/hostfs/hostfs.h                        |   2 +-
>  fs/hostfs/hostfs_kern.c                   |   2 +-
>  fs/hostfs/hostfs_user.c                   |   2 +-
>  fs/internal.h                             |  14 +++
>  fs/namei.c                                |  61 +++++++++----
>  fs/namespace.c                            |  19 +++-
>  fs/open.c                                 |  68 ++++++++++----
>  fs/pipe.c                                 |   9 +-
>  fs/read_write.c                           |   9 +-
>  fs/signalfd.c                             |  14 ++-
>  fs/stat.c                                 |  12 ++-
>  fs/sync.c                                 |  12 ++-
>  fs/utimes.c                               |  13 ++-
>  include/linux/futex.h                     |  13 ++-
>  include/linux/syscalls.h                  | 146 +++++++++++++++++++++++++++++-
>  init/do_mounts.c                          |  16 ++--
>  init/do_mounts.h                          |   4 +-
>  init/do_mounts_initrd.c                   |  38 ++++----
>  init/do_mounts_md.c                       |  14 +--
>  init/do_mounts_rd.c                       |  18 ++--
>  init/initramfs.c                          |  48 +++++-----
>  init/main.c                               |   9 +-
>  init/noinitramfs.c                        |   6 +-
>  kernel/exit.c                             |   2 +-
>  kernel/fork.c                             |  11 ++-
>  kernel/pid_namespace.c                    |   6 +-
>  kernel/signal.c                           |  15 ++-
>  kernel/sys.c                              |   9 +-
>  kernel/uid16.c                            |   6 +-
>  kernel/umh.c                              |   2 +-
>  mm/fadvise.c                              |  10 +-
>  mm/mmap.c                                 |  17 +++-
>  mm/nommu.c                                |  17 +++-
>  66 files changed, 614 insertions(+), 275 deletions(-)
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)
  2018-03-16  9:01 ` Zhang, Ning A
@ 2018-03-16 10:25   ` Dominik Brodowski
  0 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-16 10:25 UTC (permalink / raw)
  To: Zhang, Ning A; +Cc: torvalds, linux-kernel, viro, mingo, luto, akpm, arnd

On Fri, Mar 16, 2018 at 09:01:11AM +0000, Zhang, Ning A wrote:
> On Thu, 2018-03-15 at 20:04 +0100, Dominik Brodowski wrote:
> > Here is a re-spin of the first set of patches which reduce the number of
> > syscall invocations from within the kernel; the RFC may be found at
> > 
> > The rationale for this change is described in patch 1 as follows:
> > 
> > 	The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
> > 	and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
> > 	through kernel entry points, but not from the kernel itself. This
> > 	will allow cleanups and optimizations to the entry paths *and* to
> > 	the parts of the kernel code which currently need to pretend to be
> > 	userspace in order to make use of syscalls.
> 
> I think it is really bad to change syscalls one by one to do_*.
> 
> Why not change SYSCALL_DEFINEx to define the kernel wrappers?

Basically, for two reasons: First, only a subset of all syscalls require
such wrappers -- only about a third of all syscalls are called from within
the kernel at the moment (rough guess). Second, and more important: We want
to reduce the amount of such usage; see, e.g., the messages by Christoph and
Arnd in this thread.

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 13/36] x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
  2018-03-16  8:43   ` Christoph Hellwig
@ 2018-03-16 11:13     ` Dominik Brodowski
  0 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-16 11:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd,
	Thomas Gleixner, Ingo Molnar, Jiri Slaby, x86

On Fri, Mar 16, 2018 at 01:43:08AM -0700, Christoph Hellwig wrote:
> On Thu, Mar 15, 2018 at 08:05:06PM +0100, Dominik Brodowski wrote:
> > Using this helper allows us to avoid the in-kernel calls to the sys_ioperm()
> > syscall.
> 
> Why not do_ioperm or kernel_ioperm as for most other syscalls?

The newly introduced ksys_*() functions/helpers/wrappers take the same
parameters and use the same calling conventions as the "real" syscalls, and
are made available through include/linux/syscalls.h for (at least temporary)
in-kernel use.

Contrary to that, do_*() are mostly kept internal to one file or subsystem,
and seem to be more flexible with the calling convention. Same for
kernel_*().

But if you prefer the do_*() or kernel_*() namespace for the
in-kernel-syscall-equivalent for fs/*, I'm fine with that, just let me know.

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread
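
The naming convention described in the message above can be sketched outside the kernel. What follows is a plain userspace C model, not kernel code: demo_ksys_ioperm(), demo_sys_ioperm(), demo_in_kernel_caller(), the DEMO_IO_BITMAP_BITS limit and the simplified range check are all hypothetical stand-ins. It only illustrates that a ksys_*() helper keeps the parameter list and return convention of the syscall, while the syscall entry point shrinks to a thin wrapper around it.

```c
#include <errno.h>

/* Hypothetical limit standing in for the size of the I/O bitmap. */
#define DEMO_IO_BITMAP_BITS 65536UL

/*
 * Model of a ksys_*() helper: same parameters and return convention as
 * the syscall it stands in for, and callable from other in-kernel code.
 */
long demo_ksys_ioperm(unsigned long from, unsigned long num, int turn_on)
{
	(void)turn_on;	/* the model only validates the port range */

	if (from + num < from || from + num > DEMO_IO_BITMAP_BITS)
		return -EINVAL;
	return 0;
}

/*
 * Model of the syscall entry point: nothing but a call to the helper.
 * Only the entry code is meant to reach this function.
 */
long demo_sys_ioperm(unsigned long from, unsigned long num, int turn_on)
{
	return demo_ksys_ioperm(from, num, turn_on);
}

/* Model of an in-kernel caller: it uses the helper, not the entry point. */
long demo_in_kernel_caller(void)
{
	return demo_ksys_ioperm(0x3b4, 0x2c, 1);
}
```

Because the helper and the entry point share one signature, switching a caller from one to the other is a mechanical rename, which is the property the ksys_ prefix is meant to advertise.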

* Re: [PATCH v2 03/36] mm: use do_futex() instead of sys_futex() in mm_release()
  2018-03-15 19:04 ` [PATCH v2 03/36] mm: use do_futex() instead of sys_futex() in mm_release() Dominik Brodowski
@ 2018-03-16 11:58   ` Thomas Gleixner
  2018-03-16 18:43   ` Darren Hart
  1 sibling, 0 replies; 76+ messages in thread
From: Thomas Gleixner @ 2018-03-16 11:58 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd,
	Ingo Molnar, Peter Zijlstra, Darren Hart

On Thu, 15 Mar 2018, Dominik Brodowski wrote:

> sys_futex() is a wrapper to do_futex() which does not modify any
> values here:
> 
> - uaddr, val and val3 are kept the same
> 
> - op is masked with FUTEX_CMD_MASK, but is always set to FUTEX_WAKE.
>   Therefore, val2 is always 0.
> 
> - as utime is set to NULL, *timeout is NULL
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Darren Hart <dvhart@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 76+ messages in thread
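
The reasoning in the commit message quoted above can be checked with a small userspace model of the argument handling a syscall wrapper performs before handing off to the worker. This is a sketch under stated assumptions, not the kernel source: the DEMO_* constants are assumed to mirror the values in the futex UAPI header, and struct demo_futex_args and demo_translate() are hypothetical names. The model covers only the derivation of val2; timeout conversion is left out, since utime is NULL in the case discussed.

```c
#include <stdint.h>

/* Assumed to mirror the futex UAPI values; DEMO_ prefix avoids clashes. */
#define DEMO_FUTEX_WAKE             1
#define DEMO_FUTEX_REQUEUE          3
#define DEMO_FUTEX_CMP_REQUEUE      4
#define DEMO_FUTEX_WAKE_OP          5
#define DEMO_FUTEX_CMP_REQUEUE_PI  12
#define DEMO_FUTEX_PRIVATE_FLAG   128
#define DEMO_FUTEX_CLOCK_REALTIME 256
#define DEMO_FUTEX_CMD_MASK \
	(~(DEMO_FUTEX_PRIVATE_FLAG | DEMO_FUTEX_CLOCK_REALTIME))

/* Hypothetical record of what the wrapper would pass to the worker. */
struct demo_futex_args {
	int op;
	uint32_t val2;
};

struct demo_futex_args demo_translate(int op, const void *utime)
{
	struct demo_futex_args out = { .op = op, .val2 = 0 };
	int cmd = op & DEMO_FUTEX_CMD_MASK;

	/* Only the requeue-style commands reinterpret utime as val2. */
	if (cmd == DEMO_FUTEX_REQUEUE || cmd == DEMO_FUTEX_CMP_REQUEUE ||
	    cmd == DEMO_FUTEX_CMP_REQUEUE_PI || cmd == DEMO_FUTEX_WAKE_OP)
		out.val2 = (uint32_t)(uintptr_t)utime;

	return out;
}
```

With op fixed to the wake command and utime set to NULL, the model leaves op untouched and val2 at 0, which matches the claim that the wrapper does not modify any values in this call.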

* Re: [PATCH v2 13/36] x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
  2018-03-15 19:05 ` [PATCH v2 13/36] x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() Dominik Brodowski
  2018-03-16  8:43   ` Christoph Hellwig
@ 2018-03-16 12:00   ` Thomas Gleixner
  2018-03-16 14:45     ` Dominik Brodowski
  1 sibling, 1 reply; 76+ messages in thread
From: Thomas Gleixner @ 2018-03-16 12:00 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd,
	Ingo Molnar, Jiri Slaby, x86

On Thu, 15 Mar 2018, Dominik Brodowski wrote:

> Using this helper allows us to avoid the in-kernel calls to the sys_ioperm()
> syscall.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jiri Slaby <jslaby@suse.com>
> Cc: x86@kernel.org
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>

Please add a few lines explaining the ksys_ prefix as you did in your reply
to Christoph. Other than that:

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)
  2018-03-16  8:54       ` Christoph Hellwig
@ 2018-03-16 14:20         ` Al Viro
  2018-03-16 16:47           ` Linus Torvalds
  0 siblings, 1 reply; 76+ messages in thread
From: Al Viro @ 2018-03-16 14:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Andy Lutomirski, Arnd Bergmann,
	Dominik Brodowski, Linux Kernel Mailing List, Ingo Molnar,
	Andrew Morton

On Fri, Mar 16, 2018 at 01:54:23AM -0700, Christoph Hellwig wrote:
> On Thu, Mar 15, 2018 at 05:54:27PM -0700, Linus Torvalds wrote:
> > Yes. And honestly, I'd rather have these kinds of "just change the
> > calling convention" almost automated patches separately - and then the
> > cleanups later.
> > 
> > Mixing the calling convention change and the cleanup together is just
> > confusing and potentially causes subtle issues.
> 
> > A lot of the issues here come from the initramfs / do_mount code being
> > written as if it was user space code, but in kernel space.  E.g.
> > using file descriptors etc.

... and I still wonder if it would make more sense to kick that crap
out into userland.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 13/36] x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
  2018-03-16 12:00   ` Thomas Gleixner
@ 2018-03-16 14:45     ` Dominik Brodowski
  2018-03-16 14:47       ` Thomas Gleixner
  0 siblings, 1 reply; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-16 14:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd,
	Ingo Molnar, Jiri Slaby, x86

On Fri, Mar 16, 2018 at 01:00:48PM +0100, Thomas Gleixner wrote:
> On Thu, 15 Mar 2018, Dominik Brodowski wrote:
> 
> > Using this helper allows us to avoid the in-kernel calls to the sys_ioperm()
> > syscall.
> > 
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Jiri Slaby <jslaby@suse.com>
> > Cc: x86@kernel.org
> > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
> 
> Please add a few lines explaining the ksys_ prefix as you did in your reply
> to Christoph. Other than that:
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

Thanks! The commit message now reads

	Using this helper allows us to avoid the in-kernel calls to the sys_ioperm()
	syscall. The ksys_ prefix denotes that this function is meant as a drop-in
	replacement for the syscall. In particular, it uses the same calling
	convention as sys_ioperm().

Does that sound OK?

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 13/36] x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
  2018-03-16 14:45     ` Dominik Brodowski
@ 2018-03-16 14:47       ` Thomas Gleixner
  0 siblings, 0 replies; 76+ messages in thread
From: Thomas Gleixner @ 2018-03-16 14:47 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd,
	Ingo Molnar, Jiri Slaby, x86

On Fri, 16 Mar 2018, Dominik Brodowski wrote:
> On Fri, Mar 16, 2018 at 01:00:48PM +0100, Thomas Gleixner wrote:
> > On Thu, 15 Mar 2018, Dominik Brodowski wrote:
> > 
> > > Using this helper allows us to avoid the in-kernel calls to the sys_ioperm()
> > > syscall.
> > > 
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Jiri Slaby <jslaby@suse.com>
> > > Cc: x86@kernel.org
> > > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
> > 
> > Please add a few lines explaining the ksys_ prefix as you did in your reply
> > to Christoph. Other than that:
> > 
> > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> 
> Thanks! The commit message now reads
> 
> 	Using this helper allows us to avoid the in-kernel calls to the sys_ioperm()
> 	syscall. The ksys_ prefix denotes that this function is meant as a drop-in
> 	replacement for the syscall. In particular, it uses the same calling
> 	convention as sys_ioperm().
> 
> Does that sound OK?

Looks good.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)
  2018-03-16 14:20         ` Al Viro
@ 2018-03-16 16:47           ` Linus Torvalds
  0 siblings, 0 replies; 76+ messages in thread
From: Linus Torvalds @ 2018-03-16 16:47 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Andy Lutomirski, Arnd Bergmann,
	Dominik Brodowski, Linux Kernel Mailing List, Ingo Molnar,
	Andrew Morton

On Fri, Mar 16, 2018 at 7:20 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Mar 16, 2018 at 01:54:23AM -0700, Christoph Hellwig wrote:
>>
>> A lot of the issues here come from the initramfs / do_mount code being
>> written as if it was user space code, but in kernel space.  E.g.
>> using file descriptors etc.

Yeah, some of it could probably pass a 'struct filp *' around instead.

So there are definitely things we could do once we no longer use the
raw system calls anyway.

> ... and I still wonder if it would make more sense to kick that crap
> out into userland.

Oh, no, let's not do that. Even if we were to still maintain control
of user space, it would mean yet another nasty special case for the
compiler and linker scripts and for our initrd generation.

And if we were to spin it out entirely (aka udevd and friends), it
would become one of those nasty situations where there's some *very*
odd code that we need to keep compatibility with because you might run
a new kernel and some old "pre-init user code" stuff.

I'd much rather just make it look more like kernel code.

And maybe remove some code entirely. Christ, we still have the logic
in there to change *floppies* if the ramdisk doesn't fit on a single
floppy disk.  Does it work? Probably not, since presumably it hasn't
been used in ages. But it's still there.

So some of the ioctl's etc are due to insanely old legacy cases.

                 Linus

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 02/36] kernel: use kernel_wait4() instead of sys_wait4()
  2018-03-15 19:04 ` [PATCH v2 02/36] kernel: use kernel_wait4() instead of sys_wait4() Dominik Brodowski
@ 2018-03-16 16:58   ` Luis R. Rodriguez
  2018-03-17 16:44     ` Dominik Brodowski
  0 siblings, 1 reply; 76+ messages in thread
From: Luis R. Rodriguez @ 2018-03-16 16:58 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd,
	Luis R . Rodriguez

On Thu, Mar 15, 2018 at 08:04:55PM +0100, Dominik Brodowski wrote:
> diff --git a/kernel/umh.c b/kernel/umh.c
> index 18e5fa4b0e71..f4b557cadf08 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -135,7 +135,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>  		 *
>  		 * Thus the __user pointer cast is valid here.
>  		 */
> -		sys_wait4(pid, (int __user *)&ret, 0, NULL);
> +		kernel_wait4(pid, (int __user *)&ret, 0, NULL);
>  
>  		/*
>  		 * If ret is 0, either call_usermodehelper_exec_async failed and

There is also a reference to sys_wait4() in a comment in umh.c:

        /* If SIGCLD is ignored sys_wait4 won't populate the status. */
        kernel_sigaction(SIGCHLD, SIG_DFL);

Does that remain true for kernel_wait4()? If so, that comment should be updated
as well.

I don't see any kdoc for kernel_wait4(). Can you add some, and also recommend
its use there so that other users do the same? In fact, why not add a kernel
helper which takes no last argument and just passes NULL to kernel_wait4()?

  Luis

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 14/36] fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
  2018-03-15 20:11   ` Arnd Bergmann
  2018-03-16  8:46     ` Christoph Hellwig
@ 2018-03-16 16:58     ` Linus Torvalds
  2018-03-17 16:52     ` Dominik Brodowski
  2 siblings, 0 replies; 76+ messages in thread
From: Linus Torvalds @ 2018-03-16 16:58 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Dominik Brodowski, Linux Kernel Mailing List, Al Viro,
	Andy Lutomirski, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 1:11 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>
> Shouldn't the callers of sys_mount just call do_mount() instead?

So for most of these, I'd rather not really change the code and just
do a direct translation, but I have to agree that "sys_mount ->
do_mount" might be special.

As you say, the only thing sys_mount() does is to copy things from
user space and allocate temporaries, and then call do_mount(). And we
already use do_mount in other places.

So translating sys_mount() to do_mount() might just be the right thing to do.

Of course, do_mount() still ends up doing yet more "translate user
mode stuff to kernel stuff", so it's kind of a confusing half-way
state where the user mountpoint name is still in user space, and the
flags are still in the MS namespace, but the device name and the mount
options have been moved to kernel space.

So I dunno. It might make sense to convert to do_mount(), but in many
ways ksys_mount() that just passes *everything* as if it was a user
call is actually more logical, even if it's just pointless churn to
then copy things.

I do like the mindless "let's just use the SYSCALL_DEFINEx()
interfaces for in-kernel use" model, exactly because it's so
black-and-white and doesn't have these kinds of "on one hand.."
issues.

                   Linus

^ permalink raw reply	[flat|nested] 76+ messages in thread
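
The layering discussed in the message above can be modelled in userspace. This is only a sketch: demo_copy_from_user(), demo_do_mount(), demo_ksys_mount() and the demo_copies counter are hypothetical stand-ins, and the "mount" does nothing but count how many string arguments get copied. It shows the two options side by side: a syscall-style path that copies every argument before calling the worker, and a direct call to the worker, which still copies the mount point name itself (the half-way state mentioned above).

```c
#include <stdlib.h>
#include <string.h>

/* Counts simulated "copy from user space" operations. */
int demo_copies;

/* Stand-in for copying a string argument out of user space. */
static char *demo_copy_from_user(const char *src)
{
	char *dst;

	if (!src)
		return NULL;
	dst = malloc(strlen(src) + 1);
	if (dst) {
		strcpy(dst, src);
		demo_copies++;
	}
	return dst;
}

/*
 * Model of the worker: dev_name and data are already kernel copies,
 * but the mount point name is still copied in here.
 */
long demo_do_mount(const char *dev_name, const char *user_dir_name,
		   const char *data)
{
	char *dir = demo_copy_from_user(user_dir_name);

	(void)data;	/* the model does not interpret mount options */
	if (!dev_name || !dir) {
		free(dir);
		return -1;
	}
	free(dir);
	return 0;
}

/* Model of the syscall-style path: copy everything, then call the worker. */
long demo_ksys_mount(const char *user_dev_name, const char *user_dir_name,
		     const char *user_data)
{
	char *dev = demo_copy_from_user(user_dev_name);
	char *data = demo_copy_from_user(user_data);
	long ret = demo_do_mount(dev, user_dir_name, data);

	free(data);
	free(dev);
	return ret;
}
```

In this model the syscall-style path performs three copies per call and the direct path one, which is the "pointless churn" trade-off weighed above against the simplicity of treating every in-kernel caller like a user call.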

* Re: [PATCH v2 03/36] mm: use do_futex() instead of sys_futex() in mm_release()
  2018-03-15 19:04 ` [PATCH v2 03/36] mm: use do_futex() instead of sys_futex() in mm_release() Dominik Brodowski
  2018-03-16 11:58   ` Thomas Gleixner
@ 2018-03-16 18:43   ` Darren Hart
  2018-03-16 19:03     ` Andy Lutomirski
  1 sibling, 1 reply; 76+ messages in thread
From: Darren Hart @ 2018-03-16 18:43 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra

On Thu, Mar 15, 2018 at 08:04:56PM +0100, Dominik Brodowski wrote:
> sys_futex() is a wrapper to do_futex() which does not modify any
> values here:
> 
> - uaddr, val and val3 are kept the same
> 
> - op is masked with FUTEX_CMD_MASK, but is always set to FUTEX_WAKE.
>   Therefore, val2 is always 0.
> 
> - as utime is set to NULL, *timeout is NULL
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Darren Hart <dvhart@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>

Hi Dominik,

I'm missing the "why" part here. What is it you are trying to address?

do_futex is not currently in use outside of the futex implementation,
while sys_futex is. This decouples the interface from the
implementation. While this is perhaps less critical within the
kernel, I don't see a compelling reason to increase the coupling
between the mm and futex implementations.

Without a compelling WHY, Nack from me.

-- 
Darren Hart
VMware Open Source Technology Center

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 03/36] mm: use do_futex() instead of sys_futex() in mm_release()
  2018-03-16 18:43   ` Darren Hart
@ 2018-03-16 19:03     ` Andy Lutomirski
  2018-03-16 21:44       ` Darren Hart
  0 siblings, 1 reply; 76+ messages in thread
From: Andy Lutomirski @ 2018-03-16 19:03 UTC (permalink / raw)
  To: Darren Hart
  Cc: Dominik Brodowski, LKML, Linus Torvalds, Al Viro,
	Andrew Lutomirski, Ingo Molnar, Andrew Morton, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra

On Fri, Mar 16, 2018 at 6:43 PM, Darren Hart <dvhart@infradead.org> wrote:
> On Thu, Mar 15, 2018 at 08:04:56PM +0100, Dominik Brodowski wrote:
>> sys_futex() is a wrapper to do_futex() which does not modify any
>> values here:
>>
>> - uaddr, val and val3 are kept the same
>>
>> - op is masked with FUTEX_CMD_MASK, but is always set to FUTEX_WAKE.
>>   Therefore, val2 is always 0.
>>
>> - as utime is set to NULL, *timeout is NULL
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Darren Hart <dvhart@infradead.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
>
> Hi Dominik,
>
> I'm missing the "why" part here. What is it you are trying to address?
>
> do_futex is not currently in use outside of the futex implementation,
> while sys_futex is. This decouples the interface from the
> implementation. While this is perhaps less critical within the
> kernel, I don't see a compelling reason to increase the coupling
> between the mm and futex implementations.
>
> Without a compelling WHY, Nack from me.
>

We want to make some changes to the way that the syscall entry code
invokes syscalls, and these changes will make it impossible to call
sys_xyz() functions from the kernel.  So we can make sys_futex() be a
trivial wrapper around a new ksys_futex(), or we can do a patch like
this.
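
For illustration only, the two options can be modelled in plain userspace
C. This is a sketch, not kernel code: every demo_* name and the
DEMO_OP_WAKE constant are hypothetical stand-ins, and the "futex" is
reduced to clearing a word and reporting a wake count.

```c
#include <stddef.h>

/* Demo-only constant; not the kernel's FUTEX_WAKE definition. */
#define DEMO_OP_WAKE 1

/* Stands in for the internal implementation (do_futex() in this thread). */
static long demo_do_futex(int *uaddr, int op, int val)
{
	if (uaddr == NULL || op != DEMO_OP_WAKE)
		return -1;
	*uaddr = 0;	/* pretend the futex word was released */
	return val;	/* pretend 'val' waiters were woken */
}

/* Option 1: a kernel-callable helper that in-kernel users may call. */
long demo_ksys_futex(int *uaddr, int op, int val)
{
	return demo_do_futex(uaddr, op, val);
}

/* The syscall entry point then becomes a trivial wrapper around it. */
long demo_sys_futex(int *uaddr, int op, int val)
{
	return demo_ksys_futex(uaddr, op, val);
}

/*
 * Option 2 (the approach of this patch): the in-kernel caller skips
 * the syscall entry point and calls the implementation directly.
 */
long demo_mm_release(int *clear_child_tid)
{
	return demo_do_futex(clear_child_tid, DEMO_OP_WAKE, 1);
}
```

In both variants nothing inside the "kernel" calls demo_sys_futex(), which
is the property the entry-code changes depend on.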

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 03/36] mm: use do_futex() instead of sys_futex() in mm_release()
  2018-03-16 19:03     ` Andy Lutomirski
@ 2018-03-16 21:44       ` Darren Hart
  2018-03-17 16:39         ` Dominik Brodowski
  0 siblings, 1 reply; 76+ messages in thread
From: Darren Hart @ 2018-03-16 21:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dominik Brodowski, LKML, Linus Torvalds, Al Viro, Ingo Molnar,
	Andrew Morton, Arnd Bergmann, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra

On Fri, Mar 16, 2018 at 07:03:53PM +0000, Andy Lutomirski wrote:
> On Fri, Mar 16, 2018 at 6:43 PM, Darren Hart <dvhart@infradead.org> wrote:
> > On Thu, Mar 15, 2018 at 08:04:56PM +0100, Dominik Brodowski wrote:
> >> sys_futex() is a wrapper to do_futex() which does not modify any
> >> values here:
> >>
> >> - uaddr, val and val3 are kept the same
> >>
> >> - op is masked with FUTEX_CMD_MASK, but is always set to FUTEX_WAKE.
> >>   Therefore, val2 is always 0.
> >>
> >> - as utime is set to NULL, *timeout is NULL
> >>
> >> Cc: Thomas Gleixner <tglx@linutronix.de>
> >> Cc: Ingo Molnar <mingo@redhat.com>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Cc: Darren Hart <dvhart@infradead.org>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
> >
> > Hi Dominik,
> >
> > I'm missing the "why" part here. What is it you are trying to address?
> >
> > do_futex is not currently in use outside of the futex implementation,
> > while sys_futex is. This decouples the interface from the
> > implementation. While this is perhaps less critical within the
> > kernel, I don't see a compelling reason to increase the coupling
> > between the mm and futex implementations.
> >
> > Without a compelling WHY, Nack from me.
> >
> 
> We want to make some changes to the way that the syscall entry code
> invokes syscalls, and these changes will make it impossible to call
> sys_xyz() functions from the kernel.  So we can make sys_futex() be a
> trivial wrapper around a new ksys_futex(), or we can do a patch like
> this.

I dug up the cover letter and got the motivation and withdraw my
objection. I understand the motivation to put the motivation in the
cover letter in a large series, but I think there should have been
something indicating the need for this change in the individual patches,
even just a single line like Andy's first sentence above.

Thanks,

-- 
Darren Hart
VMware Open Source Technology Center

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 03/36] mm: use do_futex() instead of sys_futex() in mm_release()
  2018-03-16 21:44       ` Darren Hart
@ 2018-03-17 16:39         ` Dominik Brodowski
  0 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-17 16:39 UTC (permalink / raw)
  To: Darren Hart
  Cc: Andy Lutomirski, LKML, Linus Torvalds, Al Viro, Ingo Molnar,
	Andrew Morton, Arnd Bergmann, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra

On Fri, Mar 16, 2018 at 02:44:54PM -0700, Darren Hart wrote:
> On Fri, Mar 16, 2018 at 07:03:53PM +0000, Andy Lutomirski wrote:
> > On Fri, Mar 16, 2018 at 6:43 PM, Darren Hart <dvhart@infradead.org> wrote:
> > > On Thu, Mar 15, 2018 at 08:04:56PM +0100, Dominik Brodowski wrote:
> > >> sys_futex() is a wrapper to do_futex() which does not modify any
> > >> values here:
> > >>
> > >> - uaddr, val and val3 are kept the same
> > >>
> > >> - op is masked with FUTEX_CMD_MASK, but is always set to FUTEX_WAKE.
> > >>   Therefore, val2 is always 0.
> > >>
> > >> - as utime is set to NULL, *timeout is NULL
> > >>
> > >> Cc: Thomas Gleixner <tglx@linutronix.de>
> > >> Cc: Ingo Molnar <mingo@redhat.com>
> > >> Cc: Peter Zijlstra <peterz@infradead.org>
> > >> Cc: Darren Hart <dvhart@infradead.org>
> > >> Cc: Andrew Morton <akpm@linux-foundation.org>
> > >> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
> > >
> > > Hi Dominik,
> > >
> > > I'm missing the "why" part here. What is it you are trying to address?
> > >
> > > do_futex is not currently in use outside of the futex implementation,
> > > while sys_futex is. This decouples the interface from the
> > > implementation. While this is perhaps less critical within the
> > > kernel, I don't see a compelling reason to increase the coupling
> > > between the mm and futex implementations.
> > >
> > > Without a compelling WHY, Nack from me.
> > >
> > 
> > We want to make some changes to the way that the syscall entry code
> > invokes syscalls, and these changes will make it impossible to call
> > sys_xyz() functions from the kernel.  So we can make sys_futex() be a
> > trivial wrapper around a new ksys_futex(), or we can do a patch like
> > this.
> 
> I dug up the cover letter and got the motivation and withdraw my
> objection. I understand the motivation to put the motivation in the
> cover letter in a large series, but I think there should have been
> something indicating the need for this change in the individual patches,
> even just a single line like Andy's first sentence above.

It's two lines, but I added

	This patch is part of a series which tries to remove in-kernel calls to
	syscalls. On this basis, the syscall entry path can be streamlined.

to all commits in this series which remove in-kernel calls to syscalls.

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 02/36] kernel: use kernel_wait4() instead of sys_wait4()
  2018-03-16 16:58   ` Luis R. Rodriguez
@ 2018-03-17 16:44     ` Dominik Brodowski
  0 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-17 16:44 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd

On Fri, Mar 16, 2018 at 04:58:31PM +0000, Luis R. Rodriguez wrote:
> On Thu, Mar 15, 2018 at 08:04:55PM +0100, Dominik Brodowski wrote:
> > diff --git a/kernel/umh.c b/kernel/umh.c
> > index 18e5fa4b0e71..f4b557cadf08 100644
> > --- a/kernel/umh.c
> > +++ b/kernel/umh.c
> > @@ -135,7 +135,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
> >  		 *
> >  		 * Thus the __user pointer cast is valid here.
> >  		 */
> > -		sys_wait4(pid, (int __user *)&ret, 0, NULL);
> > +		kernel_wait4(pid, (int __user *)&ret, 0, NULL);
> >  
> >  		/*
> >  		 * If ret is 0, either call_usermodehelper_exec_async failed and
> 
> There is also a reference to sys_wait4() usage on umh.c:
> 
>         /* If SIGCLD is ignored sys_wait4 won't populate the status. */         
>         kernel_sigaction(SIGCHLD, SIG_DFL);     
> 
> Does that remain true for kernel_wait4()? If so that comment should be updated
> as well.

Thanks, have updated the comment.

	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 14/36] fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
  2018-03-15 20:11   ` Arnd Bergmann
  2018-03-16  8:46     ` Christoph Hellwig
  2018-03-16 16:58     ` Linus Torvalds
@ 2018-03-17 16:52     ` Dominik Brodowski
  2 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-17 16:52 UTC (permalink / raw)
  To: Arnd Bergmann, Christoph Hellwig, Linus Torvalds
  Cc: Linux Kernel Mailing List, Al Viro, Andy Lutomirski, Ingo Molnar,
	Andrew Morton

Arnd, Christoph,

thanks for your review. On the basis of the goal of this patch series, I
will stick with the "mindless" conversion for the time being, but add the
following caveat to the commit message:

	In the near future, all callers of ksys_mount() should be converted to call
	do_mount() directly.

Could that be a compromise?

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 15/36] fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
  2018-03-16  8:47   ` Christoph Hellwig
@ 2018-03-17 16:58     ` Dominik Brodowski
  0 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-17 16:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd

On Fri, Mar 16, 2018 at 01:47:50AM -0700, Christoph Hellwig wrote:
> On Thu, Mar 15, 2018 at 08:05:08PM +0100, Dominik Brodowski wrote:
> > Using this helper allows us to avoid the in-kernel call to the sys_umount()
> > syscall.
> 
> kern_unmount, please.  And make it operate on kernel pointers please.

On the naming issue, see my other message from a few days ago. Concerning
the added complexity and the need-for-cleanup in init/do_mounts_initrd.c, I
have added the following comment:

	In the near future, the only fs-external caller of ksys_umount() should be
	converted to call do_umount() directly. Then, ksys_umount() can be moved
	within sys_umount() again.

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 16/36] fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
  2018-03-16  8:48   ` Christoph Hellwig
@ 2018-03-17 17:01     ` Dominik Brodowski
  0 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-17 17:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd

On Fri, Mar 16, 2018 at 01:48:34AM -0700, Christoph Hellwig wrote:
> On Thu, Mar 15, 2018 at 08:05:09PM +0100, Dominik Brodowski wrote:
> > Using ksys_dup() and ksys_dup3() as helper functions allows us to
> > avoid the in-kernel calls to the sys_dup() and sys_dup3() syscalls.
> 
> do_dup/dup3 or kern_dup/dup3, please.

On the naming issue, see my other message from a few days ago. Concerning
the added complexity and the need-for-cleanup in init/do_mounts_initrd.c
and init/main.c, I have added the following comment:

	In the near future, the fs-external callers of ksys_dup{,3}() should be
	converted to call do_dup2() directly. Then, ksys_dup{,3}() can be moved
	within sys_dup{,3}() again.

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 17/36] fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
  2018-03-15 20:44   ` Arnd Bergmann
  2018-03-16  8:49     ` Christoph Hellwig
@ 2018-03-17 17:04     ` Dominik Brodowski
  1 sibling, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-17 17:04 UTC (permalink / raw)
  To: Arnd Bergmann, Christoph Hellwig
  Cc: Linux Kernel Mailing List, Linus Torvalds, Al Viro,
	Andy Lutomirski, Ingo Molnar, Andrew Morton

Arnd, Christoph,

On Thu, Mar 15, 2018 at 09:44:24PM +0100, Arnd Bergmann wrote:
> > diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> > index 4afb04686c8e..5743f04014ca 100644
> > --- a/drivers/base/devtmpfs.c
> > +++ b/drivers/base/devtmpfs.c
> > @@ -387,7 +387,7 @@ static int devtmpfsd(void *p)
> >         if (*err)
> >                 goto out;
> >         sys_chdir("/.."); /* will traverse into overmounted root */
> > -       sys_chroot(".");
> > +       ksys_chroot(".");
> >         complete(&setup_done);
> >         while (1) {
> >                 spin_lock(&req_lock);
> 
> Could this be done using kern_path()/set_fs_root() instead so we
> avoid the __user pointer?
> 
>        Arnd

On Fri, Mar 16, 2018 at 01:49:41AM -0700, Christoph Hellwig wrote:
> Agreed.  Especially as we don't need any of the permission checks here.

Thanks for your input. As re-working this code to use the vfs-internal
helpers would probably mean that the syscall-cleanup code has to wait
for another release cycle, I propose to address this issue with the
following paragraph in the commit message:

	In the near future, the fs-external callers of ksys_chroot() should be
	converted to use kern_path()/set_fs_root() directly. Then ksys_chroot()
	can be moved within sys_chroot() again.

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 18/36] fs: add ksys_write() helper; remove in-kernel calls to sys_write()
  2018-03-16  8:52   ` Christoph Hellwig
@ 2018-03-17 17:06     ` Dominik Brodowski
  0 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-17 17:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, torvalds, viro, luto, mingo, akpm, arnd, linux-s390

On Fri, Mar 16, 2018 at 01:52:40AM -0700, Christoph Hellwig wrote:
> I really don't like this, as this is the wrong level of abstraction.
> 
> > diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
> > index 79b7a3438d54..5a9cfde5fc28 100644
> > --- a/arch/s390/kernel/compat_linux.c
> > +++ b/arch/s390/kernel/compat_linux.c
> > @@ -468,7 +468,7 @@ COMPAT_SYSCALL_DEFINE3(s390_write, unsigned int, fd, const char __user *, buf, c
> >  	if ((compat_ssize_t) count < 0)
> >  		return -EINVAL; 
> >  
> > -	return sys_write(fd, buf, count);
> > +	return ksys_write(fd, buf, count);
> >  }
> 
> This looks bogus to me.  Why does s390 have its own compat version of
> > write but not any of the other read and write family calls?
> 
> > diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
> > index 99e0b649fc0e..2d365c398ccc 100644
> > --- a/init/do_mounts_rd.c
> > +++ b/init/do_mounts_rd.c
> > @@ -270,7 +270,7 @@ int __init rd_load_image(char *from)
> >  			printk("Loading disk #%d... ", disk);
> >  		}
> >  		sys_read(in_fd, buf, BLOCK_SIZE);
> > -		sys_write(out_fd, buf, BLOCK_SIZE);
> > +		ksys_write(out_fd, buf, BLOCK_SIZE);
> >  #if !defined(CONFIG_S390)
> >  		if (!(i % 16)) {
> >  			pr_cont("%c\b", rotator[rotate & 0x3]);
> 
> All the do_mounts / initramfs code should be rewritten to use filp_open
> and vfs_read/vfs_write instead of adding hacks like this.

In line with the other patches, I have added the following paragraph to the
commit message:

	In the near future, the do_mounts / initramfs callers of ksys_write()
	should be converted to use filp_open() and vfs_write() instead.

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 24/36] fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
  2018-03-15 20:21   ` Arnd Bergmann
@ 2018-03-17 17:09     ` Dominik Brodowski
  0 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-17 17:09 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux Kernel Mailing List, Linus Torvalds, Al Viro,
	Andy Lutomirski, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 09:21:39PM +0100, Arnd Bergmann wrote:
> On Thu, Mar 15, 2018 at 8:05 PM, Dominik Brodowski
> <linux@dominikbrodowski.net> wrote:
> > Using this wrapper allows us to avoid the in-kernel calls to the
> > sys_unlink() syscall.
> >
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
> > ---
> >  include/linux/syscalls.h | 11 +++++++++++
> >  init/do_mounts.h         |  2 +-
> >  init/do_mounts_initrd.c  |  4 ++--
> >  init/do_mounts_rd.c      |  2 +-
> >  init/initramfs.c         |  4 ++--
> >  5 files changed, 17 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index 8f0f99702e7a..31aea3873de7 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -971,4 +971,15 @@ int ksys_chdir(const char __user *filename);
> >  int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
> >                          unsigned int flags);
> >
> > +/*
> > + * The following kernel syscall equivalents are just wrappers to fs-internal
> > + * functions. Therefore, provide stubs to be inlined at the callsites.
> > + */
> > +extern long do_unlinkat(int dfd, struct filename *name);
> > +
> > +static inline long ksys_unlink(const char __user *pathname)
> > +{
> > +       return do_unlinkat(AT_FDCWD, getname(pathname));
> > +}
> 
> Why does this take a __user pointer?
> 
> >  static inline int create_dev(char *name, dev_t dev)
> >  {
> > -       sys_unlink(name);
> > +       ksys_unlink(name);
> >         return sys_mknod(name, S_IFBLK|0600, new_encode_dev(dev));
> >  }
> >
> > diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
> > index c19d9070134e..784576b633fd 100644
> > --- a/init/do_mounts_initrd.c
> > +++ b/init/do_mounts_initrd.c
> > @@ -128,11 +128,11 @@ bool __init initrd_load(void)
> >                  * mounted in the normal path.
> >                  */
> >                 if (rd_load_image("/initrd.image") && ROOT_DEV != Root_RAM0) {
> > -                       sys_unlink("/initrd.image");
> > +                       ksys_unlink("/initrd.image");
> >                         handle_initrd();
> >                         return true;
> >                 }
> >         }
> > -       sys_unlink("/initrd.image");
> > +       ksys_unlink("/initrd.image");
> >         return false;
> 
> In all callers we seem to have regular kernel strings, so I think
> you should skip the getname() and change the argument to
> a regular pointer.

In line with my explanations in other messages, I have added the following
paragraph to the commit message:
	
	In the near future, all callers of ksys_unlink() should be converted to
	call do_unlinkat() directly or, at least, to operate on regular kernel
	pointers.
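
As a rough illustration of the "regular kernel pointers" variant, here is
a userspace model. It is a sketch under loud assumptions: the demo_*
names are hypothetical, demo_do_unlinkat() only records the request
instead of touching a filesystem, and the real do_unlinkat() takes a
struct filename rather than a plain string.

```c
#include <string.h>

/* Demo-only stand-in for the "relative to cwd" directory fd. */
#define DEMO_AT_FDCWD (-100)

/* Records the last path passed down, so the effect can be observed. */
static char demo_last_unlinked[64];

/* Stands in for the fs-internal helper; it only records the request. */
static long demo_do_unlinkat(int dfd, const char *name)
{
	if (dfd != DEMO_AT_FDCWD || name == NULL || name[0] == '\0')
		return -1;
	strncpy(demo_last_unlinked, name, sizeof(demo_last_unlinked) - 1);
	demo_last_unlinked[sizeof(demo_last_unlinked) - 1] = '\0';
	return 0;
}

/*
 * Wrapper taking a plain string: no __user annotation and no
 * copy-from-userspace step, because callers already hold a kernel
 * string such as "/initrd.image".
 */
static inline long demo_kern_unlink(const char *pathname)
{
	return demo_do_unlinkat(DEMO_AT_FDCWD, pathname);
}
```

A caller such as initrd_load() would then pass its string literal
directly, without any cast to a __user pointer.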

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 30/36] fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel calls to syscall
  2018-03-15 20:30   ` Arnd Bergmann
@ 2018-03-17 17:11     ` Dominik Brodowski
  0 siblings, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-17 17:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux Kernel Mailing List, Linus Torvalds, Al Viro,
	Andy Lutomirski, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 09:30:58PM +0100, Arnd Bergmann wrote:
> On Thu, Mar 15, 2018 at 8:05 PM, Dominik Brodowski
> <linux@dominikbrodowski.net> wrote:
> >
> >   */
> > -SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
> > -               int, newdfd, const char __user *, newname, int, flags)
> > +int do_linkat(int olddfd, const char __user *oldname, int newdfd,
> > +             const char __user *newname, int flags)
> >  {
> >         struct dentry *new_dentry;
> >         struct path old_path, new_path;
> 
> For consistency with other do_*() functions, I think it would be nice
> to make this one not take a __user pointer either. However, I
> have no idea how to do that without making the common case worse.
> 
> > --- a/init/initramfs.c
> > +++ b/init/initramfs.c
> > @@ -306,7 +306,7 @@ static int __init maybe_link(void)
> >         if (nlink >= 2) {
> >                 char *old = find_link(major, minor, ino, mode, collected);
> >                 if (old)
> > -                       return (sys_link(old, collected) < 0) ? -1 : 1;
> > +                       return (ksys_link(old, collected) < 0) ? -1 : 1;
> >         }
> >         return 0;
> >  }
> 
> Since this is the only caller outside of fs/namei.c, maybe it can be
> changed to use vfs_link() instead? That might still be a larger rework
> than you want to do.

In line with what I propose generally for this series, I have added the
following paragraph to the commit message:

	In the near future, the only fs-external user of ksys_link() should be
	converted to use vfs_link() instead.

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)
  2018-03-15 21:02 ` [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Arnd Bergmann
  2018-03-16  0:38   ` Andy Lutomirski
@ 2018-03-17 17:13   ` Dominik Brodowski
  1 sibling, 0 replies; 76+ messages in thread
From: Dominik Brodowski @ 2018-03-17 17:13 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux Kernel Mailing List, Linus Torvalds, Al Viro,
	Andy Lutomirski, Ingo Molnar, Andrew Morton

On Thu, Mar 15, 2018 at 10:02:04PM +0100, Arnd Bergmann wrote:
> On Thu, Mar 15, 2018 at 8:04 PM, Dominik Brodowski
> <linux@dominikbrodowski.net> wrote:
> > Here is a re-spin of the first set of patches which reduce the number of
> > syscall invocations from within the kernel; the RFC may be found at
> >
> > The rationale for this change is described in patch 1 as follows:
> >
> >         The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
> >         and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
> >         through kernel entry points, but not from the kernel itself. This
> >         will allow cleanups and optimizations to the entry paths *and* to
> >         the parts of the kernel code which currently need to pretend to be
> >         userspace in order to make use of syscalls.
> >
> > The whole series can be found at
> >
> >         https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-next
> >
> > and will be submitted for merging for the v4.17-rc1 cycle, probably together
> > with another batch of related patches I hope to send out tomorrow as a RFC.
> 
> Nice work!
> 
> I've already commented on a few patches that now have a kernel-internal
> helper function that takes a __user pointer. I think those are all only used
> in the early boot code (initramfs etc) that runs before we set_fs() to the
> user address space, but it also causes warnings with sparse. If we
> can change all of them to take kernel pointers, that would let us avoid
> the sparse warnings and start running with a normal user address space
> view. Unfortunately, some of the syscalls seem to be harder to change to
> that than others, so not sure if it's worth the effort.

Thanks for your review -- on this issue, please see my other messages.

> Another open question is the declarations in include/linux/syscalls.h.
> These serve as a help for type-checking today, making sure that
> each syscall we refer to from either the syscall table or called
> by some kernel function uses the same prototype that matches
> the syscall definition, which raises the question of whether we want
> to keep the header around at all.

Well, we do not want to call syscalls from other kernel functions, so that
issue will go away by means of these patchsets anyway. With regard to
type-checking the syscall table, we might want to keep the definitions
around and/or generate prototypes from SYSCALL_DEFINEx() directly.
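
To sketch what "generate prototypes from SYSCALL_DEFINEx() directly"
could look like, here is a toy model in plain C. It is not the kernel's
macro: DEMO_SYSCALL_DEFINE2, the demo_sys_* functions and the table are
all hypothetical, and only the two-argument case is shown.

```c
/*
 * Hypothetical macro: emits a prototype and then the definition
 * header, so the declaration cannot drift away from the definition.
 */
#define DEMO_SYSCALL_DEFINE2(name, t1, a1, t2, a2)	\
	long demo_sys_##name(t1 a1, t2 a2);		\
	long demo_sys_##name(t1 a1, t2 a2)

DEMO_SYSCALL_DEFINE2(add, long, a, long, b)
{
	return a + b;
}

DEMO_SYSCALL_DEFINE2(max, long, a, long, b)
{
	return a > b ? a : b;
}

/*
 * Every table entry has the same function-pointer type, so the
 * compiler diagnoses an entry whose signature does not match.
 */
typedef long (*demo_syscall_fn)(long, long);

static const demo_syscall_fn demo_syscall_table[] = {
	demo_sys_add,
	demo_sys_max,
};

/* Looks up a handler by number; out-of-range numbers return -1. */
long demo_dispatch(unsigned int nr, long a, long b)
{
	if (nr >= sizeof(demo_syscall_table) / sizeof(demo_syscall_table[0]))
		return -1;
	return demo_syscall_table[nr](a, b);
}
```

With this shape, the table is the only consumer of the generated
prototypes, so a hand-maintained list of declarations would not be
needed for type-checking.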

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 76+ messages in thread

Thread overview: 76+ messages
2018-03-15 19:04 [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Dominik Brodowski
2018-03-15 19:04 ` [PATCH v2 01/36] syscalls: define goal to not call sys_xyzzy() from within the kernel Dominik Brodowski
2018-03-15 19:04 ` [PATCH v2 02/36] kernel: use kernel_wait4() instead of sys_wait4() Dominik Brodowski
2018-03-16 16:58   ` Luis R. Rodriguez
2018-03-17 16:44     ` Dominik Brodowski
2018-03-15 19:04 ` [PATCH v2 03/36] mm: use do_futex() instead of sys_futex() in mm_release() Dominik Brodowski
2018-03-16 11:58   ` Thomas Gleixner
2018-03-16 18:43   ` Darren Hart
2018-03-16 19:03     ` Andy Lutomirski
2018-03-16 21:44       ` Darren Hart
2018-03-17 16:39         ` Dominik Brodowski
2018-03-15 19:04 ` [PATCH v2 04/36] kernel: add do_getpgid() helper; remove internal call to sys_getpgid() Dominik Brodowski
2018-03-15 19:04 ` [PATCH v2 05/36] fs: add do_readlinkat() helper; remove internal call to sys_readlinkat() Dominik Brodowski
2018-03-15 19:04 ` [PATCH v2 06/36] fs: add do_pipe2() helper; remove internal call to sys_pipe2() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 07/36] fs: add do_renameat2() helper; remove internal call to sys_renameat2() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 08/36] fs: add do_futimesat() helper; remove internal call to sys_futimesat() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 09/36] syscalls: add do_epoll_*() helpers; remove internal calls to sys_epoll_*() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 10/36] fs: add do_signalfd4() helper; remove internal calls to sys_signalfd4() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 11/36] fs: add do_eventfd() helper; remove internal call to sys_eventfd() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 12/36] kernel: open-code sys_rt_sigpending() in sys_sigpending() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 13/36] x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() Dominik Brodowski
2018-03-16  8:43   ` Christoph Hellwig
2018-03-16 11:13     ` Dominik Brodowski
2018-03-16 12:00   ` Thomas Gleixner
2018-03-16 14:45     ` Dominik Brodowski
2018-03-16 14:47       ` Thomas Gleixner
2018-03-15 19:05 ` [PATCH v2 14/36] fs: add ksys_mount() helper; remove in-kernel calls to sys_mount() Dominik Brodowski
2018-03-15 20:11   ` Arnd Bergmann
2018-03-16  8:46     ` Christoph Hellwig
2018-03-16 16:58     ` Linus Torvalds
2018-03-17 16:52     ` Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 15/36] fs: add ksys_umount() helper; remove in-kernel call to sys_umount() Dominik Brodowski
2018-03-16  8:47   ` Christoph Hellwig
2018-03-17 16:58     ` Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 16/36] fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}() Dominik Brodowski
2018-03-16  8:48   ` Christoph Hellwig
2018-03-17 17:01     ` Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 17/36] fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot() Dominik Brodowski
2018-03-15 20:44   ` Arnd Bergmann
2018-03-16  8:49     ` Christoph Hellwig
2018-03-17 17:04     ` Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 18/36] fs: add ksys_write() helper; remove in-kernel calls to sys_write() Dominik Brodowski
2018-03-16  8:52   ` Christoph Hellwig
2018-03-17 17:06     ` Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 19/36] kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 20/36] mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 21/36] mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() Dominik Brodowski
2018-03-15 20:54   ` Arnd Bergmann
2018-03-15 19:05 ` [PATCH v2 22/36] fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 23/36] fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 24/36] fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink() Dominik Brodowski
2018-03-15 20:21   ` Arnd Bergmann
2018-03-17 17:09     ` Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 25/36] hostfs: rename do_rmdir() to hostfs_do_rmdir() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 26/36] fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 27/36] fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel calls to syscall Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 28/36] fs: add do_symlinkat() helper and ksys_symlink() " Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 29/36] fs: add do_mknodat() helper and ksys_mknod() " Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 30/36] fs: add do_linkat() helper and ksys_link() " Dominik Brodowski
2018-03-15 20:30   ` Arnd Bergmann
2018-03-17 17:11     ` Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 31/36] fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod() " Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 32/36] fs: add do_faccessat() helper and ksys_access() " Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 33/36] fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 34/36] fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 35/36] fs: add ksys_close() wrapper; remove in-kernel calls to sys_close() Dominik Brodowski
2018-03-15 19:05 ` [PATCH v2 36/36] fs: add ksys_open() wrapper; remove in-kernel calls to sys_open() Dominik Brodowski
2018-03-15 21:02 ` [PATCH v2 00/36] remove in-kernel syscall invocations (part 1) Arnd Bergmann
2018-03-16  0:38   ` Andy Lutomirski
2018-03-16  0:54     ` Linus Torvalds
2018-03-16  8:54       ` Christoph Hellwig
2018-03-16 14:20         ` Al Viro
2018-03-16 16:47           ` Linus Torvalds
2018-03-17 17:13   ` Dominik Brodowski
2018-03-16  9:01 ` Zhang, Ning A
2018-03-16 10:25   ` Dominik Brodowski
