From: Dominik Brodowski <linux@dominikbrodowski.net> To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>, viro@zeniv.linux.org.uk, x86@kernel.org, torvalds@linux-foundation.org, mingo@kernel.org, tglx@linutronix.de, luto@amacapital.net Subject: [PATCH] syscalls: define and explain goal to not call syscalls in the kernel Date: Sun, 25 Mar 2018 18:25:27 +0200 Message-ID: <20180325162527.GA17492@light.dominikbrodowski.net> (raw) The syscall entry points to the kernel defined by SYSCALL_DEFINEx() and COMPAT_SYSCALL_DEFINEx() should only be called from userspace through kernel entry points, but not from the kernel itself. This will allow cleanups and optimizations to the entry paths *and* to the parts of the kernel code which currently need to pretend to be userspace in order to make use of syscalls. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> --- As there have been multiple inquiries on the rationale of my patchsets removing in-kernel calls to sys_xyzzy(), here is an updated patch 01/NN which I will push upstream for v4.17-rc1. I will also include a reference to this mail (and therefore to the explanation below) in all related patches of the series. Any improvements, hints, suggestions, spelling fixes, and/or objections? Thanks, Dominik diff --git a/Documentation/process/adding-syscalls.rst b/Documentation/process/adding-syscalls.rst index 8cc25a06f353..556613744556 100644 --- a/Documentation/process/adding-syscalls.rst +++ b/Documentation/process/adding-syscalls.rst @@ -487,6 +487,38 @@ patchset, for the convenience of reviewers. The man page should be cc'ed to linux-man@vger.kernel.org For more details, see https://www.kernel.org/doc/man-pages/patches.html + +Do not call System Calls in the Kernel +-------------------------------------- + +System calls are, as stated above, interaction points between userspace and +the kernel. Therefore, system call functions such as ``sys_xyzzy()`` or +``compat_sys_xyzzy()`` should only be called from userspace via the syscall +table, but not from elsewhere in the kernel. If the syscall functionality is +useful to be used within the kernel, needs to be shared between an old and a +new syscall, or needs to be shared between a syscall and its compatibility +variant, it should be implemented by means of a "helper" function (such as +``kern_xyzzy()``). This kernel function may then be called within the +syscall stub (``sys_xyzzy()``), the compatibility syscall stub +(``compat_sys_xyzzy()``), and/or other kernel code. + +At least on 64-bit x86, it will be a hard requirement from v4.17 onwards to not +call system call functions in the kernel. It uses a different calling +convention for system calls where ``struct pt_regs`` is decoded on-the-fly in a +syscall wrapper which then hands processing over to the actual syscall function. +This means that only those parameters which are actually needed for a specific +syscall are passed on during syscall entry, instead of filling in six CPU +registers with random user space content all the time (which may cause serious +trouble down the call chain). + +Moreover, rules on how data may be accessed may differ between kernel data and +user data. This is another reason why calling ``sys_xyzzy()`` is generally a +bad idea. + +Exceptions to this rule are only allowed in architecture-specific overrides, +architecture-specific compatibility wrappers, or other code in arch/. + + References and Sources ---------------------- diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index a78186d826d7..0526286a0314 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -941,4 +941,11 @@ asmlinkage long sys_pkey_free(int pkey); asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags, unsigned mask, struct statx __user *buffer); + +/* + * Kernel code should not call syscalls (i.e., sys_xyzyyz()) directly. + * Instead, use one of the functions which work equivalently, such as + * the ksys_xyzyyz() functions prototyped below. + */ + #endif -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
next reply index Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-03-25 16:25 Dominik Brodowski [this message] 2018-03-30 15:35 ` Jonathan Corbet 2018-03-30 18:31 ` Dominik Brodowski
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20180325162527.GA17492@light.dominikbrodowski.net \ --to=linux@dominikbrodowski.net \ --cc=corbet@lwn.net \ --cc=linux-doc@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=luto@amacapital.net \ --cc=mingo@kernel.org \ --cc=tglx@linutronix.de \ --cc=torvalds@linux-foundation.org \ --cc=viro@zeniv.linux.org.uk \ --cc=x86@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Linux-Doc Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/linux-doc/0 linux-doc/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 linux-doc linux-doc/ https://lore.kernel.org/linux-doc \ linux-doc@vger.kernel.org public-inbox-index linux-doc Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-doc AGPL code for this site: git clone https://public-inbox.org/public-inbox.git