From: wanghongzhe <wanghongzhe@huawei.com>
To: <luto@amacapital.net>
Cc: <andrii@kernel.org>, <ast@kernel.org>, <bpf@vger.kernel.org>,
	<daniel@iogearbox.net>, <john.fastabend@gmail.com>,
	<kafai@fb.com>, <keescook@chromium.org>, <kpsingh@kernel.org>,
	<linux-kernel@vger.kernel.org>, <netdev@vger.kernel.org>,
	<songliubraving@fb.com>, <wad@chromium.org>,
	<wanghongzhe@huawei.com>, <yhs@fb.com>
Subject: [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers impose unnecessary overhead on both SMP and UP systems, as kernel Documentation said.
Date: Tue, 2 Feb 2021 18:13:07 +0800
Message-ID: <1612260787-28015-1-git-send-email-wanghongzhe@huawei.com>
In-Reply-To: <B1DC6A42-15AF-4804-B20E-FC6E2BDD1C8E@amacapital.net>

Secondly, the smp_rmb() should be placed between the read of SYSCALL_WORK_SECCOMP
and the read of seccomp.mode, not between the reads of seccomp.mode and
seccomp->filter. Only then is any change to mode made by another thread
guaranteed to be seen once SYSCALL_WORK_SECCOMP has been seen, as the original
comment states. This issue appears to have been introduced by commit
13aa72f0fd0a9f98a41cefb662487269e2f1ad65, whose purpose was to refactor the
filter callback and the API. So the intuitive solution is to move the barrier
back in front of the read of seccomp.mode in __secure_computing().

Thirdly, we can go further and improve the performance of the syscall check.
smp_rmb() is executed on every pass through the syscall-check path, for both
FILTER and STRICT modes, while the TSYNC case that can lead to the race is
rare. Moreover, on some architectures such as arm64, smp_rmb() is a real
instruction, dmb(ishld), rather than the cheap compiler barrier() it is on
x86-64.

As a result, smp_rmb() should only be executed when necessary, i.e. only when
the current thread still sees its mode as SECCOMP_MODE_DISABLED the first time
it is TSYNCed. After that, the current thread's mode will always be
SECCOMP_MODE_FILTER (and SYSCALL_WORK_SECCOMP will always be set), and nobody
can change it anymore. In other words, from then on no thread can change the
mode (or SYSCALL_WORK_SECCOMP), so the race condition disappears and smp_rmb()
is never needed again.

So the solution is to read the mode again after smp_rmb() when the current
thread sees it as SECCOMP_MODE_DISABLED at the first TSYNCed time: if the
re-read mode is not SECCOMP_MODE_FILTER, call BUG(); otherwise, fall through
to the FILTER path.

RFC -> v1:
 - replace rmb() with smp_rmb()
 - move the smp_rmb() between the reads of SYSCALL_WORK_SECCOMP and mode

Signed-off-by: wanghongzhe <wanghongzhe@huawei.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
---
 kernel/seccomp.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 952dc1c90229..a621fb913ec6 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1160,12 +1160,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 	int data;
 	struct seccomp_data sd_local;
 
-	/*
-	 * Make sure that any changes to mode from another thread have
-	 * been seen after SYSCALL_WORK_SECCOMP was seen.
-	 */
-	rmb();
-
 	if (!sd) {
 		populate_seccomp_data(&sd_local);
 		sd = &sd_local;
@@ -1289,7 +1283,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 
 int __secure_computing(const struct seccomp_data *sd)
 {
-	int mode = current->seccomp.mode;
 	int this_syscall;
 
 	if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
@@ -1299,10 +1292,26 @@ int __secure_computing(const struct seccomp_data *sd)
 	this_syscall = sd ? sd->nr :
 		syscall_get_nr(current, current_pt_regs());
 
-	switch (mode) {
+	/*
+	 * Make sure that any changes to mode from another thread have
+	 * been seen after SYSCALL_WORK_SECCOMP was seen.
+	 */
+	smp_rmb();
+
+	switch (current->seccomp.mode) {
 	case SECCOMP_MODE_STRICT:
 		__secure_computing_strict(this_syscall);  /* may call do_exit */
 		return 0;
+	/*
+	 * Make sure that a change of mode (from SECCOMP_MODE_DISABLED to
+	 * SECCOMP_MODE_FILTER) made by another thread via TSYNC has
+	 * been seen after SYSCALL_WORK_SECCOMP was seen. Read mode again after
+	 * smp_rmb(); if it equals SECCOMP_MODE_FILTER, take the filter path.
+	 */
+	case SECCOMP_MODE_DISABLED:
+		smp_rmb();
+		if (unlikely(current->seccomp.mode != SECCOMP_MODE_FILTER))
+			BUG();
 	case SECCOMP_MODE_FILTER:
 		return __seccomp_filter(this_syscall, sd, false);
 	default:
-- 
2.19.1



Thread overview: 9+ messages
2021-02-01 12:50 [PATCH] seccomp: Improve performance by optimizing memory barrier wanghongzhe
2021-02-01 15:39 ` Andy Lutomirski
2021-02-02  1:50   ` Wanghongzhe (Hongzhe, EulerOS)
2021-02-02 10:13   ` wanghongzhe [this message]
2021-02-02 11:53     ` [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers impose unnecessary overhead on both SMP and UP systems, as kernel Documentation said Greg KH
2021-02-02 14:01     ` kernel test robot
2021-02-02 14:01       ` kernel test robot
2021-02-02 19:02     ` Kees Cook
2021-02-04  8:45       ` [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers imp Wanghongzhe (Hongzhe, EulerOS)
