linux-kernel.vger.kernel.org archive mirror
* [PATCH] seccomp: Improve performance by optimizing memory barrier
@ 2021-02-01 12:50 wanghongzhe
  2021-02-01 15:39 ` Andy Lutomirski
  0 siblings, 1 reply; 8+ messages in thread
From: wanghongzhe @ 2021-02-01 12:50 UTC (permalink / raw)
  To: keescook, luto, wad, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, linux-kernel, netdev, bpf
  Cc: wanghongzhe

If a thread (A) sets the TSYNC flag via seccomp(), it synchronizes its
seccomp filter to the other threads (B) in the same thread group. To
avoid a race condition, seccomp puts an rmb() between reading the mode
and reading the filter on the seccomp check path (in thread B). As a
result, every syscall's seccomp check is slowed down by the memory
barrier.

However, we can optimize this by calling rmb() only when the filter is
NULL and re-reading the filter after the barrier, which means the rmb()
is executed at most once in a thread's lifetime.

The 'filter is NULL' condition means that this is the first time a
filter is attached, and that it was attached by another thread (A)
using the TSYNC flag. In this case, thread B may read the filter first
and the mode later due to CPU out-of-order execution. After that point,
thread B's mode is always set, and there is no further race condition
with the filter/bitmap.

In addition, we should put a write memory barrier
(smp_mb__before_atomic()) between writing the filter and writing the
mode, to avoid the race condition in the TSYNC case.
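
To illustrate the ordering this relies on, here is a simplified sketch
(not the exact kernel code; taskB stands for thread B's task_struct,
and mode/filter for the fields of its seccomp state):

/* Thread A: seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, ...) */
WRITE_ONCE(taskB->seccomp.filter, new_filter);
smp_wmb();						/* publish the filter ... */
WRITE_ONCE(taskB->seccomp.mode, SECCOMP_MODE_FILTER);	/* ... before the mode */

/* Thread B: seccomp check path at syscall entry */
if (READ_ONCE(current->seccomp.mode) == SECCOMP_MODE_FILTER) {
	f = READ_ONCE(current->seccomp.filter);
	/*
	 * Without a read barrier between the two loads above, f can
	 * still be observed as NULL here even though the mode is
	 * already SECCOMP_MODE_FILTER. The existing code pays an rmb()
	 * on every syscall to avoid this kind of reordering; with this
	 * patch the barrier is only paid on the rare f == NULL path.
	 */
}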

Signed-off-by: wanghongzhe <wanghongzhe@huawei.com>
---
 kernel/seccomp.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 952dc1c90229..b944cb2b6b94 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -397,8 +397,20 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
 			READ_ONCE(current->seccomp.filter);
 
 	/* Ensure unexpected behavior doesn't result in failing open. */
-	if (WARN_ON(f == NULL))
-		return SECCOMP_RET_KILL_PROCESS;
+	if (WARN_ON(f == NULL)) {
+		/*
+		 * Make sure the first filter addition (from another
+		 * thread using the TSYNC flag) is seen.
+		 */
+		rmb();
+
+		/* Read again */
+		f = READ_ONCE(current->seccomp.filter);
+
+		/* Ensure unexpected behavior doesn't result in failing open. */
+		if (WARN_ON(f == NULL))
+			return SECCOMP_RET_KILL_PROCESS;
+	}
 
 	if (seccomp_cache_check_allow(f, sd))
 		return SECCOMP_RET_ALLOW;
@@ -614,9 +626,16 @@ static inline void seccomp_sync_threads(unsigned long flags)
 		 * equivalent (see ptrace_may_access), it is safe to
 		 * allow one thread to transition the other.
 		 */
-		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED)
+		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED) {
+			/*
+			 * Make sure the mode cannot be set before the
+			 * filter is set.
+			 */
+			smp_mb__before_atomic();
+
 			seccomp_assign_mode(thread, SECCOMP_MODE_FILTER,
 					    flags);
+		}
 	}
 }
 
@@ -1160,12 +1179,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 	int data;
 	struct seccomp_data sd_local;
 
-	/*
-	 * Make sure that any changes to mode from another thread have
-	 * been seen after SYSCALL_WORK_SECCOMP was seen.
-	 */
-	rmb();
-
 	if (!sd) {
 		populate_seccomp_data(&sd_local);
 		sd = &sd_local;
-- 
2.19.1



* Re: [PATCH] seccomp: Improve performance by optimizing memory barrier
  2021-02-01 12:50 [PATCH] seccomp: Improve performance by optimizing memory barrier wanghongzhe
@ 2021-02-01 15:39 ` Andy Lutomirski
  2021-02-02  1:50   ` Wanghongzhe (Hongzhe, EulerOS)
  2021-02-02 10:13   ` [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers impose unnecessary overhead on both SMP and UP systems, as kernel Documentation said wanghongzhe
  0 siblings, 2 replies; 8+ messages in thread
From: Andy Lutomirski @ 2021-02-01 15:39 UTC (permalink / raw)
  To: wanghongzhe
  Cc: keescook, wad, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, linux-kernel, netdev, bpf



> On Feb 1, 2021, at 4:06 AM, wanghongzhe <wanghongzhe@huawei.com> wrote:
> 
> If a thread (A) sets the TSYNC flag via seccomp(), it synchronizes its
> seccomp filter to the other threads (B) in the same thread group. To
> avoid a race condition, seccomp puts an rmb() between reading the mode
> and reading the filter on the seccomp check path (in thread B). As a
> result, every syscall's seccomp check is slowed down by the memory
> barrier.
> 
> However, we can optimize this by calling rmb() only when the filter is
> NULL and re-reading the filter after the barrier, which means the rmb()
> is executed at most once in a thread's lifetime.
> 
> The 'filter is NULL' condition means that this is the first time a
> filter is attached, and that it was attached by another thread (A)
> using the TSYNC flag. In this case, thread B may read the filter first
> and the mode later due to CPU out-of-order execution. After that point,
> thread B's mode is always set, and there is no further race condition
> with the filter/bitmap.
> 
> In addition, we should put a write memory barrier
> (smp_mb__before_atomic()) between writing the filter and writing the
> mode, to avoid the race condition in the TSYNC case.

I haven’t fully worked this out, but rmb() is bogus. This should be smp_rmb().

> 
> Signed-off-by: wanghongzhe <wanghongzhe@huawei.com>
> ---
> kernel/seccomp.c | 31 ++++++++++++++++++++++---------
> 1 file changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 952dc1c90229..b944cb2b6b94 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -397,8 +397,20 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>            READ_ONCE(current->seccomp.filter);
> 
>    /* Ensure unexpected behavior doesn't result in failing open. */
> -    if (WARN_ON(f == NULL))
> -        return SECCOMP_RET_KILL_PROCESS;
> +    if (WARN_ON(f == NULL)) {
> +        /*
> +         * Make sure the first filter addition (from another
> +         * thread using the TSYNC flag) is seen.
> +         */
> +        rmb();
> +        
> +        /* Read again */
> +        f = READ_ONCE(current->seccomp.filter);
> +
> +        /* Ensure unexpected behavior doesn't result in failing open. */
> +        if (WARN_ON(f == NULL))
> +            return SECCOMP_RET_KILL_PROCESS;
> +    }
> 
>    if (seccomp_cache_check_allow(f, sd))
>        return SECCOMP_RET_ALLOW;
> @@ -614,9 +626,16 @@ static inline void seccomp_sync_threads(unsigned long flags)
>         * equivalent (see ptrace_may_access), it is safe to
>         * allow one thread to transition the other.
>         */
> -        if (thread->seccomp.mode == SECCOMP_MODE_DISABLED)
> +        if (thread->seccomp.mode == SECCOMP_MODE_DISABLED) {
> +            /*
> +             * Make sure the mode cannot be set before the
> +             * filter is set.
> +             */
> +            smp_mb__before_atomic();
> +
>            seccomp_assign_mode(thread, SECCOMP_MODE_FILTER,
>                        flags);
> +        }
>    }
> }
> 
> @@ -1160,12 +1179,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>    int data;
>    struct seccomp_data sd_local;
> 
> -    /*
> -     * Make sure that any changes to mode from another thread have
> -     * been seen after SYSCALL_WORK_SECCOMP was seen.
> -     */
> -    rmb();
> -
>    if (!sd) {
>        populate_seccomp_data(&sd_local);
>        sd = &sd_local;
> -- 
> 2.19.1
> 


* RE: [PATCH] seccomp: Improve performance by optimizing memory barrier
  2021-02-01 15:39 ` Andy Lutomirski
@ 2021-02-02  1:50   ` Wanghongzhe (Hongzhe, EulerOS)
  2021-02-02 10:13   ` [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers impose unnecessary overhead on both SMP and UP systems, as kernel Documentation said wanghongzhe
  1 sibling, 0 replies; 8+ messages in thread
From: Wanghongzhe (Hongzhe, EulerOS) @ 2021-02-02  1:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: keescook, wad, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, linux-kernel, netdev, bpf


>> On Feb 1, 2021, at 4:06 AM, wanghongzhe <wanghongzhe@huawei.com> wrote:
>> 
>> If a thread (A) sets the TSYNC flag via seccomp(), it synchronizes its
>> seccomp filter to the other threads (B) in the same thread group. To
>> avoid a race condition, seccomp puts an rmb() between reading the mode
>> and reading the filter on the seccomp check path (in thread B).
>> As a result, every syscall's seccomp check is slowed down by the
>> memory barrier.
>> 
>> However, we can optimize this by calling rmb() only when the filter is NULL
>> and re-reading the filter after the barrier, which means the rmb() is
>> executed at most once in a thread's lifetime.
>> 
>> The 'filter is NULL' condition means that this is the first time a
>> filter is attached, and that it was attached by another thread (A)
>> using the TSYNC flag. In this case, thread B may read the filter first and
>> the mode later due to CPU out-of-order execution. After that point, thread
>> B's mode is always set, and there is no further race condition with the
>> filter/bitmap.
>> 
>> In addition, we should put a write memory barrier
>> (smp_mb__before_atomic()) between writing the filter and writing the
>> mode, to avoid the race condition in the TSYNC case.
>
> I haven’t fully worked this out, but rmb() is bogus. This should be smp_rmb().

Yes, I think you are right. I will fix it and send another patch.
>> 
>> Signed-off-by: wanghongzhe <wanghongzhe@huawei.com>
>> ---
>> kernel/seccomp.c | 31 ++++++++++++++++++++++---------
>> 1 file changed, 22 insertions(+), 9 deletions(-)
>> 
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 
>> 952dc1c90229..b944cb2b6b94 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -397,8 +397,20 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>>            READ_ONCE(current->seccomp.filter);
>> 
>>    /* Ensure unexpected behavior doesn't result in failing open. */
>> -    if (WARN_ON(f == NULL))
>> -        return SECCOMP_RET_KILL_PROCESS;
>> +    if (WARN_ON(f == NULL)) {
>> +        /*
>> +         * Make sure the first filter addition (from another
>> +         * thread using the TSYNC flag) is seen.
>> +         */
>> +        rmb();
>> +        
>> +        /* Read again */
>> +        f = READ_ONCE(current->seccomp.filter);
>> +
>> +        /* Ensure unexpected behavior doesn't result in failing open. */
>> +        if (WARN_ON(f == NULL))
>> +            return SECCOMP_RET_KILL_PROCESS;
>> +    }
>> 
>>    if (seccomp_cache_check_allow(f, sd))
>>        return SECCOMP_RET_ALLOW;
>> @@ -614,9 +626,16 @@ static inline void seccomp_sync_threads(unsigned long flags)
>>         * equivalent (see ptrace_may_access), it is safe to
>>         * allow one thread to transition the other.
>>         */
>> -        if (thread->seccomp.mode == SECCOMP_MODE_DISABLED)
>> +        if (thread->seccomp.mode == SECCOMP_MODE_DISABLED) {
>> +            /*
>> +             * Make sure the mode cannot be set before the
>> +             * filter is set.
>> +             */
>> +            smp_mb__before_atomic();
>> +
>>            seccomp_assign_mode(thread, SECCOMP_MODE_FILTER,
>>                        flags);
>> +        }
>>    }
>> }
>> 
>> @@ -1160,12 +1179,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>>    int data;
>>    struct seccomp_data sd_local;
>> 
>> -    /*
>> -     * Make sure that any changes to mode from another thread have
>> -     * been seen after SYSCALL_WORK_SECCOMP was seen.
>> -     */
>> -    rmb();
>> -
>>    if (!sd) {
>>        populate_seccomp_data(&sd_local);
>>        sd = &sd_local;
>> --
>> 2.19.1
>> 


* [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers impose unnecessary overhead on both SMP and UP systems, as kernel Documentation said.
  2021-02-01 15:39 ` Andy Lutomirski
  2021-02-02  1:50   ` Wanghongzhe (Hongzhe, EulerOS)
@ 2021-02-02 10:13   ` wanghongzhe
  2021-02-02 11:53     ` Greg KH
                       ` (2 more replies)
  1 sibling, 3 replies; 8+ messages in thread
From: wanghongzhe @ 2021-02-02 10:13 UTC (permalink / raw)
  To: luto
  Cc: andrii, ast, bpf, daniel, john.fastabend, kafai, keescook,
	kpsingh, linux-kernel, netdev, songliubraving, wad, wanghongzhe,
	yhs

Secondly, the smp_rmb() should be put between reading SYSCALL_WORK_SECCOMP and reading
seccomp.mode, not between reading seccomp.mode and seccomp->filter, to make
sure that any changes to the mode from another thread have been seen after
SYSCALL_WORK_SECCOMP was seen, as the original comment states. This issue seems to
have been introduced by commit 13aa72f0fd0a9f98a41cefb662487269e2f1ad65, which
refactored the filter callback and the API. So the intuitive solution is to
put the barrier back where it was.

Thirdly, however, we can go further and improve the performance of the
syscall check, considering that smp_rmb() is executed on the syscall-check
path every time, for both the FILTER and the STRICT check, while the TSYNC
case that can lead to the race condition is rare, and that on some
architectures such as arm64 smp_rmb() is dsb(ishld), not the cheap barrier()
it is on x86-64.

As a result, smp_rmb() should only be executed when necessary, i.e. only
when the current thread's mode is SECCOMP_MODE_DISABLED at the first TSYNCed
time, because after that the current thread's mode will always be
SECCOMP_MODE_FILTER (and SYSCALL_WORK_SECCOMP will always be set) and cannot
be changed anymore by anyone. In other words, after that no thread can
change the mode (or SYSCALL_WORK_SECCOMP), so the race condition disappears
and no smp_rmb() is ever needed again.

So the solution is to re-read the mode after an smp_rmb() once the current
thread sees SECCOMP_MODE_DISABLED at the first TSYNCed time; if the re-read
mode does not equal SECCOMP_MODE_FILTER, do BUG(), and take the FILTER path
otherwise.
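
For reference, the write side that this one-time smp_rmb() pairs with is
roughly what seccomp_assign_mode() already does today (simplified sketch,
not a new change):

/* Thread A, under sighand->siglock, after attaching the filter: */
task->seccomp.mode = SECCOMP_MODE_FILTER;
/*
 * Order the filter/mode stores before setting the work flag, so that a
 * thread which observes SYSCALL_WORK_SECCOMP and then executes smp_rmb()
 * is guaranteed to also observe mode == SECCOMP_MODE_FILTER.
 */
smp_mb__before_atomic();
set_task_syscall_work(task, SECCOMP);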

RFC -> v1:
 - replace rmb() with smp_rmb()
 - move the smp_rmb() to between reading SYSCALL_WORK_SECCOMP and reading the mode

Signed-off-by: wanghongzhe <wanghongzhe@huawei.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
---
 kernel/seccomp.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 952dc1c90229..a621fb913ec6 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1160,12 +1160,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 	int data;
 	struct seccomp_data sd_local;
 
-	/*
-	 * Make sure that any changes to mode from another thread have
-	 * been seen after SYSCALL_WORK_SECCOMP was seen.
-	 */
-	rmb();
-
 	if (!sd) {
 		populate_seccomp_data(&sd_local);
 		sd = &sd_local;
@@ -1289,7 +1283,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 
 int __secure_computing(const struct seccomp_data *sd)
 {
-	int mode = current->seccomp.mode;
 	int this_syscall;
 
 	if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
@@ -1299,10 +1292,26 @@ int __secure_computing(const struct seccomp_data *sd)
 	this_syscall = sd ? sd->nr :
 		syscall_get_nr(current, current_pt_regs());
 
-	switch (mode) {
+	/*
+	 * Make sure that any changes to mode from another thread have
+	 * been seen after SYSCALL_WORK_SECCOMP was seen.
+	 */
+	smp_rmb();
+
+	switch (current->seccomp.mode) {
 	case SECCOMP_MODE_STRICT:
 		__secure_computing_strict(this_syscall);  /* may call do_exit */
 		return 0;
+	/*
+	 * Make sure that a change of the mode (from SECCOMP_MODE_DISABLED to
+	 * SECCOMP_MODE_FILTER) by another thread using the TSYNC ability has
+	 * been seen after SYSCALL_WORK_SECCOMP was seen. Re-read the mode after
+	 * smp_rmb(); if it equals SECCOMP_MODE_FILTER, take that path.
+	 */
+	case SECCOMP_MODE_DISABLED:
+		smp_rmb();
+		if (unlikely(current->seccomp.mode != SECCOMP_MODE_FILTER))
+			BUG();
 	case SECCOMP_MODE_FILTER:
 		return __seccomp_filter(this_syscall, sd, false);
 	default:
-- 
2.19.1



* Re: [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers impose unnecessary overhead on both SMP and UP systems, as kernel Documentation said.
  2021-02-02 10:13   ` [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers impose unnecessary overhead on both SMP and UP systems, as kernel Documentation said wanghongzhe
@ 2021-02-02 11:53     ` Greg KH
  2021-02-02 14:01     ` kernel test robot
  2021-02-02 19:02     ` Kees Cook
  2 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2021-02-02 11:53 UTC (permalink / raw)
  To: wanghongzhe
  Cc: luto, andrii, ast, bpf, daniel, john.fastabend, kafai, keescook,
	kpsingh, linux-kernel, netdev, songliubraving, wad, yhs

On Tue, Feb 02, 2021 at 06:13:07PM +0800, wanghongzhe wrote:
> Secondly, the smp_rmb() should be put between reading SYSCALL_WORK_SECCOMP and reading

<snip>

Your subject line of the patch is a bit odd :)



* Re: [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers impose unnecessary overhead on both SMP and UP systems, as kernel Documentation said.
  2021-02-02 10:13   ` [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers impose unnecessary overhead on both SMP and UP systems, as kernel Documentation said wanghongzhe
  2021-02-02 11:53     ` Greg KH
@ 2021-02-02 14:01     ` kernel test robot
  2021-02-02 19:02     ` Kees Cook
  2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2021-02-02 14:01 UTC (permalink / raw)
  To: wanghongzhe, luto
  Cc: kbuild-all, andrii, ast, bpf, daniel, john.fastabend, kafai,
	keescook, kpsingh, linux-kernel


Hi wanghongzhe,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v5.11-rc6]
[also build test WARNING on next-20210125]
[cannot apply to kees/for-next/seccomp]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/wanghongzhe/Firstly-as-Andy-mentioned-this-should-be-smp_rmb-instead-of-rmb-considering-that-TSYNC-is-a-cross-thread-situation-and-r/20210202-173311
base:    1048ba83fb1c00cd24172e23e8263972f6b5d9ac
config: i386-randconfig-s001-20210202 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-215-g0fb77bb6-dirty
        # https://github.com/0day-ci/linux/commit/f79414957fc8acb6b680bbcd26fa987328a5724a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review wanghongzhe/Firstly-as-Andy-mentioned-this-should-be-smp_rmb-instead-of-rmb-considering-that-TSYNC-is-a-cross-thread-situation-and-r/20210202-173311
        git checkout f79414957fc8acb6b680bbcd26fa987328a5724a
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/seccomp.c: In function '__secure_computing':
>> kernel/seccomp.c:1313:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    1313 |   if (unlikely(current->seccomp.mode != SECCOMP_MODE_FILTER))
         |      ^
   kernel/seccomp.c:1315:2: note: here
    1315 |  case SECCOMP_MODE_FILTER:
         |  ^~~~


vim +1313 kernel/seccomp.c

  1283	
  1284	int __secure_computing(const struct seccomp_data *sd)
  1285	{
  1286		int this_syscall;
  1287	
  1288		if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
  1289		    unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
  1290			return 0;
  1291	
  1292		this_syscall = sd ? sd->nr :
  1293			syscall_get_nr(current, current_pt_regs());
  1294	
  1295		/*
  1296		 * Make sure that any changes to mode from another thread have
  1297		 * been seen after SYSCALL_WORK_SECCOMP was seen.
  1298		 */
  1299		smp_rmb();
  1300	
  1301		switch (current->seccomp.mode) {
  1302		case SECCOMP_MODE_STRICT:
  1303			__secure_computing_strict(this_syscall);  /* may call do_exit */
  1304			return 0;
  1305		/*
  1306		 * Make sure that a change of the mode (from SECCOMP_MODE_DISABLED to
  1307		 * SECCOMP_MODE_FILTER) by another thread using the TSYNC ability has
  1308		 * been seen after SYSCALL_WORK_SECCOMP was seen. Re-read the mode after
  1309		 * smp_rmb(); if it equals SECCOMP_MODE_FILTER, take that path.
  1310		 */
  1311		case SECCOMP_MODE_DISABLED:
  1312			smp_rmb();
> 1313			if (unlikely(current->seccomp.mode != SECCOMP_MODE_FILTER))
  1314				BUG();
  1315		case SECCOMP_MODE_FILTER:
  1316			return __seccomp_filter(this_syscall, sd, false);
  1317		default:
  1318			BUG();
  1319		}
  1320	}
  1321	#endif /* CONFIG_HAVE_ARCH_SECCOMP_FILTER */
  1322	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org



* Re: [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers impose unnecessary overhead on both SMP and UP systems, as kernel Documentation said.
  2021-02-02 10:13   ` [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers impose unnecessary overhead on both SMP and UP systems, as kernel Documentation said wanghongzhe
  2021-02-02 11:53     ` Greg KH
  2021-02-02 14:01     ` kernel test robot
@ 2021-02-02 19:02     ` Kees Cook
  2021-02-04  8:45       ` [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers imp Wanghongzhe (Hongzhe, EulerOS)
  2 siblings, 1 reply; 8+ messages in thread
From: Kees Cook @ 2021-02-02 19:02 UTC (permalink / raw)
  To: wanghongzhe
  Cc: luto, andrii, ast, bpf, daniel, john.fastabend, kafai, kpsingh,
	linux-kernel, netdev, songliubraving, wad, yhs

On Tue, Feb 02, 2021 at 06:13:07PM +0800, wanghongzhe wrote:
> Secondly, the smp_rmb() should be put between reading SYSCALL_WORK_SECCOMP and reading
> seccomp.mode, not between reading seccomp.mode and seccomp->filter, to make
> sure that any changes to the mode from another thread have been seen after
> SYSCALL_WORK_SECCOMP was seen, as the original comment states. This issue seems to
> have been introduced by commit 13aa72f0fd0a9f98a41cefb662487269e2f1ad65, which
> refactored the filter callback and the API. So the intuitive solution is to
> put the barrier back where it was.
> 
> Thirdly, however, we can go further and improve the performance of the
> syscall check, considering that smp_rmb() is executed on the syscall-check
> path every time, for both the FILTER and the STRICT check, while the TSYNC
> case that can lead to the race condition is rare, and that on some
> architectures such as arm64 smp_rmb() is dsb(ishld), not the cheap barrier()
> it is on x86-64.
> 
> As a result, smp_rmb() should only be executed when necessary, i.e. only
> when the current thread's mode is SECCOMP_MODE_DISABLED at the first TSYNCed
> time, because after that the current thread's mode will always be
> SECCOMP_MODE_FILTER (and SYSCALL_WORK_SECCOMP will always be set) and cannot
> be changed anymore by anyone. In other words, after that no thread can
> change the mode (or SYSCALL_WORK_SECCOMP), so the race condition disappears
> and no smp_rmb() is ever needed again.
> 
> So the solution is to re-read the mode after an smp_rmb() once the current
> thread sees SECCOMP_MODE_DISABLED at the first TSYNCed time; if the re-read
> mode does not equal SECCOMP_MODE_FILTER, do BUG(), and take the FILTER path
> otherwise.
> 
> RFC -> v1:
>  - replace rmb() with smp_rmb()
>  - move the smp_rmb() to between reading SYSCALL_WORK_SECCOMP and reading the mode
> 
> Signed-off-by: wanghongzhe <wanghongzhe@huawei.com>
> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
> ---
>  kernel/seccomp.c | 25 +++++++++++++++++--------
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 952dc1c90229..a621fb913ec6 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -1160,12 +1160,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>  	int data;
>  	struct seccomp_data sd_local;
>  
> -	/*
> -	 * Make sure that any changes to mode from another thread have
> -	 * been seen after SYSCALL_WORK_SECCOMP was seen.
> -	 */
> -	rmb();
> -
>  	if (!sd) {
>  		populate_seccomp_data(&sd_local);
>  		sd = &sd_local;
> @@ -1289,7 +1283,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>  
>  int __secure_computing(const struct seccomp_data *sd)
>  {
> -	int mode = current->seccomp.mode;
>  	int this_syscall;
>  
>  	if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
> @@ -1299,10 +1292,26 @@ int __secure_computing(const struct seccomp_data *sd)
>  	this_syscall = sd ? sd->nr :
>  		syscall_get_nr(current, current_pt_regs());
>  
> -	switch (mode) {
> +	/*
> +	 * Make sure that any changes to mode from another thread have
> +	 * been seen after SYSCALL_WORK_SECCOMP was seen.
> +	 */
> +	smp_rmb();

Let's start with a patch that just replaces rmb() with smp_rmb() and
then work on optimizing. Can you provide performance numbers that show
rmb() (and soon smp_rmb()) is causing actual problems here?

> +
> +	switch (current->seccomp.mode) {
>  	case SECCOMP_MODE_STRICT:
>  		__secure_computing_strict(this_syscall);  /* may call do_exit */
>  		return 0;
> +	/*
> +	 * Make sure that a change of the mode (from SECCOMP_MODE_DISABLED to
> +	 * SECCOMP_MODE_FILTER) by another thread using the TSYNC ability has
> +	 * been seen after SYSCALL_WORK_SECCOMP was seen. Re-read the mode after
> +	 * smp_rmb(); if it equals SECCOMP_MODE_FILTER, take that path.
> +	 */
> +	case SECCOMP_MODE_DISABLED:
> +		smp_rmb();
> +		if (unlikely(current->seccomp.mode != SECCOMP_MODE_FILTER))
> +			BUG();

BUG() should never be used[1]. This is a recoverable situation, I think, and
should be handled as such.

-Kees

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on

>  	case SECCOMP_MODE_FILTER:
>  		return __seccomp_filter(this_syscall, sd, false);
>  	default:
> -- 
> 2.19.1
> 

-- 
Kees Cook


* RE: [PATCH v1 1/1] Firstly, as Andy mentioned, this should be smp_rmb() instead of rmb(). considering that TSYNC is a cross-thread situation, and rmb() is a mandatory barrier which should not be used to control SMP effects, since mandatory barriers imp...
  2021-02-02 19:02     ` Kees Cook
@ 2021-02-04  8:45       ` Wanghongzhe (Hongzhe, EulerOS)
  0 siblings, 0 replies; 8+ messages in thread
From: Wanghongzhe (Hongzhe, EulerOS) @ 2021-02-04  8:45 UTC (permalink / raw)
  To: Kees Cook
  Cc: luto, andrii, ast, bpf, daniel, john.fastabend, kafai, kpsingh,
	linux-kernel, netdev, songliubraving, wad, yhs

> Let's start with a patch that just replaces rmb() with smp_rmb() and then work
> on optimizing. Can you provide performance numbers that show
> rmb() (and soon smp_rmb()) is causing actual problems here?
Ok, I will send a patch that just replaces rmb() with smp_rmb() and will
provide performance numbers.
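
The numbers will probably come from something as simple as the following
user-space micro-benchmark (just a sketch: an allow-all filter plus a
getpid() loop; the iteration count is only a placeholder):

/* bench_seccomp.c - time getpid() with a trivial allow-all seccomp filter */
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	struct sock_filter insns[] = {
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(insns) / sizeof(insns[0]),
		.filter = insns,
	};
	struct timespec t0, t1;
	long i, iters = 10000000;	/* placeholder iteration count */

	/* Install the filter for this thread only. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) ||
	    prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
		perror("seccomp setup");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++)
		syscall(SYS_getpid);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.1f ns per syscall\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e9 +
		(t1.tv_nsec - t0.tv_nsec)) / iters);
	return 0;
}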

> BUG() should never be used[1]. This is a recoverable situation, I think, and
> should be handled as such.

I just followed the default case below (which also does BUG()). Let's discuss
this issue in the next patches.
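
To sketch what I have in mind for a later version (untested, and only an
idea: replace the BUG() with WARN_ON_ONCE() and make the fall-through
explicit, so the NULL-filter check in __seccomp_filter() deals with the
unexpected state instead of crashing the kernel):

	case SECCOMP_MODE_DISABLED:
		smp_rmb();
		/*
		 * If the mode is still not FILTER after the barrier, warn
		 * and fall through: __seccomp_filter() will then kill the
		 * task via its NULL-filter check.
		 */
		WARN_ON_ONCE(current->seccomp.mode != SECCOMP_MODE_FILTER);
		fallthrough;
	case SECCOMP_MODE_FILTER:
		return __seccomp_filter(this_syscall, sd, false);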

-- 
wanghongzhe
