linux-kernel.vger.kernel.org archive mirror
* [PATCH] seccomp: Improve performance by optimizing memory barrier
@ 2021-02-01 12:49 wanghongzhe
  2021-02-08  6:43 ` Leon Romanovsky
  0 siblings, 1 reply; 6+ messages in thread
From: wanghongzhe @ 2021-02-01 12:49 UTC (permalink / raw)
  To: keescook, luto, wad, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, linux-kernel, netdev, bpf

If a thread (A) calls seccomp() with the TSYNC flag set, it
synchronizes its seccomp filter to the other threads (B) in the same
thread group. To avoid a race condition, seccomp puts an rmb() between
reading the mode and the filter in the seccomp check path (in thread
B). As a result, every syscall's seccomp check is slowed down by the
memory barrier.

However, we can optimize this by calling rmb() only when the filter is
NULL and re-reading it after the barrier, which means the rmb() is
executed at most once in a thread's lifetime.

The 'filter is NULL' condition means this is the first check after a
filter was attached by another thread (A) using the TSYNC flag. In
that case, thread B may have read the filter before the mode due to
CPU out-of-order execution. After this point, thread B's mode is
always set, and there is no further race with the filter/bitmap.

In addition, we should put a write memory barrier,
smp_mb__before_atomic(), between writing the filter and the mode, to
avoid the race condition in the TSYNC case.

Signed-off-by: wanghongzhe <wanghongzhe@huawei.com>
---
 kernel/seccomp.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 952dc1c90229..b944cb2b6b94 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -397,8 +397,20 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
 			READ_ONCE(current->seccomp.filter);
 
 	/* Ensure unexpected behavior doesn't result in failing open. */
-	if (WARN_ON(f == NULL))
-		return SECCOMP_RET_KILL_PROCESS;
+	if (WARN_ON(f == NULL)) {
+		/*
+		 * Make sure the first filter addition (from another
+		 * thread using the TSYNC flag) is seen.
+		 */
+		rmb();
+
+		/* Read again */
+		f = READ_ONCE(current->seccomp.filter);
+
+		/* Ensure unexpected behavior doesn't result in failing open. */
+		if (WARN_ON(f == NULL))
+			return SECCOMP_RET_KILL_PROCESS;
+	}
 
 	if (seccomp_cache_check_allow(f, sd))
 		return SECCOMP_RET_ALLOW;
@@ -614,9 +626,16 @@ static inline void seccomp_sync_threads(unsigned long flags)
 		 * equivalent (see ptrace_may_access), it is safe to
 		 * allow one thread to transition the other.
 		 */
-		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED)
+		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED) {
+			/*
+			 * Make sure the mode cannot be set before the
+			 * filter is set.
+			 */
+			smp_mb__before_atomic();
+
 			seccomp_assign_mode(thread, SECCOMP_MODE_FILTER,
 					    flags);
+		}
 	}
 }
 
@@ -1160,12 +1179,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 	int data;
 	struct seccomp_data sd_local;
 
-	/*
-	 * Make sure that any changes to mode from another thread have
-	 * been seen after SYSCALL_WORK_SECCOMP was seen.
-	 */
-	rmb();
-
 	if (!sd) {
 		populate_seccomp_data(&sd_local);
 		sd = &sd_local;
-- 
2.19.1
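
The ordering the patch relies on can be illustrated with a small
userspace model (C11 atomics + pthreads). This is only a sketch of the
barrier pairing, not kernel code: "mode" stands in for
current->seccomp.mode / the SYSCALL_WORK_SECCOMP flag, "filter" for
current->seccomp.filter, the release fence for smp_mb__before_atomic()
on the writer side, and the acquire fence for the conditional
rmb()/smp_rmb() on the reader side.

#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int the_filter = 42;		/* stand-in for a real filter object */
static _Atomic(int *) filter;		/* NULL until "thread A" attaches one */
static atomic_int mode;			/* 0 = disabled, 1 = filter mode */

/* Thread A: TSYNC writer - publish the filter, then flip the mode. */
static void *tsync_writer(void *arg)
{
	atomic_store_explicit(&filter, &the_filter, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* ~ smp_mb__before_atomic() */
	atomic_store_explicit(&mode, 1, memory_order_relaxed);
	return NULL;
}

/* Thread B: per-syscall reader - pays for a barrier only on the first miss. */
static void *syscall_reader(void *arg)
{
	int *f;

	/* Model: the seccomp check only runs once the mode/flag is seen set. */
	while (!atomic_load_explicit(&mode, memory_order_relaxed))
		;

	f = atomic_load_explicit(&filter, memory_order_relaxed);
	if (f == NULL) {
		/* First check raced with TSYNC: one barrier, one re-read. */
		atomic_thread_fence(memory_order_acquire);	/* ~ rmb()/smp_rmb() */
		f = atomic_load_explicit(&filter, memory_order_relaxed);
	}
	assert(f != NULL);	/* guaranteed by the writer-side ordering */
	printf("filter seen: %d\n", *f);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&b, NULL, syscall_reader, NULL);
	pthread_create(&a, NULL, tsync_writer, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Built with e.g. 'gcc -std=c11 -pthread', the assert() should never
fire: once the reader has observed the mode set, the fence pairing
guarantees that the re-read of the filter sees the attached filter.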



* Re: [PATCH] seccomp: Improve performance by optimizing memory barrier
  2021-02-01 12:49 [PATCH] seccomp: Improve performance by optimizing memory barrier wanghongzhe
@ 2021-02-08  6:43 ` Leon Romanovsky
  2021-02-08  7:10   ` Wanghongzhe (Hongzhe, EulerOS)
  0 siblings, 1 reply; 6+ messages in thread
From: Leon Romanovsky @ 2021-02-08  6:43 UTC (permalink / raw)
  To: wanghongzhe
  Cc: keescook, luto, wad, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, linux-kernel, netdev, bpf

On Mon, Feb 01, 2021 at 08:49:41PM +0800, wanghongzhe wrote:
> If a thread (A) calls seccomp() with the TSYNC flag set, it
> synchronizes its seccomp filter to the other threads (B) in the same
> thread group. To avoid a race condition, seccomp puts an rmb() between
> reading the mode and the filter in the seccomp check path (in thread
> B). As a result, every syscall's seccomp check is slowed down by the
> memory barrier.
>
> However, we can optimize this by calling rmb() only when the filter is
> NULL and re-reading it after the barrier, which means the rmb() is
> executed at most once in a thread's lifetime.
>
> The 'filter is NULL' condition means this is the first check after a
> filter was attached by another thread (A) using the TSYNC flag. In
> that case, thread B may have read the filter before the mode due to
> CPU out-of-order execution. After this point, thread B's mode is
> always set, and there is no further race with the filter/bitmap.
>
> In addition, we should put a write memory barrier,
> smp_mb__before_atomic(), between writing the filter and the mode, to
> avoid the race condition in the TSYNC case.
>
> Signed-off-by: wanghongzhe <wanghongzhe@huawei.com>
> ---
>  kernel/seccomp.c | 31 ++++++++++++++++++++++---------
>  1 file changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 952dc1c90229..b944cb2b6b94 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -397,8 +397,20 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>  			READ_ONCE(current->seccomp.filter);
>
>  	/* Ensure unexpected behavior doesn't result in failing open. */
> -	if (WARN_ON(f == NULL))
> -		return SECCOMP_RET_KILL_PROCESS;
> +	if (WARN_ON(f == NULL)) {
> +		/*
> +		 * Make sure the first filter addition (from another
> +		 * thread using the TSYNC flag) is seen.
> +		 */
> +		rmb();
> +
> +		/* Read again */
> +		f = READ_ONCE(current->seccomp.filter);
> +
> +		/* Ensure unexpected behavior doesn't result in failing open. */
> +		if (WARN_ON(f == NULL))
> +			return SECCOMP_RET_KILL_PROCESS;
> +	}

IMHO, double WARN_ON() for the fallback flow is too much.
Also, according to the description, this "f == NULL" check is due to a
race and not a programming error, which is what WARN_ON() is intended
to catch.

Thanks


* RE: [PATCH] seccomp: Improve performance by optimizing memory barrier
  2021-02-08  6:43 ` Leon Romanovsky
@ 2021-02-08  7:10   ` Wanghongzhe (Hongzhe, EulerOS)
  0 siblings, 0 replies; 6+ messages in thread
From: Wanghongzhe (Hongzhe, EulerOS) @ 2021-02-08  7:10 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: keescook, luto, wad, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, linux-kernel, netdev, bpf

> From: Leon Romanovsky [mailto:leon@kernel.org]
> Sent: Monday, February 8, 2021 2:44 PM
> To: Wanghongzhe (Hongzhe, EulerOS) <wanghongzhe@huawei.com>
> Cc: keescook@chromium.org; luto@amacapital.net; wad@chromium.org;
> ast@kernel.org; daniel@iogearbox.net; andrii@kernel.org; kafai@fb.com;
> songliubraving@fb.com; yhs@fb.com; john.fastabend@gmail.com;
> kpsingh@kernel.org; linux-kernel@vger.kernel.org; netdev@vger.kernel.org;
> bpf@vger.kernel.org
> Subject: Re: [PATCH] seccomp: Improve performance by optimizing memory
> barrier
> 
> On Mon, Feb 01, 2021 at 08:49:41PM +0800, wanghongzhe wrote:
> > If a thread (A) calls seccomp() with the TSYNC flag set, it
> > synchronizes its seccomp filter to the other threads (B) in the same
> > thread group. To avoid a race condition, seccomp puts an rmb() between
> > reading the mode and the filter in the seccomp check path (in thread
> > B). As a result, every syscall's seccomp check is slowed down by the
> > memory barrier.
> >
> > However, we can optimize this by calling rmb() only when the filter is
> > NULL and re-reading it after the barrier, which means the rmb() is
> > executed at most once in a thread's lifetime.
> >
> > The 'filter is NULL' condition means this is the first check after a
> > filter was attached by another thread (A) using the TSYNC flag. In
> > that case, thread B may have read the filter before the mode due to
> > CPU out-of-order execution. After this point, thread B's mode is
> > always set, and there is no further race with the filter/bitmap.
> >
> > In addition, we should put a write memory barrier,
> > smp_mb__before_atomic(), between writing the filter and the mode, to
> > avoid the race condition in the TSYNC case.
> >
> > Signed-off-by: wanghongzhe <wanghongzhe@huawei.com>
> > ---
> >  kernel/seccomp.c | 31 ++++++++++++++++++++++---------
> >  1 file changed, 22 insertions(+), 9 deletions(-)
> >
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c index
> > 952dc1c90229..b944cb2b6b94 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -397,8 +397,20 @@ static u32 seccomp_run_filters(const struct
> seccomp_data *sd,
> >  			READ_ONCE(current->seccomp.filter);
> >
> >  	/* Ensure unexpected behavior doesn't result in failing open. */
> > -	if (WARN_ON(f == NULL))
> > -		return SECCOMP_RET_KILL_PROCESS;
> > +	if (WARN_ON(f == NULL)) {
> > +		/*
> > +		 * Make sure the first filter addition (from another
> > +		 * thread using the TSYNC flag) is seen.
> > +		 */
> > +		rmb();
> > +
> > +		/* Read again */
> > +		f = READ_ONCE(current->seccomp.filter);
> > +
> > +		/* Ensure unexpected behavior doesn't result in failing open. */
> > +		if (WARN_ON(f == NULL))
> > +			return SECCOMP_RET_KILL_PROCESS;
> > +	}
> 
> IMHO, double WARN_ON() for the fallback flow is too much.
> Also, according to the description, this "f == NULL" check is due to a
> race and not a programming error, which is what WARN_ON() is intended
> to catch.
> 
> Thanks

Maybe you are right. I think a plain 'if (f == NULL)' check is enough for this optimization.
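
A sketch of how that revised check could look (an illustration of the
suggestion only, using smp_rmb() as proposed elsewhere in the thread;
no such patch is posted in this thread):

	f = READ_ONCE(current->seccomp.filter);

	if (f == NULL) {
		/*
		 * The first check in this thread can race with a TSYNC
		 * attach from another thread: take the read barrier
		 * once and re-read the filter.
		 */
		smp_rmb();
		f = READ_ONCE(current->seccomp.filter);

		/* Ensure unexpected behavior doesn't result in failing open. */
		if (WARN_ON(f == NULL))
			return SECCOMP_RET_KILL_PROCESS;
	}

Dropping the first WARN_ON() keeps the warning for the truly
impossible case while treating the initial NULL read as the expected
TSYNC race rather than a bug.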


* RE: [PATCH] seccomp: Improve performance by optimizing memory barrier
  2021-02-01 15:39 ` Andy Lutomirski
@ 2021-02-02  1:50   ` Wanghongzhe (Hongzhe, EulerOS)
  0 siblings, 0 replies; 6+ messages in thread
From: Wanghongzhe (Hongzhe, EulerOS) @ 2021-02-02  1:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: keescook, wad, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, linux-kernel, netdev, bpf


>> On Feb 1, 2021, at 4:06 AM, wanghongzhe <wanghongzhe@huawei.com> wrote:
>> 
>> If a thread (A) calls seccomp() with the TSYNC flag set, it
>> synchronizes its seccomp filter to the other threads (B) in the same
>> thread group. To avoid a race condition, seccomp puts an rmb() between
>> reading the mode and the filter in the seccomp check path (in thread
>> B). As a result, every syscall's seccomp check is slowed down by the
>> memory barrier.
>>
>> However, we can optimize this by calling rmb() only when the filter is
>> NULL and re-reading it after the barrier, which means the rmb() is
>> executed at most once in a thread's lifetime.
>>
>> The 'filter is NULL' condition means this is the first check after a
>> filter was attached by another thread (A) using the TSYNC flag. In
>> that case, thread B may have read the filter before the mode due to
>> CPU out-of-order execution. After this point, thread B's mode is
>> always set, and there is no further race with the filter/bitmap.
>>
>> In addition, we should put a write memory barrier,
>> smp_mb__before_atomic(), between writing the filter and the mode, to
>> avoid the race condition in the TSYNC case.
>
> I haven’t fully worked this out, but rmb() is bogus. This should be smp_rmb().

Yes, I think you are right. I will fix it and send another patch.
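
For reference, the substitution would presumably be a one-line change
in the reader path quoted above (a sketch of that change, not a patch
actually posted in this thread):

 		/*
 		 * Make sure the first filter addition (from another
 		 * thread using the TSYNC flag) is seen.
 		 */
-		rmb();
+		smp_rmb();	/* pairs with the writer-side barrier */

smp_rmb() is the SMP-conditional variant: it is sufficient when the
ordering only matters against stores made by another CPU, and it
reduces to a compiler barrier on !CONFIG_SMP builds.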
>> 
>> Signed-off-by: wanghongzhe <wanghongzhe@huawei.com>
>> ---
>> kernel/seccomp.c | 31 ++++++++++++++++++++++---------
>> 1 file changed, 22 insertions(+), 9 deletions(-)
>> 
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 
>> 952dc1c90229..b944cb2b6b94 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -397,8 +397,20 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>>            READ_ONCE(current->seccomp.filter);
>> 
>>    /* Ensure unexpected behavior doesn't result in failing open. */
>> -    if (WARN_ON(f == NULL))
>> -        return SECCOMP_RET_KILL_PROCESS;
>> +    if (WARN_ON(f == NULL)) {
>> +        /*
>> +         * Make sure the first filter addition (from another
>> +         * thread using the TSYNC flag) is seen.
>> +         */
>> +        rmb();
>> +        
>> +        /* Read again */
>> +        f = READ_ONCE(current->seccomp.filter);
>> +
>> +        /* Ensure unexpected behavior doesn't result in failing open. */
>> +        if (WARN_ON(f == NULL))
>> +            return SECCOMP_RET_KILL_PROCESS;
>> +    }
>> 
>>    if (seccomp_cache_check_allow(f, sd))
>>        return SECCOMP_RET_ALLOW;
>> @@ -614,9 +626,16 @@ static inline void seccomp_sync_threads(unsigned long flags)
>>         * equivalent (see ptrace_may_access), it is safe to
>>         * allow one thread to transition the other.
>>         */
>> -        if (thread->seccomp.mode == SECCOMP_MODE_DISABLED)
>> +        if (thread->seccomp.mode == SECCOMP_MODE_DISABLED) {
>> +            /*
>> +             * Make sure the mode cannot be set before the
>> +             * filter is set.
>> +             */
>> +            smp_mb__before_atomic();
>> +
>>            seccomp_assign_mode(thread, SECCOMP_MODE_FILTER,
>>                        flags);
>> +        }
>>    }
>> }
>> 
>> @@ -1160,12 +1179,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>>    int data;
>>    struct seccomp_data sd_local;
>> 
>> -    /*
>> -     * Make sure that any changes to mode from another thread have
>> -     * been seen after SYSCALL_WORK_SECCOMP was seen.
>> -     */
>> -    rmb();
>> -
>>    if (!sd) {
>>        populate_seccomp_data(&sd_local);
>>        sd = &sd_local;
>> --
>> 2.19.1
>> 


* Re: [PATCH] seccomp: Improve performance by optimizing memory barrier
  2021-02-01 12:50 wanghongzhe
@ 2021-02-01 15:39 ` Andy Lutomirski
  2021-02-02  1:50   ` Wanghongzhe (Hongzhe, EulerOS)
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Lutomirski @ 2021-02-01 15:39 UTC (permalink / raw)
  To: wanghongzhe
  Cc: keescook, wad, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, linux-kernel, netdev, bpf



> On Feb 1, 2021, at 4:06 AM, wanghongzhe <wanghongzhe@huawei.com> wrote:
> 
> If a thread (A) calls seccomp() with the TSYNC flag set, it
> synchronizes its seccomp filter to the other threads (B) in the same
> thread group. To avoid a race condition, seccomp puts an rmb() between
> reading the mode and the filter in the seccomp check path (in thread
> B). As a result, every syscall's seccomp check is slowed down by the
> memory barrier.
> 
> However, we can optimize this by calling rmb() only when the filter is
> NULL and re-reading it after the barrier, which means the rmb() is
> executed at most once in a thread's lifetime.
> 
> The 'filter is NULL' condition means this is the first check after a
> filter was attached by another thread (A) using the TSYNC flag. In
> that case, thread B may have read the filter before the mode due to
> CPU out-of-order execution. After this point, thread B's mode is
> always set, and there is no further race with the filter/bitmap.
> 
> In addition, we should put a write memory barrier,
> smp_mb__before_atomic(), between writing the filter and the mode, to
> avoid the race condition in the TSYNC case.

I haven’t fully worked this out, but rmb() is bogus. This should be smp_rmb().

> 
> Signed-off-by: wanghongzhe <wanghongzhe@huawei.com>
> ---
> kernel/seccomp.c | 31 ++++++++++++++++++++++---------
> 1 file changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 952dc1c90229..b944cb2b6b94 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -397,8 +397,20 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>            READ_ONCE(current->seccomp.filter);
> 
>    /* Ensure unexpected behavior doesn't result in failing open. */
> -    if (WARN_ON(f == NULL))
> -        return SECCOMP_RET_KILL_PROCESS;
> +    if (WARN_ON(f == NULL)) {
> +        /*
> +         * Make sure the first filter addition (from another
> +         * thread using the TSYNC flag) is seen.
> +         */
> +        rmb();
> +        
> +        /* Read again */
> +        f = READ_ONCE(current->seccomp.filter);
> +
> +        /* Ensure unexpected behavior doesn't result in failing open. */
> +        if (WARN_ON(f == NULL))
> +            return SECCOMP_RET_KILL_PROCESS;
> +    }
> 
>    if (seccomp_cache_check_allow(f, sd))
>        return SECCOMP_RET_ALLOW;
> @@ -614,9 +626,16 @@ static inline void seccomp_sync_threads(unsigned long flags)
>         * equivalent (see ptrace_may_access), it is safe to
>         * allow one thread to transition the other.
>         */
> -        if (thread->seccomp.mode == SECCOMP_MODE_DISABLED)
> +        if (thread->seccomp.mode == SECCOMP_MODE_DISABLED) {
> +            /*
> +             * Make sure the mode cannot be set before the
> +             * filter is set.
> +             */
> +            smp_mb__before_atomic();
> +
>            seccomp_assign_mode(thread, SECCOMP_MODE_FILTER,
>                        flags);
> +        }
>    }
> }
> 
> @@ -1160,12 +1179,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>    int data;
>    struct seccomp_data sd_local;
> 
> -    /*
> -     * Make sure that any changes to mode from another thread have
> -     * been seen after SYSCALL_WORK_SECCOMP was seen.
> -     */
> -    rmb();
> -
>    if (!sd) {
>        populate_seccomp_data(&sd_local);
>        sd = &sd_local;
> -- 
> 2.19.1
> 


* [PATCH] seccomp: Improve performance by optimizing memory barrier
@ 2021-02-01 12:50 wanghongzhe
  2021-02-01 15:39 ` Andy Lutomirski
  0 siblings, 1 reply; 6+ messages in thread
From: wanghongzhe @ 2021-02-01 12:50 UTC (permalink / raw)
  To: keescook, luto, wad, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, linux-kernel, netdev, bpf
  Cc: wanghongzhe

If a thread (A) calls seccomp() with the TSYNC flag set, it
synchronizes its seccomp filter to the other threads (B) in the same
thread group. To avoid a race condition, seccomp puts an rmb() between
reading the mode and the filter in the seccomp check path (in thread
B). As a result, every syscall's seccomp check is slowed down by the
memory barrier.

However, we can optimize this by calling rmb() only when the filter is
NULL and re-reading it after the barrier, which means the rmb() is
executed at most once in a thread's lifetime.

The 'filter is NULL' condition means this is the first check after a
filter was attached by another thread (A) using the TSYNC flag. In
that case, thread B may have read the filter before the mode due to
CPU out-of-order execution. After this point, thread B's mode is
always set, and there is no further race with the filter/bitmap.

In addition, we should put a write memory barrier,
smp_mb__before_atomic(), between writing the filter and the mode, to
avoid the race condition in the TSYNC case.

Signed-off-by: wanghongzhe <wanghongzhe@huawei.com>
---
 kernel/seccomp.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 952dc1c90229..b944cb2b6b94 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -397,8 +397,20 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
 			READ_ONCE(current->seccomp.filter);
 
 	/* Ensure unexpected behavior doesn't result in failing open. */
-	if (WARN_ON(f == NULL))
-		return SECCOMP_RET_KILL_PROCESS;
+	if (WARN_ON(f == NULL)) {
+		/*
+		 * Make sure the first filter addition (from another
+		 * thread using the TSYNC flag) is seen.
+		 */
+		rmb();
+
+		/* Read again */
+		f = READ_ONCE(current->seccomp.filter);
+
+		/* Ensure unexpected behavior doesn't result in failing open. */
+		if (WARN_ON(f == NULL))
+			return SECCOMP_RET_KILL_PROCESS;
+	}
 
 	if (seccomp_cache_check_allow(f, sd))
 		return SECCOMP_RET_ALLOW;
@@ -614,9 +626,16 @@ static inline void seccomp_sync_threads(unsigned long flags)
 		 * equivalent (see ptrace_may_access), it is safe to
 		 * allow one thread to transition the other.
 		 */
-		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED)
+		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED) {
+			/*
+			 * Make sure the mode cannot be set before the
+			 * filter is set.
+			 */
+			smp_mb__before_atomic();
+
 			seccomp_assign_mode(thread, SECCOMP_MODE_FILTER,
 					    flags);
+		}
 	}
 }
 
@@ -1160,12 +1179,6 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 	int data;
 	struct seccomp_data sd_local;
 
-	/*
-	 * Make sure that any changes to mode from another thread have
-	 * been seen after SYSCALL_WORK_SECCOMP was seen.
-	 */
-	rmb();
-
 	if (!sd) {
 		populate_seccomp_data(&sd_local);
 		sd = &sd_local;
-- 
2.19.1



Thread overview: 6+ messages
2021-02-01 12:49 [PATCH] seccomp: Improve performance by optimizing memory barrier wanghongzhe
2021-02-08  6:43 ` Leon Romanovsky
2021-02-08  7:10   ` Wanghongzhe (Hongzhe, EulerOS)
2021-02-01 12:50 wanghongzhe
2021-02-01 15:39 ` Andy Lutomirski
2021-02-02  1:50   ` Wanghongzhe (Hongzhe, EulerOS)
