All of lore.kernel.org
 help / color / mirror / Atom feed
* [bug-report] possible performance problem in ret_to_user_from_irq
@ 2022-12-26  8:45 ` Hui Tang
  0 siblings, 0 replies; 16+ messages in thread
From: Hui Tang @ 2022-12-26  8:45 UTC (permalink / raw)
  To: Russell King; +Cc: linux-arm-kernel, linux-kernel, Jens Axboe, tanghui20

hi folks.

I found a performance problem which is introduced by commit
32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
After the commit,  any bit in the range of 0..15 will cause
do_work_pending() to be invoked. More frequent do_work_pending()
invoked possible result in worse performance.

Some of the tests I've done, as follows:
lmbench test			base		with patch
./lat_ctx -P 1 -s 0  2		7.3167		11.04
./lat_ctx -P 1 -s 16 2          8.0467		14.5367
./lat_ctx -P 1 -s 64 2		7.8667		11.43
./lat_ctx -P 1 -s 16 16		16.47		18.3667
./lat_pipe -P 1			28.1671		44.7904

libMicro-0.4.1 test		base		with patch
./cascade_cond -E -C 200\
  -L -S -W -N "c_cond_1" -I 100	286.3333	358

When I adjust test bit, the performance problem gone.
-	movs	r1, r1, lsl #16
+	ldr	r2, =#_TIF_WORK_MASK
+	tst	r1, r2

Does anyone have a good suggestion for this problem?
should just test _TIF_WORK_MASK, as before?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [bug-report] possible performance problem in ret_to_user_from_irq
@ 2022-12-26  8:45 ` Hui Tang
  0 siblings, 0 replies; 16+ messages in thread
From: Hui Tang @ 2022-12-26  8:45 UTC (permalink / raw)
  To: Russell King; +Cc: linux-arm-kernel, linux-kernel, Jens Axboe, tanghui20

hi folks.

I found a performance problem which is introduced by commit
32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
After the commit,  any bit in the range of 0..15 will cause
do_work_pending() to be invoked. More frequent do_work_pending()
invoked possible result in worse performance.

Some of the tests I've done, as follows:
lmbench test			base		with patch
./lat_ctx -P 1 -s 0  2		7.3167		11.04
./lat_ctx -P 1 -s 16 2          8.0467		14.5367
./lat_ctx -P 1 -s 64 2		7.8667		11.43
./lat_ctx -P 1 -s 16 16		16.47		18.3667
./lat_pipe -P 1			28.1671		44.7904

libMicro-0.4.1 test		base		with patch
./cascade_cond -E -C 200\
  -L -S -W -N "c_cond_1" -I 100	286.3333	358

When I adjust test bit, the performance problem gone.
-	movs	r1, r1, lsl #16
+	ldr	r2, =#_TIF_WORK_MASK
+	tst	r1, r2

Does anyone have a good suggestion for this problem?
should just test _TIF_WORK_MASK, as before?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
  2022-12-26  8:45 ` Hui Tang
@ 2023-01-03 10:06   ` Russell King (Oracle)
  -1 siblings, 0 replies; 16+ messages in thread
From: Russell King (Oracle) @ 2023-01-03 10:06 UTC (permalink / raw)
  To: Hui Tang; +Cc: linux-arm-kernel, linux-kernel, Jens Axboe

On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
> hi folks.
> 
> I found a performance problem which is introduced by commit
> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
> After the commit,  any bit in the range of 0..15 will cause
> do_work_pending() to be invoked. More frequent do_work_pending()
> invoked possible result in worse performance.
> 
> Some of the tests I've done, as follows:
> lmbench test			base		with patch
> ./lat_ctx -P 1 -s 0  2		7.3167		11.04
> ./lat_ctx -P 1 -s 16 2          8.0467		14.5367
> ./lat_ctx -P 1 -s 64 2		7.8667		11.43
> ./lat_ctx -P 1 -s 16 16		16.47		18.3667
> ./lat_pipe -P 1			28.1671		44.7904
> 
> libMicro-0.4.1 test		base		with patch
> ./cascade_cond -E -C 200\
>  -L -S -W -N "c_cond_1" -I 100	286.3333	358
> 
> When I adjust test bit, the performance problem gone.
> -	movs	r1, r1, lsl #16
> +	ldr	r2, =#_TIF_WORK_MASK
> +	tst	r1, r2
> 
> Does anyone have a good suggestion for this problem?
> should just test _TIF_WORK_MASK, as before?

I think it should be fine - but I would suggest re-organising the
TIF definitions so that those TIF bits that shouldn't trigger
do_work_pending are not in the first 16 bits.

Note that all four bits in _TIF_SYSCALL_WORK need to stay within
an 8-bit even-bit-aligned range, so the value is suitable for an
immediate assembly constant.

I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
20..23, and then 8 to 4.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
@ 2023-01-03 10:06   ` Russell King (Oracle)
  0 siblings, 0 replies; 16+ messages in thread
From: Russell King (Oracle) @ 2023-01-03 10:06 UTC (permalink / raw)
  To: Hui Tang; +Cc: linux-arm-kernel, linux-kernel, Jens Axboe

On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
> hi folks.
> 
> I found a performance problem which is introduced by commit
> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
> After the commit,  any bit in the range of 0..15 will cause
> do_work_pending() to be invoked. More frequent do_work_pending()
> invoked possible result in worse performance.
> 
> Some of the tests I've done, as follows:
> lmbench test			base		with patch
> ./lat_ctx -P 1 -s 0  2		7.3167		11.04
> ./lat_ctx -P 1 -s 16 2          8.0467		14.5367
> ./lat_ctx -P 1 -s 64 2		7.8667		11.43
> ./lat_ctx -P 1 -s 16 16		16.47		18.3667
> ./lat_pipe -P 1			28.1671		44.7904
> 
> libMicro-0.4.1 test		base		with patch
> ./cascade_cond -E -C 200\
>  -L -S -W -N "c_cond_1" -I 100	286.3333	358
> 
> When I adjust test bit, the performance problem gone.
> -	movs	r1, r1, lsl #16
> +	ldr	r2, =#_TIF_WORK_MASK
> +	tst	r1, r2
> 
> Does anyone have a good suggestion for this problem?
> should just test _TIF_WORK_MASK, as before?

I think it should be fine - but I would suggest re-organising the
TIF definitions so that those TIF bits that shouldn't trigger
do_work_pending are not in the first 16 bits.

Note that all four bits in _TIF_SYSCALL_WORK need to stay within
an 8-bit even-bit-aligned range, so the value is suitable for an
immediate assembly constant.

I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
20..23, and then 8 to 4.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
  2023-01-03 10:06   ` Russell King (Oracle)
@ 2023-01-03 14:25     ` Jens Axboe
  -1 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2023-01-03 14:25 UTC (permalink / raw)
  To: Russell King (Oracle), Hui Tang; +Cc: linux-arm-kernel, linux-kernel

On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
> On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
>> hi folks.
>>
>> I found a performance problem which is introduced by commit
>> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
>> After the commit,  any bit in the range of 0..15 will cause
>> do_work_pending() to be invoked. More frequent do_work_pending()
>> invoked possible result in worse performance.
>>
>> Some of the tests I've done? as follows:
>> lmbench test			base		with patch
>> ./lat_ctx -P 1 -s 0  2		7.3167		11.04
>> ./lat_ctx -P 1 -s 16 2          8.0467		14.5367
>> ./lat_ctx -P 1 -s 64 2		7.8667		11.43
>> ./lat_ctx -P 1 -s 16 16		16.47		18.3667
>> ./lat_pipe -P 1			28.1671		44.7904
>>
>> libMicro-0.4.1 test		base		with patch
>> ./cascade_cond -E -C 200\
>>  -L -S -W -N "c_cond_1" -I 100	286.3333	358
>>
>> When I adjust test bit, the performance problem gone.
>> -	movs	r1, r1, lsl #16
>> +	ldr	r2, =#_TIF_WORK_MASK
>> +	tst	r1, r2
>>
>> Does anyone have a good suggestion for this problem?
>> should just test _TIF_WORK_MASK, as before?
> 
> I think it should be fine - but I would suggest re-organising the
> TIF definitions so that those TIF bits that shouldn't trigger
> do_work_pending are not in the first 16 bits.
> 
> Note that all four bits in _TIF_SYSCALL_WORK need to stay within
> an 8-bit even-bit-aligned range, so the value is suitable for an
> immediate assembly constant.
> 
> I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
> 20..23, and then 8 to 4.

Like this?

diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index aecc403b2880..7f092cb55a41 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
 #define TIF_NEED_RESCHED	1	/* rescheduling necessary */
 #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
 #define TIF_UPROBE		3	/* breakpointed or singlestepping */
-#define TIF_SYSCALL_TRACE	4	/* syscall trace active */
-#define TIF_SYSCALL_AUDIT	5	/* syscall auditing active */
-#define TIF_SYSCALL_TRACEPOINT	6	/* syscall tracepoint instrumentation */
-#define TIF_SECCOMP		7	/* seccomp syscall filtering active */
-#define TIF_NOTIFY_SIGNAL	8	/* signal notifications exist */
+#define TIF_NOTIFY_SIGNAL	4	/* signal notifications exist */
 
 #define TIF_USING_IWMMXT	17
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
-#define TIF_RESTORE_SIGMASK	20
+#define TIF_RESTORE_SIGMASK	19
+#define TIF_SYSCALL_TRACE	20	/* syscall trace active */
+#define TIF_SYSCALL_AUDIT	21	/* syscall auditing active */
+#define TIF_SYSCALL_TRACEPOINT	22	/* syscall tracepoint instrumentation */
+#define TIF_SECCOMP		23	/* seccomp syscall filtering active */
+
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
@ 2023-01-03 14:25     ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2023-01-03 14:25 UTC (permalink / raw)
  To: Russell King (Oracle), Hui Tang; +Cc: linux-arm-kernel, linux-kernel

On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
> On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
>> hi folks.
>>
>> I found a performance problem which is introduced by commit
>> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
>> After the commit,  any bit in the range of 0..15 will cause
>> do_work_pending() to be invoked. More frequent do_work_pending()
>> invoked possible result in worse performance.
>>
>> Some of the tests I've done? as follows:
>> lmbench test			base		with patch
>> ./lat_ctx -P 1 -s 0  2		7.3167		11.04
>> ./lat_ctx -P 1 -s 16 2          8.0467		14.5367
>> ./lat_ctx -P 1 -s 64 2		7.8667		11.43
>> ./lat_ctx -P 1 -s 16 16		16.47		18.3667
>> ./lat_pipe -P 1			28.1671		44.7904
>>
>> libMicro-0.4.1 test		base		with patch
>> ./cascade_cond -E -C 200\
>>  -L -S -W -N "c_cond_1" -I 100	286.3333	358
>>
>> When I adjust test bit, the performance problem gone.
>> -	movs	r1, r1, lsl #16
>> +	ldr	r2, =#_TIF_WORK_MASK
>> +	tst	r1, r2
>>
>> Does anyone have a good suggestion for this problem?
>> should just test _TIF_WORK_MASK, as before?
> 
> I think it should be fine - but I would suggest re-organising the
> TIF definitions so that those TIF bits that shouldn't trigger
> do_work_pending are not in the first 16 bits.
> 
> Note that all four bits in _TIF_SYSCALL_WORK need to stay within
> an 8-bit even-bit-aligned range, so the value is suitable for an
> immediate assembly constant.
> 
> I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
> 20..23, and then 8 to 4.

Like this?

diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index aecc403b2880..7f092cb55a41 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
 #define TIF_NEED_RESCHED	1	/* rescheduling necessary */
 #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
 #define TIF_UPROBE		3	/* breakpointed or singlestepping */
-#define TIF_SYSCALL_TRACE	4	/* syscall trace active */
-#define TIF_SYSCALL_AUDIT	5	/* syscall auditing active */
-#define TIF_SYSCALL_TRACEPOINT	6	/* syscall tracepoint instrumentation */
-#define TIF_SECCOMP		7	/* seccomp syscall filtering active */
-#define TIF_NOTIFY_SIGNAL	8	/* signal notifications exist */
+#define TIF_NOTIFY_SIGNAL	4	/* signal notifications exist */
 
 #define TIF_USING_IWMMXT	17
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
-#define TIF_RESTORE_SIGMASK	20
+#define TIF_RESTORE_SIGMASK	19
+#define TIF_SYSCALL_TRACE	20	/* syscall trace active */
+#define TIF_SYSCALL_AUDIT	21	/* syscall auditing active */
+#define TIF_SYSCALL_TRACEPOINT	22	/* syscall tracepoint instrumentation */
+#define TIF_SECCOMP		23	/* seccomp syscall filtering active */
+
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)

-- 
Jens Axboe


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
  2023-01-03 14:25     ` Jens Axboe
@ 2023-01-03 14:34       ` Russell King (Oracle)
  -1 siblings, 0 replies; 16+ messages in thread
From: Russell King (Oracle) @ 2023-01-03 14:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Hui Tang, linux-arm-kernel, linux-kernel

On Tue, Jan 03, 2023 at 07:25:26AM -0700, Jens Axboe wrote:
> On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
> > On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
> >> hi folks.
> >>
> >> I found a performance problem which is introduced by commit
> >> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
> >> After the commit,  any bit in the range of 0..15 will cause
> >> do_work_pending() to be invoked. More frequent do_work_pending()
> >> invoked possible result in worse performance.
> >>
> >> Some of the tests I've done? as follows:
> >> lmbench test			base		with patch
> >> ./lat_ctx -P 1 -s 0  2		7.3167		11.04
> >> ./lat_ctx -P 1 -s 16 2          8.0467		14.5367
> >> ./lat_ctx -P 1 -s 64 2		7.8667		11.43
> >> ./lat_ctx -P 1 -s 16 16		16.47		18.3667
> >> ./lat_pipe -P 1			28.1671		44.7904
> >>
> >> libMicro-0.4.1 test		base		with patch
> >> ./cascade_cond -E -C 200\
> >>  -L -S -W -N "c_cond_1" -I 100	286.3333	358
> >>
> >> When I adjust test bit, the performance problem gone.
> >> -	movs	r1, r1, lsl #16
> >> +	ldr	r2, =#_TIF_WORK_MASK
> >> +	tst	r1, r2
> >>
> >> Does anyone have a good suggestion for this problem?
> >> should just test _TIF_WORK_MASK, as before?
> > 
> > I think it should be fine - but I would suggest re-organising the
> > TIF definitions so that those TIF bits that shouldn't trigger
> > do_work_pending are not in the first 16 bits.
> > 
> > Note that all four bits in _TIF_SYSCALL_WORK need to stay within
> > an 8-bit even-bit-aligned range, so the value is suitable for an
> > immediate assembly constant.
> > 
> > I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
> > 20..23, and then 8 to 4.
> 
> Like this?
> 
> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
> index aecc403b2880..7f092cb55a41 100644
> --- a/arch/arm/include/asm/thread_info.h
> +++ b/arch/arm/include/asm/thread_info.h
> @@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
>  #define TIF_NEED_RESCHED	1	/* rescheduling necessary */
>  #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
>  #define TIF_UPROBE		3	/* breakpointed or singlestepping */
> -#define TIF_SYSCALL_TRACE	4	/* syscall trace active */
> -#define TIF_SYSCALL_AUDIT	5	/* syscall auditing active */
> -#define TIF_SYSCALL_TRACEPOINT	6	/* syscall tracepoint instrumentation */
> -#define TIF_SECCOMP		7	/* seccomp syscall filtering active */
> -#define TIF_NOTIFY_SIGNAL	8	/* signal notifications exist */
> +#define TIF_NOTIFY_SIGNAL	4	/* signal notifications exist */
>  
>  #define TIF_USING_IWMMXT	17
>  #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
> -#define TIF_RESTORE_SIGMASK	20
> +#define TIF_RESTORE_SIGMASK	19
> +#define TIF_SYSCALL_TRACE	20	/* syscall trace active */
> +#define TIF_SYSCALL_AUDIT	21	/* syscall auditing active */
> +#define TIF_SYSCALL_TRACEPOINT	22	/* syscall tracepoint instrumentation */
> +#define TIF_SECCOMP		23	/* seccomp syscall filtering active */
> +
>  
>  #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
>  #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)

Yep, LGTM, thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
@ 2023-01-03 14:34       ` Russell King (Oracle)
  0 siblings, 0 replies; 16+ messages in thread
From: Russell King (Oracle) @ 2023-01-03 14:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Hui Tang, linux-arm-kernel, linux-kernel

On Tue, Jan 03, 2023 at 07:25:26AM -0700, Jens Axboe wrote:
> On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
> > On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
> >> hi folks.
> >>
> >> I found a performance problem which is introduced by commit
> >> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
> >> After the commit,  any bit in the range of 0..15 will cause
> >> do_work_pending() to be invoked. More frequent do_work_pending()
> >> invoked possible result in worse performance.
> >>
> >> Some of the tests I've done? as follows:
> >> lmbench test			base		with patch
> >> ./lat_ctx -P 1 -s 0  2		7.3167		11.04
> >> ./lat_ctx -P 1 -s 16 2          8.0467		14.5367
> >> ./lat_ctx -P 1 -s 64 2		7.8667		11.43
> >> ./lat_ctx -P 1 -s 16 16		16.47		18.3667
> >> ./lat_pipe -P 1			28.1671		44.7904
> >>
> >> libMicro-0.4.1 test		base		with patch
> >> ./cascade_cond -E -C 200\
> >>  -L -S -W -N "c_cond_1" -I 100	286.3333	358
> >>
> >> When I adjust test bit, the performance problem gone.
> >> -	movs	r1, r1, lsl #16
> >> +	ldr	r2, =#_TIF_WORK_MASK
> >> +	tst	r1, r2
> >>
> >> Does anyone have a good suggestion for this problem?
> >> should just test _TIF_WORK_MASK, as before?
> > 
> > I think it should be fine - but I would suggest re-organising the
> > TIF definitions so that those TIF bits that shouldn't trigger
> > do_work_pending are not in the first 16 bits.
> > 
> > Note that all four bits in _TIF_SYSCALL_WORK need to stay within
> > an 8-bit even-bit-aligned range, so the value is suitable for an
> > immediate assembly constant.
> > 
> > I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
> > 20..23, and then 8 to 4.
> 
> Like this?
> 
> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
> index aecc403b2880..7f092cb55a41 100644
> --- a/arch/arm/include/asm/thread_info.h
> +++ b/arch/arm/include/asm/thread_info.h
> @@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
>  #define TIF_NEED_RESCHED	1	/* rescheduling necessary */
>  #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
>  #define TIF_UPROBE		3	/* breakpointed or singlestepping */
> -#define TIF_SYSCALL_TRACE	4	/* syscall trace active */
> -#define TIF_SYSCALL_AUDIT	5	/* syscall auditing active */
> -#define TIF_SYSCALL_TRACEPOINT	6	/* syscall tracepoint instrumentation */
> -#define TIF_SECCOMP		7	/* seccomp syscall filtering active */
> -#define TIF_NOTIFY_SIGNAL	8	/* signal notifications exist */
> +#define TIF_NOTIFY_SIGNAL	4	/* signal notifications exist */
>  
>  #define TIF_USING_IWMMXT	17
>  #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
> -#define TIF_RESTORE_SIGMASK	20
> +#define TIF_RESTORE_SIGMASK	19
> +#define TIF_SYSCALL_TRACE	20	/* syscall trace active */
> +#define TIF_SYSCALL_AUDIT	21	/* syscall auditing active */
> +#define TIF_SYSCALL_TRACEPOINT	22	/* syscall tracepoint instrumentation */
> +#define TIF_SECCOMP		23	/* seccomp syscall filtering active */
> +
>  
>  #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
>  #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)

Yep, LGTM, thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
  2023-01-03 14:34       ` Russell King (Oracle)
@ 2023-01-03 14:59         ` Jens Axboe
  -1 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2023-01-03 14:59 UTC (permalink / raw)
  To: Russell King (Oracle); +Cc: Hui Tang, linux-arm-kernel, linux-kernel

On 1/3/23 7:34?AM, Russell King (Oracle) wrote:
> On Tue, Jan 03, 2023 at 07:25:26AM -0700, Jens Axboe wrote:
>> On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
>>> On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
>>>> hi folks.
>>>>
>>>> I found a performance problem which is introduced by commit
>>>> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
>>>> After the commit,  any bit in the range of 0..15 will cause
>>>> do_work_pending() to be invoked. More frequent do_work_pending()
>>>> invoked possible result in worse performance.
>>>>
>>>> Some of the tests I've done? as follows:
>>>> lmbench test			base		with patch
>>>> ./lat_ctx -P 1 -s 0  2		7.3167		11.04
>>>> ./lat_ctx -P 1 -s 16 2          8.0467		14.5367
>>>> ./lat_ctx -P 1 -s 64 2		7.8667		11.43
>>>> ./lat_ctx -P 1 -s 16 16		16.47		18.3667
>>>> ./lat_pipe -P 1			28.1671		44.7904
>>>>
>>>> libMicro-0.4.1 test		base		with patch
>>>> ./cascade_cond -E -C 200\
>>>>  -L -S -W -N "c_cond_1" -I 100	286.3333	358
>>>>
>>>> When I adjust test bit, the performance problem gone.
>>>> -	movs	r1, r1, lsl #16
>>>> +	ldr	r2, =#_TIF_WORK_MASK
>>>> +	tst	r1, r2
>>>>
>>>> Does anyone have a good suggestion for this problem?
>>>> should just test _TIF_WORK_MASK, as before?
>>>
>>> I think it should be fine - but I would suggest re-organising the
>>> TIF definitions so that those TIF bits that shouldn't trigger
>>> do_work_pending are not in the first 16 bits.
>>>
>>> Note that all four bits in _TIF_SYSCALL_WORK need to stay within
>>> an 8-bit even-bit-aligned range, so the value is suitable for an
>>> immediate assembly constant.
>>>
>>> I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
>>> 20..23, and then 8 to 4.
>>
>> Like this?
>>
>> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
>> index aecc403b2880..7f092cb55a41 100644
>> --- a/arch/arm/include/asm/thread_info.h
>> +++ b/arch/arm/include/asm/thread_info.h
>> @@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
>>  #define TIF_NEED_RESCHED	1	/* rescheduling necessary */
>>  #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
>>  #define TIF_UPROBE		3	/* breakpointed or singlestepping */
>> -#define TIF_SYSCALL_TRACE	4	/* syscall trace active */
>> -#define TIF_SYSCALL_AUDIT	5	/* syscall auditing active */
>> -#define TIF_SYSCALL_TRACEPOINT	6	/* syscall tracepoint instrumentation */
>> -#define TIF_SECCOMP		7	/* seccomp syscall filtering active */
>> -#define TIF_NOTIFY_SIGNAL	8	/* signal notifications exist */
>> +#define TIF_NOTIFY_SIGNAL	4	/* signal notifications exist */
>>  
>>  #define TIF_USING_IWMMXT	17
>>  #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
>> -#define TIF_RESTORE_SIGMASK	20
>> +#define TIF_RESTORE_SIGMASK	19
>> +#define TIF_SYSCALL_TRACE	20	/* syscall trace active */
>> +#define TIF_SYSCALL_AUDIT	21	/* syscall auditing active */
>> +#define TIF_SYSCALL_TRACEPOINT	22	/* syscall tracepoint instrumentation */
>> +#define TIF_SECCOMP		23	/* seccomp syscall filtering active */
>> +
>>  
>>  #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
>>  #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
> 
> Yep, LGTM, thanks.

Hui Tang, can you give it a whirl? Just checked and it applies to
5.10-stable as well, just with a slight offset.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
@ 2023-01-03 14:59         ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2023-01-03 14:59 UTC (permalink / raw)
  To: Russell King (Oracle); +Cc: Hui Tang, linux-arm-kernel, linux-kernel

On 1/3/23 7:34?AM, Russell King (Oracle) wrote:
> On Tue, Jan 03, 2023 at 07:25:26AM -0700, Jens Axboe wrote:
>> On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
>>> On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
>>>> hi folks.
>>>>
>>>> I found a performance problem which is introduced by commit
>>>> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
>>>> After the commit,  any bit in the range of 0..15 will cause
>>>> do_work_pending() to be invoked. More frequent do_work_pending()
>>>> invoked possible result in worse performance.
>>>>
>>>> Some of the tests I've done? as follows:
>>>> lmbench test			base		with patch
>>>> ./lat_ctx -P 1 -s 0  2		7.3167		11.04
>>>> ./lat_ctx -P 1 -s 16 2          8.0467		14.5367
>>>> ./lat_ctx -P 1 -s 64 2		7.8667		11.43
>>>> ./lat_ctx -P 1 -s 16 16		16.47		18.3667
>>>> ./lat_pipe -P 1			28.1671		44.7904
>>>>
>>>> libMicro-0.4.1 test		base		with patch
>>>> ./cascade_cond -E -C 200\
>>>>  -L -S -W -N "c_cond_1" -I 100	286.3333	358
>>>>
>>>> When I adjust test bit, the performance problem gone.
>>>> -	movs	r1, r1, lsl #16
>>>> +	ldr	r2, =#_TIF_WORK_MASK
>>>> +	tst	r1, r2
>>>>
>>>> Does anyone have a good suggestion for this problem?
>>>> should just test _TIF_WORK_MASK, as before?
>>>
>>> I think it should be fine - but I would suggest re-organising the
>>> TIF definitions so that those TIF bits that shouldn't trigger
>>> do_work_pending are not in the first 16 bits.
>>>
>>> Note that all four bits in _TIF_SYSCALL_WORK need to stay within
>>> an 8-bit even-bit-aligned range, so the value is suitable for an
>>> immediate assembly constant.
>>>
>>> I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
>>> 20..23, and then 8 to 4.
>>
>> Like this?
>>
>> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
>> index aecc403b2880..7f092cb55a41 100644
>> --- a/arch/arm/include/asm/thread_info.h
>> +++ b/arch/arm/include/asm/thread_info.h
>> @@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
>>  #define TIF_NEED_RESCHED	1	/* rescheduling necessary */
>>  #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
>>  #define TIF_UPROBE		3	/* breakpointed or singlestepping */
>> -#define TIF_SYSCALL_TRACE	4	/* syscall trace active */
>> -#define TIF_SYSCALL_AUDIT	5	/* syscall auditing active */
>> -#define TIF_SYSCALL_TRACEPOINT	6	/* syscall tracepoint instrumentation */
>> -#define TIF_SECCOMP		7	/* seccomp syscall filtering active */
>> -#define TIF_NOTIFY_SIGNAL	8	/* signal notifications exist */
>> +#define TIF_NOTIFY_SIGNAL	4	/* signal notifications exist */
>>  
>>  #define TIF_USING_IWMMXT	17
>>  #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
>> -#define TIF_RESTORE_SIGMASK	20
>> +#define TIF_RESTORE_SIGMASK	19
>> +#define TIF_SYSCALL_TRACE	20	/* syscall trace active */
>> +#define TIF_SYSCALL_AUDIT	21	/* syscall auditing active */
>> +#define TIF_SYSCALL_TRACEPOINT	22	/* syscall tracepoint instrumentation */
>> +#define TIF_SECCOMP		23	/* seccomp syscall filtering active */
>> +
>>  
>>  #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
>>  #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
> 
> Yep, LGTM, thanks.

Hui Tang, can you give it a whirl? Just checked and it applies to
5.10-stable as well, just with a slight offset.

-- 
Jens Axboe


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
  2023-01-03 14:59         ` Jens Axboe
@ 2023-01-04  1:31           ` Hui Tang
  -1 siblings, 0 replies; 16+ messages in thread
From: Hui Tang @ 2023-01-04  1:31 UTC (permalink / raw)
  To: Jens Axboe, Russell King (Oracle); +Cc: linux-arm-kernel, linux-kernel



On 2023/1/3 22:59, Jens Axboe wrote:
> On 1/3/23 7:34?AM, Russell King (Oracle) wrote:
>> On Tue, Jan 03, 2023 at 07:25:26AM -0700, Jens Axboe wrote:
>>> On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
>>>> On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
>>>>> hi folks.
>>>>>
>>>>> I found a performance problem which is introduced by commit
>>>>> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
>>>>> After the commit,  any bit in the range of 0..15 will cause
>>>>> do_work_pending() to be invoked. More frequent do_work_pending()
>>>>> invoked possible result in worse performance.
>>>>>
>>>>> Some of the tests I've done? as follows:
>>>>> lmbench test			base		with patch
>>>>> ./lat_ctx -P 1 -s 0  2		7.3167		11.04
>>>>> ./lat_ctx -P 1 -s 16 2          8.0467		14.5367
>>>>> ./lat_ctx -P 1 -s 64 2		7.8667		11.43
>>>>> ./lat_ctx -P 1 -s 16 16		16.47		18.3667
>>>>> ./lat_pipe -P 1			28.1671		44.7904
>>>>>
>>>>> libMicro-0.4.1 test		base		with patch
>>>>> ./cascade_cond -E -C 200\
>>>>>  -L -S -W -N "c_cond_1" -I 100	286.3333	358
>>>>>
>>>>> When I adjust test bit, the performance problem gone.
>>>>> -	movs	r1, r1, lsl #16
>>>>> +	ldr	r2, =#_TIF_WORK_MASK
>>>>> +	tst	r1, r2
>>>>>
>>>>> Does anyone have a good suggestion for this problem?
>>>>> should just test _TIF_WORK_MASK, as before?
>>>>
>>>> I think it should be fine - but I would suggest re-organising the
>>>> TIF definitions so that those TIF bits that shouldn't trigger
>>>> do_work_pending are not in the first 16 bits.
>>>>
>>>> Note that all four bits in _TIF_SYSCALL_WORK need to stay within
>>>> an 8-bit even-bit-aligned range, so the value is suitable for an
>>>> immediate assembly constant.
>>>>
>>>> I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
>>>> 20..23, and then 8 to 4.
>>>
>>> Like this?
>>>
>>> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
>>> index aecc403b2880..7f092cb55a41 100644
>>> --- a/arch/arm/include/asm/thread_info.h
>>> +++ b/arch/arm/include/asm/thread_info.h
>>> @@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
>>>  #define TIF_NEED_RESCHED	1	/* rescheduling necessary */
>>>  #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
>>>  #define TIF_UPROBE		3	/* breakpointed or singlestepping */
>>> -#define TIF_SYSCALL_TRACE	4	/* syscall trace active */
>>> -#define TIF_SYSCALL_AUDIT	5	/* syscall auditing active */
>>> -#define TIF_SYSCALL_TRACEPOINT	6	/* syscall tracepoint instrumentation */
>>> -#define TIF_SECCOMP		7	/* seccomp syscall filtering active */
>>> -#define TIF_NOTIFY_SIGNAL	8	/* signal notifications exist */
>>> +#define TIF_NOTIFY_SIGNAL	4	/* signal notifications exist */
>>>
>>>  #define TIF_USING_IWMMXT	17
>>>  #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
>>> -#define TIF_RESTORE_SIGMASK	20
>>> +#define TIF_RESTORE_SIGMASK	19
>>> +#define TIF_SYSCALL_TRACE	20	/* syscall trace active */
>>> +#define TIF_SYSCALL_AUDIT	21	/* syscall auditing active */
>>> +#define TIF_SYSCALL_TRACEPOINT	22	/* syscall tracepoint instrumentation */
>>> +#define TIF_SECCOMP		23	/* seccomp syscall filtering active */
>>> +
>>>
>>>  #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
>>>  #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
>>
>> Yep, LGTM, thanks.
>
> Hui Tang, can you give it a whirl? Just checked and it applies to
> 5.10-stable as well, just with a slight offset.

Okay, I'll test it today.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
@ 2023-01-04  1:31           ` Hui Tang
  0 siblings, 0 replies; 16+ messages in thread
From: Hui Tang @ 2023-01-04  1:31 UTC (permalink / raw)
  To: Jens Axboe, Russell King (Oracle); +Cc: linux-arm-kernel, linux-kernel



On 2023/1/3 22:59, Jens Axboe wrote:
> On 1/3/23 7:34?AM, Russell King (Oracle) wrote:
>> On Tue, Jan 03, 2023 at 07:25:26AM -0700, Jens Axboe wrote:
>>> On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
>>>> On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
>>>>> hi folks.
>>>>>
>>>>> I found a performance problem which is introduced by commit
>>>>> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
>>>>> After the commit,  any bit in the range of 0..15 will cause
>>>>> do_work_pending() to be invoked. More frequent do_work_pending()
>>>>> invoked possible result in worse performance.
>>>>>
>>>>> Some of the tests I've done? as follows:
>>>>> lmbench test			base		with patch
>>>>> ./lat_ctx -P 1 -s 0  2		7.3167		11.04
>>>>> ./lat_ctx -P 1 -s 16 2          8.0467		14.5367
>>>>> ./lat_ctx -P 1 -s 64 2		7.8667		11.43
>>>>> ./lat_ctx -P 1 -s 16 16		16.47		18.3667
>>>>> ./lat_pipe -P 1			28.1671		44.7904
>>>>>
>>>>> libMicro-0.4.1 test		base		with patch
>>>>> ./cascade_cond -E -C 200\
>>>>>  -L -S -W -N "c_cond_1" -I 100	286.3333	358
>>>>>
>>>>> When I adjust test bit, the performance problem gone.
>>>>> -	movs	r1, r1, lsl #16
>>>>> +	ldr	r2, =#_TIF_WORK_MASK
>>>>> +	tst	r1, r2
>>>>>
>>>>> Does anyone have a good suggestion for this problem?
>>>>> should just test _TIF_WORK_MASK, as before?
>>>>
>>>> I think it should be fine - but I would suggest re-organising the
>>>> TIF definitions so that those TIF bits that shouldn't trigger
>>>> do_work_pending are not in the first 16 bits.
>>>>
>>>> Note that all four bits in _TIF_SYSCALL_WORK need to stay within
>>>> an 8-bit even-bit-aligned range, so the value is suitable for an
>>>> immediate assembly constant.
>>>>
>>>> I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
>>>> 20..23, and then 8 to 4.
>>>
>>> Like this?
>>>
>>> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
>>> index aecc403b2880..7f092cb55a41 100644
>>> --- a/arch/arm/include/asm/thread_info.h
>>> +++ b/arch/arm/include/asm/thread_info.h
>>> @@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
>>>  #define TIF_NEED_RESCHED	1	/* rescheduling necessary */
>>>  #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
>>>  #define TIF_UPROBE		3	/* breakpointed or singlestepping */
>>> -#define TIF_SYSCALL_TRACE	4	/* syscall trace active */
>>> -#define TIF_SYSCALL_AUDIT	5	/* syscall auditing active */
>>> -#define TIF_SYSCALL_TRACEPOINT	6	/* syscall tracepoint instrumentation */
>>> -#define TIF_SECCOMP		7	/* seccomp syscall filtering active */
>>> -#define TIF_NOTIFY_SIGNAL	8	/* signal notifications exist */
>>> +#define TIF_NOTIFY_SIGNAL	4	/* signal notifications exist */
>>>
>>>  #define TIF_USING_IWMMXT	17
>>>  #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
>>> -#define TIF_RESTORE_SIGMASK	20
>>> +#define TIF_RESTORE_SIGMASK	19
>>> +#define TIF_SYSCALL_TRACE	20	/* syscall trace active */
>>> +#define TIF_SYSCALL_AUDIT	21	/* syscall auditing active */
>>> +#define TIF_SYSCALL_TRACEPOINT	22	/* syscall tracepoint instrumentation */
>>> +#define TIF_SECCOMP		23	/* seccomp syscall filtering active */
>>> +
>>>
>>>  #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
>>>  #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
>>
>> Yep, LGTM, thanks.
>
> Hui Tang, can you give it a whirl? Just checked and it applies to
> 5.10-stable as well, just with a slight offset.

Okay, I'll test it today.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
  2023-01-03 14:59         ` Jens Axboe
@ 2023-01-04  7:04           ` Hui Tang
  -1 siblings, 0 replies; 16+ messages in thread
From: Hui Tang @ 2023-01-04  7:04 UTC (permalink / raw)
  To: Jens Axboe, Russell King (Oracle); +Cc: linux-arm-kernel, linux-kernel



On 2023/1/3 22:59, Jens Axboe wrote:
> On 1/3/23 7:34?AM, Russell King (Oracle) wrote:
>> On Tue, Jan 03, 2023 at 07:25:26AM -0700, Jens Axboe wrote:
>>> On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
>>>> On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
>>>>> hi folks.
>>>>>
>>>>> I found a performance problem which is introduced by commit
>>>>> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
>>>>> After the commit,  any bit in the range of 0..15 will cause
>>>>> do_work_pending() to be invoked. More frequent do_work_pending()
>>>>> invoked possible result in worse performance.
>>>>>
>>>>> Some of the tests I've done? as follows:
>>>>> lmbench test			base		with patch
>>>>> ./lat_ctx -P 1 -s 0  2		7.3167		11.04
>>>>> ./lat_ctx -P 1 -s 16 2          8.0467		14.5367
>>>>> ./lat_ctx -P 1 -s 64 2		7.8667		11.43
>>>>> ./lat_ctx -P 1 -s 16 16		16.47		18.3667
>>>>> ./lat_pipe -P 1			28.1671		44.7904
>>>>>
>>>>> libMicro-0.4.1 test		base		with patch
>>>>> ./cascade_cond -E -C 200\
>>>>>  -L -S -W -N "c_cond_1" -I 100	286.3333	358
>>>>>
>>>>> When I adjust test bit, the performance problem gone.
>>>>> -	movs	r1, r1, lsl #16
>>>>> +	ldr	r2, =#_TIF_WORK_MASK
>>>>> +	tst	r1, r2
>>>>>
>>>>> Does anyone have a good suggestion for this problem?
>>>>> should just test _TIF_WORK_MASK, as before?
>>>>
>>>> I think it should be fine - but I would suggest re-organising the
>>>> TIF definitions so that those TIF bits that shouldn't trigger
>>>> do_work_pending are not in the first 16 bits.
>>>>
>>>> Note that all four bits in _TIF_SYSCALL_WORK need to stay within
>>>> an 8-bit even-bit-aligned range, so the value is suitable for an
>>>> immediate assembly constant.
>>>>
>>>> I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
>>>> 20..23, and then 8 to 4.
>>>
>>> Like this?
>>>
>>> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
>>> index aecc403b2880..7f092cb55a41 100644
>>> --- a/arch/arm/include/asm/thread_info.h
>>> +++ b/arch/arm/include/asm/thread_info.h
>>> @@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
>>>  #define TIF_NEED_RESCHED	1	/* rescheduling necessary */
>>>  #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
>>>  #define TIF_UPROBE		3	/* breakpointed or singlestepping */
>>> -#define TIF_SYSCALL_TRACE	4	/* syscall trace active */
>>> -#define TIF_SYSCALL_AUDIT	5	/* syscall auditing active */
>>> -#define TIF_SYSCALL_TRACEPOINT	6	/* syscall tracepoint instrumentation */
>>> -#define TIF_SECCOMP		7	/* seccomp syscall filtering active */
>>> -#define TIF_NOTIFY_SIGNAL	8	/* signal notifications exist */
>>> +#define TIF_NOTIFY_SIGNAL	4	/* signal notifications exist */
>>>
>>>  #define TIF_USING_IWMMXT	17
>>>  #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
>>> -#define TIF_RESTORE_SIGMASK	20
>>> +#define TIF_RESTORE_SIGMASK	19
>>> +#define TIF_SYSCALL_TRACE	20	/* syscall trace active */
>>> +#define TIF_SYSCALL_AUDIT	21	/* syscall auditing active */
>>> +#define TIF_SYSCALL_TRACEPOINT	22	/* syscall tracepoint instrumentation */
>>> +#define TIF_SECCOMP		23	/* seccomp syscall filtering active */
>>> +
>>>
>>>  #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
>>>  #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
>>
>> Yep, LGTM, thanks.
>
> Hui Tang, can you give it a whirl? Just checked and it applies to
> 5.10-stable as well, just with a slight offset.

With the latest patch, the testcase rusults shown in the 'new patch' column.
I also retested previous commit of 32d59773da38, the results shown in the 'base' column.

lmbench test                base        32d59773da38  	new patch
./lat_ctx -P 1 -s 0  2      8.04          11.04		8.25
./lat_ctx -P 1 -s 16 2      9.08          14.5367     	9.26
./lat_ctx -P 1 -s 64 2      8.78          11.43		8.71
./lat_ctx -P 1 -s 16 16     17.22         18.3667     	17.32
./lat_pipe -P 1             43.5021       44.7904     	41.3729

libMicro-0.4.1 test        	  base  32d59773da38    new patch
./cascade_cond -E -C 200\
  -L -S -W -N "c_cond_1" -I 100    281    358     	 281

The performance problem also seem to gone with the latest patch, thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
@ 2023-01-04  7:04           ` Hui Tang
  0 siblings, 0 replies; 16+ messages in thread
From: Hui Tang @ 2023-01-04  7:04 UTC (permalink / raw)
  To: Jens Axboe, Russell King (Oracle); +Cc: linux-arm-kernel, linux-kernel



On 2023/1/3 22:59, Jens Axboe wrote:
> On 1/3/23 7:34?AM, Russell King (Oracle) wrote:
>> On Tue, Jan 03, 2023 at 07:25:26AM -0700, Jens Axboe wrote:
>>> On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
>>>> On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
>>>>> hi folks.
>>>>>
>>>>> I found a performance problem which is introduced by commit
>>>>> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
>>>>> After the commit,  any bit in the range of 0..15 will cause
>>>>> do_work_pending() to be invoked. More frequent do_work_pending()
>>>>> invoked possible result in worse performance.
>>>>>
>>>>> Some of the tests I've done? as follows:
>>>>> lmbench test			base		with patch
>>>>> ./lat_ctx -P 1 -s 0  2		7.3167		11.04
>>>>> ./lat_ctx -P 1 -s 16 2          8.0467		14.5367
>>>>> ./lat_ctx -P 1 -s 64 2		7.8667		11.43
>>>>> ./lat_ctx -P 1 -s 16 16		16.47		18.3667
>>>>> ./lat_pipe -P 1			28.1671		44.7904
>>>>>
>>>>> libMicro-0.4.1 test		base		with patch
>>>>> ./cascade_cond -E -C 200\
>>>>>  -L -S -W -N "c_cond_1" -I 100	286.3333	358
>>>>>
>>>>> When I adjust test bit, the performance problem gone.
>>>>> -	movs	r1, r1, lsl #16
>>>>> +	ldr	r2, =#_TIF_WORK_MASK
>>>>> +	tst	r1, r2
>>>>>
>>>>> Does anyone have a good suggestion for this problem?
>>>>> should just test _TIF_WORK_MASK, as before?
>>>>
>>>> I think it should be fine - but I would suggest re-organising the
>>>> TIF definitions so that those TIF bits that shouldn't trigger
>>>> do_work_pending are not in the first 16 bits.
>>>>
>>>> Note that all four bits in _TIF_SYSCALL_WORK need to stay within
>>>> an 8-bit even-bit-aligned range, so the value is suitable for an
>>>> immediate assembly constant.
>>>>
>>>> I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
>>>> 20..23, and then 8 to 4.
>>>
>>> Like this?
>>>
>>> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
>>> index aecc403b2880..7f092cb55a41 100644
>>> --- a/arch/arm/include/asm/thread_info.h
>>> +++ b/arch/arm/include/asm/thread_info.h
>>> @@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
>>>  #define TIF_NEED_RESCHED	1	/* rescheduling necessary */
>>>  #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
>>>  #define TIF_UPROBE		3	/* breakpointed or singlestepping */
>>> -#define TIF_SYSCALL_TRACE	4	/* syscall trace active */
>>> -#define TIF_SYSCALL_AUDIT	5	/* syscall auditing active */
>>> -#define TIF_SYSCALL_TRACEPOINT	6	/* syscall tracepoint instrumentation */
>>> -#define TIF_SECCOMP		7	/* seccomp syscall filtering active */
>>> -#define TIF_NOTIFY_SIGNAL	8	/* signal notifications exist */
>>> +#define TIF_NOTIFY_SIGNAL	4	/* signal notifications exist */
>>>
>>>  #define TIF_USING_IWMMXT	17
>>>  #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
>>> -#define TIF_RESTORE_SIGMASK	20
>>> +#define TIF_RESTORE_SIGMASK	19
>>> +#define TIF_SYSCALL_TRACE	20	/* syscall trace active */
>>> +#define TIF_SYSCALL_AUDIT	21	/* syscall auditing active */
>>> +#define TIF_SYSCALL_TRACEPOINT	22	/* syscall tracepoint instrumentation */
>>> +#define TIF_SECCOMP		23	/* seccomp syscall filtering active */
>>> +
>>>
>>>  #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
>>>  #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
>>
>> Yep, LGTM, thanks.
>
> Hui Tang, can you give it a whirl? Just checked and it applies to
> 5.10-stable as well, just with a slight offset.

With the latest patch, the testcase rusults shown in the 'new patch' column.
I also retested previous commit of 32d59773da38, the results shown in the 'base' column.

lmbench test                base        32d59773da38  	new patch
./lat_ctx -P 1 -s 0  2      8.04          11.04		8.25
./lat_ctx -P 1 -s 16 2      9.08          14.5367     	9.26
./lat_ctx -P 1 -s 64 2      8.78          11.43		8.71
./lat_ctx -P 1 -s 16 16     17.22         18.3667     	17.32
./lat_pipe -P 1             43.5021       44.7904     	41.3729

libMicro-0.4.1 test        	  base  32d59773da38    new patch
./cascade_cond -E -C 200\
  -L -S -W -N "c_cond_1" -I 100    281    358     	 281

The performance problem also seem to gone with the latest patch, thanks.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
  2023-01-04  7:04           ` Hui Tang
@ 2023-01-04 14:45             ` Jens Axboe
  -1 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2023-01-04 14:45 UTC (permalink / raw)
  To: Hui Tang, Russell King (Oracle); +Cc: linux-arm-kernel, linux-kernel

On 1/4/23 12:04?AM, Hui Tang wrote:
> 
> 
> On 2023/1/3 22:59, Jens Axboe wrote:
>> On 1/3/23 7:34?AM, Russell King (Oracle) wrote:
>>> On Tue, Jan 03, 2023 at 07:25:26AM -0700, Jens Axboe wrote:
>>>> On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
>>>>> On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
>>>>>> hi folks.
>>>>>>
>>>>>> I found a performance problem which is introduced by commit
>>>>>> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
>>>>>> After the commit,  any bit in the range of 0..15 will cause
>>>>>> do_work_pending() to be invoked. More frequent do_work_pending()
>>>>>> invoked possible result in worse performance.
>>>>>>
>>>>>> Some of the tests I've done? as follows:
>>>>>> lmbench test            base        with patch
>>>>>> ./lat_ctx -P 1 -s 0  2        7.3167        11.04
>>>>>> ./lat_ctx -P 1 -s 16 2          8.0467        14.5367
>>>>>> ./lat_ctx -P 1 -s 64 2        7.8667        11.43
>>>>>> ./lat_ctx -P 1 -s 16 16        16.47        18.3667
>>>>>> ./lat_pipe -P 1            28.1671        44.7904
>>>>>>
>>>>>> libMicro-0.4.1 test        base        with patch
>>>>>> ./cascade_cond -E -C 200\
>>>>>>  -L -S -W -N "c_cond_1" -I 100    286.3333    358
>>>>>>
>>>>>> When I adjust test bit, the performance problem gone.
>>>>>> -    movs    r1, r1, lsl #16
>>>>>> +    ldr    r2, =#_TIF_WORK_MASK
>>>>>> +    tst    r1, r2
>>>>>>
>>>>>> Does anyone have a good suggestion for this problem?
>>>>>> should just test _TIF_WORK_MASK, as before?
>>>>>
>>>>> I think it should be fine - but I would suggest re-organising the
>>>>> TIF definitions so that those TIF bits that shouldn't trigger
>>>>> do_work_pending are not in the first 16 bits.
>>>>>
>>>>> Note that all four bits in _TIF_SYSCALL_WORK need to stay within
>>>>> an 8-bit even-bit-aligned range, so the value is suitable for an
>>>>> immediate assembly constant.
>>>>>
>>>>> I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
>>>>> 20..23, and then 8 to 4.
>>>>
>>>> Like this?
>>>>
>>>> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
>>>> index aecc403b2880..7f092cb55a41 100644
>>>> --- a/arch/arm/include/asm/thread_info.h
>>>> +++ b/arch/arm/include/asm/thread_info.h
>>>> @@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
>>>>  #define TIF_NEED_RESCHED    1    /* rescheduling necessary */
>>>>  #define TIF_NOTIFY_RESUME    2    /* callback before returning to user */
>>>>  #define TIF_UPROBE        3    /* breakpointed or singlestepping */
>>>> -#define TIF_SYSCALL_TRACE    4    /* syscall trace active */
>>>> -#define TIF_SYSCALL_AUDIT    5    /* syscall auditing active */
>>>> -#define TIF_SYSCALL_TRACEPOINT    6    /* syscall tracepoint instrumentation */
>>>> -#define TIF_SECCOMP        7    /* seccomp syscall filtering active */
>>>> -#define TIF_NOTIFY_SIGNAL    8    /* signal notifications exist */
>>>> +#define TIF_NOTIFY_SIGNAL    4    /* signal notifications exist */
>>>>
>>>>  #define TIF_USING_IWMMXT    17
>>>>  #define TIF_MEMDIE        18    /* is terminating due to OOM killer */
>>>> -#define TIF_RESTORE_SIGMASK    20
>>>> +#define TIF_RESTORE_SIGMASK    19
>>>> +#define TIF_SYSCALL_TRACE    20    /* syscall trace active */
>>>> +#define TIF_SYSCALL_AUDIT    21    /* syscall auditing active */
>>>> +#define TIF_SYSCALL_TRACEPOINT    22    /* syscall tracepoint instrumentation */
>>>> +#define TIF_SECCOMP        23    /* seccomp syscall filtering active */
>>>> +
>>>>
>>>>  #define _TIF_SIGPENDING        (1 << TIF_SIGPENDING)
>>>>  #define _TIF_NEED_RESCHED    (1 << TIF_NEED_RESCHED)
>>>
>>> Yep, LGTM, thanks.
>>
>> Hui Tang, can you give it a whirl? Just checked and it applies to
>> 5.10-stable as well, just with a slight offset.
> 
> With the latest patch, the testcase rusults shown in the 'new patch' column.
> I also retested previous commit of 32d59773da38, the results shown in the 'base' column.
> 
> lmbench test                base        32d59773da38      new patch
> ./lat_ctx -P 1 -s 0  2      8.04          11.04        8.25
> ./lat_ctx -P 1 -s 16 2      9.08          14.5367         9.26
> ./lat_ctx -P 1 -s 64 2      8.78          11.43        8.71
> ./lat_ctx -P 1 -s 16 16     17.22         18.3667         17.32
> ./lat_pipe -P 1             43.5021       44.7904         41.3729
> 
> libMicro-0.4.1 test              base  32d59773da38    new patch
> ./cascade_cond -E -C 200\
>  -L -S -W -N "c_cond_1" -I 100    281    358          281
> 
> The performance problem also seem to gone with the latest patch, thanks.

Thanks for testing! I'm going to send it out and add your tested-by (and
reported-by).

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bug-report] possible performance problem in ret_to_user_from_irq
@ 2023-01-04 14:45             ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2023-01-04 14:45 UTC (permalink / raw)
  To: Hui Tang, Russell King (Oracle); +Cc: linux-arm-kernel, linux-kernel

On 1/4/23 12:04?AM, Hui Tang wrote:
> 
> 
> On 2023/1/3 22:59, Jens Axboe wrote:
>> On 1/3/23 7:34?AM, Russell King (Oracle) wrote:
>>> On Tue, Jan 03, 2023 at 07:25:26AM -0700, Jens Axboe wrote:
>>>> On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
>>>>> On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
>>>>>> hi folks.
>>>>>>
>>>>>> I found a performance problem which is introduced by commit
>>>>>> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
>>>>>> After the commit,  any bit in the range of 0..15 will cause
>>>>>> do_work_pending() to be invoked. More frequent do_work_pending()
>>>>>> invoked possible result in worse performance.
>>>>>>
>>>>>> Some of the tests I've done? as follows:
>>>>>> lmbench test            base        with patch
>>>>>> ./lat_ctx -P 1 -s 0  2        7.3167        11.04
>>>>>> ./lat_ctx -P 1 -s 16 2          8.0467        14.5367
>>>>>> ./lat_ctx -P 1 -s 64 2        7.8667        11.43
>>>>>> ./lat_ctx -P 1 -s 16 16        16.47        18.3667
>>>>>> ./lat_pipe -P 1            28.1671        44.7904
>>>>>>
>>>>>> libMicro-0.4.1 test        base        with patch
>>>>>> ./cascade_cond -E -C 200\
>>>>>>  -L -S -W -N "c_cond_1" -I 100    286.3333    358
>>>>>>
>>>>>> When I adjust test bit, the performance problem gone.
>>>>>> -    movs    r1, r1, lsl #16
>>>>>> +    ldr    r2, =#_TIF_WORK_MASK
>>>>>> +    tst    r1, r2
>>>>>>
>>>>>> Does anyone have a good suggestion for this problem?
>>>>>> should just test _TIF_WORK_MASK, as before?
>>>>>
>>>>> I think it should be fine - but I would suggest re-organising the
>>>>> TIF definitions so that those TIF bits that shouldn't trigger
>>>>> do_work_pending are not in the first 16 bits.
>>>>>
>>>>> Note that all four bits in _TIF_SYSCALL_WORK need to stay within
>>>>> an 8-bit even-bit-aligned range, so the value is suitable for an
>>>>> immediate assembly constant.
>>>>>
>>>>> I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
>>>>> 20..23, and then 8 to 4.
>>>>
>>>> Like this?
>>>>
>>>> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
>>>> index aecc403b2880..7f092cb55a41 100644
>>>> --- a/arch/arm/include/asm/thread_info.h
>>>> +++ b/arch/arm/include/asm/thread_info.h
>>>> @@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
>>>>  #define TIF_NEED_RESCHED    1    /* rescheduling necessary */
>>>>  #define TIF_NOTIFY_RESUME    2    /* callback before returning to user */
>>>>  #define TIF_UPROBE        3    /* breakpointed or singlestepping */
>>>> -#define TIF_SYSCALL_TRACE    4    /* syscall trace active */
>>>> -#define TIF_SYSCALL_AUDIT    5    /* syscall auditing active */
>>>> -#define TIF_SYSCALL_TRACEPOINT    6    /* syscall tracepoint instrumentation */
>>>> -#define TIF_SECCOMP        7    /* seccomp syscall filtering active */
>>>> -#define TIF_NOTIFY_SIGNAL    8    /* signal notifications exist */
>>>> +#define TIF_NOTIFY_SIGNAL    4    /* signal notifications exist */
>>>>
>>>>  #define TIF_USING_IWMMXT    17
>>>>  #define TIF_MEMDIE        18    /* is terminating due to OOM killer */
>>>> -#define TIF_RESTORE_SIGMASK    20
>>>> +#define TIF_RESTORE_SIGMASK    19
>>>> +#define TIF_SYSCALL_TRACE    20    /* syscall trace active */
>>>> +#define TIF_SYSCALL_AUDIT    21    /* syscall auditing active */
>>>> +#define TIF_SYSCALL_TRACEPOINT    22    /* syscall tracepoint instrumentation */
>>>> +#define TIF_SECCOMP        23    /* seccomp syscall filtering active */
>>>> +
>>>>
>>>>  #define _TIF_SIGPENDING        (1 << TIF_SIGPENDING)
>>>>  #define _TIF_NEED_RESCHED    (1 << TIF_NEED_RESCHED)
>>>
>>> Yep, LGTM, thanks.
>>
>> Hui Tang, can you give it a whirl? Just checked and it applies to
>> 5.10-stable as well, just with a slight offset.
> 
> With the latest patch, the testcase rusults shown in the 'new patch' column.
> I also retested previous commit of 32d59773da38, the results shown in the 'base' column.
> 
> lmbench test                base        32d59773da38      new patch
> ./lat_ctx -P 1 -s 0  2      8.04          11.04        8.25
> ./lat_ctx -P 1 -s 16 2      9.08          14.5367         9.26
> ./lat_ctx -P 1 -s 64 2      8.78          11.43        8.71
> ./lat_ctx -P 1 -s 16 16     17.22         18.3667         17.32
> ./lat_pipe -P 1             43.5021       44.7904         41.3729
> 
> libMicro-0.4.1 test              base  32d59773da38    new patch
> ./cascade_cond -E -C 200\
>  -L -S -W -N "c_cond_1" -I 100    281    358          281
> 
> The performance problem also seem to gone with the latest patch, thanks.

Thanks for testing! I'm going to send it out and add your tested-by (and
reported-by).

-- 
Jens Axboe


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-01-04 15:10 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-26  8:45 [bug-report] possible performance problem in ret_to_user_from_irq Hui Tang
2022-12-26  8:45 ` Hui Tang
2023-01-03 10:06 ` Russell King (Oracle)
2023-01-03 10:06   ` Russell King (Oracle)
2023-01-03 14:25   ` Jens Axboe
2023-01-03 14:25     ` Jens Axboe
2023-01-03 14:34     ` Russell King (Oracle)
2023-01-03 14:34       ` Russell King (Oracle)
2023-01-03 14:59       ` Jens Axboe
2023-01-03 14:59         ` Jens Axboe
2023-01-04  1:31         ` Hui Tang
2023-01-04  1:31           ` Hui Tang
2023-01-04  7:04         ` Hui Tang
2023-01-04  7:04           ` Hui Tang
2023-01-04 14:45           ` Jens Axboe
2023-01-04 14:45             ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.