linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] tracing: Introduce restart_critical_timings()
@ 2024-03-20 16:20 Steven Rostedt
  2024-03-20 17:15 ` Mathieu Desnoyers
  2024-03-21  0:46 ` Waiman Long
  0 siblings, 2 replies; 6+ messages in thread
From: Steven Rostedt @ 2024-03-20 16:20 UTC (permalink / raw)
  To: LKML, Linux Trace Kernel
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Daniel Bristot de Oliveira,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, linux-rt-users

From: Steven Rostedt (Google) <rostedt@goodmis.org>

I'm debugging some latency issues on a Chromebook and the preemptirqsoff
tracer hit this:

# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 5.15.148-21853-g165fd2387469-dirty
# --------------------------------------------------------------------
# latency: 7813 us, #60/60, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
#    -----------------
#    | task: init-1 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: rwsem_optimistic_spin
#  => ended at:   rwsem_optimistic_spin
#
#
#                    _------=> CPU#            
#                   / _-----=> irqs-off        
#                  | / _----=> need-resched    
#                  || / _---=> hardirq/softirq 
#                  ||| / _--=> preempt-depth   
#                  |||| / _-=> migrate-disable 
#                  ||||| /     delay           
#  cmd     pid     |||||| time  |   caller     
#     \   /        ||||||  \    |    /       
   <...>-1         1...1.    0us!: rwsem_optimistic_spin+0x20/0x194 <-rwsem_optimistic_spin+0x20/0x194
   <...>-1         1.N.1. 7813us : rwsem_optimistic_spin+0x140/0x194 <-rwsem_optimistic_spin+0x140/0x194
   <...>-1         1.N.1. 7814us+: tracer_preempt_on+0x4c/0x6a <-rwsem_optimistic_spin+0x140/0x194
   <...>-1         1.N.1. 7824us : <stack trace>
 => rwsem_optimistic_spin+0x140/0x194
 => rwsem_down_write_slowpath+0xc9/0x3fe
 => copy_process+0xd08/0x1b8a
 => kernel_clone+0x94/0x256
 => __x64_sys_clone+0x7a/0x9a
 => do_syscall_64+0x51/0xa1
 => entry_SYSCALL_64_after_hwframe+0x5c/0xc6

Which isn't a real issue, as it's in the rwsem_optimistic_spin() code that
spins with interrupts enabled, preempt disabled, and checks for
need_resched(). If it is true, it breaks out and schedules.

Hence, it's hiding real issues, because it can spin for a very long time
and this is not the source of the latency I'm tracking down.

I would like to introduce restart_critical_timings() and place it in
locations that have this behavior.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 147feebd508c..e9f97f60bfc0 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -145,6 +145,13 @@ do {						\
 # define start_critical_timings() do { } while (0)
 #endif
 
+/* Used in spins that check need_resched() with preemption disabled */
+static inline void restart_critical_timings(void)
+{
+	stop_critical_timings();
+	start_critical_timings();
+}
+
 #ifdef CONFIG_DEBUG_IRQFLAGS
 extern void warn_bogus_irq_restore(void);
 #define raw_check_bogus_irq_restore()			\
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 2340b6d90ec6..fa7b43e53d20 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -27,6 +27,7 @@
 #include <linux/export.h>
 #include <linux/rwsem.h>
 #include <linux/atomic.h>
+#include <linux/irqflags.h>
 #include <trace/events/lock.h>
 
 #ifndef CONFIG_PREEMPT_RT
@@ -780,6 +781,7 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
 		 */
 		barrier();
 
+		restart_critical_timings();
 		if (need_resched() || !owner_on_cpu(owner)) {
 			state = OWNER_NONSPINNABLE;
 			break;
@@ -912,6 +914,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 * a writer, need_resched() check needs to be done here.
 		 */
 		if (owner_state != OWNER_WRITER) {
+			restart_critical_timings();
 			if (need_resched())
 				break;
 			if (rt_task(current) &&

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] tracing: Introduce restart_critical_timings()
  2024-03-20 16:20 [RFC][PATCH] tracing: Introduce restart_critical_timings() Steven Rostedt
@ 2024-03-20 17:15 ` Mathieu Desnoyers
  2024-03-20 17:58   ` Steven Rostedt
  2024-03-21  0:46 ` Waiman Long
  1 sibling, 1 reply; 6+ messages in thread
From: Mathieu Desnoyers @ 2024-03-20 17:15 UTC (permalink / raw)
  To: Steven Rostedt, LKML, Linux Trace Kernel
  Cc: Masami Hiramatsu, Daniel Bristot de Oliveira, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, linux-rt-users

On 2024-03-20 12:20, Steven Rostedt wrote:
> From: Steven Rostedt (Google) <rostedt@goodmis.org>
> 
> I'm debugging some latency issues on a Chromebook and the preemptirqsoff
> tracer hit this:
> 
> # tracer: preemptirqsoff
> #
> # preemptirqsoff latency trace v1.1.5 on 5.15.148-21853-g165fd2387469-dirty
> # --------------------------------------------------------------------
> # latency: 7813 us, #60/60, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
> #    -----------------
> #    | task: init-1 (uid:0 nice:0 policy:0 rt_prio:0)
> #    -----------------
> #  => started at: rwsem_optimistic_spin
> #  => ended at:   rwsem_optimistic_spin
> #
> #
> #                    _------=> CPU#
> #                   / _-----=> irqs-off
> #                  | / _----=> need-resched
> #                  || / _---=> hardirq/softirq
> #                  ||| / _--=> preempt-depth
> #                  |||| / _-=> migrate-disable
> #                  ||||| /     delay
> #  cmd     pid     |||||| time  |   caller
> #     \   /        ||||||  \    |    /
>     <...>-1         1...1.    0us!: rwsem_optimistic_spin+0x20/0x194 <-rwsem_optimistic_spin+0x20/0x194
>     <...>-1         1.N.1. 7813us : rwsem_optimistic_spin+0x140/0x194 <-rwsem_optimistic_spin+0x140/0x194
>     <...>-1         1.N.1. 7814us+: tracer_preempt_on+0x4c/0x6a <-rwsem_optimistic_spin+0x140/0x194
>     <...>-1         1.N.1. 7824us : <stack trace>
>   => rwsem_optimistic_spin+0x140/0x194
>   => rwsem_down_write_slowpath+0xc9/0x3fe
>   => copy_process+0xd08/0x1b8a
>   => kernel_clone+0x94/0x256
>   => __x64_sys_clone+0x7a/0x9a
>   => do_syscall_64+0x51/0xa1
>   => entry_SYSCALL_64_after_hwframe+0x5c/0xc6
> 
> Which isn't a real issue, as it's in the rwsem_optimistic_spin() code that
> spins with interrupts enabled, preempt disabled, and checks for
> need_resched(). If it is true, it breaks out and schedules.
> 
> Hence, it's hiding real issues, because it can spin for a very long time
> and this is not the source of the latency I'm tracking down.
> 
> I would like to introduce restart_critical_timings() and place it in
> locations that have this behavior.

Is there any way you could move this to need_resched() rather than
sprinkle those everywhere ?

Thanks,

Mathieu

> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> 
> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> index 147feebd508c..e9f97f60bfc0 100644
> --- a/include/linux/irqflags.h
> +++ b/include/linux/irqflags.h
> @@ -145,6 +145,13 @@ do {						\
>   # define start_critical_timings() do { } while (0)
>   #endif
>   
> +/* Used in spins that check need_resched() with preemption disabled */
> +static inline void restart_critical_timings(void)
> +{
> +	stop_critical_timings();
> +	start_critical_timings();
> +}
> +
>   #ifdef CONFIG_DEBUG_IRQFLAGS
>   extern void warn_bogus_irq_restore(void);
>   #define raw_check_bogus_irq_restore()			\
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 2340b6d90ec6..fa7b43e53d20 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -27,6 +27,7 @@
>   #include <linux/export.h>
>   #include <linux/rwsem.h>
>   #include <linux/atomic.h>
> +#include <linux/irqflags.h>
>   #include <trace/events/lock.h>
>   
>   #ifndef CONFIG_PREEMPT_RT
> @@ -780,6 +781,7 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
>   		 */
>   		barrier();
>   
> +		restart_critical_timings();
>   		if (need_resched() || !owner_on_cpu(owner)) {
>   			state = OWNER_NONSPINNABLE;
>   			break;
> @@ -912,6 +914,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>   		 * a writer, need_resched() check needs to be done here.
>   		 */
>   		if (owner_state != OWNER_WRITER) {
> +			restart_critical_timings();
>   			if (need_resched())
>   				break;
>   			if (rt_task(current) &&

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] tracing: Introduce restart_critical_timings()
  2024-03-20 17:15 ` Mathieu Desnoyers
@ 2024-03-20 17:58   ` Steven Rostedt
  2024-03-20 18:35     ` Mathieu Desnoyers
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Rostedt @ 2024-03-20 17:58 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu,
	Daniel Bristot de Oliveira, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	linux-rt-users

On Wed, 20 Mar 2024 13:15:39 -0400
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> > I would like to introduce restart_critical_timings() and place it in
> > locations that have this behavior.  
> 
> Is there any way you could move this to need_resched() rather than
> sprinkle those everywhere ?

Because need_resched() itself does not mean it's going to schedule
immediately. I looked at a few locations that need_resched() is called.
Most are in idle code where the critical timings are already handled.

I'm not sure I'd add it for places like mm/memory.c or drivers/md/bcache/btree.c.

A lot of places look to use it more for PREEMPT_NONE situations as a open
coded cond_resched().

The main reason this one is particularly an issue, is that it spins as long
as the owner is still running. Which may be some time, as here it was 7ms.

-- Steve

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] tracing: Introduce restart_critical_timings()
  2024-03-20 17:58   ` Steven Rostedt
@ 2024-03-20 18:35     ` Mathieu Desnoyers
  0 siblings, 0 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2024-03-20 18:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu,
	Daniel Bristot de Oliveira, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	linux-rt-users

On 2024-03-20 13:58, Steven Rostedt wrote:
> On Wed, 20 Mar 2024 13:15:39 -0400
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>>> I would like to introduce restart_critical_timings() and place it in
>>> locations that have this behavior.
>>
>> Is there any way you could move this to need_resched() rather than
>> sprinkle those everywhere ?
> 
> Because need_resched() itself does not mean it's going to schedule
> immediately. I looked at a few locations that need_resched() is called.
> Most are in idle code where the critical timings are already handled.
> 
> I'm not sure I'd add it for places like mm/memory.c or drivers/md/bcache/btree.c.
> 
> A lot of places look to use it more for PREEMPT_NONE situations as a open
> coded cond_resched().
> 
> The main reason this one is particularly an issue, is that it spins as long
> as the owner is still running. Which may be some time, as here it was 7ms.

What I think we should be discussing here is how calling need_resched()
should interact with the latency tracked by critical timings.

AFAIU, when code explicitly calls need_resched() in a loop, there are
two cases:

- need_resched() returns false: This means the loop can continue without
   causing long latency on the system. Technically we could restart the
   critical timings at this point.

- need_resched() returns true: This means the loop should exit quickly
   and call the scheduler. I would not reset the critical timings there,
   as whatever code is executed between need_resched() returning true
   and calling the scheduler is adding to latency.

Having stop/start critical timings around idle loops seems to just be
an optimization over that.

As for mm and driver/md code, what is wrong with doing a critical
timings reset when need_resched() returns false ? It would prevent
a whole class of false-positives rather than playing whack-a-mole with
those that pop up.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] tracing: Introduce restart_critical_timings()
  2024-03-20 16:20 [RFC][PATCH] tracing: Introduce restart_critical_timings() Steven Rostedt
  2024-03-20 17:15 ` Mathieu Desnoyers
@ 2024-03-21  0:46 ` Waiman Long
  2024-03-21  1:03   ` Steven Rostedt
  1 sibling, 1 reply; 6+ messages in thread
From: Waiman Long @ 2024-03-21  0:46 UTC (permalink / raw)
  To: Steven Rostedt, LKML, Linux Trace Kernel
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Daniel Bristot de Oliveira,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Will Deacon,
	Boqun Feng, linux-rt-users

On 3/20/24 12:20, Steven Rostedt wrote:
> From: Steven Rostedt (Google) <rostedt@goodmis.org>
>
> I'm debugging some latency issues on a Chromebook and the preemptirqsoff
> tracer hit this:
>
> # tracer: preemptirqsoff
> #
> # preemptirqsoff latency trace v1.1.5 on 5.15.148-21853-g165fd2387469-dirty
> # --------------------------------------------------------------------
> # latency: 7813 us, #60/60, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
> #    -----------------
> #    | task: init-1 (uid:0 nice:0 policy:0 rt_prio:0)
> #    -----------------
> #  => started at: rwsem_optimistic_spin
> #  => ended at:   rwsem_optimistic_spin
> #
> #
> #                    _------=> CPU#
> #                   / _-----=> irqs-off
> #                  | / _----=> need-resched
> #                  || / _---=> hardirq/softirq
> #                  ||| / _--=> preempt-depth
> #                  |||| / _-=> migrate-disable
> #                  ||||| /     delay
> #  cmd     pid     |||||| time  |   caller
> #     \   /        ||||||  \    |    /
>     <...>-1         1...1.    0us!: rwsem_optimistic_spin+0x20/0x194 <-rwsem_optimistic_spin+0x20/0x194
>     <...>-1         1.N.1. 7813us : rwsem_optimistic_spin+0x140/0x194 <-rwsem_optimistic_spin+0x140/0x194
>     <...>-1         1.N.1. 7814us+: tracer_preempt_on+0x4c/0x6a <-rwsem_optimistic_spin+0x140/0x194
>     <...>-1         1.N.1. 7824us : <stack trace>
>   => rwsem_optimistic_spin+0x140/0x194
>   => rwsem_down_write_slowpath+0xc9/0x3fe
>   => copy_process+0xd08/0x1b8a
>   => kernel_clone+0x94/0x256
>   => __x64_sys_clone+0x7a/0x9a
>   => do_syscall_64+0x51/0xa1
>   => entry_SYSCALL_64_after_hwframe+0x5c/0xc6
>
> Which isn't a real issue, as it's in the rwsem_optimistic_spin() code that
> spins with interrupts enabled, preempt disabled, and checks for
> need_resched(). If it is true, it breaks out and schedules.
>
> Hence, it's hiding real issues, because it can spin for a very long time
> and this is not the source of the latency I'm tracking down.
>
> I would like to introduce restart_critical_timings() and place it in
> locations that have this behavior.
>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

I have no objection to that. However, there are now 2 function call 
overhead in each iteration if either CONFIG_IRQSOFF_TRACER or 
CONFIG_PREEMPT_TRACER is on. Is it possible to do it with just one 
function call? IOW, make restart_critical_timings() a real function.

Cheers,
Longman

>
> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> index 147feebd508c..e9f97f60bfc0 100644
> --- a/include/linux/irqflags.h
> +++ b/include/linux/irqflags.h
> @@ -145,6 +145,13 @@ do {						\
>   # define start_critical_timings() do { } while (0)
>   #endif
>   
> +/* Used in spins that check need_resched() with preemption disabled */
> +static inline void restart_critical_timings(void)
> +{
> +	stop_critical_timings();
> +	start_critical_timings();
> +}
> +
>   #ifdef CONFIG_DEBUG_IRQFLAGS
>   extern void warn_bogus_irq_restore(void);
>   #define raw_check_bogus_irq_restore()			\
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 2340b6d90ec6..fa7b43e53d20 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -27,6 +27,7 @@
>   #include <linux/export.h>
>   #include <linux/rwsem.h>
>   #include <linux/atomic.h>
> +#include <linux/irqflags.h>
>   #include <trace/events/lock.h>
>   
>   #ifndef CONFIG_PREEMPT_RT
> @@ -780,6 +781,7 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
>   		 */
>   		barrier();
>   
> +		restart_critical_timings();
>   		if (need_resched() || !owner_on_cpu(owner)) {
>   			state = OWNER_NONSPINNABLE;
>   			break;
> @@ -912,6 +914,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>   		 * a writer, need_resched() check needs to be done here.
>   		 */
>   		if (owner_state != OWNER_WRITER) {
> +			restart_critical_timings();
>   			if (need_resched())
>   				break;
>   			if (rt_task(current) &&
>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] tracing: Introduce restart_critical_timings()
  2024-03-21  0:46 ` Waiman Long
@ 2024-03-21  1:03   ` Steven Rostedt
  0 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2024-03-21  1:03 UTC (permalink / raw)
  To: Waiman Long
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Daniel Bristot de Oliveira, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Will Deacon, Boqun Feng, linux-rt-users

On Wed, 20 Mar 2024 20:46:11 -0400
Waiman Long <longman@redhat.com> wrote:

> I have no objection to that. However, there are now 2 function call 
> overhead in each iteration if either CONFIG_IRQSOFF_TRACER or 
> CONFIG_PREEMPT_TRACER is on. Is it possible to do it with just one 
> function call? IOW, make restart_critical_timings() a real function.

Yeah, I could do that.

-- Steve

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-03-21  1:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-20 16:20 [RFC][PATCH] tracing: Introduce restart_critical_timings() Steven Rostedt
2024-03-20 17:15 ` Mathieu Desnoyers
2024-03-20 17:58   ` Steven Rostedt
2024-03-20 18:35     ` Mathieu Desnoyers
2024-03-21  0:46 ` Waiman Long
2024-03-21  1:03   ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).