Linux-rt-users archive on lore.kernel.org
* Re: [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency
       [not found] ` <20190903132602.3440-2-viktor.rosendahl@gmail.com>
@ 2019-09-04  4:00   ` Joel Fernandes
  2019-09-04  8:19     ` Peter Zijlstra
  2019-09-04 10:21     ` Paul E. McKenney
  0 siblings, 2 replies; 5+ messages in thread
From: Joel Fernandes @ 2019-09-04  4:00 UTC (permalink / raw)
  To: Viktor Rosendahl, paulmck
  Cc: Steven Rostedt, Ingo Molnar, linux-kernel, Peter Zijlstra,
	linux-rt-users

[ Resending since I messed up my last email's headers! ]

On Tue, Sep 03, 2019 at 03:25:59PM +0200, Viktor Rosendahl wrote:
> This patch implements the feature that the tracing_max_latency file,
> e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive
> notifications through the fsnotify framework when a new latency is
> available.
> 
> One particularly interesting use of this facility is when enabling
> threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh,
> together with the preempt/irqsoff tracers. This makes it possible to
> implement a user space program that can, with equal probability,
> obtain traces of latencies that occur immediately after each other in
> spite of the fact that the preempt/irqsoff tracers operate in overwrite
> mode.

Adding Paul since RCU faces similar situations, i.e. raising softirq risks
scheduler deadlock in rcu_read_unlock_special() -- but RCU's solution is to
avoid raising the softirq and instead use irq_work.

I was wondering if we could rename __raise_softirq_irqoff() to
raise_softirq_irqoff_no_wake() and call that from places where there is a risk
of scheduler-related deadlocks. Then I think this could be used from Viktor's
code. Let us discuss: what would happen if the softirq is raised, but
ksoftirqd is not awakened for this latency notification path? Is this really
an issue, considering the softirq will execute during the next interrupt exit?

thanks,

 - Joel
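
The rename proposed above would amount to giving __raise_softirq_irqoff() a self-documenting alias; a rough sketch, following the conventions of kernel/softirq.c (illustration only, not a tested patch):

```c
/*
 * Sketch of the proposed rename: marks the softirq pending without waking
 * ksoftirqd, so it is usable where a wakeup could deadlock against the
 * scheduler.  The pending softirq then runs at the next irq_exit() (or
 * whenever ksoftirqd next runs for another reason).
 */
static inline void raise_softirq_irqoff_no_wake(unsigned int nr)
{
	__raise_softirq_irqoff(nr);
}
```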


> This facility works with the hwlat, preempt/irqsoff, and wakeup
> tracers.
> 
> This patch also adds some unfortunate calls from __schedule() and
> do_idle(). Those calls to the latency_fsnotify_disable/enable() are
> needed because we cannot wake up the workqueue from these critical
> sections without risking a deadlock. Similar problems would also arise
> if we try to schedule a tasklet, raise a softirq, or wake up a kernel
> thread. If a notification event happens in one of the forbidden sections,
> we schedule the fsnotify work as soon as we have exited them.
> 
> There was a suggestion to remove this latency_fsnotify_enable/disable()
> gunk, or at least to combine it with the start_critical_timings() and
> stop_critical_timings(). I have however not been able to come up with
> a way to do it.
> 
> It seems like it would be possible to simply replace the calls to
> latency_fsnotify_enable/disable() with calls to
> start/stop_critical_timings(). However, the main problem is that it
> would not work for the wakeup tracer. The wakeup tracer needs a
> facility that postpones the notifications, not one that prevents the
> measurements, because all its measurements take place in the middle
> of __schedule(). On the other hand, in some places, like in idle and
> the console, we need start/stop functions that prevent the
> measurements from being made.
> 
> Signed-off-by: Viktor Rosendahl <viktor.rosendahl@gmail.com>
> ---
>  include/linux/ftrace.h            |  31 +++++++++
>  kernel/sched/core.c               |   3 +
>  kernel/sched/idle.c               |   3 +
>  kernel/sched/sched.h              |   1 +
>  kernel/trace/trace.c              | 112 +++++++++++++++++++++++++++++-
>  kernel/trace/trace.h              |  22 ++++++
>  kernel/trace/trace_hwlat.c        |   4 +-
>  kernel/trace/trace_irqsoff.c      |   4 ++
>  kernel/trace/trace_sched_wakeup.c |   4 ++
>  9 files changed, 181 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 8a8cb3c401b2..b4d9700ef917 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -907,4 +907,35 @@ unsigned long arch_syscall_addr(int nr);
>  
>  #endif /* CONFIG_FTRACE_SYSCALLS */
>  
> +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \
> +	defined(CONFIG_FSNOTIFY)
> +
> +DECLARE_PER_CPU(int, latency_notify_disable);
> +DECLARE_STATIC_KEY_FALSE(latency_notify_key);
> +
> +void latency_fsnotify_process(void);
> +
> +/*
> + * Disable/enable fsnotify while in scheduler and idle code. Trying to wake
> + * anything up from there, such as calling queue_work() is prone to deadlock.
> + */
> +static inline void latency_fsnotify_disable(void)
> +{
> +	this_cpu_inc(latency_notify_disable);
> +}
> +
> +static inline void latency_fsnotify_enable(void)
> +{
> +	this_cpu_dec(latency_notify_disable);
> +	if (static_branch_unlikely(&latency_notify_key))
> +		latency_fsnotify_process();
> +}
> +
> +#else
> +
> +#define latency_fsnotify_disable() do { } while (0)
> +#define latency_fsnotify_enable()  do { } while (0)
> +
> +#endif
> +
>  #endif /* _LINUX_FTRACE_H */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 010d578118d6..e3c1dc801073 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3198,6 +3198,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
>  	 */
>  
>  	rq = finish_task_switch(prev);
> +	latency_fsnotify_enable();
>  	balance_callback(rq);
>  	preempt_enable();
>  
> @@ -3820,6 +3821,7 @@ static void __sched notrace __schedule(bool preempt)
>  
>  	local_irq_disable();
>  	rcu_note_context_switch(preempt);
> +	latency_fsnotify_disable();
>  
>  	/*
>  	 * Make sure that signal_pending_state()->signal_pending() below
> @@ -3883,6 +3885,7 @@ static void __sched notrace __schedule(bool preempt)
>  		rq_unlock_irq(rq, &rf);
>  	}
>  
> +	latency_fsnotify_enable();
>  	balance_callback(rq);
>  }
>  
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 80940939b733..5fc87d99a407 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -236,6 +236,7 @@ static void do_idle(void)
>  
>  	__current_set_polling();
>  	tick_nohz_idle_enter();
> +	latency_fsnotify_disable();
>  
>  	while (!need_resched()) {
>  		check_pgt_cache();
> @@ -265,6 +266,8 @@ static void do_idle(void)
>  		arch_cpu_idle_exit();
>  	}
>  
> +	latency_fsnotify_enable();
> +
>  	/*
>  	 * Since we fell out of the loop above, we know TIF_NEED_RESCHED must
>  	 * be set, propagate it into PREEMPT_NEED_RESCHED.
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 802b1f3405f2..467d6ad03f16 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -46,6 +46,7 @@
>  #include <linux/debugfs.h>
>  #include <linux/delayacct.h>
>  #include <linux/energy_model.h>
> +#include <linux/ftrace.h>
>  #include <linux/init_task.h>
>  #include <linux/kprobes.h>
>  #include <linux/kthread.h>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 563e80f9006a..a622263a69e4 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -44,6 +44,10 @@
>  #include <linux/trace.h>
>  #include <linux/sched/clock.h>
>  #include <linux/sched/rt.h>
> +#include <linux/fsnotify.h>
> +#include <linux/workqueue.h>
> +#include <trace/events/power.h>
> +#include <trace/events/sched.h>
>  
>  #include "trace.h"
>  #include "trace_output.h"
> @@ -1480,6 +1484,110 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt)
>  
>  unsigned long __read_mostly	tracing_thresh;
>  
> +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \
> +	defined(CONFIG_FSNOTIFY)
> +
> +static const struct file_operations tracing_max_lat_fops;
> +static struct workqueue_struct *fsnotify_wq;
> +static DEFINE_PER_CPU(struct llist_head, notify_list);
> +
> +DEFINE_PER_CPU(int, latency_notify_disable);
> +DEFINE_STATIC_KEY_FALSE(latency_notify_key);
> +
> +static void latency_fsnotify_workfn(struct work_struct *work)
> +{
> +	struct trace_array *tr = container_of(work, struct trace_array,
> +					      fsnotify_work);
> +	fsnotify(tr->d_max_latency->d_inode, FS_MODIFY,
> +		 tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0);
> +}
> +
> +static void trace_create_maxlat_file(struct trace_array *tr,
> +				     struct dentry *d_tracer)
> +{
> +	INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn);
> +	atomic_set(&tr->notify_pending, 0);
> +	tr->d_max_latency = trace_create_file("tracing_max_latency", 0644,
> +					      d_tracer, &tr->max_latency,
> +					      &tracing_max_lat_fops);
> +}
> +
> +void latency_fsnotify_stop(void)
> +{
> +	/* Make sure all CPUs see caller's previous actions to stop tracer */
> +	smp_wmb();
> +	static_branch_disable(&latency_notify_key);
> +	latency_fsnotify_process();
> +}
> +
> +void latency_fsnotify_start(void)
> +{
> +	static_branch_enable(&latency_notify_key);
> +	/* Make sure all CPUs see the key value before the caller continues */
> +	smp_wmb();
> +}
> +
> +void latency_fsnotify_process(void)
> +{
> +	struct trace_array *tr;
> +	struct llist_head *list;
> +	struct llist_node *node;
> +
> +	if (this_cpu_read(latency_notify_disable))
> +		return;
> +
> +	list = this_cpu_ptr(&notify_list);
> +	for (node = llist_del_first(list); node != NULL;
> +	     node = llist_del_first(list)) {
> +		tr = llist_entry(node, struct trace_array, notify_ll);
> +		atomic_set(&tr->notify_pending, 0);
> +		queue_work(fsnotify_wq, &tr->fsnotify_work);
> +	}
> +}
> +
> +__init static int latency_fsnotify_init(void)
> +{
> +	fsnotify_wq = alloc_workqueue("tr_max_lat_wq",
> +				      WQ_UNBOUND | WQ_HIGHPRI, 0);
> +	if (!fsnotify_wq) {
> +		pr_err("Unable to allocate tr_max_lat_wq\n");
> +		return -ENOMEM;
> +	}
> +	return 0;
> +}
> +
> +late_initcall_sync(latency_fsnotify_init);
> +
> +void latency_fsnotify(struct trace_array *tr)
> +{
> +	if (!fsnotify_wq)
> +		return;
> +
> +	if (!this_cpu_read(latency_notify_disable))
> +		queue_work(fsnotify_wq, &tr->fsnotify_work);
> +	else {
> +		/*
> +		 * notify_pending prevents us from adding the same entry to
> +		 * more than one notify_list. It will get queued in
> +		 * latency_fsnotify_process()
> +		 */
> +		if (!atomic_xchg(&tr->notify_pending, 1))
> +			llist_add(&tr->notify_ll, this_cpu_ptr(&notify_list));
> +	}
> +}
> +
> +/*
> + * (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \
> + *  defined(CONFIG_FSNOTIFY)
> + */
> +#else
> +
> +#define trace_create_maxlat_file(tr, d_tracer)				\
> +	trace_create_file("tracing_max_latency", 0644, d_tracer,	\
> +			  &tr->max_latency, &tracing_max_lat_fops)
> +
> +#endif
> +
>  #ifdef CONFIG_TRACER_MAX_TRACE
>  /*
>   * Copy the new maximum trace into the separate maximum-trace
> @@ -1518,6 +1626,7 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
>  
>  	/* record this tasks comm */
>  	tracing_record_cmdline(tsk);
> +	latency_fsnotify(tr);
>  }
>  
>  /**
> @@ -8550,8 +8659,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
>  	create_trace_options_dir(tr);
>  
>  #if defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)
> -	trace_create_file("tracing_max_latency", 0644, d_tracer,
> -			&tr->max_latency, &tracing_max_lat_fops);
> +	trace_create_maxlat_file(tr, d_tracer);
>  #endif
>  
>  	if (ftrace_create_function_files(tr, d_tracer))
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 005f08629b8b..d9f83b2aaa71 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -16,6 +16,7 @@
>  #include <linux/trace_events.h>
>  #include <linux/compiler.h>
>  #include <linux/glob.h>
> +#include <linux/workqueue.h>
>  
>  #ifdef CONFIG_FTRACE_SYSCALLS
>  #include <asm/unistd.h>		/* For NR_SYSCALLS	     */
> @@ -264,6 +265,12 @@ struct trace_array {
>  #endif
>  #if defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)
>  	unsigned long		max_latency;
> +#ifdef CONFIG_FSNOTIFY
> +	struct dentry		*d_max_latency;
> +	struct work_struct	fsnotify_work;
> +	atomic_t		notify_pending;
> +	struct llist_node	notify_ll;
> +#endif
>  #endif
>  	struct trace_pid_list	__rcu *filtered_pids;
>  	/*
> @@ -785,6 +792,21 @@ void update_max_tr_single(struct trace_array *tr,
>  			  struct task_struct *tsk, int cpu);
>  #endif /* CONFIG_TRACER_MAX_TRACE */
>  
> +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \
> +	defined(CONFIG_FSNOTIFY)
> +
> +void latency_fsnotify(struct trace_array *tr);
> +void latency_fsnotify_start(void);
> +void latency_fsnotify_stop(void);
> +
> +#else
> +
> +#define latency_fsnotify(tr)     do { } while (0)
> +#define latency_fsnotify_start() do { } while (0)
> +#define latency_fsnotify_stop()  do { } while (0)
> +
> +#endif
> +
>  #ifdef CONFIG_STACKTRACE
>  void __trace_stack(struct trace_array *tr, unsigned long flags, int skip,
>  		   int pc);
> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> index fa95139445b2..9c379261ee89 100644
> --- a/kernel/trace/trace_hwlat.c
> +++ b/kernel/trace/trace_hwlat.c
> @@ -254,8 +254,10 @@ static int get_sample(void)
>  		trace_hwlat_sample(&s);
>  
>  		/* Keep a running maximum ever recorded hardware latency */
> -		if (sample > tr->max_latency)
> +		if (sample > tr->max_latency) {
>  			tr->max_latency = sample;
> +			latency_fsnotify(tr);
> +		}
>  	}
>  
>  out:
> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index a745b0cee5d3..29403a83a5f0 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -557,6 +557,7 @@ static int __irqsoff_tracer_init(struct trace_array *tr)
>  	if (irqsoff_busy)
>  		return -EBUSY;
>  
> +	latency_fsnotify_start();
>  	save_flags = tr->trace_flags;
>  
>  	/* non overwrite screws up the latency tracers */
> @@ -591,16 +592,19 @@ static void __irqsoff_tracer_reset(struct trace_array *tr)
>  	ftrace_reset_array_ops(tr);
>  
>  	irqsoff_busy = false;
> +	latency_fsnotify_stop();
>  }
>  
>  static void irqsoff_tracer_start(struct trace_array *tr)
>  {
> +	latency_fsnotify_start();
>  	tracer_enabled = 1;
>  }
>  
>  static void irqsoff_tracer_stop(struct trace_array *tr)
>  {
>  	tracer_enabled = 0;
> +	latency_fsnotify_stop();
>  }
>  
>  #ifdef CONFIG_IRQSOFF_TRACER
> diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
> index 743b2b520d34..3dc90d9f605b 100644
> --- a/kernel/trace/trace_sched_wakeup.c
> +++ b/kernel/trace/trace_sched_wakeup.c
> @@ -669,6 +669,7 @@ static bool wakeup_busy;
>  
>  static int __wakeup_tracer_init(struct trace_array *tr)
>  {
> +	latency_fsnotify_start();
>  	save_flags = tr->trace_flags;
>  
>  	/* non overwrite screws up the latency tracers */
> @@ -727,10 +728,12 @@ static void wakeup_tracer_reset(struct trace_array *tr)
>  	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, overwrite_flag);
>  	ftrace_reset_array_ops(tr);
>  	wakeup_busy = false;
> +	latency_fsnotify_stop();
>  }
>  
>  static void wakeup_tracer_start(struct trace_array *tr)
>  {
> +	latency_fsnotify_start();
>  	wakeup_reset(tr);
>  	tracer_enabled = 1;
>  }
> @@ -738,6 +741,7 @@ static void wakeup_tracer_start(struct trace_array *tr)
>  static void wakeup_tracer_stop(struct trace_array *tr)
>  {
>  	tracer_enabled = 0;
> +	latency_fsnotify_stop();
>  }
>  
>  static struct tracer wakeup_tracer __read_mostly =
> -- 
> 2.17.1
> 


* Re: [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency
  2019-09-04  4:00   ` [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency Joel Fernandes
@ 2019-09-04  8:19     ` Peter Zijlstra
  2019-09-04 13:42       ` Joel Fernandes
  2019-09-04 18:17       ` Viktor Rosendahl
  2019-09-04 10:21     ` Paul E. McKenney
  1 sibling, 2 replies; 5+ messages in thread
From: Peter Zijlstra @ 2019-09-04  8:19 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Viktor Rosendahl, paulmck, Steven Rostedt, Ingo Molnar,
	linux-kernel, linux-rt-users

On Wed, Sep 04, 2019 at 12:00:39AM -0400, Joel Fernandes wrote:
> [ Resending since I messed up my last email's headers! ]
> 
> On Tue, Sep 03, 2019 at 03:25:59PM +0200, Viktor Rosendahl wrote:
> > [...]
> 
> Adding Paul since RCU faces similar situations, i.e. raising softirq risks
> scheduler deadlock in rcu_read_unlock_special() -- but RCU's solution is to
> avoid raising the softirq and instead use irq_work.

Which is right.

> I was wondering if we could rename __raise_softirq_irqoff() to
> raise_softirq_irqoff_no_wake() and call that from places where there is a risk
> of scheduler-related deadlocks. Then I think this could be used from Viktor's
> code. Let us discuss: what would happen if the softirq is raised, but
> ksoftirqd is not awakened for this latency notification path? Is this really
> an issue, considering the softirq will execute during the next interrupt exit?

You'd get unbounded latency for processing the softirq and warnings on
going idle with softirqs pending.

I really don't see why we should/want to be using softirq here.
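
The irq_work approach discussed in this thread could look roughly as follows. This is a hypothetical sketch, not code from the patch: the fsnotify_irqwork member and latency_irq_work_fn() are assumed additions to struct trace_array, while fsnotify_wq and fsnotify_work come from Viktor's patch above.

```c
/*
 * Sketch only: defer the workqueue wake-up through irq_work instead of a
 * softirq.  irq_work_queue() is safe from any context, including deep in
 * __schedule(); the callback then runs from the irq_work path (e.g. at
 * interrupt exit), outside the scheduler's locks.
 */
static void latency_irq_work_fn(struct irq_work *iwork)
{
	struct trace_array *tr = container_of(iwork, struct trace_array,
					      fsnotify_irqwork);

	/* Safe here: we are no longer inside the scheduler. */
	queue_work(fsnotify_wq, &tr->fsnotify_work);
}

void latency_fsnotify(struct trace_array *tr)
{
	if (!fsnotify_wq)
		return;
	/* Only marks the irq_work pending; never wakes anything directly. */
	irq_work_queue(&tr->fsnotify_irqwork);
}
```

The work item would be initialized once, e.g. with init_irq_work(&tr->fsnotify_irqwork, latency_irq_work_fn) next to the INIT_WORK() call in trace_create_maxlat_file(); this would arguably also make the per-CPU disable counter and the notify_pending/llist machinery unnecessary.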


* Re: [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency
  2019-09-04  4:00   ` [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency Joel Fernandes
  2019-09-04  8:19     ` Peter Zijlstra
@ 2019-09-04 10:21     ` Paul E. McKenney
  1 sibling, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2019-09-04 10:21 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Viktor Rosendahl, Steven Rostedt, Ingo Molnar, linux-kernel,
	Peter Zijlstra, linux-rt-users

On Wed, Sep 04, 2019 at 12:00:39AM -0400, Joel Fernandes wrote:
> [ Resending since I messed up my last email's headers! ]
> 
> On Tue, Sep 03, 2019 at 03:25:59PM +0200, Viktor Rosendahl wrote:
> > [...]
> 
> Adding Paul since RCU faces similar situations, i.e. raising softirq risks
> scheduler deadlock in rcu_read_unlock_special() -- but RCU's solution is to
> avoid raising the softirq and instead use irq_work.

Peter's solution, actually.  ;-)

							Thanx, Paul

> I was wondering if we could rename __raise_softirq_irqoff() to
> raise_softirq_irqoff_no_wake() and call that from places where there is a risk
> of scheduler-related deadlocks. Then I think this could be used from Viktor's
> code. Let us discuss: what would happen if the softirq is raised, but
> ksoftirqd is not awakened for this latency notification path? Is this really
> an issue, considering the softirq will execute during the next interrupt exit?
> 
> thanks,
> 
>  - Joel
> 
> 
> > [...]
> >  {
> >  	tracer_enabled = 0;
> > +	latency_fsnotify_stop();
> >  }
> >  
> >  static struct tracer wakeup_tracer __read_mostly =
> > -- 
> > 2.17.1
> > 


* Re: [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency
  2019-09-04  8:19     ` Peter Zijlstra
@ 2019-09-04 13:42       ` Joel Fernandes
  2019-09-04 18:17       ` Viktor Rosendahl
  1 sibling, 0 replies; 5+ messages in thread
From: Joel Fernandes @ 2019-09-04 13:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Viktor Rosendahl, paulmck, Steven Rostedt, Ingo Molnar,
	linux-kernel, linux-rt-users

On Wed, Sep 04, 2019 at 10:19:19AM +0200, Peter Zijlstra wrote:
> On Wed, Sep 04, 2019 at 12:00:39AM -0400, Joel Fernandes wrote:
> > [ Resending since I messed up my last email's headers! ]
> > 
> > On Tue, Sep 03, 2019 at 03:25:59PM +0200, Viktor Rosendahl wrote:
> > > This patch implements the feature that the tracing_max_latency file,
> > > e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive
> > > notifications through the fsnotify framework when a new latency is
> > > available.
> > > 
> > > One particularly interesting use of this facility is when enabling
> > > threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh,
> > > together with the preempt/irqsoff tracers. This makes it possible to
> > > implement a user space program that can, with equal probability,
> > > obtain traces of latencies that occur immediately after each other in
> > > spite of the fact that the preempt/irqsoff tracers operate in overwrite
> > > mode.
> > 
> > Adding Paul since RCU faces similar situations, i.e. raising softirq risks
> > scheduler deadlock in rcu_read_unlock_special() -- but RCU's solution is to
> > avoid raising the softirq and instead use irq_work.
> 
> Which is right.

Cool.

> > I was wondering, if we can rename __raise_softirq_irqoff() to
> > raise_softirq_irqoff_no_wake() and call that from places where there is risk
> > of scheduler related deadlocks. Then I think this can be used from Viktor's
> > code.  Let us discuss - what would happen if the softirq is raised, but
> > ksoftirqd is not awakened for this latency notification path? Is this really
> > an issue considering the softirq will execute during the next interrupt exit?
> 
> You'd get unbounded latency for processing the softirq and warnings on
> going idle with softirqs pending.

Thanks for sharing that.

> I really don't see why we should/want to be using softirq here.

Sure, makes sense.

thanks,

 - Joel



* Re: [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency
  2019-09-04  8:19     ` Peter Zijlstra
  2019-09-04 13:42       ` Joel Fernandes
@ 2019-09-04 18:17       ` Viktor Rosendahl
  1 sibling, 0 replies; 5+ messages in thread
From: Viktor Rosendahl @ 2019-09-04 18:17 UTC (permalink / raw)
  To: Peter Zijlstra, Joel Fernandes
  Cc: paulmck, Steven Rostedt, Ingo Molnar, linux-kernel, linux-rt-users

On 9/4/19 10:19 AM, Peter Zijlstra wrote:

>>
>> Adding Paul since RCU faces similar situations, i.e. raising softirq risks
>> scheduler deadlock in rcu_read_unlock_special() -- but RCU's solution is to
>> avoid raising the softirq and instead use irq_work.
> 
> Which is right.
> 

Thanks Joel and Peter for suggesting the irq_work facility.

I was ignorant of the existence of this facility.

It looks like it is the Holy Grail of deferred work, for this particular 
type of problem :)

It looks like it will be possible to both avoid deadlock and also avoid 
the ugly additions to the sched and idle code. In addition, the rest of 
the code will become a lot simpler.

best regards,

Viktor
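
[For reference, a sketch of what an irq_work-based latency_fsnotify() might look like. This is hypothetical code, not the posted v5 patch: the `fsnotify_irqwork` field does not exist in the patch quoted above, and the use of `system_wq` is an assumption. The key point is that irq_work_queue() is safe from any context, including with scheduler locks held; its callback runs in hard-irq context, so it in turn punts to a workqueue, where fsnotify() can finally run in process context:]

```c
/* Hypothetical rework; only the irq_work/workqueue/fsnotify calls are
 * real kernel APIs, the fsnotify_irqwork field is assumed to be added
 * to struct trace_array alongside the existing fsnotify_work.
 */
static void latency_fsnotify_workfn(struct work_struct *work)
{
	struct trace_array *tr = container_of(work, struct trace_array,
					      fsnotify_work);
	/* Process context: fsnotify() may take sleeping locks here. */
	fsnotify(tr->d_max_latency->d_inode, FS_MODIFY,
		 tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0);
}

static void latency_fsnotify_irqfn(struct irq_work *iwork)
{
	struct trace_array *tr = container_of(iwork, struct trace_array,
					      fsnotify_irqwork);
	/* Hard-irq context: just hand off to a workqueue. */
	queue_work(system_wq, &tr->fsnotify_work);
}

void latency_fsnotify(struct trace_array *tr)
{
	/* Safe from scheduler/tracer context: no wakeups on this path. */
	irq_work_queue(&tr->fsnotify_irqwork);
}
```

[With this shape, the notify_pending/llist machinery and the latency_fsnotify_start()/stop() hooks sprinkled through the tracers above would no longer be needed, which matches the simplification Viktor anticipates.]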



Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190903132602.3440-1-viktor.rosendahl@gmail.com>
     [not found] ` <20190903132602.3440-2-viktor.rosendahl@gmail.com>
2019-09-04  4:00   ` [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency Joel Fernandes
2019-09-04  8:19     ` Peter Zijlstra
2019-09-04 13:42       ` Joel Fernandes
2019-09-04 18:17       ` Viktor Rosendahl
2019-09-04 10:21     ` Paul E. McKenney
