All of lore.kernel.org
 help / color / mirror / Atom feed
From: Viktor Rosendahl <viktor.rosendahl@gmail.com>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org,
	Viktor Rosendahl <viktor.rosendahl@gmail.com>
Subject: Re: [PATCH v2 1/4] ftrace: Implement fs notification for preempt/irqsoff tracers
Date: Mon,  6 May 2019 00:39:15 +0200	[thread overview]
Message-ID: <20190505223915.4569-1-viktor.rosendahl@gmail.com> (raw)
In-Reply-To: <20190504164710.GA55790@google.com>

On 5/4/19 6:47 PM, Joel Fernandes wrote:
> On Wed, May 01, 2019 at 10:36:47PM +0200, Viktor Rosendahl wrote:
>> immediately after each other in spite of the fact that the
>> preempt/irqsoff tracers operate in overwrite mode.
>>
>> Signed-off-by: Viktor Rosendahl <viktor.rosendahl@gmail.com>
>
> I agree with the general idea, but I don't really like how it is done in the
> patch.
>

Can you explain more precisely what you agree with?

The general idea of being able to trace bursts of latencies?

Or the slightly more specific idea of notifying user space when
/sys/kernel/debug/tracing/trace has new data, and let user space do the rest?

> We do have a notification mechanism already in the form of trace_pipe. Can we
> not improve that in some way to be notified of a new trace data? In theory,
> the trace_pipe does fit into the description in the documentation: "Reads
> from this file will block until new data is retrieved"
>

I am not quite sure what kind of solution you are after here. Could you be more
specific?

I think that it would be weird if we used trace_pipe to send a message with
the meaning "new data is available in the trace file". To me this would seem
like (re)inventing a special purpose alternative to inotify.

Another option would be to allow the user to consume the the latency traces
directly from trace_pipe, without using the trace file. This would make sense
to me and would indeed be a much better solution. If somebody is able to make
such an implementation I am all for it.

However, I do not know how to do this. I believe that it would lead towards a
significant rewrite of the latency tracers, probably also how the ftrace
buffering works.

My solution is clearly a hack but it has the benefits of requiring quite
small and straightforward changes to the kernel and the really ugly things are
then done in user space.

>>
>> +	config PREEMPTIRQ_FSNOTIFY

> Does this have to be a CONFIG option? If prefer if the code automatically
> does the notification and it is always enabled. I don't see any drawbacks of
> that.
>

I was just thinking that sending fsnotify events to a sysfs file is a bit off
the beaten track and I figured that some people would like to opt out.

It can of course be changed so that it is always enabled when a tracer that
requires it is enabled, if there is general agreement on this point.

>> +
>>  config SCHED_TRACER
>>  	bool "Scheduling Latency Tracer"
>>  	select GENERIC_TRACER
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index ca1ee656d6d8..ebefb8d4e072 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -44,6 +44,8 @@
>>  #include <linux/trace.h>
>>  #include <linux/sched/clock.h>
>>  #include <linux/sched/rt.h>
>> +#include <linux/fsnotify.h>
>> +#include <linux/workqueue.h>
>>
>>  #include "trace.h"
>>  #include "trace_output.h"
>> @@ -8191,6 +8193,32 @@ static __init void create_trace_instances(struct dentry *d_tracer)
>>  		return;
>>  }
>>
>> +#ifdef CONFIG_PREEMPTIRQ_FSNOTIFY
>> +
>> +static void trace_notify_workfn(struct work_struct *work)
> [snip]
>
> I prefer if this facility is available to other tracers as well such as
> the wakeup tracer which is similar in output (check
> Documentation/trace/ftrace.txt). I believe this should be a generic trace
> facility, and not tracer specific.
>

This should be possible.

If we stick with the fsnotify idea, it's possible to move the functions for it
to trace.c and to use it also from other tracers.

It is also possible to do a few small adjustments to the latency-collector and
to rename it the trace-collector, so that it can work for the wakeup case also.
The random sleep games probably don't make sense for the wakeup case but it's
already now a command line option.

Below is a suggestion for how to make the facility more generic and usable by
by both preempt/irqsoff and wakeup.

best regards,

Viktor
---
 kernel/trace/Kconfig              | 11 +++++++
 kernel/trace/trace.c              | 50 +++++++++++++++++++++++++++++--
 kernel/trace/trace.h              | 15 ++++++++++
 kernel/trace/trace_irqsoff.c      |  5 ++++
 kernel/trace/trace_sched_wakeup.c |  7 ++++-
 5 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 8bd1d6d001d7..ca6a63004adf 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -244,6 +244,17 @@ config SCHED_TRACER
 	  This tracer tracks the latency of the highest priority task
 	  to be scheduled in, starting from the point it has woken up.
 
+config TRACE_FSNOTIFY
+	bool "Generate fsnotify events for the trace file"
+	default n
+	depends on (IRQSOFF_TRACER || PREEMPT_TRACER || SCHED_TRACER)
+	depends on FSNOTIFY
+	help
+	  This option will enable the generation of fsnotify events for the
+	  trace file. This makes it possible for userspace to be notified about
+	  modification of /sys/kernel/debug/tracing/trace through the inotify
+	  interface.
+
 config HWLAT_TRACER
 	bool "Tracer to detect hardware latencies (like SMIs)"
 	select GENERIC_TRACER
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ca1ee656d6d8..1640fd302e9a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -44,6 +44,8 @@
 #include <linux/trace.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/rt.h>
+#include <linux/fsnotify.h>
+#include <linux/workqueue.h>
 
 #include "trace.h"
 #include "trace_output.h"
@@ -8191,6 +8193,50 @@ static __init void create_trace_instances(struct dentry *d_tracer)
 		return;
 }
 
+#ifdef CONFIG_TRACE_FSNOTIFY
+
+static struct workqueue_struct *notify_wq;
+
+static void trace_notify_workfn(struct work_struct *work)
+{
+	struct trace_array *tr = container_of(work, struct trace_array,
+					      notify_work);
+	fsnotify(tr->d_trace->d_inode, FS_MODIFY, tr->d_trace->d_inode,
+		 FSNOTIFY_EVENT_INODE, NULL, 0);
+}
+
+static void trace_create_trace_file(struct trace_array *tr,
+				    struct dentry *d_tracer)
+{
+	/* For notify we need to init the work structure and save the pointer */
+	INIT_WORK(&tr->notify_work, trace_notify_workfn);
+	tr->d_trace = trace_create_file("trace", 0644, d_tracer, tr,
+					&tracing_fops);
+}
+
+static __init void trace_file_notify_init(void)
+{
+	notify_wq = alloc_workqueue("irqsoff_notify_wq",
+				    WQ_UNBOUND | WQ_HIGHPRI, 0);
+	if (!notify_wq)
+		pr_err("Unable to allocate irqsoff_notify_wq");
+}
+
+void trace_file_notify(struct trace_array *tr)
+{
+	if (likely(notify_wq))
+		queue_work(notify_wq, &tr->notify_work);
+}
+
+#else /* !CONFIG_TRACE_FSNOTIFY */
+
+#define trace_create_trace_file(tr, d_tracer) \
+	trace_create_file("trace", 0644, d_tracer, tr, &tracing_fops)
+
+#define trace_file_notify_init() do {} while (0)
+
+#endif /* !CONFIG_TRACE_FSNOTIFY */
+
 static void
 init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 {
@@ -8209,8 +8255,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 	trace_create_file("trace_options", 0644, d_tracer,
 			  tr, &tracing_iter_fops);
 
-	trace_create_file("trace", 0644, d_tracer,
-			  tr, &tracing_fops);
+	trace_create_trace_file(tr, d_tracer);
 
 	trace_create_file("trace_pipe", 0444, d_tracer,
 			  tr, &tracing_pipe_fops);
@@ -8415,6 +8460,7 @@ static __init int tracer_init_tracefs(void)
 {
 	struct dentry *d_tracer;
 
+	trace_file_notify_init();
 	trace_access_lock_init();
 
 	d_tracer = tracing_init_dentry();
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d80cee49e0eb..46f20c9bcf36 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -17,6 +17,7 @@
 #include <linux/compiler.h>
 #include <linux/trace_seq.h>
 #include <linux/glob.h>
+#include <linux/workqueue.h>
 
 #ifdef CONFIG_FTRACE_SYSCALLS
 #include <asm/unistd.h>		/* For NR_SYSCALLS	     */
@@ -302,6 +303,10 @@ struct trace_array {
 	struct dentry		*options;
 	struct dentry		*percpu_dir;
 	struct dentry		*event_dir;
+#ifdef CONFIG_TRACE_FSNOTIFY
+	struct dentry		*d_trace;
+	struct work_struct	notify_work;
+#endif
 	struct trace_options	*topts;
 	struct list_head	systems;
 	struct list_head	events;
@@ -774,6 +779,16 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
 		    struct trace_pid_list **new_pid_list,
 		    const char __user *ubuf, size_t cnt);
 
+#ifdef CONFIG_TRACE_FSNOTIFY
+
+void trace_file_notify(struct trace_array *tr);
+
+#else /* !CONFIG_TRACE_FSNOTIFY */
+
+#define trace_file_notify(tr) do {} while (0)
+
+#endif /* !CONFIG_TRACE_FSNOTIFY */
+
 #ifdef CONFIG_TRACER_MAX_TRACE
 void update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu,
 		   void *cond_data);
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index a745b0cee5d3..911bb635fb4e 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -323,6 +323,7 @@ check_critical_timing(struct trace_array *tr,
 	u64 T0, T1, delta;
 	unsigned long flags;
 	int pc;
+	bool notify = false;
 
 	T0 = data->preempt_timestamp;
 	T1 = ftrace_now(cpu);
@@ -353,6 +354,7 @@ check_critical_timing(struct trace_array *tr,
 	if (likely(!is_tracing_stopped())) {
 		tr->max_latency = delta;
 		update_max_tr_single(tr, current, cpu);
+		notify = true;
 	}
 
 	max_sequence++;
@@ -364,6 +366,9 @@ check_critical_timing(struct trace_array *tr,
 	data->critical_sequence = max_sequence;
 	data->preempt_timestamp = ftrace_now(cpu);
 	__trace_function(tr, CALLER_ADDR0, parent_ip, flags, pc);
+
+	if (unlikely(notify))
+		trace_file_notify(tr);
 }
 
 static nokprobe_inline void
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 743b2b520d34..c87e901108b9 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -437,6 +437,8 @@ probe_wakeup_sched_switch(void *ignore, bool preempt,
 	long disabled;
 	int cpu;
 	int pc;
+	bool report = false;
+	struct trace_array *tr = wakeup_trace;
 
 	tracing_record_cmdline(prev);
 
@@ -481,7 +483,8 @@ probe_wakeup_sched_switch(void *ignore, bool preempt,
 	T1 = ftrace_now(cpu);
 	delta = T1-T0;
 
-	if (!report_latency(wakeup_trace, delta))
+	report = report_latency(wakeup_trace, delta);
+	if (!report)
 		goto out_unlock;
 
 	if (likely(!is_tracing_stopped())) {
@@ -495,6 +498,8 @@ probe_wakeup_sched_switch(void *ignore, bool preempt,
 	local_irq_restore(flags);
 out:
 	atomic_dec(&per_cpu_ptr(wakeup_trace->trace_buffer.data, cpu)->disabled);
+	if (unlikely(report))
+		trace_file_notify(tr);
 }
 
 static void __wakeup_reset(struct trace_array *tr)
-- 
2.17.1


  parent reply	other threads:[~2019-05-05 22:39 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-01 20:36 [PATCH v2 0/4] Some new features for the preempt/irqsoff tracers Viktor Rosendahl
2019-05-01 20:36 ` [PATCH v2 1/4] ftrace: Implement fs notification for " Viktor Rosendahl
2019-05-04 16:47   ` Joel Fernandes
2019-05-05 19:43     ` Steven Rostedt
2019-05-05 22:39     ` Viktor Rosendahl [this message]
2019-05-05 23:01       ` Steven Rostedt
2019-05-05 23:54         ` Viktor Rosendahl
2019-05-06 14:00           ` Joel Fernandes
2019-05-01 20:36 ` [PATCH v2 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger Viktor Rosendahl
2019-05-01 20:36 ` [PATCH v2 3/4] Add the latency-collector to tools Viktor Rosendahl
2019-05-01 20:36 ` [PATCH v2 4/4] ftrace: Add an option for tracing console latencies Viktor Rosendahl
2019-05-02  1:38   ` Steven Rostedt
2019-05-02 18:37     ` Viktor Rosendahl
2019-05-02 21:12       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190505223915.4569-1-viktor.rosendahl@gmail.com \
    --to=viktor.rosendahl@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.