linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* tracer_init_tracefs really slow
@ 2020-09-07 16:16 Lucas Stach
  2020-12-04  2:18 ` Steven Rostedt
  0 siblings, 1 reply; 7+ messages in thread
From: Lucas Stach @ 2020-09-07 16:16 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel, kernel

Hi all,

one of my colleagues has taken a look at device boot times and stumbled
across a pretty big amount of kernel boot time being spent in
tracer_init_tracefs(). On this particular i.MX6Q based device the
kernel spends more than 1 second in this function, which is a
significant amount of the overall kernel inititalization time. While
this machine is no rocket with its Cortex A9 @ 800MHz, the amount of
CPU time being used there is pretty irritating.

Specifically the issue lies within trace_event_eval_update where ~1100
trace_event_calls get updated with ~500 trace_eval_maps. I haven't had
a chance yet to dig any deeper or try to understand more of what's
going on there, but I wanted to get the issue out there in case anyone
has some cycles to spare to help us along.

The obvious questions for now are:
1. Why is this function so damn expensive (at least on this whimpy ARM
machine)? and
2. Could any of this be done asynchronously, to not block the kernel in
early init?

Regards,
Lucas


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: tracer_init_tracefs really slow
  2020-09-07 16:16 tracer_init_tracefs really slow Lucas Stach
@ 2020-12-04  2:18 ` Steven Rostedt
  2020-12-07 16:24   ` Lucas Stach
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2020-12-04  2:18 UTC (permalink / raw)
  To: Lucas Stach; +Cc: Ingo Molnar, linux-kernel, kernel


Sorry for the really late reply, but I received this while I was on
vacation, and my backlog was so big when I got back that I left most of
it unread. :-/  And to make matters worse, my out-of-office script
wasn't working, to let people know I was on vacation.

On Mon, 07 Sep 2020 18:16:52 +0200
Lucas Stach <l.stach@pengutronix.de> wrote:

> Hi all,
> 
> one of my colleagues has taken a look at device boot times and stumbled
> across a pretty big amount of kernel boot time being spent in
> tracer_init_tracefs(). On this particular i.MX6Q based device the
> kernel spends more than 1 second in this function, which is a
> significant amount of the overall kernel inititalization time. While
> this machine is no rocket with its Cortex A9 @ 800MHz, the amount of
> CPU time being used there is pretty irritating.
> 
> Specifically the issue lies within trace_event_eval_update where ~1100
> trace_event_calls get updated with ~500 trace_eval_maps. I haven't had
> a chance yet to dig any deeper or try to understand more of what's
> going on there, but I wanted to get the issue out there in case anyone
> has some cycles to spare to help us along.

OK, that makes sense. The macro TRACE_DEFINE_ENUM() will make a mapping
of enums into their values. This is needed because if an enum is used
in tp_printk() of a TRACE_EVENT(), the name of the ENUM is passed to
user space. The enum name is useless to user space, so this function
will scan the strings that are exported to user space and convert the
enum name to the enum values.

> 
> The obvious questions for now are:
> 1. Why is this function so damn expensive (at least on this whimpy ARM
> machine)? and

Well, it's doing a string substitution for thousands of events.


> 2. Could any of this be done asynchronously, to not block the kernel in
> early init?

Yes :-)

We could make a thread that does this, that the init wakes up and runs,
letting the kernel to move forward. Would you like to make that patch
or shall I?

-- Steve

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: tracer_init_tracefs really slow
  2020-12-04  2:18 ` Steven Rostedt
@ 2020-12-07 16:24   ` Lucas Stach
  2020-12-07 16:47     ` Steven Rostedt
  2020-12-07 19:47     ` Steven Rostedt
  0 siblings, 2 replies; 7+ messages in thread
From: Lucas Stach @ 2020-12-07 16:24 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel, kernel

Hi Steven,

Am Donnerstag, den 03.12.2020, 21:18 -0500 schrieb Steven Rostedt:
> Sorry for the really late reply, but I received this while I was on
> vacation, and my backlog was so big when I got back that I left most of
> it unread. :-/  And to make matters worse, my out-of-office script
> wasn't working, to let people know I was on vacation.

No problem, I already figured that this might have fallen through the
cracks. It's also not really a high prio issue for us.

> On Mon, 07 Sep 2020 18:16:52 +0200
> Lucas Stach <l.stach@pengutronix.de> wrote:
> 
> > Hi all,
> > 
> > one of my colleagues has taken a look at device boot times and stumbled
> > across a pretty big amount of kernel boot time being spent in
> > tracer_init_tracefs(). On this particular i.MX6Q based device the
> > kernel spends more than 1 second in this function, which is a
> > significant amount of the overall kernel inititalization time. While
> > this machine is no rocket with its Cortex A9 @ 800MHz, the amount of
> > CPU time being used there is pretty irritating.
> > 
> > Specifically the issue lies within trace_event_eval_update where ~1100
> > trace_event_calls get updated with ~500 trace_eval_maps. I haven't had
> > a chance yet to dig any deeper or try to understand more of what's
> > going on there, but I wanted to get the issue out there in case anyone
> > has some cycles to spare to help us along.
> 
> OK, that makes sense. The macro TRACE_DEFINE_ENUM() will make a mapping
> of enums into their values. This is needed because if an enum is used
> in tp_printk() of a TRACE_EVENT(), the name of the ENUM is passed to
> user space. The enum name is useless to user space, so this function
> will scan the strings that are exported to user space and convert the
> enum name to the enum values.
> 
> > 
> > The obvious questions for now are:
> > 1. Why is this function so damn expensive (at least on this whimpy ARM
> > machine)? and
> 
> Well, it's doing a string substitution for thousands of events.
> 
> 
> > 2. Could any of this be done asynchronously, to not block the kernel in
> > early init?
> 
> Yes :-)
> 
> We could make a thread that does this, that the init wakes up and runs,
> letting the kernel to move forward. Would you like to make that patch
> or shall I?

I guess you are much more likely to come up with a correct patch, as
I'm not really clear yet on when we would need to synchronize this
thread, to make sure things are available before they get used by
something. I likely won't have time in the near future to read enough
code in this particular spot of the kernel.

I would be happy to test a patch on our whimpy machines, though. :)

Regards,
Lucas


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: tracer_init_tracefs really slow
  2020-12-07 16:24   ` Lucas Stach
@ 2020-12-07 16:47     ` Steven Rostedt
  2020-12-07 19:47     ` Steven Rostedt
  1 sibling, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2020-12-07 16:47 UTC (permalink / raw)
  To: Lucas Stach; +Cc: Ingo Molnar, linux-kernel, kernel

On Mon, 07 Dec 2020 17:24:58 +0100
Lucas Stach <l.stach@pengutronix.de> wrote:

> > We could make a thread that does this, that the init wakes up and runs,
> > letting the kernel to move forward. Would you like to make that patch
> > or shall I?  
> 
> I guess you are much more likely to come up with a correct patch, as
> I'm not really clear yet on when we would need to synchronize this
> thread, to make sure things are available before they get used by
> something. I likely won't have time in the near future to read enough
> code in this particular spot of the kernel.

No problem. I've just been told to try to delegate when I can ;-)

> 
> I would be happy to test a patch on our whimpy machines, though. :)

The code is only needed for user space, and we can make it a separate
thread to do the work to let kernel move forward. The code is wrapped with
a mutex so that even if it's still processing when user space is up and
running, and it tries to access the code, it should block until it's
finished as well.

I'll write something up and send it to you.

Thanks for the report!

-- Steve

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: tracer_init_tracefs really slow
  2020-12-07 16:24   ` Lucas Stach
  2020-12-07 16:47     ` Steven Rostedt
@ 2020-12-07 19:47     ` Steven Rostedt
  2020-12-09 14:38       ` Lucas Stach
  1 sibling, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2020-12-07 19:47 UTC (permalink / raw)
  To: Lucas Stach; +Cc: Ingo Molnar, linux-kernel, kernel

On Mon, 07 Dec 2020 17:24:58 +0100
Lucas Stach <l.stach@pengutronix.de> wrote:

> I would be happy to test a patch on our whimpy machines, though. :)

Let me know if this helps:

-- Steve

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 06134189e9a7..48aa5b6a7d15 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9062,7 +9062,10 @@ int tracing_init_dentry(void)
 extern struct trace_eval_map *__start_ftrace_eval_maps[];
 extern struct trace_eval_map *__stop_ftrace_eval_maps[];
 
-static void __init trace_eval_init(void)
+static struct workqueue_struct *eval_map_wq __initdata;
+struct work_struct eval_map_work __initdata;
+
+static void __init eval_map_work_func(struct work_struct *work)
 {
 	int len;
 
@@ -9070,6 +9073,34 @@ static void __init trace_eval_init(void)
 	trace_insert_eval_map(NULL, __start_ftrace_eval_maps, len);
 }
 
+static int __init trace_eval_init(void)
+{
+	INIT_WORK(&eval_map_work, eval_map_work_func);
+
+	eval_map_wq = alloc_workqueue("eval_map_wq", WQ_UNBOUND, 0);
+	if (!eval_map_wq) {
+		pr_err("Unable to allocate eval_map_wq\n");
+		/* Do work here */
+		eval_map_work_func(&eval_map_work);
+		return -ENOMEM;
+	}
+
+	queue_work(eval_map_wq, &eval_map_work);
+	return 0;
+}
+
+static int __init trace_eval_sync(void)
+{
+	/* Make sure the eval map updates are finished */
+	if (eval_map_wq)
+		destroy_workqueue(eval_map_wq);
+	return 0;
+}
+
+subsys_initcall(trace_eval_init);
+late_initcall_sync(trace_eval_sync);
+
+
 #ifdef CONFIG_MODULES
 static void trace_module_add_evals(struct module *mod)
 {
@@ -9171,8 +9202,6 @@ static __init int tracer_init_tracefs(void)
 	trace_create_file("saved_tgids", 0444, NULL,
 			NULL, &tracing_saved_tgids_fops);
 
-	trace_eval_init();
-
 	trace_create_eval_file(NULL);
 
 #ifdef CONFIG_MODULES

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: tracer_init_tracefs really slow
  2020-12-07 19:47     ` Steven Rostedt
@ 2020-12-09 14:38       ` Lucas Stach
  2020-12-09 16:17         ` Steven Rostedt
  0 siblings, 1 reply; 7+ messages in thread
From: Lucas Stach @ 2020-12-09 14:38 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel, kernel

Hi Steven,

Am Montag, den 07.12.2020, 14:47 -0500 schrieb Steven Rostedt:
On Mon, 07 Dec 2020 17:24:58 +0100
Lucas Stach <l.stach@pengutronix.de> wrote:

> I would be happy to test a patch on our whimpy machines, though. :)

Let me know if this helps:

-- Steve

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 06134189e9a7..48aa5b6a7d15 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9062,7 +9062,10 @@ int tracing_init_dentry(void)
 extern struct trace_eval_map *__start_ftrace_eval_maps[];
 extern struct trace_eval_map *__stop_ftrace_eval_maps[];
 

-static void __init trace_eval_init(void)
+static struct workqueue_struct *eval_map_wq __initdata;
+struct work_struct eval_map_work __initdata;
+
+static void __init eval_map_work_func(struct work_struct *work)
 {
 	int len;
 

@@ -9070,6 +9073,34 @@ static void __init trace_eval_init(void)
 	trace_insert_eval_map(NULL, __start_ftrace_eval_maps, len);
 }
 

+static int __init trace_eval_init(void)
+{
+	INIT_WORK(&eval_map_work, eval_map_work_func);
+
+	eval_map_wq = alloc_workqueue("eval_map_wq", WQ_UNBOUND, 0);
+	if (!eval_map_wq) {
+		pr_err("Unable to allocate eval_map_wq\n");
+		/* Do work here */
+		eval_map_work_func(&eval_map_work);
+		return -ENOMEM;
+	}
+
+	queue_work(eval_map_wq, &eval_map_work);
+	return 0;
+}
+
+static int __init trace_eval_sync(void)
+{
+	/* Make sure the eval map updates are finished */
+	if (eval_map_wq)
+		destroy_workqueue(eval_map_wq);
+	return 0;
+}
+
+subsys_initcall(trace_eval_init);

You don't want to do this...

+late_initcall_sync(trace_eval_sync);
+
+
 #ifdef CONFIG_MODULES
 static void trace_module_add_evals(struct module *mod)
 {
@@ -9171,8 +9202,6 @@ static __init int tracer_init_tracefs(void)
 	trace_create_file("saved_tgids", 0444, NULL,
 			NULL, &tracing_saved_tgids_fops);
 

-	trace_eval_init();
-
... and this. Moving the trace_eval_init into its own initcall means it
gets started before tracer_init_tracefs. As it holds the
trace_event_sem while updating the eval maps, event_trace_init() then
blocks further initcall execution when it tries to grab this semaphore
a bit down the line, killing the parallelism we are trying to unlock
here.

With those 2 lines dropped, the change seems to work as intended and
shaves ~830ms from the kernel boot time on this system.

Regards,
Lucas

 	trace_create_eval_file(NULL);
 

 #ifdef CONFIG_MODULES



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: tracer_init_tracefs really slow
  2020-12-09 14:38       ` Lucas Stach
@ 2020-12-09 16:17         ` Steven Rostedt
  0 siblings, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2020-12-09 16:17 UTC (permalink / raw)
  To: Lucas Stach; +Cc: Ingo Molnar, linux-kernel, kernel

On Wed, 09 Dec 2020 15:38:50 +0100
Lucas Stach <l.stach@pengutronix.de> wrote:

> -	trace_eval_init();
> -
> ... and this. Moving the trace_eval_init into its own initcall means it
> gets started before tracer_init_tracefs. As it holds the
> trace_event_sem while updating the eval maps, event_trace_init() then
> blocks further initcall execution when it tries to grab this semaphore
> a bit down the line, killing the parallelism we are trying to unlock
> here.
> 
> With those 2 lines dropped, the change seems to work as intended and
> shaves ~830ms from the kernel boot time on this system.

Ah, the rationale to do that was so that it could start parsing earlier.
But you are right, if it's really slow, and is still parsing by the time we
start populating the tracefs directory, it will then cause that to block as
well.

OK, I wont move it into its own initcall and send a v2.

-- Steve

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-12-09 16:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-07 16:16 tracer_init_tracefs really slow Lucas Stach
2020-12-04  2:18 ` Steven Rostedt
2020-12-07 16:24   ` Lucas Stach
2020-12-07 16:47     ` Steven Rostedt
2020-12-07 19:47     ` Steven Rostedt
2020-12-09 14:38       ` Lucas Stach
2020-12-09 16:17         ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).