devicetree.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
@ 2018-01-21 14:31 Wolfram Sang
  2018-01-21 14:31 ` [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle Wolfram Sang
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Wolfram Sang @ 2018-01-21 14:31 UTC (permalink / raw)
  To: devicetree
  Cc: Tyrel Datwyler, Geert Uytterhoeven, linux-renesas-soc,
	linuxppc-dev, Rob Herring, Frank Rowand, Steven Rostedt,
	linux-kernel, Wolfram Sang

I got a bug report for a DT node refcounting problem in the I2C subsystem. This
patch was a huge help in validating the bug report and the proposed solution.
So, I thought I bring it to attention again. Thanks Tyrel, for the initial
work!

Note that I did not test the dynamic updates, only of_node_{get|put} so far. I
read that Tyrel checked dynamic updates extensively with this patch. And since
DT overlays are also used within our Renesas dev team, this will help there, as
well.

Tested on a Renesas Salvator-XS board (R-Car H3).

Changes since RFC v1:
	* rebased to v4.15-rc8
	* fixed commit abbrev and one of the sysfs paths in commit desc
	* removed trailing space and fixed pointer declaration in code

I consider all the remaining checkpatch issues irrelevant for this patch.

So what about applying it?

Kind regards,

   Wolfram


Tyrel Datwyler (1):
  of: introduce event tracepoints for dynamic device_node lifecyle

 drivers/of/dynamic.c      | 32 ++++++----------
 include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+), 20 deletions(-)
 create mode 100644 include/trace/events/of.h

-- 
2.11.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
  2018-01-21 14:31 [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues Wolfram Sang
@ 2018-01-21 14:31 ` Wolfram Sang
  2018-01-21 22:01   ` Steven Rostedt
                     ` (2 more replies)
  2018-01-22  8:43 ` [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues Frank Rowand
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 25+ messages in thread
From: Wolfram Sang @ 2018-01-21 14:31 UTC (permalink / raw)
  To: devicetree
  Cc: Tyrel Datwyler, Geert Uytterhoeven, linux-renesas-soc,
	linuxppc-dev, Rob Herring, Frank Rowand, Steven Rostedt,
	linux-kernel, Wolfram Sang

From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>

This patch introduces event tracepoints for tracking a device_nodes
reference cycle as well as reconfig notifications generated in response
to node/property manipulations.

With the recent upstreaming of the refcount API several device_node
underflows and leaks have come to my attention in the pseries (DLPAR)
dynamic logical partitioning code (ie. POWER speak for hotplugging
virtual and physcial resources at runtime such as cpus or IOAs). These
tracepoints provide a easy and quick mechanism for validating the
reference counting of device_nodes during their lifetime.

Further, when pseries lpars are migrated to a different machine we
perform a live update of our device tree to bring it into alignment with
the configuration of the new machine. The of_reconfig_notify trace point
provides a mechanism that can be turned for debuging the device tree
modifications with out having to build a custom kernel to get at the
DEBUG code introduced by commit 00aa37206e1a54 ("of/reconfig: Add debug
output for OF_RECONFIG notifiers").

The following trace events are provided: of_node_get, of_node_put,
of_node_release, and of_reconfig_notify. These trace points require a
kernel built with ftrace support to be enabled. In a typical environment
where debugfs is mounted at /sys/kernel/debug the entire set of
tracepoints can be set with the following:

  echo "of:*" > /sys/kernel/debug/tracing/set_event

or

  echo 1 > /sys/kernel/debug/tracing/events/of/enable

The following shows the trace point data from a DLPAR remove of a cpu
from a pseries lpar:

cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"

cpuhp/23-147   [023] ....   128.324827:
        of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
cpuhp/23-147   [023] ....   128.324829:
        of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
cpuhp/23-147   [023] ....   128.324829:
        of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
cpuhp/23-147   [023] ....   128.324831:
        of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
   drmgr-7284  [009] ....   128.439000:
        of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
   drmgr-7284  [009] ....   128.439002:
        of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
                            prop->name=null, old_prop->name=null
   drmgr-7284  [009] ....   128.439015:
        of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
   drmgr-7284  [009] ....   128.439016:
        of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
[wsa: fixed commit abbrev and one of the sysfs paths in commit desc,
removed trailing space and fixed pointer declaration in code]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 drivers/of/dynamic.c      | 32 ++++++----------
 include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+), 20 deletions(-)
 create mode 100644 include/trace/events/of.h

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index ab988d88704da0..b0d6ab5a35b8c6 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -21,6 +21,9 @@ static struct device_node *kobj_to_device_node(struct kobject *kobj)
 	return container_of(kobj, struct device_node, kobj);
 }
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/of.h>
+
 /**
  * of_node_get() - Increment refcount of a node
  * @node:	Node to inc refcount, NULL is supported to simplify writing of
@@ -30,8 +33,10 @@ static struct device_node *kobj_to_device_node(struct kobject *kobj)
  */
 struct device_node *of_node_get(struct device_node *node)
 {
-	if (node)
+	if (node) {
 		kobject_get(&node->kobj);
+		trace_of_node_get(refcount_read(&node->kobj.kref.refcount), node->full_name);
+	}
 	return node;
 }
 EXPORT_SYMBOL(of_node_get);
@@ -43,8 +48,10 @@ EXPORT_SYMBOL(of_node_get);
  */
 void of_node_put(struct device_node *node)
 {
-	if (node)
+	if (node) {
+		trace_of_node_put(refcount_read(&node->kobj.kref.refcount) - 1, node->full_name);
 		kobject_put(&node->kobj);
+	}
 }
 EXPORT_SYMBOL(of_node_put);
 
@@ -75,24 +82,7 @@ const char *action_names[] = {
 int of_reconfig_notify(unsigned long action, struct of_reconfig_data *p)
 {
 	int rc;
-#ifdef DEBUG
-	struct of_reconfig_data *pr = p;
-
-	switch (action) {
-	case OF_RECONFIG_ATTACH_NODE:
-	case OF_RECONFIG_DETACH_NODE:
-		pr_debug("notify %-15s %pOF\n", action_names[action],
-			pr->dn);
-		break;
-	case OF_RECONFIG_ADD_PROPERTY:
-	case OF_RECONFIG_REMOVE_PROPERTY:
-	case OF_RECONFIG_UPDATE_PROPERTY:
-		pr_debug("notify %-15s %pOF:%s\n", action_names[action],
-			pr->dn, pr->prop->name);
-		break;
-
-	}
-#endif
+	trace_of_reconfig_notify(action, p);
 	rc = blocking_notifier_call_chain(&of_reconfig_chain, action, p);
 	return notifier_to_errno(rc);
 }
@@ -320,6 +310,8 @@ void of_node_release(struct kobject *kobj)
 {
 	struct device_node *node = kobj_to_device_node(kobj);
 
+	trace_of_node_release(node);
+
 	/* We should never be releasing nodes that haven't been detached. */
 	if (!of_node_check_flag(node, OF_DETACHED)) {
 		pr_err("ERROR: Bad of_node_put() on %pOF\n", node);
diff --git a/include/trace/events/of.h b/include/trace/events/of.h
new file mode 100644
index 00000000000000..e8b1302a6f0129
--- /dev/null
+++ b/include/trace/events/of.h
@@ -0,0 +1,93 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM of
+
+#if !defined(_TRACE_OF_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_OF_H
+
+#include <linux/of.h>
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(of_node_ref_template,
+
+	TP_PROTO(int refcount, const char *dn_name),
+
+	TP_ARGS(refcount, dn_name),
+
+	TP_STRUCT__entry(
+		__string(dn_name, dn_name)
+		__field(int, refcount)
+	),
+
+	TP_fast_assign(
+		__assign_str(dn_name, dn_name);
+		__entry->refcount = refcount;
+	),
+
+	TP_printk("refcount=%d, dn->full_name=%s",
+		  __entry->refcount, __get_str(dn_name))
+);
+
+DEFINE_EVENT(of_node_ref_template, of_node_get,
+	     TP_PROTO(int refcount, const char *dn_name),
+	     TP_ARGS(refcount, dn_name));
+
+DEFINE_EVENT(of_node_ref_template, of_node_put,
+	     TP_PROTO(int refcount, const char *dn_name),
+	     TP_ARGS(refcount, dn_name));
+
+TRACE_EVENT(of_node_release,
+
+	TP_PROTO(struct device_node *dn),
+
+	TP_ARGS(dn),
+
+	TP_STRUCT__entry(
+		__string(dn_name, dn->full_name)
+		__field(unsigned long, flags)
+	),
+
+	TP_fast_assign(
+		__assign_str(dn_name, dn->full_name);
+		__entry->flags = dn->_flags;
+	),
+
+	TP_printk("dn->full_name=%s, dn->_flags=%lu",
+		  __get_str(dn_name), __entry->flags)
+);
+
+#define of_reconfig_action_names \
+	{OF_RECONFIG_ATTACH_NODE, "ATTACH_NODE"}, \
+	{OF_RECONFIG_DETACH_NODE, "DETACH_NODE"}, \
+	{OF_RECONFIG_ADD_PROPERTY, "ADD_PROPERTY"}, \
+	{OF_RECONFIG_REMOVE_PROPERTY, "REMOVE_PROPERTY"}, \
+	{OF_RECONFIG_UPDATE_PROPERTY, "UPDATE_PROPERTY"}
+
+TRACE_EVENT(of_reconfig_notify,
+
+	TP_PROTO(unsigned long action, struct of_reconfig_data *ord),
+
+	TP_ARGS(action, ord),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, action)
+		__string(dn_name, ord->dn->full_name)
+		__string(prop_name, ord->prop ? ord->prop->name : "null")
+		__string(oldprop_name, ord->old_prop ? ord->old_prop->name : "null")
+	),
+
+	TP_fast_assign(
+		__entry->action = action;
+		__assign_str(dn_name, ord->dn->full_name);
+		__assign_str(prop_name, ord->prop ? ord->prop->name : "null");
+		__assign_str(oldprop_name, ord->old_prop ? ord->old_prop->name : "null");
+	),
+
+	TP_printk("action=%s, dn->full_name=%s, prop->name=%s, old_prop->name=%s",
+		  __print_symbolic(__entry->action, of_reconfig_action_names),
+		  __get_str(dn_name), __get_str(prop_name), __get_str(oldprop_name))
+);
+
+#endif /*	_TRACE_OF_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
  2018-01-21 14:31 ` [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle Wolfram Sang
@ 2018-01-21 22:01   ` Steven Rostedt
  2018-01-25  6:48   ` Frank Rowand
  2018-01-25  8:32   ` Frank Rowand
  2 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2018-01-21 22:01 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: devicetree, Tyrel Datwyler, Geert Uytterhoeven,
	linux-renesas-soc, linuxppc-dev, Rob Herring, Frank Rowand,
	linux-kernel

On Sun, 21 Jan 2018 15:31:17 +0100
Wolfram Sang <wsa+renesas@sang-engineering.com> wrote:

> From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> 
> This patch introduces event tracepoints for tracking a device_nodes
> reference cycle as well as reconfig notifications generated in response
> to node/property manipulations.
> 
> With the recent upstreaming of the refcount API several device_node
> underflows and leaks have come to my attention in the pseries (DLPAR)
> dynamic logical partitioning code (ie. POWER speak for hotplugging
> virtual and physcial resources at runtime such as cpus or IOAs). These
> tracepoints provide a easy and quick mechanism for validating the
> reference counting of device_nodes during their lifetime.
> 
> Further, when pseries lpars are migrated to a different machine we
> perform a live update of our device tree to bring it into alignment with
> the configuration of the new machine. The of_reconfig_notify trace point
> provides a mechanism that can be turned for debuging the device tree
> modifications with out having to build a custom kernel to get at the
> DEBUG code introduced by commit 00aa37206e1a54 ("of/reconfig: Add debug
> output for OF_RECONFIG notifiers").
> 
> The following trace events are provided: of_node_get, of_node_put,
> of_node_release, and of_reconfig_notify. These trace points require a
> kernel built with ftrace support to be enabled. In a typical environment
> where debugfs is mounted at /sys/kernel/debug the entire set of
> tracepoints can be set with the following:
> 
>   echo "of:*" > /sys/kernel/debug/tracing/set_event
> 
> or
> 
>   echo 1 > /sys/kernel/debug/tracing/events/of/enable
> 
> The following shows the trace point data from a DLPAR remove of a cpu
> from a pseries lpar:
> 
> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"
> 
> cpuhp/23-147   [023] ....   128.324827:
>         of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
>         of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
>         of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324831:
>         of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439000:
>         of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439002:
>         of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
>                             prop->name=null, old_prop->name=null
>    drmgr-7284  [009] ....   128.439015:
>         of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439016:
>         of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> [wsa: fixed commit abbrev and one of the sysfs paths in commit desc,
> removed trailing space and fixed pointer declaration in code]
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

>From the tracing perspective:

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

> ---

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-21 14:31 [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues Wolfram Sang
  2018-01-21 14:31 ` [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle Wolfram Sang
@ 2018-01-22  8:43 ` Frank Rowand
  2018-01-22 11:49   ` Wolfram Sang
  2018-01-25  6:48 ` Frank Rowand
       [not found] ` <20180121143117.19805-1-wsa+renesas-jBu1N2QxHDJrcw3mvpCnnVaTQe2KTcn/@public.gmane.org>
  3 siblings, 1 reply; 25+ messages in thread
From: Frank Rowand @ 2018-01-22  8:43 UTC (permalink / raw)
  To: Wolfram Sang, devicetree
  Cc: Tyrel Datwyler, Geert Uytterhoeven, linux-renesas-soc,
	linuxppc-dev, Rob Herring, Steven Rostedt, linux-kernel

On 01/21/18 06:31, Wolfram Sang wrote:
> I got a bug report for a DT node refcounting problem in the I2C subsystem. This
> patch was a huge help in validating the bug report and the proposed solution.
> So, I thought I bring it to attention again. Thanks Tyrel, for the initial
> work!
> 
> Note that I did not test the dynamic updates, only of_node_{get|put} so far. I
> read that Tyrel checked dynamic updates extensively with this patch. And since
> DT overlays are also used within our Renesas dev team, this will help there, as
> well.
> 
> Tested on a Renesas Salvator-XS board (R-Car H3).
> 
> Changes since RFC v1:
> 	* rebased to v4.15-rc8
> 	* fixed commit abbrev and one of the sysfs paths in commit desc
> 	* removed trailing space and fixed pointer declaration in code
> 
> I consider all the remaining checkpatch issues irrelevant for this patch.
> 
> So what about applying it?
> 
> Kind regards,
> 
>    Wolfram
> 
> 
> Tyrel Datwyler (1):
>   of: introduce event tracepoints for dynamic device_node lifecyle
> 
>  drivers/of/dynamic.c      | 32 ++++++----------
>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 105 insertions(+), 20 deletions(-)
>  create mode 100644 include/trace/events/of.h
> 

Please go back and read the thread for version 1.  Simply resubmitting a
forward port is ignoring that whole conversation.

There is a lot of good info in that thread.  I certainly learned stuff in it.

Thanks,

Frank

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-22  8:43 ` [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues Frank Rowand
@ 2018-01-22 11:49   ` Wolfram Sang
  2018-01-23 12:11     ` Michael Ellerman
  2018-01-25  6:47     ` Frank Rowand
  0 siblings, 2 replies; 25+ messages in thread
From: Wolfram Sang @ 2018-01-22 11:49 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Wolfram Sang, devicetree, Tyrel Datwyler, Geert Uytterhoeven,
	linux-renesas-soc, linuxppc-dev, Rob Herring, Steven Rostedt,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1091 bytes --]

Hi Frank,

> Please go back and read the thread for version 1.  Simply resubmitting a
> forward port is ignoring that whole conversation.
> 
> There is a lot of good info in that thread.  I certainly learned stuff in it.

Yes, I did that and learned stuff, too. My summary of the discussion was:

- you mentioned some drawbacks you saw (like the mixture of trace output
  and printk output)
- most of them look like addressed to me? (e.g. Steven showed a way to redirect
  printk to trace)
- you posted your version (which was, however, marked as "not user friendly"
  even by yourself)
- The discussion stalled over having two approaches

So, I thought reposting would be a good way of finding out if your
concerns were addressed in the discussion or not. If I overlooked
something, I am sorry for that. Still, my intention is to continue the
discussion, not to ignore it. Because as it stands, we don't have such a
debugging mechanism in place currently, and with people working with DT
overlays, I'd think it would be nice to have.

Kind regards,

   Wolfram


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-22 11:49   ` Wolfram Sang
@ 2018-01-23 12:11     ` Michael Ellerman
  2018-01-23 19:53       ` Frank Rowand
  2018-01-25  6:47     ` Frank Rowand
  1 sibling, 1 reply; 25+ messages in thread
From: Michael Ellerman @ 2018-01-23 12:11 UTC (permalink / raw)
  To: Wolfram Sang, Frank Rowand
  Cc: devicetree, linux-kernel, Steven Rostedt, linux-renesas-soc,
	Wolfram Sang, Rob Herring, Geert Uytterhoeven, Tyrel Datwyler,
	linuxppc-dev

Wolfram Sang <wsa@the-dreams.de> writes:

> Hi Frank,
>
>> Please go back and read the thread for version 1.  Simply resubmitting a
>> forward port is ignoring that whole conversation.
>> 
>> There is a lot of good info in that thread.  I certainly learned stuff in it.
>
> Yes, I did that and learned stuff, too. My summary of the discussion was:
>
> - you mentioned some drawbacks you saw (like the mixture of trace output
>   and printk output)
> - most of them look like addressed to me? (e.g. Steven showed a way to redirect
>   printk to trace)
> - you posted your version (which was, however, marked as "not user friendly"
>   even by yourself)
> - The discussion stalled over having two approaches
>
> So, I thought reposting would be a good way of finding out if your
> concerns were addressed in the discussion or not. If I overlooked
> something, I am sorry for that. Still, my intention is to continue the
> discussion, not to ignore it. Because as it stands, we don't have such a
> debugging mechanism in place currently, and with people working with DT
> overlays, I'd think it would be nice to have.

Yeah I agree with all of that, I didn't think there were really any
concerns left outstanding. These trace points are very useful, I've
twice added them to a kernel to debug something, so it would be great
for them to be in mainline.

cheers

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-23 12:11     ` Michael Ellerman
@ 2018-01-23 19:53       ` Frank Rowand
  0 siblings, 0 replies; 25+ messages in thread
From: Frank Rowand @ 2018-01-23 19:53 UTC (permalink / raw)
  To: Michael Ellerman, Wolfram Sang
  Cc: devicetree, linux-kernel, Steven Rostedt, linux-renesas-soc,
	Wolfram Sang, Rob Herring, Geert Uytterhoeven, Tyrel Datwyler,
	linuxppc-dev

On 01/23/18 04:11, Michael Ellerman wrote:
> Wolfram Sang <wsa@the-dreams.de> writes:
> 
>> Hi Frank,
>>
>>> Please go back and read the thread for version 1.  Simply resubmitting a
>>> forward port is ignoring that whole conversation.
>>>
>>> There is a lot of good info in that thread.  I certainly learned stuff in it.
>>
>> Yes, I did that and learned stuff, too. My summary of the discussion was:
>>
>> - you mentioned some drawbacks you saw (like the mixture of trace output
>>   and printk output)
>> - most of them look like addressed to me? (e.g. Steven showed a way to redirect
>>   printk to trace)
>> - you posted your version (which was, however, marked as "not user friendly"
>>   even by yourself)
>> - The discussion stalled over having two approaches
>>
>> So, I thought reposting would be a good way of finding out if your
>> concerns were addressed in the discussion or not. If I overlooked
>> something, I am sorry for that. Still, my intention is to continue the
>> discussion, not to ignore it. Because as it stands, we don't have such a
>> debugging mechanism in place currently, and with people working with DT
>> overlays, I'd think it would be nice to have.
> 
> Yeah I agree with all of that, I didn't think there were really any
> concerns left outstanding. These trace points are very useful, I've
> twice added them to a kernel to debug something, so it would be great
> for them to be in mainline.
> 
> cheers
> 

Yes, I believe there are concerns outstanding.  I'll try to read through
the whole thread today to make sure I'm not missing anything.

-Frank

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-22 11:49   ` Wolfram Sang
  2018-01-23 12:11     ` Michael Ellerman
@ 2018-01-25  6:47     ` Frank Rowand
       [not found]       ` <11cf8fac-d2fe-ecdb-546f-de3c3b42a637-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 25+ messages in thread
From: Frank Rowand @ 2018-01-25  6:47 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: Wolfram Sang, devicetree, Tyrel Datwyler, Geert Uytterhoeven,
	linux-renesas-soc, linuxppc-dev, Rob Herring, Steven Rostedt,
	linux-kernel

On 01/22/18 03:49, Wolfram Sang wrote:
> Hi Frank,
> 
>> Please go back and read the thread for version 1.  Simply resubmitting a
>> forward port is ignoring that whole conversation.
>>
>> There is a lot of good info in that thread.  I certainly learned stuff in it.
> 
> Yes, I did that and learned stuff, too. My summary of the discussion was:
> 
> - you mentioned some drawbacks you saw (like the mixture of trace output
>   and printk output)> - most of them look like addressed to me? (e.g. Steven showed a way to redirect
>   printk to trace
> - you posted your version (which was, however, marked as "not user friendly"
>   even by yourself)

Not exactly a fair quoting.  There were two things I said:

  "Here is a patch that I have used.  It is not as user friendly in terms
  of human readable stack traces (though a very small user space program
  should be able to fix that)."

     So easy to fix using existing userspace programs to convert kernel
     addresses to symbols.

  "FIXME: Currently using pr_err() so I don't need to set loglevel on boot.

          So obviously not a user friendly tool!!!
          The process is:
             - apply patch
             - configure, build, boot kernel
             - analyze data
             - remove patch"

     So not friendly because it uses pr_err() instead of pr_debug().  In
     a reply I said if I submitted my patches I would change it to use
     pr_debug() instead.  So not an issue.

     And not user friendly because it requires patching the kernel.
     Again a NOP if I submitted my patch, because the patch would
     already be in the kernel.

But whatever, let's ignore that - a poor quoting is not a reason to
reject this version of the patch.


> - The discussion stalled over having two approaches

Then you should have stated such when you resubmitted.


> So, I thought reposting would be a good way of finding out if your
> concerns were addressed in the discussion or not. If I overlooked

Then you should have stated that there were concerns raised in the
discussion and asked me if my concerns were addressed.


> something, I am sorry for that. Still, my intention is to continue the
> discussion, not to ignore it. Because as it stands, we don't have such a
> debugging mechanism in place currently, and with people working with DT
> overlays, I'd think it would be nice to have.
> 
> Kind regards,
> 
>    Wolfram
> 


Rob suggested:

     >
     > @@ -25,8 +28,10 @@
     >   */
     >  struct device_node *of_node_get(struct device_node *node)
     >  {
     > -       if (node)
     > +       if (node) {
     >                 kobject_get(&node->kobj);
     > +               trace_of_node_get(refcount_read(&node->kobj.kref.refcount), node->full_name);

     Seems like there should be a kobj wrapper to read the refcount.

As far as I noticed, that was never addressed.  I don't know the answer, but
the question was asked.  And if there is no such function, then there is at
least kref_read(), which would improve the code a little bit.

I'll reply to the patch 0/1 and patch 1/1 emails with review comments.

-Frank

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-21 14:31 [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues Wolfram Sang
  2018-01-21 14:31 ` [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle Wolfram Sang
  2018-01-22  8:43 ` [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues Frank Rowand
@ 2018-01-25  6:48 ` Frank Rowand
       [not found] ` <20180121143117.19805-1-wsa+renesas-jBu1N2QxHDJrcw3mvpCnnVaTQe2KTcn/@public.gmane.org>
  3 siblings, 0 replies; 25+ messages in thread
From: Frank Rowand @ 2018-01-25  6:48 UTC (permalink / raw)
  To: Wolfram Sang, devicetree
  Cc: Tyrel Datwyler, Geert Uytterhoeven, linux-renesas-soc,
	linuxppc-dev, Rob Herring, Steven Rostedt, linux-kernel

On 01/21/18 06:31, Wolfram Sang wrote:
> I got a bug report for a DT node refcounting problem in the I2C subsystem. This
> patch was a huge help in validating the bug report and the proposed solution.
> So, I thought I bring it to attention again. Thanks Tyrel, for the initial
> work!
> 
> Note that I did not test the dynamic updates, only of_node_{get|put} so far. I
> read that Tyrel checked dynamic updates extensively with this patch. And since
> DT overlays are also used within our Renesas dev team, this will help there, as
> well.

It's been nine months since version 1.  If you are going to include the
dynamic updates part of the patch then please test them.


> Tested on a Renesas Salvator-XS board (R-Car H3).
> 
> Changes since RFC v1:
> 	* rebased to v4.15-rc8
> 	* fixed commit abbrev and one of the sysfs paths in commit desc
> 	* removed trailing space and fixed pointer declaration in code
> 

> I consider all the remaining checkpatch issues irrelevant for this patch.

I am OK with the line length warnings in this patch.

Why can't the macro error be fixed?

A file entry needs to be added to MAINTAINERS.


> 
> So what about applying it?
> 
> Kind regards,
> 
>    Wolfram
> 
> 
> Tyrel Datwyler (1):
>   of: introduce event tracepoints for dynamic device_node lifecyle
> 
>  drivers/of/dynamic.c      | 32 ++++++----------
>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 105 insertions(+), 20 deletions(-)
>  create mode 100644 include/trace/events/of.h
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
  2018-01-21 14:31 ` [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle Wolfram Sang
  2018-01-21 22:01   ` Steven Rostedt
@ 2018-01-25  6:48   ` Frank Rowand
  2018-01-25  6:58     ` Frank Rowand
                       ` (2 more replies)
  2018-01-25  8:32   ` Frank Rowand
  2 siblings, 3 replies; 25+ messages in thread
From: Frank Rowand @ 2018-01-25  6:48 UTC (permalink / raw)
  To: Wolfram Sang, devicetree
  Cc: Tyrel Datwyler, Geert Uytterhoeven, linux-renesas-soc,
	linuxppc-dev, Rob Herring, Steven Rostedt, linux-kernel

On 01/21/18 06:31, Wolfram Sang wrote:
> From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> 
> This patch introduces event tracepoints for tracking a device_nodes
> reference cycle as well as reconfig notifications generated in response
> to node/property manipulations.
> 
> With the recent upstreaming of the refcount API several device_node
> underflows and leaks have come to my attention in the pseries (DLPAR)
> dynamic logical partitioning code (ie. POWER speak for hotplugging
> virtual and physcial resources at runtime such as cpus or IOAs). These
> tracepoints provide a easy and quick mechanism for validating the
> reference counting of device_nodes during their lifetime.
> 
> Further, when pseries lpars are migrated to a different machine we
> perform a live update of our device tree to bring it into alignment with
> the configuration of the new machine. The of_reconfig_notify trace point
> provides a mechanism that can be turned for debuging the device tree
> modifications with out having to build a custom kernel to get at the
> DEBUG code introduced by commit 00aa37206e1a54 ("of/reconfig: Add debug
> output for OF_RECONFIG notifiers").
> 
> The following trace events are provided: of_node_get, of_node_put,
> of_node_release, and of_reconfig_notify. These trace points require a

Please add a note that the of_reconfig_notify trace event is not an
added bit of debug info, but is instead replacing information that
was previously available via pr_debug() when DEBUG was defined.


> kernel built with ftrace support to be enabled. In a typical environment
> where debugfs is mounted at /sys/kernel/debug the entire set of
> tracepoints can be set with the following:
> 
>   echo "of:*" > /sys/kernel/debug/tracing/set_event
> 
> or
> 
>   echo 1 > /sys/kernel/debug/tracing/events/of/enable
> 
> The following shows the trace point data from a DLPAR remove of a cpu
> from a pseries lpar:
> 
> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"
> 
> cpuhp/23-147   [023] ....   128.324827:
>         of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
>         of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
>         of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324831:
>         of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439000:
>         of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439002:
>         of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
>                             prop->name=null, old_prop->name=null
>    drmgr-7284  [009] ....   128.439015:
>         of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439016:
>         of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>

The following belongs in a list of version 2 changes, below the "---" line:

> [wsa: fixed commit abbrev and one of the sysfs paths in commit desc,
> removed trailing space and fixed pointer declaration in code]

> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> ---
>  drivers/of/dynamic.c      | 32 ++++++----------
>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 105 insertions(+), 20 deletions(-)
>  create mode 100644 include/trace/events/of.h

mode looks incorrect.  Existing files in include/trace/events/ are -rw-rw----


> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index ab988d88704da0..b0d6ab5a35b8c6 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -21,6 +21,9 @@ static struct device_node *kobj_to_device_node(struct kobject *kobj)
>  	return container_of(kobj, struct device_node, kobj);
>  }
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/of.h>
> +
>  /**
>   * of_node_get() - Increment refcount of a node
>   * @node:	Node to inc refcount, NULL is supported to simplify writing of
> @@ -30,8 +33,10 @@ static struct device_node *kobj_to_device_node(struct kobject *kobj)
>   */
>  struct device_node *of_node_get(struct device_node *node)
>  {
> -	if (node)
> +	if (node) {
>  		kobject_get(&node->kobj);
> +		trace_of_node_get(refcount_read(&node->kobj.kref.refcount), node->full_name);

See the comment from Ron that I mentioned in my previous email.

Also, the path has been removed from node->full_name.  Does using it here
still give all of the information that is desired?  Same for all others uses
of full_name in this patch.

The trace point should have a single argument, node.  Accessing the two
fields can be done in the tracepoint assignment.  Or is there some
reason that can't be done?  Same for the trace_of_node_put() tracepoint.


> +	}
>  	return node;
>  }
>  EXPORT_SYMBOL(of_node_get);
> @@ -43,8 +48,10 @@ EXPORT_SYMBOL(of_node_get);
>   */
>  void of_node_put(struct device_node *node)
>  {
> -	if (node)
> +	if (node) {
> +		trace_of_node_put(refcount_read(&node->kobj.kref.refcount) - 1, node->full_name);

If this is moved down one line to after kobject_put(), then no need
to subtract 1.


>  		kobject_put(&node->kobj);
> +	}
>  }
>  EXPORT_SYMBOL(of_node_put);
>  
> @@ -75,24 +82,7 @@ const char *action_names[] = {
>  int of_reconfig_notify(unsigned long action, struct of_reconfig_data *p)
>  {
>  	int rc;
> -#ifdef DEBUG
> -	struct of_reconfig_data *pr = p;
> -
> -	switch (action) {
> -	case OF_RECONFIG_ATTACH_NODE:
> -	case OF_RECONFIG_DETACH_NODE:
> -		pr_debug("notify %-15s %pOF\n", action_names[action],
> -			pr->dn);
> -		break;
> -	case OF_RECONFIG_ADD_PROPERTY:
> -	case OF_RECONFIG_REMOVE_PROPERTY:
> -	case OF_RECONFIG_UPDATE_PROPERTY:
> -		pr_debug("notify %-15s %pOF:%s\n", action_names[action],
> -			pr->dn, pr->prop->name);
> -		break;
> -
> -	}
> -#endif
> +	trace_of_reconfig_notify(action, p);
>  	rc = blocking_notifier_call_chain(&of_reconfig_chain, action, p);
>  	return notifier_to_errno(rc);
>  }
> @@ -320,6 +310,8 @@ void of_node_release(struct kobject *kobj)
>  {
>  	struct device_node *node = kobj_to_device_node(kobj);
>  
> +	trace_of_node_release(node);
> +
>  	/* We should never be releasing nodes that haven't been detached. */
>  	if (!of_node_check_flag(node, OF_DETACHED)) {
>  		pr_err("ERROR: Bad of_node_put() on %pOF\n", node);
> diff --git a/include/trace/events/of.h b/include/trace/events/of.h
> new file mode 100644
> index 00000000000000..e8b1302a6f0129
> --- /dev/null
> +++ b/include/trace/events/of.h
> @@ -0,0 +1,93 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM of
> +
> +#if !defined(_TRACE_OF_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_OF_H
> +
> +#include <linux/of.h>
> +#include <linux/tracepoint.h>
> +
> +DECLARE_EVENT_CLASS(of_node_ref_template,
> +
> +	TP_PROTO(int refcount, const char *dn_name),
> +
> +	TP_ARGS(refcount, dn_name),
> +
> +	TP_STRUCT__entry(
> +		__string(dn_name, dn_name)
> +		__field(int, refcount)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dn_name, dn_name);
> +		__entry->refcount = refcount;
> +	),
> +
> +	TP_printk("refcount=%d, dn->full_name=%s",

For version 1 of the patch, the example trace output was reformatted
by hand because the lines were so long.  it seems that it is easy to
create long lines and trace output, so extra effort on keeping the
lines shorter is well spent.

A little verbose.  I would prefer to get rid of "refcount=", ",",
and "dn->full_name=" entirely.


> +		  __entry->refcount, __get_str(dn_name))
> +);
> +
> +DEFINE_EVENT(of_node_ref_template, of_node_get,
> +	     TP_PROTO(int refcount, const char *dn_name),
> +	     TP_ARGS(refcount, dn_name));
> +
> +DEFINE_EVENT(of_node_ref_template, of_node_put,
> +	     TP_PROTO(int refcount, const char *dn_name),
> +	     TP_ARGS(refcount, dn_name));
> +
> +TRACE_EVENT(of_node_release,
> +
> +	TP_PROTO(struct device_node *dn),
> +
> +	TP_ARGS(dn),
> +
> +	TP_STRUCT__entry(
> +		__string(dn_name, dn->full_name)
> +		__field(unsigned long, flags)

Does not follow the alignment conventions for files in this directory.

Steve: have you dropped the conventions or are they still relevant?


> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dn_name, dn->full_name);
> +		__entry->flags = dn->_flags;
> +	),
> +
> +	TP_printk("dn->full_name=%s, dn->_flags=%lu",
> +		  __get_str(dn_name), __entry->flags)

That the first field is the node name should be obvious to the
person reading the trace, so ditch "dn->full_name=".  One
could argue that _flags is not so obvious, but if someone
is looking at why of_node_release() is being called, they
will either care about flag, in which case it will be obvious
that the flag value is being printed, or they won't care
about the flag value and will be ignoring it anyway.  So
I would ditch "dn->-flags=" also.


> +);
> +
> +#define of_reconfig_action_names \
> +	{OF_RECONFIG_ATTACH_NODE, "ATTACH_NODE"}, \
> +	{OF_RECONFIG_DETACH_NODE, "DETACH_NODE"}, \
> +	{OF_RECONFIG_ADD_PROPERTY, "ADD_PROPERTY"}, \
> +	{OF_RECONFIG_REMOVE_PROPERTY, "REMOVE_PROPERTY"}, \
> +	{OF_RECONFIG_UPDATE_PROPERTY, "UPDATE_PROPERTY"}
> +
> +TRACE_EVENT(of_reconfig_notify,
> +
> +	TP_PROTO(unsigned long action, struct of_reconfig_data *ord),
> +
> +	TP_ARGS(action, ord),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, action)
> +		__string(dn_name, ord->dn->full_name)
> +		__string(prop_name, ord->prop ? ord->prop->name : "null")
> +		__string(oldprop_name, ord->old_prop ? ord->old_prop->name : "null")

Does not follow the alignment conventions for files in this directory.


> +	),
> +
> +	TP_fast_assign(
> +		__entry->action = action;
> +		__assign_str(dn_name, ord->dn->full_name);
> +		__assign_str(prop_name, ord->prop ? ord->prop->name : "null");
> +		__assign_str(oldprop_name, ord->old_prop ? ord->old_prop->name : "null");
> +	),
> +
> +	TP_printk("action=%s, dn->full_name=%s, prop->name=%s, old_prop->name=%s",

Isn't old_prop->name the same as prop->name, if old_prop exists?  If so, drop
it.

This is more verbose than the pr_debug() that is being replaced.  Please make
it more compact.


> +		  __print_symbolic(__entry->action, of_reconfig_action_names),
> +		  __get_str(dn_name), __get_str(prop_name), __get_str(oldprop_name))
> +);
> +
> +#endif /*	_TRACE_OF_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
       [not found] ` <20180121143117.19805-1-wsa+renesas-jBu1N2QxHDJrcw3mvpCnnVaTQe2KTcn/@public.gmane.org>
@ 2018-01-25  6:55   ` Frank Rowand
  2018-01-25 11:03     ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Frank Rowand @ 2018-01-25  6:55 UTC (permalink / raw)
  To: Wolfram Sang, devicetree-u79uwXL29TY76Z2rM5mHXA, Steven Rostedt
  Cc: Tyrel Datwyler, Geert Uytterhoeven,
	linux-renesas-soc-u79uwXL29TY76Z2rM5mHXA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, Rob Herring,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hi Steve,

On 01/21/18 06:31, Wolfram Sang wrote:
> I got a bug report for a DT node refcounting problem in the I2C subsystem. This
> patch was a huge help in validating the bug report and the proposed solution.
> So, I thought I bring it to attention again. Thanks Tyrel, for the initial
> work!
> 
> Note that I did not test the dynamic updates, only of_node_{get|put} so far. I
> read that Tyrel checked dynamic updates extensively with this patch. And since
> DT overlays are also used within our Renesas dev team, this will help there, as
> well.
> 
> Tested on a Renesas Salvator-XS board (R-Car H3).
> 
> Changes since RFC v1:
> 	* rebased to v4.15-rc8
> 	* fixed commit abbrev and one of the sysfs paths in commit desc
> 	* removed trailing space and fixed pointer declaration in code
> 
> I consider all the remaining checkpatch issues irrelevant for this patch.
> 
> So what about applying it?
> 
> Kind regards,
> 
>    Wolfram
> 
> 
> Tyrel Datwyler (1):
>   of: introduce event tracepoints for dynamic device_node lifecyle
> 
>  drivers/of/dynamic.c      | 32 ++++++----------
>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 105 insertions(+), 20 deletions(-)
>  create mode 100644 include/trace/events/of.h
> 

Off the top of your head, can you tell me know early in the boot
process a trace_event can be called and successfully provide the
data to someone trying to debug early boot issues?

Also, way back when version 1 of this patch was being discussed,
a question about stacktrace triggers:

     >>> # echo stacktrace > /sys/kernel/debug/tracing/trace_options
     >>> # cat trace | grep -A6 "/pci@800000020000018"  
     >>
     >> Just to let you know that there is now stacktrace event triggers, where
     >> you don't need to stacktrace all events, you can pick and choose. And
     >> even filter the stack trace on specific fields of the event.  
     >
     > This is great, and I did figure that out this afternoon. One thing I was
     > still trying to determine though was whether its possible to set these
     > triggers at boot? As far as I could tell I'm still limited to
     > "trace_options=stacktrace" as a kernel boot parameter to get the stack
     > for event tracepoints.

     No not yet. But I'll add that to the todo list.

     Thanks,

     -- Steve

Is this still on your todo list, or is it now available?

Thanks,

Frank
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
  2018-01-25  6:48   ` Frank Rowand
@ 2018-01-25  6:58     ` Frank Rowand
       [not found]     ` <4cc627c4-1aaf-b8d5-5a26-eea7e596743f-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2018-01-25 22:40     ` Tyrel Datwyler
  2 siblings, 0 replies; 25+ messages in thread
From: Frank Rowand @ 2018-01-25  6:58 UTC (permalink / raw)
  To: Wolfram Sang, devicetree
  Cc: Tyrel Datwyler, Geert Uytterhoeven, linux-renesas-soc,
	linuxppc-dev, Rob Herring, Steven Rostedt, linux-kernel

On 01/24/18 22:48, Frank Rowand wrote:
> On 01/21/18 06:31, Wolfram Sang wrote:
>> From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
>>
>> This patch introduces event tracepoints for tracking a device_nodes
>> reference cycle as well as reconfig notifications generated in response
>> to node/property manipulations.
>>
>> With the recent upstreaming of the refcount API several device_node
>> underflows and leaks have come to my attention in the pseries (DLPAR)
>> dynamic logical partitioning code (ie. POWER speak for hotplugging
>> virtual and physcial resources at runtime such as cpus or IOAs). These
>> tracepoints provide a easy and quick mechanism for validating the
>> reference counting of device_nodes during their lifetime.
>>
>> Further, when pseries lpars are migrated to a different machine we
>> perform a live update of our device tree to bring it into alignment with
>> the configuration of the new machine. The of_reconfig_notify trace point
>> provides a mechanism that can be turned for debuging the device tree
>> modifications with out having to build a custom kernel to get at the
>> DEBUG code introduced by commit 00aa37206e1a54 ("of/reconfig: Add debug
>> output for OF_RECONFIG notifiers").
>>
>> The following trace events are provided: of_node_get, of_node_put,
>> of_node_release, and of_reconfig_notify. These trace points require a
> 
> Please add a note that the of_reconfig_notify trace event is not an
> added bit of debug info, but is instead replacing information that
> was previously available via pr_debug() when DEBUG was defined.

I got a little carried away, "when DEBUG was defined" is extra
un-needed detail for the commit message.


> 
> 
>> kernel built with ftrace support to be enabled. In a typical environment
>> where debugfs is mounted at /sys/kernel/debug the entire set of
>> tracepoints can be set with the following:
>>
>>   echo "of:*" > /sys/kernel/debug/tracing/set_event
>>
>> or
>>
>>   echo 1 > /sys/kernel/debug/tracing/events/of/enable
>>
>> The following shows the trace point data from a DLPAR remove of a cpu
>> from a pseries lpar:
>>
>> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"
>>
>> cpuhp/23-147   [023] ....   128.324827:
>>         of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
>> cpuhp/23-147   [023] ....   128.324829:
>>         of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
>> cpuhp/23-147   [023] ....   128.324829:
>>         of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
>> cpuhp/23-147   [023] ....   128.324831:
>>         of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>>    drmgr-7284  [009] ....   128.439000:
>>         of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>>    drmgr-7284  [009] ....   128.439002:
>>         of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
>>                             prop->name=null, old_prop->name=null
>>    drmgr-7284  [009] ....   128.439015:
>>         of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>>    drmgr-7284  [009] ....   128.439016:
>>         of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
>>
>> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> 
> The following belongs in a list of version 2 changes, below the "---" line:
> 
>> [wsa: fixed commit abbrev and one of the sysfs paths in commit desc,
>> removed trailing space and fixed pointer declaration in code]
> 
>> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
>> ---
>>  drivers/of/dynamic.c      | 32 ++++++----------
>>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 105 insertions(+), 20 deletions(-)
>>  create mode 100644 include/trace/events/of.h
> 
> mode looks incorrect.  Existing files in include/trace/events/ are -rw-rw----
> 
> 
>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>> index ab988d88704da0..b0d6ab5a35b8c6 100644
>> --- a/drivers/of/dynamic.c
>> +++ b/drivers/of/dynamic.c
>> @@ -21,6 +21,9 @@ static struct device_node *kobj_to_device_node(struct kobject *kobj)
>>  	return container_of(kobj, struct device_node, kobj);
>>  }
>>  
>> +#define CREATE_TRACE_POINTS
>> +#include <trace/events/of.h>
>> +
>>  /**
>>   * of_node_get() - Increment refcount of a node
>>   * @node:	Node to inc refcount, NULL is supported to simplify writing of
>> @@ -30,8 +33,10 @@ static struct device_node *kobj_to_device_node(struct kobject *kobj)
>>   */
>>  struct device_node *of_node_get(struct device_node *node)
>>  {
>> -	if (node)
>> +	if (node) {
>>  		kobject_get(&node->kobj);
>> +		trace_of_node_get(refcount_read(&node->kobj.kref.refcount), node->full_name);
> 
> See the comment from Ron that I mentioned in my previous email.
                       ^^^^
                       Rob, darn it.


> Also, the path has been removed from node->full_name.  Does using it here
> still give all of the information that is desired?  Same for all others uses
> of full_name in this patch.
> 
> The trace point should have a single argument, node.  Accessing the two
> fields can be done in the tracepoint assignment.  Or is there some
> reason that can't be done?  Same for the trace_of_node_put() tracepoint.
> 
> 
>> +	}
>>  	return node;
>>  }
>>  EXPORT_SYMBOL(of_node_get);
>> @@ -43,8 +48,10 @@ EXPORT_SYMBOL(of_node_get);
>>   */
>>  void of_node_put(struct device_node *node)
>>  {
>> -	if (node)
>> +	if (node) {
>> +		trace_of_node_put(refcount_read(&node->kobj.kref.refcount) - 1, node->full_name);
> 
> If this is moved down one line to after kobject_put(), then no need
> to subtract 1.
> 
> 
>>  		kobject_put(&node->kobj);
>> +	}
>>  }
>>  EXPORT_SYMBOL(of_node_put);
>>  
>> @@ -75,24 +82,7 @@ const char *action_names[] = {
>>  int of_reconfig_notify(unsigned long action, struct of_reconfig_data *p)
>>  {
>>  	int rc;
>> -#ifdef DEBUG
>> -	struct of_reconfig_data *pr = p;
>> -
>> -	switch (action) {
>> -	case OF_RECONFIG_ATTACH_NODE:
>> -	case OF_RECONFIG_DETACH_NODE:
>> -		pr_debug("notify %-15s %pOF\n", action_names[action],
>> -			pr->dn);
>> -		break;
>> -	case OF_RECONFIG_ADD_PROPERTY:
>> -	case OF_RECONFIG_REMOVE_PROPERTY:
>> -	case OF_RECONFIG_UPDATE_PROPERTY:
>> -		pr_debug("notify %-15s %pOF:%s\n", action_names[action],
>> -			pr->dn, pr->prop->name);
>> -		break;
>> -
>> -	}
>> -#endif
>> +	trace_of_reconfig_notify(action, p);
>>  	rc = blocking_notifier_call_chain(&of_reconfig_chain, action, p);
>>  	return notifier_to_errno(rc);
>>  }
>> @@ -320,6 +310,8 @@ void of_node_release(struct kobject *kobj)
>>  {
>>  	struct device_node *node = kobj_to_device_node(kobj);
>>  
>> +	trace_of_node_release(node);
>> +
>>  	/* We should never be releasing nodes that haven't been detached. */
>>  	if (!of_node_check_flag(node, OF_DETACHED)) {
>>  		pr_err("ERROR: Bad of_node_put() on %pOF\n", node);
>> diff --git a/include/trace/events/of.h b/include/trace/events/of.h
>> new file mode 100644
>> index 00000000000000..e8b1302a6f0129
>> --- /dev/null
>> +++ b/include/trace/events/of.h
>> @@ -0,0 +1,93 @@
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM of
>> +
>> +#if !defined(_TRACE_OF_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_OF_H
>> +
>> +#include <linux/of.h>
>> +#include <linux/tracepoint.h>
>> +
>> +DECLARE_EVENT_CLASS(of_node_ref_template,
>> +
>> +	TP_PROTO(int refcount, const char *dn_name),
>> +
>> +	TP_ARGS(refcount, dn_name),
>> +
>> +	TP_STRUCT__entry(
>> +		__string(dn_name, dn_name)
>> +		__field(int, refcount)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__assign_str(dn_name, dn_name);
>> +		__entry->refcount = refcount;
>> +	),
>> +
>> +	TP_printk("refcount=%d, dn->full_name=%s",
> 
> For version 1 of the patch, the example trace output was reformatted
> by hand because the lines were so long.  it seems that it is easy to
> create long lines and trace output, so extra effort on keeping the
> lines shorter is well spent.
> 
> A little verbose.  I would prefer to get rid of "refcount=", ",",
> and "dn->full_name=" entirely.
> 
> 
>> +		  __entry->refcount, __get_str(dn_name))
>> +);
>> +
>> +DEFINE_EVENT(of_node_ref_template, of_node_get,
>> +	     TP_PROTO(int refcount, const char *dn_name),
>> +	     TP_ARGS(refcount, dn_name));
>> +
>> +DEFINE_EVENT(of_node_ref_template, of_node_put,
>> +	     TP_PROTO(int refcount, const char *dn_name),
>> +	     TP_ARGS(refcount, dn_name));
>> +
>> +TRACE_EVENT(of_node_release,
>> +
>> +	TP_PROTO(struct device_node *dn),
>> +
>> +	TP_ARGS(dn),
>> +
>> +	TP_STRUCT__entry(
>> +		__string(dn_name, dn->full_name)
>> +		__field(unsigned long, flags)
> 
> Does not follow the alignment conventions for files in this directory.
> 
> Steve: have you dropped the conventions or are they still relevant?
> 
> 
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__assign_str(dn_name, dn->full_name);
>> +		__entry->flags = dn->_flags;
>> +	),
>> +
>> +	TP_printk("dn->full_name=%s, dn->_flags=%lu",
>> +		  __get_str(dn_name), __entry->flags)
> 
> That the first field is the node name should be obvious to the
> person reading the trace, so ditch "dn->full_name=".  One
> could argue that _flags is not so obvious, but if someone
> is looking at why of_node_release() is being called, they
> will either care about flag, in which case it will be obvious
> that the flag value is being printed, or they won't care
> about the flag value and will be ignoring it anyway.  So
> I would ditch "dn->-flags=" also.
> 
> 
>> +);
>> +
>> +#define of_reconfig_action_names \
>> +	{OF_RECONFIG_ATTACH_NODE, "ATTACH_NODE"}, \
>> +	{OF_RECONFIG_DETACH_NODE, "DETACH_NODE"}, \
>> +	{OF_RECONFIG_ADD_PROPERTY, "ADD_PROPERTY"}, \
>> +	{OF_RECONFIG_REMOVE_PROPERTY, "REMOVE_PROPERTY"}, \
>> +	{OF_RECONFIG_UPDATE_PROPERTY, "UPDATE_PROPERTY"}
>> +
>> +TRACE_EVENT(of_reconfig_notify,
>> +
>> +	TP_PROTO(unsigned long action, struct of_reconfig_data *ord),
>> +
>> +	TP_ARGS(action, ord),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(unsigned long, action)
>> +		__string(dn_name, ord->dn->full_name)
>> +		__string(prop_name, ord->prop ? ord->prop->name : "null")
>> +		__string(oldprop_name, ord->old_prop ? ord->old_prop->name : "null")
> 
> Does not follow the alignment conventions for files in this directory.
> 
> 
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->action = action;
>> +		__assign_str(dn_name, ord->dn->full_name);
>> +		__assign_str(prop_name, ord->prop ? ord->prop->name : "null");
>> +		__assign_str(oldprop_name, ord->old_prop ? ord->old_prop->name : "null");
>> +	),
>> +
>> +	TP_printk("action=%s, dn->full_name=%s, prop->name=%s, old_prop->name=%s",
> 
> Isn't old_prop->name the same as prop->name, if old_prop exists?  If so, drop
> it.
> 
> This is more verbose than the pr_debug() that is being replaced.  Please make
> it more compact.
> 
> 
>> +		  __print_symbolic(__entry->action, of_reconfig_action_names),
>> +		  __get_str(dn_name), __get_str(prop_name), __get_str(oldprop_name))
>> +);
>> +
>> +#endif /*	_TRACE_OF_H */
>> +
>> +/* This part must be outside protection */
>> +#include <trace/define_trace.h>
>>
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
       [not found]     ` <4cc627c4-1aaf-b8d5-5a26-eea7e596743f-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-01-25  8:01       ` Geert Uytterhoeven
  0 siblings, 0 replies; 25+ messages in thread
From: Geert Uytterhoeven @ 2018-01-25  8:01 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Wolfram Sang,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Tyrel Datwyler, Linux-Renesas,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, Rob Herring, Steven Rostedt,
	Linux Kernel Mailing List

Hi Frank,

On Thu, Jan 25, 2018 at 7:48 AM, Frank Rowand <frowand.list-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>  create mode 100644 include/trace/events/of.h
>
> mode looks incorrect.  Existing files in include/trace/events/ are -rw-rw----

Not in my git clone ;-) 644 should be fine.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert-Td1EMuHUCqxL1ZNQvxDV9g@public.gmane.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
  2018-01-21 14:31 ` [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle Wolfram Sang
  2018-01-21 22:01   ` Steven Rostedt
  2018-01-25  6:48   ` Frank Rowand
@ 2018-01-25  8:32   ` Frank Rowand
  2 siblings, 0 replies; 25+ messages in thread
From: Frank Rowand @ 2018-01-25  8:32 UTC (permalink / raw)
  To: Wolfram Sang, devicetree
  Cc: Tyrel Datwyler, Geert Uytterhoeven, linux-renesas-soc,
	linuxppc-dev, Rob Herring, Steven Rostedt, linux-kernel

On 01/21/18 06:31, Wolfram Sang wrote:
> From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> 
> This patch introduces event tracepoints for tracking a device_nodes
> reference cycle as well as reconfig notifications generated in response
> to node/property manipulations.
> 
> With the recent upstreaming of the refcount API several device_node
> underflows and leaks have come to my attention in the pseries (DLPAR)
> dynamic logical partitioning code (ie. POWER speak for hotplugging
> virtual and physcial resources at runtime such as cpus or IOAs). These
> tracepoints provide a easy and quick mechanism for validating the
> reference counting of device_nodes during their lifetime.
> 
> Further, when pseries lpars are migrated to a different machine we
> perform a live update of our device tree to bring it into alignment with
> the configuration of the new machine. The of_reconfig_notify trace point
> provides a mechanism that can be turned for debuging the device tree
> modifications with out having to build a custom kernel to get at the
> DEBUG code introduced by commit 00aa37206e1a54 ("of/reconfig: Add debug
> output for OF_RECONFIG notifiers").
> 
> The following trace events are provided: of_node_get, of_node_put,
> of_node_release, and of_reconfig_notify. These trace points require a
> kernel built with ftrace support to be enabled. In a typical environment
> where debugfs is mounted at /sys/kernel/debug the entire set of
> tracepoints can be set with the following:
> 
>   echo "of:*" > /sys/kernel/debug/tracing/set_event
> 
> or
> 
>   echo 1 > /sys/kernel/debug/tracing/events/of/enable
> 
> The following shows the trace point data from a DLPAR remove of a cpu
> from a pseries lpar:
> 
> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"
> 
> cpuhp/23-147   [023] ....   128.324827:
>         of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
>         of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
>         of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324831:
>         of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439000:
>         of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439002:
>         of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
>                             prop->name=null, old_prop->name=null
>    drmgr-7284  [009] ....   128.439015:
>         of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439016:
>         of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> [wsa: fixed commit abbrev and one of the sysfs paths in commit desc,
> removed trailing space and fixed pointer declaration in code]
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> ---
>  drivers/of/dynamic.c      | 32 ++++++----------
>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 105 insertions(+), 20 deletions(-)
>  create mode 100644 include/trace/events/of.h
> 
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index ab988d88704da0..b0d6ab5a35b8c6 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -21,6 +21,9 @@ static struct device_node *kobj_to_device_node(struct kobject *kobj)
>  	return container_of(kobj, struct device_node, kobj);
>  }
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/of.h>
> +
>  /**
>   * of_node_get() - Increment refcount of a node
>   * @node:	Node to inc refcount, NULL is supported to simplify writing of
> @@ -30,8 +33,10 @@ static struct device_node *kobj_to_device_node(struct kobject *kobj)
>   */
>  struct device_node *of_node_get(struct device_node *node)
>  {
> -	if (node)
> +	if (node) {
>  		kobject_get(&node->kobj);
> +		trace_of_node_get(refcount_read(&node->kobj.kref.refcount), node->full_name);
> +	}
>  	return node;
>  }
>  EXPORT_SYMBOL(of_node_get);
> @@ -43,8 +48,10 @@ EXPORT_SYMBOL(of_node_get);
>   */
>  void of_node_put(struct device_node *node)
>  {
> -	if (node)
> +	if (node) {
> +		trace_of_node_put(refcount_read(&node->kobj.kref.refcount) - 1, node->full_name);
>  		kobject_put(&node->kobj);
> +	}
>  }
>  EXPORT_SYMBOL(of_node_put);
>  
> @@ -75,24 +82,7 @@ const char *action_names[] = {
>  int of_reconfig_notify(unsigned long action, struct of_reconfig_data *p)
>  {
>  	int rc;
> -#ifdef DEBUG
> -	struct of_reconfig_data *pr = p;
> -
> -	switch (action) {
> -	case OF_RECONFIG_ATTACH_NODE:
> -	case OF_RECONFIG_DETACH_NODE:
> -		pr_debug("notify %-15s %pOF\n", action_names[action],
> -			pr->dn);
> -		break;
> -	case OF_RECONFIG_ADD_PROPERTY:
> -	case OF_RECONFIG_REMOVE_PROPERTY:
> -	case OF_RECONFIG_UPDATE_PROPERTY:
> -		pr_debug("notify %-15s %pOF:%s\n", action_names[action],
> -			pr->dn, pr->prop->name);
> -		break;
> -
> -	}
> -#endif
> +	trace_of_reconfig_notify(action, p);
>  	rc = blocking_notifier_call_chain(&of_reconfig_chain, action, p);
>  	return notifier_to_errno(rc);
>  }
> @@ -320,6 +310,8 @@ void of_node_release(struct kobject *kobj)
>  {
>  	struct device_node *node = kobj_to_device_node(kobj);
>  
> +	trace_of_node_release(node);
> +
>  	/* We should never be releasing nodes that haven't been detached. */
>  	if (!of_node_check_flag(node, OF_DETACHED)) {
>  		pr_err("ERROR: Bad of_node_put() on %pOF\n", node);
> diff --git a/include/trace/events/of.h b/include/trace/events/of.h
> new file mode 100644
> index 00000000000000..e8b1302a6f0129
> --- /dev/null
> +++ b/include/trace/events/of.h

I overlooked the SPDX identifier.  Please add that to this new file.

-Frank


> @@ -0,0 +1,93 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM of
> +
> +#if !defined(_TRACE_OF_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_OF_H
> +
> +#include <linux/of.h>
> +#include <linux/tracepoint.h>
> +
> +DECLARE_EVENT_CLASS(of_node_ref_template,
> +
> +	TP_PROTO(int refcount, const char *dn_name),
> +
> +	TP_ARGS(refcount, dn_name),
> +
> +	TP_STRUCT__entry(
> +		__string(dn_name, dn_name)
> +		__field(int, refcount)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dn_name, dn_name);
> +		__entry->refcount = refcount;
> +	),
> +
> +	TP_printk("refcount=%d, dn->full_name=%s",
> +		  __entry->refcount, __get_str(dn_name))
> +);
> +
> +DEFINE_EVENT(of_node_ref_template, of_node_get,
> +	     TP_PROTO(int refcount, const char *dn_name),
> +	     TP_ARGS(refcount, dn_name));
> +
> +DEFINE_EVENT(of_node_ref_template, of_node_put,
> +	     TP_PROTO(int refcount, const char *dn_name),
> +	     TP_ARGS(refcount, dn_name));
> +
> +TRACE_EVENT(of_node_release,
> +
> +	TP_PROTO(struct device_node *dn),
> +
> +	TP_ARGS(dn),
> +
> +	TP_STRUCT__entry(
> +		__string(dn_name, dn->full_name)
> +		__field(unsigned long, flags)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dn_name, dn->full_name);
> +		__entry->flags = dn->_flags;
> +	),
> +
> +	TP_printk("dn->full_name=%s, dn->_flags=%lu",
> +		  __get_str(dn_name), __entry->flags)
> +);
> +
> +#define of_reconfig_action_names \
> +	{OF_RECONFIG_ATTACH_NODE, "ATTACH_NODE"}, \
> +	{OF_RECONFIG_DETACH_NODE, "DETACH_NODE"}, \
> +	{OF_RECONFIG_ADD_PROPERTY, "ADD_PROPERTY"}, \
> +	{OF_RECONFIG_REMOVE_PROPERTY, "REMOVE_PROPERTY"}, \
> +	{OF_RECONFIG_UPDATE_PROPERTY, "UPDATE_PROPERTY"}
> +
> +TRACE_EVENT(of_reconfig_notify,
> +
> +	TP_PROTO(unsigned long action, struct of_reconfig_data *ord),
> +
> +	TP_ARGS(action, ord),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, action)
> +		__string(dn_name, ord->dn->full_name)
> +		__string(prop_name, ord->prop ? ord->prop->name : "null")
> +		__string(oldprop_name, ord->old_prop ? ord->old_prop->name : "null")
> +	),
> +
> +	TP_fast_assign(
> +		__entry->action = action;
> +		__assign_str(dn_name, ord->dn->full_name);
> +		__assign_str(prop_name, ord->prop ? ord->prop->name : "null");
> +		__assign_str(oldprop_name, ord->old_prop ? ord->old_prop->name : "null");
> +	),
> +
> +	TP_printk("action=%s, dn->full_name=%s, prop->name=%s, old_prop->name=%s",
> +		  __print_symbolic(__entry->action, of_reconfig_action_names),
> +		  __get_str(dn_name), __get_str(prop_name), __get_str(oldprop_name))
> +);
> +
> +#endif /*	_TRACE_OF_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-25  6:55   ` Frank Rowand
@ 2018-01-25 11:03     ` Steven Rostedt
  2018-01-25 21:49       ` Frank Rowand
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2018-01-25 11:03 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Wolfram Sang, devicetree, Tyrel Datwyler, Geert Uytterhoeven,
	linux-renesas-soc, linuxppc-dev, Rob Herring, linux-kernel

On Wed, 24 Jan 2018 22:55:13 -0800
Frank Rowand <frowand.list@gmail.com> wrote:

> Hi Steve,

> 
> Off the top of your head, can you tell me know early in the boot
> process a trace_event can be called and successfully provide the
> data to someone trying to debug early boot issues?

The trace events are enabled by early_initcall().

> 
> Also, way back when version 1 of this patch was being discussed,
> a question about stacktrace triggers:
> 
>      >>> # echo stacktrace > /sys/kernel/debug/tracing/trace_options
>      >>> # cat trace | grep -A6 "/pci@800000020000018"    
>      >>
>      >> Just to let you know that there is now stacktrace event triggers, where
>      >> you don't need to stacktrace all events, you can pick and choose. And
>      >> even filter the stack trace on specific fields of the event.    
>      >
>      > This is great, and I did figure that out this afternoon. One thing I was
>      > still trying to determine though was whether its possible to set these
>      > triggers at boot? As far as I could tell I'm still limited to
>      > "trace_options=stacktrace" as a kernel boot parameter to get the stack
>      > for event tracepoints.  
> 
>      No not yet. But I'll add that to the todo list.
> 
>      Thanks,
> 
>      -- Steve
> 
> Is this still on your todo list, or is it now available?

Still on the todo list. As someone once said, there really is no todo
list, but just work on the stuff that people are yelling the loudest
about. Yell louder, and it will get done quicker ;-)

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-25 11:03     ` Steven Rostedt
@ 2018-01-25 21:49       ` Frank Rowand
       [not found]         ` <a30d4cd3-0068-8cca-cc2f-ca207d0d278b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2018-01-25 23:53         ` Tyrel Datwyler
  0 siblings, 2 replies; 25+ messages in thread
From: Frank Rowand @ 2018-01-25 21:49 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: Steven Rostedt, devicetree, Tyrel Datwyler, Geert Uytterhoeven,
	linux-renesas-soc, linuxppc-dev, Rob Herring, linux-kernel

Hi Wolfram,

On 01/25/18 03:03, Steven Rostedt wrote:
> On Wed, 24 Jan 2018 22:55:13 -0800
> Frank Rowand <frowand.list@gmail.com> wrote:
> 
>> Hi Steve,
> 
>>
>> Off the top of your head, can you tell me know early in the boot
>> process a trace_event can be called and successfully provide the
>> data to someone trying to debug early boot issues?
> 
> The trace events are enabled by early_initcall().

< snip >

This means that ftrace can not be used for the of_node_get(),
of_node_put(), and of_node_release() debug info, because
these functions are called before early_initcall().  Please
use pr_debug() for these functions.

As far as I know, the of_reconfig_notify() could remain an
ftrace instrumented function.  But now that the only thing
that would be ftrace instrumented is of_reconfig_notify(),
I don't see a strong justification for changing the existing
pr_debug() calls to an ftrace alternative.  Though I suspect
the original author of the patch still might desire to have
the "#ifdef DEBUG" surrounding the pr_debug() calls removed
since one of his issues was having to recompile his kernel
to do his debugging.

-Frank

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
  2018-01-25  6:48   ` Frank Rowand
  2018-01-25  6:58     ` Frank Rowand
       [not found]     ` <4cc627c4-1aaf-b8d5-5a26-eea7e596743f-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-01-25 22:40     ` Tyrel Datwyler
       [not found]       ` <25f83d0d-bc67-f1b5-cadb-838987fd846d-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2 siblings, 1 reply; 25+ messages in thread
From: Tyrel Datwyler @ 2018-01-25 22:40 UTC (permalink / raw)
  To: Frank Rowand, Wolfram Sang, devicetree
  Cc: linux-kernel, Steven Rostedt, linux-renesas-soc, Rob Herring,
	Geert Uytterhoeven, linuxppc-dev

On 01/24/2018 10:48 PM, Frank Rowand wrote:
> On 01/21/18 06:31, Wolfram Sang wrote:
>> From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
>>
>> This patch introduces event tracepoints for tracking a device_nodes
>> reference cycle as well as reconfig notifications generated in response
>> to node/property manipulations.
>>
>> With the recent upstreaming of the refcount API several device_node
>> underflows and leaks have come to my attention in the pseries (DLPAR)
>> dynamic logical partitioning code (ie. POWER speak for hotplugging
>> virtual and physcial resources at runtime such as cpus or IOAs). These
>> tracepoints provide a easy and quick mechanism for validating the
>> reference counting of device_nodes during their lifetime.
>>
>> Further, when pseries lpars are migrated to a different machine we
>> perform a live update of our device tree to bring it into alignment with
>> the configuration of the new machine. The of_reconfig_notify trace point
>> provides a mechanism that can be turned for debuging the device tree
>> modifications with out having to build a custom kernel to get at the
>> DEBUG code introduced by commit 00aa37206e1a54 ("of/reconfig: Add debug
>> output for OF_RECONFIG notifiers").
>>
>> The following trace events are provided: of_node_get, of_node_put,
>> of_node_release, and of_reconfig_notify. These trace points require a
> 
> Please add a note that the of_reconfig_notify trace event is not an
> added bit of debug info, but is instead replacing information that
> was previously available via pr_debug() when DEBUG was defined.
> 
> 
>> kernel built with ftrace support to be enabled. In a typical environment
>> where debugfs is mounted at /sys/kernel/debug the entire set of
>> tracepoints can be set with the following:
>>
>>   echo "of:*" > /sys/kernel/debug/tracing/set_event
>>
>> or
>>
>>   echo 1 > /sys/kernel/debug/tracing/events/of/enable
>>
>> The following shows the trace point data from a DLPAR remove of a cpu
>> from a pseries lpar:
>>
>> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"
>>
>> cpuhp/23-147   [023] ....   128.324827:
>>         of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
>> cpuhp/23-147   [023] ....   128.324829:
>>         of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
>> cpuhp/23-147   [023] ....   128.324829:
>>         of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
>> cpuhp/23-147   [023] ....   128.324831:
>>         of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>>    drmgr-7284  [009] ....   128.439000:
>>         of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>>    drmgr-7284  [009] ....   128.439002:
>>         of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
>>                             prop->name=null, old_prop->name=null
>>    drmgr-7284  [009] ....   128.439015:
>>         of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>>    drmgr-7284  [009] ....   128.439016:
>>         of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
>>
>> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> 
> The following belongs in a list of version 2 changes, below the "---" line:
> 
>> [wsa: fixed commit abbrev and one of the sysfs paths in commit desc,
>> removed trailing space and fixed pointer declaration in code]
> 
>> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
>> ---
>>  drivers/of/dynamic.c      | 32 ++++++----------
>>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 105 insertions(+), 20 deletions(-)
>>  create mode 100644 include/trace/events/of.h
> 
> mode looks incorrect.  Existing files in include/trace/events/ are -rw-rw----
> 
> 
>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>> index ab988d88704da0..b0d6ab5a35b8c6 100644
>> --- a/drivers/of/dynamic.c
>> +++ b/drivers/of/dynamic.c
>> @@ -21,6 +21,9 @@ static struct device_node *kobj_to_device_node(struct kobject *kobj)
>>  	return container_of(kobj, struct device_node, kobj);
>>  }
>>  
>> +#define CREATE_TRACE_POINTS
>> +#include <trace/events/of.h>
>> +
>>  /**
>>   * of_node_get() - Increment refcount of a node
>>   * @node:	Node to inc refcount, NULL is supported to simplify writing of
>> @@ -30,8 +33,10 @@ static struct device_node *kobj_to_device_node(struct kobject *kobj)
>>   */
>>  struct device_node *of_node_get(struct device_node *node)
>>  {
>> -	if (node)
>> +	if (node) {
>>  		kobject_get(&node->kobj);
>> +		trace_of_node_get(refcount_read(&node->kobj.kref.refcount), node->full_name);
> 
> See the comment from Ron that I mentioned in my previous email.
> 
> Also, the path has been removed from node->full_name.  Does using it here
> still give all of the information that is desired?  Same for all others uses
> of full_name in this patch.
> 
> The trace point should have a single argument, node.  Accessing the two
> fields can be done in the tracepoint assignment.  Or is there some
> reason that can't be done?  Same for the trace_of_node_put() tracepoint.

I originally was just passing the node, but added the refcount as an argument to deal with trace_of_node_put(). See my comment there for the reasoning.

> 
> 
>> +	}
>>  	return node;
>>  }
>>  EXPORT_SYMBOL(of_node_get);
>> @@ -43,8 +48,10 @@ EXPORT_SYMBOL(of_node_get);
>>   */
>>  void of_node_put(struct device_node *node)
>>  {
>> -	if (node)
>> +	if (node) {
>> +		trace_of_node_put(refcount_read(&node->kobj.kref.refcount) - 1, node->full_name);
> 
> If this is moved down one line to after kobject_put(), then no need
> to subtract 1.

So, the problem here is that if this is the last put then the node structure will be destroyed by of_node_release before we trace the of_node_put and we get no useful tracing information about the device node. Further, you see an out of order trace with the of_node_release preceding the final of_node_put. So, I opted for the somewhat ugly implementation you see where we do the subtraction ahead of time to yield what the refcount is about to be, and pass both the refcount and the node.

> 
> 
>>  		kobject_put(&node->kobj);
>> +	}
>>  }
>>  EXPORT_SYMBOL(of_node_put);
>>  
>> @@ -75,24 +82,7 @@ const char *action_names[] = {
>>  int of_reconfig_notify(unsigned long action, struct of_reconfig_data *p)
>>  {
>>  	int rc;
>> -#ifdef DEBUG
>> -	struct of_reconfig_data *pr = p;
>> -
>> -	switch (action) {
>> -	case OF_RECONFIG_ATTACH_NODE:
>> -	case OF_RECONFIG_DETACH_NODE:
>> -		pr_debug("notify %-15s %pOF\n", action_names[action],
>> -			pr->dn);
>> -		break;
>> -	case OF_RECONFIG_ADD_PROPERTY:
>> -	case OF_RECONFIG_REMOVE_PROPERTY:
>> -	case OF_RECONFIG_UPDATE_PROPERTY:
>> -		pr_debug("notify %-15s %pOF:%s\n", action_names[action],
>> -			pr->dn, pr->prop->name);
>> -		break;
>> -
>> -	}
>> -#endif
>> +	trace_of_reconfig_notify(action, p);
>>  	rc = blocking_notifier_call_chain(&of_reconfig_chain, action, p);
>>  	return notifier_to_errno(rc);
>>  }
>> @@ -320,6 +310,8 @@ void of_node_release(struct kobject *kobj)
>>  {
>>  	struct device_node *node = kobj_to_device_node(kobj);
>>  
>> +	trace_of_node_release(node);
>> +
>>  	/* We should never be releasing nodes that haven't been detached. */
>>  	if (!of_node_check_flag(node, OF_DETACHED)) {
>>  		pr_err("ERROR: Bad of_node_put() on %pOF\n", node);
>> diff --git a/include/trace/events/of.h b/include/trace/events/of.h
>> new file mode 100644
>> index 00000000000000..e8b1302a6f0129
>> --- /dev/null
>> +++ b/include/trace/events/of.h
>> @@ -0,0 +1,93 @@
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM of
>> +
>> +#if !defined(_TRACE_OF_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_OF_H
>> +
>> +#include <linux/of.h>
>> +#include <linux/tracepoint.h>
>> +
>> +DECLARE_EVENT_CLASS(of_node_ref_template,
>> +
>> +	TP_PROTO(int refcount, const char *dn_name),
>> +
>> +	TP_ARGS(refcount, dn_name),
>> +
>> +	TP_STRUCT__entry(
>> +		__string(dn_name, dn_name)
>> +		__field(int, refcount)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__assign_str(dn_name, dn_name);
>> +		__entry->refcount = refcount;
>> +	),
>> +
>> +	TP_printk("refcount=%d, dn->full_name=%s",
> 
> For version 1 of the patch, the example trace output was reformatted
> by hand because the lines were so long.  it seems that it is easy to
> create long lines and trace output, so extra effort on keeping the
> lines shorter is well spent.

It was mostly due to my desire to keep commit logs under 80 columns wide. :)

> 
> A little verbose.  I would prefer to get rid of "refcount=", ",",
> and "dn->full_name=" entirely
> 
> 
>> +		  __entry->refcount, __get_str(dn_name))
>> +);
>> +
>> +DEFINE_EVENT(of_node_ref_template, of_node_get,
>> +	     TP_PROTO(int refcount, const char *dn_name),
>> +	     TP_ARGS(refcount, dn_name));
>> +
>> +DEFINE_EVENT(of_node_ref_template, of_node_put,
>> +	     TP_PROTO(int refcount, const char *dn_name),
>> +	     TP_ARGS(refcount, dn_name));
>> +
>> +TRACE_EVENT(of_node_release,
>> +
>> +	TP_PROTO(struct device_node *dn),
>> +
>> +	TP_ARGS(dn),
>> +
>> +	TP_STRUCT__entry(
>> +		__string(dn_name, dn->full_name)
>> +		__field(unsigned long, flags)
> 
> Does not follow the alignment conventions for files in this directory.
> 
> Steve: have you dropped the conventions or are they still relevant?
> 
> 
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__assign_str(dn_name, dn->full_name);
>> +		__entry->flags = dn->_flags;
>> +	),
>> +
>> +	TP_printk("dn->full_name=%s, dn->_flags=%lu",
>> +		  __get_str(dn_name), __entry->flags)
> 
> That the first field is the node name should be obvious to the
> person reading the trace, so ditch "dn->full_name=".  One
> could argue that _flags is not so obvious, but if someone
> is looking at why of_node_release() is being called, they
> will either care about flag, in which case it will be obvious
> that the flag value is being printed, or they won't care
> about the flag value and will be ignoring it anyway.  So
> I would ditch "dn->-flags=" also> 
> 
>> +);
>> +
>> +#define of_reconfig_action_names \
>> +	{OF_RECONFIG_ATTACH_NODE, "ATTACH_NODE"}, \
>> +	{OF_RECONFIG_DETACH_NODE, "DETACH_NODE"}, \
>> +	{OF_RECONFIG_ADD_PROPERTY, "ADD_PROPERTY"}, \
>> +	{OF_RECONFIG_REMOVE_PROPERTY, "REMOVE_PROPERTY"}, \
>> +	{OF_RECONFIG_UPDATE_PROPERTY, "UPDATE_PROPERTY"}
>> +
>> +TRACE_EVENT(of_reconfig_notify,
>> +
>> +	TP_PROTO(unsigned long action, struct of_reconfig_data *ord),
>> +
>> +	TP_ARGS(action, ord),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(unsigned long, action)
>> +		__string(dn_name, ord->dn->full_name)
>> +		__string(prop_name, ord->prop ? ord->prop->name : "null")
>> +		__string(oldprop_name, ord->old_prop ? ord->old_prop->name : "null")
> 
> Does not follow the alignment conventions for files in this directory.
> 
> 
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->action = action;
>> +		__assign_str(dn_name, ord->dn->full_name);
>> +		__assign_str(prop_name, ord->prop ? ord->prop->name : "null");
>> +		__assign_str(oldprop_name, ord->old_prop ? ord->old_prop->name : "null");
>> +	),
>> +
>> +	TP_printk("action=%s, dn->full_name=%s, prop->name=%s, old_prop->name=%s",
> 
> Isn't old_prop->name the same as prop->name, if old_prop exists?  If so, drop
> it.
> 
> This is more verbose than the pr_debug() that is being replaced.  Please make
> it more compact.
> 
> 
>> +		  __print_symbolic(__entry->action, of_reconfig_action_names),
>> +		  __get_str(dn_name), __get_str(prop_name), __get_str(oldprop_name))
>> +);
>> +
>> +#endif /*	_TRACE_OF_H */
>> +
>> +/* This part must be outside protection */
>> +#include <trace/define_trace.h>
>>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
       [not found]       ` <11cf8fac-d2fe-ecdb-546f-de3c3b42a637-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-01-25 23:12         ` Wolfram Sang
  2018-01-26  0:42           ` Frank Rowand
  0 siblings, 1 reply; 25+ messages in thread
From: Wolfram Sang @ 2018-01-25 23:12 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Wolfram Sang, devicetree-u79uwXL29TY76Z2rM5mHXA, Tyrel Datwyler,
	Geert Uytterhoeven, linux-renesas-soc-u79uwXL29TY76Z2rM5mHXA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, Rob Herring, Steven Rostedt,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1624 bytes --]

Frank,

here seems to be a misunderstanding going on. I don't want to push this
patch upstream against all odds. I merely wanted to find out what the
status of this patch is. Because one possibility was that it had just
been forgotten...

> > So, I thought reposting would be a good way of finding out if your
> > concerns were addressed in the discussion or not. If I overlooked
> 
> Then you should have stated that there were concerns raised in the
> discussion and asked me if my concerns were addressed.

From my perspective, I have done that:

I marked the patch as RFC. I put you on the CC list. I asked about the
possibility of applying it. It was not very elaborate, but hey, this is
just a simple debugging patch :)

I totally would have accepted a high level "No, that won't fly
because...". Or a high level "This and that would need a change". Or
something like that. I intentionally sent this out as RFC because I know
there is some testing missing. I wanted to know if it is worth taking
further steps with this patch.

So, I simply wanted to know if you (still) have fundamental issues with
the patch? That needs to be discussed first before we go into coding
details. I think it is fine to not apply it if there are reasons. I'd
like to know them, however, for a better understanding.

For me, this is a super-super-side project, so if it causes too much
hazzle, I just leave it here and know interested people can find it
easier now. But if it could be applied with a sane amount of effort, I
was offering that.

Was that understandable?

Kind regards,

   Wolfram


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
       [not found]         ` <a30d4cd3-0068-8cca-cc2f-ca207d0d278b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-01-25 23:14           ` Wolfram Sang
  2018-01-26  0:46             ` Frank Rowand
  0 siblings, 1 reply; 25+ messages in thread
From: Wolfram Sang @ 2018-01-25 23:14 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Wolfram Sang, Steven Rostedt, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Tyrel Datwyler, Geert Uytterhoeven,
	linux-renesas-soc-u79uwXL29TY76Z2rM5mHXA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, Rob Herring,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 264 bytes --]


> This means that ftrace can not be used for the of_node_get(),
> of_node_put(), and of_node_release() debug info, because
> these functions are called before early_initcall().

For the record: You can still unbind/bind devices. This is how I
debugged an issue.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
       [not found]       ` <25f83d0d-bc67-f1b5-cadb-838987fd846d-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2018-01-25 23:31         ` Frank Rowand
  0 siblings, 0 replies; 25+ messages in thread
From: Frank Rowand @ 2018-01-25 23:31 UTC (permalink / raw)
  To: Tyrel Datwyler, Wolfram Sang, devicetree-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Steven Rostedt,
	linux-renesas-soc-u79uwXL29TY76Z2rM5mHXA, Rob Herring,
	Geert Uytterhoeven, linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ

On 01/25/18 14:40, Tyrel Datwyler wrote:
> On 01/24/2018 10:48 PM, Frank Rowand wrote:
>> On 01/21/18 06:31, Wolfram Sang wrote:
>>> From: Tyrel Datwyler <tyreld-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>>>
>>> This patch introduces event tracepoints for tracking a device_nodes
>>> reference cycle as well as reconfig notifications generated in response
>>> to node/property manipulations.
>>>
>>> With the recent upstreaming of the refcount API several device_node
>>> underflows and leaks have come to my attention in the pseries (DLPAR)
>>> dynamic logical partitioning code (ie. POWER speak for hotplugging
>>> virtual and physcial resources at runtime such as cpus or IOAs). These
>>> tracepoints provide a easy and quick mechanism for validating the
>>> reference counting of device_nodes during their lifetime.
>>>
>>> Further, when pseries lpars are migrated to a different machine we
>>> perform a live update of our device tree to bring it into alignment with
>>> the configuration of the new machine. The of_reconfig_notify trace point
>>> provides a mechanism that can be turned for debuging the device tree
>>> modifications with out having to build a custom kernel to get at the
>>> DEBUG code introduced by commit 00aa37206e1a54 ("of/reconfig: Add debug
>>> output for OF_RECONFIG notifiers").
>>>
>>> The following trace events are provided: of_node_get, of_node_put,
>>> of_node_release, and of_reconfig_notify. These trace points require a
>>
>> Please add a note that the of_reconfig_notify trace event is not an
>> added bit of debug info, but is instead replacing information that
>> was previously available via pr_debug() when DEBUG was defined.
>>
>>
>>> kernel built with ftrace support to be enabled. In a typical environment
>>> where debugfs is mounted at /sys/kernel/debug the entire set of
>>> tracepoints can be set with the following:
>>>
>>>   echo "of:*" > /sys/kernel/debug/tracing/set_event
>>>
>>> or
>>>
>>>   echo 1 > /sys/kernel/debug/tracing/events/of/enable
>>>
>>> The following shows the trace point data from a DLPAR remove of a cpu
>>> from a pseries lpar:
>>>
>>> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"
>>>
>>> cpuhp/23-147   [023] ....   128.324827:
>>>         of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
>>> cpuhp/23-147   [023] ....   128.324829:
>>>         of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
>>> cpuhp/23-147   [023] ....   128.324829:
>>>         of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
>>> cpuhp/23-147   [023] ....   128.324831:
>>>         of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>>>    drmgr-7284  [009] ....   128.439000:
>>>         of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>>>    drmgr-7284  [009] ....   128.439002:
>>>         of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
>>>                             prop->name=null, old_prop->name=null
>>>    drmgr-7284  [009] ....   128.439015:
>>>         of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>>>    drmgr-7284  [009] ....   128.439016:
>>>         of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
>>>
>>> Signed-off-by: Tyrel Datwyler <tyreld-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>>
>> The following belongs in a list of version 2 changes, below the "---" line:
>>
>>> [wsa: fixed commit abbrev and one of the sysfs paths in commit desc,
>>> removed trailing space and fixed pointer declaration in code]
>>
>>> Signed-off-by: Wolfram Sang <wsa+renesas-jBu1N2QxHDJrcw3mvpCnnVaTQe2KTcn/@public.gmane.org>
>>> ---
>>>  drivers/of/dynamic.c      | 32 ++++++----------
>>>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 105 insertions(+), 20 deletions(-)
>>>  create mode 100644 include/trace/events/of.h
>>
>> mode looks incorrect.  Existing files in include/trace/events/ are -rw-rw----
>>
>>
>>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>>> index ab988d88704da0..b0d6ab5a35b8c6 100644
>>> --- a/drivers/of/dynamic.c
>>> +++ b/drivers/of/dynamic.c
>>> @@ -21,6 +21,9 @@ static struct device_node *kobj_to_device_node(struct kobject *kobj)
>>>  	return container_of(kobj, struct device_node, kobj);
>>>  }
>>>  
>>> +#define CREATE_TRACE_POINTS
>>> +#include <trace/events/of.h>
>>> +
>>>  /**
>>>   * of_node_get() - Increment refcount of a node
>>>   * @node:	Node to inc refcount, NULL is supported to simplify writing of
>>> @@ -30,8 +33,10 @@ static struct device_node *kobj_to_device_node(struct kobject *kobj)
>>>   */
>>>  struct device_node *of_node_get(struct device_node *node)
>>>  {
>>> -	if (node)
>>> +	if (node) {
>>>  		kobject_get(&node->kobj);
>>> +		trace_of_node_get(refcount_read(&node->kobj.kref.refcount), node->full_name);
>>
>> See the comment from Ron that I mentioned in my previous email.
>>
>> Also, the path has been removed from node->full_name.  Does using it here
>> still give all of the information that is desired?  Same for all others uses
>> of full_name in this patch.
>>
>> The trace point should have a single argument, node.  Accessing the two
>> fields can be done in the tracepoint assignment.  Or is there some
>> reason that can't be done?  Same for the trace_of_node_put() tracepoint.
> 
> I originally was just passing the node, but added the refcount as an argument to deal with trace_of_node_put(). See my comment there for the reasoning.

In that case I would remove the "- 1" in of_node_put()
(which I'll talk about in general terms below -- and I
wrote the comment below before writing this paragraph)
or change the of_node_get and of_node_put trace events
to not depend upon the common of_node_ref_template.

Anyway, this is somewhat academic, since I noted in
another email in a different part of this thread that
there is a need to be able to debug of_node_get(),
of_node_put(), and of_node_release() earlier in the
boot than ftrace is available, so trace points are
not the answer in those functions.

>>
>>
>>> +	}
>>>  	return node;
>>>  }
>>>  EXPORT_SYMBOL(of_node_get);
>>> @@ -43,8 +48,10 @@ EXPORT_SYMBOL(of_node_get);
>>>   */
>>>  void of_node_put(struct device_node *node)
>>>  {
>>> -	if (node)
>>> +	if (node) {
>>> +		trace_of_node_put(refcount_read(&node->kobj.kref.refcount) - 1, node->full_name);
>>
>> If this is moved down one line to after kobject_put(), then no need
>> to subtract 1.
> 
> So, the problem here is that if this is the last put then the node structure will be destroyed by of_node_release before we trace the of_node_put and we get no useful tracing information about the device node. Further, you see an out of order trace with the of_node_release preceding the final of_node_put. So, I opted for the somewhat ugly implementation you see where we do the subtraction ahead of time to yield what the refcount is about to be, and pass both the refcount and the node.

Excellent point.  The tracepoint does need to be before the kobject_put().
I don't normally like in function comments, but this would be an
exception - there should be a comment noting that the node might
be freed by the kobject_put(), so trace first.

I could argue with myself as to whether the count value should be before
or after it is decremented.  I understand why both ways are valid.
I would get rid of the "- 1", even if traced before the kobject_put(),
but I won't insist on the change.


>>
>>
>>>  		kobject_put(&node->kobj);
>>> +	}
>>>  }
>>>  EXPORT_SYMBOL(of_node_put);
>>>  
>>> @@ -75,24 +82,7 @@ const char *action_names[] = {
>>>  int of_reconfig_notify(unsigned long action, struct of_reconfig_data *p)
>>>  {
>>>  	int rc;
>>> -#ifdef DEBUG
>>> -	struct of_reconfig_data *pr = p;
>>> -
>>> -	switch (action) {
>>> -	case OF_RECONFIG_ATTACH_NODE:
>>> -	case OF_RECONFIG_DETACH_NODE:
>>> -		pr_debug("notify %-15s %pOF\n", action_names[action],
>>> -			pr->dn);
>>> -		break;
>>> -	case OF_RECONFIG_ADD_PROPERTY:
>>> -	case OF_RECONFIG_REMOVE_PROPERTY:
>>> -	case OF_RECONFIG_UPDATE_PROPERTY:
>>> -		pr_debug("notify %-15s %pOF:%s\n", action_names[action],
>>> -			pr->dn, pr->prop->name);
>>> -		break;
>>> -
>>> -	}
>>> -#endif
>>> +	trace_of_reconfig_notify(action, p);
>>>  	rc = blocking_notifier_call_chain(&of_reconfig_chain, action, p);
>>>  	return notifier_to_errno(rc);
>>>  }
>>> @@ -320,6 +310,8 @@ void of_node_release(struct kobject *kobj)
>>>  {
>>>  	struct device_node *node = kobj_to_device_node(kobj);
>>>  
>>> +	trace_of_node_release(node);
>>> +
>>>  	/* We should never be releasing nodes that haven't been detached. */
>>>  	if (!of_node_check_flag(node, OF_DETACHED)) {
>>>  		pr_err("ERROR: Bad of_node_put() on %pOF\n", node);
>>> diff --git a/include/trace/events/of.h b/include/trace/events/of.h
>>> new file mode 100644
>>> index 00000000000000..e8b1302a6f0129
>>> --- /dev/null
>>> +++ b/include/trace/events/of.h
>>> @@ -0,0 +1,93 @@
>>> +#undef TRACE_SYSTEM
>>> +#define TRACE_SYSTEM of
>>> +
>>> +#if !defined(_TRACE_OF_H) || defined(TRACE_HEADER_MULTI_READ)
>>> +#define _TRACE_OF_H
>>> +
>>> +#include <linux/of.h>
>>> +#include <linux/tracepoint.h>
>>> +
>>> +DECLARE_EVENT_CLASS(of_node_ref_template,
>>> +
>>> +	TP_PROTO(int refcount, const char *dn_name),
>>> +
>>> +	TP_ARGS(refcount, dn_name),
>>> +
>>> +	TP_STRUCT__entry(
>>> +		__string(dn_name, dn_name)
>>> +		__field(int, refcount)
>>> +	),
>>> +
>>> +	TP_fast_assign(
>>> +		__assign_str(dn_name, dn_name);
>>> +		__entry->refcount = refcount;
>>> +	),
>>> +
>>> +	TP_printk("refcount=%d, dn->full_name=%s",
>>
>> For version 1 of the patch, the example trace output was reformatted
>> by hand because the lines were so long.  it seems that it is easy to
>> create long lines and trace output, so extra effort on keeping the
>> lines shorter is well spent.
> 
> It was mostly due to my desire to keep commit logs under 80 columns wide. :)
My issue has nothing to do with the commit log.  The commit log simply helped
point a finger at the issue.

My issue has to do with the length of the line in the trace data.  Longer
lines and extraneous text in the data make data analysis messier.  Both
for humans looking at the data and for tools parsing the data.


>>
>> A little verbose.  I would prefer to get rid of "refcount=", ",",
>> and "dn->full_name=" entirely
>>
>>
>>> +		  __entry->refcount, __get_str(dn_name))
>>> +);
>>> +
>>> +DEFINE_EVENT(of_node_ref_template, of_node_get,
>>> +	     TP_PROTO(int refcount, const char *dn_name),
>>> +	     TP_ARGS(refcount, dn_name));
>>> +
>>> +DEFINE_EVENT(of_node_ref_template, of_node_put,
>>> +	     TP_PROTO(int refcount, const char *dn_name),
>>> +	     TP_ARGS(refcount, dn_name));
>>> +
>>> +TRACE_EVENT(of_node_release,
>>> +
>>> +	TP_PROTO(struct device_node *dn),
>>> +
>>> +	TP_ARGS(dn),
>>> +
>>> +	TP_STRUCT__entry(
>>> +		__string(dn_name, dn->full_name)
>>> +		__field(unsigned long, flags)
>>
>> Does not follow the alignment conventions for files in this directory.
>>
>> Steve: have you dropped the conventions or are they still relevant?
>>
>>
>>> +	),
>>> +
>>> +	TP_fast_assign(
>>> +		__assign_str(dn_name, dn->full_name);
>>> +		__entry->flags = dn->_flags;
>>> +	),
>>> +
>>> +	TP_printk("dn->full_name=%s, dn->_flags=%lu",
>>> +		  __get_str(dn_name), __entry->flags)
>>
>> That the first field is the node name should be obvious to the
>> person reading the trace, so ditch "dn->full_name=".  One
>> could argue that _flags is not so obvious, but if someone
>> is looking at why of_node_release() is being called, they
>> will either care about flag, in which case it will be obvious
>> that the flag value is being printed, or they won't care
>> about the flag value and will be ignoring it anyway.  So
>> I would ditch "dn->-flags=" also> 
>>
>>> +);
>>> +
>>> +#define of_reconfig_action_names \
>>> +	{OF_RECONFIG_ATTACH_NODE, "ATTACH_NODE"}, \
>>> +	{OF_RECONFIG_DETACH_NODE, "DETACH_NODE"}, \
>>> +	{OF_RECONFIG_ADD_PROPERTY, "ADD_PROPERTY"}, \
>>> +	{OF_RECONFIG_REMOVE_PROPERTY, "REMOVE_PROPERTY"}, \
>>> +	{OF_RECONFIG_UPDATE_PROPERTY, "UPDATE_PROPERTY"}
>>> +
>>> +TRACE_EVENT(of_reconfig_notify,
>>> +
>>> +	TP_PROTO(unsigned long action, struct of_reconfig_data *ord),
>>> +
>>> +	TP_ARGS(action, ord),
>>> +
>>> +	TP_STRUCT__entry(
>>> +		__field(unsigned long, action)
>>> +		__string(dn_name, ord->dn->full_name)
>>> +		__string(prop_name, ord->prop ? ord->prop->name : "null")
>>> +		__string(oldprop_name, ord->old_prop ? ord->old_prop->name : "null")
>>
>> Does not follow the alignment conventions for files in this directory.
>>
>>
>>> +	),
>>> +
>>> +	TP_fast_assign(
>>> +		__entry->action = action;
>>> +		__assign_str(dn_name, ord->dn->full_name);
>>> +		__assign_str(prop_name, ord->prop ? ord->prop->name : "null");
>>> +		__assign_str(oldprop_name, ord->old_prop ? ord->old_prop->name : "null");
>>> +	),
>>> +
>>> +	TP_printk("action=%s, dn->full_name=%s, prop->name=%s, old_prop->name=%s",
>>
>> Isn't old_prop->name the same as prop->name, if old_prop exists?  If so, drop
>> it.
>>
>> This is more verbose than the pr_debug() that is being replaced.  Please make
>> it more compact.
>>
>>
>>> +		  __print_symbolic(__entry->action, of_reconfig_action_names),
>>> +		  __get_str(dn_name), __get_str(prop_name), __get_str(oldprop_name))
>>> +);
>>> +
>>> +#endif /*	_TRACE_OF_H */
>>> +
>>> +/* This part must be outside protection */
>>> +#include <trace/define_trace.h>
>>>
>>
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-25 21:49       ` Frank Rowand
       [not found]         ` <a30d4cd3-0068-8cca-cc2f-ca207d0d278b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-01-25 23:53         ` Tyrel Datwyler
  2018-01-26  1:08           ` Frank Rowand
  1 sibling, 1 reply; 25+ messages in thread
From: Tyrel Datwyler @ 2018-01-25 23:53 UTC (permalink / raw)
  To: Frank Rowand, Wolfram Sang
  Cc: devicetree, linux-kernel, Steven Rostedt, linux-renesas-soc,
	Rob Herring, Geert Uytterhoeven, linuxppc-dev

On 01/25/2018 01:49 PM, Frank Rowand wrote:
> Hi Wolfram,
> 
> On 01/25/18 03:03, Steven Rostedt wrote:
>> On Wed, 24 Jan 2018 22:55:13 -0800
>> Frank Rowand <frowand.list@gmail.com> wrote:
>>
>>> Hi Steve,
>>
>>>
>>> Off the top of your head, can you tell me know early in the boot
>>> process a trace_event can be called and successfully provide the
>>> data to someone trying to debug early boot issues?
>>
>> The trace events are enabled by early_initcall().
> 
> < snip >
> 
> This means that ftrace can not be used for the of_node_get(),
> of_node_put(), and of_node_release() debug info, because
> these functions are called before early_initcall().  Please
> use pr_debug() for these functions.

I would argue that early boot debugging doesn't completely negate the usefulness of this tracing infrastructure. I get that no information is available in the trace up until ftrace is setup by its early_initcall, but I still found issues  after early boot using this patch and I would hope that it would be somewhat obvious if references are out of whack once the ftrace data becomes available. In the dynamic case on Power we often do reconfig well after boot on live systems which produces a lot of reference put/gets. This patch made it easy to identify several reference leaks and underflows in our attach and detach logic with the added aid of being able to turn on the stacktrace for each call in the ftrace data.

Another thought is it would be nice if we could have the best of both worlds such that the tracepoints were pr_debugs up until the ftrace early_initcall. Or, I suppose we could ifdef it and make the ftrace tracepoints a configuration option, such that if it wasn't configured we implement the tracepoint functions as pr_debugs. This makes early boot an option. Just spit balling ideas.

-Tyrel

> 
> As far as I know, the of_reconfig_notify() could remain an
> ftrace instrumented function.  But now that the only thing
> that would be ftrace instrumented is of_reconfig_notify(),
> I don't see a strong justification for changing the existing
> pr_debug() calls to an ftrace alternative.  Though I suspect
> the original author of the patch still might desire to have
> the "#ifdef DEBUG" surrounding the pr_debug() calls removed
> since one of his issues was having to recompile his kernel
> to do his debugging.
> 
> -Frank
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-25 23:12         ` Wolfram Sang
@ 2018-01-26  0:42           ` Frank Rowand
  0 siblings, 0 replies; 25+ messages in thread
From: Frank Rowand @ 2018-01-26  0:42 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: Wolfram Sang, devicetree, Tyrel Datwyler, Geert Uytterhoeven,
	linux-renesas-soc, linuxppc-dev, Rob Herring, Steven Rostedt,
	linux-kernel

On 01/25/18 15:12, Wolfram Sang wrote:
> Frank,
> 
> here seems to be a misunderstanding going on. I don't want to push this
> patch upstream against all odds. I merely wanted to find out what the
> status of this patch is. Because one possibility was that it had just
> been forgotten...
> 
>>> So, I thought reposting would be a good way of finding out if your
>>> concerns were addressed in the discussion or not. If I overlooked

Marking a patch as RFC (as you mention just below here) is very different
than explicitly stating something like: Frank, you had concerns with
version 1, were your concerns addressed in the thread about version 1?

You mention below that adding RFC to the title was providing the
information that I needed.  I am saying that the communication was
not clear, was implied at best, and should have be more explicit.

I actually didn't even notice that the patch title had changed from
not an RFC, to being an RFC, so the subtle clue went right over my
head.  I just treated it as I would any RFC patch.


>> Then you should have stated that there were concerns raised in the
>> discussion and asked me if my concerns were addressed.
> 
> From my perspective, I have done that:
> 
> I marked the patch as RFC. I put you on the CC list. I asked about the
> possibility of applying it. It was not very elaborate, but hey, this is
> just a simple debugging patch :)

After reading through the original patch thread, I did not think that
the issues raised had been fully addressed.  You read the thread and
thought they had.  No big deal on coming to different conclusions.

I think you and I are talking past each other a little bit.  My
comments in the email you are responding to are because I don't
think that the previous emails have been as clear as you think.
I could read between the lines and see how you think that you
were being clear.  But from my perspective, I'm asking for more
explicit statements and less implied statements.

My first real response (the response that pointed out that Rob had
made an observation / suggestion that was not responded to) was coming
from the perspective that the issues in the first thread had not been
fully addressed.

As I was writing that response, I felt that I might as well do a review,
even though I thought the previous thread was dangling.  Which led to a
lot of issues and a few more emails pointing them out.


> I totally would have accepted a high level "No, that won't fly
> because...". Or a high level "This and that would need a change". Or
> something like that. I intentionally sent this out as RFC because I know
> there is some testing missing. I wanted to know if it is worth taking
> further steps with this patch.
> 
> So, I simply wanted to know if you (still) have fundamental issues with
> the patch?

It would have been good if you had simply stated so in exactly those
words.  I did not read the original email as saying that.  I read
the original email as saying (my paraphrase) "please apply it".  You
did _not_ use the words that I paraphrased, you actually said
"So what about applying it?".  I misunderstood what you were trying
to say.  I apologize for that.


> That needs to be discussed first before we go into coding
> details. I think it is fine to not apply it if there are reasons. I'd
> like to know them, however, for a better understanding.

Too late now. :-)  I've already done the reviewing and provided all
of the reasons that are show stoppers for the patch and how to fix.

One thing that I did not mention in this thread  is that I have an
aversion to using ftrace for what is purely debugging data (which
this is) because there is a danger that trace points become user
space ABI.  That is a whole long discussion that we do not have to
have because I am not subjecting this patch to that objection.


> For me, this is a super-super-side project, so if it causes too much
> hazzle, I just leave it here and know interested people can find it
> easier now. But if it could be applied with a sane amount of effort, I
> was offering that.
> 
> Was that understandable?

I think so.  And in return?  We can always talk more at the next
conference if you want.

-Frank

> 
> Kind regards,
> 
>    Wolfram
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-25 23:14           ` Wolfram Sang
@ 2018-01-26  0:46             ` Frank Rowand
  2018-02-01 17:10               ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Frank Rowand @ 2018-01-26  0:46 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: Wolfram Sang, Steven Rostedt, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Tyrel Datwyler, Geert Uytterhoeven,
	linux-renesas-soc-u79uwXL29TY76Z2rM5mHXA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, Rob Herring,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 01/25/18 15:14, Wolfram Sang wrote:
> 
>> This means that ftrace can not be used for the of_node_get(),
>> of_node_put(), and of_node_release() debug info, because
>> these functions are called before early_initcall().
> 
> For the record: You can still unbind/bind devices. This is how I
> debugged an issue.

I wasn't implying that the data wasn't usable for any use case.

The point is that using ftrace means there are use cases for the
debug information where the information will not be available.

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-25 23:53         ` Tyrel Datwyler
@ 2018-01-26  1:08           ` Frank Rowand
  0 siblings, 0 replies; 25+ messages in thread
From: Frank Rowand @ 2018-01-26  1:08 UTC (permalink / raw)
  To: Tyrel Datwyler, Wolfram Sang
  Cc: devicetree, linux-kernel, Steven Rostedt, linux-renesas-soc,
	Rob Herring, Geert Uytterhoeven, linuxppc-dev

On 01/25/18 15:53, Tyrel Datwyler wrote:
> On 01/25/2018 01:49 PM, Frank Rowand wrote:
>> Hi Wolfram,
>>
>> On 01/25/18 03:03, Steven Rostedt wrote:
>>> On Wed, 24 Jan 2018 22:55:13 -0800
>>> Frank Rowand <frowand.list@gmail.com> wrote:
>>>
>>>> Hi Steve,
>>>
>>>>
>>>> Off the top of your head, can you tell me know early in the boot
>>>> process a trace_event can be called and successfully provide the
>>>> data to someone trying to debug early boot issues?
>>>
>>> The trace events are enabled by early_initcall().
>>
>> < snip >
>>
>> This means that ftrace can not be used for the of_node_get(),
>> of_node_put(), and of_node_release() debug info, because
>> these functions are called before early_initcall().  Please
>> use pr_debug() for these functions.
> 
> I would argue that early boot debugging doesn't completely negate the
> usefulness of this tracing infrastructure.

I did not say or imply that it did.  I am pointing out that this
implementation does not meet the needs of other use cases.  And
potentially provides misleading information (or more precisely
misleading lack of information) in some other use cases.


> I get that no information
> is available in the trace up until ftrace is setup by its
> early_initcall, but I still found issues after early boot using this
> patch and I would hope that it would be somewhat obvious if
> references are out of whack once the ftrace data becomes available.
> In the dynamic case on Power we often do reconfig well after boot on
> live systems which produces a lot of reference put/gets. This patch
> made it easy to identify several reference leaks and underflows in
> our attach and detach logic with the added aid of being able to turn
> on the stacktrace for each call in the ftrace data.

Yes, you can get stacktraces relatively easily.  This is the strongest
argument for using ftrace.

My assumption has been that the stack trace is useful for of_node_get()
and of_node_put().  Is there _large_ value to the stack trace for
of_reconfig_notify()?


> Another thought is it would be nice if we could have the best of both
> worlds such that the tracepoints were pr_debugs up until the ftrace
> early_initcall. Or, I suppose we could ifdef it and make the ftrace
> tracepoints a configuration option, such that if it wasn't configured
> we implement the tracepoint functions as pr_debugs. This makes early
> boot an option. Just spit balling ideas.

An overly complex solution.  This is just debug.  Worst case alternative
is that the patches live on, out of tree.  So nope.


> 
> -Tyrel
> 
>>
>> As far as I know, the of_reconfig_notify() could remain an
>> ftrace instrumented function.  But now that the only thing
>> that would be ftrace instrumented is of_reconfig_notify(),
>> I don't see a strong justification for changing the existing
>> pr_debug() calls to an ftrace alternative.  Though I suspect
>> the original author of the patch still might desire to have
>> the "#ifdef DEBUG" surrounding the pr_debug() calls removed
>> since one of his issues was having to recompile his kernel
>> to do his debugging.
>>
>> -Frank
>>
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
  2018-01-26  0:46             ` Frank Rowand
@ 2018-02-01 17:10               ` Steven Rostedt
  0 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2018-02-01 17:10 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Wolfram Sang, Wolfram Sang, devicetree, Tyrel Datwyler,
	Geert Uytterhoeven, linux-renesas-soc, linuxppc-dev, Rob Herring,
	linux-kernel

On Thu, 25 Jan 2018 16:46:06 -0800
Frank Rowand <frowand.list@gmail.com> wrote:

> The point is that using ftrace means there are use cases for the
> debug information where the information will not be available.

Note, this email came out when I was traveling. I'm now looking at the
code and trace events are enabled right after rcu_init() in init/main.c.

The early_initcall() enables syscall events(), as they are not
available earlier.

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-02-01 17:10 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-21 14:31 [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues Wolfram Sang
2018-01-21 14:31 ` [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle Wolfram Sang
2018-01-21 22:01   ` Steven Rostedt
2018-01-25  6:48   ` Frank Rowand
2018-01-25  6:58     ` Frank Rowand
     [not found]     ` <4cc627c4-1aaf-b8d5-5a26-eea7e596743f-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-01-25  8:01       ` Geert Uytterhoeven
2018-01-25 22:40     ` Tyrel Datwyler
     [not found]       ` <25f83d0d-bc67-f1b5-cadb-838987fd846d-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2018-01-25 23:31         ` Frank Rowand
2018-01-25  8:32   ` Frank Rowand
2018-01-22  8:43 ` [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues Frank Rowand
2018-01-22 11:49   ` Wolfram Sang
2018-01-23 12:11     ` Michael Ellerman
2018-01-23 19:53       ` Frank Rowand
2018-01-25  6:47     ` Frank Rowand
     [not found]       ` <11cf8fac-d2fe-ecdb-546f-de3c3b42a637-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-01-25 23:12         ` Wolfram Sang
2018-01-26  0:42           ` Frank Rowand
2018-01-25  6:48 ` Frank Rowand
     [not found] ` <20180121143117.19805-1-wsa+renesas-jBu1N2QxHDJrcw3mvpCnnVaTQe2KTcn/@public.gmane.org>
2018-01-25  6:55   ` Frank Rowand
2018-01-25 11:03     ` Steven Rostedt
2018-01-25 21:49       ` Frank Rowand
     [not found]         ` <a30d4cd3-0068-8cca-cc2f-ca207d0d278b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-01-25 23:14           ` Wolfram Sang
2018-01-26  0:46             ` Frank Rowand
2018-02-01 17:10               ` Steven Rostedt
2018-01-25 23:53         ` Tyrel Datwyler
2018-01-26  1:08           ` Frank Rowand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).