All of lore.kernel.org
 help / color / mirror / Atom feed
* OF_DYNAMIC node lifecycle
@ 2014-06-18 20:07 ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-18 20:07 UTC (permalink / raw)
  To: Nathan Fontenot, Tyrel Datwyler
  Cc: Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Pantelis Antoniou

Hi Nathan and Tyrel,

I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
I'm hoping you can help me. Right now, pseries seems to be the only
user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
the entire kernel because it requires all DT code to manage reference
counting with iterating over nodes. Most users simply get it wrong.
Pantelis did some investigation and found that the reference counts on
a running kernel are all over the place. I have my doubts that any
code really gets it right.

The problem is that users need to know when it is appropriate to call
of_node_get()/of_node_put(). All list traversals that exit early need
an extra call to of_node_put(), and code that is searching for a node
in the tree and holding a reference to it needs to call of_node_get().

I've got a few pseries questions:
- What are the changes being requested by pseries firmware? Is it only
CPUs and memory nodes, or does it manipulate things all over the tree?
- How frequent are the changes? How many changes would be likely over
the runtime of the system?
- Are you able to verify that removed nodes are actually able to be
freed correctly? Do you have any testcases for node removal?

I'm thinking very seriously about changing the locking semantics of DT
code entirely so that most users never have to worry about
of_node_get/put at all. If the DT code is switched to use rcu
primitives for tree iteration (which also means making DT code use
list_head, something I'm already investigating), then instead of
trying to figure out of_node_get/put rules, callers could use
rcu_read_lock()/rcu_read_unlock() to protect the region that is
searching over nodes, and only call of_node_get() if the node pointer
is needed outside the rcu read-side lock.

I'd really like to be rid of the node reference counting entirely, but
I can't figure out a way of doing that safely, so I'd settle for
making it a lot easier to get correct.

Thoughts?

g.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* OF_DYNAMIC node lifecycle
@ 2014-06-18 20:07 ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-18 20:07 UTC (permalink / raw)
  To: Nathan Fontenot, Tyrel Datwyler
  Cc: devicetree, Pantelis Antoniou, linuxppc-dev

Hi Nathan and Tyrel,

I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
I'm hoping you can help me. Right now, pseries seems to be the only
user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
the entire kernel because it requires all DT code to manage reference
counting with iterating over nodes. Most users simply get it wrong.
Pantelis did some investigation and found that the reference counts on
a running kernel are all over the place. I have my doubts that any
code really gets it right.

The problem is that users need to know when it is appropriate to call
of_node_get()/of_node_put(). All list traversals that exit early need
an extra call to of_node_put(), and code that is searching for a node
in the tree and holding a reference to it needs to call of_node_get().

I've got a few pseries questions:
- What are the changes being requested by pseries firmware? Is it only
CPUs and memory nodes, or does it manipulate things all over the tree?
- How frequent are the changes? How many changes would be likely over
the runtime of the system?
- Are you able to verify that removed nodes are actually able to be
freed correctly? Do you have any testcases for node removal?

I'm thinking very seriously about changing the locking semantics of DT
code entirely so that most users never have to worry about
of_node_get/put at all. If the DT code is switched to use rcu
primitives for tree iteration (which also means making DT code use
list_head, something I'm already investigating), then instead of
trying to figure out of_node_get/put rules, callers could use
rcu_read_lock()/rcu_read_unlock() to protect the region that is
searching over nodes, and only call of_node_get() if the node pointer
is needed outside the rcu read-side lock.

I'd really like to be rid of the node reference counting entirely, but
I can't figure out a way of doing that safely, so I'd settle for
making it a lot easier to get correct.

Thoughts?

g.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-18 20:07 ` Grant Likely
@ 2014-06-19  8:33     ` Pantelis Antoniou
  -1 siblings, 0 replies; 48+ messages in thread
From: Pantelis Antoniou @ 2014-06-19  8:33 UTC (permalink / raw)
  To: Grant Likely
  Cc: Nathan Fontenot, Tyrel Datwyler, Benjamin Herrenschmidt,
	linuxppc-dev, devicetree-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner,
	Steven Rostedt

Hi Grant,

CCing Thomas Gleixner & Steven Rostedt, since they might have a few
ideas...

On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:

> Hi Nathan and Tyrel,
> 
> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> I'm hoping you can help me. Right now, pseries seems to be the only
> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> the entire kernel because it requires all DT code to manage reference
> counting with iterating over nodes. Most users simply get it wrong.
> Pantelis did some investigation and found that the reference counts on
> a running kernel are all over the place. I have my doubts that any
> code really gets it right.
> 
> The problem is that users need to know when it is appropriate to call
> of_node_get()/of_node_put(). All list traversals that exit early need
> an extra call to of_node_put(), and code that is searching for a node
> in the tree and holding a reference to it needs to call of_node_get().
> 

In hindsight it appears that drivers just can't get the lifecycle right.
So we need to simplify things.

> I've got a few pseries questions:
> - What are the changes being requested by pseries firmware? Is it only
> CPUs and memory nodes, or does it manipulate things all over the tree?
> - How frequent are the changes? How many changes would be likely over
> the runtime of the system?
> - Are you able to verify that removed nodes are actually able to be
> freed correctly? Do you have any testcases for node removal?
> 
> I'm thinking very seriously about changing the locking semantics of DT
> code entirely so that most users never have to worry about
> of_node_get/put at all. If the DT code is switched to use rcu
> primitives for tree iteration (which also means making DT code use
> list_head, something I'm already investigating), then instead of
> trying to figure out of_node_get/put rules, callers could use
> rcu_read_lock()/rcu_read_unlock() to protect the region that is
> searching over nodes, and only call of_node_get() if the node pointer
> is needed outside the rcu read-side lock.
> 
> I'd really like to be rid of the node reference counting entirely, but
> I can't figure out a way of doing that safely, so I'd settle for
> making it a lot easier to get correct.
> 

Since we're going about changing things, how about that devtree_lock?

We're using a raw_spinlock and we're always taking the lock with
interrupts disabled.

If we're going to make DT changes frequently during normal runtime
and not only during boot time, those are bad for any kind of real-time
performance.

So the question is, do we really have code that access the live tree
during atomic sections? Is that something we want? Enforcing this
will make our lives easier, and we'll get the change to replace
that spinlock with a mutex.

 
> Thoughts?
> 
> g.

Regards

-- Pantelis

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-19  8:33     ` Pantelis Antoniou
  0 siblings, 0 replies; 48+ messages in thread
From: Pantelis Antoniou @ 2014-06-19  8:33 UTC (permalink / raw)
  To: Grant Likely
  Cc: devicetree, Steven Rostedt, linuxppc-dev, Tyrel Datwyler,
	Thomas Gleixner

Hi Grant,

CCing Thomas Gleixner & Steven Rostedt, since they might have a few
ideas...

On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:

> Hi Nathan and Tyrel,
> 
> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> I'm hoping you can help me. Right now, pseries seems to be the only
> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> the entire kernel because it requires all DT code to manage reference
> counting with iterating over nodes. Most users simply get it wrong.
> Pantelis did some investigation and found that the reference counts on
> a running kernel are all over the place. I have my doubts that any
> code really gets it right.
> 
> The problem is that users need to know when it is appropriate to call
> of_node_get()/of_node_put(). All list traversals that exit early need
> an extra call to of_node_put(), and code that is searching for a node
> in the tree and holding a reference to it needs to call of_node_get().
> 

In hindsight it appears that drivers just can't get the lifecycle right.
So we need to simplify things.

> I've got a few pseries questions:
> - What are the changes being requested by pseries firmware? Is it only
> CPUs and memory nodes, or does it manipulate things all over the tree?
> - How frequent are the changes? How many changes would be likely over
> the runtime of the system?
> - Are you able to verify that removed nodes are actually able to be
> freed correctly? Do you have any testcases for node removal?
> 
> I'm thinking very seriously about changing the locking semantics of DT
> code entirely so that most users never have to worry about
> of_node_get/put at all. If the DT code is switched to use rcu
> primitives for tree iteration (which also means making DT code use
> list_head, something I'm already investigating), then instead of
> trying to figure out of_node_get/put rules, callers could use
> rcu_read_lock()/rcu_read_unlock() to protect the region that is
> searching over nodes, and only call of_node_get() if the node pointer
> is needed outside the rcu read-side lock.
> 
> I'd really like to be rid of the node reference counting entirely, but
> I can't figure out a way of doing that safely, so I'd settle for
> making it a lot easier to get correct.
> 

Since we're going about changing things, how about that devtree_lock?

We're using a raw_spinlock and we're always taking the lock with
interrupts disabled.

If we're going to make DT changes frequently during normal runtime
and not only during boot time, those are bad for any kind of real-time
performance.

So the question is, do we really have code that access the live tree
during atomic sections? Is that something we want? Enforcing this
will make our lives easier, and we'll get the change to replace
that spinlock with a mutex.

 
> Thoughts?
> 
> g.

Regards

-- Pantelis

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-18 20:07 ` Grant Likely
@ 2014-06-19 15:26     ` Nathan Fontenot
  -1 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-19 15:26 UTC (permalink / raw)
  To: Grant Likely, Tyrel Datwyler
  Cc: Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Pantelis Antoniou

On 06/18/2014 03:07 PM, Grant Likely wrote:
> Hi Nathan and Tyrel,
> 
> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> I'm hoping you can help me. Right now, pseries seems to be the only
> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> the entire kernel because it requires all DT code to manage reference
> counting with iterating over nodes. Most users simply get it wrong.
> Pantelis did some investigation and found that the reference counts on
> a running kernel are all over the place. I have my doubts that any
> code really gets it right.
> 
> The problem is that users need to know when it is appropriate to call
> of_node_get()/of_node_put(). All list traversals that exit early need
> an extra call to of_node_put(), and code that is searching for a node
> in the tree and holding a reference to it needs to call of_node_get().
> 
> I've got a few pseries questions:
> - What are the changes being requested by pseries firmware? Is it only
> CPUs and memory nodes, or does it manipulate things all over the tree?

The short answer, everything.

For pseries the two big actions that can change the device tree are
adding/removing resources and partition migration.

The most frequent updates to the device tree happen during resource
(cpu, memory, and pci/phb) add and remove. During this process we add
and remove the node and its properties from the device tree.
- For memory on newer systems this just involves updating the
  ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
  firmware levels add and remove the memroy@XXX nodes and their properties.
- For cpus the cpus/PowerPC,POWERXXXX nodes and its properties are added
  or removed
- For pci/phb the pci@XXXXX nodes and properties are added/removed.

The less frequent operation of live partition migration (and suspend/resume)
can update just about anything in the device tree. When this occurs and the
systems starts after being migrated (or waking up after a suspend) we make
a call to firmware to get updates to the device tree for the new hardware
we are running on.
 
> - How frequent are the changes? How many changes would be likely over
> the runtime of the system?

This can happen frequently.

> - Are you able to verify that removed nodes are actually able to be
> freed correctly? Do you have any testcases for node removal?

I have always tested this by doing resource add/remove, usually cpu and memory
since it is the easiest.

> 
> I'm thinking very seriously about changing the locking semantics of DT
> code entirely so that most users never have to worry about
> of_node_get/put at all. If the DT code is switched to use rcu
> primitives for tree iteration (which also means making DT code use
> list_head, something I'm already investigating), then instead of
> trying to figure out of_node_get/put rules, callers could use
> rcu_read_lock()/rcu_read_unlock() to protect the region that is
> searching over nodes, and only call of_node_get() if the node pointer
> is needed outside the rcu read-side lock.
> 

This sounds good. I like just taking the rcu lock around accessing the DT.
Do we have many places where DT node pointers are held that require
keeping the of_node_get/put calls? If this did exist perhaps we could
update those places to look up the DT node every time instead of
holding on to the pointer. We could just get rid of the reference counting
altogether then.

> I'd really like to be rid of the node reference counting entirely, but
> I can't figure out a way of doing that safely, so I'd settle for
> making it a lot easier to get correct.
> 

heh! I have often thought about adding reference counting to device tree
properties. I don't really want to but there are some properties that can
get updated frequently (namely the one mentioned above for memory) that
can also get pretty big, especially on systems with a lot of memory. We
never free the memory for old versions of a device tree property. This is
a pretty minor issue though and probably best suited for a separate
discussion after resolving this. 

Other than pseries, who else does dynamic device tree updating? Are we the
only ones?

-Nathan

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-19 15:26     ` Nathan Fontenot
  0 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-19 15:26 UTC (permalink / raw)
  To: Grant Likely, Tyrel Datwyler; +Cc: devicetree, Pantelis Antoniou, linuxppc-dev

On 06/18/2014 03:07 PM, Grant Likely wrote:
> Hi Nathan and Tyrel,
> 
> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> I'm hoping you can help me. Right now, pseries seems to be the only
> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> the entire kernel because it requires all DT code to manage reference
> counting with iterating over nodes. Most users simply get it wrong.
> Pantelis did some investigation and found that the reference counts on
> a running kernel are all over the place. I have my doubts that any
> code really gets it right.
> 
> The problem is that users need to know when it is appropriate to call
> of_node_get()/of_node_put(). All list traversals that exit early need
> an extra call to of_node_put(), and code that is searching for a node
> in the tree and holding a reference to it needs to call of_node_get().
> 
> I've got a few pseries questions:
> - What are the changes being requested by pseries firmware? Is it only
> CPUs and memory nodes, or does it manipulate things all over the tree?

The short answer, everything.

For pseries the two big actions that can change the device tree are
adding/removing resources and partition migration.

The most frequent updates to the device tree happen during resource
(cpu, memory, and pci/phb) add and remove. During this process we add
and remove the node and its properties from the device tree.
- For memory on newer systems this just involves updating the
  ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
  firmware levels add and remove the memroy@XXX nodes and their properties.
- For cpus the cpus/PowerPC,POWERXXXX nodes and its properties are added
  or removed
- For pci/phb the pci@XXXXX nodes and properties are added/removed.

The less frequent operation of live partition migration (and suspend/resume)
can update just about anything in the device tree. When this occurs and the
systems starts after being migrated (or waking up after a suspend) we make
a call to firmware to get updates to the device tree for the new hardware
we are running on.
 
> - How frequent are the changes? How many changes would be likely over
> the runtime of the system?

This can happen frequently.

> - Are you able to verify that removed nodes are actually able to be
> freed correctly? Do you have any testcases for node removal?

I have always tested this by doing resource add/remove, usually cpu and memory
since it is the easiest.

> 
> I'm thinking very seriously about changing the locking semantics of DT
> code entirely so that most users never have to worry about
> of_node_get/put at all. If the DT code is switched to use rcu
> primitives for tree iteration (which also means making DT code use
> list_head, something I'm already investigating), then instead of
> trying to figure out of_node_get/put rules, callers could use
> rcu_read_lock()/rcu_read_unlock() to protect the region that is
> searching over nodes, and only call of_node_get() if the node pointer
> is needed outside the rcu read-side lock.
> 

This sounds good. I like just taking the rcu lock around accessing the DT.
Do we have many places where DT node pointers are held that require
keeping the of_node_get/put calls? If this did exist perhaps we could
update those places to look up the DT node every time instead of
holding on to the pointer. We could just get rid of the reference counting
altogether then.

> I'd really like to be rid of the node reference counting entirely, but
> I can't figure out a way of doing that safely, so I'd settle for
> making it a lot easier to get correct.
> 

heh! I have often thought about adding reference counting to device tree
properties. I don't really want to but there are some properties that can
get updated frequently (namely the one mentioned above for memory) that
can also get pretty big, especially on systems with a lot of memory. We
never free the memory for old versions of a device tree property. This is
a pretty minor issue though and probably best suited for a separate
discussion after resolving this. 

Other than pseries, who else does dynamic device tree updating? Are we the
only ones?

-Nathan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-19 15:26     ` Nathan Fontenot
@ 2014-06-23 14:48         ` Grant Likely
  -1 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-23 14:48 UTC (permalink / raw)
  To: Nathan Fontenot, Tyrel Datwyler
  Cc: Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Pantelis Antoniou

On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
> On 06/18/2014 03:07 PM, Grant Likely wrote:
> > Hi Nathan and Tyrel,
> > 
> > I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> > I'm hoping you can help me. Right now, pseries seems to be the only
> > user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> > the entire kernel because it requires all DT code to manage reference
> > counting with iterating over nodes. Most users simply get it wrong.
> > Pantelis did some investigation and found that the reference counts on
> > a running kernel are all over the place. I have my doubts that any
> > code really gets it right.
> > 
> > The problem is that users need to know when it is appropriate to call
> > of_node_get()/of_node_put(). All list traversals that exit early need
> > an extra call to of_node_put(), and code that is searching for a node
> > in the tree and holding a reference to it needs to call of_node_get().
> > 
> > I've got a few pseries questions:
> > - What are the changes being requested by pseries firmware? Is it only
> > CPUs and memory nodes, or does it manipulate things all over the tree?
> 
> The short answer, everything.

:-)

> For pseries the two big actions that can change the device tree are
> adding/removing resources and partition migration.
> 
> The most frequent updates to the device tree happen during resource
> (cpu, memory, and pci/phb) add and remove. During this process we add
> and remove the node and its properties from the device tree.
> - For memory on newer systems this just involves updating the
>   ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
>   firmware levels add and remove the memroy@XXX nodes and their properties.
> - For cpus the cpus/PowerPC,POWERXXXX nodes and its properties are added
>   or removed
> - For pci/phb the pci@XXXXX nodes and properties are added/removed.
> 
> The less frequent operation of live partition migration (and suspend/resume)
> can update just about anything in the device tree. When this occurs and the
> systems starts after being migrated (or waking up after a suspend) we make
> a call to firmware to get updates to the device tree for the new hardware
> we are running on.
>  
> > - How frequent are the changes? How many changes would be likely over
> > the runtime of the system?
> 
> This can happen frequently.

Thanks, that is exactly the information that I want. I'm not so much
concerned with the addition or removal of nodes/properties, which is
actually pretty easy to handle. It is the lifecycle of allocations on
dynamic nodes that causes heartburn.

> > - Are you able to verify that removed nodes are actually able to be
> > freed correctly? Do you have any testcases for node removal?
> 
> I have always tested this by doing resource add/remove, usually cpu and memory
> since it is the easiest.

Is that just testing the functionality, or do you have tests that check
if the memory gets freed?

> > I'm thinking very seriously about changing the locking semantics of DT
> > code entirely so that most users never have to worry about
> > of_node_get/put at all. If the DT code is switched to use rcu
> > primitives for tree iteration (which also means making DT code use
> > list_head, something I'm already investigating), then instead of
> > trying to figure out of_node_get/put rules, callers could use
> > rcu_read_lock()/rcu_read_unlock() to protect the region that is
> > searching over nodes, and only call of_node_get() if the node pointer
> > is needed outside the rcu read-side lock.
> > 
> 
> This sounds good. I like just taking the rcu lock around accessing the DT.
> Do we have many places where DT node pointers are held that require
> keeping the of_node_get/put calls? If this did exist perhaps we could
> update those places to look up the DT node every time instead of
> holding on to the pointer. We could just get rid of the reference counting
> altogether then.

There are a few, but I would be happy to restrict reference counting to
only those locations. Most places will decode the DT data, and then
throw away the reference. We /might/ even be able to do rcu_lock/unlock
around the entire probe path which would make it transparent to all
device drivers.

> > I'd really like to be rid of the node reference counting entirely, but
> > I can't figure out a way of doing that safely, so I'd settle for
> > making it a lot easier to get correct.
> > 
> 
> heh! I have often thought about adding reference counting to device tree
> properties.

You horrible, horrible man.

> I don't really want to but there are some properties that can
> get updated frequently (namely the one mentioned above for memory) that
> can also get pretty big, especially on systems with a lot of memory. We
> never free the memory for old versions of a device tree property. This is
> a pretty minor issue though and probably best suited for a separate
> discussion after resolving this. 

We might be able to do some in-place modification of properties if the
size of the property doesn't change. That still leaves some nasty
lifecycle issues that need to be resolved though. It would require
swapping back and forth between memory for an old copy of the property
and a new one. Yes, this should be a separate discussion.

> 
> Other than pseries, who else does dynamic device tree updating? Are we the
> only ones?

Right now you're the only ones. Pantelis has a series that adds bulk
changes to the device tree which are also removable (called overlays). I
also have a GSoC student working on the selftest code which will
dynamically add testcase data to the tree.

g.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-23 14:48         ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-23 14:48 UTC (permalink / raw)
  To: Nathan Fontenot, Tyrel Datwyler
  Cc: devicetree, Pantelis Antoniou, linuxppc-dev

On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
> On 06/18/2014 03:07 PM, Grant Likely wrote:
> > Hi Nathan and Tyrel,
> > 
> > I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> > I'm hoping you can help me. Right now, pseries seems to be the only
> > user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> > the entire kernel because it requires all DT code to manage reference
> > counting with iterating over nodes. Most users simply get it wrong.
> > Pantelis did some investigation and found that the reference counts on
> > a running kernel are all over the place. I have my doubts that any
> > code really gets it right.
> > 
> > The problem is that users need to know when it is appropriate to call
> > of_node_get()/of_node_put(). All list traversals that exit early need
> > an extra call to of_node_put(), and code that is searching for a node
> > in the tree and holding a reference to it needs to call of_node_get().
> > 
> > I've got a few pseries questions:
> > - What are the changes being requested by pseries firmware? Is it only
> > CPUs and memory nodes, or does it manipulate things all over the tree?
> 
> The short answer, everything.

:-)

> For pseries the two big actions that can change the device tree are
> adding/removing resources and partition migration.
> 
> The most frequent updates to the device tree happen during resource
> (cpu, memory, and pci/phb) add and remove. During this process we add
> and remove the node and its properties from the device tree.
> - For memory on newer systems this just involves updating the
>   ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
>   firmware levels add and remove the memroy@XXX nodes and their properties.
> - For cpus the cpus/PowerPC,POWERXXXX nodes and its properties are added
>   or removed
> - For pci/phb the pci@XXXXX nodes and properties are added/removed.
> 
> The less frequent operation of live partition migration (and suspend/resume)
> can update just about anything in the device tree. When this occurs and the
> systems starts after being migrated (or waking up after a suspend) we make
> a call to firmware to get updates to the device tree for the new hardware
> we are running on.
>  
> > - How frequent are the changes? How many changes would be likely over
> > the runtime of the system?
> 
> This can happen frequently.

Thanks, that is exactly the information that I want. I'm not so much
concerned with the addition or removal of nodes/properties, which is
actually pretty easy to handle. It is the lifecycle of allocations on
dynamic nodes that causes heartburn.

> > - Are you able to verify that removed nodes are actually able to be
> > freed correctly? Do you have any testcases for node removal?
> 
> I have always tested this by doing resource add/remove, usually cpu and memory
> since it is the easiest.

Is that just testing the functionality, or do you have tests that check
if the memory gets freed?

> > I'm thinking very seriously about changing the locking semantics of DT
> > code entirely so that most users never have to worry about
> > of_node_get/put at all. If the DT code is switched to use rcu
> > primitives for tree iteration (which also means making DT code use
> > list_head, something I'm already investigating), then instead of
> > trying to figure out of_node_get/put rules, callers could use
> > rcu_read_lock()/rcu_read_unlock() to protect the region that is
> > searching over nodes, and only call of_node_get() if the node pointer
> > is needed outside the rcu read-side lock.
> > 
> 
> This sounds good. I like just taking the rcu lock around accessing the DT.
> Do we have many places where DT node pointers are held that require
> keeping the of_node_get/put calls? If this did exist perhaps we could
> update those places to look up the DT node every time instead of
> holding on to the pointer. We could just get rid of the reference counting
> altogether then.

There are a few, but I would be happy to restrict reference counting to
only those locations. Most places will decode the DT data, and then
throw away the reference. We /might/ even be able to do rcu_lock/unlock
around the entire probe path which would make it transparent to all
device drivers.

> > I'd really like to be rid of the node reference counting entirely, but
> > I can't figure out a way of doing that safely, so I'd settle for
> > making it a lot easier to get correct.
> > 
> 
> heh! I have often thought about adding reference counting to device tree
> properties.

You horrible, horrible man.

> I don't really want to but there are some properties that can
> get updated frequently (namely the one mentioned above for memory) that
> can also get pretty big, especially on systems with a lot of memory. We
> never free the memory for old versions of a device tree property. This is
> a pretty minor issue though and probably best suited for a separate
> discussion after resolving this. 

We might be able to do some in-place modification of properties if the
size of the property doesn't change. That still leaves some nasty
lifecycle issues that need to be resolved though. It would require
swapping back and forth between memory for an old copy of the property
and a new one. Yes, this should be a separate discussion.

> 
> Other than pseries, who else does dynamic device tree updating? Are we the
> only ones?

Right now you're the only ones. Pantelis has a series that adds bulk
changes to the device tree which are also removable (called overlays). I
also have a GSoC student working on the selftest code which will
dynamically add testcase data to the tree.

g.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-19  8:33     ` Pantelis Antoniou
@ 2014-06-23 14:58         ` Grant Likely
  -1 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-23 14:58 UTC (permalink / raw)
  To: Pantelis Antoniou
  Cc: Nathan Fontenot, Tyrel Datwyler, Benjamin Herrenschmidt,
	linuxppc-dev, devicetree-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner,
	Steven Rostedt

On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> Hi Grant,
> 
> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
> ideas...
> 
> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
> 
> > Hi Nathan and Tyrel,
> > 
> > I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> > I'm hoping you can help me. Right now, pseries seems to be the only
> > user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> > the entire kernel because it requires all DT code to manage reference
> > counting with iterating over nodes. Most users simply get it wrong.
> > Pantelis did some investigation and found that the reference counts on
> > a running kernel are all over the place. I have my doubts that any
> > code really gets it right.
> > 
> > The problem is that users need to know when it is appropriate to call
> > of_node_get()/of_node_put(). All list traversals that exit early need
> > an extra call to of_node_put(), and code that is searching for a node
> > in the tree and holding a reference to it needs to call of_node_get().
> > 
> 
> In hindsight it appears that drivers just can't get the lifecycle right.
> So we need to simplify things.
> 
> > I've got a few pseries questions:
> > - What are the changes being requested by pseries firmware? Is it only
> > CPUs and memory nodes, or does it manipulate things all over the tree?
> > - How frequent are the changes? How many changes would be likely over
> > the runtime of the system?
> > - Are you able to verify that removed nodes are actually able to be
> > freed correctly? Do you have any testcases for node removal?
> > 
> > I'm thinking very seriously about changing the locking semantics of DT
> > code entirely so that most users never have to worry about
> > of_node_get/put at all. If the DT code is switched to use rcu
> > primitives for tree iteration (which also means making DT code use
> > list_head, something I'm already investigating), then instead of
> > trying to figure out of_node_get/put rules, callers could use
> > rcu_read_lock()/rcu_read_unlock() to protect the region that is
> > searching over nodes, and only call of_node_get() if the node pointer
> > is needed outside the rcu read-side lock.
> > 
> > I'd really like to be rid of the node reference counting entirely, but
> > I can't figure out a way of doing that safely, so I'd settle for
> > making it a lot easier to get correct.
> > 
> 
> Since we're going about changing things, how about that devtree_lock?

I believe rcu would pretty much eliminate the devtree_lock entirely. All
modifiers would need to grab a mutex to ensure there is only one writer
at any given time, but readers would have free reign to parse the tree
however they like.

DT writers would have to follow some strict rules about how to handle
nodes that are removed (ie. don't modify or of_node_put() them until
after rcu is syncronized), but the number of writers is very small and
we have control of all of them.

> We're using a raw_spinlock and we're always taking the lock with
> interrupts disabled.
> 
> If we're going to make DT changes frequently during normal runtime
> and not only during boot time, those are bad for any kind of real-time
> performance.
> 
> So the question is, do we really have code that access the live tree
> during atomic sections?  Is that something we want? Enforcing this
> will make our lives easier, and we'll get the change to replace
> that spinlock with a mutex.

Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
sections. I cannot put my finger on the exact code however. Nathan might
know better. But, if I'm right, the whole problem goes away with RCU.

The design with RCU is to switch struct device_node and struct property
to use list_head and/or hlist_head with the _rcu accessors. They allow
items to be removed from a list without syncronizing with readers. Right
now we have two lists that need to be modified; the allnodes list and
the sibling list. I *think* it will be fine for the two list removals to
be non-atomic (there will be a brief period where the node can be found
on one list, but not the other) because it is a transient state already
accounted for in rcu read-side critical region.

That said, I've also got a design to remove the allnodes list entirely
and only work with the sibling list. I need to prototype this.

We'll also need a transition plan to move to RCU. I think the existing
iterators can be modified to do the rcu locking in-line, but still require
the of_node_get/put stuff (basically, so existing code continue to works
unchanged). Then we can add _rcu versions that drop the need for
of_node_get/put(). When everything is converted, the old iterators can
be dropped.

g.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-23 14:58         ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-23 14:58 UTC (permalink / raw)
  To: Pantelis Antoniou
  Cc: devicetree, Steven Rostedt, linuxppc-dev, Tyrel Datwyler,
	Thomas Gleixner

On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou@konsulko.com> wrote:
> Hi Grant,
> 
> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
> ideas...
> 
> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
> 
> > Hi Nathan and Tyrel,
> > 
> > I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> > I'm hoping you can help me. Right now, pseries seems to be the only
> > user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> > the entire kernel because it requires all DT code to manage reference
> > counting with iterating over nodes. Most users simply get it wrong.
> > Pantelis did some investigation and found that the reference counts on
> > a running kernel are all over the place. I have my doubts that any
> > code really gets it right.
> > 
> > The problem is that users need to know when it is appropriate to call
> > of_node_get()/of_node_put(). All list traversals that exit early need
> > an extra call to of_node_put(), and code that is searching for a node
> > in the tree and holding a reference to it needs to call of_node_get().
> > 
> 
> In hindsight it appears that drivers just can't get the lifecycle right.
> So we need to simplify things.
> 
> > I've got a few pseries questions:
> > - What are the changes being requested by pseries firmware? Is it only
> > CPUs and memory nodes, or does it manipulate things all over the tree?
> > - How frequent are the changes? How many changes would be likely over
> > the runtime of the system?
> > - Are you able to verify that removed nodes are actually able to be
> > freed correctly? Do you have any testcases for node removal?
> > 
> > I'm thinking very seriously about changing the locking semantics of DT
> > code entirely so that most users never have to worry about
> > of_node_get/put at all. If the DT code is switched to use rcu
> > primitives for tree iteration (which also means making DT code use
> > list_head, something I'm already investigating), then instead of
> > trying to figure out of_node_get/put rules, callers could use
> > rcu_read_lock()/rcu_read_unlock() to protect the region that is
> > searching over nodes, and only call of_node_get() if the node pointer
> > is needed outside the rcu read-side lock.
> > 
> > I'd really like to be rid of the node reference counting entirely, but
> > I can't figure out a way of doing that safely, so I'd settle for
> > making it a lot easier to get correct.
> > 
> 
> Since we're going about changing things, how about that devtree_lock?

I believe rcu would pretty much eliminate the devtree_lock entirely. All
modifiers would need to grab a mutex to ensure there is only one writer
at any given time, but readers would have free reign to parse the tree
however they like.

DT writers would have to follow some strict rules about how to handle
nodes that are removed (ie. don't modify or of_node_put() them until
after rcu is syncronized), but the number of writers is very small and
we have control of all of them.

> We're using a raw_spinlock and we're always taking the lock with
> interrupts disabled.
> 
> If we're going to make DT changes frequently during normal runtime
> and not only during boot time, those are bad for any kind of real-time
> performance.
> 
> So the question is, do we really have code that access the live tree
> during atomic sections?  Is that something we want? Enforcing this
> will make our lives easier, and we'll get the change to replace
> that spinlock with a mutex.

Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
sections. I cannot put my finger on the exact code however. Nathan might
know better. But, if I'm right, the whole problem goes away with RCU.

The design with RCU is to switch struct device_node and struct property
to use list_head and/or hlist_head with the _rcu accessors. They allow
items to be removed from a list without syncronizing with readers. Right
now we have two lists that need to be modified; the allnodes list and
the sibling list. I *think* it will be fine for the two list removals to
be non-atomic (there will be a brief period where the node can be found
on one list, but not the other) because it is a transient state already
accounted for in rcu read-side critical region.

That said, I've also got a design to remove the allnodes list entirely
and only work with the sibling list. I need to prototype this.

We'll also need a transition plan to move to RCU. I think the existing
iterators can be modified to do the rcu locking in-line, but still require
the of_node_get/put stuff (basically, so existing code continue to works
unchanged). Then we can add _rcu versions that drop the need for
of_node_get/put(). When everything is converted, the old iterators can
be dropped.

g.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-23 14:58         ` Grant Likely
@ 2014-06-23 15:26             ` Pantelis Antoniou
  -1 siblings, 0 replies; 48+ messages in thread
From: Pantelis Antoniou @ 2014-06-23 15:26 UTC (permalink / raw)
  To: Grant Likely
  Cc: Nathan Fontenot, Tyrel Datwyler, Benjamin Herrenschmidt,
	linuxppc-dev, devicetree-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner,
	Steven Rostedt

Hi Grant,

On Jun 23, 2014, at 5:58 PM, Grant Likely wrote:

> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
>> Hi Grant,
>> 
>> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
>> ideas...
>> 
>> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
>> 
>>> Hi Nathan and Tyrel,
>>> 
>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
>>> I'm hoping you can help me. Right now, pseries seems to be the only
>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
>>> the entire kernel because it requires all DT code to manage reference
>>> counting with iterating over nodes. Most users simply get it wrong.
>>> Pantelis did some investigation and found that the reference counts on
>>> a running kernel are all over the place. I have my doubts that any
>>> code really gets it right.
>>> 
>>> The problem is that users need to know when it is appropriate to call
>>> of_node_get()/of_node_put(). All list traversals that exit early need
>>> an extra call to of_node_put(), and code that is searching for a node
>>> in the tree and holding a reference to it needs to call of_node_get().
>>> 
>> 
>> In hindsight it appears that drivers just can't get the lifecycle right.
>> So we need to simplify things.
>> 
>>> I've got a few pseries questions:
>>> - What are the changes being requested by pseries firmware? Is it only
>>> CPUs and memory nodes, or does it manipulate things all over the tree?
>>> - How frequent are the changes? How many changes would be likely over
>>> the runtime of the system?
>>> - Are you able to verify that removed nodes are actually able to be
>>> freed correctly? Do you have any testcases for node removal?
>>> 
>>> I'm thinking very seriously about changing the locking semantics of DT
>>> code entirely so that most users never have to worry about
>>> of_node_get/put at all. If the DT code is switched to use rcu
>>> primitives for tree iteration (which also means making DT code use
>>> list_head, something I'm already investigating), then instead of
>>> trying to figure out of_node_get/put rules, callers could use
>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
>>> searching over nodes, and only call of_node_get() if the node pointer
>>> is needed outside the rcu read-side lock.
>>> 
>>> I'd really like to be rid of the node reference counting entirely, but
>>> I can't figure out a way of doing that safely, so I'd settle for
>>> making it a lot easier to get correct.
>>> 
>> 
>> Since we're going about changing things, how about that devtree_lock?
> 
> I believe rcu would pretty much eliminate the devtree_lock entirely. All
> modifiers would need to grab a mutex to ensure there is only one writer
> at any given time, but readers would have free reign to parse the tree
> however they like.
> 
> DT writers would have to follow some strict rules about how to handle
> nodes that are removed (ie. don't modify or of_node_put() them until
> after rcu is syncronized), but the number of writers is very small and
> we have control of all of them.
> 

There's one final nitpick with transactions; we might need another 
node/property flag marking the 'in-progress' state so that
can be skipped by iterators, but in general this looks good.

>> We're using a raw_spinlock and we're always taking the lock with
>> interrupts disabled.
>> 
>> If we're going to make DT changes frequently during normal runtime
>> and not only during boot time, those are bad for any kind of real-time
>> performance.
>> 
>> So the question is, do we really have code that access the live tree
>> during atomic sections?  Is that something we want? Enforcing this
>> will make our lives easier, and we'll get the change to replace
>> that spinlock with a mutex.
> 
> Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
> sections. I cannot put my finger on the exact code however. Nathan might
> know better. But, if I'm right, the whole problem goes away with RCU.
> 

This is just bad. Why would you need to that?

> The design with RCU is to switch struct device_node and struct property
> to use list_head and/or hlist_head with the _rcu accessors. They allow
> items to be removed from a list without syncronizing with readers. Right
> now we have two lists that need to be modified; the allnodes list and
> the sibling list. I *think* it will be fine for the two list removals to
> be non-atomic (there will be a brief period where the node can be found
> on one list, but not the other) because it is a transient state already
> accounted for in rcu read-side critical region.
> 

See above about transient states.

> That said, I've also got a design to remove the allnodes list entirely
> and only work with the sibling list. I need to prototype this.
> 

In my original patchset for the overlays I used a single node as a root
and didn't deal with allnodes at all. So it can definitely can work.

> We'll also need a transition plan to move to RCU. I think the existing
> iterators can be modified to do the rcu locking in-line, but still require
> the of_node_get/put stuff (basically, so existing code continue to works
> unchanged). Then we can add _rcu versions that drop the need for
> of_node_get/put(). When everything is converted, the old iterators can
> be dropped.
> 

Eventually yes. We're not close to that yet. I'd be happy if we get the 
lifecycle issues fixed right now (with of_node_get/put) and plan ahead.
I am sure we missed a few things, which we will come across.

> g.

Regards

-- Pantelis

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-23 15:26             ` Pantelis Antoniou
  0 siblings, 0 replies; 48+ messages in thread
From: Pantelis Antoniou @ 2014-06-23 15:26 UTC (permalink / raw)
  To: Grant Likely
  Cc: devicetree, Steven Rostedt, linuxppc-dev, Tyrel Datwyler,
	Thomas Gleixner

Hi Grant,

On Jun 23, 2014, at 5:58 PM, Grant Likely wrote:

> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou =
<pantelis.antoniou@konsulko.com> wrote:
>> Hi Grant,
>>=20
>> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
>> ideas...
>>=20
>> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
>>=20
>>> Hi Nathan and Tyrel,
>>>=20
>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, =
and
>>> I'm hoping you can help me. Right now, pseries seems to be the only
>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
>>> the entire kernel because it requires all DT code to manage =
reference
>>> counting with iterating over nodes. Most users simply get it wrong.
>>> Pantelis did some investigation and found that the reference counts =
on
>>> a running kernel are all over the place. I have my doubts that any
>>> code really gets it right.
>>>=20
>>> The problem is that users need to know when it is appropriate to =
call
>>> of_node_get()/of_node_put(). All list traversals that exit early =
need
>>> an extra call to of_node_put(), and code that is searching for a =
node
>>> in the tree and holding a reference to it needs to call =
of_node_get().
>>>=20
>>=20
>> In hindsight it appears that drivers just can't get the lifecycle =
right.
>> So we need to simplify things.
>>=20
>>> I've got a few pseries questions:
>>> - What are the changes being requested by pseries firmware? Is it =
only
>>> CPUs and memory nodes, or does it manipulate things all over the =
tree?
>>> - How frequent are the changes? How many changes would be likely =
over
>>> the runtime of the system?
>>> - Are you able to verify that removed nodes are actually able to be
>>> freed correctly? Do you have any testcases for node removal?
>>>=20
>>> I'm thinking very seriously about changing the locking semantics of =
DT
>>> code entirely so that most users never have to worry about
>>> of_node_get/put at all. If the DT code is switched to use rcu
>>> primitives for tree iteration (which also means making DT code use
>>> list_head, something I'm already investigating), then instead of
>>> trying to figure out of_node_get/put rules, callers could use
>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
>>> searching over nodes, and only call of_node_get() if the node =
pointer
>>> is needed outside the rcu read-side lock.
>>>=20
>>> I'd really like to be rid of the node reference counting entirely, =
but
>>> I can't figure out a way of doing that safely, so I'd settle for
>>> making it a lot easier to get correct.
>>>=20
>>=20
>> Since we're going about changing things, how about that devtree_lock?
>=20
> I believe rcu would pretty much eliminate the devtree_lock entirely. =
All
> modifiers would need to grab a mutex to ensure there is only one =
writer
> at any given time, but readers would have free reign to parse the tree
> however they like.
>=20
> DT writers would have to follow some strict rules about how to handle
> nodes that are removed (ie. don't modify or of_node_put() them until
> after rcu is syncronized), but the number of writers is very small and
> we have control of all of them.
>=20

There's one final nitpick with transactions; we might need another=20
node/property flag marking the 'in-progress' state so that
can be skipped by iterators, but in general this looks good.

>> We're using a raw_spinlock and we're always taking the lock with
>> interrupts disabled.
>>=20
>> If we're going to make DT changes frequently during normal runtime
>> and not only during boot time, those are bad for any kind of =
real-time
>> performance.
>>=20
>> So the question is, do we really have code that access the live tree
>> during atomic sections?  Is that something we want? Enforcing this
>> will make our lives easier, and we'll get the change to replace
>> that spinlock with a mutex.
>=20
> Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
> sections. I cannot put my finger on the exact code however. Nathan =
might
> know better. But, if I'm right, the whole problem goes away with RCU.
>=20

This is just bad. Why would you need to that?

> The design with RCU is to switch struct device_node and struct =
property
> to use list_head and/or hlist_head with the _rcu accessors. They allow
> items to be removed from a list without syncronizing with readers. =
Right
> now we have two lists that need to be modified; the allnodes list and
> the sibling list. I *think* it will be fine for the two list removals =
to
> be non-atomic (there will be a brief period where the node can be =
found
> on one list, but not the other) because it is a transient state =
already
> accounted for in rcu read-side critical region.
>=20

See above about transient states.

> That said, I've also got a design to remove the allnodes list entirely
> and only work with the sibling list. I need to prototype this.
>=20

In my original patchset for the overlays I used a single node as a root
and didn't deal with allnodes at all. So it can definitely can work.

> We'll also need a transition plan to move to RCU. I think the existing
> iterators can be modified to do the rcu locking in-line, but still =
require
> the of_node_get/put stuff (basically, so existing code continue to =
works
> unchanged). Then we can add _rcu versions that drop the need for
> of_node_get/put(). When everything is converted, the old iterators can
> be dropped.
>=20

Eventually yes. We're not close to that yet. I'd be happy if we get the=20=

lifecycle issues fixed right now (with of_node_get/put) and plan ahead.
I am sure we missed a few things, which we will come across.

> g.

Regards

-- Pantelis

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-23 15:26             ` Pantelis Antoniou
@ 2014-06-23 20:21                 ` Grant Likely
  -1 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-23 20:21 UTC (permalink / raw)
  To: Pantelis Antoniou
  Cc: Nathan Fontenot, Tyrel Datwyler, Benjamin Herrenschmidt,
	linuxppc-dev, devicetree-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner,
	Steven Rostedt

On Mon, 23 Jun 2014 18:26:04 +0300, Pantelis Antoniou <pantelis.antoniou-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> On Jun 23, 2014, at 5:58 PM, Grant Likely wrote:
> > We'll also need a transition plan to move to RCU. I think the existing
> > iterators can be modified to do the rcu locking in-line, but still require
> > the of_node_get/put stuff (basically, so existing code continue to works
> > unchanged). Then we can add _rcu versions that drop the need for
> > of_node_get/put(). When everything is converted, the old iterators can
> > be dropped.
> > 
> 
> Eventually yes. We're not close to that yet. I'd be happy if we get the 
> lifecycle issues fixed right now (with of_node_get/put) and plan ahead.
> I am sure we missed a few things, which we will come across.

If we agree on the plan to keep of_node_get/put, but strongly reduce the
usage by switching to RCU, then I'm generally okay with proceeding with
the overlay feature because I can see how it would work in the new
model.

g.

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-23 20:21                 ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-23 20:21 UTC (permalink / raw)
  To: Pantelis Antoniou
  Cc: devicetree, Steven Rostedt, linuxppc-dev, Tyrel Datwyler,
	Thomas Gleixner

On Mon, 23 Jun 2014 18:26:04 +0300, Pantelis Antoniou <pantelis.antoniou@konsulko.com> wrote:
> On Jun 23, 2014, at 5:58 PM, Grant Likely wrote:
> > We'll also need a transition plan to move to RCU. I think the existing
> > iterators can be modified to do the rcu locking in-line, but still require
> > the of_node_get/put stuff (basically, so existing code continue to works
> > unchanged). Then we can add _rcu versions that drop the need for
> > of_node_get/put(). When everything is converted, the old iterators can
> > be dropped.
> > 
> 
> Eventually yes. We're not close to that yet. I'd be happy if we get the 
> lifecycle issues fixed right now (with of_node_get/put) and plan ahead.
> I am sure we missed a few things, which we will come across.

If we agree on the plan to keep of_node_get/put, but strongly reduce the
usage by switching to RCU, then I'm generally okay with proceeding with
the overlay feature because I can see how it would work in the new
model.

g.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-23 14:58         ` Grant Likely
@ 2014-06-24 20:07             ` Nathan Fontenot
  -1 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-24 20:07 UTC (permalink / raw)
  To: Grant Likely, Pantelis Antoniou
  Cc: Tyrel Datwyler, Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner,
	Steven Rostedt

On 06/23/2014 09:58 AM, Grant Likely wrote:
> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
>> Hi Grant,
>>
>> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
>> ideas...
>>
>> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
>>
>>> Hi Nathan and Tyrel,
>>>
>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
>>> I'm hoping you can help me. Right now, pseries seems to be the only
>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
>>> the entire kernel because it requires all DT code to manage reference
>>> counting with iterating over nodes. Most users simply get it wrong.
>>> Pantelis did some investigation and found that the reference counts on
>>> a running kernel are all over the place. I have my doubts that any
>>> code really gets it right.
>>>
>>> The problem is that users need to know when it is appropriate to call
>>> of_node_get()/of_node_put(). All list traversals that exit early need
>>> an extra call to of_node_put(), and code that is searching for a node
>>> in the tree and holding a reference to it needs to call of_node_get().
>>>
>>
>> In hindsight it appears that drivers just can't get the lifecycle right.
>> So we need to simplify things.
>>
>>> I've got a few pseries questions:
>>> - What are the changes being requested by pseries firmware? Is it only
>>> CPUs and memory nodes, or does it manipulate things all over the tree?
>>> - How frequent are the changes? How many changes would be likely over
>>> the runtime of the system?
>>> - Are you able to verify that removed nodes are actually able to be
>>> freed correctly? Do you have any testcases for node removal?
>>>
>>> I'm thinking very seriously about changing the locking semantics of DT
>>> code entirely so that most users never have to worry about
>>> of_node_get/put at all. If the DT code is switched to use rcu
>>> primitives for tree iteration (which also means making DT code use
>>> list_head, something I'm already investigating), then instead of
>>> trying to figure out of_node_get/put rules, callers could use
>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
>>> searching over nodes, and only call of_node_get() if the node pointer
>>> is needed outside the rcu read-side lock.
>>>
>>> I'd really like to be rid of the node reference counting entirely, but
>>> I can't figure out a way of doing that safely, so I'd settle for
>>> making it a lot easier to get correct.
>>>
>>
>> Since we're going about changing things, how about that devtree_lock?
> 
> I believe rcu would pretty much eliminate the devtree_lock entirely. All
> modifiers would need to grab a mutex to ensure there is only one writer
> at any given time, but readers would have free reign to parse the tree
> however they like.
> 
> DT writers would have to follow some strict rules about how to handle
> nodes that are removed (ie. don't modify or of_node_put() them until
> after rcu is syncronized), but the number of writers is very small and
> we have control of all of them.
> 
>> We're using a raw_spinlock and we're always taking the lock with
>> interrupts disabled.
>>
>> If we're going to make DT changes frequently during normal runtime
>> and not only during boot time, those are bad for any kind of real-time
>> performance.
>>
>> So the question is, do we really have code that access the live tree
>> during atomic sections?  Is that something we want? Enforcing this
>> will make our lives easier, and we'll get the change to replace
>> that spinlock with a mutex.
> 
> Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
> sections. I cannot put my finger on the exact code however. Nathan might
> know better. But, if I'm right, the whole problem goes away with RCU.

I went back through the cpu hotplug code. we do update the DT during cpu
hotplug but I don't see it happening during atomic sections.

The code is in arch/powerpc/platforms/pseries/dlpar.c

-Nathan

> 
> The design with RCU is to switch struct device_node and struct property
> to use list_head and/or hlist_head with the _rcu accessors. They allow
> items to be removed from a list without syncronizing with readers. Right
> now we have two lists that need to be modified; the allnodes list and
> the sibling list. I *think* it will be fine for the two list removals to
> be non-atomic (there will be a brief period where the node can be found
> on one list, but not the other) because it is a transient state already
> accounted for in rcu read-side critical region.
> 
> That said, I've also got a design to remove the allnodes list entirely
> and only work with the sibling list. I need to prototype this.
> 
> We'll also need a transition plan to move to RCU. I think the existing
> iterators can be modified to do the rcu locking in-line, but still require
> the of_node_get/put stuff (basically, so existing code continue to works
> unchanged). Then we can add _rcu versions that drop the need for
> of_node_get/put(). When everything is converted, the old iterators can
> be dropped.
> 
> g.
> 

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-24 20:07             ` Nathan Fontenot
  0 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-24 20:07 UTC (permalink / raw)
  To: Grant Likely, Pantelis Antoniou
  Cc: devicetree, Steven Rostedt, linuxppc-dev, Tyrel Datwyler,
	Thomas Gleixner

On 06/23/2014 09:58 AM, Grant Likely wrote:
> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou@konsulko.com> wrote:
>> Hi Grant,
>>
>> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
>> ideas...
>>
>> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
>>
>>> Hi Nathan and Tyrel,
>>>
>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
>>> I'm hoping you can help me. Right now, pseries seems to be the only
>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
>>> the entire kernel because it requires all DT code to manage reference
>>> counting with iterating over nodes. Most users simply get it wrong.
>>> Pantelis did some investigation and found that the reference counts on
>>> a running kernel are all over the place. I have my doubts that any
>>> code really gets it right.
>>>
>>> The problem is that users need to know when it is appropriate to call
>>> of_node_get()/of_node_put(). All list traversals that exit early need
>>> an extra call to of_node_put(), and code that is searching for a node
>>> in the tree and holding a reference to it needs to call of_node_get().
>>>
>>
>> In hindsight it appears that drivers just can't get the lifecycle right.
>> So we need to simplify things.
>>
>>> I've got a few pseries questions:
>>> - What are the changes being requested by pseries firmware? Is it only
>>> CPUs and memory nodes, or does it manipulate things all over the tree?
>>> - How frequent are the changes? How many changes would be likely over
>>> the runtime of the system?
>>> - Are you able to verify that removed nodes are actually able to be
>>> freed correctly? Do you have any testcases for node removal?
>>>
>>> I'm thinking very seriously about changing the locking semantics of DT
>>> code entirely so that most users never have to worry about
>>> of_node_get/put at all. If the DT code is switched to use rcu
>>> primitives for tree iteration (which also means making DT code use
>>> list_head, something I'm already investigating), then instead of
>>> trying to figure out of_node_get/put rules, callers could use
>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
>>> searching over nodes, and only call of_node_get() if the node pointer
>>> is needed outside the rcu read-side lock.
>>>
>>> I'd really like to be rid of the node reference counting entirely, but
>>> I can't figure out a way of doing that safely, so I'd settle for
>>> making it a lot easier to get correct.
>>>
>>
>> Since we're going about changing things, how about that devtree_lock?
> 
> I believe rcu would pretty much eliminate the devtree_lock entirely. All
> modifiers would need to grab a mutex to ensure there is only one writer
> at any given time, but readers would have free reign to parse the tree
> however they like.
> 
> DT writers would have to follow some strict rules about how to handle
> nodes that are removed (ie. don't modify or of_node_put() them until
> after rcu is syncronized), but the number of writers is very small and
> we have control of all of them.
> 
>> We're using a raw_spinlock and we're always taking the lock with
>> interrupts disabled.
>>
>> If we're going to make DT changes frequently during normal runtime
>> and not only during boot time, those are bad for any kind of real-time
>> performance.
>>
>> So the question is, do we really have code that access the live tree
>> during atomic sections?  Is that something we want? Enforcing this
>> will make our lives easier, and we'll get the change to replace
>> that spinlock with a mutex.
> 
> Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
> sections. I cannot put my finger on the exact code however. Nathan might
> know better. But, if I'm right, the whole problem goes away with RCU.

I went back through the cpu hotplug code. we do update the DT during cpu
hotplug but I don't see it happening during atomic sections.

The code is in arch/powerpc/platforms/pseries/dlpar.c

-Nathan

> 
> The design with RCU is to switch struct device_node and struct property
> to use list_head and/or hlist_head with the _rcu accessors. They allow
> items to be removed from a list without syncronizing with readers. Right
> now we have two lists that need to be modified; the allnodes list and
> the sibling list. I *think* it will be fine for the two list removals to
> be non-atomic (there will be a brief period where the node can be found
> on one list, but not the other) because it is a transient state already
> accounted for in rcu read-side critical region.
> 
> That said, I've also got a design to remove the allnodes list entirely
> and only work with the sibling list. I need to prototype this.
> 
> We'll also need a transition plan to move to RCU. I think the existing
> iterators can be modified to do the rcu locking in-line, but still require
> the of_node_get/put stuff (basically, so existing code continue to works
> unchanged). Then we can add _rcu versions that drop the need for
> of_node_get/put(). When everything is converted, the old iterators can
> be dropped.
> 
> g.
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-23 14:48         ` Grant Likely
@ 2014-06-24 20:10             ` Nathan Fontenot
  -1 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-24 20:10 UTC (permalink / raw)
  To: Grant Likely, Tyrel Datwyler
  Cc: Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Pantelis Antoniou

On 06/23/2014 09:48 AM, Grant Likely wrote:
> On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>> On 06/18/2014 03:07 PM, Grant Likely wrote:
>>> Hi Nathan and Tyrel,
>>>
>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
>>> I'm hoping you can help me. Right now, pseries seems to be the only
>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
>>> the entire kernel because it requires all DT code to manage reference
>>> counting with iterating over nodes. Most users simply get it wrong.
>>> Pantelis did some investigation and found that the reference counts on
>>> a running kernel are all over the place. I have my doubts that any
>>> code really gets it right.
>>>
>>> The problem is that users need to know when it is appropriate to call
>>> of_node_get()/of_node_put(). All list traversals that exit early need
>>> an extra call to of_node_put(), and code that is searching for a node
>>> in the tree and holding a reference to it needs to call of_node_get().
>>>
>>> I've got a few pseries questions:
>>> - What are the changes being requested by pseries firmware? Is it only
>>> CPUs and memory nodes, or does it manipulate things all over the tree?
>>
>> The short answer, everything.
> 
> :-)
> 
>> For pseries the two big actions that can change the device tree are
>> adding/removing resources and partition migration.
>>
>> The most frequent updates to the device tree happen during resource
>> (cpu, memory, and pci/phb) add and remove. During this process we add
>> and remove the node and its properties from the device tree.
>> - For memory on newer systems this just involves updating the
>>   ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
>>   firmware levels add and remove the memroy@XXX nodes and their properties.
>> - For cpus the cpus/PowerPC,POWERXXXX nodes and its properties are added
>>   or removed
>> - For pci/phb the pci@XXXXX nodes and properties are added/removed.
>>
>> The less frequent operation of live partition migration (and suspend/resume)
>> can update just about anything in the device tree. When this occurs and the
>> systems starts after being migrated (or waking up after a suspend) we make
>> a call to firmware to get updates to the device tree for the new hardware
>> we are running on.
>>  
>>> - How frequent are the changes? How many changes would be likely over
>>> the runtime of the system?
>>
>> This can happen frequently.
> 
> Thanks, that is exactly the information that I want. I'm not so much
> concerned with the addition or removal of nodes/properties, which is
> actually pretty easy to handle. It is the lifecycle of allocations on
> dynamic nodes that causes heartburn.
> 
>>> - Are you able to verify that removed nodes are actually able to be
>>> freed correctly? Do you have any testcases for node removal?
>>
>> I have always tested this by doing resource add/remove, usually cpu and memory
>> since it is the easiest.
> 
> Is that just testing the functionality, or do you have tests that check
> if the memory gets freed?

In general it's just functionality testing.

> 
>>> I'm thinking very seriously about changing the locking semantics of DT
>>> code entirely so that most users never have to worry about
>>> of_node_get/put at all. If the DT code is switched to use rcu
>>> primitives for tree iteration (which also means making DT code use
>>> list_head, something I'm already investigating), then instead of
>>> trying to figure out of_node_get/put rules, callers could use
>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
>>> searching over nodes, and only call of_node_get() if the node pointer
>>> is needed outside the rcu read-side lock.
>>>
>>
>> This sounds good. I like just taking the rcu lock around accessing the DT.
>> Do we have many places where DT node pointers are held that require
>> keeping the of_node_get/put calls? If this did exist perhaps we could
>> update those places to look up the DT node every time instead of
>> holding on to the pointer. We could just get rid of the reference counting
>> altogether then.
> 
> There are a few, but I would be happy to restrict reference counting to
> only those locations. Most places will decode the DT data, and then
> throw away the reference. We /might/ even be able to do rcu_lock/unlock
> around the entire probe path which would make it transparent to all
> device drivers.
> 
>>> I'd really like to be rid of the node reference counting entirely, but
>>> I can't figure out a way of doing that safely, so I'd settle for
>>> making it a lot easier to get correct.
>>>
>>
>> heh! I have often thought about adding reference counting to device tree
>> properties.
> 
> You horrible, horrible man.

Yes. I are evil :)

After looking again the work needed to add reference counts to properties
would be huge. The few properties I am concerned with are specific to powerpc
so perhaps just adding an arch specific lock around updating those
properties would work.

-Nathan

> 
>> I don't really want to but there are some properties that can
>> get updated frequently (namely the one mentioned above for memory) that
>> can also get pretty big, especially on systems with a lot of memory. We
>> never free the memory for old versions of a device tree property. This is
>> a pretty minor issue though and probably best suited for a separate
>> discussion after resolving this. 
> 
> We might be able to do some in-place modification of properties if the
> size of the property doesn't change. That still leaves some nasty
> lifecycle issues that need to be resolved though. It would require
> swapping back and forth between memory for an old copy of the property
> and a new one. Yes, this should be a separate discussion.
> 
>>
>> Other than pseries, who else does dynamic device tree updating? Are we the
>> only ones?
> 
> Right now you're the only ones. Pantelis has a series that adds bulk
> changes to the device tree which are also removable (called overlays). I
> also have a GSoC student working on the selftest code which will
> dynamically add testcase data to the tree.
> 
> g.
> 

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-24 20:10             ` Nathan Fontenot
  0 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-24 20:10 UTC (permalink / raw)
  To: Grant Likely, Tyrel Datwyler; +Cc: devicetree, Pantelis Antoniou, linuxppc-dev

On 06/23/2014 09:48 AM, Grant Likely wrote:
> On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>> On 06/18/2014 03:07 PM, Grant Likely wrote:
>>> Hi Nathan and Tyrel,
>>>
>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
>>> I'm hoping you can help me. Right now, pseries seems to be the only
>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
>>> the entire kernel because it requires all DT code to manage reference
>>> counting with iterating over nodes. Most users simply get it wrong.
>>> Pantelis did some investigation and found that the reference counts on
>>> a running kernel are all over the place. I have my doubts that any
>>> code really gets it right.
>>>
>>> The problem is that users need to know when it is appropriate to call
>>> of_node_get()/of_node_put(). All list traversals that exit early need
>>> an extra call to of_node_put(), and code that is searching for a node
>>> in the tree and holding a reference to it needs to call of_node_get().
>>>
>>> I've got a few pseries questions:
>>> - What are the changes being requested by pseries firmware? Is it only
>>> CPUs and memory nodes, or does it manipulate things all over the tree?
>>
>> The short answer, everything.
> 
> :-)
> 
>> For pseries the two big actions that can change the device tree are
>> adding/removing resources and partition migration.
>>
>> The most frequent updates to the device tree happen during resource
>> (cpu, memory, and pci/phb) add and remove. During this process we add
>> and remove the node and its properties from the device tree.
>> - For memory on newer systems this just involves updating the
>>   ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
>>   firmware levels add and remove the memroy@XXX nodes and their properties.
>> - For cpus the cpus/PowerPC,POWERXXXX nodes and its properties are added
>>   or removed
>> - For pci/phb the pci@XXXXX nodes and properties are added/removed.
>>
>> The less frequent operation of live partition migration (and suspend/resume)
>> can update just about anything in the device tree. When this occurs and the
>> systems starts after being migrated (or waking up after a suspend) we make
>> a call to firmware to get updates to the device tree for the new hardware
>> we are running on.
>>  
>>> - How frequent are the changes? How many changes would be likely over
>>> the runtime of the system?
>>
>> This can happen frequently.
> 
> Thanks, that is exactly the information that I want. I'm not so much
> concerned with the addition or removal of nodes/properties, which is
> actually pretty easy to handle. It is the lifecycle of allocations on
> dynamic nodes that causes heartburn.
> 
>>> - Are you able to verify that removed nodes are actually able to be
>>> freed correctly? Do you have any testcases for node removal?
>>
>> I have always tested this by doing resource add/remove, usually cpu and memory
>> since it is the easiest.
> 
> Is that just testing the functionality, or do you have tests that check
> if the memory gets freed?

In general it's just functionality testing.

> 
>>> I'm thinking very seriously about changing the locking semantics of DT
>>> code entirely so that most users never have to worry about
>>> of_node_get/put at all. If the DT code is switched to use rcu
>>> primitives for tree iteration (which also means making DT code use
>>> list_head, something I'm already investigating), then instead of
>>> trying to figure out of_node_get/put rules, callers could use
>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
>>> searching over nodes, and only call of_node_get() if the node pointer
>>> is needed outside the rcu read-side lock.
>>>
>>
>> This sounds good. I like just taking the rcu lock around accessing the DT.
>> Do we have many places where DT node pointers are held that require
>> keeping the of_node_get/put calls? If this did exist perhaps we could
>> update those places to look up the DT node every time instead of
>> holding on to the pointer. We could just get rid of the reference counting
>> altogether then.
> 
> There are a few, but I would be happy to restrict reference counting to
> only those locations. Most places will decode the DT data, and then
> throw away the reference. We /might/ even be able to do rcu_lock/unlock
> around the entire probe path which would make it transparent to all
> device drivers.
> 
>>> I'd really like to be rid of the node reference counting entirely, but
>>> I can't figure out a way of doing that safely, so I'd settle for
>>> making it a lot easier to get correct.
>>>
>>
>> heh! I have often thought about adding reference counting to device tree
>> properties.
> 
> You horrible, horrible man.

Yes. I are evil :)

After looking again the work needed to add reference counts to properties
would be huge. The few properties I am concerned with are specific to powerpc
so perhaps just adding an arch specific lock around updating those
properties would work.

-Nathan

> 
>> I don't really want to but there are some properties that can
>> get updated frequently (namely the one mentioned above for memory) that
>> can also get pretty big, especially on systems with a lot of memory. We
>> never free the memory for old versions of a device tree property. This is
>> a pretty minor issue though and probably best suited for a separate
>> discussion after resolving this. 
> 
> We might be able to do some in-place modification of properties if the
> size of the property doesn't change. That still leaves some nasty
> lifecycle issues that need to be resolved though. It would require
> swapping back and forth between memory for an old copy of the property
> and a new one. Yes, this should be a separate discussion.
> 
>>
>> Other than pseries, who else does dynamic device tree updating? Are we the
>> only ones?
> 
> Right now you're the only ones. Pantelis has a series that adds bulk
> changes to the device tree which are also removable (called overlays). I
> also have a GSoC student working on the selftest code which will
> dynamically add testcase data to the tree.
> 
> g.
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-24 20:07             ` Nathan Fontenot
@ 2014-06-25 20:22                 ` Grant Likely
  -1 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-25 20:22 UTC (permalink / raw)
  To: Nathan Fontenot, Pantelis Antoniou
  Cc: Tyrel Datwyler, Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner,
	Steven Rostedt

On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
> On 06/23/2014 09:58 AM, Grant Likely wrote:
> > On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> >> Hi Grant,
> >>
> >> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
> >> ideas...
> >>
> >> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
> >>
> >>> Hi Nathan and Tyrel,
> >>>
> >>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> >>> I'm hoping you can help me. Right now, pseries seems to be the only
> >>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> >>> the entire kernel because it requires all DT code to manage reference
> >>> counting with iterating over nodes. Most users simply get it wrong.
> >>> Pantelis did some investigation and found that the reference counts on
> >>> a running kernel are all over the place. I have my doubts that any
> >>> code really gets it right.
> >>>
> >>> The problem is that users need to know when it is appropriate to call
> >>> of_node_get()/of_node_put(). All list traversals that exit early need
> >>> an extra call to of_node_put(), and code that is searching for a node
> >>> in the tree and holding a reference to it needs to call of_node_get().
> >>>
> >>
> >> In hindsight it appears that drivers just can't get the lifecycle right.
> >> So we need to simplify things.
> >>
> >>> I've got a few pseries questions:
> >>> - What are the changes being requested by pseries firmware? Is it only
> >>> CPUs and memory nodes, or does it manipulate things all over the tree?
> >>> - How frequent are the changes? How many changes would be likely over
> >>> the runtime of the system?
> >>> - Are you able to verify that removed nodes are actually able to be
> >>> freed correctly? Do you have any testcases for node removal?
> >>>
> >>> I'm thinking very seriously about changing the locking semantics of DT
> >>> code entirely so that most users never have to worry about
> >>> of_node_get/put at all. If the DT code is switched to use rcu
> >>> primitives for tree iteration (which also means making DT code use
> >>> list_head, something I'm already investigating), then instead of
> >>> trying to figure out of_node_get/put rules, callers could use
> >>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
> >>> searching over nodes, and only call of_node_get() if the node pointer
> >>> is needed outside the rcu read-side lock.
> >>>
> >>> I'd really like to be rid of the node reference counting entirely, but
> >>> I can't figure out a way of doing that safely, so I'd settle for
> >>> making it a lot easier to get correct.
> >>>
> >>
> >> Since we're going about changing things, how about that devtree_lock?
> > 
> > I believe rcu would pretty much eliminate the devtree_lock entirely. All
> > modifiers would need to grab a mutex to ensure there is only one writer
> > at any given time, but readers would have free reign to parse the tree
> > however they like.
> > 
> > DT writers would have to follow some strict rules about how to handle
> > nodes that are removed (ie. don't modify or of_node_put() them until
> > after rcu is syncronized), but the number of writers is very small and
> > we have control of all of them.
> > 
> >> We're using a raw_spinlock and we're always taking the lock with
> >> interrupts disabled.
> >>
> >> If we're going to make DT changes frequently during normal runtime
> >> and not only during boot time, those are bad for any kind of real-time
> >> performance.
> >>
> >> So the question is, do we really have code that access the live tree
> >> during atomic sections?  Is that something we want? Enforcing this
> >> will make our lives easier, and we'll get the change to replace
> >> that spinlock with a mutex.
> > 
> > Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
> > sections. I cannot put my finger on the exact code however. Nathan might
> > know better. But, if I'm right, the whole problem goes away with RCU.
> 
> I went back through the cpu hotplug code. we do update the DT during cpu
> hotplug but I don't see it happening during atomic sections.
> 
> The code is in arch/powerpc/platforms/pseries/dlpar.c

Great, thanks,

By the way, notifiers currently get sent before any updates are applied
to the tree. I want to change it so that the notifier gets sent
afterwards. Does that work for you? I've looked through all the users
and aside from a stupid block of code in arch/powerpc/kernel/prom.c
which does things that should be done by of_attach_node(), it looks like
everything should be fine.

g.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-25 20:22                 ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-25 20:22 UTC (permalink / raw)
  To: Nathan Fontenot, Pantelis Antoniou
  Cc: devicetree, Steven Rostedt, linuxppc-dev, Tyrel Datwyler,
	Thomas Gleixner

On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
> On 06/23/2014 09:58 AM, Grant Likely wrote:
> > On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou@konsulko.com> wrote:
> >> Hi Grant,
> >>
> >> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
> >> ideas...
> >>
> >> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
> >>
> >>> Hi Nathan and Tyrel,
> >>>
> >>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> >>> I'm hoping you can help me. Right now, pseries seems to be the only
> >>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> >>> the entire kernel because it requires all DT code to manage reference
> >>> counting with iterating over nodes. Most users simply get it wrong.
> >>> Pantelis did some investigation and found that the reference counts on
> >>> a running kernel are all over the place. I have my doubts that any
> >>> code really gets it right.
> >>>
> >>> The problem is that users need to know when it is appropriate to call
> >>> of_node_get()/of_node_put(). All list traversals that exit early need
> >>> an extra call to of_node_put(), and code that is searching for a node
> >>> in the tree and holding a reference to it needs to call of_node_get().
> >>>
> >>
> >> In hindsight it appears that drivers just can't get the lifecycle right.
> >> So we need to simplify things.
> >>
> >>> I've got a few pseries questions:
> >>> - What are the changes being requested by pseries firmware? Is it only
> >>> CPUs and memory nodes, or does it manipulate things all over the tree?
> >>> - How frequent are the changes? How many changes would be likely over
> >>> the runtime of the system?
> >>> - Are you able to verify that removed nodes are actually able to be
> >>> freed correctly? Do you have any testcases for node removal?
> >>>
> >>> I'm thinking very seriously about changing the locking semantics of DT
> >>> code entirely so that most users never have to worry about
> >>> of_node_get/put at all. If the DT code is switched to use rcu
> >>> primitives for tree iteration (which also means making DT code use
> >>> list_head, something I'm already investigating), then instead of
> >>> trying to figure out of_node_get/put rules, callers could use
> >>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
> >>> searching over nodes, and only call of_node_get() if the node pointer
> >>> is needed outside the rcu read-side lock.
> >>>
> >>> I'd really like to be rid of the node reference counting entirely, but
> >>> I can't figure out a way of doing that safely, so I'd settle for
> >>> making it a lot easier to get correct.
> >>>
> >>
> >> Since we're going about changing things, how about that devtree_lock?
> > 
> > I believe rcu would pretty much eliminate the devtree_lock entirely. All
> > modifiers would need to grab a mutex to ensure there is only one writer
> > at any given time, but readers would have free reign to parse the tree
> > however they like.
> > 
> > DT writers would have to follow some strict rules about how to handle
> > nodes that are removed (ie. don't modify or of_node_put() them until
> > after rcu is syncronized), but the number of writers is very small and
> > we have control of all of them.
> > 
> >> We're using a raw_spinlock and we're always taking the lock with
> >> interrupts disabled.
> >>
> >> If we're going to make DT changes frequently during normal runtime
> >> and not only during boot time, those are bad for any kind of real-time
> >> performance.
> >>
> >> So the question is, do we really have code that access the live tree
> >> during atomic sections?  Is that something we want? Enforcing this
> >> will make our lives easier, and we'll get the change to replace
> >> that spinlock with a mutex.
> > 
> > Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
> > sections. I cannot put my finger on the exact code however. Nathan might
> > know better. But, if I'm right, the whole problem goes away with RCU.
> 
> I went back through the cpu hotplug code. we do update the DT during cpu
> hotplug but I don't see it happening during atomic sections.
> 
> The code is in arch/powerpc/platforms/pseries/dlpar.c

Great, thanks,

By the way, notifiers currently get sent before any updates are applied
to the tree. I want to change it so that the notifier gets sent
afterwards. Does that work for you? I've looked through all the users
and aside from a stupid block of code in arch/powerpc/kernel/prom.c
which does things that should be done by of_attach_node(), it looks like
everything should be fine.

g.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-24 20:10             ` Nathan Fontenot
@ 2014-06-25 20:24                 ` Grant Likely
  -1 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-25 20:24 UTC (permalink / raw)
  To: Nathan Fontenot, Tyrel Datwyler
  Cc: Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Pantelis Antoniou

On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
> On 06/23/2014 09:48 AM, Grant Likely wrote:
> > On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
> >> On 06/18/2014 03:07 PM, Grant Likely wrote:
> >>> Hi Nathan and Tyrel,
> >>>
> >>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> >>> I'm hoping you can help me. Right now, pseries seems to be the only
> >>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> >>> the entire kernel because it requires all DT code to manage reference
> >>> counting with iterating over nodes. Most users simply get it wrong.
> >>> Pantelis did some investigation and found that the reference counts on
> >>> a running kernel are all over the place. I have my doubts that any
> >>> code really gets it right.
> >>>
> >>> The problem is that users need to know when it is appropriate to call
> >>> of_node_get()/of_node_put(). All list traversals that exit early need
> >>> an extra call to of_node_put(), and code that is searching for a node
> >>> in the tree and holding a reference to it needs to call of_node_get().
> >>>
> >>> I've got a few pseries questions:
> >>> - What are the changes being requested by pseries firmware? Is it only
> >>> CPUs and memory nodes, or does it manipulate things all over the tree?
> >>
> >> The short answer, everything.
> > 
> > :-)
> > 
> >> For pseries the two big actions that can change the device tree are
> >> adding/removing resources and partition migration.
> >>
> >> The most frequent updates to the device tree happen during resource
> >> (cpu, memory, and pci/phb) add and remove. During this process we add
> >> and remove the node and its properties from the device tree.
> >> - For memory on newer systems this just involves updating the
> >>   ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
> >>   firmware levels add and remove the memroy@XXX nodes and their properties.
> >> - For cpus the cpus/PowerPC,POWERXXXX nodes and its properties are added
> >>   or removed
> >> - For pci/phb the pci@XXXXX nodes and properties are added/removed.
> >>
> >> The less frequent operation of live partition migration (and suspend/resume)
> >> can update just about anything in the device tree. When this occurs and the
> >> systems starts after being migrated (or waking up after a suspend) we make
> >> a call to firmware to get updates to the device tree for the new hardware
> >> we are running on.
> >>  
> >>> - How frequent are the changes? How many changes would be likely over
> >>> the runtime of the system?
> >>
> >> This can happen frequently.
> > 
> > Thanks, that is exactly the information that I want. I'm not so much
> > concerned with the addition or removal of nodes/properties, which is
> > actually pretty easy to handle. It is the lifecycle of allocations on
> > dynamic nodes that causes heartburn.
> > 
> >>> - Are you able to verify that removed nodes are actually able to be
> >>> freed correctly? Do you have any testcases for node removal?
> >>
> >> I have always tested this by doing resource add/remove, usually cpu and memory
> >> since it is the easiest.
> > 
> > Is that just testing the functionality, or do you have tests that check
> > if the memory gets freed?
> 
> In general it's just functionality testing.
> 
> > 
> >>> I'm thinking very seriously about changing the locking semantics of DT
> >>> code entirely so that most users never have to worry about
> >>> of_node_get/put at all. If the DT code is switched to use rcu
> >>> primitives for tree iteration (which also means making DT code use
> >>> list_head, something I'm already investigating), then instead of
> >>> trying to figure out of_node_get/put rules, callers could use
> >>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
> >>> searching over nodes, and only call of_node_get() if the node pointer
> >>> is needed outside the rcu read-side lock.
> >>>
> >>
> >> This sounds good. I like just taking the rcu lock around accessing the DT.
> >> Do we have many places where DT node pointers are held that require
> >> keeping the of_node_get/put calls? If this did exist perhaps we could
> >> update those places to look up the DT node every time instead of
> >> holding on to the pointer. We could just get rid of the reference counting
> >> altogether then.
> > 
> > There are a few, but I would be happy to restrict reference counting to
> > only those locations. Most places will decode the DT data, and then
> > throw away the reference. We /might/ even be able to do rcu_lock/unlock
> > around the entire probe path which would make it transparent to all
> > device drivers.
> > 
> >>> I'd really like to be rid of the node reference counting entirely, but
> >>> I can't figure out a way of doing that safely, so I'd settle for
> >>> making it a lot easier to get correct.
> >>>
> >>
> >> heh! I have often thought about adding reference counting to device tree
> >> properties.
> > 
> > You horrible, horrible man.
> 
> Yes. I are evil :)
> 
> After looking again the work needed to add reference counts to properties
> would be huge. The few properties I am concerned with are specific to powerpc
> so perhaps just adding an arch specific lock around updating those
> properties would work.

Which code/properties? I'd like to have a look myself.

g.

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-25 20:24                 ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-25 20:24 UTC (permalink / raw)
  To: Nathan Fontenot, Tyrel Datwyler
  Cc: devicetree, Pantelis Antoniou, linuxppc-dev

On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
> On 06/23/2014 09:48 AM, Grant Likely wrote:
> > On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
> >> On 06/18/2014 03:07 PM, Grant Likely wrote:
> >>> Hi Nathan and Tyrel,
> >>>
> >>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> >>> I'm hoping you can help me. Right now, pseries seems to be the only
> >>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> >>> the entire kernel because it requires all DT code to manage reference
> >>> counting with iterating over nodes. Most users simply get it wrong.
> >>> Pantelis did some investigation and found that the reference counts on
> >>> a running kernel are all over the place. I have my doubts that any
> >>> code really gets it right.
> >>>
> >>> The problem is that users need to know when it is appropriate to call
> >>> of_node_get()/of_node_put(). All list traversals that exit early need
> >>> an extra call to of_node_put(), and code that is searching for a node
> >>> in the tree and holding a reference to it needs to call of_node_get().
> >>>
> >>> I've got a few pseries questions:
> >>> - What are the changes being requested by pseries firmware? Is it only
> >>> CPUs and memory nodes, or does it manipulate things all over the tree?
> >>
> >> The short answer, everything.
> > 
> > :-)
> > 
> >> For pseries the two big actions that can change the device tree are
> >> adding/removing resources and partition migration.
> >>
> >> The most frequent updates to the device tree happen during resource
> >> (cpu, memory, and pci/phb) add and remove. During this process we add
> >> and remove the node and its properties from the device tree.
> >> - For memory on newer systems this just involves updating the
> >>   ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
> >>   firmware levels add and remove the memroy@XXX nodes and their properties.
> >> - For cpus the cpus/PowerPC,POWERXXXX nodes and its properties are added
> >>   or removed
> >> - For pci/phb the pci@XXXXX nodes and properties are added/removed.
> >>
> >> The less frequent operation of live partition migration (and suspend/resume)
> >> can update just about anything in the device tree. When this occurs and the
> >> systems starts after being migrated (or waking up after a suspend) we make
> >> a call to firmware to get updates to the device tree for the new hardware
> >> we are running on.
> >>  
> >>> - How frequent are the changes? How many changes would be likely over
> >>> the runtime of the system?
> >>
> >> This can happen frequently.
> > 
> > Thanks, that is exactly the information that I want. I'm not so much
> > concerned with the addition or removal of nodes/properties, which is
> > actually pretty easy to handle. It is the lifecycle of allocations on
> > dynamic nodes that causes heartburn.
> > 
> >>> - Are you able to verify that removed nodes are actually able to be
> >>> freed correctly? Do you have any testcases for node removal?
> >>
> >> I have always tested this by doing resource add/remove, usually cpu and memory
> >> since it is the easiest.
> > 
> > Is that just testing the functionality, or do you have tests that check
> > if the memory gets freed?
> 
> In general it's just functionality testing.
> 
> > 
> >>> I'm thinking very seriously about changing the locking semantics of DT
> >>> code entirely so that most users never have to worry about
> >>> of_node_get/put at all. If the DT code is switched to use rcu
> >>> primitives for tree iteration (which also means making DT code use
> >>> list_head, something I'm already investigating), then instead of
> >>> trying to figure out of_node_get/put rules, callers could use
> >>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
> >>> searching over nodes, and only call of_node_get() if the node pointer
> >>> is needed outside the rcu read-side lock.
> >>>
> >>
> >> This sounds good. I like just taking the rcu lock around accessing the DT.
> >> Do we have many places where DT node pointers are held that require
> >> keeping the of_node_get/put calls? If this did exist perhaps we could
> >> update those places to look up the DT node every time instead of
> >> holding on to the pointer. We could just get rid of the reference counting
> >> altogether then.
> > 
> > There are a few, but I would be happy to restrict reference counting to
> > only those locations. Most places will decode the DT data, and then
> > throw away the reference. We /might/ even be able to do rcu_lock/unlock
> > around the entire probe path which would make it transparent to all
> > device drivers.
> > 
> >>> I'd really like to be rid of the node reference counting entirely, but
> >>> I can't figure out a way of doing that safely, so I'd settle for
> >>> making it a lot easier to get correct.
> >>>
> >>
> >> heh! I have often thought about adding reference counting to device tree
> >> properties.
> > 
> > You horrible, horrible man.
> 
> Yes. I are evil :)
> 
> After looking again the work needed to add reference counts to properties
> would be huge. The few properties I am concerned with are specific to powerpc
> so perhaps just adding an arch specific lock around updating those
> properties would work.

Which code/properties? I'd like to have a look myself.

g.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-25 20:22                 ` Grant Likely
@ 2014-06-26 19:59                     ` Nathan Fontenot
  -1 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-26 19:59 UTC (permalink / raw)
  To: Grant Likely, Pantelis Antoniou
  Cc: Tyrel Datwyler, Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner,
	Steven Rostedt

On 06/25/2014 03:22 PM, Grant Likely wrote:
> On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>> On 06/23/2014 09:58 AM, Grant Likely wrote:
>>> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
>>>> Hi Grant,
>>>>
>>>> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
>>>> ideas...
>>>>
>>>> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
>>>>
>>>>> Hi Nathan and Tyrel,
>>>>>
>>>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
>>>>> I'm hoping you can help me. Right now, pseries seems to be the only
>>>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
>>>>> the entire kernel because it requires all DT code to manage reference
>>>>> counting with iterating over nodes. Most users simply get it wrong.
>>>>> Pantelis did some investigation and found that the reference counts on
>>>>> a running kernel are all over the place. I have my doubts that any
>>>>> code really gets it right.
>>>>>
>>>>> The problem is that users need to know when it is appropriate to call
>>>>> of_node_get()/of_node_put(). All list traversals that exit early need
>>>>> an extra call to of_node_put(), and code that is searching for a node
>>>>> in the tree and holding a reference to it needs to call of_node_get().
>>>>>
>>>>
>>>> In hindsight it appears that drivers just can't get the lifecycle right.
>>>> So we need to simplify things.
>>>>
>>>>> I've got a few pseries questions:
>>>>> - What are the changes being requested by pseries firmware? Is it only
>>>>> CPUs and memory nodes, or does it manipulate things all over the tree?
>>>>> - How frequent are the changes? How many changes would be likely over
>>>>> the runtime of the system?
>>>>> - Are you able to verify that removed nodes are actually able to be
>>>>> freed correctly? Do you have any testcases for node removal?
>>>>>
>>>>> I'm thinking very seriously about changing the locking semantics of DT
>>>>> code entirely so that most users never have to worry about
>>>>> of_node_get/put at all. If the DT code is switched to use rcu
>>>>> primitives for tree iteration (which also means making DT code use
>>>>> list_head, something I'm already investigating), then instead of
>>>>> trying to figure out of_node_get/put rules, callers could use
>>>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
>>>>> searching over nodes, and only call of_node_get() if the node pointer
>>>>> is needed outside the rcu read-side lock.
>>>>>
>>>>> I'd really like to be rid of the node reference counting entirely, but
>>>>> I can't figure out a way of doing that safely, so I'd settle for
>>>>> making it a lot easier to get correct.
>>>>>
>>>>
>>>> Since we're going about changing things, how about that devtree_lock?
>>>
>>> I believe rcu would pretty much eliminate the devtree_lock entirely. All
>>> modifiers would need to grab a mutex to ensure there is only one writer
>>> at any given time, but readers would have free reign to parse the tree
>>> however they like.
>>>
>>> DT writers would have to follow some strict rules about how to handle
>>> nodes that are removed (ie. don't modify or of_node_put() them until
>>> after rcu is syncronized), but the number of writers is very small and
>>> we have control of all of them.
>>>
>>>> We're using a raw_spinlock and we're always taking the lock with
>>>> interrupts disabled.
>>>>
>>>> If we're going to make DT changes frequently during normal runtime
>>>> and not only during boot time, those are bad for any kind of real-time
>>>> performance.
>>>>
>>>> So the question is, do we really have code that access the live tree
>>>> during atomic sections?  Is that something we want? Enforcing this
>>>> will make our lives easier, and we'll get the change to replace
>>>> that spinlock with a mutex.
>>>
>>> Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
>>> sections. I cannot put my finger on the exact code however. Nathan might
>>> know better. But, if I'm right, the whole problem goes away with RCU.
>>
>> I went back through the cpu hotplug code. we do update the DT during cpu
>> hotplug but I don't see it happening during atomic sections.
>>
>> The code is in arch/powerpc/platforms/pseries/dlpar.c
> 
> Great, thanks,
> 
> By the way, notifiers currently get sent before any updates are applied
> to the tree. I want to change it so that the notifier gets sent
> afterwards. Does that work for you? I've looked through all the users
> and aside from a stupid block of code in arch/powerpc/kernel/prom.c
> which does things that should be done by of_attach_node(), it looks like
> everything should be fine.

This would affect property updates. When doing a property update the
notifier passes a pointer to a struct containing a device node
pointer and a pointer to the new device node property.

I know specifically in memory property updates we grab the current version
of the device tree property and compare it to the 'new' version that 
was passed to us.

If you want to do the DT update before calling the notifier that should be
fine for the memory update code and would only require very minimal
updates.

It does look like OF_RECONFIG_UPDATE_PROPERTY is used in a couple of other
places that would need to be investigated.

-Nathan 

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-26 19:59                     ` Nathan Fontenot
  0 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-26 19:59 UTC (permalink / raw)
  To: Grant Likely, Pantelis Antoniou
  Cc: devicetree, Steven Rostedt, linuxppc-dev, Tyrel Datwyler,
	Thomas Gleixner

On 06/25/2014 03:22 PM, Grant Likely wrote:
> On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>> On 06/23/2014 09:58 AM, Grant Likely wrote:
>>> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou@konsulko.com> wrote:
>>>> Hi Grant,
>>>>
>>>> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
>>>> ideas...
>>>>
>>>> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
>>>>
>>>>> Hi Nathan and Tyrel,
>>>>>
>>>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
>>>>> I'm hoping you can help me. Right now, pseries seems to be the only
>>>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
>>>>> the entire kernel because it requires all DT code to manage reference
>>>>> counting with iterating over nodes. Most users simply get it wrong.
>>>>> Pantelis did some investigation and found that the reference counts on
>>>>> a running kernel are all over the place. I have my doubts that any
>>>>> code really gets it right.
>>>>>
>>>>> The problem is that users need to know when it is appropriate to call
>>>>> of_node_get()/of_node_put(). All list traversals that exit early need
>>>>> an extra call to of_node_put(), and code that is searching for a node
>>>>> in the tree and holding a reference to it needs to call of_node_get().
>>>>>
>>>>
>>>> In hindsight it appears that drivers just can't get the lifecycle right.
>>>> So we need to simplify things.
>>>>
>>>>> I've got a few pseries questions:
>>>>> - What are the changes being requested by pseries firmware? Is it only
>>>>> CPUs and memory nodes, or does it manipulate things all over the tree?
>>>>> - How frequent are the changes? How many changes would be likely over
>>>>> the runtime of the system?
>>>>> - Are you able to verify that removed nodes are actually able to be
>>>>> freed correctly? Do you have any testcases for node removal?
>>>>>
>>>>> I'm thinking very seriously about changing the locking semantics of DT
>>>>> code entirely so that most users never have to worry about
>>>>> of_node_get/put at all. If the DT code is switched to use rcu
>>>>> primitives for tree iteration (which also means making DT code use
>>>>> list_head, something I'm already investigating), then instead of
>>>>> trying to figure out of_node_get/put rules, callers could use
>>>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
>>>>> searching over nodes, and only call of_node_get() if the node pointer
>>>>> is needed outside the rcu read-side lock.
>>>>>
>>>>> I'd really like to be rid of the node reference counting entirely, but
>>>>> I can't figure out a way of doing that safely, so I'd settle for
>>>>> making it a lot easier to get correct.
>>>>>
>>>>
>>>> Since we're going about changing things, how about that devtree_lock?
>>>
>>> I believe rcu would pretty much eliminate the devtree_lock entirely. All
>>> modifiers would need to grab a mutex to ensure there is only one writer
>>> at any given time, but readers would have free reign to parse the tree
>>> however they like.
>>>
>>> DT writers would have to follow some strict rules about how to handle
>>> nodes that are removed (ie. don't modify or of_node_put() them until
>>> after rcu is syncronized), but the number of writers is very small and
>>> we have control of all of them.
>>>
>>>> We're using a raw_spinlock and we're always taking the lock with
>>>> interrupts disabled.
>>>>
>>>> If we're going to make DT changes frequently during normal runtime
>>>> and not only during boot time, those are bad for any kind of real-time
>>>> performance.
>>>>
>>>> So the question is, do we really have code that access the live tree
>>>> during atomic sections?  Is that something we want? Enforcing this
>>>> will make our lives easier, and we'll get the change to replace
>>>> that spinlock with a mutex.
>>>
>>> Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
>>> sections. I cannot put my finger on the exact code however. Nathan might
>>> know better. But, if I'm right, the whole problem goes away with RCU.
>>
>> I went back through the cpu hotplug code. we do update the DT during cpu
>> hotplug but I don't see it happening during atomic sections.
>>
>> The code is in arch/powerpc/platforms/pseries/dlpar.c
> 
> Great, thanks,
> 
> By the way, notifiers currently get sent before any updates are applied
> to the tree. I want to change it so that the notifier gets sent
> afterwards. Does that work for you? I've looked through all the users
> and aside from a stupid block of code in arch/powerpc/kernel/prom.c
> which does things that should be done by of_attach_node(), it looks like
> everything should be fine.

This would affect property updates. When doing a property update the
notifier passes a pointer to a struct containing a device node
pointer and a pointer to the new device node property.

I know specifically in memory property updates we grab the current version
of the device tree property and compare it to the 'new' version that 
was passed to us.

If you want to do the DT update before calling the notifier that should be
fine for the memory update code and would only require very minimal
updates.

It does look like OF_RECONFIG_UPDATE_PROPERTY is used in a couple of other
places that would need to be investigated.

-Nathan 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-25 20:24                 ` Grant Likely
@ 2014-06-26 20:01                     ` Nathan Fontenot
  -1 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-26 20:01 UTC (permalink / raw)
  To: Grant Likely, Tyrel Datwyler
  Cc: Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Pantelis Antoniou

On 06/25/2014 03:24 PM, Grant Likely wrote:
> On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>> On 06/23/2014 09:48 AM, Grant Likely wrote:
>>> On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>>>> On 06/18/2014 03:07 PM, Grant Likely wrote:
>>>>> Hi Nathan and Tyrel,
>>>>>
>>>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
>>>>> I'm hoping you can help me. Right now, pseries seems to be the only
>>>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
>>>>> the entire kernel because it requires all DT code to manage reference
>>>>> counting with iterating over nodes. Most users simply get it wrong.
>>>>> Pantelis did some investigation and found that the reference counts on
>>>>> a running kernel are all over the place. I have my doubts that any
>>>>> code really gets it right.
>>>>>
>>>>> The problem is that users need to know when it is appropriate to call
>>>>> of_node_get()/of_node_put(). All list traversals that exit early need
>>>>> an extra call to of_node_put(), and code that is searching for a node
>>>>> in the tree and holding a reference to it needs to call of_node_get().
>>>>>
>>>>> I've got a few pseries questions:
>>>>> - What are the changes being requested by pseries firmware? Is it only
>>>>> CPUs and memory nodes, or does it manipulate things all over the tree?
>>>>
>>>> The short answer, everything.
>>>
>>> :-)
>>>
>>>> For pseries the two big actions that can change the device tree are
>>>> adding/removing resources and partition migration.
>>>>
>>>> The most frequent updates to the device tree happen during resource
>>>> (cpu, memory, and pci/phb) add and remove. During this process we add
>>>> and remove the node and its properties from the device tree.
>>>> - For memory on newer systems this just involves updating the
>>>>   ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
>>>>   firmware levels add and remove the memroy@XXX nodes and their properties.
>>>> - For cpus the cpus/PowerPC,POWERXXXX nodes and its properties are added
>>>>   or removed
>>>> - For pci/phb the pci@XXXXX nodes and properties are added/removed.
>>>>
>>>> The less frequent operation of live partition migration (and suspend/resume)
>>>> can update just about anything in the device tree. When this occurs and the
>>>> systems starts after being migrated (or waking up after a suspend) we make
>>>> a call to firmware to get updates to the device tree for the new hardware
>>>> we are running on.
>>>>  
>>>>> - How frequent are the changes? How many changes would be likely over
>>>>> the runtime of the system?
>>>>
>>>> This can happen frequently.
>>>
>>> Thanks, that is exactly the information that I want. I'm not so much
>>> concerned with the addition or removal of nodes/properties, which is
>>> actually pretty easy to handle. It is the lifecycle of allocations on
>>> dynamic nodes that causes heartburn.
>>>
>>>>> - Are you able to verify that removed nodes are actually able to be
>>>>> freed correctly? Do you have any testcases for node removal?
>>>>
>>>> I have always tested this by doing resource add/remove, usually cpu and memory
>>>> since it is the easiest.
>>>
>>> Is that just testing the functionality, or do you have tests that check
>>> if the memory gets freed?
>>
>> In general it's just functionality testing.
>>
>>>
>>>>> I'm thinking very seriously about changing the locking semantics of DT
>>>>> code entirely so that most users never have to worry about
>>>>> of_node_get/put at all. If the DT code is switched to use rcu
>>>>> primitives for tree iteration (which also means making DT code use
>>>>> list_head, something I'm already investigating), then instead of
>>>>> trying to figure out of_node_get/put rules, callers could use
>>>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
>>>>> searching over nodes, and only call of_node_get() if the node pointer
>>>>> is needed outside the rcu read-side lock.
>>>>>
>>>>
>>>> This sounds good. I like just taking the rcu lock around accessing the DT.
>>>> Do we have many places where DT node pointers are held that require
>>>> keeping the of_node_get/put calls? If this did exist perhaps we could
>>>> update those places to look up the DT node every time instead of
>>>> holding on to the pointer. We could just get rid of the reference counting
>>>> altogether then.
>>>
>>> There are a few, but I would be happy to restrict reference counting to
>>> only those locations. Most places will decode the DT data, and then
>>> throw away the reference. We /might/ even be able to do rcu_lock/unlock
>>> around the entire probe path which would make it transparent to all
>>> device drivers.
>>>
>>>>> I'd really like to be rid of the node reference counting entirely, but
>>>>> I can't figure out a way of doing that safely, so I'd settle for
>>>>> making it a lot easier to get correct.
>>>>>
>>>>
>>>> heh! I have often thought about adding reference counting to device tree
>>>> properties.
>>>
>>> You horrible, horrible man.
>>
>> Yes. I are evil :)
>>
>> After looking again the work needed to add reference counts to properties
>> would be huge. The few properties I am concerned with are specific to powerpc
>> so perhaps just adding an arch specific lock around updating those
>> properties would work.
> 
> Which code/properties? I'd like to have a look myself.

/ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory

The property is updated in 
arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()

-Nathan

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-26 20:01                     ` Nathan Fontenot
  0 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-26 20:01 UTC (permalink / raw)
  To: Grant Likely, Tyrel Datwyler; +Cc: devicetree, Pantelis Antoniou, linuxppc-dev

On 06/25/2014 03:24 PM, Grant Likely wrote:
> On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>> On 06/23/2014 09:48 AM, Grant Likely wrote:
>>> On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>>>> On 06/18/2014 03:07 PM, Grant Likely wrote:
>>>>> Hi Nathan and Tyrel,
>>>>>
>>>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
>>>>> I'm hoping you can help me. Right now, pseries seems to be the only
>>>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
>>>>> the entire kernel because it requires all DT code to manage reference
>>>>> counting with iterating over nodes. Most users simply get it wrong.
>>>>> Pantelis did some investigation and found that the reference counts on
>>>>> a running kernel are all over the place. I have my doubts that any
>>>>> code really gets it right.
>>>>>
>>>>> The problem is that users need to know when it is appropriate to call
>>>>> of_node_get()/of_node_put(). All list traversals that exit early need
>>>>> an extra call to of_node_put(), and code that is searching for a node
>>>>> in the tree and holding a reference to it needs to call of_node_get().
>>>>>
>>>>> I've got a few pseries questions:
>>>>> - What are the changes being requested by pseries firmware? Is it only
>>>>> CPUs and memory nodes, or does it manipulate things all over the tree?
>>>>
>>>> The short answer, everything.
>>>
>>> :-)
>>>
>>>> For pseries the two big actions that can change the device tree are
>>>> adding/removing resources and partition migration.
>>>>
>>>> The most frequent updates to the device tree happen during resource
>>>> (cpu, memory, and pci/phb) add and remove. During this process we add
>>>> and remove the node and its properties from the device tree.
>>>> - For memory on newer systems this just involves updating the
>>>>   ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
>>>>   firmware levels add and remove the memroy@XXX nodes and their properties.
>>>> - For cpus the cpus/PowerPC,POWERXXXX nodes and its properties are added
>>>>   or removed
>>>> - For pci/phb the pci@XXXXX nodes and properties are added/removed.
>>>>
>>>> The less frequent operation of live partition migration (and suspend/resume)
>>>> can update just about anything in the device tree. When this occurs and the
>>>> systems starts after being migrated (or waking up after a suspend) we make
>>>> a call to firmware to get updates to the device tree for the new hardware
>>>> we are running on.
>>>>  
>>>>> - How frequent are the changes? How many changes would be likely over
>>>>> the runtime of the system?
>>>>
>>>> This can happen frequently.
>>>
>>> Thanks, that is exactly the information that I want. I'm not so much
>>> concerned with the addition or removal of nodes/properties, which is
>>> actually pretty easy to handle. It is the lifecycle of allocations on
>>> dynamic nodes that causes heartburn.
>>>
>>>>> - Are you able to verify that removed nodes are actually able to be
>>>>> freed correctly? Do you have any testcases for node removal?
>>>>
>>>> I have always tested this by doing resource add/remove, usually cpu and memory
>>>> since it is the easiest.
>>>
>>> Is that just testing the functionality, or do you have tests that check
>>> if the memory gets freed?
>>
>> In general it's just functionality testing.
>>
>>>
>>>>> I'm thinking very seriously about changing the locking semantics of DT
>>>>> code entirely so that most users never have to worry about
>>>>> of_node_get/put at all. If the DT code is switched to use rcu
>>>>> primitives for tree iteration (which also means making DT code use
>>>>> list_head, something I'm already investigating), then instead of
>>>>> trying to figure out of_node_get/put rules, callers could use
>>>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
>>>>> searching over nodes, and only call of_node_get() if the node pointer
>>>>> is needed outside the rcu read-side lock.
>>>>>
>>>>
>>>> This sounds good. I like just taking the rcu lock around accessing the DT.
>>>> Do we have many places where DT node pointers are held that require
>>>> keeping the of_node_get/put calls? If this did exist perhaps we could
>>>> update those places to look up the DT node every time instead of
>>>> holding on to the pointer. We could just get rid of the reference counting
>>>> altogether then.
>>>
>>> There are a few, but I would be happy to restrict reference counting to
>>> only those locations. Most places will decode the DT data, and then
>>> throw away the reference. We /might/ even be able to do rcu_lock/unlock
>>> around the entire probe path which would make it transparent to all
>>> device drivers.
>>>
>>>>> I'd really like to be rid of the node reference counting entirely, but
>>>>> I can't figure out a way of doing that safely, so I'd settle for
>>>>> making it a lot easier to get correct.
>>>>>
>>>>
>>>> heh! I have often thought about adding reference counting to device tree
>>>> properties.
>>>
>>> You horrible, horrible man.
>>
>> Yes. I are evil :)
>>
>> After looking again the work needed to add reference counts to properties
>> would be huge. The few properties I am concerned with are specific to powerpc
>> so perhaps just adding an arch specific lock around updating those
>> properties would work.
> 
> Which code/properties? I'd like to have a look myself.

/ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory

The property is updated in 
arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()

-Nathan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-26 19:59                     ` Nathan Fontenot
@ 2014-06-27 12:32                         ` Grant Likely
  -1 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-27 12:32 UTC (permalink / raw)
  To: Nathan Fontenot, Pantelis Antoniou
  Cc: Tyrel Datwyler, Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner,
	Steven Rostedt

On Thu, 26 Jun 2014 14:59:31 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
> On 06/25/2014 03:22 PM, Grant Likely wrote:
> > On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
> >> On 06/23/2014 09:58 AM, Grant Likely wrote:
> >>> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> >>>> Hi Grant,
> >>>>
> >>>> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
> >>>> ideas...
> >>>>
> >>>> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
> >>>>
> >>>>> Hi Nathan and Tyrel,
> >>>>>
> >>>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> >>>>> I'm hoping you can help me. Right now, pseries seems to be the only
> >>>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> >>>>> the entire kernel because it requires all DT code to manage reference
> >>>>> counting with iterating over nodes. Most users simply get it wrong.
> >>>>> Pantelis did some investigation and found that the reference counts on
> >>>>> a running kernel are all over the place. I have my doubts that any
> >>>>> code really gets it right.
> >>>>>
> >>>>> The problem is that users need to know when it is appropriate to call
> >>>>> of_node_get()/of_node_put(). All list traversals that exit early need
> >>>>> an extra call to of_node_put(), and code that is searching for a node
> >>>>> in the tree and holding a reference to it needs to call of_node_get().
> >>>>>
> >>>>
> >>>> In hindsight it appears that drivers just can't get the lifecycle right.
> >>>> So we need to simplify things.
> >>>>
> >>>>> I've got a few pseries questions:
> >>>>> - What are the changes being requested by pseries firmware? Is it only
> >>>>> CPUs and memory nodes, or does it manipulate things all over the tree?
> >>>>> - How frequent are the changes? How many changes would be likely over
> >>>>> the runtime of the system?
> >>>>> - Are you able to verify that removed nodes are actually able to be
> >>>>> freed correctly? Do you have any testcases for node removal?
> >>>>>
> >>>>> I'm thinking very seriously about changing the locking semantics of DT
> >>>>> code entirely so that most users never have to worry about
> >>>>> of_node_get/put at all. If the DT code is switched to use rcu
> >>>>> primitives for tree iteration (which also means making DT code use
> >>>>> list_head, something I'm already investigating), then instead of
> >>>>> trying to figure out of_node_get/put rules, callers could use
> >>>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
> >>>>> searching over nodes, and only call of_node_get() if the node pointer
> >>>>> is needed outside the rcu read-side lock.
> >>>>>
> >>>>> I'd really like to be rid of the node reference counting entirely, but
> >>>>> I can't figure out a way of doing that safely, so I'd settle for
> >>>>> making it a lot easier to get correct.
> >>>>>
> >>>>
> >>>> Since we're going about changing things, how about that devtree_lock?
> >>>
> >>> I believe rcu would pretty much eliminate the devtree_lock entirely. All
> >>> modifiers would need to grab a mutex to ensure there is only one writer
> >>> at any given time, but readers would have free reign to parse the tree
> >>> however they like.
> >>>
> >>> DT writers would have to follow some strict rules about how to handle
> >>> nodes that are removed (ie. don't modify or of_node_put() them until
> >>> after rcu is syncronized), but the number of writers is very small and
> >>> we have control of all of them.
> >>>
> >>>> We're using a raw_spinlock and we're always taking the lock with
> >>>> interrupts disabled.
> >>>>
> >>>> If we're going to make DT changes frequently during normal runtime
> >>>> and not only during boot time, those are bad for any kind of real-time
> >>>> performance.
> >>>>
> >>>> So the question is, do we really have code that access the live tree
> >>>> during atomic sections?  Is that something we want? Enforcing this
> >>>> will make our lives easier, and we'll get the change to replace
> >>>> that spinlock with a mutex.
> >>>
> >>> Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
> >>> sections. I cannot put my finger on the exact code however. Nathan might
> >>> know better. But, if I'm right, the whole problem goes away with RCU.
> >>
> >> I went back through the cpu hotplug code. we do update the DT during cpu
> >> hotplug but I don't see it happening during atomic sections.
> >>
> >> The code is in arch/powerpc/platforms/pseries/dlpar.c
> > 
> > Great, thanks,
> > 
> > By the way, notifiers currently get sent before any updates are applied
> > to the tree. I want to change it so that the notifier gets sent
> > afterwards. Does that work for you? I've looked through all the users
> > and aside from a stupid block of code in arch/powerpc/kernel/prom.c
> > which does things that should be done by of_attach_node(), it looks like
> > everything should be fine.
> 
> This would affect property updates. When doing a property update the
> notifier passes a pointer to a struct containing a device node
> pointer and a pointer to the new device node property.
> 
> I know specifically in memory property updates we grab the current version
> of the device tree property and compare it to the 'new' version that 
> was passed to us.
> 
> If you want to do the DT update before calling the notifier that should be
> fine for the memory update code and would only require very minimal
> updates.

We could change the notifier to include both the old and new values.

I've been thinking about changing the notifier format anyway. With the
addition of bulk changes, it would be more efficient to send a single
notifier for all the changes with a link to the change set instead of
one at a time.

g.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-27 12:32                         ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-27 12:32 UTC (permalink / raw)
  To: Nathan Fontenot, Pantelis Antoniou
  Cc: devicetree, Steven Rostedt, linuxppc-dev, Tyrel Datwyler,
	Thomas Gleixner

On Thu, 26 Jun 2014 14:59:31 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
> On 06/25/2014 03:22 PM, Grant Likely wrote:
> > On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
> >> On 06/23/2014 09:58 AM, Grant Likely wrote:
> >>> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou@konsulko.com> wrote:
> >>>> Hi Grant,
> >>>>
> >>>> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
> >>>> ideas...
> >>>>
> >>>> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
> >>>>
> >>>>> Hi Nathan and Tyrel,
> >>>>>
> >>>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> >>>>> I'm hoping you can help me. Right now, pseries seems to be the only
> >>>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> >>>>> the entire kernel because it requires all DT code to manage reference
> >>>>> counting with iterating over nodes. Most users simply get it wrong.
> >>>>> Pantelis did some investigation and found that the reference counts on
> >>>>> a running kernel are all over the place. I have my doubts that any
> >>>>> code really gets it right.
> >>>>>
> >>>>> The problem is that users need to know when it is appropriate to call
> >>>>> of_node_get()/of_node_put(). All list traversals that exit early need
> >>>>> an extra call to of_node_put(), and code that is searching for a node
> >>>>> in the tree and holding a reference to it needs to call of_node_get().
> >>>>>
> >>>>
> >>>> In hindsight it appears that drivers just can't get the lifecycle right.
> >>>> So we need to simplify things.
> >>>>
> >>>>> I've got a few pseries questions:
> >>>>> - What are the changes being requested by pseries firmware? Is it only
> >>>>> CPUs and memory nodes, or does it manipulate things all over the tree?
> >>>>> - How frequent are the changes? How many changes would be likely over
> >>>>> the runtime of the system?
> >>>>> - Are you able to verify that removed nodes are actually able to be
> >>>>> freed correctly? Do you have any testcases for node removal?
> >>>>>
> >>>>> I'm thinking very seriously about changing the locking semantics of DT
> >>>>> code entirely so that most users never have to worry about
> >>>>> of_node_get/put at all. If the DT code is switched to use rcu
> >>>>> primitives for tree iteration (which also means making DT code use
> >>>>> list_head, something I'm already investigating), then instead of
> >>>>> trying to figure out of_node_get/put rules, callers could use
> >>>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
> >>>>> searching over nodes, and only call of_node_get() if the node pointer
> >>>>> is needed outside the rcu read-side lock.
> >>>>>
> >>>>> I'd really like to be rid of the node reference counting entirely, but
> >>>>> I can't figure out a way of doing that safely, so I'd settle for
> >>>>> making it a lot easier to get correct.
> >>>>>
> >>>>
> >>>> Since we're going about changing things, how about that devtree_lock?
> >>>
> >>> I believe rcu would pretty much eliminate the devtree_lock entirely. All
> >>> modifiers would need to grab a mutex to ensure there is only one writer
> >>> at any given time, but readers would have free reign to parse the tree
> >>> however they like.
> >>>
> >>> DT writers would have to follow some strict rules about how to handle
> >>> nodes that are removed (ie. don't modify or of_node_put() them until
> >>> after rcu is syncronized), but the number of writers is very small and
> >>> we have control of all of them.
> >>>
> >>>> We're using a raw_spinlock and we're always taking the lock with
> >>>> interrupts disabled.
> >>>>
> >>>> If we're going to make DT changes frequently during normal runtime
> >>>> and not only during boot time, those are bad for any kind of real-time
> >>>> performance.
> >>>>
> >>>> So the question is, do we really have code that access the live tree
> >>>> during atomic sections?  Is that something we want? Enforcing this
> >>>> will make our lives easier, and we'll get the change to replace
> >>>> that spinlock with a mutex.
> >>>
> >>> Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
> >>> sections. I cannot put my finger on the exact code however. Nathan might
> >>> know better. But, if I'm right, the whole problem goes away with RCU.
> >>
> >> I went back through the cpu hotplug code. we do update the DT during cpu
> >> hotplug but I don't see it happening during atomic sections.
> >>
> >> The code is in arch/powerpc/platforms/pseries/dlpar.c
> > 
> > Great, thanks,
> > 
> > By the way, notifiers currently get sent before any updates are applied
> > to the tree. I want to change it so that the notifier gets sent
> > afterwards. Does that work for you? I've looked through all the users
> > and aside from a stupid block of code in arch/powerpc/kernel/prom.c
> > which does things that should be done by of_attach_node(), it looks like
> > everything should be fine.
> 
> This would affect property updates. When doing a property update the
> notifier passes a pointer to a struct containing a device node
> pointer and a pointer to the new device node property.
> 
> I know specifically in memory property updates we grab the current version
> of the device tree property and compare it to the 'new' version that 
> was passed to us.
> 
> If you want to do the DT update before calling the notifier that should be
> fine for the memory update code and would only require very minimal
> updates.

We could change the notifier to include both the old and new values.

I've been thinking about changing the notifier format anyway. With the
addition of bulk changes, it would be more efficient to send a single
notifier for all the changes with a link to the change set instead of
one at a time.

g.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-27 12:32                         ` Grant Likely
@ 2014-06-27 12:40                             ` Pantelis Antoniou
  -1 siblings, 0 replies; 48+ messages in thread
From: Pantelis Antoniou @ 2014-06-27 12:40 UTC (permalink / raw)
  To: Grant Likely
  Cc: Nathan Fontenot, Tyrel Datwyler, Benjamin Herrenschmidt,
	linuxppc-dev, devicetree-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner,
	Steven Rostedt

Hi Grant,

On Jun 27, 2014, at 3:32 PM, Grant Likely wrote:

> On Thu, 26 Jun 2014 14:59:31 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>> On 06/25/2014 03:22 PM, Grant Likely wrote:
>>> On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>>>> On 06/23/2014 09:58 AM, Grant Likely wrote:
>>>>> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
>>>>>> Hi Grant,
>>>>>> 

[snip]

>> 
>> This would affect property updates. When doing a property update the
>> notifier passes a pointer to a struct containing a device node
>> pointer and a pointer to the new device node property.
>> 
>> I know specifically in memory property updates we grab the current version
>> of the device tree property and compare it to the 'new' version that 
>> was passed to us.
>> 
>> If you want to do the DT update before calling the notifier that should be
>> fine for the memory update code and would only require very minimal
>> updates.
> 
> We could change the notifier to include both the old and new values.
> 
> I've been thinking about changing the notifier format anyway. With the
> addition of bulk changes, it would be more efficient to send a single
> notifier for all the changes with a link to the change set instead of
> one at a time.
> 

That one has my vote. We also need a bulk change notifier, and for device
driver use, some kind of wrapper for specific node/properties.

At the moment a notification is fired for any change in the tree, we might
work something more fine-grained. Like 'watch this node & subnodes', or
'watch this property (or set of properties)'

> g.


Regards

-- Pantelis--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-27 12:40                             ` Pantelis Antoniou
  0 siblings, 0 replies; 48+ messages in thread
From: Pantelis Antoniou @ 2014-06-27 12:40 UTC (permalink / raw)
  To: Grant Likely
  Cc: devicetree, Steven Rostedt, linuxppc-dev, Tyrel Datwyler,
	Thomas Gleixner

Hi Grant,

On Jun 27, 2014, at 3:32 PM, Grant Likely wrote:

> On Thu, 26 Jun 2014 14:59:31 -0500, Nathan Fontenot =
<nfont@austin.ibm.com> wrote:
>> On 06/25/2014 03:22 PM, Grant Likely wrote:
>>> On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot =
<nfont@austin.ibm.com> wrote:
>>>> On 06/23/2014 09:58 AM, Grant Likely wrote:
>>>>> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou =
<pantelis.antoniou@konsulko.com> wrote:
>>>>>> Hi Grant,
>>>>>>=20

[snip]

>>=20
>> This would affect property updates. When doing a property update the
>> notifier passes a pointer to a struct containing a device node
>> pointer and a pointer to the new device node property.
>>=20
>> I know specifically in memory property updates we grab the current =
version
>> of the device tree property and compare it to the 'new' version that=20=

>> was passed to us.
>>=20
>> If you want to do the DT update before calling the notifier that =
should be
>> fine for the memory update code and would only require very minimal
>> updates.
>=20
> We could change the notifier to include both the old and new values.
>=20
> I've been thinking about changing the notifier format anyway. With the
> addition of bulk changes, it would be more efficient to send a single
> notifier for all the changes with a link to the change set instead of
> one at a time.
>=20

That one has my vote. We also need a bulk change notifier, and for =
device
driver use, some kind of wrapper for specific node/properties.

At the moment a notification is fired for any change in the tree, we =
might
work something more fine-grained. Like 'watch this node & subnodes', or
'watch this property (or set of properties)'

> g.


Regards

-- Pantelis=

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-26 20:01                     ` Nathan Fontenot
@ 2014-06-27 12:41                         ` Grant Likely
  -1 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-27 12:41 UTC (permalink / raw)
  To: Nathan Fontenot, Tyrel Datwyler
  Cc: Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Pantelis Antoniou

On Thu, 26 Jun 2014 15:01:49 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
> On 06/25/2014 03:24 PM, Grant Likely wrote:
> > On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
> >>>> heh! I have often thought about adding reference counting to device tree
> >>>> properties.
> >>>
> >>> You horrible, horrible man.
> >>
> >> Yes. I are evil :)
> >>
> >> After looking again the work needed to add reference counts to properties
> >> would be huge. The few properties I am concerned with are specific to powerpc
> >> so perhaps just adding an arch specific lock around updating those
> >> properties would work.
> > 
> > Which code/properties? I'd like to have a look myself.
> 
> /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
> 
> The property is updated in 
> arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()

Specifically, what do you need for the locking? Are you wanting to hold
off additional changes while that function is executing? Pantelis is
adding a mutex for device tree writers. Holding that mutex would prevent
any changes from happening in the tree without affecting readers. Would
that be sufficient?

g.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-27 12:41                         ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-06-27 12:41 UTC (permalink / raw)
  To: Nathan Fontenot, Tyrel Datwyler
  Cc: devicetree, Pantelis Antoniou, linuxppc-dev

On Thu, 26 Jun 2014 15:01:49 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
> On 06/25/2014 03:24 PM, Grant Likely wrote:
> > On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
> >>>> heh! I have often thought about adding reference counting to device tree
> >>>> properties.
> >>>
> >>> You horrible, horrible man.
> >>
> >> Yes. I are evil :)
> >>
> >> After looking again the work needed to add reference counts to properties
> >> would be huge. The few properties I am concerned with are specific to powerpc
> >> so perhaps just adding an arch specific lock around updating those
> >> properties would work.
> > 
> > Which code/properties? I'd like to have a look myself.
> 
> /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
> 
> The property is updated in 
> arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()

Specifically, what do you need for the locking? Are you wanting to hold
off additional changes while that function is executing? Pantelis is
adding a mutex for device tree writers. Holding that mutex would prevent
any changes from happening in the tree without affecting readers. Would
that be sufficient?

g.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-27 12:40                             ` Pantelis Antoniou
@ 2014-06-27 14:41                                 ` Nathan Fontenot
  -1 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-27 14:41 UTC (permalink / raw)
  To: Pantelis Antoniou, Grant Likely
  Cc: Tyrel Datwyler, Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner,
	Steven Rostedt

On 06/27/2014 07:40 AM, Pantelis Antoniou wrote:
> Hi Grant,
> 
> On Jun 27, 2014, at 3:32 PM, Grant Likely wrote:
> 
>> On Thu, 26 Jun 2014 14:59:31 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>>> On 06/25/2014 03:22 PM, Grant Likely wrote:
>>>> On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>>>>> On 06/23/2014 09:58 AM, Grant Likely wrote:
>>>>>> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
>>>>>>> Hi Grant,
>>>>>>>
> 
> [snip]
> 
>>>
>>> This would affect property updates. When doing a property update the
>>> notifier passes a pointer to a struct containing a device node
>>> pointer and a pointer to the new device node property.
>>>
>>> I know specifically in memory property updates we grab the current version
>>> of the device tree property and compare it to the 'new' version that 
>>> was passed to us.
>>>
>>> If you want to do the DT update before calling the notifier that should be
>>> fine for the memory update code and would only require very minimal
>>> updates.
>>
>> We could change the notifier to include both the old and new values.
>>
>> I've been thinking about changing the notifier format anyway. With the
>> addition of bulk changes, it would be more efficient to send a single
>> notifier for all the changes with a link to the change set instead of
>> one at a time.
>>
> 
> That one has my vote. We also need a bulk change notifier, and for device
> driver use, some kind of wrapper for specific node/properties.
> 
> At the moment a notification is fired for any change in the tree, we might
> work something more fine-grained. Like 'watch this node & subnodes', or
> 'watch this property (or set of properties)'
> 

Both of these updates would work.

For property updates the only real requirement is that we can get to the new
and the old version of the property value.

I like the idea of being able to watch a single node/property. My experience
is that most code is only interested in updates to a single node or
property. Being able to avoid notifying everyone that has registered a
notifier for DT updates for every change would be nice.

-Nathan
-Nathan

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-27 14:41                                 ` Nathan Fontenot
  0 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-27 14:41 UTC (permalink / raw)
  To: Pantelis Antoniou, Grant Likely
  Cc: devicetree, Steven Rostedt, linuxppc-dev, Tyrel Datwyler,
	Thomas Gleixner

On 06/27/2014 07:40 AM, Pantelis Antoniou wrote:
> Hi Grant,
> 
> On Jun 27, 2014, at 3:32 PM, Grant Likely wrote:
> 
>> On Thu, 26 Jun 2014 14:59:31 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>>> On 06/25/2014 03:22 PM, Grant Likely wrote:
>>>> On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>>>>> On 06/23/2014 09:58 AM, Grant Likely wrote:
>>>>>> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou@konsulko.com> wrote:
>>>>>>> Hi Grant,
>>>>>>>
> 
> [snip]
> 
>>>
>>> This would affect property updates. When doing a property update the
>>> notifier passes a pointer to a struct containing a device node
>>> pointer and a pointer to the new device node property.
>>>
>>> I know specifically in memory property updates we grab the current version
>>> of the device tree property and compare it to the 'new' version that 
>>> was passed to us.
>>>
>>> If you want to do the DT update before calling the notifier that should be
>>> fine for the memory update code and would only require very minimal
>>> updates.
>>
>> We could change the notifier to include both the old and new values.
>>
>> I've been thinking about changing the notifier format anyway. With the
>> addition of bulk changes, it would be more efficient to send a single
>> notifier for all the changes with a link to the change set instead of
>> one at a time.
>>
> 
> That one has my vote. We also need a bulk change notifier, and for device
> driver use, some kind of wrapper for specific node/properties.
> 
> At the moment a notification is fired for any change in the tree, we might
> work something more fine-grained. Like 'watch this node & subnodes', or
> 'watch this property (or set of properties)'
> 

Both of these updates would work.

For property updates the only real requirement is that we can get to the new
and the old version of the property value.

I like the idea of being able to watch a single node/property. My experience
is that most code is only interested in updates to a single node or
property. Being able to avoid notifying everyone that has registered a
notifier for DT updates for every change would be nice.

-Nathan
-Nathan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-27 12:41                         ` Grant Likely
@ 2014-06-27 14:41                             ` Nathan Fontenot
  -1 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-27 14:41 UTC (permalink / raw)
  To: Grant Likely, Tyrel Datwyler
  Cc: Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Pantelis Antoniou

On 06/27/2014 07:41 AM, Grant Likely wrote:
> On Thu, 26 Jun 2014 15:01:49 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>> On 06/25/2014 03:24 PM, Grant Likely wrote:
>>> On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>>>>>> heh! I have often thought about adding reference counting to device tree
>>>>>> properties.
>>>>>
>>>>> You horrible, horrible man.
>>>>
>>>> Yes. I are evil :)
>>>>
>>>> After looking again the work needed to add reference counts to properties
>>>> would be huge. The few properties I am concerned with are specific to powerpc
>>>> so perhaps just adding an arch specific lock around updating those
>>>> properties would work.
>>>
>>> Which code/properties? I'd like to have a look myself.
>>
>> /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
>>
>> The property is updated in 
>> arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()
> 
> Specifically, what do you need for the locking? Are you wanting to hold
> off additional changes while that function is executing? Pantelis is
> adding a mutex for device tree writers. Holding that mutex would prevent
> any changes from happening in the tree without affecting readers. Would
> that be sufficient?

That would work.

-Nathan

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-06-27 14:41                             ` Nathan Fontenot
  0 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-06-27 14:41 UTC (permalink / raw)
  To: Grant Likely, Tyrel Datwyler; +Cc: devicetree, Pantelis Antoniou, linuxppc-dev

On 06/27/2014 07:41 AM, Grant Likely wrote:
> On Thu, 26 Jun 2014 15:01:49 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>> On 06/25/2014 03:24 PM, Grant Likely wrote:
>>> On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>>>>>> heh! I have often thought about adding reference counting to device tree
>>>>>> properties.
>>>>>
>>>>> You horrible, horrible man.
>>>>
>>>> Yes. I are evil :)
>>>>
>>>> After looking again the work needed to add reference counts to properties
>>>> would be huge. The few properties I am concerned with are specific to powerpc
>>>> so perhaps just adding an arch specific lock around updating those
>>>> properties would work.
>>>
>>> Which code/properties? I'd like to have a look myself.
>>
>> /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
>>
>> The property is updated in 
>> arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()
> 
> Specifically, what do you need for the locking? Are you wanting to hold
> off additional changes while that function is executing? Pantelis is
> adding a mutex for device tree writers. Holding that mutex would prevent
> any changes from happening in the tree without affecting readers. Would
> that be sufficient?

That would work.

-Nathan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-06-27 14:41                             ` Nathan Fontenot
@ 2014-07-16  5:33                                 ` Grant Likely
  -1 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-07-16  5:33 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Tyrel Datwyler, Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Pantelis Antoniou

I've got another question about powerpc reconfiguration. I was looking
at the dlpar_configure_connector() function in dlpar.c. I see that the
function has the ability to process multiple nodes with additional
sibling and child nodes. It appears to link them into a detached tree
structure, and the function returns a pointer to the first node.

All of the callers of that function then call dlpar_attach_node(),
which calls of_attach_node(). However, of_attach_node() only handles a
single node. It doesn't handle siblings or children. Is this a bug?
Does the configure connector ever actually receive more than one node
at once?

g.

On Fri, Jun 27, 2014 at 8:41 AM, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
> On 06/27/2014 07:41 AM, Grant Likely wrote:
>> On Thu, 26 Jun 2014 15:01:49 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>>> On 06/25/2014 03:24 PM, Grant Likely wrote:
>>>> On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>>>>>>> heh! I have often thought about adding reference counting to device tree
>>>>>>> properties.
>>>>>>
>>>>>> You horrible, horrible man.
>>>>>
>>>>> Yes. I are evil :)
>>>>>
>>>>> After looking again the work needed to add reference counts to properties
>>>>> would be huge. The few properties I am concerned with are specific to powerpc
>>>>> so perhaps just adding an arch specific lock around updating those
>>>>> properties would work.
>>>>
>>>> Which code/properties? I'd like to have a look myself.
>>>
>>> /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
>>>
>>> The property is updated in
>>> arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()
>>
>> Specifically, what do you need for the locking? Are you wanting to hold
>> off additional changes while that function is executing? Pantelis is
>> adding a mutex for device tree writers. Holding that mutex would prevent
>> any changes from happening in the tree without affecting readers. Would
>> that be sufficient?
>
> That would work.
>
> -Nathan
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-07-16  5:33                                 ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-07-16  5:33 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: devicetree, Pantelis Antoniou, Tyrel Datwyler, linuxppc-dev

I've got another question about powerpc reconfiguration. I was looking
at the dlpar_configure_connector() function in dlpar.c. I see that the
function has the ability to process multiple nodes with additional
sibling and child nodes. It appears to link them into a detached tree
structure, and the function returns a pointer to the first node.

All of the callers of that function then call dlpar_attach_node(),
which calls of_attach_node(). However, of_attach_node() only handles a
single node. It doesn't handle siblings or children. Is this a bug?
Does the configure connector ever actually receive more than one node
at once?

g.

On Fri, Jun 27, 2014 at 8:41 AM, Nathan Fontenot <nfont@austin.ibm.com> wrote:
> On 06/27/2014 07:41 AM, Grant Likely wrote:
>> On Thu, 26 Jun 2014 15:01:49 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>>> On 06/25/2014 03:24 PM, Grant Likely wrote:
>>>> On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>>>>>>> heh! I have often thought about adding reference counting to device tree
>>>>>>> properties.
>>>>>>
>>>>>> You horrible, horrible man.
>>>>>
>>>>> Yes. I are evil :)
>>>>>
>>>>> After looking again the work needed to add reference counts to properties
>>>>> would be huge. The few properties I am concerned with are specific to powerpc
>>>>> so perhaps just adding an arch specific lock around updating those
>>>>> properties would work.
>>>>
>>>> Which code/properties? I'd like to have a look myself.
>>>
>>> /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
>>>
>>> The property is updated in
>>> arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()
>>
>> Specifically, what do you need for the locking? Are you wanting to hold
>> off additional changes while that function is executing? Pantelis is
>> adding a mutex for device tree writers. Holding that mutex would prevent
>> any changes from happening in the tree without affecting readers. Would
>> that be sufficient?
>
> That would work.
>
> -Nathan
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-07-16  5:33                                 ` Grant Likely
@ 2014-07-16 18:30                                     ` Tyrel Datwyler
  -1 siblings, 0 replies; 48+ messages in thread
From: Tyrel Datwyler @ 2014-07-16 18:30 UTC (permalink / raw)
  To: Grant Likely, Nathan Fontenot
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, Pantelis Antoniou,
	Tyrel Datwyler, linuxppc-dev

On 07/15/2014 10:33 PM, Grant Likely wrote:
> I've got another question about powerpc reconfiguration. I was looking
> at the dlpar_configure_connector() function in dlpar.c. I see that the
> function has the ability to process multiple nodes with additional
> sibling and child nodes. It appears to link them into a detached tree
> structure, and the function returns a pointer to the first node.
> 
> All of the callers of that function then call dlpar_attach_node(),
> which calls of_attach_node(). However, of_attach_node() only handles a
> single node. It doesn't handle siblings or children. Is this a bug?
> Does the configure connector ever actually receive more than one node
> at once?

Yes, it is sometimes the case we will get a tree structure back of more
than one node. Under the proc interface implementation this just worked.
With the move to sysfs it appears we have a regression here. What makes
more sense here, for us to walk the tree calling of_attach_node, or to
move such tree walking logic into of_attach_node? From what I can tell
we are the only consumers of of_attach_node.

-Tyrel

> 
> g.
> 
> On Fri, Jun 27, 2014 at 8:41 AM, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>> On 06/27/2014 07:41 AM, Grant Likely wrote:
>>> On Thu, 26 Jun 2014 15:01:49 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>>>> On 06/25/2014 03:24 PM, Grant Likely wrote:
>>>>> On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
>>>>>>>> heh! I have often thought about adding reference counting to device tree
>>>>>>>> properties.
>>>>>>>
>>>>>>> You horrible, horrible man.
>>>>>>
>>>>>> Yes. I are evil :)
>>>>>>
>>>>>> After looking again the work needed to add reference counts to properties
>>>>>> would be huge. The few properties I am concerned with are specific to powerpc
>>>>>> so perhaps just adding an arch specific lock around updating those
>>>>>> properties would work.
>>>>>
>>>>> Which code/properties? I'd like to have a look myself.
>>>>
>>>> /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
>>>>
>>>> The property is updated in
>>>> arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()
>>>
>>> Specifically, what do you need for the locking? Are you wanting to hold
>>> off additional changes while that function is executing? Pantelis is
>>> adding a mutex for device tree writers. Holding that mutex would prevent
>>> any changes from happening in the tree without affecting readers. Would
>>> that be sufficient?
>>
>> That would work.
>>
>> -Nathan
>>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-07-16 18:30                                     ` Tyrel Datwyler
  0 siblings, 0 replies; 48+ messages in thread
From: Tyrel Datwyler @ 2014-07-16 18:30 UTC (permalink / raw)
  To: Grant Likely, Nathan Fontenot
  Cc: devicetree, Pantelis Antoniou, Tyrel Datwyler, linuxppc-dev

On 07/15/2014 10:33 PM, Grant Likely wrote:
> I've got another question about powerpc reconfiguration. I was looking
> at the dlpar_configure_connector() function in dlpar.c. I see that the
> function has the ability to process multiple nodes with additional
> sibling and child nodes. It appears to link them into a detached tree
> structure, and the function returns a pointer to the first node.
> 
> All of the callers of that function then call dlpar_attach_node(),
> which calls of_attach_node(). However, of_attach_node() only handles a
> single node. It doesn't handle siblings or children. Is this a bug?
> Does the configure connector ever actually receive more than one node
> at once?

Yes, it is sometimes the case we will get a tree structure back of more
than one node. Under the proc interface implementation this just worked.
With the move to sysfs it appears we have a regression here. What makes
more sense here, for us to walk the tree calling of_attach_node, or to
move such tree walking logic into of_attach_node? From what I can tell
we are the only consumers of of_attach_node.

-Tyrel

> 
> g.
> 
> On Fri, Jun 27, 2014 at 8:41 AM, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>> On 06/27/2014 07:41 AM, Grant Likely wrote:
>>> On Thu, 26 Jun 2014 15:01:49 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>>>> On 06/25/2014 03:24 PM, Grant Likely wrote:
>>>>> On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont@austin.ibm.com> wrote:
>>>>>>>> heh! I have often thought about adding reference counting to device tree
>>>>>>>> properties.
>>>>>>>
>>>>>>> You horrible, horrible man.
>>>>>>
>>>>>> Yes. I are evil :)
>>>>>>
>>>>>> After looking again the work needed to add reference counts to properties
>>>>>> would be huge. The few properties I am concerned with are specific to powerpc
>>>>>> so perhaps just adding an arch specific lock around updating those
>>>>>> properties would work.
>>>>>
>>>>> Which code/properties? I'd like to have a look myself.
>>>>
>>>> /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
>>>>
>>>> The property is updated in
>>>> arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()
>>>
>>> Specifically, what do you need for the locking? Are you wanting to hold
>>> off additional changes while that function is executing? Pantelis is
>>> adding a mutex for device tree writers. Holding that mutex would prevent
>>> any changes from happening in the tree without affecting readers. Would
>>> that be sufficient?
>>
>> That would work.
>>
>> -Nathan
>>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-07-16 18:30                                     ` Tyrel Datwyler
@ 2014-07-16 20:57                                         ` Grant Likely
  -1 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-07-16 20:57 UTC (permalink / raw)
  To: Tyrel Datwyler
  Cc: Nathan Fontenot, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Pantelis Antoniou, Tyrel Datwyler, linuxppc-dev

On Wed, Jul 16, 2014 at 12:30 PM, Tyrel Datwyler
<turtle.in.the.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On 07/15/2014 10:33 PM, Grant Likely wrote:
>> I've got another question about powerpc reconfiguration. I was looking
>> at the dlpar_configure_connector() function in dlpar.c. I see that the
>> function has the ability to process multiple nodes with additional
>> sibling and child nodes. It appears to link them into a detached tree
>> structure, and the function returns a pointer to the first node.
>>
>> All of the callers of that function then call dlpar_attach_node(),
>> which calls of_attach_node(). However, of_attach_node() only handles a
>> single node. It doesn't handle siblings or children. Is this a bug?
>> Does the configure connector ever actually receive more than one node
>> at once?
>
> Yes, it is sometimes the case we will get a tree structure back of more
> than one node. Under the proc interface implementation this just worked.
> With the move to sysfs it appears we have a regression here. What makes
> more sense here, for us to walk the tree calling of_attach_node, or to
> move such tree walking logic into of_attach_node? From what I can tell
> we are the only consumers of of_attach_node.

That is very shortly going to change. The overlay code also uses
of_attach_node(). I can make of_attach_node() recurse over
descendants, but I'm also considering moving the powerpc code over to
the of_changeset series that Panto and I are working on.

Either way, the handling of multiple nodes should be common code. I
think the easiest is to put the recursion into of_attach_node(), at
least for fixing the bug. It can be reworked later.

g.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-07-16 20:57                                         ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-07-16 20:57 UTC (permalink / raw)
  To: Tyrel Datwyler
  Cc: devicetree, Pantelis Antoniou, Tyrel Datwyler, linuxppc-dev

On Wed, Jul 16, 2014 at 12:30 PM, Tyrel Datwyler
<turtle.in.the.kernel@gmail.com> wrote:
> On 07/15/2014 10:33 PM, Grant Likely wrote:
>> I've got another question about powerpc reconfiguration. I was looking
>> at the dlpar_configure_connector() function in dlpar.c. I see that the
>> function has the ability to process multiple nodes with additional
>> sibling and child nodes. It appears to link them into a detached tree
>> structure, and the function returns a pointer to the first node.
>>
>> All of the callers of that function then call dlpar_attach_node(),
>> which calls of_attach_node(). However, of_attach_node() only handles a
>> single node. It doesn't handle siblings or children. Is this a bug?
>> Does the configure connector ever actually receive more than one node
>> at once?
>
> Yes, it is sometimes the case we will get a tree structure back of more
> than one node. Under the proc interface implementation this just worked.
> With the move to sysfs it appears we have a regression here. What makes
> more sense here, for us to walk the tree calling of_attach_node, or to
> move such tree walking logic into of_attach_node? From what I can tell
> we are the only consumers of of_attach_node.

That is very shortly going to change. The overlay code also uses
of_attach_node(). I can make of_attach_node() recurse over
descendants, but I'm also considering moving the powerpc code over to
the of_changeset series that Panto and I are working on.

Either way, the handling of multiple nodes should be common code. I
think the easiest is to put the recursion into of_attach_node(), at
least for fixing the bug. It can be reworked later.

g.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-07-16 20:57                                         ` Grant Likely
@ 2014-07-16 22:26                                             ` Grant Likely
  -1 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-07-16 22:26 UTC (permalink / raw)
  To: Tyrel Datwyler
  Cc: Nathan Fontenot, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Pantelis Antoniou, Tyrel Datwyler, linuxppc-dev

On Wed, Jul 16, 2014 at 2:57 PM, Grant Likely <grant.likely-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On Wed, Jul 16, 2014 at 12:30 PM, Tyrel Datwyler
> <turtle.in.the.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On 07/15/2014 10:33 PM, Grant Likely wrote:
>>> I've got another question about powerpc reconfiguration. I was looking
>>> at the dlpar_configure_connector() function in dlpar.c. I see that the
>>> function has the ability to process multiple nodes with additional
>>> sibling and child nodes. It appears to link them into a detached tree
>>> structure, and the function returns a pointer to the first node.
>>>
>>> All of the callers of that function then call dlpar_attach_node(),
>>> which calls of_attach_node(). However, of_attach_node() only handles a
>>> single node. It doesn't handle siblings or children. Is this a bug?
>>> Does the configure connector ever actually receive more than one node
>>> at once?
>>
>> Yes, it is sometimes the case we will get a tree structure back of more
>> than one node. Under the proc interface implementation this just worked.
>> With the move to sysfs it appears we have a regression here. What makes
>> more sense here, for us to walk the tree calling of_attach_node, or to
>> move such tree walking logic into of_attach_node? From what I can tell
>> we are the only consumers of of_attach_node.
>
> That is very shortly going to change. The overlay code also uses
> of_attach_node(). I can make of_attach_node() recurse over
> descendants, but I'm also considering moving the powerpc code over to
> the of_changeset series that Panto and I are working on.
>
> Either way, the handling of multiple nodes should be common code. I
> think the easiest is to put the recursion into of_attach_node(), at
> least for fixing the bug. It can be reworked later.

On pseries, do notifiers ever fail? ie. Does the reconfig code ever
object to a DT change and prevent it from being applied?

g.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-07-16 22:26                                             ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-07-16 22:26 UTC (permalink / raw)
  To: Tyrel Datwyler
  Cc: devicetree, Pantelis Antoniou, Tyrel Datwyler, linuxppc-dev

On Wed, Jul 16, 2014 at 2:57 PM, Grant Likely <grant.likely@linaro.org> wrote:
> On Wed, Jul 16, 2014 at 12:30 PM, Tyrel Datwyler
> <turtle.in.the.kernel@gmail.com> wrote:
>> On 07/15/2014 10:33 PM, Grant Likely wrote:
>>> I've got another question about powerpc reconfiguration. I was looking
>>> at the dlpar_configure_connector() function in dlpar.c. I see that the
>>> function has the ability to process multiple nodes with additional
>>> sibling and child nodes. It appears to link them into a detached tree
>>> structure, and the function returns a pointer to the first node.
>>>
>>> All of the callers of that function then call dlpar_attach_node(),
>>> which calls of_attach_node(). However, of_attach_node() only handles a
>>> single node. It doesn't handle siblings or children. Is this a bug?
>>> Does the configure connector ever actually receive more than one node
>>> at once?
>>
>> Yes, it is sometimes the case we will get a tree structure back of more
>> than one node. Under the proc interface implementation this just worked.
>> With the move to sysfs it appears we have a regression here. What makes
>> more sense here, for us to walk the tree calling of_attach_node, or to
>> move such tree walking logic into of_attach_node? From what I can tell
>> we are the only consumers of of_attach_node.
>
> That is very shortly going to change. The overlay code also uses
> of_attach_node(). I can make of_attach_node() recurse over
> descendants, but I'm also considering moving the powerpc code over to
> the of_changeset series that Panto and I are working on.
>
> Either way, the handling of multiple nodes should be common code. I
> think the easiest is to put the recursion into of_attach_node(), at
> least for fixing the bug. It can be reworked later.

On pseries, do notifiers ever fail? ie. Does the reconfig code ever
object to a DT change and prevent it from being applied?

g.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-07-16 22:26                                             ` Grant Likely
@ 2014-07-16 23:12                                                 ` Nathan Fontenot
  -1 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-07-16 23:12 UTC (permalink / raw)
  To: Grant Likely, Tyrel Datwyler
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, Pantelis Antoniou,
	Tyrel Datwyler, linuxppc-dev

On 07/16/2014 05:26 PM, Grant Likely wrote:
> On Wed, Jul 16, 2014 at 2:57 PM, Grant Likely <grant.likely-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> On Wed, Jul 16, 2014 at 12:30 PM, Tyrel Datwyler
>> <turtle.in.the.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> On 07/15/2014 10:33 PM, Grant Likely wrote:
>>>> I've got another question about powerpc reconfiguration. I was looking
>>>> at the dlpar_configure_connector() function in dlpar.c. I see that the
>>>> function has the ability to process multiple nodes with additional
>>>> sibling and child nodes. It appears to link them into a detached tree
>>>> structure, and the function returns a pointer to the first node.
>>>>
>>>> All of the callers of that function then call dlpar_attach_node(),
>>>> which calls of_attach_node(). However, of_attach_node() only handles a
>>>> single node. It doesn't handle siblings or children. Is this a bug?
>>>> Does the configure connector ever actually receive more than one node
>>>> at once?
>>>
>>> Yes, it is sometimes the case we will get a tree structure back of more
>>> than one node. Under the proc interface implementation this just worked.
>>> With the move to sysfs it appears we have a regression here. What makes
>>> more sense here, for us to walk the tree calling of_attach_node, or to
>>> move such tree walking logic into of_attach_node? From what I can tell
>>> we are the only consumers of of_attach_node.
>>
>> That is very shortly going to change. The overlay code also uses
>> of_attach_node(). I can make of_attach_node() recurse over
>> descendants, but I'm also considering moving the powerpc code over to
>> the of_changeset series that Panto and I are working on.
>>
>> Either way, the handling of multiple nodes should be common code. I
>> think the easiest is to put the recursion into of_attach_node(), at
>> least for fixing the bug. It can be reworked later.
> 
> On pseries, do notifiers ever fail? ie. Does the reconfig code ever
> object to a DT change and prevent it from being applied?
> 
I cannot think of a time that  I ever saw a notifier fail.

-Nathan

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-07-16 23:12                                                 ` Nathan Fontenot
  0 siblings, 0 replies; 48+ messages in thread
From: Nathan Fontenot @ 2014-07-16 23:12 UTC (permalink / raw)
  To: Grant Likely, Tyrel Datwyler
  Cc: devicetree, Pantelis Antoniou, Tyrel Datwyler, linuxppc-dev

On 07/16/2014 05:26 PM, Grant Likely wrote:
> On Wed, Jul 16, 2014 at 2:57 PM, Grant Likely <grant.likely@linaro.org> wrote:
>> On Wed, Jul 16, 2014 at 12:30 PM, Tyrel Datwyler
>> <turtle.in.the.kernel@gmail.com> wrote:
>>> On 07/15/2014 10:33 PM, Grant Likely wrote:
>>>> I've got another question about powerpc reconfiguration. I was looking
>>>> at the dlpar_configure_connector() function in dlpar.c. I see that the
>>>> function has the ability to process multiple nodes with additional
>>>> sibling and child nodes. It appears to link them into a detached tree
>>>> structure, and the function returns a pointer to the first node.
>>>>
>>>> All of the callers of that function then call dlpar_attach_node(),
>>>> which calls of_attach_node(). However, of_attach_node() only handles a
>>>> single node. It doesn't handle siblings or children. Is this a bug?
>>>> Does the configure connector ever actually receive more than one node
>>>> at once?
>>>
>>> Yes, it is sometimes the case we will get a tree structure back of more
>>> than one node. Under the proc interface implementation this just worked.
>>> With the move to sysfs it appears we have a regression here. What makes
>>> more sense here, for us to walk the tree calling of_attach_node, or to
>>> move such tree walking logic into of_attach_node? From what I can tell
>>> we are the only consumers of of_attach_node.
>>
>> That is very shortly going to change. The overlay code also uses
>> of_attach_node(). I can make of_attach_node() recurse over
>> descendants, but I'm also considering moving the powerpc code over to
>> the of_changeset series that Panto and I are working on.
>>
>> Either way, the handling of multiple nodes should be common code. I
>> think the easiest is to put the recursion into of_attach_node(), at
>> least for fixing the bug. It can be reworked later.
> 
> On pseries, do notifiers ever fail? ie. Does the reconfig code ever
> object to a DT change and prevent it from being applied?
> 
I cannot think of a time that  I ever saw a notifier fail.

-Nathan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
  2014-07-16 23:12                                                 ` Nathan Fontenot
@ 2014-07-17  0:44                                                     ` Grant Likely
  -1 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-07-17  0:44 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Tyrel Datwyler, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Pantelis Antoniou, Tyrel Datwyler, linuxppc-dev

On Wed, Jul 16, 2014 at 5:12 PM, Nathan Fontenot <nfont-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org> wrote:
> On 07/16/2014 05:26 PM, Grant Likely wrote:
>> On Wed, Jul 16, 2014 at 2:57 PM, Grant Likely <grant.likely-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>> On Wed, Jul 16, 2014 at 12:30 PM, Tyrel Datwyler
>>> <turtle.in.the.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>> On 07/15/2014 10:33 PM, Grant Likely wrote:
>>>>> I've got another question about powerpc reconfiguration. I was looking
>>>>> at the dlpar_configure_connector() function in dlpar.c. I see that the
>>>>> function has the ability to process multiple nodes with additional
>>>>> sibling and child nodes. It appears to link them into a detached tree
>>>>> structure, and the function returns a pointer to the first node.
>>>>>
>>>>> All of the callers of that function then call dlpar_attach_node(),
>>>>> which calls of_attach_node(). However, of_attach_node() only handles a
>>>>> single node. It doesn't handle siblings or children. Is this a bug?
>>>>> Does the configure connector ever actually receive more than one node
>>>>> at once?
>>>>
>>>> Yes, it is sometimes the case we will get a tree structure back of more
>>>> than one node. Under the proc interface implementation this just worked.
>>>> With the move to sysfs it appears we have a regression here. What makes
>>>> more sense here, for us to walk the tree calling of_attach_node, or to
>>>> move such tree walking logic into of_attach_node? From what I can tell
>>>> we are the only consumers of of_attach_node.
>>>
>>> That is very shortly going to change. The overlay code also uses
>>> of_attach_node(). I can make of_attach_node() recurse over
>>> descendants, but I'm also considering moving the powerpc code over to
>>> the of_changeset series that Panto and I are working on.
>>>
>>> Either way, the handling of multiple nodes should be common code. I
>>> think the easiest is to put the recursion into of_attach_node(), at
>>> least for fixing the bug. It can be reworked later.
>>
>> On pseries, do notifiers ever fail? ie. Does the reconfig code ever
>> object to a DT change and prevent it from being applied?
>>
> I cannot think of a time that  I ever saw a notifier fail.

Good to know. I was hoping it wasn't part of the design. I can't think
of any situation where the kernel would want to inhibit a change to
the device tree.

Can you take a look at the following tree and give it a spin on
PowerPC. I've reordered the notifiers and hopefully got the powerpc
fixups right, but I don't have a way to test it...

The following changes since commit cd3de83f147601356395b57a8673e9c5ff1e59d1:

  Linux 3.16-rc4 (2014-07-06 12:37:51 -0700)

are available in the git repository at:

  git://git.secretlab.ca/git/linux devicetree/next-overlay

for you to fetch changes up to 44ac93bb5583c5f1f85912309fe04045b61a6dd0:

  of: Transactional DT support. (2014-07-16 16:44:07 -0600)

----------------------------------------------------------------
Grant Likely (6):
      of/platform: Fix of_platform_device_destroy iteration of devices
      of: Move CONFIG_OF_DYNAMIC code into a separate file
      of: Make sure attached nodes don't carry along extra children
      of: Move dynamic node fixups out of powerpc and into common code
      of: Make OF_DYNAMIC user selectable
      of: Reorder device tree changes and notifiers

Pantelis Antoniou (5):
      of: rename of_aliases_mutex to just of_mutex
      OF: Utility helper functions for dynamic nodes
      of: Create unlocked versions of node and property add/remove functions
      of: Make devicetree sysfs update functions consistent.
      of: Transactional DT support.

 Documentation/devicetree/changesets.txt         |  41 ++
 arch/powerpc/kernel/prom.c                      |  70 ---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   2 +-
 drivers/crypto/nx/nx-842.c                      |  30 +-
 drivers/of/Kconfig                              |   2 +-
 drivers/of/Makefile                             |   1 +
 drivers/of/base.c                               | 423 ++++-----------
 drivers/of/device.c                             |   4 +-
 drivers/of/dynamic.c                            | 671 ++++++++++++++++++++++++
 drivers/of/of_private.h                         |  59 ++-
 drivers/of/platform.c                           |  32 +-
 drivers/of/selftest.c                           |  79 +++
 drivers/of/testcase-data/testcases.dtsi         |  10 +
 include/linux/of.h                              |  80 ++-
 include/linux/of_platform.h                     |   7 +-
 15 files changed, 1067 insertions(+), 444 deletions(-)
 create mode 100644 Documentation/devicetree/changesets.txt
 create mode 100644 drivers/of/dynamic.c
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: OF_DYNAMIC node lifecycle
@ 2014-07-17  0:44                                                     ` Grant Likely
  0 siblings, 0 replies; 48+ messages in thread
From: Grant Likely @ 2014-07-17  0:44 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: devicetree, Tyrel Datwyler, Pantelis Antoniou, Tyrel Datwyler,
	linuxppc-dev

On Wed, Jul 16, 2014 at 5:12 PM, Nathan Fontenot <nfont@austin.ibm.com> wrote:
> On 07/16/2014 05:26 PM, Grant Likely wrote:
>> On Wed, Jul 16, 2014 at 2:57 PM, Grant Likely <grant.likely@linaro.org> wrote:
>>> On Wed, Jul 16, 2014 at 12:30 PM, Tyrel Datwyler
>>> <turtle.in.the.kernel@gmail.com> wrote:
>>>> On 07/15/2014 10:33 PM, Grant Likely wrote:
>>>>> I've got another question about powerpc reconfiguration. I was looking
>>>>> at the dlpar_configure_connector() function in dlpar.c. I see that the
>>>>> function has the ability to process multiple nodes with additional
>>>>> sibling and child nodes. It appears to link them into a detached tree
>>>>> structure, and the function returns a pointer to the first node.
>>>>>
>>>>> All of the callers of that function then call dlpar_attach_node(),
>>>>> which calls of_attach_node(). However, of_attach_node() only handles a
>>>>> single node. It doesn't handle siblings or children. Is this a bug?
>>>>> Does the configure connector ever actually receive more than one node
>>>>> at once?
>>>>
>>>> Yes, it is sometimes the case we will get a tree structure back of more
>>>> than one node. Under the proc interface implementation this just worked.
>>>> With the move to sysfs it appears we have a regression here. What makes
>>>> more sense here, for us to walk the tree calling of_attach_node, or to
>>>> move such tree walking logic into of_attach_node? From what I can tell
>>>> we are the only consumers of of_attach_node.
>>>
>>> That is very shortly going to change. The overlay code also uses
>>> of_attach_node(). I can make of_attach_node() recurse over
>>> descendants, but I'm also considering moving the powerpc code over to
>>> the of_changeset series that Panto and I are working on.
>>>
>>> Either way, the handling of multiple nodes should be common code. I
>>> think the easiest is to put the recursion into of_attach_node(), at
>>> least for fixing the bug. It can be reworked later.
>>
>> On pseries, do notifiers ever fail? ie. Does the reconfig code ever
>> object to a DT change and prevent it from being applied?
>>
> I cannot think of a time that  I ever saw a notifier fail.

Good to know. I was hoping it wasn't part of the design. I can't think
of any situation where the kernel would want to inhibit a change to
the device tree.

Can you take a look at the following tree and give it a spin on
PowerPC. I've reordered the notifiers and hopefully got the powerpc
fixups right, but I don't have a way to test it...

The following changes since commit cd3de83f147601356395b57a8673e9c5ff1e59d1:

  Linux 3.16-rc4 (2014-07-06 12:37:51 -0700)

are available in the git repository at:

  git://git.secretlab.ca/git/linux devicetree/next-overlay

for you to fetch changes up to 44ac93bb5583c5f1f85912309fe04045b61a6dd0:

  of: Transactional DT support. (2014-07-16 16:44:07 -0600)

----------------------------------------------------------------
Grant Likely (6):
      of/platform: Fix of_platform_device_destroy iteration of devices
      of: Move CONFIG_OF_DYNAMIC code into a separate file
      of: Make sure attached nodes don't carry along extra children
      of: Move dynamic node fixups out of powerpc and into common code
      of: Make OF_DYNAMIC user selectable
      of: Reorder device tree changes and notifiers

Pantelis Antoniou (5):
      of: rename of_aliases_mutex to just of_mutex
      OF: Utility helper functions for dynamic nodes
      of: Create unlocked versions of node and property add/remove functions
      of: Make devicetree sysfs update functions consistent.
      of: Transactional DT support.

 Documentation/devicetree/changesets.txt         |  41 ++
 arch/powerpc/kernel/prom.c                      |  70 ---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   2 +-
 drivers/crypto/nx/nx-842.c                      |  30 +-
 drivers/of/Kconfig                              |   2 +-
 drivers/of/Makefile                             |   1 +
 drivers/of/base.c                               | 423 ++++-----------
 drivers/of/device.c                             |   4 +-
 drivers/of/dynamic.c                            | 671 ++++++++++++++++++++++++
 drivers/of/of_private.h                         |  59 ++-
 drivers/of/platform.c                           |  32 +-
 drivers/of/selftest.c                           |  79 +++
 drivers/of/testcase-data/testcases.dtsi         |  10 +
 include/linux/of.h                              |  80 ++-
 include/linux/of_platform.h                     |   7 +-
 15 files changed, 1067 insertions(+), 444 deletions(-)
 create mode 100644 Documentation/devicetree/changesets.txt
 create mode 100644 drivers/of/dynamic.c

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2014-07-17  0:44 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-18 20:07 OF_DYNAMIC node lifecycle Grant Likely
2014-06-18 20:07 ` Grant Likely
     [not found] ` <CACxGe6tsXuLZT=h8S0yRPRPy6Hqz1xkX8G+ViY0cxEUuxZ1dsw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-06-19  8:33   ` Pantelis Antoniou
2014-06-19  8:33     ` Pantelis Antoniou
     [not found]     ` <43898B58-2EA7-42B5-A17A-27F16F2618A6-OWPKS81ov/FWk0Htik3J/w@public.gmane.org>
2014-06-23 14:58       ` Grant Likely
2014-06-23 14:58         ` Grant Likely
     [not found]         ` <20140623145844.DA6A3C40AE5-WNowdnHR2B42iJbIjFUEsiwD8/FfD2ys@public.gmane.org>
2014-06-23 15:26           ` Pantelis Antoniou
2014-06-23 15:26             ` Pantelis Antoniou
     [not found]             ` <5213060A-74FB-4CD6-BF1C-4B7DCA98BE51-OWPKS81ov/FWk0Htik3J/w@public.gmane.org>
2014-06-23 20:21               ` Grant Likely
2014-06-23 20:21                 ` Grant Likely
2014-06-24 20:07           ` Nathan Fontenot
2014-06-24 20:07             ` Nathan Fontenot
     [not found]             ` <53A9DA69.1040101-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org>
2014-06-25 20:22               ` Grant Likely
2014-06-25 20:22                 ` Grant Likely
     [not found]                 ` <20140625202216.16A8AC40AE6-WNowdnHR2B42iJbIjFUEsiwD8/FfD2ys@public.gmane.org>
2014-06-26 19:59                   ` Nathan Fontenot
2014-06-26 19:59                     ` Nathan Fontenot
     [not found]                     ` <53AC7BA3.5030909-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org>
2014-06-27 12:32                       ` Grant Likely
2014-06-27 12:32                         ` Grant Likely
     [not found]                         ` <20140627123251.D0857C40859-WNowdnHR2B42iJbIjFUEsiwD8/FfD2ys@public.gmane.org>
2014-06-27 12:40                           ` Pantelis Antoniou
2014-06-27 12:40                             ` Pantelis Antoniou
     [not found]                             ` <94595B4D-1A58-427C-B9CE-C139048FEDCD-OWPKS81ov/FWk0Htik3J/w@public.gmane.org>
2014-06-27 14:41                               ` Nathan Fontenot
2014-06-27 14:41                                 ` Nathan Fontenot
2014-06-19 15:26   ` Nathan Fontenot
2014-06-19 15:26     ` Nathan Fontenot
     [not found]     ` <53A30117.3010100-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org>
2014-06-23 14:48       ` Grant Likely
2014-06-23 14:48         ` Grant Likely
     [not found]         ` <20140623144806.1348EC40A60-WNowdnHR2B42iJbIjFUEsiwD8/FfD2ys@public.gmane.org>
2014-06-24 20:10           ` Nathan Fontenot
2014-06-24 20:10             ` Nathan Fontenot
     [not found]             ` <53A9DB4F.9060708-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org>
2014-06-25 20:24               ` Grant Likely
2014-06-25 20:24                 ` Grant Likely
     [not found]                 ` <20140625202446.77687C40AE6-WNowdnHR2B42iJbIjFUEsiwD8/FfD2ys@public.gmane.org>
2014-06-26 20:01                   ` Nathan Fontenot
2014-06-26 20:01                     ` Nathan Fontenot
     [not found]                     ` <53AC7C2D.3040604-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org>
2014-06-27 12:41                       ` Grant Likely
2014-06-27 12:41                         ` Grant Likely
     [not found]                         ` <20140627124101.367F7C40E5E-WNowdnHR2B42iJbIjFUEsiwD8/FfD2ys@public.gmane.org>
2014-06-27 14:41                           ` Nathan Fontenot
2014-06-27 14:41                             ` Nathan Fontenot
     [not found]                             ` <53AD8296.6040702-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org>
2014-07-16  5:33                               ` Grant Likely
2014-07-16  5:33                                 ` Grant Likely
     [not found]                                 ` <CACxGe6u5bVwYZjux9F2xzZSxZOSe37DGemCgcYtmiTP-xenQfg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-07-16 18:30                                   ` Tyrel Datwyler
2014-07-16 18:30                                     ` Tyrel Datwyler
     [not found]                                     ` <53C6C4BE.6010301-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-07-16 20:57                                       ` Grant Likely
2014-07-16 20:57                                         ` Grant Likely
     [not found]                                         ` <CACxGe6v8BgEcTRB-ftPVkR6Tqs3GPw_0fVuwFgff_VqawoocGA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-07-16 22:26                                           ` Grant Likely
2014-07-16 22:26                                             ` Grant Likely
     [not found]                                             ` <CACxGe6scb291V9rjz2P48FxCEtXEEtOkgiQms=nZ_wdOf_cqHA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-07-16 23:12                                               ` Nathan Fontenot
2014-07-16 23:12                                                 ` Nathan Fontenot
     [not found]                                                 ` <53C706D2.7080207-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org>
2014-07-17  0:44                                                   ` Grant Likely
2014-07-17  0:44                                                     ` Grant Likely

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.