From: Nathan Fontenot <nfont@austin.ibm.com>
To: Grant Likely <grant.likely@linaro.org>,
	Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Cc: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	linuxppc-dev <linuxppc-dev@ozlabs.org>,
	devicetree@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: OF_DYNAMIC node lifecycle
Date: Tue, 24 Jun 2014 15:07:05 -0500
Message-ID: <53A9DA69.1040101@austin.ibm.com>
In-Reply-To: <20140623145844.DA6A3C40AE5@trevor.secretlab.ca>

On 06/23/2014 09:58 AM, Grant Likely wrote:
> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou@konsulko.com> wrote:
>> Hi Grant,
>>
>> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
>> ideas...
>>
>> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
>>
>>> Hi Nathan and Tyrel,
>>>
>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
>>> I'm hoping you can help me. Right now, pseries seems to be the only
>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
>>> the entire kernel because it requires all DT code to manage reference
>>> counting when iterating over nodes. Most users simply get it wrong.
>>> Pantelis did some investigation and found that the reference counts on
>>> a running kernel are all over the place. I have my doubts that any
>>> code really gets it right.
>>>
>>> The problem is that users need to know when it is appropriate to call
>>> of_node_get()/of_node_put(). All list traversals that exit early need
>>> an extra call to of_node_put(), and code that is searching for a node
>>> in the tree and holding a reference to it needs to call of_node_get().
>>>
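To make the rule above concrete, here is a minimal sketch of an early-exit traversal (my_match() is just a placeholder predicate, not code from this thread):

        struct device_node *np, *found = NULL;

        for_each_child_of_node(parent, np) {
                if (my_match(np)) {
                        /*
                         * The iterator already holds a reference on np for
                         * this iteration; keep it because we break out and
                         * the pointer escapes the loop.
                         */
                        found = np;
                        break;
                }
        }
        /* ... use found ... */
        of_node_put(found);     /* the "extra" put that early exits need */

of_node_put() tolerates a NULL argument, so the final put is safe even when nothing matched.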
>>
>> In hindsight it appears that drivers just can't get the lifecycle right.
>> So we need to simplify things.
>>
>>> I've got a few pseries questions:
>>> - What are the changes being requested by pseries firmware? Is it only
>>> CPUs and memory nodes, or does it manipulate things all over the tree?
>>> - How frequent are the changes? How many changes would be likely over
>>> the runtime of the system?
>>> - Are you able to verify that removed nodes are actually able to be
>>> freed correctly? Do you have any testcases for node removal?
>>>
>>> I'm thinking very seriously about changing the locking semantics of DT
>>> code entirely so that most users never have to worry about
>>> of_node_get/put at all. If the DT code is switched to use rcu
>>> primitives for tree iteration (which also means making DT code use
>>> list_head, something I'm already investigating), then instead of
>>> trying to figure out of_node_get/put rules, callers could use
>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
>>> searching over nodes, and only call of_node_get() if the node pointer
>>> is needed outside the rcu read-side lock.
>>>
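A sketch of how that proposed usage model might look (for_each_cpu_node_rcu() and the compatible string are hypothetical, shown only to illustrate the idea):

        struct device_node *np, *found = NULL;

        rcu_read_lock();
        for_each_cpu_node_rcu(np) {             /* hypothetical _rcu iterator */
                if (of_device_is_compatible(np, "vendor,some-cpu")) {
                        /* the pointer escapes the read-side section, so take a ref */
                        found = of_node_get(np);
                        break;
                }
        }
        rcu_read_unlock();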
>>> I'd really like to be rid of the node reference counting entirely, but
>>> I can't figure out a way of doing that safely, so I'd settle for
>>> making it a lot easier to get correct.
>>>
>>
>> Since we're going about changing things, how about that devtree_lock?
> 
> I believe rcu would pretty much eliminate the devtree_lock entirely. All
> modifiers would need to grab a mutex to ensure there is only one writer
> at any given time, but readers would have free rein to parse the tree
> however they like.
> 
> DT writers would have to follow some strict rules about how to handle
> nodes that are removed (i.e. don't modify or of_node_put() them until
> after rcu is synchronized), but the number of writers is very small and
> we have control of all of them.
> 
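A sketch of the writer-side discipline being described, with of_mutex used here as a stand-in name for the single-writer mutex:

        mutex_lock(&of_mutex);
        /* ... unlink np from the tree (detach it from its lists) ... */
        mutex_unlock(&of_mutex);

        synchronize_rcu();      /* wait for all rcu read-side sections in flight */
        of_node_put(np);        /* only now may the node be modified or released */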
>> We're using a raw_spinlock and we're always taking the lock with
>> interrupts disabled.
>>
>> If we're going to make DT changes frequently during normal runtime
>> and not only during boot time, those are bad for any kind of real-time
>> performance.
>>
>> So the question is, do we really have code that accesses the live tree
>> during atomic sections? Is that something we want? Enforcing this
>> will make our lives easier, and we'll get the chance to replace
>> that spinlock with a mutex.
> 
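For reference, the pattern being referred to looks roughly like this in drivers/of/base.c, shown only to make the real-time concern concrete:

        unsigned long flags;

        raw_spin_lock_irqsave(&devtree_lock, flags);    /* irqs off on this cpu */
        /* ... walk or update the live tree ... */
        raw_spin_unlock_irqrestore(&devtree_lock, flags);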
> Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
> sections. I cannot put my finger on the exact code however. Nathan might
> know better. But, if I'm right, the whole problem goes away with RCU.

I went back through the cpu hotplug code. We do update the DT during cpu
hotplug, but I don't see it happening during atomic sections.

The code is in arch/powerpc/platforms/pseries/dlpar.c

-Nathan

> 
> The design with RCU is to switch struct device_node and struct property
> to use list_head and/or hlist_head with the _rcu accessors. They allow
> items to be removed from a list without synchronizing with readers. Right
> now we have two lists that need to be modified: the allnodes list and
> the sibling list. I *think* it will be fine for the two list removals to
> be non-atomic (there will be a brief period where the node can be found
> on one list, but not the other) because it is a transient state already
> accounted for in the rcu read-side critical region.
> 
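A sketch of what that removal could look like once struct device_node carries list_head members (the field names here are hypothetical):

        /* writer side, under the single-writer mutex */
        list_del_rcu(&np->allnodes_entry);      /* drop from the global allnodes list */
        list_del_rcu(&np->sibling_entry);       /* drop from the parent's child list */

        /*
         * Between the two deletions a reader may still find np on one list
         * but not the other; that is the transient state the paragraph above
         * says rcu read-side critical sections already have to tolerate.
         */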
> That said, I've also got a design to remove the allnodes list entirely
> and only work with the sibling list. I need to prototype this.
> 
> We'll also need a transition plan to move to RCU. I think the existing
> iterators can be modified to do the rcu locking in-line, but still require
> the of_node_get/put stuff (basically, so existing code continues to work
> unchanged). Then we can add _rcu versions that drop the need for
> of_node_get/put(). When everything is converted, the old iterators can
> be dropped.
> 
> g.
> 

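As an illustration of the transition step described above, an existing accessor such as of_get_next_child() could in principle swap the spinlock for an rcu read-side section while keeping its get/put semantics. A rough sketch under that assumption, not actual kernel code:

        struct device_node *of_get_next_child(const struct device_node *node,
                                              struct device_node *prev)
        {
                struct device_node *next;

                rcu_read_lock();
                next = prev ? prev->sibling : node->child;
                for (; next; next = next->sibling)
                        if (of_node_get(next))  /* callers keep the existing get/put rules */
                                break;
                of_node_put(prev);
                rcu_read_unlock();
                return next;
        }

Once the sibling list actually moves to list_head, the pointer chasing above would go through the _rcu list accessors instead.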