* [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock
@ 2018-09-10 15:42 Sebastian Andrzej Siewior
2018-09-11 19:19 ` Rob Herring
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-09-10 15:42 UTC (permalink / raw)
To: devicetree; +Cc: Rob Herring, Frank Rowand, tglx
The phandle cache code allocates memory while holding devtree_lock which
is a raw_spinlock_t. Memory allocation (and free()) is not possible on
RT while a raw_spinlock_t is held.
Invoke the kfree() and kcalloc() while the lock is dropped.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: devicetree@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
The devtree_lock lock is a raw_spinlock_t, and as of today there are a few
users which invoke a DT call via smp_call_function() and need it that
way.
If this is not acceptable, is there a reason not to use RCU for lookups?
Since every lookup requires devtree_lock to be held, parallel lookups
are not possible (not sure whether that is needed or happens; maybe only
during boot).
While looking through the code paths that hold the lock, I noticed that
of_find_last_cache_level() uses a node after dropping its reference to
it. With RCU, some modifications of a node would require making a copy
of the node and then replacing it.
drivers/of/base.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -108,43 +108,49 @@ void of_populate_phandle_cache(void)
u32 cache_entries;
struct device_node *np;
u32 phandles = 0;
+ struct device_node **shadow;
raw_spin_lock_irqsave(&devtree_lock, flags);
-
- kfree(phandle_cache);
+ shadow = phandle_cache;
phandle_cache = NULL;
for_each_of_allnodes(np)
if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
phandles++;
+ raw_spin_unlock_irqrestore(&devtree_lock, flags);
+
cache_entries = roundup_pow_of_two(phandles);
phandle_cache_mask = cache_entries - 1;
- phandle_cache = kcalloc(cache_entries, sizeof(*phandle_cache),
- GFP_ATOMIC);
- if (!phandle_cache)
- goto out;
+ kfree(shadow);
+ shadow = kcalloc(cache_entries, sizeof(*phandle_cache), GFP_KERNEL);
+
+ if (!shadow)
+ return;
+ raw_spin_lock_irqsave(&devtree_lock, flags);
+ phandle_cache = shadow;
for_each_of_allnodes(np)
if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
phandle_cache[np->phandle & phandle_cache_mask] = np;
-out:
raw_spin_unlock_irqrestore(&devtree_lock, flags);
}
int of_free_phandle_cache(void)
{
unsigned long flags;
+ struct device_node **shadow;
raw_spin_lock_irqsave(&devtree_lock, flags);
- kfree(phandle_cache);
+ shadow = phandle_cache;
phandle_cache = NULL;
raw_spin_unlock_irqrestore(&devtree_lock, flags);
+ kfree(shadow);
return 0;
}
#if !defined(CONFIG_MODULES)
* Re: [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock
2018-09-10 15:42 [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock Sebastian Andrzej Siewior
@ 2018-09-11 19:19 ` Rob Herring
2018-09-12 4:54 ` Frank Rowand
2018-09-13 1:01 ` Frank Rowand
2 siblings, 0 replies; 5+ messages in thread
From: Rob Herring @ 2018-09-11 19:19 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: devicetree, Frank Rowand, Thomas Gleixner
On Mon, Sep 10, 2018 at 10:42 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> The phandle cache code allocates memory while holding devtree_lock which
> is a raw_spinlock_t. Memory allocation (and free()) is not possible on
> RT while a raw_spinlock_t is held.
> Invoke the kfree() and kcalloc() while the lock is dropped.
>
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Frank Rowand <frowand.list@gmail.com>
> Cc: devicetree@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>
> The devtree_lock lock is a raw_spinlock_t, and as of today there are a few
> users which invoke a DT call via smp_call_function() and need it that
> way.
> If this is not acceptable, is there a reason not to use RCU for lookups?
No; in fact, that's in the TODO list in Documentation/devicetree/todo.txt.
It isn't really maintained, though; I don't think anything on the list
has gotten done (it was kind of Grant's list).
> Since every lookup requires devtree_lock to be held, parallel lookups
> are not possible (not sure whether that is needed or happens; maybe only
> during boot).
Yeah, it's not really a hot path, and lookups mostly happen at boot. That's
part of the reason why converting from an rwlock wasn't an issue.
> While looking through the code paths that hold the lock, I noticed that
> of_find_last_cache_level() uses a node after dropping its reference to
> it. With RCU, some modifications of a node would require making a copy
> of the node and then replacing it.
Needing to understand details like this is why it hasn't been done
yet. :) What modifications need a copy?
> drivers/of/base.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
If we go this route, there's a fix posted just today[1] that this will
need to be rebased onto.
Rob
[1] http://patchwork.ozlabs.org/patch/968583/
* Re: [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock
2018-09-10 15:42 [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock Sebastian Andrzej Siewior
2018-09-11 19:19 ` Rob Herring
@ 2018-09-12 4:54 ` Frank Rowand
2018-09-13 1:01 ` Frank Rowand
2 siblings, 0 replies; 5+ messages in thread
From: Frank Rowand @ 2018-09-12 4:54 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, devicetree; +Cc: Rob Herring, tglx
Hi Sebastian,
On 09/10/18 08:42, Sebastian Andrzej Siewior wrote:
> The phandle cache code allocates memory while holding devtree_lock which
> is a raw_spinlock_t. Memory allocation (and free()) is not possible on
> RT while a raw_spinlock_t is held.
> Invoke the kfree() and kcalloc() while the lock is dropped.
>
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Frank Rowand <frowand.list@gmail.com>
> Cc: devicetree@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
I'm on vacation. It will be a couple of days before I reply.
In any case, I don't want to do a rushed conversion to a different
locking strategy. Locking for devicetree in general needs to be
revisited, and any change potentially has an impact across a wide range
of usage.
-Frank
> ---
>
> The devtree_lock lock is a raw_spinlock_t, and as of today there are a few
> users which invoke a DT call via smp_call_function() and need it that
> way.
> If this is not acceptable, is there a reason not to use RCU for lookups?
> Since every lookup requires devtree_lock to be held, parallel lookups
> are not possible (not sure whether that is needed or happens; maybe only
> during boot).
> While looking through the code paths that hold the lock, I noticed that
> of_find_last_cache_level() uses a node after dropping its reference to
> it. With RCU, some modifications of a node would require making a copy
> of the node and then replacing it.
>
> drivers/of/base.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -108,43 +108,49 @@ void of_populate_phandle_cache(void)
> u32 cache_entries;
> struct device_node *np;
> u32 phandles = 0;
> + struct device_node **shadow;
>
> raw_spin_lock_irqsave(&devtree_lock, flags);
> -
> - kfree(phandle_cache);
> + shadow = phandle_cache;
> phandle_cache = NULL;
>
> for_each_of_allnodes(np)
> if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
> phandles++;
>
> + raw_spin_unlock_irqrestore(&devtree_lock, flags);
> +
> cache_entries = roundup_pow_of_two(phandles);
> phandle_cache_mask = cache_entries - 1;
>
> - phandle_cache = kcalloc(cache_entries, sizeof(*phandle_cache),
> - GFP_ATOMIC);
> - if (!phandle_cache)
> - goto out;
> + kfree(shadow);
> + shadow = kcalloc(cache_entries, sizeof(*phandle_cache), GFP_KERNEL);
> +
> + if (!shadow)
> + return;
> + raw_spin_lock_irqsave(&devtree_lock, flags);
> + phandle_cache = shadow;
>
> for_each_of_allnodes(np)
> if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
> phandle_cache[np->phandle & phandle_cache_mask] = np;
>
> -out:
> raw_spin_unlock_irqrestore(&devtree_lock, flags);
> }
>
> int of_free_phandle_cache(void)
> {
> unsigned long flags;
> + struct device_node **shadow;
>
> raw_spin_lock_irqsave(&devtree_lock, flags);
>
> - kfree(phandle_cache);
> + shadow = phandle_cache;
> phandle_cache = NULL;
>
> raw_spin_unlock_irqrestore(&devtree_lock, flags);
>
> + kfree(shadow);
> return 0;
> }
> #if !defined(CONFIG_MODULES)
>
* Re: [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock
2018-09-10 15:42 [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock Sebastian Andrzej Siewior
2018-09-11 19:19 ` Rob Herring
2018-09-12 4:54 ` Frank Rowand
@ 2018-09-13 1:01 ` Frank Rowand
2018-09-13 8:51 ` Sebastian Andrzej Siewior
2 siblings, 1 reply; 5+ messages in thread
From: Frank Rowand @ 2018-09-13 1:01 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, devicetree; +Cc: Rob Herring, tglx
Hi Sebastian,
On 09/10/18 08:42, Sebastian Andrzej Siewior wrote:
> The phandle cache code allocates memory while holding devtree_lock which
> is a raw_spinlock_t. Memory allocation (and free()) is not possible on
> RT while a raw_spinlock_t is held.
I am assuming that by "RT" you mean "PREEMPT_RT". Can you point me to the
current documentation about memory allocation and freeing with PREEMPT_RT
enabled so that I can update my understanding? Especially why GFP_ATOMIC
is not sufficient for allocation.
> Invoke the kfree() and kcalloc() while the lock is dropped.
>
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Frank Rowand <frowand.list@gmail.com>
> Cc: devicetree@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>
> The devtree_lock lock is a raw_spinlock_t, and as of today there are a few
> users which invoke a DT call via smp_call_function() and need it that
> way.
Can you please point me to these smp_call_function instances?
> If this is not acceptable, is there a reason not to use RCU for lookups?
> Since every lookup requires devtree_lock to be held, parallel lookups
> are not possible (not sure whether that is needed or happens; maybe only
> during boot).
Using locks on phandle lookups should not be a performance issue. If someone
finds it to be an issue, they should report it.
-Frank
> While looking through the code paths that hold the lock, I noticed that
> of_find_last_cache_level() uses a node after dropping its reference to
> it. With RCU, some modifications of a node would require making a copy
> of the node and then replacing it.
>
> drivers/of/base.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -108,43 +108,49 @@ void of_populate_phandle_cache(void)
> u32 cache_entries;
> struct device_node *np;
> u32 phandles = 0;
> + struct device_node **shadow;
>
> raw_spin_lock_irqsave(&devtree_lock, flags);
> -
> - kfree(phandle_cache);
> + shadow = phandle_cache;
> phandle_cache = NULL;
>
> for_each_of_allnodes(np)
> if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
> phandles++;
>
> + raw_spin_unlock_irqrestore(&devtree_lock, flags);
> +
> cache_entries = roundup_pow_of_two(phandles);
> phandle_cache_mask = cache_entries - 1;
>
> - phandle_cache = kcalloc(cache_entries, sizeof(*phandle_cache),
> - GFP_ATOMIC);
> - if (!phandle_cache)
> - goto out;
> + kfree(shadow);
> + shadow = kcalloc(cache_entries, sizeof(*phandle_cache), GFP_KERNEL);
> +
> + if (!shadow)
> + return;
> + raw_spin_lock_irqsave(&devtree_lock, flags);
> + phandle_cache = shadow;
>
> for_each_of_allnodes(np)
> if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
> phandle_cache[np->phandle & phandle_cache_mask] = np;
>
> -out:
> raw_spin_unlock_irqrestore(&devtree_lock, flags);
> }
>
> int of_free_phandle_cache(void)
> {
> unsigned long flags;
> + struct device_node **shadow;
>
> raw_spin_lock_irqsave(&devtree_lock, flags);
>
> - kfree(phandle_cache);
> + shadow = phandle_cache;
> phandle_cache = NULL;
>
> raw_spin_unlock_irqrestore(&devtree_lock, flags);
>
> + kfree(shadow);
> return 0;
> }
> #if !defined(CONFIG_MODULES)
>
* Re: [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock
2018-09-13 1:01 ` Frank Rowand
@ 2018-09-13 8:51 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-09-13 8:51 UTC (permalink / raw)
To: Frank Rowand; +Cc: devicetree, Rob Herring, tglx
On 2018-09-12 18:01:49 [-0700], Frank Rowand wrote:
> Hi Sebastian,
Hi Frank,
> On 09/10/18 08:42, Sebastian Andrzej Siewior wrote:
> > The phandle cache code allocates memory while holding devtree_lock which
> > is a raw_spinlock_t. Memory allocation (and free()) is not possible on
> > RT while a raw_spinlock_t is held.
>
> I am assuming that by "RT" you mean "PREEMPT_RT".
correct.
> Can you point me to the
> current documentation about memory allocation and freeing with PREEMPT_RT
> enabled so that I can update my understanding? Especially why GFP_ATOMIC
> is not sufficient for allocation.
I don't think there is any documentation. The raw_* spinlocks actually
do disable interrupts and preemption, and the memory allocator "might"
need to take sleeping locks, which does not work in that context.
And yes, GFP_ATOMIC makes no difference: on RT there are no memory
allocations with interrupts disabled, and spin_lock_irqsave() is a
sleeping lock which does not disable interrupts.
> > Invoke the kfree() and kcalloc() while the lock is dropped.
> >
> > Cc: Rob Herring <robh+dt@kernel.org>
> > Cc: Frank Rowand <frowand.list@gmail.com>
> > Cc: devicetree@vger.kernel.org
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > ---
> >
> > The devtree_lock lock is raw_spin_lock_t and as of today there are a few
> > users which invoke a DT call via a smp_function_call and need it that
> > way.
>
> Can you please point me to these smp_call_function instances?
Sure, this one pops up on an arm64 box:
| BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:974
| in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: cpuhp/0
| Preemption disabled at:
| [<ffff000008163580>] smp_call_function_single+0x58/0x1e0
| CPU: 0 PID: 18 Comm: cpuhp/0 Not tainted 4.18.7-rt5+ #8
| Hardware name: AMD Seattle/Seattle, BIOS 10:18:39 Dec 8 2016
| Call trace:
| dump_backtrace+0x0/0x188
| show_stack+0x14/0x20
| dump_stack+0x8c/0xac
| ___might_sleep+0x128/0x158
| rt_spin_lock+0x38/0x78
That spin_lock() is a sleeping lock, used here in an irqs-off section.
| of_find_property+0x2c/0x60
| of_phandle_iterator_init+0x58/0xb8
| __of_parse_phandle_with_args+0x48/0x108
| of_find_next_cache_node+0x40/0xc8
| of_find_last_cache_level+0x58/0xb0
| _init_cache_level+0x84/0xc8
| generic_exec_single+0xec/0x138
| smp_call_function_single+0xa4/0x1e0
This is the smp_call_function() I was talking about.
| init_cache_level+0x34/0x58
| cacheinfo_cpu_online+0x18/0x6a0
| cpuhp_invoke_callback+0xa0/0x810
| cpuhp_thread_fun+0x108/0x190
| smpboot_thread_fn+0x1b4/0x2e8
| kthread+0x134/0x138
| ret_from_fork+0x10/0x18
| cacheinfo: Unable to detect cache hierarchy for CPU 0
I *think* powerpc (32 and/or 64) has more devtree_lock usage in irqs-off
sections on RT during setup/boot, and that is why the lock was made raw
in the first place. I don't know whether this is still the case.
> > If this is not acceptable, is there a reason not to use RCU for lookups?
> > Since every lookup requires devtree_lock to be held, parallel lookups
> > are not possible (not sure whether that is needed or happens; maybe only
> > during boot).
>
> Using locks on phandle lookup should not be a performance issue. If someone
> finds it to be an issue they should report it.
Don't get me wrong: two lookups of an OF node each require devtree_lock
to be held for their duration, so the lookups can't happen in parallel.
This has nothing to do with the phandle hunk here; I am just pointing it
out.
I dropped the lock around the memory allocation because allocating under
it doesn't work on RT. If that is acceptable upstream, then I would
consider this closed.
If not, then I would suggest using RCU on the reader side for lookups,
while the writer side could take devtree_lock as a simple spin_lock.
Besides working on RT, this would allow parallel lookups, which might in
turn improve performance. So better performance (if any) would be a
side effect :)
> -Frank
Sebastian