* [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock
From: Sebastian Andrzej Siewior @ 2018-09-10 15:42 UTC
  To: devicetree; +Cc: Rob Herring, Frank Rowand, tglx

The phandle cache code allocates memory while holding devtree_lock which
is a raw_spinlock_t. Memory allocation (and free()) is not possible on
RT while a raw_spinlock_t is held.
Invoke the kfree() and kcalloc() while the lock is dropped.

Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: devicetree@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---

devtree_lock is a raw_spinlock_t and, as of today, there are a few
users which invoke a DT call via smp_call_function() and need it that
way.
If this is not acceptable, is there a reason not to use RCU lookups?
Since every lookup requires holding devtree_lock, parallel lookups are
not possible (not sure if that is needed / happens, maybe only during
boot).
While looking through the code that runs with the lock held, I noticed
that of_find_last_cache_level() uses a node after dropping its
reference to it. With RCU, some modifications of a node would require
making a copy of the node and then replacing the original.
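
To illustrate what I mean by RCU lookups: the reader side could look
roughly like the sketch below. This is untested, shows only the
cache-hit path of a phandle lookup, and assumes the cache array and its
mask get bundled into one structure (which the current code does not
do) so that readers always see a consistent pair:

	struct phandle_cache {
		u32 mask;
		struct device_node *np[];
	};
	static struct phandle_cache __rcu *phandle_cache;

	struct device_node *of_find_node_by_phandle(phandle handle)
	{
		struct phandle_cache *cache;
		struct device_node *np = NULL;

		rcu_read_lock();
		cache = rcu_dereference(phandle_cache);
		if (cache)
			np = cache->np[handle & cache->mask];
		if (np && np->phandle == handle)
			np = of_node_get(np);	/* take the reference inside the read section */
		else
			np = NULL;		/* miss: the real code would fall back to a tree walk */
		rcu_read_unlock();

		return np;
	}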

 drivers/of/base.c |   22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -108,43 +108,49 @@ void of_populate_phandle_cache(void)
 	u32 cache_entries;
 	struct device_node *np;
 	u32 phandles = 0;
+	struct device_node **shadow;
 
 	raw_spin_lock_irqsave(&devtree_lock, flags);
-
-	kfree(phandle_cache);
+	shadow = phandle_cache;
 	phandle_cache = NULL;
 
 	for_each_of_allnodes(np)
 		if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
 			phandles++;
 
+	raw_spin_unlock_irqrestore(&devtree_lock, flags);
+
 	cache_entries = roundup_pow_of_two(phandles);
 	phandle_cache_mask = cache_entries - 1;
 
-	phandle_cache = kcalloc(cache_entries, sizeof(*phandle_cache),
-				GFP_ATOMIC);
-	if (!phandle_cache)
-		goto out;
+	kfree(shadow);
+	shadow = kcalloc(cache_entries, sizeof(*phandle_cache), GFP_KERNEL);
+
+	if (!shadow)
+		return;
+	raw_spin_lock_irqsave(&devtree_lock, flags);
+	phandle_cache = shadow;
 
 	for_each_of_allnodes(np)
 		if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
 			phandle_cache[np->phandle & phandle_cache_mask] = np;
 
-out:
 	raw_spin_unlock_irqrestore(&devtree_lock, flags);
 }
 
 int of_free_phandle_cache(void)
 {
 	unsigned long flags;
+	struct device_node **shadow;
 
 	raw_spin_lock_irqsave(&devtree_lock, flags);
 
-	kfree(phandle_cache);
+	shadow = phandle_cache;
 	phandle_cache = NULL;
 
 	raw_spin_unlock_irqrestore(&devtree_lock, flags);
 
+	kfree(shadow);
 	return 0;
 }
 #if !defined(CONFIG_MODULES)


* Re: [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock
From: Rob Herring @ 2018-09-11 19:19 UTC
  To: Sebastian Andrzej Siewior; +Cc: devicetree, Frank Rowand, Thomas Gleixner

On Mon, Sep 10, 2018 at 10:42 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> The phandle cache code allocates memory while holding devtree_lock which
> is a raw_spinlock_t. Memory allocation (and free()) is not possible on
> RT while a raw_spinlock_t is held.
> Invoke the kfree() and kcalloc() while the lock is dropped.
>
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Frank Rowand <frowand.list@gmail.com>
> Cc: devicetree@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>
> devtree_lock is a raw_spinlock_t and, as of today, there are a few
> users which invoke a DT call via smp_call_function() and need it that
> way.
> If this is not acceptable, is there a reason not to use RCU lookups?

No, in fact, that's in the TODO in Documentation/devicetree/todo.txt.
It isn't really maintained, though; I don't think anything in the list
has gotten done (it was kind of Grant's list).

> Since every lookup requires holding devtree_lock, parallel lookups are
> not possible (not sure if that is needed / happens, maybe only during
> boot).

Yeah, it's not really a hot path, and lookups mostly happen at boot.
That's part of the reason why converting from a rwlock wasn't an issue.

> While looking through the code that runs with the lock held, I noticed
> that of_find_last_cache_level() uses a node after dropping its
> reference to it. With RCU, some modifications of a node would require
> making a copy of the node and then replacing the original.

Needing to understand details like this is why it hasn't been done
yet. :) What modifications need a copy?

>  drivers/of/base.c |   22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)

If we go this route, there's a fix from just today[1] that this will
need to be rebased onto.

Rob

[1] http://patchwork.ozlabs.org/patch/968583/

* Re: [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock
From: Frank Rowand @ 2018-09-12  4:54 UTC
  To: Sebastian Andrzej Siewior, devicetree; +Cc: Rob Herring, tglx

Hi Sebastian,

On 09/10/18 08:42, Sebastian Andrzej Siewior wrote:
> The phandle cache code allocates memory while holding devtree_lock which
> is a raw_spinlock_t. Memory allocation (and free()) is not possible on
> RT while a raw_spinlock_t is held.
> Invoke the kfree() and kcalloc() while the lock is dropped.
> 
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Frank Rowand <frowand.list@gmail.com>
> Cc: devicetree@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

I'm on vacation.  It will be a couple of days before I reply.

In any case, I don't want to do a rushed conversion to a different
locking strategy.  Locking for devicetree in general needs to be
revisited and potentially has an impact across a wide range of
usage.

-Frank

* Re: [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock
From: Frank Rowand @ 2018-09-13  1:01 UTC
  To: Sebastian Andrzej Siewior, devicetree; +Cc: Rob Herring, tglx

Hi Sebastian,

On 09/10/18 08:42, Sebastian Andrzej Siewior wrote:
> The phandle cache code allocates memory while holding devtree_lock which
> is a raw_spinlock_t. Memory allocation (and free()) is not possible on
> RT while a raw_spinlock_t is held.

I am assuming that by "RT" you mean "PREEMPT_RT".  Can you point me to the
current documentation about memory allocation and freeing with PREEMPT_RT
enabled so that I can update my understanding?  Especially why GFP_ATOMIC
is not sufficient for allocation.


> Invoke the kfree() and kcalloc() while the lock is dropped.
> 
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Frank Rowand <frowand.list@gmail.com>
> Cc: devicetree@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> 
> devtree_lock is a raw_spinlock_t and, as of today, there are a few
> users which invoke a DT call via smp_call_function() and need it that
> way.

Can you please point me to these smp_call_function instances?


> If this is not acceptable, is there a reason not to use RCU lookups?
> Since every lookup requires holding devtree_lock, parallel lookups are
> not possible (not sure if that is needed / happens, maybe only during
> boot).

Using locks on phandle lookup should not be a performance issue.  If
someone finds it to be an issue, they should report it.

-Frank

* Re: [RFC PATCH] of: allocate / free phandle cache outside of the devtree_lock
From: Sebastian Andrzej Siewior @ 2018-09-13  8:51 UTC
  To: Frank Rowand; +Cc: devicetree, Rob Herring, tglx

On 2018-09-12 18:01:49 [-0700], Frank Rowand wrote:
> Hi Sebastian,
Hi Frank,

> On 09/10/18 08:42, Sebastian Andrzej Siewior wrote:
> > The phandle cache code allocates memory while holding devtree_lock which
> > is a raw_spinlock_t. Memory allocation (and free()) is not possible on
> > RT while a raw_spinlock_t is held.
> 
> I am assuming that by "RT" you mean "PREEMPT_RT".
Correct.

> Can you point me to the
> current documentation about memory allocation and freeing with PREEMPT_RT
> enabled so that I can update my understanding?  Especially why GFP_ATOMIC
> is not sufficient for allocation.

I don't think that there is any documentation. The raw_* spinlocks
actually do disable interrupts and preemption, and the memory allocator
might need to take sleeping locks internally, which does not work in
that context.
And yes, GFP_ATOMIC makes no difference (there are no memory
allocations with disabled interrupts on RT; spin_lock_irqsave() is a
sleeping lock which does not disable interrupts).
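
To make it concrete, this is the pattern that breaks versus the one the
patch uses (simplified sketch, not the literal code from the patch):

	/*
	 * Breaks on PREEMPT_RT: a raw_spinlock_t really disables preemption
	 * and interrupts, but the allocator may need to take sleeping locks
	 * internally, even for GFP_ATOMIC.
	 */
	raw_spin_lock_irqsave(&devtree_lock, flags);
	cache = kcalloc(entries, sizeof(*cache), GFP_ATOMIC);
	raw_spin_unlock_irqrestore(&devtree_lock, flags);

	/*
	 * Works: allocate first with GFP_KERNEL, then only publish the
	 * pointer while holding the raw lock.
	 */
	cache = kcalloc(entries, sizeof(*cache), GFP_KERNEL);
	raw_spin_lock_irqsave(&devtree_lock, flags);
	phandle_cache = cache;
	raw_spin_unlock_irqrestore(&devtree_lock, flags);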

> > Invoke the kfree() and kcalloc() while the lock is dropped.
> > 
> > Cc: Rob Herring <robh+dt@kernel.org>
> > Cc: Frank Rowand <frowand.list@gmail.com>
> > Cc: devicetree@vger.kernel.org
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > ---
> > 
> > devtree_lock is a raw_spinlock_t and, as of today, there are a few
> > users which invoke a DT call via smp_call_function() and need it
> > that way.
> 
> Can you please point me to these smp_call_function instances?

Sure, that one pops up on an arm64 box:

| BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:974
| in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: cpuhp/0
| Preemption disabled at:
| [<ffff000008163580>] smp_call_function_single+0x58/0x1e0
| CPU: 0 PID: 18 Comm: cpuhp/0 Not tainted 4.18.7-rt5+ #8
| Hardware name: AMD Seattle/Seattle, BIOS 10:18:39 Dec  8 2016
| Call trace:
|  dump_backtrace+0x0/0x188
|  show_stack+0x14/0x20
|  dump_stack+0x8c/0xac
|  ___might_sleep+0x128/0x158
|  rt_spin_lock+0x38/0x78

That spin_lock() is a sleeping lock and is used in an irq-off section.

|  of_find_property+0x2c/0x60
|  of_phandle_iterator_init+0x58/0xb8
|  __of_parse_phandle_with_args+0x48/0x108
|  of_find_next_cache_node+0x40/0xc8
|  of_find_last_cache_level+0x58/0xb0
|  _init_cache_level+0x84/0xc8
|  generic_exec_single+0xec/0x138
|  smp_call_function_single+0xa4/0x1e0

This is the smp_call_function() call I was talking about.

|  init_cache_level+0x34/0x58
|  cacheinfo_cpu_online+0x18/0x6a0
|  cpuhp_invoke_callback+0xa0/0x810
|  cpuhp_thread_fun+0x108/0x190
|  smpboot_thread_fn+0x1b4/0x2e8
|  kthread+0x134/0x138
|  ret_from_fork+0x10/0x18
| cacheinfo: Unable to detect cache hierarchy for CPU 0

I *think* powerpc (32 and/or 64) has more devtree_lock usage in irq-off
sections on RT during setup/boot, and that is why the lock was made raw
in the first place. I don't know if this is still the case.

> > If this is not acceptable, is there a reason not to use RCU lookups?
> > Since every lookup requires holding devtree_lock, parallel lookups
> > are not possible (not sure if that is needed / happens, maybe only
> > during boot).
> 
> Using locks on phandle lookup should not be a performance issue.  If
> someone finds it to be an issue, they should report it.

Don't get me wrong: two lookups of an OF node each require devtree_lock
to be held for their duration, so the lookups can't happen in parallel.
This has nothing to do with the phandle hunk here; I am just pointing
it out.
I dropped the lock around the memory allocation here because allocating
with it held doesn't work on RT. If that is acceptable upstream, then I
would consider this closed.
If not, then I would suggest using RCU on the reader side for lookups,
while the writer side could use devtree_lock as a simple spin_lock.
Besides working on RT, that would allow parallel lookups, which might
in turn improve performance. So better performance (if any) would be a
side effect :)
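
For the writer side, freeing the cache could then look like the sketch
below (untested; it assumes devtree_lock can be turned into a sleeping
spin_lock once the irq-off users are gone, and reuses the bundled cache
struct from the sketch in my first mail):

	int of_free_phandle_cache(void)
	{
		struct phandle_cache *shadow;

		spin_lock(&devtree_lock);
		shadow = rcu_dereference_protected(phandle_cache,
						   lockdep_is_held(&devtree_lock));
		RCU_INIT_POINTER(phandle_cache, NULL);
		spin_unlock(&devtree_lock);

		/* wait until every lookup still running has left its read section */
		synchronize_rcu();
		kfree(shadow);
		return 0;
	}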

> -Frank

Sebastian
