linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 1/2] powerpc/pseries: Failure on removing device node
@ 2014-08-11  9:16 Gavin Shan
  2014-08-11  9:16 ` [PATCH 2/2] powerpc/pseries: Avoid deadlock on removing ddw Gavin Shan
  2014-08-11 14:16 ` [PATCH 1/2] powerpc/pseries: Failure on removing device node Nathan Fontenot
  0 siblings, 2 replies; 4+ messages in thread
From: Gavin Shan @ 2014-08-11  9:16 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan, stable

While running the command "drmgr -c phb -r -s 'PHB 528'", the following
backtrace is printed because the target device node isn't marked
with OF_DETACHED by of_detach_node(). This is caused by an error
returned from the memory hotplug reconfig notifier when
CONFIG_MEMORY_HOTREMOVE is disabled. The patch fixes it.

ERROR: Bad of_node_put() on /pci@800000020000210/ethernet@0
CPU: 14 PID: 2252 Comm: drmgr Tainted: G        W     3.16.0+ #427
Call Trace:
[c000000012a776a0] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
[c000000012a77750] [c00000000083cd34] .dump_stack+0x7c/0x9c
[c000000012a777d0] [c0000000006807c4] .of_node_release+0x58/0xe0
[c000000012a77860] [c00000000038a7d0] .kobject_release+0x174/0x1b8
[c000000012a77900] [c00000000038a884] .kobject_put+0x70/0x78
[c000000012a77980] [c000000000681680] .of_node_put+0x28/0x34
[c000000012a77a00] [c000000000681ea8] .__of_get_next_child+0x64/0x70
[c000000012a77a90] [c000000000682138] .of_find_node_by_path+0x1b8/0x20c
[c000000012a77b40] [c000000000051840] .ofdt_write+0x308/0x688
[c000000012a77c20] [c000000000238430] .proc_reg_write+0xb8/0xd4
[c000000012a77cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8
[c000000012a77d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0
[c000000012a77e30] [c00000000000a064] syscall_exit+0x0/0x98

Cc: stable@vger.kernel.org
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 7995135..24abc5c 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -146,7 +146,7 @@ static inline int pseries_remove_memblock(unsigned long base,
 }
 static inline int pseries_remove_mem_node(struct device_node *np)
 {
-	return -EOPNOTSUPP;
+	return 0;
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-- 
1.8.3.2


* [PATCH 2/2] powerpc/pseries: Avoid deadlock on removing ddw
  2014-08-11  9:16 [PATCH 1/2] powerpc/pseries: Failure on removing device node Gavin Shan
@ 2014-08-11  9:16 ` Gavin Shan
  2014-08-11 14:16 ` [PATCH 1/2] powerpc/pseries: Failure on removing device node Nathan Fontenot
  1 sibling, 0 replies; 4+ messages in thread
From: Gavin Shan @ 2014-08-11  9:16 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan, stable

remove_ddw() can be called from an of_reconfig_notifier callback, where
it may remove the dynamic DMA window property. Removing the property
invokes the of_reconfig_notifier chain again, which eventually leads to
the deadlock shown in the following backtrace.

The patch fixes the issue by deferring removal of the dynamic DMA
window property until the device node is released.

=============================================
[ INFO: possible recursive locking detected ]
3.16.0+ #428 Tainted: G        W
---------------------------------------------
drmgr/2273 is trying to acquire lock:
 ((of_reconfig_chain).rwsem){.+.+..}, at: [<c000000000091890>] \
 .__blocking_notifier_call_chain+0x40/0x78

but task is already holding lock:
 ((of_reconfig_chain).rwsem){.+.+..}, at: [<c000000000091890>] \
 .__blocking_notifier_call_chain+0x40/0x78

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock((of_reconfig_chain).rwsem);
  lock((of_reconfig_chain).rwsem);
 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by drmgr/2273:
 #0:  (sb_writers#4){.+.+.+}, at: [<c0000000001cbe70>] \
      .vfs_write+0xb0/0x1f8
 #1:  ((of_reconfig_chain).rwsem){.+.+..}, at: [<c000000000091890>] \
      .__blocking_notifier_call_chain+0x40/0x78

stack backtrace:
CPU: 17 PID: 2273 Comm: drmgr Tainted: G        W     3.16.0+ #428
Call Trace:
[c0000000137e7000] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
[c0000000137e70b0] [c00000000083cd34] .dump_stack+0x7c/0x9c
[c0000000137e7130] [c0000000000b8afc] .__lock_acquire+0x128c/0x1c68
[c0000000137e7280] [c0000000000b9a4c] .lock_acquire+0xe8/0x104
[c0000000137e7350] [c00000000083588c] .down_read+0x4c/0x90
[c0000000137e73e0] [c000000000091890] .__blocking_notifier_call_chain+0x40/0x78
[c0000000137e7490] [c000000000091900] .blocking_notifier_call_chain+0x38/0x48
[c0000000137e7520] [c000000000682a28] .of_reconfig_notify+0x34/0x5c
[c0000000137e75b0] [c000000000682a9c] .of_property_notify+0x4c/0x54
[c0000000137e7650] [c000000000682bf0] .of_remove_property+0x30/0xd4
[c0000000137e76f0] [c000000000052a44] .remove_ddw+0x144/0x168
[c0000000137e7790] [c000000000053204] .iommu_reconfig_notifier+0x30/0xe0
[c0000000137e7820] [c00000000009137c] .notifier_call_chain+0x6c/0xb4
[c0000000137e78c0] [c0000000000918ac] .__blocking_notifier_call_chain+0x5c/0x78
[c0000000137e7970] [c000000000091900] .blocking_notifier_call_chain+0x38/0x48
[c0000000137e7a00] [c000000000682a28] .of_reconfig_notify+0x34/0x5c
[c0000000137e7a90] [c000000000682e14] .of_detach_node+0x44/0x1fc
[c0000000137e7b40] [c0000000000518e4] .ofdt_write+0x3ac/0x688
[c0000000137e7c20] [c000000000238430] .proc_reg_write+0xb8/0xd4
[c0000000137e7cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8
[c0000000137e7d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0
[c0000000137e7e30] [c00000000000a064] syscall_exit+0x0/0x98

Cc: stable@vger.kernel.org
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 33b552f..4642d6a 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -721,13 +721,13 @@ static int __init disable_ddw_setup(char *str)
 
 early_param("disable_ddw", disable_ddw_setup);
 
-static void remove_ddw(struct device_node *np)
+static void remove_ddw(struct device_node *np, bool remove_prop)
 {
 	struct dynamic_dma_window_prop *dwp;
 	struct property *win64;
 	const u32 *ddw_avail;
 	u64 liobn;
-	int len, ret;
+	int len, ret = 0;
 
 	ddw_avail = of_get_property(np, "ibm,ddw-applicable", &len);
 	win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
@@ -761,7 +761,8 @@ static void remove_ddw(struct device_node *np)
 			np->full_name, ret, ddw_avail[2], liobn);
 
 delprop:
-	ret = of_remove_property(np, win64);
+	if (remove_prop)
+		ret = of_remove_property(np, win64);
 	if (ret)
 		pr_warning("%s: failed to remove direct window property: %d\n",
 			np->full_name, ret);
@@ -805,7 +806,7 @@ static int find_existing_ddw_windows(void)
 		window = kzalloc(sizeof(*window), GFP_KERNEL);
 		if (!window || len < sizeof(struct dynamic_dma_window_prop)) {
 			kfree(window);
-			remove_ddw(pdn);
+			remove_ddw(pdn, true);
 			continue;
 		}
 
@@ -1045,7 +1046,7 @@ out_free_window:
 	kfree(window);
 
 out_clear_window:
-	remove_ddw(pdn);
+	remove_ddw(pdn, true);
 
 out_free_prop:
 	kfree(win64->name);
@@ -1255,7 +1256,14 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 
 	switch (action) {
 	case OF_RECONFIG_DETACH_NODE:
-		remove_ddw(np);
+		/*
+		 * Removing the property will invoke the reconfig
+		 * notifier again, which causes dead-lock on the
+		 * read-write semaphore of the notifier chain. So
+		 * we have to remove the property when releasing
+		 * the device node.
+		 */
+		remove_ddw(np, false);
 		if (pci && pci->iommu_table)
 			iommu_free_table(pci->iommu_table, np->full_name);
 
-- 
1.8.3.2
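
For illustration only, the re-entrancy described above can be modelled in
user space. This is a sketch, not kernel code: every model_* name is made
up, the notifier chain's rwsem is reduced to a single flag, and a nested
acquisition is reported as -EDEADLK instead of blocking (lockdep itself
only reports the nesting as a possible deadlock).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device node; not the kernel type. */
struct model_node {
	bool has_ddw_prop;	/* models the direct window property */
	bool window_open;	/* models the DMA window itself */
};

enum model_action { MODEL_DETACH_NODE, MODEL_REMOVE_PROPERTY };

static bool model_chain_held;			/* models the chain's rwsem */
static bool model_remove_prop_in_notifier;	/* selects old vs. new behaviour */

static int model_notify(enum model_action action, struct model_node *np);

/* Removing a property notifies the chain first, as in the backtrace. */
static int model_remove_property(struct model_node *np)
{
	int rc = model_notify(MODEL_REMOVE_PROPERTY, np);

	if (rc)
		return rc;
	np->has_ddw_prop = false;
	return 0;
}

/* Models remove_ddw(np, remove_prop) from the patch. */
static int model_remove_ddw(struct model_node *np, bool remove_prop)
{
	int ret = 0;

	np->window_open = false;
	if (remove_prop)
		ret = model_remove_property(np);
	return ret;
}

/* Models the IOMMU notifier: it only reacts to node detach. */
static int model_iommu_notifier(enum model_action action, struct model_node *np)
{
	if (action == MODEL_DETACH_NODE)
		return model_remove_ddw(np, model_remove_prop_in_notifier);
	return 0;
}

static int model_notify(enum model_action action, struct model_node *np)
{
	int rc;

	if (model_chain_held)
		return -EDEADLK;	/* the real semaphore could block here */
	model_chain_held = true;
	rc = model_iommu_notifier(action, np);
	model_chain_held = false;
	return rc;
}

static int model_detach_node(struct model_node *np, bool remove_prop_in_notifier)
{
	model_remove_prop_in_notifier = remove_prop_in_notifier;
	return model_notify(MODEL_DETACH_NODE, np);
}

/* Models node release: properties go away together with the node. */
static void model_node_release(struct model_node *np)
{
	np->has_ddw_prop = false;
}
```

In this model, model_detach_node(&n, true) fails with -EDEADLK because the
chain is entered twice, while model_detach_node(&n, false) succeeds and
leaves the property in place until model_node_release() drops it.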


* Re: [PATCH 1/2] powerpc/pseries: Failure on removing device node
  2014-08-11  9:16 [PATCH 1/2] powerpc/pseries: Failure on removing device node Gavin Shan
  2014-08-11  9:16 ` [PATCH 2/2] powerpc/pseries: Avoid deadlock on removing ddw Gavin Shan
@ 2014-08-11 14:16 ` Nathan Fontenot
  2014-08-12  1:47   ` Gavin Shan
  1 sibling, 1 reply; 4+ messages in thread
From: Nathan Fontenot @ 2014-08-11 14:16 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev; +Cc: stable

On 08/11/2014 04:16 AM, Gavin Shan wrote:
> While running the command "drmgr -c phb -r -s 'PHB 528'", the following
> backtrace is printed because the target device node isn't marked
> with OF_DETACHED by of_detach_node(). This is caused by an error
> returned from the memory hotplug reconfig notifier when
> CONFIG_MEMORY_HOTREMOVE is disabled. The patch fixes it.
> 

Could you provide some more context here?

Your comment claims that you hit an error while trying to remove a PHB,
but the fix you provided is for memory hotplug. This changes the
return code to zero, which usually indicates success, except that
your comment states you disabled memory hotplug remove.

I think the fix we need to have here is to update the version of
pseries_remove_mem_node() when CONFIG_MEMORY_HOTREMOVE is disabled
to validate that the node is a memory node and return the proper value
instead of just returning -EOPNOTSUPP in all cases. 

The version of pseries_remove_mem_node() used when memory remove is
enabled already does this.
 
-Nathan

> ERROR: Bad of_node_put() on /pci@800000020000210/ethernet@0
> CPU: 14 PID: 2252 Comm: drmgr Tainted: G        W     3.16.0+ #427
> Call Trace:
> [c000000012a776a0] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
> [c000000012a77750] [c00000000083cd34] .dump_stack+0x7c/0x9c
> [c000000012a777d0] [c0000000006807c4] .of_node_release+0x58/0xe0
> [c000000012a77860] [c00000000038a7d0] .kobject_release+0x174/0x1b8
> [c000000012a77900] [c00000000038a884] .kobject_put+0x70/0x78
> [c000000012a77980] [c000000000681680] .of_node_put+0x28/0x34
> [c000000012a77a00] [c000000000681ea8] .__of_get_next_child+0x64/0x70
> [c000000012a77a90] [c000000000682138] .of_find_node_by_path+0x1b8/0x20c
> [c000000012a77b40] [c000000000051840] .ofdt_write+0x308/0x688
> [c000000012a77c20] [c000000000238430] .proc_reg_write+0xb8/0xd4
> [c000000012a77cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8
> [c000000012a77d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0
> [c000000012a77e30] [c00000000000a064] syscall_exit+0x0/0x98
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/hotplug-memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 7995135..24abc5c 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -146,7 +146,7 @@ static inline int pseries_remove_memblock(unsigned long base,
>  }
>  static inline int pseries_remove_mem_node(struct device_node *np)
>  {
> -	return -EOPNOTSUPP;
> +	return 0;
>  }
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
> 


* Re: [PATCH 1/2] powerpc/pseries: Failure on removing device node
  2014-08-11 14:16 ` [PATCH 1/2] powerpc/pseries: Failure on removing device node Nathan Fontenot
@ 2014-08-12  1:47   ` Gavin Shan
  0 siblings, 0 replies; 4+ messages in thread
From: Gavin Shan @ 2014-08-12  1:47 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev, Gavin Shan

On Mon, Aug 11, 2014 at 09:16:01AM -0500, Nathan Fontenot wrote:
>On 08/11/2014 04:16 AM, Gavin Shan wrote:

[Removing stable from cc list to avoid mail flooding]

>> While running the command "drmgr -c phb -r -s 'PHB 528'", the following
>> backtrace is printed because the target device node isn't marked
>> with OF_DETACHED by of_detach_node(). This is caused by an error
>> returned from the memory hotplug reconfig notifier when
>> CONFIG_MEMORY_HOTREMOVE is disabled. The patch fixes it.
>> 
>
>Could you provide some more context here?
>
>Your comment claims that you hit an error while trying to remove a PHB,
>but the fix you provided is for memory hotplug. This changes the
>return code to zero, which usually indicates success, except that
>your comment states you disabled memory hotplug remove.
>

Yep, here's more information about it: the notification callbacks
are called in the order in which they were registered. For
of_reconfig_notifier, the following callbacks are called in sequence:

arch/powerpc/platforms/pseries/setup.c::pci_dn_reconfig_notifier()
                               iommu.c::iommu_reconfig_notifier()
                               hotplug-cpu.c::pseries_smp_notifier()
                               hotplug-memory.c::pseries_memory_notifier()

Writing "remove_node xxxx" to /proc/powerpc/ofdt invokes of_detach_node(),
which bails out early without marking the device node OF_DETACHED if any
of the of_reconfig_notifier callbacks returns an error. In this case the
error comes from hotplug-memory.c::pseries_memory_notifier() when
CONFIG_MEMORY_HOTREMOVE is disabled.

int of_detach_node(struct device_node *np)
{
        struct device_node *parent;
        unsigned long flags;
        int rc = 0;

        rc = of_reconfig_notify(OF_RECONFIG_DETACH_NODE, np);
        if (rc)
                return rc;
        :
        :
        of_node_set_flag(np, OF_DETACHED);          <<< This is skipped.
        raw_spin_unlock_irqrestore(&devtree_lock, flags);
        :
}

When the device node is released, we run into the warning because the
device node wasn't marked with the OF_DETACHED flag.

static void of_node_release(struct kobject *kobj)
{
        struct device_node *node = kobj_to_device_node(kobj);
        struct property *prop = node->properties;

        /* We should never be releasing nodes that haven't been detached. */
        if (!of_node_check_flag(node, OF_DETACHED)) {
                pr_err("ERROR: Bad of_node_put() on %s\n", node->full_name);
                dump_stack();
                return;
        }
        :
        :
}
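
A rough user-space model of the two excerpts above may make the sequence
easier to follow. It is only a sketch under simplifying assumptions: all
model_* names are hypothetical, the first three callbacks are assumed to
succeed, and the chain is a plain array walked in registration order.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct device_node; not the kernel type. */
struct model_node {
	const char *full_name;
	bool detached;		/* models the OF_DETACHED flag */
};

typedef int (*model_notifier_fn)(struct model_node *np);

/* The first three callbacks are assumed to succeed in this model. */
static int model_pci_dn_notifier(struct model_node *np) { (void)np; return 0; }
static int model_iommu_notifier(struct model_node *np) { (void)np; return 0; }
static int model_smp_notifier(struct model_node *np) { (void)np; return 0; }

/* Models the !CONFIG_MEMORY_HOTREMOVE stub before the patch. */
static int model_memory_notifier(struct model_node *np)
{
	(void)np;
	return -EOPNOTSUPP;
}

/* Models the stub after the patch: it simply reports success. */
static int model_memory_notifier_fixed(struct model_node *np)
{
	(void)np;
	return 0;
}

/* Callbacks in the order they were registered. */
static model_notifier_fn model_chain[] = {
	model_pci_dn_notifier,
	model_iommu_notifier,
	model_smp_notifier,
	model_memory_notifier,
};

/* Walk the chain and stop at the first callback that reports an error. */
static int model_reconfig_notify(struct model_node *np)
{
	size_t i;

	for (i = 0; i < sizeof(model_chain) / sizeof(model_chain[0]); i++) {
		int rc = model_chain[i](np);

		if (rc)
			return rc;
	}
	return 0;
}

static int model_detach_node(struct model_node *np)
{
	int rc = model_reconfig_notify(np);

	if (rc)
		return rc;	/* bails out before the flag is set */
	np->detached = true;
	return 0;
}

/* Returns false where the kernel would print "Bad of_node_put()". */
static bool model_node_release(const struct model_node *np)
{
	if (!np->detached) {
		fprintf(stderr, "ERROR: Bad of_node_put() on %s\n",
			np->full_name);
		return false;
	}
	return true;
}
```

With model_memory_notifier in the last slot, model_detach_node() returns
-EOPNOTSUPP and a later model_node_release() is rejected. Swapping in
model_memory_notifier_fixed lets the detach complete and the release
succeed.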

>I think the fix we need to have here is to update the version of
>pseries_remove_mem_node() when CONFIG_MEMORY_HOTREMOVE is disabled
>to validate that the node is a memory node and return the proper value
>instead of just returning -EOPNOTSUPP in all cases. 
>

I guess you are suggesting adding the following piece of code to
pseries_remove_mem_node() when CONFIG_MEMORY_HOTREMOVE is disabled? If so,
we can't avoid the issue and it doesn't help. I think returning 0 might be
enough here.

static inline int pseries_remove_mem_node(struct device_node *np)
{
        const char *type;

        /* Nodes that are not memory nodes are none of our business. */
        type = of_get_property(np, "device_type", NULL);
        if (type == NULL || strcmp(type, "memory") != 0)
                return 0;

        return -EOPNOTSUPP;
}

>The version of pseries_remove_mem_node() used when memory remove is
>enabled already does this.
>

Yes, we don't have a problem in this case because PHB or PCI adapter
device nodes occasionally have a "reg" property. Otherwise, it also
fails.

Thanks,
Gavin

>-Nathan
>
>> ERROR: Bad of_node_put() on /pci@800000020000210/ethernet@0
>> CPU: 14 PID: 2252 Comm: drmgr Tainted: G        W     3.16.0+ #427
>> Call Trace:
>> [c000000012a776a0] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
>> [c000000012a77750] [c00000000083cd34] .dump_stack+0x7c/0x9c
>> [c000000012a777d0] [c0000000006807c4] .of_node_release+0x58/0xe0
>> [c000000012a77860] [c00000000038a7d0] .kobject_release+0x174/0x1b8
>> [c000000012a77900] [c00000000038a884] .kobject_put+0x70/0x78
>> [c000000012a77980] [c000000000681680] .of_node_put+0x28/0x34
>> [c000000012a77a00] [c000000000681ea8] .__of_get_next_child+0x64/0x70
>> [c000000012a77a90] [c000000000682138] .of_find_node_by_path+0x1b8/0x20c
>> [c000000012a77b40] [c000000000051840] .ofdt_write+0x308/0x688
>> [c000000012a77c20] [c000000000238430] .proc_reg_write+0xb8/0xd4
>> [c000000012a77cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8
>> [c000000012a77d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0
>> [c000000012a77e30] [c00000000000a064] syscall_exit+0x0/0x98
>> 
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-memory.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index 7995135..24abc5c 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -146,7 +146,7 @@ static inline int pseries_remove_memblock(unsigned long base,
>>  }
>>  static inline int pseries_remove_mem_node(struct device_node *np)
>>  {
>> -	return -EOPNOTSUPP;
>> +	return 0;
>>  }
>>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>>  
>> 

