linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc/mobility: Fix node detach/rename problem
@ 2018-07-29 13:11 Michael Bringmann
  2018-07-29 16:31 ` kbuild test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Michael Bringmann @ 2018-07-29 13:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
	Thomas Falcon

During LPAR migration, the content of the device tree/sysfs may
be updated including deletion and replacement of nodes in the
tree.  When nodes are added to the internal node structures, they
are appended in FIFO order to a list of nodes maintained by the
OF code APIs.  When nodes are removed from the device tree, they
are marked OF_DETACHED, but not actually deleted from the system
to allow for pointers cached elsewhere in the kernel.  The order
and content of the entries in the list of nodes is not altered,
though.

During LPAR migration some common nodes are deleted and re-added
e.g. "ibm,platform-facilities".  If a node is re-added to the OF
node lists, the of_attach_node function checks to make sure that
the name + ibm,phandle of the to-be-added data is unique.  As the
previous copy of a re-added node is not modified beyond the addition
of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
notice to the console, (3) renames the to-be-added node to avoid
filename collisions within a directory, and (3) adds entries to
the sysfs/kernfs.

This patch fixes the 'migration add' problem by changing the
stored 'phandle' of the OF_DETACHed node to 0 (reserved value for
of_find_node_by_phandle), so that subsequent re-add operations,
such as those during migration, do not find the detached node,
do not observe duplicate names, do not rename them,  and the
extra WARNING notices are removed from the console output.

In addition, it erases the 'name' field of the OF_DETACHED node,
to prevent any future calls to of_find_node_by_name() or
of_find_node_by_path() from matching this node.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/dlpar.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 2de0f0d..9d82c28 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -274,6 +274,9 @@ int dlpar_detach_node(struct device_node *dn)
 	if (rc)
 		return rc;
 
+	dn->phandle = 0;
+	memset(dn->name, 0, strlen(dn->name));
+
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
  2018-07-29 13:11 [PATCH] powerpc/mobility: Fix node detach/rename problem Michael Bringmann
@ 2018-07-29 16:31 ` kbuild test robot
  2018-07-30  6:31 ` Michael Ellerman
  2018-07-31  0:59 ` [PATCH] powerpc/mobility: Fix node detach/rename problem Tyrel Datwyler
  2 siblings, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2018-07-29 16:31 UTC (permalink / raw)
  To: Michael Bringmann
  Cc: kbuild-all, linuxppc-dev, Nathan Fontenot, Michael Bringmann,
	Thomas Falcon, Tyrel Datwyler, John Allen

[-- Attachment #1: Type: text/plain, Size: 2459 bytes --]

Hi Michael,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.18-rc6 next-20180727]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Michael-Bringmann/powerpc-mobility-Fix-node-detach-rename-problem/20180729-213517
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/platforms/pseries/dlpar.c: In function 'dlpar_detach_node':
>> arch/powerpc/platforms/pseries/dlpar.c:276:9: error: passing argument 1 of 'memset' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
     memset(dn->name, 0, strlen(dn->name));
            ^~
   In file included from include/linux/string.h:20:0,
                    from arch/powerpc/include/asm/paca.h:19,
                    from arch/powerpc/include/asm/current.h:16,
                    from include/linux/mutex.h:14,
                    from include/linux/notifier.h:14,
                    from arch/powerpc/platforms/pseries/dlpar.c:16:
   arch/powerpc/include/asm/string.h:23:15: note: expected 'void *' but argument is of type 'const char *'
    extern void * memset(void *,int,__kernel_size_t);
                  ^~~~~~
   cc1: all warnings being treated as errors

vim +276 arch/powerpc/platforms/pseries/dlpar.c

   259	
   260	int dlpar_detach_node(struct device_node *dn)
   261	{
   262		struct device_node *child;
   263		int rc;
   264	
   265		child = of_get_next_child(dn, NULL);
   266		while (child) {
   267			dlpar_detach_node(child);
   268			child = of_get_next_child(dn, child);
   269		}
   270	
   271		rc = of_detach_node(dn);
   272		if (rc)
   273			return rc;
   274	
   275		dn->phandle = 0;
 > 276		memset(dn->name, 0, strlen(dn->name));
   277	
   278		return 0;
   279	}
   280	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23683 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
  2018-07-29 13:11 [PATCH] powerpc/mobility: Fix node detach/rename problem Michael Bringmann
  2018-07-29 16:31 ` kbuild test robot
@ 2018-07-30  6:31 ` Michael Ellerman
  2018-07-30 21:02   ` Michael Bringmann
  2018-07-31  0:59 ` [PATCH] powerpc/mobility: Fix node detach/rename problem Tyrel Datwyler
  2 siblings, 1 reply; 17+ messages in thread
From: Michael Ellerman @ 2018-07-30  6:31 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev
  Cc: Nathan Fontenot, Michael Bringmann, Thomas Falcon,
	Tyrel Datwyler, John Allen

Michael Bringmann <mwb@linux.vnet.ibm.com> writes:

> During LPAR migration, the content of the device tree/sysfs may
> be updated including deletion and replacement of nodes in the
> tree.  When nodes are added to the internal node structures, they
> are appended in FIFO order to a list of nodes maintained by the
> OF code APIs.

That hasn't been true for several years. The data structure is an n-ary
tree. What kernel version are you working on?

> When nodes are removed from the device tree, they
> are marked OF_DETACHED, but not actually deleted from the system
> to allow for pointers cached elsewhere in the kernel.  The order
> and content of the entries in the list of nodes is not altered,
> though.

Something is going wrong if this is actually happening.

When the node is detached it should be *detached* from the tree of all
nodes, so it should not be discoverable other than by having an existing
pointer to it.

That's what __of_detach_node() does:

	parent = np->parent;
	if (WARN_ON(!parent))
		return;

	if (parent->child == np)
		parent->child = np->sibling;
	else {
		struct device_node *prevsib;
		for (prevsib = np->parent->child;
		     prevsib->sibling != np;
		     prevsib = prevsib->sibling)
			;
		prevsib->sibling = np->sibling;
	}

ie. the node must already have a NULL parent, and then it is spliced out
of its parent's child list.


Please give us more info so we can work out what's actually happening.

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
  2018-07-30  6:31 ` Michael Ellerman
@ 2018-07-30 21:02   ` Michael Bringmann
  2018-07-31  6:34     ` phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) Michael Ellerman
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Bringmann @ 2018-07-30 21:02 UTC (permalink / raw)
  To: linuxppc-dev

See below.

On 07/30/2018 01:31 AM, Michael Ellerman wrote:
> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> 
>> During LPAR migration, the content of the device tree/sysfs may
>> be updated including deletion and replacement of nodes in the
>> tree.  When nodes are added to the internal node structures, they
>> are appended in FIFO order to a list of nodes maintained by the
>> OF code APIs.
> 
> That hasn't been true for several years. The data structure is an n-ary
> tree. What kernel version are you working on?

Sorry for an error in my description.  I oversimplified based on the
name of a search iterator.  Let me try to provide a better explanation
of the problem, here.

This is the problem.  The PPC mobility code receives RTAS requests to
delete nodes with platform-/hardware-specific attributes when restarting
the kernel after a migration.  My example is for migration between a
P8 Alpine and a P8 Brazos.   Nodes to be deleted may include 'ibm,random-v1',
'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1',
or others.

The mobility.c code calls 'of_detach_node' for the nodes and their children.
This makes calls to detach the properties and to try to remove the associated
sysfs/kernfs files.

Then new copies of the same nodes are next provided by the PHYP, local
copies are built, and a pointer to the 'struct device_node' is passed to
of_attach_node.  Before the call to of_attach_node, the phandle is initialized
to 0 when the data structure is alloced.  During the call to of_attach_node,
it calls __of_attach_node which pulls the actual name and phandle from just
created sub-properties named something like 'name' and 'ibm,phandle'.

This is all fine for the first migration.  The problem occurs with the
second and subsequent migrations when the PHYP on the new system wants to
replace the same set of nodes again, referenced with the same names and
phandle values.

> 
>> When nodes are removed from the device tree, they
>> are marked OF_DETACHED, but not actually deleted from the system
>> to allow for pointers cached elsewhere in the kernel.  The order
>> and content of the entries in the list of nodes is not altered,
>> though.
> 
> Something is going wrong if this is actually happening.
> 
> When the node is detached it should be *detached* from the tree of all
> nodes, so it should not be discoverable other than by having an existing
> pointer to it.
On the second and subsequent migrations, the PHYP tells the system
to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
nodes by its known set of phandle values -- the same handles used
by the PHYP on the source system are known on the target system.
The mobility.c code calls of_find_node_by_phandle() with these values
and ends up locating the first instance of each node that was added
during the original boot, instead of the second instance of each node
created after the first migration.  The detach during the second
migration fails with errors like,

[ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0
[ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G        W         4.18.0-rc1-wi107836-v05-120+ #201
[ 4565.030737] NIP:  c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170
[ 4565.030741] REGS: c0000003f302b690 TRAP: 0700   Tainted: G        W          (4.18.0-rc1-wi107836-v05-120+)
[ 4565.030745] MSR:  800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22288822  XER: 0000000a
[ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1
[ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68
[ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff
[ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843
[ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000
[ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff
[ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001
[ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930
[ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0
[ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0
[ 4565.030811] Call Trace:
[ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable)
[ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150
[ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80
[ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0
[ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0
[ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130
[ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60
[ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0
[ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240
[ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220
[ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240
[ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100
[ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70
[ 4565.030884] Instruction dump:
[ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8
[ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040
[ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]---

The mobility.c code continues on during the second migration, accepts the
definitions of the new nodes from the PHYP and ends up renaming the new
properties e.g.

[ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1"

I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within
of_find_node_by_phandle to skip nodes that are detached, but still present
due to caching or use count considerations.  Another possibility to consider
is that of_find_node_by_phandle also uses something called 'phandle_cache'
which may have outdated data as of_detach_node() does not have access to
that cache for the 'OF_DETACHED' nodes.

>
> That's what __of_detach_node() does:
> 
> 	parent = np->parent;
> 	if (WARN_ON(!parent))
> 		return;
> 
> 	if (parent->child == np)
> 		parent->child = np->sibling;
> 	else {
> 		struct device_node *prevsib;
> 		for (prevsib = np->parent->child;
> 		     prevsib->sibling != np;
> 		     prevsib = prevsib->sibling)
> 			;
> 		prevsib->sibling = np->sibling;
> 	}
> 
> ie. the node must already have a NULL parent, and then it is spliced out
> of its parent's child list.
> 
> 
> Please give us more info so we can work out what's actually happening.

I duplicated the problem in the 4.18 raw kernel on the second and subsequent
legs of migration of an LPAR back and forth between an Alpine to a Brazos.
Again, entries like  'ibm,random-v1', 'ibm,compression-v1',
'ibm,platform-facilities', 'ibm,sym-encryption-v1', yield errors as described
above.

My patch was attempting to resolve the issue wholly within the powerpc code,
at this time.

> 
> cheers
> 

Michael

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
  2018-07-29 13:11 [PATCH] powerpc/mobility: Fix node detach/rename problem Michael Bringmann
  2018-07-29 16:31 ` kbuild test robot
  2018-07-30  6:31 ` Michael Ellerman
@ 2018-07-31  0:59 ` Tyrel Datwyler
  2018-07-31  1:46   ` Tyrel Datwyler
  2018-07-31  6:42   ` Michael Ellerman
  2 siblings, 2 replies; 17+ messages in thread
From: Tyrel Datwyler @ 2018-07-31  0:59 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev
  Cc: Nathan Fontenot, Thomas Falcon, John Allen, mpe

On 07/29/2018 06:11 AM, Michael Bringmann wrote:
> During LPAR migration, the content of the device tree/sysfs may
> be updated including deletion and replacement of nodes in the
> tree.  When nodes are added to the internal node structures, they
> are appended in FIFO order to a list of nodes maintained by the
> OF code APIs.  When nodes are removed from the device tree, they
> are marked OF_DETACHED, but not actually deleted from the system
> to allow for pointers cached elsewhere in the kernel.  The order
> and content of the entries in the list of nodes is not altered,
> though.
> 
> During LPAR migration some common nodes are deleted and re-added
> e.g. "ibm,platform-facilities".  If a node is re-added to the OF
> node lists, the of_attach_node function checks to make sure that
> the name + ibm,phandle of the to-be-added data is unique.  As the
> previous copy of a re-added node is not modified beyond the addition
> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
> notice to the console, (3) renames the to-be-added node to avoid
> filename collisions within a directory, and (3) adds entries to
> the sysfs/kernfs.

So, this patch actually just band aids over the real problem. This is a long standing problem with several PFO drivers leaking references. The issue here is that, during the device tree update that follows a migration. the update of the ibm,platform-facilities node and friends below are always deleted and re-added on the destination lpar and subsequently the leaked references prevent the devices nodes from every actually being properly cleaned up after detach. Thus, leading to the issue you are observing.

As and example a quick look at nx-842-pseries.c reveals several issues.

static int __init nx842_pseries_init(void)
{
        struct nx842_devdata *new_devdata;
        int ret;

        if (!of_find_compatible_node(NULL, NULL, "ibm,compression"))
                return -ENODEV;

This call to of_find_compatible_node() results in a node returned with refcount incremented and therefore immediately leaked.

Further, the reconfig notifier logic makes the assumption that it only needs to deal with node updates, but as I outlined above the post migration device tree update always results in PFO nodes and properties being deleted and re-added.

/**
 * nx842_OF_notifier - Process updates to OF properties for the device
 *
 * @np: notifier block
 * @action: notifier action
 * @update: struct pSeries_reconfig_prop_update pointer if action is
 *      PSERIES_UPDATE_PROPERTY
 *
 * Returns:
 *      NOTIFY_OK on success
 *      NOTIFY_BAD encoded with error number on failure, use
 *              notifier_to_errno() to decode this value
 */
static int nx842_OF_notifier(struct notifier_block *np, unsigned long action,
                             void *data)
{
        struct of_reconfig_data *upd = data;
        struct nx842_devdata *local_devdata;
        struct device_node *node = NULL;

        rcu_read_lock();
        local_devdata = rcu_dereference(devdata);
        if (local_devdata)
                node = local_devdata->dev->of_node;

        if (local_devdata &&
                        action == OF_RECONFIG_UPDATE_PROPERTY &&
                        !strcmp(upd->dn->name, node->name)) {
                rcu_read_unlock();
                nx842_OF_upd(upd->prop);
        } else
                rcu_read_unlock();

        return NOTIFY_OK;
}

I expect to find the same problems in pseries-rng.c and nx.c.

-Tyrel

> 
> This patch fixes the 'migration add' problem by changing the
> stored 'phandle' of the OF_DETACHed node to 0 (reserved value for
> of_find_node_by_phandle), so that subsequent re-add operations,
> such as those during migration, do not find the detached node,
> do not observe duplicate names, do not rename them,  and the
> extra WARNING notices are removed from the console output.
> 
> In addition, it erases the 'name' field of the OF_DETACHED node,
> to prevent any future calls to of_find_node_by_name() or
> of_find_node_by_path() from matching this node.
> 
> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/dlpar.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
> index 2de0f0d..9d82c28 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -274,6 +274,9 @@ int dlpar_detach_node(struct device_node *dn)
>  	if (rc)
>  		return rc;
> 
> +	dn->phandle = 0;
> +	memset(dn->name, 0, strlen(dn->name));
> +
>  	return 0;
>  }
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
  2018-07-31  0:59 ` [PATCH] powerpc/mobility: Fix node detach/rename problem Tyrel Datwyler
@ 2018-07-31  1:46   ` Tyrel Datwyler
  2018-07-31  6:42   ` Michael Ellerman
  1 sibling, 0 replies; 17+ messages in thread
From: Tyrel Datwyler @ 2018-07-31  1:46 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev
  Cc: Nathan Fontenot, Thomas Falcon, John Allen

On 07/30/2018 05:59 PM, Tyrel Datwyler wrote:
> On 07/29/2018 06:11 AM, Michael Bringmann wrote:
>> During LPAR migration, the content of the device tree/sysfs may
>> be updated including deletion and replacement of nodes in the
>> tree.  When nodes are added to the internal node structures, they
>> are appended in FIFO order to a list of nodes maintained by the
>> OF code APIs.  When nodes are removed from the device tree, they
>> are marked OF_DETACHED, but not actually deleted from the system
>> to allow for pointers cached elsewhere in the kernel.  The order
>> and content of the entries in the list of nodes is not altered,
>> though.
>>
>> During LPAR migration some common nodes are deleted and re-added
>> e.g. "ibm,platform-facilities".  If a node is re-added to the OF
>> node lists, the of_attach_node function checks to make sure that
>> the name + ibm,phandle of the to-be-added data is unique.  As the
>> previous copy of a re-added node is not modified beyond the addition
>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
>> notice to the console, (3) renames the to-be-added node to avoid
>> filename collisions within a directory, and (3) adds entries to
>> the sysfs/kernfs.
> 
> So, this patch actually just band aids over the real problem. This is a long standing problem with several PFO drivers leaking references. The issue here is that, during the device tree update that follows a migration. the update of the ibm,platform-facilities node and friends below are always deleted and re-added on the destination lpar and subsequently the leaked references prevent the devices nodes from every actually being properly cleaned up after detach. Thus, leading to the issue you are observing.
> 
> As and example a quick look at nx-842-pseries.c reveals several issues.
> 
> static int __init nx842_pseries_init(void)
> {
>         struct nx842_devdata *new_devdata;
>         int ret;
> 
>         if (!of_find_compatible_node(NULL, NULL, "ibm,compression"))
>                 return -ENODEV;
> 
> This call to of_find_compatible_node() results in a node returned with refcount incremented and therefore immediately leaked.
> 
> Further, the reconfig notifier logic makes the assumption that it only needs to deal with node updates, but as I outlined above the post migration device tree update always results in PFO nodes and properties being deleted and re-added.
> 
> /**
>  * nx842_OF_notifier - Process updates to OF properties for the device
>  *
>  * @np: notifier block
>  * @action: notifier action
>  * @update: struct pSeries_reconfig_prop_update pointer if action is
>  *      PSERIES_UPDATE_PROPERTY
>  *
>  * Returns:
>  *      NOTIFY_OK on success
>  *      NOTIFY_BAD encoded with error number on failure, use
>  *              notifier_to_errno() to decode this value
>  */
> static int nx842_OF_notifier(struct notifier_block *np, unsigned long action,
>                              void *data)
> {
>         struct of_reconfig_data *upd = data;
>         struct nx842_devdata *local_devdata;
>         struct device_node *node = NULL;
> 
>         rcu_read_lock();
>         local_devdata = rcu_dereference(devdata);
>         if (local_devdata)
>                 node = local_devdata->dev->of_node;
> 
>         if (local_devdata &&
>                         action == OF_RECONFIG_UPDATE_PROPERTY &&
>                         !strcmp(upd->dn->name, node->name)) {
>                 rcu_read_unlock();
>                 nx842_OF_upd(upd->prop);
>         } else
>                 rcu_read_unlock();
> 
>         return NOTIFY_OK;
> }
> 
> I expect to find the same problems in pseries-rng.c and nx.c.

So, in actuality the main root of the problem is really in the vio core. A node reference for each PFO device under "ibm,platform-facilities" is taken during vio_register_device_node(). We need a reconfig notifier to call vio_unregister_device() for each PFO device on detach, and vio_bus_scan_register_devices("ibm,platform-facilities") on attach. This will make sure the PFO vio devices are released such that vio_dev_release() gets called to put the node reference that was taken at original registration time.

/* vio_dev refcount hit 0 */
static void vio_dev_release(struct device *dev)
{
        struct iommu_table *tbl = get_iommu_table_base(dev);

        if (tbl)
                iommu_tce_table_put(tbl);
        of_node_put(dev->of_node);
        kfree(to_vio_dev(dev));
}

-Tyrel

> 
> -Tyrel
> 
>>
>> This patch fixes the 'migration add' problem by changing the
>> stored 'phandle' of the OF_DETACHed node to 0 (reserved value for
>> of_find_node_by_phandle), so that subsequent re-add operations,
>> such as those during migration, do not find the detached node,
>> do not observe duplicate names, do not rename them,  and the
>> extra WARNING notices are removed from the console output.
>>
>> In addition, it erases the 'name' field of the OF_DETACHED node,
>> to prevent any future calls to of_find_node_by_name() or
>> of_find_node_by_path() from matching this node.
>>
>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/dlpar.c |    3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
>> index 2de0f0d..9d82c28 100644
>> --- a/arch/powerpc/platforms/pseries/dlpar.c
>> +++ b/arch/powerpc/platforms/pseries/dlpar.c
>> @@ -274,6 +274,9 @@ int dlpar_detach_node(struct device_node *dn)
>>  	if (rc)
>>  		return rc;
>>
>> +	dn->phandle = 0;
>> +	memset(dn->name, 0, strlen(dn->name));
>> +
>>  	return 0;
>>  }
>>
>>
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)
  2018-07-30 21:02   ` Michael Bringmann
@ 2018-07-31  6:34     ` Michael Ellerman
  2018-07-31 14:17       ` Rob Herring
  2018-07-31 19:11       ` Michael Bringmann
  0 siblings, 2 replies; 17+ messages in thread
From: Michael Ellerman @ 2018-07-31  6:34 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev, Rob Herring, Frank Rowand, devicetree

Hi Rob/Frank,

I think we might have a problem with the phandle_cache not interacting
well with of_detach_node():

Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> See below.
>
> On 07/30/2018 01:31 AM, Michael Ellerman wrote:
>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>> 
>>> During LPAR migration, the content of the device tree/sysfs may
>>> be updated including deletion and replacement of nodes in the
>>> tree.  When nodes are added to the internal node structures, they
>>> are appended in FIFO order to a list of nodes maintained by the
>>> OF code APIs.
>> 
>> That hasn't been true for several years. The data structure is an n-ary
>> tree. What kernel version are you working on?
>
> Sorry for an error in my description.  I oversimplified based on the
> name of a search iterator.  Let me try to provide a better explanation
> of the problem, here.
>
> This is the problem.  The PPC mobility code receives RTAS requests to
> delete nodes with platform-/hardware-specific attributes when restarting
> the kernel after a migration.  My example is for migration between a
> P8 Alpine and a P8 Brazos.   Nodes to be deleted may include 'ibm,random-v1',
> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1',
> or others.
>
> The mobility.c code calls 'of_detach_node' for the nodes and their children.
> This makes calls to detach the properties and to try to remove the associated
> sysfs/kernfs files.
>
> Then new copies of the same nodes are next provided by the PHYP, local
> copies are built, and a pointer to the 'struct device_node' is passed to
> of_attach_node.  Before the call to of_attach_node, the phandle is initialized
> to 0 when the data structure is alloced.  During the call to of_attach_node,
> it calls __of_attach_node which pulls the actual name and phandle from just
> created sub-properties named something like 'name' and 'ibm,phandle'.
>
> This is all fine for the first migration.  The problem occurs with the
> second and subsequent migrations when the PHYP on the new system wants to
> replace the same set of nodes again, referenced with the same names and
> phandle values.
>
>> 
>>> When nodes are removed from the device tree, they
>>> are marked OF_DETACHED, but not actually deleted from the system
>>> to allow for pointers cached elsewhere in the kernel.  The order
>>> and content of the entries in the list of nodes is not altered,
>>> though.
>> 
>> Something is going wrong if this is actually happening.
>> 
>> When the node is detached it should be *detached* from the tree of all
>> nodes, so it should not be discoverable other than by having an existing
>> pointer to it.
> On the second and subsequent migrations, the PHYP tells the system
> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
> 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
> nodes by its known set of phandle values -- the same handles used
> by the PHYP on the source system are known on the target system.
> The mobility.c code calls of_find_node_by_phandle() with these values
> and ends up locating the first instance of each node that was added
> during the original boot, instead of the second instance of each node
> created after the first migration.  The detach during the second
> migration fails with errors like,
>
> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0
> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G        W         4.18.0-rc1-wi107836-v05-120+ #201
> [ 4565.030737] NIP:  c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170
> [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700   Tainted: G        W          (4.18.0-rc1-wi107836-v05-120+)
> [ 4565.030745] MSR:  800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22288822  XER: 0000000a
> [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1
> [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68
> [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff
> [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843
> [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000
> [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff
> [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001
> [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930
> [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0
> [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0
> [ 4565.030811] Call Trace:
> [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable)
> [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150
> [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80
> [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0
> [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0
> [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130
> [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60
> [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0
> [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240
> [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220
> [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240
> [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100
> [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70
> [ 4565.030884] Instruction dump:
> [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8
> [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040
> [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]---
>
> The mobility.c code continues on during the second migration, accepts the
> definitions of the new nodes from the PHYP and ends up renaming the new
> properties e.g.
>
> [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1"
>
> I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within
> of_find_node_by_phandle to skip nodes that are detached, but still present
> due to caching or use count considerations.  Another possibility to consider
> is that of_find_node_by_phandle also uses something called 'phandle_cache'
> which may have outdated data as of_detach_node() does not have access to
> that cache for the 'OF_DETACHED' nodes.

Yes the phandle_cache looks like it might be the problem.

I saw of_free_phandle_cache() being called as late_initcall, but didn't
realise that's only if MODULES is disabled.

So I don't see anything that invalidates the phandle_cache when a node
is removed.

The right solution would be for __of_detach_node() to invalidate the
phandle_cache for the node being detached. That's slightly complicated
by the phandle_cache being static inside base.c

To test the theory that it's the phandle_cache causing the problems can
you try this patch:

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 848f549164cd..60e219132e24 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle handle)
 		if (phandle_cache[masked_handle] &&
 		    handle == phandle_cache[masked_handle]->phandle)
 			np = phandle_cache[masked_handle];
+
+		if (of_node_check_flag(np, OF_DETACHED))
+			np = NULL;
 	}
 
 	if (!np) {

cheers

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
  2018-07-31  0:59 ` [PATCH] powerpc/mobility: Fix node detach/rename problem Tyrel Datwyler
  2018-07-31  1:46   ` Tyrel Datwyler
@ 2018-07-31  6:42   ` Michael Ellerman
  2018-07-31 19:50     ` Tyrel Datwyler
  1 sibling, 1 reply; 17+ messages in thread
From: Michael Ellerman @ 2018-07-31  6:42 UTC (permalink / raw)
  To: Tyrel Datwyler, Michael Bringmann, linuxppc-dev
  Cc: Nathan Fontenot, Thomas Falcon, John Allen

Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:
> On 07/29/2018 06:11 AM, Michael Bringmann wrote:
>> During LPAR migration, the content of the device tree/sysfs may
>> be updated including deletion and replacement of nodes in the
>> tree.  When nodes are added to the internal node structures, they
>> are appended in FIFO order to a list of nodes maintained by the
>> OF code APIs.  When nodes are removed from the device tree, they
>> are marked OF_DETACHED, but not actually deleted from the system
>> to allow for pointers cached elsewhere in the kernel.  The order
>> and content of the entries in the list of nodes is not altered,
>> though.
>> 
>> During LPAR migration some common nodes are deleted and re-added
>> e.g. "ibm,platform-facilities".  If a node is re-added to the OF
>> node lists, the of_attach_node function checks to make sure that
>> the name + ibm,phandle of the to-be-added data is unique.  As the
>> previous copy of a re-added node is not modified beyond the addition
>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
>> notice to the console, (3) renames the to-be-added node to avoid
>> filename collisions within a directory, and (3) adds entries to
>> the sysfs/kernfs.
>
> So, this patch actually just band aids over the real problem. This is
> a long standing problem with several PFO drivers leaking references.
> The issue here is that, during the device tree update that follows a
> migration. the update of the ibm,platform-facilities node and friends
> below are always deleted and re-added on the destination lpar and
> subsequently the leaked references prevent the devices nodes from
> every actually being properly cleaned up after detach. Thus, leading
> to the issue you are observing.

Leaking references shouldn't affect the node being detached from the
tree though.

See of_detach_node() calling __of_detach_node(), none of that depends on
the refcount.

It's only the actual freeing of the node, in of_node_release() that is
prevented by leaked reference counts.

So I agree we need to do a better job with the reference counting, but I
don't see how it is causing the problem here.

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)
  2018-07-31  6:34     ` phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) Michael Ellerman
@ 2018-07-31 14:17       ` Rob Herring
  2018-07-31 19:18         ` Frank Rowand
  2018-07-31 19:11       ` Michael Bringmann
  1 sibling, 1 reply; 17+ messages in thread
From: Rob Herring @ 2018-07-31 14:17 UTC (permalink / raw)
  To: Michael Ellerman, Frank Rowand; +Cc: mwb, linuxppc-dev, devicetree

On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe@ellerman.id.au> wrot=
e:
>
> Hi Rob/Frank,
>
> I think we might have a problem with the phandle_cache not interacting
> well with of_detach_node():

Probably needs a similar fix as this commit did for overlays:

commit b9952b5218added5577e4a3443969bc20884cea9
Author: Frank Rowand <frank.rowand@sony.com>
Date:   Thu Jul 12 14:00:07 2018 -0700

    of: overlay: update phandle cache on overlay apply and remove

    A comment in the review of the patch adding the phandle cache said that
    the cache would have to be updated when modules are applied and removed=
.
    This patch implements the cache updates.

    Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of
of_find_node_by_phandle()")
    Reported-by: Alan Tull <atull@kernel.org>
    Suggested-by: Alan Tull <atull@kernel.org>
    Signed-off-by: Frank Rowand <frank.rowand@sony.com>
    Signed-off-by: Rob Herring <robh@kernel.org>


Really what we need here is an "invalidate phandle" function rather
than free and re-allocate the whole damn cache.

Rob

>
> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> > See below.
> >
> > On 07/30/2018 01:31 AM, Michael Ellerman wrote:
> >> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> >>
> >>> During LPAR migration, the content of the device tree/sysfs may
> >>> be updated including deletion and replacement of nodes in the
> >>> tree.  When nodes are added to the internal node structures, they
> >>> are appended in FIFO order to a list of nodes maintained by the
> >>> OF code APIs.
> >>
> >> That hasn't been true for several years. The data structure is an n-ar=
y
> >> tree. What kernel version are you working on?
> >
> > Sorry for an error in my description.  I oversimplified based on the
> > name of a search iterator.  Let me try to provide a better explanation
> > of the problem, here.
> >
> > This is the problem.  The PPC mobility code receives RTAS requests to
> > delete nodes with platform-/hardware-specific attributes when restartin=
g
> > the kernel after a migration.  My example is for migration between a
> > P8 Alpine and a P8 Brazos.   Nodes to be deleted may include 'ibm,rando=
m-v1',
> > 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1=
',
> > or others.
> >
> > The mobility.c code calls 'of_detach_node' for the nodes and their chil=
dren.
> > This makes calls to detach the properties and to try to remove the asso=
ciated
> > sysfs/kernfs files.
> >
> > Then new copies of the same nodes are next provided by the PHYP, local
> > copies are built, and a pointer to the 'struct device_node' is passed t=
o
> > of_attach_node.  Before the call to of_attach_node, the phandle is init=
ialized
> > to 0 when the data structure is alloced.  During the call to of_attach_=
node,
> > it calls __of_attach_node which pulls the actual name and phandle from =
just
> > created sub-properties named something like 'name' and 'ibm,phandle'.
> >
> > This is all fine for the first migration.  The problem occurs with the
> > second and subsequent migrations when the PHYP on the new system wants =
to
> > replace the same set of nodes again, referenced with the same names and
> > phandle values.
> >
> >>
> >>> When nodes are removed from the device tree, they
> >>> are marked OF_DETACHED, but not actually deleted from the system
> >>> to allow for pointers cached elsewhere in the kernel.  The order
> >>> and content of the entries in the list of nodes is not altered,
> >>> though.
> >>
> >> Something is going wrong if this is actually happening.
> >>
> >> When the node is detached it should be *detached* from the tree of all
> >> nodes, so it should not be discoverable other than by having an existi=
ng
> >> pointer to it.
> > On the second and subsequent migrations, the PHYP tells the system
> > to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
> > 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
> > nodes by its known set of phandle values -- the same handles used
> > by the PHYP on the source system are known on the target system.
> > The mobility.c code calls of_find_node_by_phandle() with these values
> > and ends up locating the first instance of each node that was added
> > during the original boot, instead of the second instance of each node
> > created after the first migration.  The detach during the second
> > migration fails with errors like,
> >
> > [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __=
of_detach_node+0x8/0xa0
> > [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag i=
net_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc x=
ts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibm=
veth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
> > [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G        W        =
 4.18.0-rc1-wi107836-v05-120+ #201
> > [ 4565.030737] NIP:  c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000=
000655170
> > [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700   Tainted: G        W =
         (4.18.0-rc1-wi107836-v05-120+)
> > [ 4565.030745] MSR:  800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,=
TM[E]>  CR: 22288822  XER: 0000000a
> > [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1
> > [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf0=
0 c0000003ffff8e68
> > [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b=
8 ffffffffffffffff
> > [ 4565.030757] GPR08: 0000000000000000 0000000000000001 000000008000000=
3 0000000000002843
> > [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 000000004000000=
0 0000000000000000
> > [ 4565.030757] GPR16: 0000000000000000 0000000000000008 000000000000000=
0 00000000f6ffffff
> > [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f03=
4 0000000000000001
> > [ 4565.030757] GPR24: 0000000000000000 0000000000000000 000000000000000=
0 0000000000000000
> > [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e6=
8 c0000003f302b930
> > [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0
> > [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0
> > [ 4565.030811] Call Trace:
> > [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x6=
4/0xd0 (unreliable)
> > [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+=
0xb4/0x150
> > [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3=
c/0x80
> > [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree=
_update+0x150/0x4f0
> > [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixu=
p+0x7c/0xf0
> > [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0x=
f8/0x130
> > [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x=
30/0x60
> > [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x6=
4/0xa0
> > [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0=
x16c/0x240
> > [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0=
x220
> > [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x2=
40
> > [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x=
100
> > [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0=
x70
> > [ 4565.030884] Instruction dump:
> > [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 eb=
c1fff0 ebe1fff8
> > [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 =
4cde0020 e9030040
> > [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]---
> >
> > The mobility.c code continues on during the second migration, accepts t=
he
> > definitions of the new nodes from the PHYP and ends up renaming the new
> > properties e.g.
> >
> > [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilit=
ies#1"
> >
> > I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within
> > of_find_node_by_phandle to skip nodes that are detached, but still pres=
ent
> > due to caching or use count considerations.  Another possibility to con=
sider
> > is that of_find_node_by_phandle also uses something called 'phandle_cac=
he'
> > which may have outdated data as of_detach_node() does not have access t=
o
> > that cache for the 'OF_DETACHED' nodes.
>
> Yes the phandle_cache looks like it might be the problem.
>
> I saw of_free_phandle_cache() being called as late_initcall, but didn't
> realise that's only if MODULES is disabled.
>
> So I don't see anything that invalidates the phandle_cache when a node
> is removed.
>
> The right solution would be for __of_detach_node() to invalidate the
> phandle_cache for the node being detached. That's slightly complicated
> by the phandle_cache being static inside base.c
>
> To test the theory that it's the phandle_cache causing the problems can
> you try this patch:
>
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 848f549164cd..60e219132e24 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle=
 handle)
>                 if (phandle_cache[masked_handle] &&
>                     handle =3D=3D phandle_cache[masked_handle]->phandle)
>                         np =3D phandle_cache[masked_handle];
> +
> +               if (of_node_check_flag(np, OF_DETACHED))
> +                       np =3D NULL;
>         }
>
>         if (!np) {
>
> cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)
  2018-07-31  6:34     ` phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) Michael Ellerman
  2018-07-31 14:17       ` Rob Herring
@ 2018-07-31 19:11       ` Michael Bringmann
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Bringmann @ 2018-07-31 19:11 UTC (permalink / raw)
  To: linuxppc-dev

I applied your suggestion with a couple of modifications,
and it looks to have worked for the first 2 migration events.
I am not seeing the errors from repeated migrations.  The
revised patch tested is,

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 466e3c8..8bf64e5 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1096,8 +1096,14 @@ struct device_node *of_find_node_by_phandle(phandle handle)

        if (phandle_cache) {
                if (phandle_cache[masked_handle] &&
-                   handle == phandle_cache[masked_handle]->phandle)
+                   handle == phandle_cache[masked_handle]->phandle) {
                        np = phandle_cache[masked_handle];
+
+                       if (of_node_check_flag(np, OF_DETACHED)) {
+                               np = NULL;
+                               phandle_cache[masked_handle] = NULL;
+                       }
+               }
        }

        if (!np) {

During a conference call this morning, Tyrel expressed concerns
about the use of the phandle_cache on powerpc, at all.  I will
try another build with that feature disabled, but without this
patch.

Michael


On 07/31/2018 01:34 AM, Michael Ellerman wrote:
> Hi Rob/Frank,
> 
> I think we might have a problem with the phandle_cache not interacting
> well with of_detach_node():
> 
> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>> See below.
>>
>> On 07/30/2018 01:31 AM, Michael Ellerman wrote:
>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>>
>>>> During LPAR migration, the content of the device tree/sysfs may
>>>> be updated including deletion and replacement of nodes in the
>>>> tree.  When nodes are added to the internal node structures, they
>>>> are appended in FIFO order to a list of nodes maintained by the
>>>> OF code APIs.
>>>
>>> That hasn't been true for several years. The data structure is an n-ary
>>> tree. What kernel version are you working on?
>>
>> Sorry for an error in my description.  I oversimplified based on the
>> name of a search iterator.  Let me try to provide a better explanation
>> of the problem, here.
>>
>> This is the problem.  The PPC mobility code receives RTAS requests to
>> delete nodes with platform-/hardware-specific attributes when restarting
>> the kernel after a migration.  My example is for migration between a
>> P8 Alpine and a P8 Brazos.   Nodes to be deleted may include 'ibm,random-v1',
>> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1',
>> or others.
>>
>> The mobility.c code calls 'of_detach_node' for the nodes and their children.
>> This makes calls to detach the properties and to try to remove the associated
>> sysfs/kernfs files.
>>
>> Then new copies of the same nodes are next provided by the PHYP, local
>> copies are built, and a pointer to the 'struct device_node' is passed to
>> of_attach_node.  Before the call to of_attach_node, the phandle is initialized
>> to 0 when the data structure is alloced.  During the call to of_attach_node,
>> it calls __of_attach_node which pulls the actual name and phandle from just
>> created sub-properties named something like 'name' and 'ibm,phandle'.
>>
>> This is all fine for the first migration.  The problem occurs with the
>> second and subsequent migrations when the PHYP on the new system wants to
>> replace the same set of nodes again, referenced with the same names and
>> phandle values.
>>
>>>
>>>> When nodes are removed from the device tree, they
>>>> are marked OF_DETACHED, but not actually deleted from the system
>>>> to allow for pointers cached elsewhere in the kernel.  The order
>>>> and content of the entries in the list of nodes is not altered,
>>>> though.
>>>
>>> Something is going wrong if this is actually happening.
>>>
>>> When the node is detached it should be *detached* from the tree of all
>>> nodes, so it should not be discoverable other than by having an existing
>>> pointer to it.
>> On the second and subsequent migrations, the PHYP tells the system
>> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
>> 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
>> nodes by its known set of phandle values -- the same handles used
>> by the PHYP on the source system are known on the target system.
>> The mobility.c code calls of_find_node_by_phandle() with these values
>> and ends up locating the first instance of each node that was added
>> during the original boot, instead of the second instance of each node
>> created after the first migration.  The detach during the second
>> migration fails with errors like,
>>
>> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0
>> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
>> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G        W         4.18.0-rc1-wi107836-v05-120+ #201
>> [ 4565.030737] NIP:  c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170
>> [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700   Tainted: G        W          (4.18.0-rc1-wi107836-v05-120+)
>> [ 4565.030745] MSR:  800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22288822  XER: 0000000a
>> [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1
>> [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68
>> [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff
>> [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843
>> [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000
>> [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff
>> [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001
>> [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930
>> [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0
>> [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0
>> [ 4565.030811] Call Trace:
>> [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable)
>> [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150
>> [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80
>> [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0
>> [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0
>> [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130
>> [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60
>> [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0
>> [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240
>> [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220
>> [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240
>> [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100
>> [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70
>> [ 4565.030884] Instruction dump:
>> [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8
>> [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040
>> [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]---
>>
>> The mobility.c code continues on during the second migration, accepts the
>> definitions of the new nodes from the PHYP and ends up renaming the new
>> properties e.g.
>>
>> [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1"
>>
>> I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within
>> of_find_node_by_phandle to skip nodes that are detached, but still present
>> due to caching or use count considerations.  Another possibility to consider
>> is that of_find_node_by_phandle also uses something called 'phandle_cache'
>> which may have outdated data as of_detach_node() does not have access to
>> that cache for the 'OF_DETACHED' nodes.
> 
> Yes the phandle_cache looks like it might be the problem.
> 
> I saw of_free_phandle_cache() being called as late_initcall, but didn't
> realise that's only if MODULES is disabled.
> 
> So I don't see anything that invalidates the phandle_cache when a node
> is removed.
> 
> The right solution would be for __of_detach_node() to invalidate the
> phandle_cache for the node being detached. That's slightly complicated
> by the phandle_cache being static inside base.c
> 
> To test the theory that it's the phandle_cache causing the problems can
> you try this patch:
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 848f549164cd..60e219132e24 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle handle)
>  		if (phandle_cache[masked_handle] &&
>  		    handle == phandle_cache[masked_handle]->phandle)
>  			np = phandle_cache[masked_handle];
> +
> +		if (of_node_check_flag(np, OF_DETACHED))
> +			np = NULL;
>  	}
> 
>  	if (!np) {
> 
> cheers
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)
  2018-07-31 14:17       ` Rob Herring
@ 2018-07-31 19:18         ` Frank Rowand
  2018-07-31 19:22           ` Frank Rowand
                             ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Frank Rowand @ 2018-07-31 19:18 UTC (permalink / raw)
  To: Rob Herring, Michael Ellerman; +Cc: mwb, linuxppc-dev, devicetree

On 07/31/18 07:17, Rob Herring wrote:
> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> Hi Rob/Frank,
>>
>> I think we might have a problem with the phandle_cache not interacting
>> well with of_detach_node():
> 
> Probably needs a similar fix as this commit did for overlays:
> 
> commit b9952b5218added5577e4a3443969bc20884cea9
> Author: Frank Rowand <frank.rowand@sony.com>
> Date:   Thu Jul 12 14:00:07 2018 -0700
> 
>     of: overlay: update phandle cache on overlay apply and remove
> 
>     A comment in the review of the patch adding the phandle cache said that
>     the cache would have to be updated when modules are applied and removed.
>     This patch implements the cache updates.
> 
>     Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of
> of_find_node_by_phandle()")
>     Reported-by: Alan Tull <atull@kernel.org>
>     Suggested-by: Alan Tull <atull@kernel.org>
>     Signed-off-by: Frank Rowand <frank.rowand@sony.com>
>     Signed-off-by: Rob Herring <robh@kernel.org>

Agreed.  Sorry about missing the of_detach_node() case.


> Really what we need here is an "invalidate phandle" function rather
> than free and re-allocate the whole damn cache.

The big hammer approach was chosen to avoid the race conditions that
would otherwise occur.  OF does not have a locking strategy that
would be able to protect against the races.

We could maybe implement a slightly smaller hammer by (1) disabling
the cache, (2) invalidate a phandle entry in the cache, (3) re-enable
the cache.  That is an off the cuff thought - I would have to look
a little bit more carefully to make sure it would work.

But I don't see a need to add the complexity of the smaller hammer
or the bigger hammer of proper locking _unless_ we start seeing that
the cache is being freed and re-allocated frequently.  For overlays
I don't expect the high frequency because it happens on a per overlay
removal basis (not per node removal basis).  For of_detach_node() the
event _is_ on a per node removal basis.  Michael, do you expect node
removals to be a frequent event with low latency being important?  If
so, a rough guess on what the frequency would be?

-Frank


> Rob
> 
>>
>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>> See below.
>>>
>>> On 07/30/2018 01:31 AM, Michael Ellerman wrote:
>>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>>>
>>>>> During LPAR migration, the content of the device tree/sysfs may
>>>>> be updated including deletion and replacement of nodes in the
>>>>> tree.  When nodes are added to the internal node structures, they
>>>>> are appended in FIFO order to a list of nodes maintained by the
>>>>> OF code APIs.
>>>>
>>>> That hasn't been true for several years. The data structure is an n-ary
>>>> tree. What kernel version are you working on?
>>>
>>> Sorry for an error in my description.  I oversimplified based on the
>>> name of a search iterator.  Let me try to provide a better explanation
>>> of the problem, here.
>>>
>>> This is the problem.  The PPC mobility code receives RTAS requests to
>>> delete nodes with platform-/hardware-specific attributes when restarting
>>> the kernel after a migration.  My example is for migration between a
>>> P8 Alpine and a P8 Brazos.   Nodes to be deleted may include 'ibm,random-v1',
>>> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1',
>>> or others.
>>>
>>> The mobility.c code calls 'of_detach_node' for the nodes and their children.
>>> This makes calls to detach the properties and to try to remove the associated
>>> sysfs/kernfs files.
>>>
>>> Then new copies of the same nodes are next provided by the PHYP, local
>>> copies are built, and a pointer to the 'struct device_node' is passed to
>>> of_attach_node.  Before the call to of_attach_node, the phandle is initialized
>>> to 0 when the data structure is alloced.  During the call to of_attach_node,
>>> it calls __of_attach_node which pulls the actual name and phandle from just
>>> created sub-properties named something like 'name' and 'ibm,phandle'.
>>>
>>> This is all fine for the first migration.  The problem occurs with the
>>> second and subsequent migrations when the PHYP on the new system wants to
>>> replace the same set of nodes again, referenced with the same names and
>>> phandle values.
>>>
>>>>
>>>>> When nodes are removed from the device tree, they
>>>>> are marked OF_DETACHED, but not actually deleted from the system
>>>>> to allow for pointers cached elsewhere in the kernel.  The order
>>>>> and content of the entries in the list of nodes is not altered,
>>>>> though.
>>>>
>>>> Something is going wrong if this is actually happening.
>>>>
>>>> When the node is detached it should be *detached* from the tree of all
>>>> nodes, so it should not be discoverable other than by having an existing
>>>> pointer to it.
>>> On the second and subsequent migrations, the PHYP tells the system
>>> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
>>> 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
>>> nodes by its known set of phandle values -- the same handles used
>>> by the PHYP on the source system are known on the target system.
>>> The mobility.c code calls of_find_node_by_phandle() with these values
>>> and ends up locating the first instance of each node that was added
>>> during the original boot, instead of the second instance of each node
>>> created after the first migration.  The detach during the second
>>> migration fails with errors like,
>>>
>>> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0
>>> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
>>> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G        W         4.18.0-rc1-wi107836-v05-120+ #201
>>> [ 4565.030737] NIP:  c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170
>>> [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700   Tainted: G        W          (4.18.0-rc1-wi107836-v05-120+)
>>> [ 4565.030745] MSR:  800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22288822  XER: 0000000a
>>> [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1
>>> [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68
>>> [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff
>>> [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843
>>> [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000
>>> [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff
>>> [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001
>>> [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930
>>> [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0
>>> [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0
>>> [ 4565.030811] Call Trace:
>>> [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable)
>>> [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150
>>> [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80
>>> [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0
>>> [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0
>>> [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130
>>> [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60
>>> [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0
>>> [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240
>>> [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220
>>> [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240
>>> [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100
>>> [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70
>>> [ 4565.030884] Instruction dump:
>>> [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8
>>> [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040
>>> [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]---
>>>
>>> The mobility.c code continues on during the second migration, accepts the
>>> definitions of the new nodes from the PHYP and ends up renaming the new
>>> properties e.g.
>>>
>>> [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1"
>>>
>>> I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within
>>> of_find_node_by_phandle to skip nodes that are detached, but still present
>>> due to caching or use count considerations.  Another possibility to consider
>>> is that of_find_node_by_phandle also uses something called 'phandle_cache'
>>> which may have outdated data as of_detach_node() does not have access to
>>> that cache for the 'OF_DETACHED' nodes.
>>
>> Yes the phandle_cache looks like it might be the problem.
>>
>> I saw of_free_phandle_cache() being called as late_initcall, but didn't
>> realise that's only if MODULES is disabled.
>>
>> So I don't see anything that invalidates the phandle_cache when a node
>> is removed.
>>
>> The right solution would be for __of_detach_node() to invalidate the
>> phandle_cache for the node being detached. That's slightly complicated
>> by the phandle_cache being static inside base.c
>>
>> To test the theory that it's the phandle_cache causing the problems can
>> you try this patch:
>>
>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>> index 848f549164cd..60e219132e24 100644
>> --- a/drivers/of/base.c
>> +++ b/drivers/of/base.c
>> @@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle handle)
>>                 if (phandle_cache[masked_handle] &&
>>                     handle == phandle_cache[masked_handle]->phandle)
>>                         np = phandle_cache[masked_handle];
>> +
>> +               if (of_node_check_flag(np, OF_DETACHED))
>> +                       np = NULL;
>>         }
>>
>>         if (!np) {
>>
>> cheers
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)
  2018-07-31 19:18         ` Frank Rowand
@ 2018-07-31 19:22           ` Frank Rowand
  2018-07-31 19:26           ` Michael Bringmann
  2018-08-01 14:26           ` Michael Ellerman
  2 siblings, 0 replies; 17+ messages in thread
From: Frank Rowand @ 2018-07-31 19:22 UTC (permalink / raw)
  To: Rob Herring, Michael Ellerman; +Cc: mwb, linuxppc-dev, devicetree

On 07/31/18 12:18, Frank Rowand wrote:
> On 07/31/18 07:17, Rob Herring wrote:
>> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>>
>>> Hi Rob/Frank,
>>>
>>> I think we might have a problem with the phandle_cache not interacting
>>> well with of_detach_node():
>>
>> Probably needs a similar fix as this commit did for overlays:
>>
>> commit b9952b5218added5577e4a3443969bc20884cea9
>> Author: Frank Rowand <frank.rowand@sony.com>
>> Date:   Thu Jul 12 14:00:07 2018 -0700
>>
>>     of: overlay: update phandle cache on overlay apply and remove
>>
>>     A comment in the review of the patch adding the phandle cache said that
>>     the cache would have to be updated when modules are applied and removed.
>>     This patch implements the cache updates.
>>
>>     Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of
>> of_find_node_by_phandle()")
>>     Reported-by: Alan Tull <atull@kernel.org>
>>     Suggested-by: Alan Tull <atull@kernel.org>
>>     Signed-off-by: Frank Rowand <frank.rowand@sony.com>
>>     Signed-off-by: Rob Herring <robh@kernel.org>
> 
> Agreed.  Sorry about missing the of_detach_node() case.
> 
> 
>> Really what we need here is an "invalidate phandle" function rather
>> than free and re-allocate the whole damn cache.
> 
> The big hammer approach was chosen to avoid the race conditions that
> would otherwise occur.  OF does not have a locking strategy that
> would be able to protect against the races.
> 
> We could maybe implement a slightly smaller hammer by (1) disabling
> the cache, (2) invalidate a phandle entry in the cache, (3) re-enable
> the cache.  That is an off the cuff thought - I would have to look
> a little bit more carefully to make sure it would work.
> 
> But I don't see a need to add the complexity of the smaller hammer
> or the bigger hammer of proper locking _unless_ we start seeing that
> the cache is being freed and re-allocated frequently.  For overlays
> I don't expect the high frequency because it happens on a per overlay
> removal basis (not per node removal basis).


>                                              For of_detach_node() the
> event _is_ on a per node removal basis.  Michael, do you expect node
> removals to be a frequent event with low latency being important?  If
> so, a rough guess on what the frequency would be?

I have not looked at how of_detach_node() is used, so it might not be
very different that overlays.  If a group of of_detach_node() calls
are made from a common code location, the the sequence could possibly
be:

   of_free_phandle_cache()

   multiple calls of of_detach_node()

   of_populate_phandle_cache()

-Frank
> 
> -Frank
> 
> 
>> Rob
>>
>>>
>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>>> See below.
>>>>
>>>> On 07/30/2018 01:31 AM, Michael Ellerman wrote:
>>>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>>>>
>>>>>> During LPAR migration, the content of the device tree/sysfs may
>>>>>> be updated including deletion and replacement of nodes in the
>>>>>> tree.  When nodes are added to the internal node structures, they
>>>>>> are appended in FIFO order to a list of nodes maintained by the
>>>>>> OF code APIs.
>>>>>
>>>>> That hasn't been true for several years. The data structure is an n-ary
>>>>> tree. What kernel version are you working on?
>>>>
>>>> Sorry for an error in my description.  I oversimplified based on the
>>>> name of a search iterator.  Let me try to provide a better explanation
>>>> of the problem, here.
>>>>
>>>> This is the problem.  The PPC mobility code receives RTAS requests to
>>>> delete nodes with platform-/hardware-specific attributes when restarting
>>>> the kernel after a migration.  My example is for migration between a
>>>> P8 Alpine and a P8 Brazos.   Nodes to be deleted may include 'ibm,random-v1',
>>>> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1',
>>>> or others.
>>>>
>>>> The mobility.c code calls 'of_detach_node' for the nodes and their children.
>>>> This makes calls to detach the properties and to try to remove the associated
>>>> sysfs/kernfs files.
>>>>
>>>> Then new copies of the same nodes are next provided by the PHYP, local
>>>> copies are built, and a pointer to the 'struct device_node' is passed to
>>>> of_attach_node.  Before the call to of_attach_node, the phandle is initialized
>>>> to 0 when the data structure is alloced.  During the call to of_attach_node,
>>>> it calls __of_attach_node which pulls the actual name and phandle from just
>>>> created sub-properties named something like 'name' and 'ibm,phandle'.
>>>>
>>>> This is all fine for the first migration.  The problem occurs with the
>>>> second and subsequent migrations when the PHYP on the new system wants to
>>>> replace the same set of nodes again, referenced with the same names and
>>>> phandle values.
>>>>
>>>>>
>>>>>> When nodes are removed from the device tree, they
>>>>>> are marked OF_DETACHED, but not actually deleted from the system
>>>>>> to allow for pointers cached elsewhere in the kernel.  The order
>>>>>> and content of the entries in the list of nodes is not altered,
>>>>>> though.
>>>>>
>>>>> Something is going wrong if this is actually happening.
>>>>>
>>>>> When the node is detached it should be *detached* from the tree of all
>>>>> nodes, so it should not be discoverable other than by having an existing
>>>>> pointer to it.
>>>> On the second and subsequent migrations, the PHYP tells the system
>>>> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
>>>> 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
>>>> nodes by its known set of phandle values -- the same handles used
>>>> by the PHYP on the source system are known on the target system.
>>>> The mobility.c code calls of_find_node_by_phandle() with these values
>>>> and ends up locating the first instance of each node that was added
>>>> during the original boot, instead of the second instance of each node
>>>> created after the first migration.  The detach during the second
>>>> migration fails with errors like,
>>>>
>>>> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0
>>>> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
>>>> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G        W         4.18.0-rc1-wi107836-v05-120+ #201
>>>> [ 4565.030737] NIP:  c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170
>>>> [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700   Tainted: G        W          (4.18.0-rc1-wi107836-v05-120+)
>>>> [ 4565.030745] MSR:  800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22288822  XER: 0000000a
>>>> [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1
>>>> [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68
>>>> [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff
>>>> [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843
>>>> [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000
>>>> [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff
>>>> [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001
>>>> [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930
>>>> [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0
>>>> [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0
>>>> [ 4565.030811] Call Trace:
>>>> [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable)
>>>> [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150
>>>> [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80
>>>> [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0
>>>> [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0
>>>> [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130
>>>> [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60
>>>> [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0
>>>> [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240
>>>> [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220
>>>> [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240
>>>> [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100
>>>> [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70
>>>> [ 4565.030884] Instruction dump:
>>>> [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8
>>>> [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040
>>>> [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]---
>>>>
>>>> The mobility.c code continues on during the second migration, accepts the
>>>> definitions of the new nodes from the PHYP and ends up renaming the new
>>>> properties e.g.
>>>>
>>>> [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1"
>>>>
>>>> I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within
>>>> of_find_node_by_phandle to skip nodes that are detached, but still present
>>>> due to caching or use count considerations.  Another possibility to consider
>>>> is that of_find_node_by_phandle also uses something called 'phandle_cache'
>>>> which may have outdated data as of_detach_node() does not have access to
>>>> that cache for the 'OF_DETACHED' nodes.
>>>
>>> Yes the phandle_cache looks like it might be the problem.
>>>
>>> I saw of_free_phandle_cache() being called as late_initcall, but didn't
>>> realise that's only if MODULES is disabled.
>>>
>>> So I don't see anything that invalidates the phandle_cache when a node
>>> is removed.
>>>
>>> The right solution would be for __of_detach_node() to invalidate the
>>> phandle_cache for the node being detached. That's slightly complicated
>>> by the phandle_cache being static inside base.c
>>>
>>> To test the theory that it's the phandle_cache causing the problems can
>>> you try this patch:
>>>
>>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>>> index 848f549164cd..60e219132e24 100644
>>> --- a/drivers/of/base.c
>>> +++ b/drivers/of/base.c
>>> @@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle handle)
>>>                 if (phandle_cache[masked_handle] &&
>>>                     handle == phandle_cache[masked_handle]->phandle)
>>>                         np = phandle_cache[masked_handle];
>>> +
>>> +               if (of_node_check_flag(np, OF_DETACHED))
>>> +                       np = NULL;
>>>         }
>>>
>>>         if (!np) {
>>>
>>> cheers
>>
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)
  2018-07-31 19:18         ` Frank Rowand
  2018-07-31 19:22           ` Frank Rowand
@ 2018-07-31 19:26           ` Michael Bringmann
  2018-08-01 14:26           ` Michael Ellerman
  2 siblings, 0 replies; 17+ messages in thread
From: Michael Bringmann @ 2018-07-31 19:26 UTC (permalink / raw)
  To: linuxppc-dev

On 07/31/2018 02:18 PM, Frank Rowand wrote:
> On 07/31/18 07:17, Rob Herring wrote:
>> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>>
>>> Hi Rob/Frank,
>>>
>>> I think we might have a problem with the phandle_cache not interacting
>>> well with of_detach_node():
>>
>> Probably needs a similar fix as this commit did for overlays:
>>
>> commit b9952b5218added5577e4a3443969bc20884cea9
>> Author: Frank Rowand <frank.rowand@sony.com>
>> Date:   Thu Jul 12 14:00:07 2018 -0700
>>
>>     of: overlay: update phandle cache on overlay apply and remove
>>
>>     A comment in the review of the patch adding the phandle cache said that
>>     the cache would have to be updated when modules are applied and removed.
>>     This patch implements the cache updates.
>>
>>     Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of
>> of_find_node_by_phandle()")
>>     Reported-by: Alan Tull <atull@kernel.org>
>>     Suggested-by: Alan Tull <atull@kernel.org>
>>     Signed-off-by: Frank Rowand <frank.rowand@sony.com>
>>     Signed-off-by: Rob Herring <robh@kernel.org>
> 
> Agreed.  Sorry about missing the of_detach_node() case.
> 
> 
>> Really what we need here is an "invalidate phandle" function rather
>> than free and re-allocate the whole damn cache.
> 
> The big hammer approach was chosen to avoid the race conditions that
> would otherwise occur.  OF does not have a locking strategy that
> would be able to protect against the races.
> 
> We could maybe implement a slightly smaller hammer by (1) disabling
> the cache, (2) invalidate a phandle entry in the cache, (3) re-enable
> the cache.  That is an off the cuff thought - I would have to look
> a little bit more carefully to make sure it would work.
> 
> But I don't see a need to add the complexity of the smaller hammer
> or the bigger hammer of proper locking _unless_ we start seeing that
> the cache is being freed and re-allocated frequently.  For overlays
> I don't expect the high frequency because it happens on a per overlay
> removal basis (not per node removal basis).  For of_detach_node() the
> event _is_ on a per node removal basis.  Michael, do you expect node
> removals to be a frequent event with low latency being important?  If
> so, a rough guess on what the frequency would be?

I am only seeing node removals during startup of the kernel after an
LPAR migration event.  For the tests that I have run between a couple
of P8 platforms, I see about 15 node removals split between, 8 l2-caches,
4 hardware-specific property clusters (e.g. ibm,platform-facilities),
and 3 CPUs.

> 
> -Frank

Michael
> 
> 
>> Rob
>>
>>>
>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>>> See below.
>>>>
>>>> On 07/30/2018 01:31 AM, Michael Ellerman wrote:
>>>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>>>>
>>>>>> During LPAR migration, the content of the device tree/sysfs may
>>>>>> be updated including deletion and replacement of nodes in the
>>>>>> tree.  When nodes are added to the internal node structures, they
>>>>>> are appended in FIFO order to a list of nodes maintained by the
>>>>>> OF code APIs.
>>>>>
>>>>> That hasn't been true for several years. The data structure is an n-ary
>>>>> tree. What kernel version are you working on?
>>>>
>>>> Sorry for an error in my description.  I oversimplified based on the
>>>> name of a search iterator.  Let me try to provide a better explanation
>>>> of the problem, here.
>>>>
>>>> This is the problem.  The PPC mobility code receives RTAS requests to
>>>> delete nodes with platform-/hardware-specific attributes when restarting
>>>> the kernel after a migration.  My example is for migration between a
>>>> P8 Alpine and a P8 Brazos.   Nodes to be deleted may include 'ibm,random-v1',
>>>> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1',
>>>> or others.
>>>>
>>>> The mobility.c code calls 'of_detach_node' for the nodes and their children.
>>>> This makes calls to detach the properties and to try to remove the associated
>>>> sysfs/kernfs files.
>>>>
>>>> Then new copies of the same nodes are next provided by the PHYP, local
>>>> copies are built, and a pointer to the 'struct device_node' is passed to
>>>> of_attach_node.  Before the call to of_attach_node, the phandle is initialized
>>>> to 0 when the data structure is alloced.  During the call to of_attach_node,
>>>> it calls __of_attach_node which pulls the actual name and phandle from just
>>>> created sub-properties named something like 'name' and 'ibm,phandle'.
>>>>
>>>> This is all fine for the first migration.  The problem occurs with the
>>>> second and subsequent migrations when the PHYP on the new system wants to
>>>> replace the same set of nodes again, referenced with the same names and
>>>> phandle values.
>>>>
>>>>>
>>>>>> When nodes are removed from the device tree, they
>>>>>> are marked OF_DETACHED, but not actually deleted from the system
>>>>>> to allow for pointers cached elsewhere in the kernel.  The order
>>>>>> and content of the entries in the list of nodes is not altered,
>>>>>> though.
>>>>>
>>>>> Something is going wrong if this is actually happening.
>>>>>
>>>>> When the node is detached it should be *detached* from the tree of all
>>>>> nodes, so it should not be discoverable other than by having an existing
>>>>> pointer to it.
>>>> On the second and subsequent migrations, the PHYP tells the system
>>>> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
>>>> 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
>>>> nodes by its known set of phandle values -- the same handles used
>>>> by the PHYP on the source system are known on the target system.
>>>> The mobility.c code calls of_find_node_by_phandle() with these values
>>>> and ends up locating the first instance of each node that was added
>>>> during the original boot, instead of the second instance of each node
>>>> created after the first migration.  The detach during the second
>>>> migration fails with errors like,
>>>>
>>>> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0
>>>> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
>>>> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G        W         4.18.0-rc1-wi107836-v05-120+ #201
>>>> [ 4565.030737] NIP:  c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170
>>>> [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700   Tainted: G        W          (4.18.0-rc1-wi107836-v05-120+)
>>>> [ 4565.030745] MSR:  800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22288822  XER: 0000000a
>>>> [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1
>>>> [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68
>>>> [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff
>>>> [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843
>>>> [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000
>>>> [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff
>>>> [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001
>>>> [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930
>>>> [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0
>>>> [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0
>>>> [ 4565.030811] Call Trace:
>>>> [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable)
>>>> [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150
>>>> [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80
>>>> [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0
>>>> [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0
>>>> [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130
>>>> [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60
>>>> [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0
>>>> [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240
>>>> [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220
>>>> [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240
>>>> [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100
>>>> [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70
>>>> [ 4565.030884] Instruction dump:
>>>> [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8
>>>> [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040
>>>> [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]---
>>>>
>>>> The mobility.c code continues on during the second migration, accepts the
>>>> definitions of the new nodes from the PHYP and ends up renaming the new
>>>> properties e.g.
>>>>
>>>> [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1"
>>>>
>>>> I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within
>>>> of_find_node_by_phandle to skip nodes that are detached, but still present
>>>> due to caching or use count considerations.  Another possibility to consider
>>>> is that of_find_node_by_phandle also uses something called 'phandle_cache'
>>>> which may have outdated data as of_detach_node() does not have access to
>>>> that cache for the 'OF_DETACHED' nodes.
>>>
>>> Yes the phandle_cache looks like it might be the problem.
>>>
>>> I saw of_free_phandle_cache() being called as late_initcall, but didn't
>>> realise that's only if MODULES is disabled.
>>>
>>> So I don't see anything that invalidates the phandle_cache when a node
>>> is removed.
>>>
>>> The right solution would be for __of_detach_node() to invalidate the
>>> phandle_cache for the node being detached. That's slightly complicated
>>> by the phandle_cache being static inside base.c
>>>
>>> To test the theory that it's the phandle_cache causing the problems can
>>> you try this patch:
>>>
>>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>>> index 848f549164cd..60e219132e24 100644
>>> --- a/drivers/of/base.c
>>> +++ b/drivers/of/base.c
>>> @@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle handle)
>>>                 if (phandle_cache[masked_handle] &&
>>>                     handle == phandle_cache[masked_handle]->phandle)
>>>                         np = phandle_cache[masked_handle];
>>> +
>>> +               if (of_node_check_flag(np, OF_DETACHED))
>>> +                       np = NULL;
>>>         }
>>>
>>>         if (!np) {
>>>
>>> cheers
>>
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
  2018-07-31  6:42   ` Michael Ellerman
@ 2018-07-31 19:50     ` Tyrel Datwyler
  2018-08-01 14:35       ` Michael Ellerman
  0 siblings, 1 reply; 17+ messages in thread
From: Tyrel Datwyler @ 2018-07-31 19:50 UTC (permalink / raw)
  To: Michael Ellerman, Michael Bringmann, linuxppc-dev
  Cc: Nathan Fontenot, Thomas Falcon, John Allen

T24gMDcvMzAvMjAxOCAxMTo0MiBQTSwgTWljaGFlbCBFbGxlcm1hbiB3cm90ZToNCj4gVHly
ZWwgRGF0d3lsZXIgPHR5cmVsZEBsaW51eC52bmV0LmlibS5jb20+IHdyaXRlczoNCj4+IE9u
IDA3LzI5LzIwMTggMDY6MTEgQU0sIE1pY2hhZWwgQnJpbmdtYW5uIHdyb3RlOg0KPj4+IER1
cmluZyBMUEFSIG1pZ3JhdGlvbiwgdGhlIGNvbnRlbnQgb2YgdGhlIGRldmljZSB0cmVlL3N5
c2ZzIG1heQ0KPj4+IGJlIHVwZGF0ZWQgaW5jbHVkaW5nIGRlbGV0aW9uIGFuZCByZXBsYWNl
bWVudCBvZiBub2RlcyBpbiB0aGUNCj4+PiB0cmVlLiAgV2hlbiBub2RlcyBhcmUgYWRkZWQg
dG8gdGhlIGludGVybmFsIG5vZGUgc3RydWN0dXJlcywgdGhleQ0KPj4+IGFyZSBhcHBlbmRl
ZCBpbiBGSUZPIG9yZGVyIHRvIGEgbGlzdCBvZiBub2RlcyBtYWludGFpbmVkIGJ5IHRoZQ0K
Pj4+IE9GIGNvZGUgQVBJcy4gIFdoZW4gbm9kZXMgYXJlIHJlbW92ZWQgZnJvbSB0aGUgZGV2
aWNlIHRyZWUsIHRoZXkNCj4+PiBhcmUgbWFya2VkIE9GX0RFVEFDSEVELCBidXQgbm90IGFj
dHVhbGx5IGRlbGV0ZWQgZnJvbSB0aGUgc3lzdGVtDQo+Pj4gdG8gYWxsb3cgZm9yIHBvaW50
ZXJzIGNhY2hlZCBlbHNld2hlcmUgaW4gdGhlIGtlcm5lbC4gIFRoZSBvcmRlcg0KPj4+IGFu
ZCBjb250ZW50IG9mIHRoZSBlbnRyaWVzIGluIHRoZSBsaXN0IG9mIG5vZGVzIGlzIG5vdCBh
bHRlcmVkLA0KPj4+IHRob3VnaC4NCj4+Pg0KPj4+IER1cmluZyBMUEFSIG1pZ3JhdGlvbiBz
b21lIGNvbW1vbiBub2RlcyBhcmUgZGVsZXRlZCBhbmQgcmUtYWRkZWQNCj4+PiBlLmcuICJp
Ym0scGxhdGZvcm0tZmFjaWxpdGllcyIuICBJZiBhIG5vZGUgaXMgcmUtYWRkZWQgdG8gdGhl
IE9GDQo+Pj4gbm9kZSBsaXN0cywgdGhlIG9mX2F0dGFjaF9ub2RlIGZ1bmN0aW9uIGNoZWNr
cyB0byBtYWtlIHN1cmUgdGhhdA0KPj4+IHRoZSBuYW1lICsgaWJtLHBoYW5kbGUgb2YgdGhl
IHRvLWJlLWFkZGVkIGRhdGEgaXMgdW5pcXVlLiAgQXMgdGhlDQo+Pj4gcHJldmlvdXMgY29w
eSBvZiBhIHJlLWFkZGVkIG5vZGUgaXMgbm90IG1vZGlmaWVkIGJleW9uZCB0aGUgYWRkaXRp
b24NCj4+PiBvZiBhIGJpdCBmbGFnLCB0aGUgY29kZSAoMSkgZmluZHMgdGhlIG9sZCBjb3B5
LCAoMikgcHJpbnRzIGEgV0FSTklORw0KPj4+IG5vdGljZSB0byB0aGUgY29uc29sZSwgKDMp
IHJlbmFtZXMgdGhlIHRvLWJlLWFkZGVkIG5vZGUgdG8gYXZvaWQNCj4+PiBmaWxlbmFtZSBj
b2xsaXNpb25zIHdpdGhpbiBhIGRpcmVjdG9yeSwgYW5kICgzKSBhZGRzIGVudHJpZXMgdG8N
Cj4+PiB0aGUgc3lzZnMva2VybmZzLg0KPj4NCj4+IFNvLCB0aGlzIHBhdGNoIGFjdHVhbGx5
IGp1c3QgYmFuZCBhaWRzIG92ZXIgdGhlIHJlYWwgcHJvYmxlbS4gVGhpcyBpcw0KPj4gYSBs
b25nIHN0YW5kaW5nIHByb2JsZW0gd2l0aCBzZXZlcmFsIFBGTyBkcml2ZXJzIGxlYWtpbmcg
cmVmZXJlbmNlcy4NCj4+IFRoZSBpc3N1ZSBoZXJlIGlzIHRoYXQsIGR1cmluZyB0aGUgZGV2
aWNlIHRyZWUgdXBkYXRlIHRoYXQgZm9sbG93cyBhDQo+PiBtaWdyYXRpb24uIHRoZSB1cGRh
dGUgb2YgdGhlIGlibSxwbGF0Zm9ybS1mYWNpbGl0aWVzIG5vZGUgYW5kIGZyaWVuZHMNCj4+
IGJlbG93IGFyZSBhbHdheXMgZGVsZXRlZCBhbmQgcmUtYWRkZWQgb24gdGhlIGRlc3RpbmF0
aW9uIGxwYXIgYW5kDQo+PiBzdWJzZXF1ZW50bHkgdGhlIGxlYWtlZCByZWZlcmVuY2VzIHBy
ZXZlbnQgdGhlIGRldmljZXMgbm9kZXMgZnJvbQ0KPj4gZXZlcnkgYWN0dWFsbHkgYmVpbmcg
cHJvcGVybHkgY2xlYW5lZCB1cCBhZnRlciBkZXRhY2guIFRodXMsIGxlYWRpbmcNCj4+IHRv
IHRoZSBpc3N1ZSB5b3UgYXJlIG9ic2VydmluZy4NCg0KU28sIGl0IHdhcyB0aGUgZW5kIG9m
IHRoZSBkYXksIGFuZCBJIGtpbmQgb2YgZ2xvc3NlZCBvdmVyIHRoZSBpc3N1ZSBNaWNoYWVs
IHdhcyB0cnlpbmcgdG8gYWRkcmVzcyB3aXRoIGFuIGlzc3VlIHRoYXQgSSByZW1lbWJlcmVk
IHRoYXQgaGFkIGJlZW4gYXJvdW5kIGZvciBhd2hpbGUuDQoNCj4gDQo+IExlYWtpbmcgcmVm
ZXJlbmNlcyBzaG91bGRuJ3QgYWZmZWN0IHRoZSBub2RlIGJlaW5nIGRldGFjaGVkIGZyb20g
dGhlDQo+IHRyZWUgdGhvdWdoLg0KDQpObywgYnV0IGl0IGRvZXMgcHJldmVudCBpdCBmcm9t
IGJlaW5nIGZyZWVkIGZyb20gc3lzZnMgd2hpY2ggbGVhZHMgdG8gdGhlIHN5c2ZzIGVudHJ5
IHJlbmFtaW5nIHRoYXQgaGFwcGVucyB3aGVuIGFub3RoZXIgbm9kZSB3aXRoIHRoZSBzYW1l
IG5hbWUgaXMgYXR0YWNoZWQuDQoNCj4gDQo+IFNlZSBvZl9kZXRhY2hfbm9kZSgpIGNhbGxp
bmcgX19vZl9kZXRhY2hfbm9kZSgpLCBub25lIG9mIHRoYXQgZGVwZW5kcyBvbg0KPiB0aGUg
cmVmY291bnQuDQo+IA0KPiBJdCdzIG9ubHkgdGhlIGFjdHVhbCBmcmVlaW5nIG9mIHRoZSBu
b2RlLCBpbiBvZl9ub2RlX3JlbGVhc2UoKSB0aGF0IGlzDQo+IHByZXZlbnRlZCBieSBsZWFr
ZWQgcmVmZXJlbmNlIGNvdW50cy4NCg0KUmlnaHQsIGJ1dCBpZiB3ZSBkaWQgcmVmY291bnQg
Y29ycmVjdGx5IHRoZXJlIGlzIHRoZSBuZXcgcHJvYmxlbSB0aGF0IHRoZSBub2RlIGlzIGZy
ZWVkIGFuZCBub3cgdGhlIHBoYW5kbGUgY2FjaGUgcG9pbnRzIGF0IGZyZWVkIG1lbW9yeSBp
biB0aGUgY2FzZSB3ZXJlIG5vIGludmFsaWRhdGlvbiBpcyBkb25lLg0KDQo+IA0KPiBTbyBJ
IGFncmVlIHdlIG5lZWQgdG8gZG8gYSBiZXR0ZXIgam9iIHdpdGggdGhlIHJlZmVyZW5jZSBj
b3VudGluZywgYnV0IEkNCj4gZG9uJ3Qgc2VlIGhvdyBpdCBpcyBjYXVzaW5nIHRoZSBwcm9i
bGVtIGhlcmUNCg0KTm93IGluIHJlZ2FyZHMgdG8gdGhlIHBoYW5kbGUgY2FjaGluZyBzb21l
aG93IEkgbWlzc2VkIHRoYXQgY29kZSBnb2luZyBpbnRvIE9GIGVhcmxpZXIgdGhpcyB5ZWFy
LiBUaGF0IHNob3VsZCBoYXZlIGhhZCBhdCBsZWFzdCBzb21lIGRpc2N1c3Npb24gZnJvbSBv
dXIgc2lkZSBiYXNlZCBvbiB0aGUgZmFjdCB0aGF0IGl0IGlzIGJ1aWx0IG9uIGR0YyBjb21w
aWxlciBhc3N1bXB0aW9uIHRoYXQgdGhlcmUgYXJlIGEgc2V0IG51bWJlciBvZiBwaGFuZGxl
cyB0aGF0IGFyZSBhbGxvY2F0ZWQgc3RhcnRpbmcgYXQgMS4ubiBzdWNoIHRoYXQgdGhleSBh
cmUgbW9ub3RvbmljYWxseSBpbmNyZWFzaW5nLiBUaGF0IGhhcyBhIG5pY2UgZml4ZWQgc2l6
ZSB3aXRoIE8oMSkgbG9va3VwIHRpbWUuIFBoeXAgZG9lc24ndCBndWFyYW50ZWUgYW55IHNv
cnQgb2YgbmljZW5lc3Mgd2l0aCBuaWNlbHkgb3JkZXJlZCBwaGFuZGxlcy4gRnJvbSB3aGF0
IEkndmUgc2VlbiB0aGVyZSBhcmUgYSBzdWJzZXQgb2YgcGhhbmRsZXMgdGhhdCBkZWNyZWFz
ZSBmcm9tICgtMSkgbW9ub3RvbmljYWxseSwgYW5kIHRoZW4gdGhlcmUgYXJlIGEgYnVuY2gg
b2YgcmFuZG9tIHZhbHVlcyBmb3IgY3B1cyBhbmQgSU9Bcy4gVGhpbmtpbmcgb24gaXQgbWln
aHQgbm90IGJlIHRoYXQgYmlnIG9mIGEgZGVhbCBhcyB3ZSBqdXN0IGVuZCB1cCBpbiB0aGUg
Y2FzZSB3aGVyZSB3ZSBoYXZlIGEgcGhhbmRsZSBjb2xsaXNpb24gb25lIG1ha2VzIGl0IGlu
dG8gdGhlIGNhY2hlIGFuZCBpcyBvcHRpbWl6ZWQgd2hpbGUgdGhlIG90aGVyIGRvZXNuJ3Qu
IE9uIGFub3RoZXIgbm90ZSwgdGhleSBjbGVhcmx5IGhpdCBhIHNpbWlsYXIgaXNzdWUgZHVy
aW5nIG92ZXJsYXkgYXR0YWNoIGFuZCByZW1vdmUsIGFuZCBhcyBSb2IgcG9pbnRlZCBvdXQg
dGhlaXIgc29sdXRpb24gdG8gaGFuZGxlIGl0IGlzIGEgZnVsbCBjYWNoZSBpbnZhbGlkYXRp
b24gZm9sbG93ZWQgYnkgcmVzY2FubmluZyB0aGUgd2hvbGUgdHJlZSB0byByZWJ1aWxkIGl0
LiBTZWVpbmcgYXMgb3VyIGR5bmFtaWMgbGlmZWN5Y2xlIGlzIG5vZGUgYnkgbm9kZSB0aGlz
IGRlZmluaXRlbHkgYWRkcyBzb21lIG92ZXJoZWFkLg0KDQotVHlyZWwNCg0KPiANCj4gY2hl
ZXJzDQo+IA0KDQo=

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)
  2018-07-31 19:18         ` Frank Rowand
  2018-07-31 19:22           ` Frank Rowand
  2018-07-31 19:26           ` Michael Bringmann
@ 2018-08-01 14:26           ` Michael Ellerman
  2018-08-01 17:26             ` Michael Bringmann
  2 siblings, 1 reply; 17+ messages in thread
From: Michael Ellerman @ 2018-08-01 14:26 UTC (permalink / raw)
  To: Frank Rowand, Rob Herring; +Cc: mwb, linuxppc-dev, devicetree

Frank Rowand <frowand.list@gmail.com> writes:
> On 07/31/18 07:17, Rob Herring wrote:
>> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>>
>>> Hi Rob/Frank,
>>>
>>> I think we might have a problem with the phandle_cache not interacting
>>> well with of_detach_node():
>> 
>> Probably needs a similar fix as this commit did for overlays:
>> 
>> commit b9952b5218added5577e4a3443969bc20884cea9
>> Author: Frank Rowand <frank.rowand@sony.com>
>> Date:   Thu Jul 12 14:00:07 2018 -0700
>> 
>>     of: overlay: update phandle cache on overlay apply and remove
>> 
>>     A comment in the review of the patch adding the phandle cache said that
>>     the cache would have to be updated when modules are applied and removed.
>>     This patch implements the cache updates.
>> 
>>     Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of
>> of_find_node_by_phandle()")
>>     Reported-by: Alan Tull <atull@kernel.org>
>>     Suggested-by: Alan Tull <atull@kernel.org>
>>     Signed-off-by: Frank Rowand <frank.rowand@sony.com>
>>     Signed-off-by: Rob Herring <robh@kernel.org>
>
> Agreed.  Sorry about missing the of_detach_node() case.
>
>
>> Really what we need here is an "invalidate phandle" function rather
>> than free and re-allocate the whole damn cache.
>
> The big hammer approach was chosen to avoid the race conditions that
> would otherwise occur.  OF does not have a locking strategy that
> would be able to protect against the races.
>
> We could maybe implement a slightly smaller hammer by (1) disabling
> the cache, (2) invalidate a phandle entry in the cache, (3) re-enable
> the cache.  That is an off the cuff thought - I would have to look
> a little bit more carefully to make sure it would work.
>
> But I don't see a need to add the complexity of the smaller hammer
> or the bigger hammer of proper locking _unless_ we start seeing that
> the cache is being freed and re-allocated frequently.  For overlays
> I don't expect the high frequency because it happens on a per overlay
> removal basis (not per node removal basis).  For of_detach_node() the
> event _is_ on a per node removal basis.  Michael, do you expect node
> removals to be a frequent event with low latency being important?  If
> so, a rough guess on what the frequency would be?

No in practice we expect them to be fairly rare and in the thousands at
most I think.

TBH you could just disable the phandle_cache on the first detach and
that would be fine by me. I don't think we've ever noticed phandle
lookups being much of a problem for us on our hardware.

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
  2018-07-31 19:50     ` Tyrel Datwyler
@ 2018-08-01 14:35       ` Michael Ellerman
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2018-08-01 14:35 UTC (permalink / raw)
  To: Tyrel Datwyler, Michael Bringmann, linuxppc-dev
  Cc: Nathan Fontenot, Thomas Falcon, John Allen

Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:
> On 07/30/2018 11:42 PM, Michael Ellerman wrote:
>> Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:
>>> On 07/29/2018 06:11 AM, Michael Bringmann wrote:
>>>> During LPAR migration, the content of the device tree/sysfs may
>>>> be updated including deletion and replacement of nodes in the
>>>> tree.  When nodes are added to the internal node structures, they
>>>> are appended in FIFO order to a list of nodes maintained by the
>>>> OF code APIs.  When nodes are removed from the device tree, they
>>>> are marked OF_DETACHED, but not actually deleted from the system
>>>> to allow for pointers cached elsewhere in the kernel.  The order
>>>> and content of the entries in the list of nodes is not altered,
>>>> though.
>>>>
>>>> During LPAR migration some common nodes are deleted and re-added
>>>> e.g. "ibm,platform-facilities".  If a node is re-added to the OF
>>>> node lists, the of_attach_node function checks to make sure that
>>>> the name + ibm,phandle of the to-be-added data is unique.  As the
>>>> previous copy of a re-added node is not modified beyond the addition
>>>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
>>>> notice to the console, (3) renames the to-be-added node to avoid
>>>> filename collisions within a directory, and (3) adds entries to
>>>> the sysfs/kernfs.
>>>
>>> So, this patch actually just band aids over the real problem. This is
>>> a long standing problem with several PFO drivers leaking references.
>>> The issue here is that, during the device tree update that follows a
>>> migration. the update of the ibm,platform-facilities node and friends
>>> below are always deleted and re-added on the destination lpar and
>>> subsequently the leaked references prevent the devices nodes from
>>> every actually being properly cleaned up after detach. Thus, leading
>>> to the issue you are observing.
>
> So, it was the end of the day, and I kind of glossed over the issue
> Michael was trying to address with an issue that I remembered that had
> been around for awhile.

Sure, I know that feeling :)

>> Leaking references shouldn't affect the node being detached from the
>> tree though.
>
> No, but it does prevent it from being freed from sysfs which leads to
> the sysfs entry renaming that happens when another node with the same
> name is attached.

OK.

>> See of_detach_node() calling __of_detach_node(), none of that depends on
>> the refcount.
>> 
>> It's only the actual freeing of the node, in of_node_release() that is
>> prevented by leaked reference counts.
>
> Right, but if we did refcount correctly there is the new problem that
> the node is freed and now the phandle cache points at freed memory in
> the case were no invalidation is done.

Yeah that's bad.

I guess no one is supposed to lookup that phandle anymore, but it's
super fragile.

>> So I agree we need to do a better job with the reference counting, but I
>> don't see how it is causing the problem here
>
> Now in regards to the phandle caching somehow I missed that code going
> into OF earlier this year. That should have had at least some
> discussion from our side based on the fact that it is built on dtc
> compiler assumption that there are a set number of phandles that are
> allocated starting at 1..n such that they are monotonically
> increasing. That has a nice fixed size with O(1) lookup time. Phyp
> doesn't guarantee any sort of niceness with nicely ordered phandles.
> From what I've seen there are a subset of phandles that decrease from
> (-1) monotonically, and then there are a bunch of random values for
> cpus and IOAs. Thinking on it might not be that big of a deal as we
> just end up in the case where we have a phandle collision one makes it
> into the cache and is optimized while the other doesn't. On another
> note, they clearly hit a similar issue during overlay attach and
> remove, and as Rob pointed out their solution to handle it is a full
> cache invalidation followed by rescanning the whole tree to rebuild
> it. Seeing as our dynamic lifecycle is node by node this definitely
> adds some overhead.

Yeah I also should have noticed it going in, but despite being
subscribed to the devicetree list I didn't spot it in the flood.

AIUI the 1..n assumption is not about correctness but if our phandles
violate that we may not be getting any benefit.

Anyway yeah lots of things to look at tomorrow :)

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)
  2018-08-01 14:26           ` Michael Ellerman
@ 2018-08-01 17:26             ` Michael Bringmann
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Bringmann @ 2018-08-01 17:26 UTC (permalink / raw)
  To: linuxppc-dev

On 08/01/2018 09:26 AM, Michael Ellerman wrote:
> Frank Rowand <frowand.list@gmail.com> writes:
>> On 07/31/18 07:17, Rob Herring wrote:
>>> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>>>
>>>> Hi Rob/Frank,
>>>>
>>>> I think we might have a problem with the phandle_cache not interacting
>>>> well with of_detach_node():
>>>
>>> Probably needs a similar fix as this commit did for overlays:
>>>
>>> commit b9952b5218added5577e4a3443969bc20884cea9
>>> Author: Frank Rowand <frank.rowand@sony.com>
>>> Date:   Thu Jul 12 14:00:07 2018 -0700
>>>
>>>     of: overlay: update phandle cache on overlay apply and remove
>>>
>>>     A comment in the review of the patch adding the phandle cache said that
>>>     the cache would have to be updated when modules are applied and removed.
>>>     This patch implements the cache updates.
>>>
>>>     Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of
>>> of_find_node_by_phandle()")
>>>     Reported-by: Alan Tull <atull@kernel.org>
>>>     Suggested-by: Alan Tull <atull@kernel.org>
>>>     Signed-off-by: Frank Rowand <frank.rowand@sony.com>
>>>     Signed-off-by: Rob Herring <robh@kernel.org>
>>
>> Agreed.  Sorry about missing the of_detach_node() case.
>>
>>
>>> Really what we need here is an "invalidate phandle" function rather
>>> than free and re-allocate the whole damn cache.
>>
>> The big hammer approach was chosen to avoid the race conditions that
>> would otherwise occur.  OF does not have a locking strategy that
>> would be able to protect against the races.
>>
>> We could maybe implement a slightly smaller hammer by (1) disabling
>> the cache, (2) invalidate a phandle entry in the cache, (3) re-enable
>> the cache.  That is an off the cuff thought - I would have to look
>> a little bit more carefully to make sure it would work.
>>
>> But I don't see a need to add the complexity of the smaller hammer
>> or the bigger hammer of proper locking _unless_ we start seeing that
>> the cache is being freed and re-allocated frequently.  For overlays
>> I don't expect the high frequency because it happens on a per overlay
>> removal basis (not per node removal basis).  For of_detach_node() the
>> event _is_ on a per node removal basis.  Michael, do you expect node
>> removals to be a frequent event with low latency being important?  If
>> so, a rough guess on what the frequency would be?
> 
> No in practice we expect them to be fairly rare and in the thousands at
> most I think.
> 
> TBH you could just disable the phandle_cache on the first detach and
> that would be fine by me. I don't think we've ever noticed phandle
> lookups being much of a problem for us on our hardware.

I ran a couple of migrations with the calls to

of_free_phandle_cache() ... of_populate_phandle_cache()

bracketing the body of

arch/powerpc/platforms/pseries/mobility.c:post_mobility_fixup()

with a patch like,

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index e245a88..efc9442 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -22,6 +22,9 @@
 #include <asm/rtas.h>
 #include "pseries.h"

+extern int of_free_phandle_cache(void);
+extern void of_populate_phandle_cache(void);
+
 static struct kobject *mobility_kobj;

 struct update_props_workarea {
@@ -343,6 +346,8 @@ void post_mobility_fixup(void)
                rc = rtas_call(activate_fw_token, 0, 1, NULL);
        } while (rtas_busy_delay(rc));

+       of_free_phandle_cache();
+
        if (rc)
                printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);

@@ -354,6 +359,8 @@ void post_mobility_fixup(void)
        /* Possibly switch to a new RFI flush type */
        pseries_setup_rfi_flush();

+       of_populate_phandle_cache();
+
        return;
 }

and did not observe the warnings/crashes that have been
plaguing me.  We will need to move their prototypes out
of drivers/of/of_private.h to include/linux/of.h, though.

> 
> cheers
> 

Michael
 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-08-01 17:26 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-29 13:11 [PATCH] powerpc/mobility: Fix node detach/rename problem Michael Bringmann
2018-07-29 16:31 ` kbuild test robot
2018-07-30  6:31 ` Michael Ellerman
2018-07-30 21:02   ` Michael Bringmann
2018-07-31  6:34     ` phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) Michael Ellerman
2018-07-31 14:17       ` Rob Herring
2018-07-31 19:18         ` Frank Rowand
2018-07-31 19:22           ` Frank Rowand
2018-07-31 19:26           ` Michael Bringmann
2018-08-01 14:26           ` Michael Ellerman
2018-08-01 17:26             ` Michael Bringmann
2018-07-31 19:11       ` Michael Bringmann
2018-07-31  0:59 ` [PATCH] powerpc/mobility: Fix node detach/rename problem Tyrel Datwyler
2018-07-31  1:46   ` Tyrel Datwyler
2018-07-31  6:42   ` Michael Ellerman
2018-07-31 19:50     ` Tyrel Datwyler
2018-08-01 14:35       ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).