From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.9 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B9471C07E85 for ; Tue, 11 Dec 2018 15:27:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7179920849 for ; Tue, 11 Dec 2018 15:27:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7179920849 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.vnet.ibm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726927AbeLKP1L (ORCPT ); Tue, 11 Dec 2018 10:27:11 -0500 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:44548 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726366AbeLKP1K (ORCPT ); Tue, 11 Dec 2018 10:27:10 -0500 Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id wBBFPjqA014121 for ; Tue, 11 Dec 2018 10:27:07 -0500 Received: from e33.co.us.ibm.com (e33.co.us.ibm.com [32.97.110.151]) by mx0b-001b2d01.pphosted.com with ESMTP id 2pafv8g2cv-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Tue, 11 Dec 2018 10:27:06 -0500 Received: from localhost by e33.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 11 Dec 2018 15:27:04 -0000 Received: from b03cxnp07029.gho.boulder.ibm.com (9.17.130.16) by e33.co.us.ibm.com (192.168.1.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Tue, 11 Dec 2018 15:27:02 -0000 Received: from b03ledav001.gho.boulder.ibm.com (b03ledav001.gho.boulder.ibm.com [9.17.130.232]) by b03cxnp07029.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id wBBFR1j315269940 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 11 Dec 2018 15:27:01 GMT Received: from b03ledav001.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1D1896E04E; Tue, 11 Dec 2018 15:27:01 +0000 (GMT) Received: from b03ledav001.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2B8BD6E050; Tue, 11 Dec 2018 15:27:00 +0000 (GMT) Received: from oc5000245537.ibm.com (unknown [9.53.179.223]) by b03ledav001.gho.boulder.ibm.com (Postfix) with ESMTP; Tue, 11 Dec 2018 15:26:59 +0000 (GMT) Subject: Re: [PATCH v03] powerpc/mobility: Fix node detach/rename problem To: Michael Ellerman , linuxppc-dev@lists.ozlabs.org Cc: devicetree@vger.kernel.org, Thomas Falcon , linux-kernel@vger.kernel.org, robh+dt@kernel.org, Juliet Kim , Tyrel Datwyler , frowand.list@gmail.com References: <871s6oxkdf.fsf@concordia.ellerman.id.au> From: Michael Bringmann Openpgp: preference=signencrypt Autocrypt: addr=mwb@linux.vnet.ibm.com; prefer-encrypt=mutual; keydata= xsBNBFcY7GcBCADzw3en+yzo9ASFGCfldVkIg95SAMPK0myXp2XJYET3zT45uBsX/uj9/2nA lBmXXeOSXnPfJ9V3vtiwcfATnWIsVt3tL6n1kqikzH9nXNxZT7MU/7gqzWZngMAWh/GJ9qyg DTOZdjsvdUNUWxtiLvBo7y+reA4HjlQhwhYxxvCpXBeRoF0qDWfQ8DkneemqINzDZPwSQ7zY t4F5iyN1I9GC5RNK8Y6jiKmm6bDkrrbtXPOtzXKs0J0FqWEIab/u3BDrRP3STDVPdXqViHua AjEzthQbGZm0VCxI4a7XjMi99g614/qDcXZCs00GLZ/VYIE8hB9C5Q+l66S60PLjRrxnABEB AAHNLU1pY2hhZWwgVy4gQnJpbmdtYW5uIDxtd2JAbGludXgudm5ldC5pYm0uY29tPsLAeAQT AQIAIgUCVxjsZwIbAwYLCQgHAwIGFQgCCQoLBBYCAwECHgECF4AACgkQSEdag3dpuTI0NAf8 CKYTDKQLgOSjVrU2L5rM4lXaJRmQV6oidD3vIhKSnWRvPq9C29ifRG6ri20prTHAlc0vycgm 41HHg0y2vsGgNXGTWC2ObemoZBI7mySXe/7Tq5mD/semGzOp0YWZ7teqrkiSR8Bw0p+LdE7K QmT7tpjjvuhrtQ3RRojUYcuy1nWUsc4D+2cxsnZslsx84FUKxPbLagDgZmgBhUw/sUi40s6S AkdViVCVS0WANddLIpG0cfdsV0kCae/XdjK3mRK6drFKv1z+QFjvOhc8QIkkxFD0da9w3tJj oqnqHFV5gLcHO6/wizPx/NV90y6RngeBORkQiRFWxTXS4Oj9GVI/Us7ATQRXGOxnAQgAmJ5Y ikTWrMWPfiveUacETyEhWVl7u8UhZcx3yy2te8O0ay7t9fYcZgIEfQPPVVus89acIXlG3wYL DDPvb21OprLxi+ZJ2a0S5we+LcSWN1jByxJlbWBq+/LcMtGAOhNLpysY1gD0Y4UW/eKS+TFZ 562qKC3k1dBvnV9JXCgeS1taYFxRdVAn+2DwK3nuyG/DDq/XgJ5BtmyC3MMx8CiW3POj+O+l 6SedIeAfZlZ7/xhijx82g93h07VavUQRwMZgZFsqmuxBxVGiav2HB+dNvs3PFB087Pvc9OHe qhajPWOP/gNLMmvBvknn1NToM9a8/E8rzcIZXoYs4RggRRYh6wARAQABwsBfBBgBAgAJBQJX GOxnAhsMAAoJEEhHWoN3abky+RUH/jE08/r5QzaNKYeVhu0uVbgXu5fsxqr2cAxhf+KuwT3T efhEP2alarxzUZdEh4MsG6c+X2NYLbD3cryiXxVx/7kSAJEFQJfA5P06g8NLR25Qpq9BLsN7 ++dxQ+CLKzSEb1X24hYAJZpOhS8ev3ii+M/XIo+olDBKuTaTgB6elrg3CaxUsVgLBJ+jbRkW yQe2S5f/Ja1ThDpSSLLWLiLK/z7+gaqwhnwjQ8Z8Y9D2itJQcj4itHilwImsqwLG7SxzC0NX IQ5KaAFYdRcOgwR8VhhkOIVd70ObSZU+E4pTET1WDz4o65xZ89yfose1No0+r5ht/xWOOrh8 53/hcWvxHVs= Organization: IBM Linux Technology Center Date: Tue, 11 Dec 2018 09:26:59 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <871s6oxkdf.fsf@concordia.ellerman.id.au> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 x-cbid: 18121115-0036-0000-0000-00000A68EA62 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00010213; HX=3.00000242; KW=3.00000007; PH=3.00000004; SC=3.00000270; SDB=6.01130256; UDB=6.00587299; IPR=6.00910389; MB=3.00024654; MTD=3.00000008; XFM=3.00000015; UTC=2018-12-11 15:27:03 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18121115-0037-0000-0000-000049F01902 Message-Id: X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2018-12-11_04:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1812110138 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12/11/2018 07:29 AM, Michael Ellerman wrote: > Hi Michael, > > Please Cc the device tree folks on device tree patches, and also the > original author of the patch that added the code you're modifying. > > So I've added: > robh+dt@kernel.org > frowand.list@gmail.com > devicetree@vger.kernel.org > linux-kernel@vger.kernel.org Thanks. > > Michael Bringmann writes: >> The PPC mobility code receives RTAS requests to delete nodes with >> platform-/hardware-specific attributes when restarting the kernel >> after a migration. My example is for migration between a P8 Alpine >> and a P8 Brazos. Nodes to be deleted include 'ibm,random-v1', >> 'ibm,platform-facilities', 'ibm,sym-encryption-v1', and, >> 'ibm,compression-v1'. >> >> The mobility.c code calls 'of_detach_node' for the nodes and their >> children. This makes calls to detach the properties and to remove >> the associated sysfs/kernfs files. >> >> Then new copies of the same nodes are next provided by the PHYP, >> local copies are built, and a pointer to the 'struct device_node' >> is passed to of_attach_node. Before the call to of_attach_node, >> the phandle is initialized to 0 when the data structure is alloced. >> During the call to of_attach_node, it calls __of_attach_node which >> pulls the actual name and phandle from just created sub-properties >> named something like 'name' and 'ibm,phandle'. >> >> This is all fine for the first migration. The problem occurs with >> the second and subsequent migrations when the PHYP on the new system >> wants to replace the same set of nodes again, referenced with the >> same names and phandle values. >> >> On the second and subsequent migrations, the PHYP tells the system >> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1', >> 'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these >> nodes by its known set of phandle values -- the same handles used >> by the PHYP on the source system are known on the target system. >> The mobility.c code calls of_find_node_by_phandle() with these values >> and ends up locating the first instance of each node that was added >> during the original boot, instead of the second instance of each node >> created after the first migration. The detach during the second >> migration fails with errors like, >> >> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0 >> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod >> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G W 4.18.0-rc1-wi107836-v05-120+ #201 >> [ 4565.030737] NIP: c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170 >> [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700 Tainted: G W (4.18.0-rc1-wi107836-v05-120+) >> [ 4565.030745] MSR: 800000010282b033 CR: 22288822 XER: 0000000a >> [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1 >> [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68 >> [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff >> [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843 >> [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000 >> [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff >> [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001 >> [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930 >> [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0 >> [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0 >> [ 4565.030811] Call Trace: >> [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable) >> [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150 >> [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80 >> [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0 >> [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0 >> [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130 >> [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60 >> [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0 >> [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240 >> [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220 >> [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240 >> [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100 >> [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70 >> [ 4565.030884] Instruction dump: >> [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 >> [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040 >> [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]--- >> >> The mobility.c code continues on during the second migration, accepts >> the definitions of the new nodes from the PHYP and ends up renaming >> the new properties e.g. >> >> [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1" >> >> There is no check like 'of_node_check_flag(np, OF_DETACHED)' within >> of_find_node_by_phandle to skip nodes that are detached, but still >> present due to caching or use count considerations. Also, note that >> of_find_node_by_phandle also uses a 'phandle_cache' which does not >> appear to be updated when of_detach_node() is invoked. > > This seems like the real bug. Since the phandle cache was added we can > now find detached nodes when we shouldn't be able to. > > Does the patch below work? > > cheers > > diff --git a/drivers/of/base.c b/drivers/of/base.c > index 09692c9b32a7..d8e4534c0686 100644 > --- a/drivers/of/base.c > +++ b/drivers/of/base.c > @@ -1190,6 +1190,10 @@ struct device_node *of_find_node_by_phandle(phandle handle) > if (phandle_cache[masked_handle] && > handle == phandle_cache[masked_handle]->phandle) > np = phandle_cache[masked_handle]; > + > + /* If we find a detached node, remove it */ > + if (of_node_check_flag(np, OF_DETACHED)) > + np = phandle_cache[masked_handle] = NULL; > } > > if (!np) { > > -- Michael W. Bringmann Linux Technology Center IBM Corporation Tie-Line 363-5196 External: (512) 286-5196 Cell: (512) 466-0650 mwb@linux.vnet.ibm.com