From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [linux-next][DLPAR CPU][Oops] Kernel crash with CPU hotunplug
From: Abdul Haleem
To: linuxppc-dev
Cc: linux-next, linux-kernel, Michael Ellerman, Rob Herring, Tyrel Datwyler, sachinp
Date: Thu, 05 Oct 2017 12:03:05 +0530
Message-Id: <1507185185.3792.12.camel@abdul.in.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

The linux-next kernel panics while running DLPAR CPU add/remove operations in a loop.

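For reference, the add/remove loop can be sketched roughly as below. This is only an illustration, not the actual autotest job: the drmgr options are the ones shown in the test logs of this report, while the function names, the iteration count and the injectable runner are assumptions made for the example.

```python
import subprocess

# Options copied from the test logs in this report; everything else is illustrative.
DRMGR_BASE = ["drmgr", "-c", "cpu", "-d", "5", "-w", "30"]


def drmgr_command(action):
    """Build the drmgr command line for 'remove' (-r) or 'add' (-a)."""
    flag = {"remove": "-r", "add": "-a"}[action]
    return DRMGR_BASE + [flag]


def dlpar_cpu_loop(iterations, run=subprocess.call):
    """Alternate CPU remove/add, stopping at the first non-zero exit status.

    `run` is injectable so the loop can be exercised without drmgr installed.
    Returns the number of commands that completed successfully.
    """
    completed = 0
    for _ in range(iterations):
        for action in ("remove", "add"):
            if run(drmgr_command(action)) != 0:
                return completed
            completed += 1
    return completed


if __name__ == "__main__":
    # Needs root on a PowerVM LPAR where drmgr is available; 10 is arbitrary.
    print(dlpar_cpu_loop(10))
```

Passing a stub as `run` lets the command sequence be checked on any machine before trying it on the LPAR.
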
Test: CPU hot-unplug
Machine Type: Power8 PowerVM LPAR
kernel: 4.14.0-rc2-next-20170928
gcc : 5.2.1

trace logs
----------
cpu 10 (hwid 10) Ready to die...
cpu 11 (hwid 11) Ready to die...
cpu 12 (hwid 12) Ready to die...
cpu 13 (hwid 13) Ready to die...
cpu 14 (hwid 14) Ready to die...
cpu 15 (hwid 15) Ready to die...
Unable to handle kernel paging request for data at address 0xdead4ead00000030
Faulting instruction address: 0xc000000001af38e4
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp bridge stp llc xt_tcpudp ipt_REJECT
nf_reject_ipv4 xt_conntrack nfnetlink iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter
vmx_crypto pseries_rng rng_core binfmt_misc nfsd ip_tables x_tables autofs4
CPU: 7 PID: 10657 Comm: systemd-udevd Not tainted 4.14.0-rc2-next-20170928-autotest #1
task: c000000271b7cc00 task.stack: c00000026d504000
NIP:  c000000001af38e4 LR: c000000001af3b48 CTR: c000000001af4270
REGS: c00000026d5079e0 TRAP: 0380   Not tainted  (4.14.0-rc2-next-20170928-autotest)
MSR:  8000000000009033   CR: 22008882  XER: 20000000
CFAR: c000000001af3b44 SOFTE: 1
GPR00: c000000001af3b48 c00000026d507c60 c000000003572500 c00000026c0d4a80
GPR04: c00000026c0d4a80 c00000026b56b310 c0000000037d2500 dead4ead00000030
GPR08: 00000000000016f0 fffffffffffffff0 dead4ead00000000 c000000270b24420
GPR12: c000000001af4270 c00000000fdc1f80 00000000000029a3 000000000aba9500
GPR16: 000001000e4134f0 000000000aba9500 000000000000000f 0000000000000001
GPR20: 0000000120ff68d8 0000000120ff68d0 0000000120ff6a48 0000000120ff33f0
GPR24: 0000000120ff6550 c00000026b56b310 c00000027286d9b8 c0000000037d4d88
GPR28: c0000002727b17a0 c00000026c0d4a80 c00000027286da38 c00000026c0d4a80
NIP [c000000001af38e4] free_pipe_info+0x64/0x200
LR [c000000001af3b48] put_pipe_info+0xc8/0x140
Call Trace:
[c00000026d507c60] [c00000027286da38] 0xc00000027286da38 (unreliable)
[c00000026d507ca0] [c000000001af3b48] put_pipe_info+0xc8/0x140
[c00000026d507ce0] [c000000001af43fc] pipe_release+0x18c/0x1e0
[c00000026d507d20] [c000000001ae0efc] __fput+0x12c/0x4f0
[c00000026d507d80] [c000000001ae12ec] ____fput+0x2c/0x50
[c00000026d507da0] [c00000000178eb3c] task_work_run+0x17c/0x200
[c00000026d507e00] [c00000000160adb8] do_notify_resume+0x1f8/0x220
[c00000026d507e30] [c0000000015ebec4] ret_from_except_lite+0x70/0x74
Instruction dump:
81230070 e94300b0 39080001 7d2900d0 38ea0030 f9066d98 7c0004ac 3d020026
e9086da0 3cc20026 39080001 f9066da0 <7d0038a8> 7d094214 7d0039ad 40c2fff4
---[ end trace 4dcb6f2341ddb370 ]---

Kernel panic - not syncing: Fatal exception
Rebooting in 10 seconds..

Test logs:
----------
DLPAR remove cpu operation
Running 'drmgr -c cpu -d 5 -w 30 -r'
########## Oct 04 03:09:22 2017 ##########
drmgr: -c cpu -d 5 -w 30 -r
Validating CPU DLPAR capability...yes.
Expecting 20 threads...found 16.
Found cpu PowerPC,POWER8@8
Found cpu PowerPC,POWER8@0
Start CPU List.
10000008 : CPU 9
    thread: 8: /sys/devices/system/cpu/cpu8
    thread: 9: /sys/devices/system/cpu/cpu9
    thread: 10: /sys/devices/system/cpu/cpu10
    thread: 11: /sys/devices/system/cpu/cpu11
    thread: 12: /sys/devices/system/cpu/cpu12
    thread: 13: /sys/devices/system/cpu/cpu13
    thread: 14: /sys/devices/system/cpu/cpu14
    thread: 15: /sys/devices/system/cpu/cpu15
10000000 : CPU 1
    thread: 0: /sys/devices/system/cpu/cpu0
    thread: 1: /sys/devices/system/cpu/cpu1
    thread: 2: /sys/devices/system/cpu/cpu2
    thread: 3: /sys/devices/system/cpu/cpu3
    thread: 4: /sys/devices/system/cpu/cpu4
    thread: 5: /sys/devices/system/cpu/cpu5
    thread: 6: /sys/devices/system/cpu/cpu6
    thread: 7: /sys/devices/system/cpu/cpu7
Done.
Number of CPUs = 2
Releasing cpu "/cpus/PowerPC,POWER8@8"
Removed 1 of 1 requested cpu(s)
########## Oct 04 03:09:24 2017 ##########
Command 'drmgr -c cpu -d 5 -w 30 -r' finished with 0 after 2.20577907562s
[stdout] CPU 9

DLPAR add cpu operation
Running 'drmgr -c cpu -d 5 -w 30 -a'
########## Oct 04 03:09:24 2017 ##########
drmgr: -c cpu -d 5 -w 30 -a
Validating CPU DLPAR capability...yes.
Expecting 20 threads...found 16.
Found cpu PowerPC,POWER8@0
Start CPU List.
10000008 : CPU 9
10000000 : CPU 1
    thread: 0: /sys/devices/system/cpu/cpu0
    thread: 1: /sys/devices/system/cpu/cpu1
    thread: 2: /sys/devices/system/cpu/cpu2
    thread: 3: /sys/devices/system/cpu/cpu3
    thread: 4: /sys/devices/system/cpu/cpu4
    thread: 5: /sys/devices/system/cpu/cpu5
    thread: 6: /sys/devices/system/cpu/cpu6
    thread: 7: /sys/devices/system/cpu/cpu7
Done.
Probing cpu 0x10000008

The kernel panics after the above operation.

-- 
Regards,
Abdul Haleem
IBM Linux Technology Centre
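
A note on reading the oops above: the faulting data address equals GPR10 plus 0x30, and the upper 32 bits of GPR10 are 0xdead4ead, which is the value the kernel defines as SPINLOCK_MAGIC for spinlock debugging. The sketch below only reproduces that arithmetic. The register values are copied from the trace, the helper name is made up for illustration, and reading the pattern as a pointer that was loaded from or overwritten by a lock's magic field is a guess, not a confirmed diagnosis.

```python
# Values copied from the oops in this report.
FAULT_ADDR = 0xdead4ead00000030
GPR07 = 0xdead4ead00000030
GPR10 = 0xdead4ead00000000

# SPINLOCK_MAGIC as defined in the kernel's spinlock type headers.
SPINLOCK_MAGIC = 0xdead4ead


def split_fault(addr, base):
    """Return (upper 32 bits of base, byte offset of addr from base)."""
    return base >> 32, addr - base


if __name__ == "__main__":
    upper, offset = split_fault(FAULT_ADDR, GPR10)
    # Show the magic-looking upper half, the offset, and whether it matches.
    print(hex(upper), hex(offset), upper == SPINLOCK_MAGIC)
```

If the guess holds, the structure field that free_pipe_info() dereferences at this point would be the place to look for stale or corrupted data.
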