Subject: Re: [PATCH v14 00/13] s390/vfio-ap: dynamic configuration support
To: Halil Pasic
Cc: linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, jjherne@linux.ibm.com, freude@linux.ibm.com, borntraeger@de.ibm.com, cohuck@redhat.com, mjrosato@linux.ibm.com, alex.williamson@redhat.com, kwankhede@nvidia.com, fiuczy@linux.ibm.com, frankja@linux.ibm.com, david@redhat.com, hca@linux.ibm.com, gor@linux.ibm.com
References: <20210331152256.28129-1-akrowiak@linux.ibm.com> <20210401211742.6afd6b14.pasic@linux.ibm.com>
From: Tony Krowiak
Date: Tue, 6 Apr 2021 10:36:32 -0400
In-Reply-To: <20210401211742.6afd6b14.pasic@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Now that I have finally figured out the cause, it is interesting to note
that this failure only occurred when the kernel was built with the
debug_defconfig configuration.

The problem occurs when the vfio_ap_mdev_remove_queue() callback is called
after the mdev has been removed via the vfio_ap_mdev_remove() callback.
The failure occurs because the vfio_ap_queue object representing the queue
device being removed still holds a link to the mdev to which the queue was
assigned. The fix is to remove the link to the mdev from all vfio_ap_queue
objects when the mdev is removed.

I will provide a new set of patches with the fix included.

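
To illustrate the ordering problem and the fix described above, here is a
minimal user-space sketch. It rests on assumptions: struct mdev_model,
struct queue_model, and every function below are hypothetical stand-ins
made up for illustration. This is not the vfio_ap driver code and not the
content of the upcoming patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_QUEUES 4

struct mdev_model;

/* Hypothetical stand-in for a vfio_ap_queue object. */
struct queue_model {
	int apqn;
	struct mdev_model *mdev;	/* back-link that must not outlive the mdev */
};

/* Hypothetical stand-in for the per-mdev state. */
struct mdev_model {
	struct queue_model *queues[MAX_QUEUES];
	size_t nr_queues;
};

/* Assign a queue to the mdev and record the back-link. */
static int assign_queue(struct mdev_model *m, struct queue_model *q)
{
	if (m->nr_queues == MAX_QUEUES)
		return -1;
	m->queues[m->nr_queues++] = q;
	q->mdev = m;
	return 0;
}

/*
 * Models mdev removal. The loop is the fix: every queue drops its link
 * before the mdev memory is released, so no dangling pointer remains.
 */
static void mdev_remove(struct mdev_model *m)
{
	for (size_t i = 0; i < m->nr_queues; i++)
		m->queues[i]->mdev = NULL;
	free(m);
}

/*
 * Models queue removal. Returns 1 if the queue had to be detached from
 * a live mdev, 0 if there was no mdev left to touch.
 */
static int queue_remove(struct queue_model *q)
{
	struct mdev_model *m = q->mdev;

	if (!m)
		return 0;
	for (size_t i = 0; i < m->nr_queues; i++) {
		if (m->queues[i] == q) {
			m->queues[i] = m->queues[--m->nr_queues];
			break;
		}
	}
	q->mdev = NULL;
	return 1;
}

/* Replays the failing order: mdev removed first, queues afterwards. */
static int demo_remove_after_mdev(void)
{
	struct queue_model q0 = { .apqn = 1, .mdev = NULL };
	struct queue_model q1 = { .apqn = 2, .mdev = NULL };
	struct mdev_model *m = calloc(1, sizeof(*m));

	if (!m)
		return -1;
	assign_queue(m, &q0);
	assign_queue(m, &q1);
	mdev_remove(m);
	assert(q0.mdev == NULL && q1.mdev == NULL);
	return queue_remove(&q0) + queue_remove(&q1);
}
```

With the unlinking loop in mdev_remove(), demo_remove_after_mdev() returns
0 because neither queue touches the released mdev. Without that loop,
queue_remove() would follow q->mdev into freed memory, which is the kind of
access a debug build is more likely to turn into a visible fault.
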
On 4/1/21 3:17 PM, Halil Pasic wrote:
> On Wed, 31 Mar 2021 11:22:43 -0400
> Tony Krowiak wrote:
>
>> Change log v13-v14:
>> ------------------
> When testing I've experienced this kernel panic.
>
>
> [ 4422.479706] vfio_ap matrix: MDEV: Registered
> [ 4422.516999] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: Adding to iommu group 1
> [ 4422.517037] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: MDEV: group_id = 1
> [ 4577.906708] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: Removing from iommu group 1
> [ 4577.906917] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: MDEV: detaching iommu
> [ 4577.908093] Unable to handle kernel pointer dereference in virtual kernel address space
> [ 4577.908097] Failing address: 00000006ec02f000 TEID: 00000006ec02f403
> [ 4577.908100] Fault in home space mode while using kernel ASCE.
> [ 4577.908106] AS:000000035eb4c007 R3:0000000000000024
> [ 4577.908126] Oops: 003b ilc:3 [#1] PREEMPT SMP
> [ 4577.908132] Modules linked in: vfio_ap vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb kvm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc s390_trng eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel configfs ip_tables x_tables dm_service_time ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common nvme nvme_core zfcp scsi_transport_fc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod rng_core autofs4
> [ 4577.908181] CPU: 0 PID: 14315 Comm: nose2 Not tainted 5.12.0-rc5-00030-g4cd110385fa2 #55
> [ 4577.908183] Hardware name: IBM 8561 T01 701 (LPAR)
> [ 4577.908185] Krnl PSW : 0404e00180000000 000000035d2a50f4 (__lock_acquire+0xdc/0x7c8)
> [ 4577.908194]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> [ 4577.908232] Krnl GPRS: 000000039d168d46 00000006ec02f538 000000035e7de940 0000000000000000
> [ 4577.908235]            0000000000000000 0000000000000000 0000000000000001 00000000f9e04150
> [ 4577.908237]            000000035fa8b100 006b6b6b680c417f 00000000f9e04150 000000035e61e8d0
> [ 4577.908239]            000000035fa8b100 0000000000000000 0000038010c4b7d8 0000038010c4b738
> [ 4577.908247] Krnl Code: 000000035d2a50e4: eb110003000d sllg %r1,%r1,3
> [ 4577.908247]            000000035d2a50ea: b9080012 agr %r1,%r2
> [ 4577.908247]           #000000035d2a50ee: e31003b80008 ag %r1,952
> [ 4577.908247]           >000000035d2a50f4: eb011000007a agsi 0(%r1),1
> [ 4577.908247]            000000035d2a50fa: a718ffff lhi %r1,-1
> [ 4577.908247]            000000035d2a50fe: eb1103a800f8 laa %r1,%r1,936
> [ 4577.908247]            000000035d2a5104: ec18026b017e cij %r1,1,8,000000035d2a55da
> [ 4577.908247]            000000035d2a510a: c4180086d01f lgrl %r1,000000035e37f148
> [ 4577.908262] Call Trace:
> [ 4577.908264] [<000000035d2a50f4>] __lock_acquire+0xdc/0x7c8
> [ 4577.908267] [<000000035d2a41ac>] lock_acquire.part.0+0xec/0x1e8
> [ 4577.908270] [<000000035d2a4360>] lock_acquire+0xb8/0x208
> [ 4577.908272] [<000000035de6fa2a>] _raw_spin_lock_irqsave+0x6a/0xd8
> [ 4577.908279] [<000000035d2874fe>] prepare_to_wait_event+0x2e/0x1e0
> [ 4577.908281] [<000003ff805d539a>] vfio_ap_mdev_remove_queue+0x122/0x148 [vfio_ap]
> [ 4577.908287] [<000000035de20e94>] ap_device_remove+0x4c/0xf0
> [ 4577.908292] [<000000035db268a2>] __device_release_driver+0x18a/0x230
> [ 4577.908298] [<000000035db27cf0>] device_driver_detach+0x58/0xd0
> [ 4577.908301] [<000000035db25000>] device_reprobe+0x30/0xc0
> [ 4577.908304] [<000000035de22570>] __ap_revise_reserved+0x110/0x148
> [ 4577.908307] [<000000035db2408c>] bus_for_each_dev+0x7c/0xb8
> [ 4577.908310] [<000000035de2290c>] apmask_store+0xd4/0x118
> [ 4577.908313] [<000000035d639316>] kernfs_fop_write_iter+0x13e/0x1e0
> [ 4577.908317] [<000000035d542d22>] new_sync_write+0x10a/0x198
> [ 4577.908321] [<000000035d5433ee>] vfs_write.part.0+0x196/0x290
> [ 4577.908323] [<000000035d545f44>] ksys_write+0x6c/0xf8
> [ 4577.908326] [<000000035d1ce7ae>] do_syscall+0x7e/0xd0
> [ 4577.908330] [<000000035de5fc00>] __do_syscall+0xc0/0xd8
> [ 4577.908334] [<000000035de70c22>] system_call+0x72/0x98
> [ 4577.908337] INFO: lockdep is turned off.
> [ 4577.908338] Last Breaking-Event-Address:
> [ 4577.908340] [<0000038010c4b648>] 0x38010c4b648
> [ 4577.908345] Kernel panic - not syncing: Fatal exception: panic_on_oops