From: Jan Kiszka
Subject: Re: [PATCH] kvm: deassign irqs in reset path
Date: Fri, 30 Mar 2012 22:18:31 +0200
Message-ID: <4F761517.6010105@web.de>
References: <201203301918.q2UJI63c005908@int-mx02.intmail.prod.int.phx2.redhat.com> <4F760993.304@web.de> <20120330201313.GB2376@redhat.com>
In-Reply-To: <20120330201313.GB2376@redhat.com>
To: Jason Baron
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, mst@redhat.com, alex.williamson@redhat.com

On 2012-03-30 22:13, Jason Baron wrote:
> On Fri, Mar 30, 2012 at 09:29:23PM +0200, Jan Kiszka wrote:
>> On 2012-03-30 21:18, Jason Baron wrote:
>>> We've hit a kernel host panic, when issuing a 'system_reset' with a 82576 nic
>>> assigned and a Windows guest. Host system is a PowerEdge R815.
>>>
>>> [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 32993
>>> [Hardware Error]: APEI generic hardware error status
>>> [Hardware Error]: severity: 1, fatal
>>> [Hardware Error]: section: 0, severity: 1, fatal
>>> [Hardware Error]: flags: 0x01
>>> [Hardware Error]: primary
>>> [Hardware Error]: section_type: PCIe error
>>> [Hardware Error]: port_type: 0, PCIe end point
>>> [Hardware Error]: version: 1.0
>>> [Hardware Error]: command: 0x0000, status: 0x0010
>>> [Hardware Error]: device_id: 0000:08:00.0
>>> [Hardware Error]: slot: 1
>>> [Hardware Error]: secondary_bus: 0x00
>>> [Hardware Error]: vendor_id: 0x8086, device_id: 0x10c9
>>> [Hardware Error]: class_code: 000002
>>> [Hardware Error]: aer_status: 0x00100000, aer_mask: 0x00018000
>>> [Hardware Error]: Unsupported Request
>>> [Hardware Error]: aer_layer=Transaction Layer, aer_agent=Requester ID
>>> [Hardware Error]: aer_uncor_severity: 0x00067011
>>> [Hardware Error]: aer_tlp_header: 40001001 0020000f edbf800c 01000000
>>> [Hardware Error]: section: 1, severity: 1, fatal
>>> [Hardware Error]: flags: 0x01
>>> [Hardware Error]: primary
>>> [Hardware Error]: section_type: PCIe error
>>> [Hardware Error]: port_type: 0, PCIe end point
>>> [Hardware Error]: version: 1.0
>>> [Hardware Error]: command: 0x0000, status: 0x0010
>>> [Hardware Error]: device_id: 0000:08:00.0
>>> [Hardware Error]: slot: 1
>>> [Hardware Error]: secondary_bus: 0x00
>>> [Hardware Error]: vendor_id: 0x8086, device_id: 0x10c9
>>> [Hardware Error]: class_code: 000002
>>> [Hardware Error]: aer_status: 0x00100000, aer_mask: 0x00018000
>>> [Hardware Error]: Unsupported Request
>>> [Hardware Error]: aer_layer=Transaction Layer, aer_agent=Requester ID
>>> [Hardware Error]: aer_uncor_severity: 0x00067011
>>> [Hardware Error]: aer_tlp_header: 40001001 0020000f edbf800c 01000000
>>> Kernel panic - not syncing: Fatal hardware error!
>>> Pid: 0, comm: swapper Not tainted 2.6.32-242.el6.x86_64 #1
>>> Call Trace:
>>> [] ? panic+0xa0/0x168
>>> [] ? ghes_notify_nmi+0x17c/0x180
>>> [] ? notifier_call_chain+0x55/0x80
>>> [] ? atomic_notifier_call_chain+0x1a/0x20
>>> [] ? notify_die+0x2e/0x30
>>> [] ? do_nmi+0x1a1/0x2b0
>>> [] ? nmi+0x20/0x30
>>> [] ? native_safe_halt+0xb/0x10
>>> <> [] ? default_idle+0x4d/0xb0
>>> [] ? cpu_idle+0xb6/0x110
>>> [] ? rest_init+0x7a/0x80
>>> [] ? start_kernel+0x424/0x430
>>> [] ? x86_64_start_reservations+0x125/0x129
>>> [] ? x86_64_start_kernel+0xfa/0x109
>>>
>>> The root cause of the problem is that the 'reset_assigned_device()' code
>>> first writes a 0 to the command register. Then, when qemu subsequently does
>>> a kvm_deassign_irq() (called by assign_irq(), in the system_reset path),
>>> the kernel ends up calling '__msix_mask_irq()', which performs a write to
>>> the memory mapped msi vector space. Since we've explicitly told the device
>>> to disallow mmio access (via the 0 write to the command register), we end
>>> up with the above 'Unsupported Request'.
>>>
>>> The fix here is to first call kvm_deassign_irq(), before doing the reset,
>>
>> s/fix/workaround/. This is a kernel bug if userspace can crash the
>> system like this, no? Let's fix the kernel first and then look at what
>> needs to be changed here.
>>
>> Jan
>>
>
> But don't I need special privilege to run the device assignment bits?

Yes, but even that might be moderated by a management component like
libvirt.

> For example, this crash is precipitated by a write of '0' to the pci
> device config register from userspace. Surely, not everyone is allowed to
> do that write. So it seems to me, that this patch is in keeping with the
> current model of how things work.

No user should needlessly be able to crash the host by issuing valid
commands in a special order.

Jan
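
The ordering problem described in the quoted root-cause paragraph can be illustrated with a small toy model. This is only a sketch under stated assumptions, not QEMU or kernel code: `struct toy_dev` and every `toy_*` function are hypothetical stand-ins invented for illustration. The two constants are real PCI values (memory space enable in the command register, and the Unsupported Request bit in the AER uncorrectable status register), and they match the `command: 0x0000` and `aer_status: 0x00100000` fields in the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real PCI constants (same values as in Linux's pci_regs.h). */
#define PCI_COMMAND_MEMORY 0x0002u      /* command register: memory space enable */
#define PCI_ERR_UNC_UNSUP  0x00100000u  /* AER uncorrectable status: Unsupported Request */

/* Hypothetical toy model of an assigned device; not a QEMU or kernel type. */
struct toy_dev {
    uint16_t command;       /* PCI command register */
    uint32_t aer_status;    /* AER uncorrectable error status */
    bool     msix_masked;   /* state of the single modelled MSI-X vector */
    bool     irq_assigned;  /* whether an IRQ is currently assigned */
};

/* Models the MMIO write that masks an MSI-X vector. With memory decoding
 * disabled the device rejects the write and records Unsupported Request. */
static bool toy_msix_mask(struct toy_dev *d)
{
    if (!(d->command & PCI_COMMAND_MEMORY)) {
        d->aer_status |= PCI_ERR_UNC_UNSUP;
        return false;
    }
    d->msix_masked = true;
    return true;
}

/* Stand-in for the deassign step: tearing down the IRQ masks the vector,
 * which needs MMIO access to the device. */
static bool toy_deassign_irq(struct toy_dev *d)
{
    if (!d->irq_assigned)
        return true;
    d->irq_assigned = false;
    return toy_msix_mask(d);
}

/* Stand-in for the reset step: writes 0 to the command register. */
static void toy_reset(struct toy_dev *d)
{
    d->command = 0;
}

/* Order described as the root cause: reset first, deassign afterwards.
 * Returns the resulting AER status. */
static uint32_t toy_reset_then_deassign(void)
{
    struct toy_dev d = { .command = PCI_COMMAND_MEMORY, .irq_assigned = true };
    toy_reset(&d);
    toy_deassign_irq(&d);
    return d.aer_status;
}

/* Order proposed by the patch: deassign while MMIO is still enabled,
 * then reset. Returns the resulting AER status. */
static uint32_t toy_deassign_then_reset(void)
{
    struct toy_dev d = { .command = PCI_COMMAND_MEMORY, .irq_assigned = true };
    bool ok = toy_deassign_irq(&d);
    assert(ok);  /* the mask write succeeds because memory decoding is on */
    (void)ok;
    toy_reset(&d);
    return d.aer_status;
}
```

In this model, `toy_reset_then_deassign()` leaves the Unsupported Request bit set, while `toy_deassign_then_reset()` leaves the status clear. The model only captures the userspace ordering; it says nothing about whether the kernel should also guard the mask write itself, which is the question raised in the reply.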