From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Wei, Gang"
Subject: RE: Xen 4.1 rc1 test report
Date: Tue, 25 Jan 2011 22:05:21 +0800
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "Zheng, Shaohui", "xen-devel@lists.xensource.com"
Cc: "Wei, Gang"
List-Id: xen-devel@lists.xenproject.org

Zheng, Shaohui wrote on 2011-01-23:
> 2. [VT-d]xen panic on function do_IRQ after many times NIC pass-throu (Intel)
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1706

I may need some help with this bug. Below are my findings.

According to the call trace, the fault occurs at the last line of the code segment below.
--------------------
__do_IRQ_guest(...)
    for ( i = 0; i < action->nr_guests; i++ )
        d = action->guest[i];
        pirq = domain_irq_to_pirq(d, irq);
===========
The fatal page fault happens on the access to ((d)->arch.irq_pirq[irq]), because (d)->arch.irq_pirq is already NULL.

Further experiments show that during the second-to-last 'xl create', pciback could not locate the device to be assigned:
---------------------
[ 4802.773665] pciback pci-26-0: 22 Couldn't locate PCI device (0000:05:00.0)!perhaps already in-use?
============

And during the following 'xl destroy', the device model did not respond:
---------------------
libxl: error: libxl_device.c:477:libxl__wait_for_device_model Device Model not ready
libxl: error: libxl_pci.c:866:do_pci_remove Device Model didn't respond in time
============

In the 'xl debug i' output taken immediately afterwards, we can see that the guest pirqs of the assigned device were not unbound from the host irq desc:
---------------------
(XEN) IRQ: 16 affinity:00000000,00000000,00000000,00000001 vec:a8 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 16(-S--),1: 16(----),
(XEN) IRQ: 31 affinity:00000000,00000000,00000000,00000004 vec:ba type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 55(----),
============

The guest domain entry that was never unbound (the domain itself was already destroyed by 'xl destroy') then causes the NULL pointer access when a spurious interrupt arrives for that device.

There are three things we may need to do:
1. Find the root cause of why pciback could not locate the device.
   I suspect the previous 'xl destroy' did not return the device to pcistub successfully.
2. Find the root cause of why the guest pirq was not forcibly unbound.
   What I have found so far:
   Sometimes the check if ( !IS_PRIV_FOR(current->domain, d) ) is hit, so the call returns -EINVAL;
   sometimes the check if ( !(desc->status & IRQ_GUEST) ) is hit, so the pirq is not unbound.
3. Think about how to prevent such cases from panicking Xen.

Any ideas, hints, comments, suggestions or even fixes?

Jimmy
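
P.S. To make point 3 a bit more concrete, below is a minimal, self-contained sketch of a defensive lookup. It is not Xen code: struct domain and struct arch_domain here are simplified stand-ins, and NR_IRQS_SKETCH, safe_irq_to_pirq() and deliver_to_guests() are hypothetical names used only for illustration. It also assumes that a non-positive pirq means "no usable mapping". The idea is simply to check the per-domain irq_pirq table before indexing it, and to skip a stale guest entry instead of faulting. This would only hide the symptom; the missing unbind from point 2 still needs a real fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table size for this sketch only. */
#define NR_IRQS_SKETCH 64

/* Simplified stand-ins for the real structures (assumption). */
struct arch_domain {
    int *irq_pirq;              /* NULL once the domain has been torn down */
};

struct domain {
    int domain_id;
    struct arch_domain arch;
};

/*
 * Guarded variant of the irq -> pirq lookup.
 * Returns -1 instead of dereferencing a NULL or out-of-range table entry.
 */
static int safe_irq_to_pirq(const struct domain *d, int irq)
{
    if ( d == NULL || d->arch.irq_pirq == NULL )
        return -1;
    if ( irq < 0 || irq >= NR_IRQS_SKETCH )
        return -1;
    return d->arch.irq_pirq[irq];
}

/*
 * Walk the guest list the way the loop in __do_IRQ_guest() does, but skip
 * guests whose mapping is gone. Returns how many guests had a usable pirq.
 */
static int deliver_to_guests(struct domain **guests, int nr_guests, int irq)
{
    int i, delivered = 0;

    assert(guests != NULL);

    for ( i = 0; i < nr_guests; i++ )
    {
        int pirq = safe_irq_to_pirq(guests[i], irq);

        if ( pirq <= 0 )        /* stale or unmapped entry: do not touch it */
            continue;

        delivered++;            /* a real handler would inject pirq here */
    }

    return delivered;
}
```

With one live domain and one already-destroyed domain still on the list, the loop would deliver to the live one only and ignore the stale entry.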