* Xen 4.1 rc1 test report
@ 2011-01-22 16:37 Zheng, Shaohui
  2011-01-22 16:54 ` Mike Viau
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Zheng, Shaohui @ 2011-01-22 16:37 UTC
  To: xen-devel

Hi all,
	Intel QA conducted a full validation of Xen 4.1 rc1, covering VT-x, VT-d, SR-IOV, RAS, TXT and xl tools testing. 24 issues were exposed; please refer to the bug list below.

	We have already assigned 14 bugs to Intel developers (marked with an 'Intel' tag in the bug title). Most of the remaining 10 bugs are related to the xl command, and we need the community's help to fix them.

Version information:
Change Set: 22764:75b6287626ee
pvops dom0: 862ef97190f6b54d35c76c93fb2b8fadd7ab7d68
ioemu : 1c304816043c0ffe14d20d6006d6165cb7fddb9b

Bug list:
VT-d (7 bugs)
1. Ubuntu PAE SMP guest has network problems with a NIC assigned (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1709
2. [VT-d] Xen panics in function do_IRQ after passing through a NIC many times (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1706
3. [VT-d] Running a guest with a NIC assigned sometimes hangs the system under PAE on Sandy Bridge (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1725
4. [VT-d] dom0 igb driver is too old to support the 4-port NIC (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1708
5. [VT-d] Xen sometimes panics when running a guest with a NIC assigned (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=24
6. [VT-d] xl command does not respond after passing through an IGD card (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1723
7. [VT-d] Guest fails to get an IP address after hotplugging a VF 300 times (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1722

RAS (1 bug)
1. System hangs when running CPU offline (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1654

ACPI (1 bug)
1. System cannot resume after suspend (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1707

Save/Restore (1 bug)
1. RHEL6 guest fails to save/restore (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1716

xl command (7 bugs)
1. xl does not check for duplicated config files and image files (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1711
2. [VT-d] Cannot detach a device that was assigned statically (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1717
3. Guest shows a white screen when booting with a NIC assigned (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1712
4. Memory corruption reported by "xl" with device pass-through (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1713
5. [VT-d] Fails to pass through two or more devices to a guest (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1710
6. Guest network broken after save/restore with xl (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1703
7. Too many error messages shown when destroying a nonexistent guest (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1714

Hypervisor (4 bugs)
1. Only two 1GB pages are allocated to a guest with 10GB of memory (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1721
2. Guest with a vnif assigned fails to boot when APIC is disabled (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1692
3. Dom0 crashes on Core2 when dom0_mem is no more than 1972MB (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1726
4. Guest does not disappear after powering it off (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1720

Performance (1 bug)
1. Guest boots very slowly on EX unless the number of dom0 CPUs is limited (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1719

X2APIC (1 bug)
1. Sandy Bridge fails to boot under PAE with x2APIC enabled (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1718

Windows (1 bug)
1. All Windows UP guests fail to boot (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1704

Thanks & Regards,
Shaohui


* RE: Xen 4.1 rc1 test report
  2011-01-22 16:37 Xen 4.1 rc1 test report Zheng, Shaohui
@ 2011-01-22 16:54 ` Mike Viau
  2011-01-23  2:34   ` Zheng, Shaohui
  2011-01-24 19:00 ` Stefano Stabellini
  2011-01-25 14:05 ` Wei, Gang
  2 siblings, 1 reply; 28+ messages in thread
From: Mike Viau @ 2011-01-22 16:54 UTC
  To: xen-devel




> On Sun, 23 Jan 2011 00:37:21 +0800 <shaohui.zheng@intel.com> wrote:
> 
> [...]
> 
> Windows (1 bug)
> 1. All windows UP guest boot fail
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1704	(Intel)
> 

Was your BSOD like the one in the attached screenshot?

This is what I got when I attempted to boot from a Windows 7 x86 ISO.

HAL_INITIALIZATION_FAILED (bug check 0x5C): http://msdn.microsoft.com/en-us/library/ff559069%28v=vs.85%29.aspx


[-- Attachment #2: Screenshot-Untitled Window.png --]
[-- Type: image/png, Size: 19117 bytes --]

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* RE: Xen 4.1 rc1 test report
  2011-01-22 16:54 ` Mike Viau
@ 2011-01-23  2:34   ` Zheng, Shaohui
  0 siblings, 0 replies; 28+ messages in thread
From: Zheng, Shaohui @ 2011-01-23  2:34 UTC
  To: Mike Viau, xen-devel




> What your BSOD like the one in the screen shot attached?
> 
> This is what I got when I attempted to boot from a Window 7 x86 ISO.

Yes, the error code is 0x5C. An Intel developer already has a patch to fix it.




* Re: Xen 4.1 rc1 test report
  2011-01-22 16:37 Xen 4.1 rc1 test report Zheng, Shaohui
  2011-01-22 16:54 ` Mike Viau
@ 2011-01-24 19:00 ` Stefano Stabellini
  2011-01-25  3:42   ` Zhang, Yang Z
                     ` (2 more replies)
  2011-01-25 14:05 ` Wei, Gang
  2 siblings, 3 replies; 28+ messages in thread
From: Stefano Stabellini @ 2011-01-24 19:00 UTC
  To: Zheng, Shaohui; +Cc: xen-devel

Looking more closely at some of the bugs...

On Sat, 22 Jan 2011, Zheng, Shaohui wrote:
> Save/Restore(1 bug)
> 1. RHEL6 guest fail to do save/restore (Community)
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1716

Upstream PV on HVM save/restore should be fixed; I am afraid this could
be a bug left in the RHEL6 PV on HVM kernel.


> xl command(7 bugs)
> 2. [vt-d] Can not detach the device which was assigned statically (Community)
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1717

This should be easy to fix, probably a parsing error somewhere in xl.


> 3. guest shows white screen when boot guest with NIC assigned (Community)
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1712
> 4. memory corruption was reported by "xl" with device pass-throu (Community)
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1713

I haven't seen either of these bugs recently. Are you still able to
reproduce them, or are they just old bugs left open in bugzilla?


> 5. [vt-d] fail to passthrou two or more devices to guest (Community)
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1710

Assigning multiple devices to an HVM guest using qemu should work;
however, assigning multiple devices to a PV guest or to an HVM
guest using stubdoms is known NOT to work. In fact, this is what
IanC is working on.

Is this bug being reproduced using stubdoms? If so, this is a known
issue, otherwise it might be a new bug.


* RE: Xen 4.1 rc1 test report
  2011-01-24 19:00 ` Stefano Stabellini
@ 2011-01-25  3:42   ` Zhang, Yang Z
  2011-01-25 11:23     ` Stefano Stabellini
  2011-01-25 16:01     ` Stefano Stabellini
  2011-01-25 10:10   ` Pasi Kärkkäinen
  2011-01-25 15:49   ` Stefano Stabellini
  2 siblings, 2 replies; 28+ messages in thread
From: Zhang, Yang Z @ 2011-01-25  3:42 UTC
  To: Stefano Stabellini, Zheng, Shaohui; +Cc: xen-devel

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Stefano
> Stabellini
> Sent: Tuesday, January 25, 2011 3:00 AM
> To: Zheng, Shaohui
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Xen 4.1 rc1 test report
> 
> looking more closely at some of the bugs...
> 
> > 3. guest shows white screen when boot guest with NIC assigned
> (Community)
> > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1712
> > 4. memory corruption was reported by "xl" with device pass-throu
> (Community)
> > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1713
> 
> I haven't seen any of these bugs recently. Are you still able to
> reproduce them or are they just old bugs left open in bugzilla?

This appears to be a regression: I didn't see it in changeset 22653, and it still exists in the latest xen-unstable tree.

> 
> > 5. [vt-d] fail to passthrou two or more devices to guest (Community)
> > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1710
> 
> Assigning multiple devices to an HVM guest using qemu should work,
> however assigning multiple devices to a PV guest or an HVM
> guest using stubdoms is known NOT to work. In particular this is what
> IanC is working on.
> 
> Is this bug being reproduced using stubdoms? If so, this is a known
> issue, otherwise it might be a new bug.

No, we don't use stubdoms. This is also a regression: changeset 22653 didn't have this problem.

I also have two questions:
1. Is there a new format for specifying a CD-ROM in the disk setting? The following error is shown when I use the default config:
[root@vt-snb3 var]# xl create xmexample.hvm
Parsing config file xmexample.hvm
Unknown disk type: ,hdc

2. If I remove CONFIG_XEN_BLKDEV_BACKEND and CONFIG_XEN_BLKDEV_TAP from dom0, creating a qcow guest with xm shows a VBD error, and the xenU guest cannot boot successfully either. I reported this issue before but got no response.


Best regards
yang


* Re: Xen 4.1 rc1 test report
  2011-01-24 19:00 ` Stefano Stabellini
  2011-01-25  3:42   ` Zhang, Yang Z
@ 2011-01-25 10:10   ` Pasi Kärkkäinen
  2011-01-25 15:49   ` Stefano Stabellini
  2 siblings, 0 replies; 28+ messages in thread
From: Pasi Kärkkäinen @ 2011-01-25 10:10 UTC
  To: Stefano Stabellini; +Cc: Zheng, Shaohui, xen-devel

On Mon, Jan 24, 2011 at 07:00:17PM +0000, Stefano Stabellini wrote:
> looking more closely at some of the bugs...
> 
> On Sat, 22 Jan 2011, Zheng, Shaohui wrote:
> > Save/Restore(1 bug)
> > 1. RHEL6 guest fail to do save/restore (Community)
> > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1716
> 
> Upstream PV on HVM save/restore should be fixed, I am afraid this could
> be a bug left in the RHEL6 PV on HVM kernel.
> 
> 

Red Hat Bugzilla entry for this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=669252

-- Pasi


* RE: Xen 4.1 rc1 test report
  2011-01-25  3:42   ` Zhang, Yang Z
@ 2011-01-25 11:23     ` Stefano Stabellini
  2011-01-25 16:01     ` Stefano Stabellini
  1 sibling, 0 replies; 28+ messages in thread
From: Stefano Stabellini @ 2011-01-25 11:23 UTC
  To: Zhang, Yang Z; +Cc: Zheng, Shaohui, xen-devel, Stefano Stabellini

On Tue, 25 Jan 2011, Zhang, Yang Z wrote:
> > -----Original Message-----
> > From: xen-devel-bounces@lists.xensource.com
> > [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Stefano
> > Stabellini
> > Sent: Tuesday, January 25, 2011 3:00 AM
> > To: Zheng, Shaohui
> > Cc: xen-devel@lists.xensource.com
> > Subject: Re: [Xen-devel] Xen 4.1 rc1 test report
> > 
> > looking more closely at some of the bugs...
> > 
> > > 3. guest shows white screen when boot guest with NIC assigned
> > (Community)
> > > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1712
> > > 4. memory corruption was reported by "xl" with device pass-throu
> > (Community)
> > > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1713
> > 
> > I haven't seen any of these bugs recently. Are you still able to
> > reproduce them or are they just old bugs left open in bugzilla?
> 
> This should be a regression. I didn't see it in changeset 22653. And it still exist latest xen-unstable tree.

Just to be clear: you can still reproduce both
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1712
and
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1713
on xen-unstable?
How often do you see these issues?


> > 
> > > 5. [vt-d] fail to passthrou two or more devices to guest (Community)
> > > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1710
> > 
> > Assigning multiple devices to an HVM guest using qemu should work,
> > however assigning multiple devices to a PV guest or an HVM
> > guest using stubdoms is known NOT to work. In particular this is what
> > IanC is working on.
> > 
> > Is this bug being reproduced using stubdoms? If so, this is a known
> > issue, otherwise it might be a new bug.
> 
> No, we don't use stubdoms. This also is a regression. Changeset 22653 didn't have this problem.
> 

I see.


> Here are two questions:
> 1.Is there new format to set cdrom to disk? It will show the following error when I use the default config:
> [root@vt-snb3 var]# xl create xmexample.hvm
> Parsing config file xmexample.hvm
> Unknown disk type: ,hdc
> 

Thanks for spotting this! xl currently expects a valid value in the
"path" field, so ',hdc:cdrom,r' is not supported, but
'file:/path/to/cdrom.iso,hdc:cdrom,r' is.
We need to fix this.
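The error text "Unknown disk type: ,hdc" is consistent with a parser that takes the backend type from everything before the first ':' in the whole spec string. The sketch below is a hypothetical model of that behavior (not xl's actual code), plus one possible alternative that splits on commas first so an empty target stays empty. The field layout "<target>,<vdev>[:<devtype>],<mode>" and all function names are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

/* HYPOTHETICAL model, not xl's real parser: the backend type is taken
 * from everything before the first ':' in the whole spec string. */
static int naive_disk_type(const char *spec, char *type, size_t len)
{
    const char *colon;
    size_t n;

    assert(spec != NULL && type != NULL);
    colon = strchr(spec, ':');
    if (colon == NULL)
        return -1;
    n = (size_t)(colon - spec);
    if (n >= len)
        return -1;
    memcpy(type, spec, n);
    type[n] = '\0';
    return 0;
}

/* Assumed layout: "<target>,<vdev>[:<devtype>],<mode>". */
struct disk_spec {
    char target[256]; /* "file:/path/to/cdrom.iso", or "" for no media */
    char vdev[32];    /* e.g. "hdc:cdrom" */
    char mode[8];     /* "r" or "w" */
};

/* Copy n bytes starting at start into dst (capacity cap), NUL-terminated. */
static int copy_field(char *dst, size_t cap, const char *start, size_t n)
{
    if (n >= cap)
        return -1;
    memcpy(dst, start, n);
    dst[n] = '\0';
    return 0;
}

/* Split on commas first, so an empty target never leaks into the
 * backend-type lookup. Commas inside paths are not handled. */
static int parse_disk_spec(const char *spec, struct disk_spec *out)
{
    const char *c1, *c2;

    assert(spec != NULL && out != NULL);
    c1 = strchr(spec, ',');
    c2 = c1 ? strchr(c1 + 1, ',') : NULL;
    if (c1 == NULL || c2 == NULL)
        return -1;
    if (copy_field(out->target, sizeof out->target, spec, (size_t)(c1 - spec)) ||
        copy_field(out->vdev, sizeof out->vdev, c1 + 1, (size_t)(c2 - c1 - 1)) ||
        copy_field(out->mode, sizeof out->mode, c2 + 1, strlen(c2 + 1)))
        return -1;
    return 0;
}
```

With the naive split, ',hdc:cdrom,r' yields the "type" ",hdc", matching the reported message; with the comma-first split it yields an empty target and vdev "hdc:cdrom", which a toolstack could then treat as an empty CD-ROM drive.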


> 2. If I removing CONFIG_XEN_BLKDEV_BACKEND and CONFIG_XEN_BLKDEV_TAP from dom0, when using xm to create qcow guest, it will show VBD error. Also xenu guest cannot be boot up successfully. I had reported this issue before, but no response.
> 
 
Unfortunately xend doesn't support qemu as a block backend, only blktap2,
and blktap2 is still broken with qcow.


* RE: Xen 4.1 rc1 test report
  2011-01-22 16:37 Xen 4.1 rc1 test report Zheng, Shaohui
  2011-01-22 16:54 ` Mike Viau
  2011-01-24 19:00 ` Stefano Stabellini
@ 2011-01-25 14:05 ` Wei, Gang
  2011-01-25 14:13   ` Keir Fraser
  2011-01-26  5:52   ` Wei, Gang
  2 siblings, 2 replies; 28+ messages in thread
From: Wei, Gang @ 2011-01-25 14:05 UTC
  To: Zheng, Shaohui, xen-devel; +Cc: Wei, Gang

Zheng, Shaohui wrote on 2011-01-23:
>2. [VT-d]xen panic on function do_IRQ after many times NIC pass-throu (Intel)
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1706

I may need some help with this bug. Below are my findings.

According to the call trace, the fault occurs at the last line of the code segment below.
--------------------
__do_IRQ_guest(...)
    ...
    for ( i = 0; i < action->nr_guests; i++ )
    {
        d = action->guest[i];
        pirq = domain_irq_to_pirq(d, irq);   /* faulting line */
        ...
    }
===========
The fatal page fault happens while accessing ((d)->arch.irq_pirq[irq]), because (d)->arch.irq_pirq is already NULL.

Further experiments show that during the second-to-last 'xl create', pciback could not locate the device to be assigned:
---------------------
[ 4802.773665] pciback pci-26-0: 22 Couldn't locate PCI device (0000:05:00.0)!perhaps already in-use?
============

And during the subsequent 'xl destroy', the device model didn't respond:
---------------------
libxl: error: libxl_device.c:477:libxl__wait_for_device_model Device Model not ready
libxl: error: libxl_pci.c:866:do_pci_remove Device Model didn't respond in time
============

In the 'xl debug i' output taken immediately afterwards, we can see that the guest pirqs of the assigned device were not unbound from the host IRQ descriptor.
---------------------
(XEN)    IRQ:  16 affinity:00000000,00000000,00000000,00000001 vec:a8 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 16(-S--),1: 16(----),
(XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00000004 vec:ba type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 55(----),
============

The binding that was never removed (for a guest domain already destroyed by 'xl destroy') then causes the NULL pointer access when a spurious interrupt arrives for that device.

There are three things we may need to do:
1. Figure out the root cause of pciback being unable to locate the device.
I suspect the previous 'xl destroy' didn't return the device to pcistub successfully.

2. Figure out the root cause of the guest pirq not being forcibly unbound.
Findings so far:
Sometimes the check if ( !IS_PRIV_FOR(current->domain, d) ) is hit, so the function returns -EINVAL;
sometimes the check if ( !(desc->status & IRQ_GUEST) ) is hit, so nothing is unbound.

3. Think about how to prevent such cases from panicking Xen.

Any ideas, hints, comments, suggestions or even fixes on it?

Jimmy


* Re: RE: Xen 4.1 rc1 test report
  2011-01-25 14:05 ` Wei, Gang
@ 2011-01-25 14:13   ` Keir Fraser
  2011-01-25 15:49     ` Wei, Gang
  2011-01-26  5:52   ` Wei, Gang
  1 sibling, 1 reply; 28+ messages in thread
From: Keir Fraser @ 2011-01-25 14:13 UTC
  To: Wei, Gang, Zheng, Shaohui, xen-devel

On 25/01/2011 14:05, "Wei, Gang" <gang.wei@intel.com> wrote:

> 3. Think about how we could prevent such cases from panic Xen.
> 
> Any ideas, hints, comments, suggestions or even fixes on it?

Either the domain destroy path should forcibly unbind pirqs, or a non-empty
set of pirq bindings should hold at least one reference to a domain, to
prevent it being destroyed+freed.

Is forcible unbinding ever dangerous to system stability? If not perhaps
that is best.
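The second option above (bindings holding a reference to the domain) can be sketched as a reference count: each pirq binding pins its domain, so destroying the domain only drops the toolstack's reference, and the memory behind irq_pirq stays valid until the last binding is removed. This is a hypothetical, heavily simplified user-space model; the struct layouts and function names echo Xen's but are not Xen's code.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_IRQS    64
#define MAX_GUESTS 4

/* Hypothetical, simplified model: names echo Xen's, but none of this is
 * Xen's actual code or data layout. */
struct domain {
    int refcnt;      /* toolstack reference + one per pirq binding */
    int dying;       /* set once destruction has been requested */
    int *irq_pirq;   /* irq -> pirq map, freed together with the domain */
};

struct irq_guest_action {
    int nr_guests;
    struct domain *guest[MAX_GUESTS];
};

static int nr_freed; /* how many domains have really been freed */

static void put_domain(struct domain *d)
{
    if (--d->refcnt == 0) {
        free(d->irq_pirq);
        free(d);
        nr_freed++;
    }
}

static struct domain *domain_create(void)
{
    struct domain *d = calloc(1, sizeof(*d));

    assert(d != NULL);
    d->irq_pirq = calloc(NR_IRQS, sizeof(int));
    assert(d->irq_pirq != NULL);
    d->refcnt = 1; /* the toolstack's reference */
    return d;
}

/* Binding takes a reference, so d cannot be freed while it is bound. */
static void bind_pirq(struct irq_guest_action *a, struct domain *d,
                      int irq, int pirq)
{
    assert(a->nr_guests < MAX_GUESTS);
    d->irq_pirq[irq] = pirq;
    a->guest[a->nr_guests++] = d;
    d->refcnt++;
}

/* Unbinding drops that reference; this may free d. */
static void unbind_pirq(struct irq_guest_action *a, struct domain *d, int irq)
{
    for (int i = 0; i < a->nr_guests; i++) {
        if (a->guest[i] == d) {
            a->guest[i] = a->guest[--a->nr_guests];
            d->irq_pirq[irq] = 0;
            put_domain(d);
            return;
        }
    }
}

/* Destroy drops only the toolstack's reference. */
static void domain_destroy(struct domain *d)
{
    d->dying = 1;
    put_domain(d);
}

/* Interrupt path: reading irq_pirq is safe because every listed guest is
 * still pinned by its binding's reference. Returns guests notified. */
static int do_irq_guest(const struct irq_guest_action *a, int irq)
{
    int delivered = 0;

    for (int i = 0; i < a->nr_guests; i++) {
        const struct domain *d = a->guest[i];
        if (!d->dying && d->irq_pirq[irq] != 0)
            delivered++;
    }
    return delivered;
}
```

One trade-off of this design: a binding that is never removed keeps the dead domain's memory allocated indefinitely, turning a crash into a leak. Forcible unbinding on the destroy path avoids that, provided unbinding is always safe.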

 -- Keir


* Re: Xen 4.1 rc1 test report
  2011-01-24 19:00 ` Stefano Stabellini
  2011-01-25  3:42   ` Zhang, Yang Z
  2011-01-25 10:10   ` Pasi Kärkkäinen
@ 2011-01-25 15:49   ` Stefano Stabellini
  2011-01-26  3:56     ` Zheng, Shaohui
  2 siblings, 1 reply; 28+ messages in thread
From: Stefano Stabellini @ 2011-01-25 15:49 UTC
  To: Stefano Stabellini; +Cc: Zheng, Shaohui, xen-devel

On Mon, 24 Jan 2011, Stefano Stabellini wrote:
> > xl command(7 bugs)
> > 2. [vt-d] Can not detach the device which was assigned statically (Community)
> > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1717
> 
> This should be easy to fix, probably a parsing error somewhere in xl.
> 

Actually, I cannot reproduce this bug.
Could you please confirm you still have this issue?
I have just statically assigned a NIC to my domain, then pci-detached it
using xl, and everything worked properly.
Of course, you need HOTPLUG_PCI_ACPI compiled into the guest's kernel for
this to work.


* RE: RE: Xen 4.1 rc1 test report
  2011-01-25 14:13   ` Keir Fraser
@ 2011-01-25 15:49     ` Wei, Gang
  0 siblings, 0 replies; 28+ messages in thread
From: Wei, Gang @ 2011-01-25 15:49 UTC
  To: Keir Fraser, Zheng, Shaohui, xen-devel; +Cc: Wei, Gang

Keir Fraser wrote on 2011-01-25:
> On 25/01/2011 14:05, "Wei, Gang" <gang.wei@intel.com> wrote:
> 
>> 3. Think about how we could prevent such cases from panic Xen.
>> 
>> Any ideas, hints, comments, suggestions or even fixes on it?
> 
> Either the domain destroy path should forcibly unbind pirqs, or a
> non-empty set of pirq bindings should hold at least one reference to a
> domain, to prevent it being destroyed+freed.
> 
> Is forcible unbinding ever dangerous to system stability? If not
> perhaps that is best.

I agree, and will try this approach.

Jimmy


* RE: Xen 4.1 rc1 test report
  2011-01-25  3:42   ` Zhang, Yang Z
  2011-01-25 11:23     ` Stefano Stabellini
@ 2011-01-25 16:01     ` Stefano Stabellini
  2011-01-26  1:01       ` Zheng, Shaohui
                         ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: Stefano Stabellini @ 2011-01-25 16:01 UTC
  To: Zhang, Yang Z; +Cc: Zheng, Shaohui, xen-devel, Stefano Stabellini

On Tue, 25 Jan 2011, Zhang, Yang Z wrote:
> > > 5. [vt-d] fail to passthrou two or more devices to guest (Community)
> > > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1710
> > 
> > Assigning multiple devices to an HVM guest using qemu should work,
> > however assigning multiple devices to a PV guest or an HVM
> > guest using stubdoms is known NOT to work. In particular this is what
> > IanC is working on.
> > 
> > Is this bug being reproduced using stubdoms? If so, this is a known
> > issue, otherwise it might be a new bug.
> 
> No, we don't use stubdoms. This also is a regression. Changeset 22653 didn't have this problem.

Unfortunately I cannot reproduce this bug either.
I have just successfully assigned two NICs to a VM, ran dhclient and
received proper IP addresses on both interfaces.


* RE: Xen 4.1 rc1 test report
  2011-01-25 16:01     ` Stefano Stabellini
@ 2011-01-26  1:01       ` Zheng, Shaohui
  2011-01-26  3:54       ` Zheng, Shaohui
  2011-01-26  4:53       ` Zheng, Shaohui
  2 siblings, 0 replies; 28+ messages in thread
From: Zheng, Shaohui @ 2011-01-26  1:01 UTC
  To: Stefano Stabellini, Zhang, Yang Z; +Cc: xen-devel


Stefano,
	This happens with static pass-through via the HVM config file; dynamic pci-attach does not have this bug. QA has already identified it as a regression between changesets 22653 and 22764.
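Narrowing such a regression range down to a single changeset is a binary search. The sketch below assumes a linear history with a single good-to-bad transition, and a hypothetical reproduces() predicate standing in for "build this changeset and rerun the test"; the demo predicate and its threshold are placeholders, not real results.

```c
#include <assert.h>

/* Hypothetical predicate: nonzero if the bug reproduces at changeset cs.
 * In practice this means building that revision and rerunning the test. */
typedef int (*reproduces_fn)(int cs);

/* Binary search for the first bad changeset, given that `good` is known
 * good and `bad` is known bad (good < bad). Assumes linear history with a
 * single good->bad transition in between. */
static int first_bad_changeset(int good, int bad, reproduces_fn reproduces)
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (reproduces(mid))
            bad = mid;   /* transition is at or before mid */
        else
            good = mid;  /* transition is after mid */
    }
    return bad;
}

/* Demo stand-in only: pretends the bug appears at demo_threshold and
 * counts how many "builds" the search needs. Not real data. */
static int demo_threshold;
static int demo_calls;

static int demo_reproduces(int cs)
{
    demo_calls++;
    return cs >= demo_threshold;
}
```

Between 22653 and 22764 there are 111 candidate changesets, so at most 7 builds are needed (2^7 = 128). Mercurial's `hg bisect` (with `--good` and `--bad`) automates the same search over the real, non-linear history.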
	

Thanks & Regards,
Shaohui


> -----Original Message-----
> From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> Sent: Wednesday, January 26, 2011 12:01 AM
> To: Zhang, Yang Z
> Cc: Stefano Stabellini; Zheng, Shaohui; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] Xen 4.1 rc1 test report
> 
> On Tue, 25 Jan 2011, Zhang, Yang Z wrote:
> > > > 5. [vt-d] fail to passthrou two or more devices to guest (Community)
> > > > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1710
> > >
> > > Assigning multiple devices to an HVM guest using qemu should work,
> > > however assigning multiple devices to a PV guest or an HVM
> > > guest using stubdoms is known NOT to work. In particular this is what
> > > IanC is working on.
> > >
> > > Is this bug being reproduced using stubdoms? If so, this is a known
> > > issue, otherwise it might be a new bug.
> >
> > No, we don't use stubdoms. This also is a regression. Changeset 22653 didn't have this problem.
> 
> Unfortunately I cannot reproduce this bug either.
> I have just successfully assigned two NICs to a VM, run dhclient and
> received proper IP addresses on both interfaces.




* RE: Xen 4.1 rc1 test report
  2011-01-25 16:01     ` Stefano Stabellini
  2011-01-26  1:01       ` Zheng, Shaohui
@ 2011-01-26  3:54       ` Zheng, Shaohui
  2011-01-26  4:53       ` Zheng, Shaohui
  2 siblings, 0 replies; 28+ messages in thread
From: Zheng, Shaohui @ 2011-01-26  3:54 UTC
  To: Zheng, Shaohui, Stefano Stabellini, Zhang, Yang Z; +Cc: xen-devel


Stefano,
	We tried changeset 22812:73b3debb90cf, and this bug is already fixed. We can now successfully assign two or more devices to a guest. Thanks.

Thanks & Regards,
Shaohui


> -----Original Message-----
> From: Zheng, Shaohui
> Sent: Wednesday, January 26, 2011 9:02 AM
> To: 'Stefano Stabellini'; Zhang, Yang Z
> Cc: xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] Xen 4.1 rc1 test report
> 
> Stefano,
> 	It is static passthrou via the hvm configure file, for dynamic pci-attach, there is not such bug. QA
> already identify that it is a regression between CS 22653 and 22764.
> 
> 
> Thanks & Regards,
> Shaohui
> 




* RE: Xen 4.1 rc1 test report
  2011-01-25 15:49   ` Stefano Stabellini
@ 2011-01-26  3:56     ` Zheng, Shaohui
  0 siblings, 0 replies; 28+ messages in thread
From: Zheng, Shaohui @ 2011-01-26  3:56 UTC
  To: Stefano Stabellini; +Cc: xen-devel

It is already fixed. We can detach a statically assigned device with changeset 22812:73b3debb90cf.

Thanks & Regards,
Shaohui


> -----Original Message-----
> From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> Sent: Tuesday, January 25, 2011 11:49 PM
> To: Stefano Stabellini
> Cc: Zheng, Shaohui; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Xen 4.1 rc1 test report
> 
> On Mon, 24 Jan 2011, Stefano Stabellini wrote:
> > > xl command(7 bugs)
> > > 2. [vt-d] Can not detach the device which was assigned statically (Community)
> > > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1717
> >
> > This should be easy to fix, probably a parsing error somewhere in xl.
> >
> 
> Actually I cannot repro this bug.
> Could you please confirm you still have this issue?
> I have just statically assigned a NIC to my domain, then I pci-detach it
> using xl and everything worked properly.
> Of course you need HOTPLUG_PCI_ACPI compiled in the guest's kernel for
> this to work.


* RE: Xen 4.1 rc1 test report
  2011-01-25 16:01     ` Stefano Stabellini
  2011-01-26  1:01       ` Zheng, Shaohui
  2011-01-26  3:54       ` Zheng, Shaohui
@ 2011-01-26  4:53       ` Zheng, Shaohui
  2 siblings, 0 replies; 28+ messages in thread
From: Zheng, Shaohui @ 2011-01-26  4:53 UTC
  To: Stefano Stabellini, Zhang, Yang Z; +Cc: xen-devel

Stefano,
	I tried changeset 22812:73b3debb90cf, and this bug is already fixed.

Thanks & Regards,
Shaohui


> -----Original Message-----
> From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> Sent: Wednesday, January 26, 2011 12:01 AM
> To: Zhang, Yang Z
> Cc: Stefano Stabellini; Zheng, Shaohui; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] Xen 4.1 rc1 test report
> 
> On Tue, 25 Jan 2011, Zhang, Yang Z wrote:
> > > > 5. [vt-d] fail to passthrou two or more devices to guest (Community)
> > > > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1710
> > >
> > > Assigning multiple devices to an HVM guest using qemu should work,
> > > however assigning multiple devices to a PV guest or an HVM
> > > guest using stubdoms is known NOT to work. In particular this is what
> > > IanC is working on.
> > >
> > > Is this bug being reproduced using stubdoms? If so, this is a known
> > > issue, otherwise it might be a new bug.
> >
> > No, we don't use stubdoms. This also is a regression. Changeset 22653 didn't have this problem.
> 
> Unfortunately I cannot reproduce this bug either.
> I have just successfully assigned two NICs to a VM, run dhclient and
> received proper IP addresses on both interfaces.

* RE: Xen 4.1 rc1 test report
  2011-01-25 14:05 ` Wei, Gang
  2011-01-25 14:13   ` Keir Fraser
@ 2011-01-26  5:52   ` Wei, Gang
  2011-01-26  6:31     ` Wei, Gang
  1 sibling, 1 reply; 28+ messages in thread
From: Wei, Gang @ 2011-01-26  5:52 UTC (permalink / raw)
  To: Zheng, Shaohui, xen-devel; +Cc: Keir Fraser, Wei, Gang

Wei, Gang wrote on 2011-01-25:
>> 2. [VT-d]xen panic on function do_IRQ after many times NIC pass-throu
>> (Intel)
>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1706
> There are three points we may need to do:
> 1. Figure out the root cause why the pciback could not locate the device.
> I suspect the previous 'xl destroy' didn't return the device to
> pcistub successfully.
> 
> 2. Figure out the root cause why the guest pirq was not force unbound.
> Just found:
> Some time because if ( !IS_PRIV_FOR(current->domain, d) ) hit, so
> returned with -EINVAL; Sometime if ( !(desc->status & IRQ_GUEST) )
> hit, so do not unbind.
> 
> 3. Think about how we could prevent such cases from panic Xen.

I just found that sometimes, during domain_destroy, current->domain is the IDLE domain, so the if ( !IS_PRIV_FOR(current->domain, d) ) check is hit and the forcible pirq unbinding is skipped. How could this happen?

Jimmy

* RE: Xen 4.1 rc1 test report
  2011-01-26  5:52   ` Wei, Gang
@ 2011-01-26  6:31     ` Wei, Gang
  2011-01-26  7:24       ` Keir Fraser
  2011-01-26  7:28       ` [PATCH] Fix bug1706 Wei, Gang
  0 siblings, 2 replies; 28+ messages in thread
From: Wei, Gang @ 2011-01-26  6:31 UTC (permalink / raw)
  To: Zheng, Shaohui, xen-devel; +Cc: Keir Fraser, Wei, Gang

Wei, Gang wrote on 2011-01-26:
> Just found sometime while doing domain_destroy the current->domain is
> IDLE domain, so the if ( !IS_PRIV_FOR(current->domain, d) ) hit and
> skip the pirq forcible unbinding. How could it happen?

It looks like this is caused by call_rcu(&d->rcu, complete_domain_destroy) at the end of domain_destroy(). We may need to move the IS_PRIV_FOR(current->domain, d) check out of this path and, if necessary, perform it earlier in the call path.
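To make the deferred-context problem concrete, here is a minimal, hypothetical C model (not Xen code). toy_call_rcu() and toy_rcu_flush() stand in for call_rcu() and the later RCU callback, cur stands in for current->domain, and the privileged flag stands in for IS_PRIV_FOR(). It assumes, as observed above, that the deferred callback runs while the idle domain is current.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the bug-1706 path.
 * None of these names are Xen's real structures or APIs. */
struct domain {
    int domid;
    bool privileged;   /* stands in for "IS_PRIV_FOR would pass" */
    int bound_pirqs;   /* pirqs still bound to host IRQs */
};

static struct domain idle_dom = { .domid = 32767, .privileged = false };
static struct domain dom0     = { .domid = 0,     .privileged = true  };

/* Stands in for current->domain. */
static struct domain *cur = &dom0;

/* Pre-patch shape: the core unmap routine re-checks privilege against
 * whatever domain happens to be running at the time. */
static int unmap_pirq_prepatch(struct domain *d)
{
    if (!cur->privileged)
        return -22;            /* -EINVAL: the pirq stays bound */
    d->bound_pirqs--;
    return 0;
}

/* Post-patch shape: no check here; callers validated at hypercall entry. */
static int unmap_pirq_postpatch(struct domain *d)
{
    d->bound_pirqs--;
    return 0;
}

/* Toy stand-in for call_rcu(): record the callback now, run it later. */
static int (*deferred_fn)(struct domain *);
static struct domain *deferred_arg;

static void toy_call_rcu(struct domain *d, int (*fn)(struct domain *))
{
    deferred_fn = fn;
    deferred_arg = d;
}

/* The deferred work runs in an arbitrary context, modelled here as idle. */
static int toy_rcu_flush(void)
{
    assert(deferred_fn != NULL);
    cur = &idle_dom;
    int rc = deferred_fn(deferred_arg);
    deferred_fn = NULL;
    cur = &dom0;               /* restore for the next experiment */
    return rc;
}
```

Queuing unmap_pirq_prepatch() from dom0 and flushing it returns -22 and leaves the pirq bound, while unmap_pirq_postpatch() completes the unbind regardless of which domain is current, which is the behaviour the patch below aims for.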

Jimmy

* Re: Xen 4.1 rc1 test report
  2011-01-26  6:31     ` Wei, Gang
@ 2011-01-26  7:24       ` Keir Fraser
  2011-01-26  7:34         ` Wei, Gang
  2011-01-26  7:28       ` [PATCH] Fix bug1706 Wei, Gang
  1 sibling, 1 reply; 28+ messages in thread
From: Keir Fraser @ 2011-01-26  7:24 UTC (permalink / raw)
  To: Wei, Gang, Zheng, Shaohui, xen-devel

On 26/01/2011 06:31, "Wei, Gang" <gang.wei@intel.com> wrote:

> Wei, Gang wrote on 2011-01-26:
>> Just found sometime while doing domain_destroy the current->domain is
>> IDLE domain, so the if ( !IS_PRIV_FOR(current->domain, d) ) hit and
>> skip the pirq forcible unbinding. How could it happen?
> 
> Look like it is caused by call_rcu(&d->rcu, complete_domain_destroy) in the
> end of domain_destroy fn. We may need to move the check for
> IS_PRIV_FOR(current->domain, d) out and check it earlier in the call path if
> necessary.

Those core map/unmap functions shouldn't be doing the priv checks
themselves. I'll sort out a patch for you to try.

 -- Keir

> Jimmy

* [PATCH] Fix bug1706
  2011-01-26  6:31     ` Wei, Gang
  2011-01-26  7:24       ` Keir Fraser
@ 2011-01-26  7:28       ` Wei, Gang
  1 sibling, 0 replies; 28+ messages in thread
From: Wei, Gang @ 2011-01-26  7:28 UTC (permalink / raw)
  To: Zheng, Shaohui, xen-devel; +Cc: Keir Fraser, Wei, Gang

[-- Attachment #1: Type: text/plain, Size: 947 bytes --]

Here is a fix for bug 1706.

ROOT-CAUSE:
At the end of domain_destroy(), call_rcu(&d->rcu, complete_domain_destroy) makes it possible for complete_domain_destroy() to be executed in a different vcpu context. So the IS_PRIV_FOR check in unmap_domain_pirq() is not suitable. In fact, all necessary privilege checks have already been done at the start of the hypercalls, so we only need to remove this check from unmap_domain_pirq().

Signed-off-by: Wei Gang <gang.wei@inte.com>

diff -r d1631540bcc4 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Tue Jan 18 17:23:24 2011 +0000
+++ b/xen/arch/x86/irq.c	Thu Jan 27 20:53:28 2011 +0800
@@ -1567,9 +1567,6 @@ int unmap_domain_pirq(struct domain *d, 
     if ( (pirq < 0) || (pirq >= d->nr_pirqs) )
         return -EINVAL;
 
-    if ( !IS_PRIV_FOR(current->domain, d) )
-        return -EINVAL;
-
     ASSERT(spin_is_locked(&pcidevs_lock));
     ASSERT(spin_is_locked(&d->event_lock));

Jimmy

[-- Attachment #2: bug1706-fix.patch --]
[-- Type: application/octet-stream, Size: 2637 bytes --]

Fix bug1706

According to the call trace, the fault occurs at the last line of the code segment below:
--------------------
__do_IRQ_guest(...)
    for ( i = 0; i < action->nr_guests; i++ )
        d = action->guest[i];
        pirq = domain_irq_to_pirq(d, irq);
===========
Fatal page fault while accessing ((d)->arch.irq_pirq[irq]), because (d)->arch.irq_pirq is already NULL.

Further experiments show that during the second-to-last 'xl create', pciback could not locate the device to be assigned:
---------------------
[ 4802.773665] pciback pci-26-0: 22 Couldn't locate PCI device (0000:05:00.0)!perhaps already in-use?
============

And during the subsequent 'xl destroy', the device model did not respond:
---------------------
libxl: error: libxl_device.c:477:libxl__wait_for_device_model Device Model not ready
libxl: error: libxl_pci.c:866:do_pci_remove Device Model didn't respond in time
============

In the 'xl debug i' output taken immediately afterwards, we can see that the guest pirqs of the assigned device were not unbound from the host irq desc.
---------------------
(XEN)    IRQ:  16 affinity:00000000,00000000,00000000,00000001 vec:a8 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 16(-S--),1: 16(----),
(XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00000004 vec:ba type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 55(----),
============

The guest domain info that was left bound (although the domain was already destroyed by 'xl destroy') then causes the NULL pointer access when a spurious interrupt arrives for that device.
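The failure sequence above can be modelled in a few lines of C. This is a hypothetical toy, not Xen's irq_guest_action_t or struct domain: destroying a domain frees its irq_pirq table, and if the forced unbind is skipped the domain stays in the action's guest list, where the interrupt path would dereference the NULL table. count_stale_guests() reports the entries that would fault instead of faulting, and NR_IRQS is an arbitrary size chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model; names and sizes are illustrative only. */
#define MAX_GUESTS 4
#define NR_IRQS 64

struct domain {
    int *irq_pirq;          /* per-domain irq -> pirq table; freed on destroy */
};

struct guest_action {
    int nr_guests;
    struct domain *guest[MAX_GUESTS];
};

static void bind_guest(struct guest_action *a, struct domain *d)
{
    assert(a->nr_guests < MAX_GUESTS);
    a->guest[a->nr_guests++] = d;
}

/* Remove d from the guest list: the step the skipped unbind should do. */
static void unbind_guest(struct guest_action *a, struct domain *d)
{
    for (int i = 0; i < a->nr_guests; i++) {
        if (a->guest[i] == d) {
            a->guest[i] = a->guest[--a->nr_guests];
            return;
        }
    }
}

/* Tear down per-domain IRQ state, optionally doing the forced unbind.
 * The struct domain itself stays alive in this toy so the later
 * check below is safe to perform. */
static void destroy_domain(struct guest_action *a, struct domain *d,
                           int do_unbind)
{
    if (do_unbind)
        unbind_guest(a, d);
    free(d->irq_pirq);
    d->irq_pirq = NULL;
}

/* Mimics the loop in __do_IRQ_guest(): count the guests whose lookup
 * would hit a NULL irq_pirq table (the real code page-faults instead). */
static int count_stale_guests(const struct guest_action *a)
{
    int stale = 0;
    for (int i = 0; i < a->nr_guests; i++)
        if (a->guest[i]->irq_pirq == NULL)
            stale++;
    return stale;
}
```

With do_unbind = 0 (the check-hit case) one stale entry remains and a spurious interrupt would walk into it; with do_unbind = 1 the guest list is empty and the interrupt path never touches the destroyed domain.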

Further investigation shows that sometimes, during domain_destroy, current->domain is the IDLE domain, so the if ( !IS_PRIV_FOR(current->domain, d) ) check is hit and the forcible pirq unbinding is skipped.

ROOT-CAUSE:
At the end of domain_destroy(), call_rcu(&d->rcu, complete_domain_destroy) makes it possible for complete_domain_destroy() to be executed in a different vcpu context. So the IS_PRIV_FOR check in unmap_domain_pirq() is not suitable. In fact, all necessary privilege checks have already been done at the start of the hypercalls, so we only need to remove this check from unmap_domain_pirq().

Signed-off-by: Wei Gang <gang.wei@inte.com>

diff -r d1631540bcc4 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Tue Jan 18 17:23:24 2011 +0000
+++ b/xen/arch/x86/irq.c	Thu Jan 27 20:53:28 2011 +0800
@@ -1567,9 +1567,6 @@ int unmap_domain_pirq(struct domain *d, 
     if ( (pirq < 0) || (pirq >= d->nr_pirqs) )
         return -EINVAL;
 
-    if ( !IS_PRIV_FOR(current->domain, d) )
-        return -EINVAL;
-
     ASSERT(spin_is_locked(&pcidevs_lock));
     ASSERT(spin_is_locked(&d->event_lock));
 

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* RE: Xen 4.1 rc1 test report
  2011-01-26  7:24       ` Keir Fraser
@ 2011-01-26  7:34         ` Wei, Gang
  2011-01-26 14:38           ` Wei, Gang
  0 siblings, 1 reply; 28+ messages in thread
From: Wei, Gang @ 2011-01-26  7:34 UTC (permalink / raw)
  To: Keir Fraser, Zheng, Shaohui, xen-devel; +Cc: Wei, Gang

Keir Fraser wrote on 2011-01-26:
> Those core map/unmap functions shouldn't be doing the priv checks
> themselves. I'll sort out a patch for you to try.

Haha, I have already sent out a patch for this; please have a look at it. I am testing it now and have already run 30 create/destroy cycles without a panic. Let's see whether we can reach 100...

Jimmy

* RE: Xen 4.1 rc1 test report
  2011-01-26  7:34         ` Wei, Gang
@ 2011-01-26 14:38           ` Wei, Gang
  0 siblings, 0 replies; 28+ messages in thread
From: Wei, Gang @ 2011-01-26 14:38 UTC (permalink / raw)
  To: Keir Fraser, Zheng, Shaohui, xen-devel; +Cc: Wei, Gang

Wei, Gang wrote on 2011-01-26:
> Keir Fraser wrote on 2011-01-26:
>> Those core map/unmap functions shouldn't be doing the priv checks
>> themselves. I'll sort out a patch for you to try.
> 
> Haha, I have already sent out the fixing patch. Please have a look on
> it. I am testing it, already did 30 times of create/destroy without
> panic. Let's see whether we can reach 100 times...

We made it: 100 create/destroy cycles without failures.

Jimmy

* Re: Xen 4.1 rc1 test report
  2011-01-25 11:43     ` Ian Campbell
@ 2011-01-26  0:47       ` Haitao Shan
  0 siblings, 0 replies; 28+ messages in thread
From: Haitao Shan @ 2011-01-26  0:47 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Zheng, Shaohui, Keir Fraser, xen-devel

I think it is basically the same idea as the one Keir introduced in
c/s 20841. I guess this bug would happen on any platform with a large
number of physical CPUs, not only on Intel EX systems.
If you can cook up the patch, that would be great! Thanks!!

Shan Haitao

2011/1/25 Ian Campbell <Ian.Campbell@citrix.com>:
> On Tue, 2011-01-25 at 06:24 +0000, Haitao Shan wrote:
>> > Performance(1 bug)
>> > 1. guest boot very slowly without limit dom0 cpu number on EX (Intel)
>> > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1719
>> >
>>
>> This bug happened 1 year before. Keir has made a fix with c/s 20841,
>> which essentially holds the locked (and hence allocated) hypercall
>> page so that next hypercall can reuse it without doing alloc and mlock
>> again. By doing this, overhead of rschedule IPI as a result of
>> frequent mlock is greatly reduced.
>>
>> Late in year 2010, libxc introduced a new mechanism called hypercall
>> buffers, as you can refer c/s 22288~22312. Keir's fix is dropped in
>> this new framework. As a result, the bug appears again.
>> Probably the new framework auther can pick up Keir's fix again?
>
> I think it would make sense to include a low water mark of a small
> number of pages (perhaps 4 or 8) which instead of being freed are kept
> and reused in preference to future new allocations. These pages would
> only finally be released by the xc_interface_close() call.
>
> Is this something which you feel able to make a patch for?
>
> Ian.

* Re: Xen 4.1 rc1 test report
  2011-01-25  6:24   ` Haitao Shan
  2011-01-25  8:00     ` Zheng, Shaohui
  2011-01-25  8:43     ` Keir Fraser
@ 2011-01-25 11:43     ` Ian Campbell
  2011-01-26  0:47       ` Haitao Shan
  2 siblings, 1 reply; 28+ messages in thread
From: Ian Campbell @ 2011-01-25 11:43 UTC (permalink / raw)
  To: Haitao Shan; +Cc: Zheng, Shaohui, Keir Fraser, xen-devel

On Tue, 2011-01-25 at 06:24 +0000, Haitao Shan wrote:
> > Performance(1 bug)
> > 1. guest boot very slowly without limit dom0 cpu number on EX (Intel)
> > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1719
> >
> 
> This bug happened 1 year before. Keir has made a fix with c/s 20841,
> which essentially holds the locked (and hence allocated) hypercall
> page so that next hypercall can reuse it without doing alloc and mlock
> again. By doing this, overhead of rschedule IPI as a result of
> frequent mlock is greatly reduced.
> 
> Late in year 2010, libxc introduced a new mechanism called hypercall
> buffers, as you can refer c/s 22288~22312. Keir's fix is dropped in
> this new framework. As a result, the bug appears again.
> Probably the new framework auther can pick up Keir's fix again?

I think it would make sense to include a low water mark of a small
number of pages (perhaps 4 or 8) which instead of being freed are kept
and reused in preference to future new allocations. These pages would
only finally be released by the xc_interface_close() call.
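A rough sketch of the proposed low-water-mark cache, assuming a fixed limit of 4 pages and using plain malloc()/free() in place of the real locked allocations. The struct and function names are hypothetical, not libxc's hypercall-buffer API; cache_close() plays the role of the release in xc_interface_close().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: up to CACHE_MAX freed buffer pages are parked and
 * reused in preference to new allocations; only cache_close() frees them. */
#define CACHE_MAX 4            /* "perhaps 4 or 8" pages */
#define BUF_PAGE_SIZE 4096

struct buf_cache {
    void *pages[CACHE_MAX];
    int nr_cached;
    int fresh_allocs;          /* times the slow alloc (+ mlock) path ran */
};

static void *cache_alloc_page(struct buf_cache *c)
{
    if (c->nr_cached > 0)
        return c->pages[--c->nr_cached];   /* reuse a parked page first */
    c->fresh_allocs++;
    return malloc(BUF_PAGE_SIZE);          /* real code would also lock it */
}

static void cache_free_page(struct buf_cache *c, void *p)
{
    assert(p != NULL);
    if (c->nr_cached < CACHE_MAX)
        c->pages[c->nr_cached++] = p;      /* keep it below the water mark */
    else
        free(p);                           /* over the limit: really free */
}

/* Analogue of releasing everything in xc_interface_close(). */
static void cache_close(struct buf_cache *c)
{
    while (c->nr_cached > 0)
        free(c->pages[--c->nr_cached]);
}
```

Pages freed beyond the limit are released immediately, so the cache never holds more than CACHE_MAX pages, and a steady stream of single-page hypercalls takes the slow path only once.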

Is this something which you feel able to make a patch for?

Ian.

* Re: Xen 4.1 rc1 test report
  2011-01-25  6:24   ` Haitao Shan
  2011-01-25  8:00     ` Zheng, Shaohui
@ 2011-01-25  8:43     ` Keir Fraser
  2011-01-25 11:43     ` Ian Campbell
  2 siblings, 0 replies; 28+ messages in thread
From: Keir Fraser @ 2011-01-25  8:43 UTC (permalink / raw)
  To: Haitao Shan, Zheng, Shaohui; +Cc: Ian Campbell, xen-devel

On 25/01/2011 06:24, "Haitao Shan" <maillists.shan@gmail.com> wrote:

>> Performance(1 bug)
>> 1. guest boot very slowly without limit dom0 cpu number on EX (Intel)
>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1719
>> 
> 
> This bug happened 1 year before. Keir has made a fix with c/s 20841,
> which essentially holds the locked (and hence allocated) hypercall
> page so that next hypercall can reuse it without doing alloc and mlock
> again. By doing this, overhead of rschedule IPI as a result of
> frequent mlock is greatly reduced.
> 
> Late in year 2010, libxc introduced a new mechanism called hypercall
> buffers, as you can refer c/s 22288~22312. Keir's fix is dropped in
> this new framework. As a result, the bug appears again.
> Probably the new framework auther can pick up Keir's fix again?

Ian Campbell, I think (cc'ed).

 -- Keir

> Shan Haitao

* RE: Xen 4.1 rc1 test report
  2011-01-25  6:24   ` Haitao Shan
@ 2011-01-25  8:00     ` Zheng, Shaohui
  2011-01-25  8:43     ` Keir Fraser
  2011-01-25 11:43     ` Ian Campbell
  2 siblings, 0 replies; 28+ messages in thread
From: Zheng, Shaohui @ 2011-01-25  8:00 UTC (permalink / raw)
  To: ian.campbell, Haitao Shan, Keir Fraser; +Cc: xen-devel

Thanks for Haitao's clear explanation. I see that Ian is the author of the c/s 22288~22312 series, so I am adding Ian to the loop.

Thanks & Regards,
Shaohui

> -----Original Message-----
> From: Haitao Shan [mailto:maillists.shan@gmail.com]
> Sent: Tuesday, January 25, 2011 2:24 PM
> To: Zheng, Shaohui; Keir Fraser
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Xen 4.1 rc1 test report
> 
> > Performance(1 bug)
> > 1. guest boot very slowly without limit dom0 cpu number on EX (Intel)
> > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1719
> >
> 
> This bug happened 1 year before. Keir has made a fix with c/s 20841,
> which essentially holds the locked (and hence allocated) hypercall
> page so that next hypercall can reuse it without doing alloc and mlock
> again. By doing this, overhead of rschedule IPI as a result of
> frequent mlock is greatly reduced.
> 
> Late in year 2010, libxc introduced a new mechanism called hypercall
> buffers, as you can refer c/s 22288~22312. Keir's fix is dropped in
> this new framework. As a result, the bug appears again.
> Probably the new framework auther can pick up Keir's fix again?
> 
> Shan Haitao

* Re: Xen 4.1 rc1 test report
  2011-01-22  9:39 ` Xen 4.1 rc1 test report Zheng, Shaohui
@ 2011-01-25  6:24   ` Haitao Shan
  2011-01-25  8:00     ` Zheng, Shaohui
                       ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Haitao Shan @ 2011-01-25  6:24 UTC (permalink / raw)
  To: Zheng, Shaohui, Keir Fraser; +Cc: xen-devel

> Performance(1 bug)
> 1. guest boot very slowly without limit dom0 cpu number on EX (Intel)
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1719
>

This bug occurred a year ago. Keir fixed it in c/s 20841,
which essentially keeps the locked (and hence allocated) hypercall
page so that the next hypercall can reuse it without doing the alloc
and mlock again. This greatly reduces the overhead of the reschedule
IPIs caused by frequent mlock calls.

Late in 2010, libxc introduced a new mechanism called hypercall
buffers (see c/s 22288~22312). Keir's fix was dropped in this new
framework, and as a result the bug has reappeared.
Perhaps the author of the new framework can pick up Keir's fix again?
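A minimal, hypothetical sketch of the c/s 20841 idea as described above; the names are illustrative, not libxc's API. One page is kept after use and handed out again, so the slow path (which in the real code includes the mlock() that leads to the reschedule IPIs) runs only once. Locking is only counted here, not performed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: cache a single "locked" page across hypercalls.
 * malloc()/free() stand in for the real alloc + mlock / munlock + free. */
#define HCALL_PAGE_SIZE 4096

struct hcall_ctx {
    void *cached_page;     /* kept between calls instead of being freed */
    int lock_calls;        /* how many times the expensive path ran */
};

static void *hcall_page_get(struct hcall_ctx *ctx)
{
    if (ctx->cached_page) {
        void *p = ctx->cached_page;
        ctx->cached_page = NULL;   /* hand it out; caller puts it back */
        return p;
    }
    ctx->lock_calls++;             /* stands in for alloc + mlock */
    return malloc(HCALL_PAGE_SIZE);
}

static void hcall_page_put(struct hcall_ctx *ctx, void *p)
{
    assert(p != NULL);
    if (!ctx->cached_page)
        ctx->cached_page = p;      /* keep it for the next hypercall */
    else
        free(p);                   /* stands in for munlock + free */
}

static void hcall_ctx_close(struct hcall_ctx *ctx)
{
    free(ctx->cached_page);
    ctx->cached_page = NULL;
}
```

One hundred back-to-back get/put pairs on the same context take the expensive path exactly once; without the cached page every call would pay for it.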

Shan Haitao

* Xen 4.1 rc1 test report
@ 2011-01-22  9:39 ` Zheng, Shaohui
  2011-01-25  6:24   ` Haitao Shan
  0 siblings, 1 reply; 28+ messages in thread
From: Zheng, Shaohui @ 2011-01-22  9:39 UTC (permalink / raw)
  To: xen-devel

Hi all,
	Intel QA conducted a full validation of Xen 4.1 rc1, including VT-x, VT-d, SR-IOV, RAS, TXT and xl tools testing. 24 issues were exposed; please refer to the bug list below.

	We have already assigned 14 bugs to Intel developers (those with an 'Intel' tag in the bug title); most of the remaining 10 bugs are related to the xl command. We need the community's help to fix these bugs.

Version information:
Change Set: 22764:75b6287626ee
pvops dom0: 862ef97190f6b54d35c76c93fb2b8fadd7ab7d68
ioemu : 1c304816043c0ffe14d20d6006d6165cb7fddb9b

Bug list:
VT-d (7 bugs)
1. Ubuntu PAE SMP guest has network problems with a NIC assigned (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1709
2. [VT-d] Xen panics in do_IRQ after many NIC pass-through cycles (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1706
3. [VT-d] running a guest with a NIC assigned sometimes causes a system hang under PAE on Sandy Bridge (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1725
4. [vt-d] dom0 igb driver is too old to support 4-port NICs (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1708
5. [VT-d] Xen sometimes panics when running a guest with a NIC assigned (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=24
6. [vt-d] xl command does not respond after passing through an IGD card (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1723
7. [vt-d] fail to get an IP address after hotplugging a VF 300 times (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1722

RAS (1 bug)
1. System hangs when taking a CPU offline (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1654

ACPI (1 bug)
1. System can't resume after suspend (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1707

Save/Restore (1 bug)
1. RHEL6 guest fails to save/restore (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1716

xl command (7 bugs)
1. xl does not check for duplicated configuration files and image files (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1711
2. [vt-d] Can not detach the device which was assigned statically (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1717
3. guest shows a white screen when booted with a NIC assigned (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1712
4. memory corruption was reported by "xl" with device pass-through (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1713
5. [vt-d] fail to pass through two or more devices to guest (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1710
6. Guest network broken after SAVE/RESTORE with xl (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1703
7. Too many error messages shown when destroying a nonexistent guest (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1714

Hypervisor (4 bugs)
1. Only two 1GB pages are allocated to a guest with 10GB of memory (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1721
2. guest with vnif assigned fails to boot up when APIC is disabled (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1692
3. Dom0 crashes on Core2 when dom0_mem is no more than 1972MB (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1726
4. Guest does not disappear after powering it off (Community)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1720

Performance (1 bug)
1. guest boots very slowly on EX unless the number of dom0 CPUs is limited (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1719

X2APIC (1 bug)
1. Sandy Bridge fails to boot under PAE with x2apic enabled (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1718

Windows (1 bug)
1. All Windows UP guests fail to boot (Intel)
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1704

Thanks & Regards,
Shaohui

end of thread, other threads:[~2011-01-26 14:38 UTC | newest]

Thread overview: 28+ messages
2011-01-22 16:37 Xen 4.1 rc1 test report Zheng, Shaohui
2011-01-22 16:54 ` Mike Viau
2011-01-23  2:34   ` Zheng, Shaohui
2011-01-24 19:00 ` Stefano Stabellini
2011-01-25  3:42   ` Zhang, Yang Z
2011-01-25 11:23     ` Stefano Stabellini
2011-01-25 16:01     ` Stefano Stabellini
2011-01-26  1:01       ` Zheng, Shaohui
2011-01-26  3:54       ` Zheng, Shaohui
2011-01-26  4:53       ` Zheng, Shaohui
2011-01-25 10:10   ` Pasi Kärkkäinen
2011-01-25 15:49   ` Stefano Stabellini
2011-01-26  3:56     ` Zheng, Shaohui
2011-01-25 14:05 ` Wei, Gang
2011-01-25 14:13   ` Keir Fraser
2011-01-25 15:49     ` Wei, Gang
2011-01-26  5:52   ` Wei, Gang
2011-01-26  6:31     ` Wei, Gang
2011-01-26  7:24       ` Keir Fraser
2011-01-26  7:34         ` Wei, Gang
2011-01-26 14:38           ` Wei, Gang
2011-01-26  7:28       ` [PATCH] Fix bug1706 Wei, Gang
     [not found] <Acu6GEBstpnTfIH/TdeQZvf0FjUZ0Q==>
2011-01-22  9:39 ` Xen 4.1 rc1 test report Zheng, Shaohui
2011-01-25  6:24   ` Haitao Shan
2011-01-25  8:00     ` Zheng, Shaohui
2011-01-25  8:43     ` Keir Fraser
2011-01-25 11:43     ` Ian Campbell
2011-01-26  0:47       ` Haitao Shan
