* IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?
From: Cui, Dexuan @ 2008-06-19  5:13 UTC
  To: Keir Fraser, xen-devel

Currently, when creating or destroying an HVM guest with assigned devices,
we perform FLR (Function Level Reset) for the devices in the hypervisor:
xen/drivers/passthrough/vtd/utils.c: pdev_flr().
The logic is:
a) if the device is a PCIe endpoint and it supports FLR, use that;
b) in all other cases, use a D3hot/D0 transition in place of FLR.

There are some issues:

1) It looks like few PCIe devices support FLR today, so almost all PCIe
devices and all PCI devices currently use the D3hot/D0 method. However,
a D-state transition is not guaranteed to properly clear the device
state;

2) In case a), the current implementation is buggy:
Transaction_Pending_bit==0 does not mean that FLR has completed. Checking
the bit is only a way to ensure there is no pending transaction when we
are about to issue FLR (so we can be sure there is no data corruption).
Also, according to the PCIe spec, we should wait at least 100 ms after
issuing FLR, but "mdelay(100)" is not acceptable in Xen...
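
To make the ordering in issue 2) concrete, here is a minimal Python sketch
of the sequence: check Transactions Pending first, then set Initiate FLR,
then wait 100 ms. It runs against an in-memory bytearray standing in for
config space, so it illustrates the order of operations only and is not the
pdev_flr() code. The register offsets follow the standard PCI Express
capability layout; the function names, the polling limit, and the choice to
issue FLR even if the bit never clears are assumptions made for
illustration.

```python
import struct
import time

PCI_STATUS = 0x06              # Status register
PCI_STATUS_CAP_LIST = 0x10     # capability list present
PCI_CAPABILITY_PTR = 0x34      # offset of the first capability
PCI_CAP_ID_EXP = 0x10          # PCI Express capability ID
PCI_EXP_DEVCAP = 0x04          # Device Capabilities, relative to the capability
PCI_EXP_DEVCAP_FLR = 1 << 28   # Function Level Reset supported
PCI_EXP_DEVCTL = 0x08          # Device Control
PCI_EXP_DEVCTL_FLR = 1 << 15   # Initiate Function Level Reset
PCI_EXP_DEVSTA = 0x0A          # Device Status
PCI_EXP_DEVSTA_TRPND = 1 << 5  # Transactions Pending


def read16(cfg, off):
    return struct.unpack_from("<H", cfg, off)[0]


def read32(cfg, off):
    return struct.unpack_from("<I", cfg, off)[0]


def write16(cfg, off, val):
    struct.pack_into("<H", cfg, off, val)


def find_capability(cfg, cap_id):
    """Walk the capability list; return the capability offset or None."""
    if not read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST:
        return None
    pos = cfg[PCI_CAPABILITY_PTR] & 0xFC
    for _ in range(48):            # bound the walk in case of a malformed list
        if pos == 0:
            return None
        if cfg[pos] == cap_id:
            return pos
        pos = cfg[pos + 1] & 0xFC
    return None


def pcie_flr(cfg, sleep=time.sleep, poll_limit=10):
    """Issue PCIe FLR; return False if the function does not support it."""
    cap = find_capability(cfg, PCI_CAP_ID_EXP)
    if cap is None or not read32(cfg, cap + PCI_EXP_DEVCAP) & PCI_EXP_DEVCAP_FLR:
        return False
    # Before FLR: wait for outstanding transactions to drain.
    # (Assumption: after poll_limit tries we go ahead anyway.)
    for _ in range(poll_limit):
        if not read16(cfg, cap + PCI_EXP_DEVSTA) & PCI_EXP_DEVSTA_TRPND:
            break
        sleep(0.01)
    ctl = read16(cfg, cap + PCI_EXP_DEVCTL)
    write16(cfg, cap + PCI_EXP_DEVCTL, ctl | PCI_EXP_DEVCTL_FLR)
    # After FLR: the 100 ms wait is what signals completion, not the bit above.
    sleep(0.1)
    return True
```

Passing sleep as a parameter keeps the delay policy outside the reset
routine, which is the part that is awkward to do inside the hypervisor.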

To resolve these issues, I propose changing the FLR logic to:

1) If the device is a PCIe endpoint and supports PCIe FLR, use that;
2) Else, if the device is a PCIe endpoint and all functions on the device
are assigned to the same guest, use the immediate parent bus's
"Secondary Bus Reset" to reset all functions of the device (that is, we
require all the functions of the device to be assigned to the same
guest);
3) Else, if the device is a PCI endpoint on a host bus (e.g. an
integrated device) and supports PCI "Advanced Capabilities", use that
for FLR;
4) Else, if the device is a vendor-integrated PCI device with a "known"
vendor/device ID, use the vendor-defined method of issuing FLR. For
instance, for VendorID=0x8086, we can use the method defined in the
Intel ICH9 Datasheet to perform FLR;
5) Else, use the "Secondary Bus Reset" (we must ensure that all the PCI
devices behind the bridge are assigned to the same guest).
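
The fallback order above can be written as a single selection function. The
following Python sketch only encodes the five steps as listed. The class,
field, and enum names are hypothetical, the vendor/device ID in the usage
note is a placeholder rather than a real ICH9 ID, and returning None when no
method is safe is an assumption about how a caller would want to see a
refusal.

```python
from dataclasses import dataclass
from enum import Enum


class ResetMethod(Enum):
    PCIE_FLR = 1                    # step 1
    PARENT_SECONDARY_BUS_RESET = 2  # step 2
    PCI_ADVANCED_CAP_FLR = 3        # step 3
    VENDOR_SPECIFIC = 4             # step 4
    BRIDGE_SECONDARY_BUS_RESET = 5  # step 5


@dataclass
class Device:
    # All fields are hypothetical inputs the toolstack would have to gather.
    is_pcie_endpoint: bool
    supports_pcie_flr: bool = False
    all_functions_same_guest: bool = False
    on_host_bus: bool = False
    supports_pci_advanced_cap: bool = False
    vendor_id: int = 0
    device_id: int = 0
    all_behind_bridge_same_guest: bool = False


def select_reset_method(dev, known_ids=frozenset()):
    """Return the first applicable ResetMethod, or None if none is safe."""
    if dev.is_pcie_endpoint and dev.supports_pcie_flr:
        return ResetMethod.PCIE_FLR
    if dev.is_pcie_endpoint and dev.all_functions_same_guest:
        return ResetMethod.PARENT_SECONDARY_BUS_RESET
    if (not dev.is_pcie_endpoint and dev.on_host_bus
            and dev.supports_pci_advanced_cap):
        return ResetMethod.PCI_ADVANCED_CAP_FLR
    if (dev.vendor_id, dev.device_id) in known_ids:
        return ResetMethod.VENDOR_SPECIFIC
    if dev.all_behind_bridge_same_guest:
        return ResetMethod.BRIDGE_SECONDARY_BUS_RESET
    # Assumption: refuse rather than reset devices owned by another guest.
    return None
```

For example, select_reset_method(Device(is_pcie_endpoint=False,
vendor_id=0x8086, device_id=0x1234), known_ids={(0x8086, 0x1234)}) picks
the vendor-specific path, with 0x1234 standing in for whatever IDs the
table would really contain.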

I also propose moving the FLR logic to the Control Panel.
The benefits are:
1) It is natural, and keeps the hypervisor thin;
2) The 100 ms delay can be implemented easily in the Control Panel, but
not easily in the hypervisor;
3) Some logic, such as looking up a device's parent BDF from its own
BDF, can be done more easily in the Control Panel.
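
As an illustration of benefit 3), on a Linux Dom0 the parent lookup can be
a few lines of Python, because each entry under /sys/bus/pci/devices is a
symlink into a directory tree that mirrors the bus hierarchy. This is a
sketch under that assumption; the function name is hypothetical, and sysfs
names carry a PCI domain prefix (e.g. "0000:03:00.0") in front of the plain
bus:dev.func form.

```python
import os
import re

# Matches sysfs device names of the form domain:bus:dev.func.
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def parent_bdf(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the BDF of the bridge above `bdf`, or None on a host bus."""
    # Resolve the symlink to the device's real location in the hierarchy.
    real = os.path.realpath(os.path.join(sysfs_root, bdf))
    # The enclosing directory is the parent bridge, unless it is a host
    # bridge node such as "pci0000:00", which does not look like a BDF.
    parent = os.path.basename(os.path.dirname(real))
    return parent if BDF_RE.match(parent) else None
```

A None result corresponds to the "device is on a host bus" case in step 3)
of the proposal.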

Comments are appreciated.

Thanks,
-- Dexuan


* RE: IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?
From: Cui, Dexuan @ 2008-06-20  3:19 UTC
  To: Keir Fraser, xen-devel

Hi Keir and all,
Do you think the proposed improvement to the FLR logic is OK, and also moving it to the Control Panel?
I'm going to make a patch based on this.

Thanks,
-- Dexuan



* Re: IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?
From: Yosuke Iwamatsu @ 2008-06-20  4:17 UTC
  To: Cui, Dexuan; +Cc: xen-devel, Keir Fraser

Hi,

The term 'Control Panel' is rather unfamiliar to me. Does it mean
qemu-dm for HVM guests?
I think pciback in the dom0 kernel would be the right place to do FLR,
because it is commonly used as the holder of pass-through PCI devices
for both PV and HVM guests. The drawback is that communication
between pciback and the dom0 userspace tools may become complicated. But
in general, it seems good to let the dom0 kernel control PCI devices.

Regards,
  -- Yosuke


* RE: IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?
From: Cui, Dexuan @ 2008-06-20  4:41 UTC
  To: Yosuke Iwamatsu; +Cc: xen-devel, Keir Fraser

Thanks for the comments.
qemu-dm is the Device Model. By Control Panel I mean Xend/libxc (and other necessary scripts).
I think pciback in Dom0 may not be the best place. Besides the drawback you mentioned, for the "Secondary Bus Reset", pciback doesn't own the bridge.
In the Control Panel, a Python script can access PCI config space easily via sysfs.
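
As a rough sketch of what such a script could look like for the "Secondary
Bus Reset" case: the sysfs "config" file of the bridge is opened read/write
and the Bridge Control register (offset 0x3E, bit 6) is toggled. The
function name is hypothetical, and the hold and settle times are
placeholder assumptions, not values taken from a spec. A real tool would
also have to save and restore the config space of the devices behind the
bridge, which this sketch does not do.

```python
import struct
import time

PCI_BRIDGE_CONTROL = 0x3E        # Bridge Control register (type 1 header)
PCI_BRIDGE_CTL_BUS_RESET = 0x40  # Secondary Bus Reset bit


def secondary_bus_reset(config_path, hold=0.1, settle=0.2, sleep=time.sleep):
    """Pulse Secondary Bus Reset; return the original Bridge Control value.

    config_path is the bridge's config file, e.g.
    /sys/bus/pci/devices/<bridge BDF>/config.
    """
    # Unbuffered, so each write reaches the file immediately.
    with open(config_path, "r+b", buffering=0) as f:
        f.seek(PCI_BRIDGE_CONTROL)
        (ctl,) = struct.unpack("<H", f.read(2))
        # Assert reset on the secondary bus.
        f.seek(PCI_BRIDGE_CONTROL)
        f.write(struct.pack("<H", ctl | PCI_BRIDGE_CTL_BUS_RESET))
        sleep(hold)
        # Deassert reset, then give the devices time to come back.
        f.seek(PCI_BRIDGE_CONTROL)
        f.write(struct.pack("<H", ctl & ~PCI_BRIDGE_CTL_BUS_RESET))
        sleep(settle)
    return ctl
```

Writing to a real bridge this way needs root in Dom0 and resets every
device on the secondary bus, hence the requirement that they all belong to
the same guest.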

Thanks,
-- Dexuan



* Re: IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?
From: Yosuke Iwamatsu @ 2008-06-20  5:45 UTC
  To: Cui, Dexuan; +Cc: xen-devel, Keir Fraser

Cui, Dexuan wrote:
> Thanks for the comments.
> qemu-dm is Device Model. I think Control Panel means Xend/libxc (and other necessary scripts). 
> I think pciback of Dom0 may be not the best place. Beside the drawback you mentioned, for the "Secondary Bus Reset", pciback doesn't own the bridge. 
> In Control Panel, Python script can access PCI config space easily via the sys filesystem.

I feel vaguely uneasy about allowing Python scripts to write to PCI
config space directly. Is that a common approach?

-- Yosuke


* Re: IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?
From: Keir Fraser @ 2008-06-20  9:10 UTC
  To: Yosuke Iwamatsu, Cui, Dexuan; +Cc: xen-devel

On 20/6/08 06:45, "Yosuke Iwamatsu" <y-iwamatsu@ab.jp.nec.com> wrote:

> Cui, Dexuan wrote:
>> Thanks for the comments.
>> qemu-dm is Device Model. I think Control Panel means Xend/libxc (and other
>> necessary scripts).
>> I think pciback of Dom0 may be not the best place. Beside the drawback you
>> mentioned, for the "Secondary Bus Reset", pciback doesn't own the bridge.
>> In Control Panel, Python script can access PCI config space easily via the
>> sys filesystem.
> 
> I vaguely feel uneasy about allowing python scripts to write in PCI
> config space directly. Is it a common way?

It has the benefit of being easy and flexible and not needing much
re-architecting of the code. pciback would indeed be a sensible place for
this functionality, and in general a good place for all guest PCI config
space accesses to pass through (whether HVM or PV). It gives us more
consistency in implementation between PV and HVM, and it is a more stable,
stateful and controlled environment than any user-space daemon, let alone a
Python script!

But personally I'm happy doing it in userspace for now at least.

 -- Keir
