All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: Resend: RE: enable_ats_device() call site
       [not found] <4E4E48360200007800051F65@nat28.tlf.novell.com>
@ 2011-08-22 22:01 ` Kay, Allen M
  2011-08-23  8:27   ` Jan Beulich
       [not found] ` <987664A83D2D224EAE907B061CE93D5301EE709B5E@orsmsx505.amr.corp.intel.com>
  1 sibling, 1 reply; 14+ messages in thread
From: Kay, Allen M @ 2011-08-22 22:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen Devel

>And why is it VT-d specific then? The problem to solve is that enabling
>may not happen when it is first attempted, in the case where Xen on its
>own can't be certain that using MMCFG is safe. Hence when the device
>gets reported by Dom0 (or when MMCFG gets enabled for the respective
>bus), another attempt needs to be made at enabling it. De-assigning and
>then re-assigning the device to Dom0 seems to be overkill to me.

The part that is VT-d specific is ATS capability reporting in ACPI DMAR
table for reporting ATS capability in root ports.  I not so clear on what
do you mean by making another attempt when initial ATS enabling attempt fails.
Do you have a patch to address this issue?

Allen

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Resend: RE: enable_ats_device() call site
  2011-08-22 22:01 ` Resend: RE: enable_ats_device() call site Kay, Allen M
@ 2011-08-23  8:27   ` Jan Beulich
  2011-09-20 13:56     ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2011-08-23  8:27 UTC (permalink / raw)
  To: Allen M Kay; +Cc: Xen Devel

>>> On 23.08.11 at 00:01, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>> And why is it VT-d specific then? The problem to solve is that enabling
>>may not happen when it is first attempted, in the case where Xen on its
>>own can't be certain that using MMCFG is safe. Hence when the device
>>gets reported by Dom0 (or when MMCFG gets enabled for the respective
>>bus), another attempt needs to be made at enabling it. De-assigning and
>>then re-assigning the device to Dom0 seems to be overkill to me.
> 
> The part that is VT-d specific is ATS capability reporting in ACPI DMAR
> table for reporting ATS capability in root ports.  I not so clear on what
> do you mean by making another attempt when initial ATS enabling attempt 
> fails.

The initial enabling may fail because of the unavailability of MMCFG
accesses (due to the memory range(s) used not being reserved in E820,
which is not required to be the case by the spec). Once MMCFG gets
enabled, this needs to be re-attempted at some point, and the logical
point appears to be when the device gets re-registered by Dom0.

> Do you have a patch to address this issue?

Only for the (trivial) ACS part so far (see below).

There's also a second aspect here: the bus-to-bridge mappings
currently get established just once, during boot. This is already
hot(un)plug incompatible, but will become even more of a problem
when multi-segment support gets in (I should be sending out patches
soon), for the same MMCFG-initially-unavailable reason as outlined
above (as in that case non-zero segments can't be scanned at boot).

Jan

--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -350,9 +350,10 @@ int pci_add_device(u16 seg, u8 bus, u8 d
         }
 
         list_add(&pdev->domain_list, &dom0->arch.pdev_list);
-        pci_enable_acs(pdev);
     }
 
+    pci_enable_acs(pdev);
+
 out:
     spin_unlock(&pcidevs_lock);
     printk(XENLOG_DEBUG "PCI add %s %04x:%02x:%02x.%u\n", pdev_type,

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Resend: RE: enable_ats_device() call site
  2011-08-23  8:27   ` Jan Beulich
@ 2011-09-20 13:56     ` Jan Beulich
  2011-09-20 18:02       ` Kay, Allen M
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2011-09-20 13:56 UTC (permalink / raw)
  To: Allen M Kay; +Cc: Xen Devel

Ping?

>>> On 23.08.11 at 10:27, "Jan Beulich" <JBeulich@novell.com> wrote:
>>>> On 23.08.11 at 00:01, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>>> And why is it VT-d specific then? The problem to solve is that enabling
>>>may not happen when it is first attempted, in the case where Xen on its
>>>own can't be certain that using MMCFG is safe. Hence when the device
>>>gets reported by Dom0 (or when MMCFG gets enabled for the respective
>>>bus), another attempt needs to be made at enabling it. De-assigning and
>>>then re-assigning the device to Dom0 seems to be overkill to me.
>> 
>> The part that is VT-d specific is ATS capability reporting in ACPI DMAR
>> table for reporting ATS capability in root ports.  I not so clear on what
>> do you mean by making another attempt when initial ATS enabling attempt 
>> fails.
> 
> The initial enabling may fail because of the unavailability of MMCFG
> accesses (due to the memory range(s) used not being reserved in E820,
> which is not required to be the case by the spec). Once MMCFG gets
> enabled, this needs to be re-attempted at some point, and the logical
> point appears to be when the device gets re-registered by Dom0.
> 
>> Do you have a patch to address this issue?
> 
> Only for the (trivial) ACS part so far (see below).
> 
> There's also a second aspect here: the bus-to-bridge mappings
> currently get established just once, during boot. This is already
> hot(un)plug incompatible, but will become even more of a problem
> when multi-segment support gets in (I should be sending out patches
> soon), for the same MMCFG-initially-unavailable reason as outlined
> above (as in that case non-zero segments can't be scanned at boot).
> 
> Jan
> 
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -350,9 +350,10 @@ int pci_add_device(u16 seg, u8 bus, u8 d
>          }
>  
>          list_add(&pdev->domain_list, &dom0->arch.pdev_list);
> -        pci_enable_acs(pdev);
>      }
>  
> +    pci_enable_acs(pdev);
> +
>  out:
>      spin_unlock(&pcidevs_lock);
>      printk(XENLOG_DEBUG "PCI add %s %04x:%02x:%02x.%u\n", pdev_type,
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com 
> http://lists.xensource.com/xen-devel 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: RE: Resend: RE: enable_ats_device() call site
  2011-09-20 13:56     ` Jan Beulich
@ 2011-09-20 18:02       ` Kay, Allen M
  2011-09-21  6:45         ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Kay, Allen M @ 2011-09-20 18:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen Devel

Jan,

Thanks for the reminder.  This has slip through the cracks.  The patch look good.  With this patch, we shouldn't need the existing call to pci_enable_acs() in setup_dom0_devices(), correct?

I agree that the same problem exists for bus-to-bridge mapping function pci_scan_device().  By the way, we probably should take the opportunity to give it a proper name that reflects to the true purpose of this function.

Allen

-----Original Message-----
From: Jan Beulich [mailto:JBeulich@suse.com] 
Sent: Tuesday, September 20, 2011 6:57 AM
To: Kay, Allen M
Cc: Xen Devel
Subject: [Xen-devel] RE: Resend: RE: enable_ats_device() call site

Ping?

>>> On 23.08.11 at 10:27, "Jan Beulich" <JBeulich@novell.com> wrote:
>>>> On 23.08.11 at 00:01, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>>> And why is it VT-d specific then? The problem to solve is that enabling
>>>may not happen when it is first attempted, in the case where Xen on its
>>>own can't be certain that using MMCFG is safe. Hence when the device
>>>gets reported by Dom0 (or when MMCFG gets enabled for the respective
>>>bus), another attempt needs to be made at enabling it. De-assigning and
>>>then re-assigning the device to Dom0 seems to be overkill to me.
>> 
>> The part that is VT-d specific is ATS capability reporting in ACPI DMAR
>> table for reporting ATS capability in root ports.  I not so clear on what
>> do you mean by making another attempt when initial ATS enabling attempt 
>> fails.
> 
> The initial enabling may fail because of the unavailability of MMCFG
> accesses (due to the memory range(s) used not being reserved in E820,
> which is not required to be the case by the spec). Once MMCFG gets
> enabled, this needs to be re-attempted at some point, and the logical
> point appears to be when the device gets re-registered by Dom0.
> 
>> Do you have a patch to address this issue?
> 
> Only for the (trivial) ACS part so far (see below).
> 
> There's also a second aspect here: the bus-to-bridge mappings
> currently get established just once, during boot. This is already
> hot(un)plug incompatible, but will become even more of a problem
> when multi-segment support gets in (I should be sending out patches
> soon), for the same MMCFG-initially-unavailable reason as outlined
> above (as in that case non-zero segments can't be scanned at boot).
> 
> Jan
> 
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -350,9 +350,10 @@ int pci_add_device(u16 seg, u8 bus, u8 d
>          }
>  
>          list_add(&pdev->domain_list, &dom0->arch.pdev_list);
> -        pci_enable_acs(pdev);
>      }
>  
> +    pci_enable_acs(pdev);
> +
>  out:
>      spin_unlock(&pcidevs_lock);
>      printk(XENLOG_DEBUG "PCI add %s %04x:%02x:%02x.%u\n", pdev_type,
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com 
> http://lists.xensource.com/xen-devel 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: RE: Resend: RE: enable_ats_device() call site
  2011-09-20 18:02       ` Kay, Allen M
@ 2011-09-21  6:45         ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2011-09-21  6:45 UTC (permalink / raw)
  To: Allen M Kay; +Cc: Xen Devel

>>> On 20.09.11 at 20:02, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
> Thanks for the reminder.  This has slip through the cracks.  The patch look 
> good.  With this patch, we shouldn't need the existing call to 
> pci_enable_acs() in setup_dom0_devices(), correct?

Ah, indeed, I didn't even pay attention to this.

But you didn't say anything about the subject of the mail, which is
what keeps me from pushing the (incomplete) patch.

> I agree that the same problem exists for bus-to-bridge mapping function 
> pci_scan_device().  By the way, we probably should take the opportunity to 
> give it a proper name that reflects to the true purpose of this function.

It's not that far off, if you mean scan_pci_devices() (and
_scan_pci_devices() once the multi-seg patches get committed).
What alternative name were you thinking of?

Also, if the bus2bridge data got maintained dynamically (i.e. out of
pci_{add,remove}_device()), I would think these functions could go
away again (and hence renaming them would become mute).

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Resend: RE: enable_ats_device() call site
       [not found] ` <987664A83D2D224EAE907B061CE93D5301EE709B5E@orsmsx505.amr.corp.intel.com>
@ 2011-10-07  7:38   ` Jan Beulich
  2011-10-08  2:09     ` Kay, Allen M
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2011-10-07  7:38 UTC (permalink / raw)
  To: Allen M Kay; +Cc: xen-devel

>>> On 07.10.11 at 04:19, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
> Currently I three places where enable_ats_device() can be eventually called 
> because of it is in domain_context_mapping():
> 
> 1) setup_dom0_devices()
> 2) intel_iommu_add_device()
> 3) reassign_device_ownership()
> 
> Calling it in the first two is probably all we need to cover all the 
> devices.   I don't think we need to call it again in 
> reassign_device_ownership().

That would mean that a device that was found during Xen's initial scan
(i.e. pdev->domain set due to it having gone through
setup_dom0_pci_devices()), but for which enable_ats_device() was
unsuccessful due to mmcfg access still being impossible at that point,
would never get ATS enabled. But that's the whole point of the thread
here. My question isn't whether to *re*move call sites, but whether it
would be possible to move them elsewhere. For which I'd like to
understand why this is being done in the places it is now (not the
least why this is done in VT-d specific code in the first place).

Just like in the suggested change to how pci_enable_acs() gets called,
this should really happen from pci_add_device() *without* regard to
whether pdev->domain was already set.

Also, earlier you suggested to remove the call to pci_enable_acs() from
setup_dom0_device() - I'm not convinced anymore that this is correct,
since old Dom0 kernels (forward ports from the 2.6.18 tree up to pretty
recently) can't be relied upon to report all PCI devices to Xen. Which
also suggests that we shouldn't really remove scan_pci_devices()
(although it may be possible to adjust it back to scan only segment 0).
Otoh, when mmcfg isn't available early, on such Dom0 kernels ATS
wouldn't get enabled today either, even with the adjusted call site of
pci_enable_acs().

Consequently, another alternative would be to retry ATS and ACS
enabling when mmcfg becomes available for a certain bus range, i.e.
out of pci_mmcfg_reserved().

Jan

> By the way, due to the lack of production ATS devices, we have not tried ATS 
> for quite a while.  I'm not sure we should make the change now or should we 
> just make a note of it in reassign_device_ownership() for now.
> 
> Allen
> 
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@novell.com] 
> Sent: Friday, August 19, 2011 2:26 AM
> To: Kay, Allen M
> Subject: Resend: RE: enable_ats_device() call site
> 
> (for some reason the first send to you bounced - please re-add xen-devel as cc 
> if you reply to this one)
> 
>>>> On 18.08.11 at 01:27, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>>>  what is the reason for calling this from VT-d's domain_context_mapping()?
>>> I neither undertsand why this is VT-d specific, nor why it needs to be
>>> re-done with each device re-assignment.
>> 
>> The reason is FLR clears the ATS enabled bit so we need to re-enable it for 
>> every re-assignment.  The reason we don't need to do this for ACS might be 
> ACS 
>> reside on the bridge, not in the PCI endpoint.  ATS on the other hand, 
>> resides in PCI endpoints.
> 
> And why is it VT-d specific then? The problem to solve is that enabling
> may not happen when it is first attempted, in the case where Xen on its
> own can't be certain that using MMCFG is safe. Hence when the device
> gets reported by Dom0 (or when MMCFG gets enabled for the respective
> bus), another attempt needs to be made at enabling it. De-assigning and
> then re-assigning the device to Dom0 seems to be overkill to me.
> 
>>> Alternatively - why do we need scan_pci_devices() at all? We're
>>> supposed to be getting the devices reported from Dom0 anyway
>> 
>> Looks like it is use for building bus2bridge[] which is used for figuring 
>> out upstream bridges which are needed when assigning non-PCIe devices.
> 
> Oh, right, I keep forgetting that, especially as that puts under question
> why we have Dom0 report non-extfn, non-virtfn devices at all. And
> perhaps we should issue a warning if Dom0 reports such a device that
> we didn't know about already?
> 
> Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Resend: RE: enable_ats_device() call site
  2011-10-07  7:38   ` Jan Beulich
@ 2011-10-08  2:09     ` Kay, Allen M
  2011-10-11 12:54       ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Kay, Allen M @ 2011-10-08  2:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

>For which I'd like to understand why this is being done in the places it is now
>(not the least why this is done in VT-d specific code in the first place).

The reason it is call by reassign_device_ownership() is because FLR clears ATS enabling bit on the device - I forgot about it when I wrote the last email so we still need to re-enable ATS on the device for each device assignment.  To summarize:

1) Reason for difference in ATS and ACS handling
    a. ATS capability is in the PCIe endpoint - enabling bit is cleared by device FLR on the passthrough device.
    b. ACS capability is in the PCIe switch - not affected by FLR on the passthrough device.

2) ATS enabling requirement
    a. VT-d engine serving the device has to be ATS capable.
    b. device has to be ATS capable

Allen

-----Original Message-----
From: Jan Beulich [mailto:JBeulich@suse.com] 
Sent: Friday, October 07, 2011 12:39 AM
To: Kay, Allen M
Cc: xen-devel@lists.xensource.com
Subject: RE: Resend: RE: enable_ats_device() call site

>>> On 07.10.11 at 04:19, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
> Currently I three places where enable_ats_device() can be eventually called 
> because of it is in domain_context_mapping():
> 
> 1) setup_dom0_devices()
> 2) intel_iommu_add_device()
> 3) reassign_device_ownership()
> 
> Calling it in the first two is probably all we need to cover all the 
> devices.   I don't think we need to call it again in 
> reassign_device_ownership().

That would mean that a device that was found during Xen's initial scan
(i.e. pdev->domain set due to it having gone through
setup_dom0_pci_devices()), but for which enable_ats_device() was
unsuccessful due to mmcfg access still being impossible at that point,
would never get ATS enabled. But that's the whole point of the thread
here. My question isn't whether to *re*move call sites, but whether it
would be possible to move them elsewhere. For which I'd like to
understand why this is being done in the places it is now (not the
least why this is done in VT-d specific code in the first place).

Just like in the suggested change to how pci_enable_acs() gets called,
this should really happen from pci_add_device() *without* regard to
whether pdev->domain was already set.

Also, earlier you suggested to remove the call to pci_enable_acs() from
setup_dom0_device() - I'm not convinced anymore that this is correct,
since old Dom0 kernels (forward ports from the 2.6.18 tree up to pretty
recently) can't be relied upon to report all PCI devices to Xen. Which
also suggests that we shouldn't really remove scan_pci_devices()
(although it may be possible to adjust it back to scan only segment 0).
Otoh, when mmcfg isn't available early, on such Dom0 kernels ATS
wouldn't get enabled today either, even with the adjusted call site of
pci_enable_acs().

Consequently, another alternative would be to retry ATS and ACS
enabling when mmcfg becomes available for a certain bus range, i.e.
out of pci_mmcfg_reserved().

Jan

> By the way, due to the lack of production ATS devices, we have not tried ATS 
> for quite a while.  I'm not sure we should make the change now or should we 
> just make a note of it in reassign_device_ownership() for now.
> 
> Allen
> 
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@novell.com] 
> Sent: Friday, August 19, 2011 2:26 AM
> To: Kay, Allen M
> Subject: Resend: RE: enable_ats_device() call site
> 
> (for some reason the first send to you bounced - please re-add xen-devel as cc 
> if you reply to this one)
> 
>>>> On 18.08.11 at 01:27, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>>>  what is the reason for calling this from VT-d's domain_context_mapping()?
>>> I neither undertsand why this is VT-d specific, nor why it needs to be
>>> re-done with each device re-assignment.
>> 
>> The reason is FLR clears the ATS enabled bit so we need to re-enable it for 
>> every re-assignment.  The reason we don't need to do this for ACS might be 
> ACS 
>> reside on the bridge, not in the PCI endpoint.  ATS on the other hand, 
>> resides in PCI endpoints.
> 
> And why is it VT-d specific then? The problem to solve is that enabling
> may not happen when it is first attempted, in the case where Xen on its
> own can't be certain that using MMCFG is safe. Hence when the device
> gets reported by Dom0 (or when MMCFG gets enabled for the respective
> bus), another attempt needs to be made at enabling it. De-assigning and
> then re-assigning the device to Dom0 seems to be overkill to me.
> 
>>> Alternatively - why do we need scan_pci_devices() at all? We're
>>> supposed to be getting the devices reported from Dom0 anyway
>> 
>> Looks like it is use for building bus2bridge[] which is used for figuring 
>> out upstream bridges which are needed when assigning non-PCIe devices.
> 
> Oh, right, I keep forgetting that, especially as that puts under question
> why we have Dom0 report non-extfn, non-virtfn devices at all. And
> perhaps we should issue a warning if Dom0 reports such a device that
> we didn't know about already?
> 
> Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Resend: RE: enable_ats_device() call site
  2011-10-08  2:09     ` Kay, Allen M
@ 2011-10-11 12:54       ` Jan Beulich
  2011-10-18 22:46         ` Kay, Allen M
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2011-10-11 12:54 UTC (permalink / raw)
  To: Allen M Kay; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 4890 bytes --]

>>> On 08.10.11 at 04:09, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>> For which I'd like to understand why this is being done in the places it is 
> now
>>(not the least why this is done in VT-d specific code in the first place).
> 
> The reason it is call by reassign_device_ownership() is because FLR clears 
> ATS enabling bit on the device - I forgot about it when I wrote the last email 
> so we still need to re-enable ATS on the device for each device assignment.  
> To summarize:
> 
> 1) Reason for difference in ATS and ACS handling
>     a. ATS capability is in the PCIe endpoint - enabling bit is cleared by 
> device FLR on the passthrough device.
>     b. ACS capability is in the PCIe switch - not affected by FLR on the 
> passthrough device.
> 
> 2) ATS enabling requirement
>     a. VT-d engine serving the device has to be ATS capable.
>     b. device has to be ATS capable

Okay, so how about the below then (with an attached prerequisite
cleanup patch)?

Jan

--- 2011-09-20.orig/xen/drivers/passthrough/iommu.c
+++ 2011-09-20/xen/drivers/passthrough/iommu.c
@@ -150,6 +150,23 @@ int iommu_add_device(struct pci_dev *pde
     return hd->platform_ops->add_device(pdev);
 }
 
+int iommu_enable_device(struct pci_dev *pdev)
+{
+    struct hvm_iommu *hd;
+
+    if ( !pdev->domain )
+        return -EINVAL;
+
+    ASSERT(spin_is_locked(&pcidevs_lock));
+
+    hd = domain_hvm_iommu(pdev->domain);
+    if ( !iommu_enabled || !hd->platform_ops ||
+         !hd->platform_ops->enable_device )
+        return 0;
+
+    return hd->platform_ops->enable_device(pdev);
+}
+
 int iommu_remove_device(struct pci_dev *pdev)
 {
     struct hvm_iommu *hd;
--- 2011-09-20.orig/xen/drivers/passthrough/pci.c
+++ 2011-09-20/xen/drivers/passthrough/pci.c
@@ -258,7 +258,7 @@ struct pci_dev *pci_get_pdev_by_domain(
  * pci_enable_acs - enable ACS if hardware support it
  * @dev: the PCI device
  */
-void pci_enable_acs(struct pci_dev *pdev)
+static void pci_enable_acs(struct pci_dev *pdev)
 {
     int pos;
     u16 cap, ctrl, seg = pdev->seg;
@@ -409,8 +409,11 @@ int pci_add_device(u16 seg, u8 bus, u8 d
         }
 
         list_add(&pdev->domain_list, &dom0->arch.pdev_list);
-        pci_enable_acs(pdev);
     }
+    else
+        iommu_enable_device(pdev);
+
+    pci_enable_acs(pdev);
 
 out:
     spin_unlock(&pcidevs_lock);
--- 2011-09-20.orig/xen/drivers/passthrough/vtd/iommu.c
+++ 2011-09-20/xen/drivers/passthrough/vtd/iommu.c
@@ -1901,6 +1901,19 @@ static int intel_iommu_add_device(struct
     return ret;
 }
 
+static int intel_iommu_enable_device(struct pci_dev *pdev)
+{
+    struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
+    int ret = drhd ? ats_device(pdev, drhd) : -ENODEV;
+
+    if ( ret <= 0 )
+        return ret;
+
+    ret = enable_ats_device(pdev->seg, pdev->bus, pdev->devfn);
+
+    return ret >= 0 ? 0 : ret;
+}
+
 static int intel_iommu_remove_device(struct pci_dev *pdev)
 {
     struct acpi_rmrr_unit *rmrr;
@@ -1931,7 +1944,6 @@ static int intel_iommu_remove_device(str
 static void __init setup_dom0_device(struct pci_dev *pdev)
 {
     domain_context_mapping(pdev->domain, pdev->seg, pdev->bus, pdev->devfn);
-    pci_enable_acs(pdev);
     pci_vtd_quirk(pdev);
 }
 
@@ -2302,6 +2314,7 @@ const struct iommu_ops intel_iommu_ops =
     .init = intel_iommu_domain_init,
     .dom0_init = intel_iommu_dom0_init,
     .add_device = intel_iommu_add_device,
+    .enable_device = intel_iommu_enable_device,
     .remove_device = intel_iommu_remove_device,
     .assign_device  = intel_iommu_assign_device,
     .teardown = iommu_domain_teardown,
--- 2011-09-20.orig/xen/include/xen/iommu.h
+++ 2011-09-20/xen/include/xen/iommu.h
@@ -70,6 +70,7 @@ int iommu_enable_x2apic_IR(void);
 void iommu_disable_x2apic_IR(void);
 
 int iommu_add_device(struct pci_dev *pdev);
+int iommu_enable_device(struct pci_dev *pdev);
 int iommu_remove_device(struct pci_dev *pdev);
 int iommu_domain_init(struct domain *d);
 void iommu_dom0_init(struct domain *d);
@@ -120,6 +121,7 @@ struct iommu_ops {
     int (*init)(struct domain *d);
     void (*dom0_init)(struct domain *d);
     int (*add_device)(struct pci_dev *pdev);
+    int (*enable_device)(struct pci_dev *pdev);
     int (*remove_device)(struct pci_dev *pdev);
     int (*assign_device)(struct domain *d, u16 seg, u8 bus, u8 devfn);
     void (*teardown)(struct domain *d);
--- 2011-09-20.orig/xen/include/xen/pci.h
+++ 2011-09-20/xen/include/xen/pci.h
@@ -134,6 +134,5 @@ struct pirq;
 int msixtbl_pt_register(struct domain *, struct pirq *, uint64_t gtable);
 void msixtbl_pt_unregister(struct domain *, struct pirq *);
 void msixtbl_pt_cleanup(struct domain *d);
-void pci_enable_acs(struct pci_dev *pdev);
 
 #endif /* __XEN_PCI_H__ */


[-- Attachment #2: vtd-ats-cleanup.patch --]
[-- Type: text/plain, Size: 5981 bytes --]

--- 2011-09-20.orig/xen/drivers/passthrough/vtd/dmar.c	2011-09-21 17:55:19.000000000 +0200
+++ 2011-09-20/xen/drivers/passthrough/vtd/dmar.c	2011-10-11 13:35:05.000000000 +0200
@@ -160,7 +160,7 @@ static int __init acpi_register_atsr_uni
     return 0;
 }
 
-struct acpi_drhd_unit * acpi_find_matched_drhd_unit(struct pci_dev *pdev)
+struct acpi_drhd_unit *acpi_find_matched_drhd_unit(const struct pci_dev *pdev)
 {
     u8 bus, devfn;
     struct acpi_drhd_unit *drhd;
@@ -204,19 +204,26 @@ struct acpi_drhd_unit * acpi_find_matche
     return include_all;
 }
 
-struct acpi_atsr_unit * acpi_find_matched_atsr_unit(u16 seg, u8 bus, u8 devfn)
+struct acpi_atsr_unit *acpi_find_matched_atsr_unit(const struct pci_dev *pdev)
 {
     struct acpi_atsr_unit *atsr;
     struct acpi_atsr_unit *all_ports = NULL;
+    u16 bdf = PCI_BDF2(pdev->bus, pdev->devfn);
 
     list_for_each_entry ( atsr, &acpi_atsr_units, list )
     {
-        if ( atsr->segment != seg )
+        unsigned int i;
+
+        if ( atsr->segment != pdev->seg )
             continue;
 
-        if ( test_bit(bus, atsr->scope.buses) )
+        if ( test_bit(pdev->bus, atsr->scope.buses) )
             return atsr;
 
+        for ( i = 0; i < atsr->scope.devices_cnt; ++i )
+            if ( atsr->scope.devices[i] == bdf )
+                return atsr;
+
         if ( atsr->all_ports )
             all_ports = atsr;
     }
--- 2011-09-20.orig/xen/drivers/passthrough/vtd/dmar.h	2011-08-25 15:06:43.000000000 +0200
+++ 2011-09-20/xen/drivers/passthrough/vtd/dmar.h	2011-10-11 13:30:30.000000000 +0200
@@ -86,8 +86,8 @@ struct acpi_rhsa_unit {
         for (idx = 0; (bdf = rmrr->scope.devices[idx]) && \
                  idx < rmrr->scope.devices_cnt; idx++)
 
-struct acpi_drhd_unit * acpi_find_matched_drhd_unit(struct pci_dev *pdev);
-struct acpi_atsr_unit * acpi_find_matched_atsr_unit(u16 seg, u8 bus, u8 devfn);
+struct acpi_drhd_unit *acpi_find_matched_drhd_unit(const struct pci_dev *);
+struct acpi_atsr_unit *acpi_find_matched_atsr_unit(const struct pci_dev *);
 
 #define DMAR_TYPE 1
 #define RMRR_TYPE 2
--- 2011-09-20.orig/xen/drivers/passthrough/vtd/extern.h	2011-04-11 08:33:59.000000000 +0200
+++ 2011-09-20/xen/drivers/passthrough/vtd/extern.h	2011-10-11 13:56:00.000000000 +0200
@@ -61,10 +61,9 @@ extern bool_t ats_enabled;
 
 struct acpi_drhd_unit * find_ats_dev_drhd(struct iommu *iommu);
 
-int ats_device(int seg, int bus, int devfn);
+int ats_device(const struct pci_dev *, const struct acpi_drhd_unit *);
 int enable_ats_device(int seg, int bus, int devfn);
 void disable_ats_device(int seg, int bus, int devfn);
-int invalidate_ats_tcs(struct iommu *iommu);
 
 int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
                          u64 addr, unsigned int size_order, u64 type);
@@ -76,7 +75,8 @@ static inline struct acpi_drhd_unit *fin
     return NULL;
 }
 
-static inline int ats_device(int seg, int bus, int devfn)
+static inline int ats_device(const struct pci_dev *pdev,
+                             const struct acpi_drhd_unit *drhd)
 {
     return 0;
 }
--- 2011-09-20.orig/xen/drivers/passthrough/vtd/iommu.c	2011-09-21 17:56:24.000000000 +0200
+++ 2011-09-20/xen/drivers/passthrough/vtd/iommu.c	2011-10-11 14:03:23.000000000 +0200
@@ -1411,7 +1411,7 @@ static int domain_context_mapping(
                     domain->domain_id, seg, bus,
                     PCI_SLOT(devfn), PCI_FUNC(devfn));
         ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn);
-        if ( !ret && ats_device(seg, bus, devfn) )
+        if ( !ret && ats_device(pdev, drhd) > 0 )
             enable_ats_device(seg, bus, devfn);
 
         break;
@@ -1542,7 +1542,7 @@ static int domain_context_unmap(
                     domain->domain_id, seg, bus,
                     PCI_SLOT(devfn), PCI_FUNC(devfn));
         ret = domain_context_unmap_one(domain, iommu, bus, devfn);
-        if ( !ret && ats_device(seg, bus, devfn) )
+        if ( !ret && ats_device(pdev, drhd) > 0 )
             disable_ats_device(seg, bus, devfn);
 
         break;
--- 2011-09-20.orig/xen/drivers/passthrough/vtd/x86/ats.c	2011-08-25 15:13:05.000000000 +0200
+++ 2011-09-20/xen/drivers/passthrough/vtd/x86/ats.c	2011-10-11 14:03:00.000000000 +0200
@@ -83,40 +83,32 @@ struct acpi_drhd_unit * find_ats_dev_drh
     return NULL;
 }
 
-int ats_device(int seg, int bus, int devfn)
+int ats_device(const struct pci_dev *pdev, const struct acpi_drhd_unit *drhd)
 {
-    struct acpi_drhd_unit *drhd, *ats_drhd, *new_drhd;
-    struct pci_dev *pdev;
-    int pos = 0;
+    struct acpi_drhd_unit *ats_drhd;
+    int pos;
 
     if ( !ats_enabled || !iommu_qinval )
         return 0;
 
-    pdev = pci_get_pdev(seg, bus, devfn);
-    if ( !pdev )
-        return 0;
-
-    drhd = acpi_find_matched_drhd_unit(pdev);
-    if ( !drhd )
-        return 0;
-
     if ( !ecap_queued_inval(drhd->iommu->ecap) ||
          !ecap_dev_iotlb(drhd->iommu->ecap) )
         return 0;
 
-    if ( !acpi_find_matched_atsr_unit(seg, bus, devfn) )
+    if ( !acpi_find_matched_atsr_unit(pdev) )
         return 0;
 
     ats_drhd = find_ats_dev_drhd(drhd->iommu);
-    pos = pci_find_ext_capability(seg, bus, devfn, PCI_EXT_CAP_ID_ATS);
+    pos = pci_find_ext_capability(pdev->seg, pdev->bus, pdev->devfn,
+                                  PCI_EXT_CAP_ID_ATS);
 
     if ( pos && (ats_drhd == NULL) )
     {
-        new_drhd = xmalloc(struct acpi_drhd_unit);
-        if ( !new_drhd )
-            return 0;
-        memcpy(new_drhd, drhd, sizeof(struct acpi_drhd_unit));
-        list_add_tail(&new_drhd->list, &ats_dev_drhd_units);
+        ats_drhd = xmalloc(struct acpi_drhd_unit);
+        if ( !ats_drhd )
+            return -ENOMEM;
+        *ats_drhd = *drhd;
+        list_add_tail(&ats_drhd->list, &ats_dev_drhd_units);
     }
     return pos;
 }

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Resend: RE: enable_ats_device() call site
  2011-10-11 12:54       ` Jan Beulich
@ 2011-10-18 22:46         ` Kay, Allen M
  2011-10-19  6:47           ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Kay, Allen M @ 2011-10-18 22:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tian, Kevin, xen-devel, Shan, Haitao, Dugger, Donald D

Hi Jan,

Sorry for the late reply, I was trying to close something on another project.  I have the following questions on the patches after reviewing the paches:

1) In acpi_find_matched_atsr_unit(), you added following code.  The original code only tries to match the bus number.  What is the purpose of this new additional code? Does it fix a problem on one of your systems?

+        for ( i = 0; i < atsr->scope.devices_cnt; ++i )
+            if ( atsr->scope.devices[i] == bdf )
+                return atsr;

2)  In pci_add_device() function, the original code calls pci_enable_acs() only if pdev->domain is not set.  The new code calls pci_enable_acs() unconditionally, potentially more than once?  What is the reason for the change?

3) In the same pci_add_device() function, the new code now also calls iommu_enable_device() which currently calls enable_ats_device().  This means the new code will enable ATS as it is being discovered by the platform.  However, I did not see any code that removing enable_ats_device() call in domain_context_mapping().  Is this the intention?  If so, what is the reason?  I see the reason the original code is still needed but I don't see why we need to call enable_ats_device() during platform device discovery since the enabling bit will get cleared by FLR.

Allen


-----Original Message-----
From: Jan Beulich [mailto:JBeulich@suse.com] 
Sent: Tuesday, October 11, 2011 5:54 AM
To: Kay, Allen M
Cc: xen-devel@lists.xensource.com
Subject: RE: Resend: RE: enable_ats_device() call site

>>> On 08.10.11 at 04:09, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>> For which I'd like to understand why this is being done in the places 
>> it is
> now
>>(not the least why this is done in VT-d specific code in the first place).
> 
> The reason it is call by reassign_device_ownership() is because FLR 
> clears ATS enabling bit on the device - I forgot about it when I wrote 
> the last email so we still need to re-enable ATS on the device for each device assignment.
> To summarize:
> 
> 1) Reason for difference in ATS and ACS handling
>     a. ATS capability is in the PCIe endpoint - enabling bit is 
> cleared by device FLR on the passthrough device.
>     b. ACS capability is in the PCIe switch - not affected by FLR on 
> the passthrough device.
> 
> 2) ATS enabling requirement
>     a. VT-d engine serving the device has to be ATS capable.
>     b. device has to be ATS capable

Okay, so how about the below then (with an attached prerequisite cleanup patch)?

Jan

--- 2011-09-20.orig/xen/drivers/passthrough/iommu.c
+++ 2011-09-20/xen/drivers/passthrough/iommu.c
@@ -150,6 +150,23 @@ int iommu_add_device(struct pci_dev *pde
     return hd->platform_ops->add_device(pdev);
 }
 
+int iommu_enable_device(struct pci_dev *pdev) {
+    struct hvm_iommu *hd;
+
+    if ( !pdev->domain )
+        return -EINVAL;
+
+    ASSERT(spin_is_locked(&pcidevs_lock));
+
+    hd = domain_hvm_iommu(pdev->domain);
+    if ( !iommu_enabled || !hd->platform_ops ||
+         !hd->platform_ops->enable_device )
+        return 0;
+
+    return hd->platform_ops->enable_device(pdev);
+}
+
 int iommu_remove_device(struct pci_dev *pdev)  {
     struct hvm_iommu *hd;
--- 2011-09-20.orig/xen/drivers/passthrough/pci.c
+++ 2011-09-20/xen/drivers/passthrough/pci.c
@@ -258,7 +258,7 @@ struct pci_dev *pci_get_pdev_by_domain(
  * pci_enable_acs - enable ACS if hardware support it
  * @dev: the PCI device
  */
-void pci_enable_acs(struct pci_dev *pdev)
+static void pci_enable_acs(struct pci_dev *pdev)
 {
     int pos;
     u16 cap, ctrl, seg = pdev->seg;
@@ -409,8 +409,11 @@ int pci_add_device(u16 seg, u8 bus, u8 d
         }
 
         list_add(&pdev->domain_list, &dom0->arch.pdev_list);
-        pci_enable_acs(pdev);
     }
+    else
+        iommu_enable_device(pdev);
+
+    pci_enable_acs(pdev);
 
 out:
     spin_unlock(&pcidevs_lock);
--- 2011-09-20.orig/xen/drivers/passthrough/vtd/iommu.c
+++ 2011-09-20/xen/drivers/passthrough/vtd/iommu.c
@@ -1901,6 +1901,19 @@ static int intel_iommu_add_device(struct
     return ret;
 }
 
+static int intel_iommu_enable_device(struct pci_dev *pdev) {
+    struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
+    int ret = drhd ? ats_device(pdev, drhd) : -ENODEV;
+
+    if ( ret <= 0 )
+        return ret;
+
+    ret = enable_ats_device(pdev->seg, pdev->bus, pdev->devfn);
+
+    return ret >= 0 ? 0 : ret;
+}
+
 static int intel_iommu_remove_device(struct pci_dev *pdev)  {
     struct acpi_rmrr_unit *rmrr;
@@ -1931,7 +1944,6 @@ static int intel_iommu_remove_device(str  static void __init setup_dom0_device(struct pci_dev *pdev)  {
     domain_context_mapping(pdev->domain, pdev->seg, pdev->bus, pdev->devfn);
-    pci_enable_acs(pdev);
     pci_vtd_quirk(pdev);
 }
 
@@ -2302,6 +2314,7 @@ const struct iommu_ops intel_iommu_ops =
     .init = intel_iommu_domain_init,
     .dom0_init = intel_iommu_dom0_init,
     .add_device = intel_iommu_add_device,
+    .enable_device = intel_iommu_enable_device,
     .remove_device = intel_iommu_remove_device,
     .assign_device  = intel_iommu_assign_device,
     .teardown = iommu_domain_teardown,
--- 2011-09-20.orig/xen/include/xen/iommu.h
+++ 2011-09-20/xen/include/xen/iommu.h
@@ -70,6 +70,7 @@ int iommu_enable_x2apic_IR(void);  void iommu_disable_x2apic_IR(void);
 
 int iommu_add_device(struct pci_dev *pdev);
+int iommu_enable_device(struct pci_dev *pdev);
 int iommu_remove_device(struct pci_dev *pdev);  int iommu_domain_init(struct domain *d);  void iommu_dom0_init(struct domain *d); @@ -120,6 +121,7 @@ struct iommu_ops {
     int (*init)(struct domain *d);
     void (*dom0_init)(struct domain *d);
     int (*add_device)(struct pci_dev *pdev);
+    int (*enable_device)(struct pci_dev *pdev);
     int (*remove_device)(struct pci_dev *pdev);
     int (*assign_device)(struct domain *d, u16 seg, u8 bus, u8 devfn);
     void (*teardown)(struct domain *d);
--- 2011-09-20.orig/xen/include/xen/pci.h
+++ 2011-09-20/xen/include/xen/pci.h
@@ -134,6 +134,5 @@ struct pirq;
 int msixtbl_pt_register(struct domain *, struct pirq *, uint64_t gtable);  void msixtbl_pt_unregister(struct domain *, struct pirq *);  void msixtbl_pt_cleanup(struct domain *d); -void pci_enable_acs(struct pci_dev *pdev);
 
 #endif /* __XEN_PCI_H__ */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Resend: RE: enable_ats_device() call site
  2011-10-18 22:46         ` Kay, Allen M
@ 2011-10-19  6:47           ` Jan Beulich
  2011-10-19 22:20             ` Kay, Allen M
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2011-10-19  6:47 UTC (permalink / raw)
  To: Allen M Kay; +Cc: Kevin Tian, xen-devel, Haitao Shan, Donald D Dugger

>>> On 19.10.11 at 00:46, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
> Sorry for the late reply, I was trying to close something on another 
> project.  I have the following questions on the patches after reviewing the 
> paches:
> 
> 1) In acpi_find_matched_atsr_unit(), you added following code.  The original 
> code only tries to match the bus number.  What is the purpose of this new 
> additional code? Does it fix a problem on one of your systems?
> 
> +        for ( i = 0; i < atsr->scope.devices_cnt; ++i )
> +            if ( atsr->scope.devices[i] == bdf )
> +                return atsr;

I reckon that the availability of device specifications in the ATSR
data structure must be there for a purpose. If that's not correct,
then I'll certainly remove that code again, but I'd like to
understand what that data is meant to be for in that case.

> 2)  In pci_add_device() function, the original code calls pci_enable_acs() 
> only if pdev->domain is not set.  The new code calls pci_enable_acs() 
> unconditionally, potentially more than once?  What is the reason for the 
> change?

That's the whole purpose of the change, so just to repeat: MMCFG
accesses may not be possible at scan_pci_devices() time for some or
all segments/busses. Hence enabling ATS may simply be impossible
at that point, and must be attempted a second time after Dom0
reported whether using MMCFG is safe.

Since enabling ATS on an already enabled device doesn't do any
harm according to how enable_ats_device() is implemented I
can't see any bad in doing so. If there is, then we're back to
square one where I was asking you how to properly do ATS
enabling given the described MMCFG restriction.

> 3) In the same pci_add_device() function, the new code now also calls 
> iommu_enable_device() which currently calls enable_ats_device().  This means 
> the new code will enable ATS as it is being discovered by the platform.  
> However, I did not see any code that removing enable_ats_device() call in 
> domain_context_mapping().  Is this the intention?  If so, what is the reason? 

You were telling me that this needs to be re-done after FLR, and hence
has to remain there.

>  I see the reason the original code is still needed but I don't see why we 
> need to call enable_ats_device() during platform device discovery since the 
> enabling bit will get cleared by FLR.

Either we don't need to call it at all during discovery (which I doubt,
since when the device is in use by Dom0, I suppose having ATS
enabled is still desirable or even required), or we have to potentially
do it twice (remember that older Dom0 kernels may fail to report all
PCI devices to the hypervisor).

Jan

> Allen
> 
> 
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com] 
> Sent: Tuesday, October 11, 2011 5:54 AM
> To: Kay, Allen M
> Cc: xen-devel@lists.xensource.com 
> Subject: RE: Resend: RE: enable_ats_device() call site
> 
>>>> On 08.10.11 at 04:09, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>>> For which I'd like to understand why this is being done in the places 
>>> it is
>> now
>>>(not the least why this is done in VT-d specific code in the first place).
>> 
>> The reason it is call by reassign_device_ownership() is because FLR 
>> clears ATS enabling bit on the device - I forgot about it when I wrote 
>> the last email so we still need to re-enable ATS on the device for each 
> device assignment.
>> To summarize:
>> 
>> 1) Reason for difference in ATS and ACS handling
>>     a. ATS capability is in the PCIe endpoint - enabling bit is 
>> cleared by device FLR on the passthrough device.
>>     b. ACS capability is in the PCIe switch - not affected by FLR on 
>> the passthrough device.
>> 
>> 2) ATS enabling requirement
>>     a. VT-d engine serving the device has to be ATS capable.
>>     b. device has to be ATS capable
> 
> Okay, so how about the below then (with an attached prerequisite cleanup 
> patch)?
> 
> Jan
> 
> --- 2011-09-20.orig/xen/drivers/passthrough/iommu.c
> +++ 2011-09-20/xen/drivers/passthrough/iommu.c
> @@ -150,6 +150,23 @@ int iommu_add_device(struct pci_dev *pde
>      return hd->platform_ops->add_device(pdev);
>  }
>  
> +int iommu_enable_device(struct pci_dev *pdev) {
> +    struct hvm_iommu *hd;
> +
> +    if ( !pdev->domain )
> +        return -EINVAL;
> +
> +    ASSERT(spin_is_locked(&pcidevs_lock));
> +
> +    hd = domain_hvm_iommu(pdev->domain);
> +    if ( !iommu_enabled || !hd->platform_ops ||
> +         !hd->platform_ops->enable_device )
> +        return 0;
> +
> +    return hd->platform_ops->enable_device(pdev);
> +}
> +
>  int iommu_remove_device(struct pci_dev *pdev)  {
>      struct hvm_iommu *hd;
> --- 2011-09-20.orig/xen/drivers/passthrough/pci.c
> +++ 2011-09-20/xen/drivers/passthrough/pci.c
> @@ -258,7 +258,7 @@ struct pci_dev *pci_get_pdev_by_domain(
>   * pci_enable_acs - enable ACS if hardware support it
>   * @dev: the PCI device
>   */
> -void pci_enable_acs(struct pci_dev *pdev)
> +static void pci_enable_acs(struct pci_dev *pdev)
>  {
>      int pos;
>      u16 cap, ctrl, seg = pdev->seg;
> @@ -409,8 +409,11 @@ int pci_add_device(u16 seg, u8 bus, u8 d
>          }
>  
>          list_add(&pdev->domain_list, &dom0->arch.pdev_list);
> -        pci_enable_acs(pdev);
>      }
> +    else
> +        iommu_enable_device(pdev);
> +
> +    pci_enable_acs(pdev);
>  
>  out:
>      spin_unlock(&pcidevs_lock);
> --- 2011-09-20.orig/xen/drivers/passthrough/vtd/iommu.c
> +++ 2011-09-20/xen/drivers/passthrough/vtd/iommu.c
> @@ -1901,6 +1901,19 @@ static int intel_iommu_add_device(struct
>      return ret;
>  }
>  
> +static int intel_iommu_enable_device(struct pci_dev *pdev) {
> +    struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
> +    int ret = drhd ? ats_device(pdev, drhd) : -ENODEV;
> +
> +    if ( ret <= 0 )
> +        return ret;
> +
> +    ret = enable_ats_device(pdev->seg, pdev->bus, pdev->devfn);
> +
> +    return ret >= 0 ? 0 : ret;
> +}
> +
>  static int intel_iommu_remove_device(struct pci_dev *pdev)  {
>      struct acpi_rmrr_unit *rmrr;
> @@ -1931,7 +1944,6 @@ static int intel_iommu_remove_device(str  static void 
> __init setup_dom0_device(struct pci_dev *pdev)  {
>      domain_context_mapping(pdev->domain, pdev->seg, pdev->bus, pdev->devfn);
> -    pci_enable_acs(pdev);
>      pci_vtd_quirk(pdev);
>  }
>  
> @@ -2302,6 +2314,7 @@ const struct iommu_ops intel_iommu_ops =
>      .init = intel_iommu_domain_init,
>      .dom0_init = intel_iommu_dom0_init,
>      .add_device = intel_iommu_add_device,
> +    .enable_device = intel_iommu_enable_device,
>      .remove_device = intel_iommu_remove_device,
>      .assign_device  = intel_iommu_assign_device,
>      .teardown = iommu_domain_teardown,
> --- 2011-09-20.orig/xen/include/xen/iommu.h
> +++ 2011-09-20/xen/include/xen/iommu.h
> @@ -70,6 +70,7 @@ int iommu_enable_x2apic_IR(void);  void 
> iommu_disable_x2apic_IR(void);
>  
>  int iommu_add_device(struct pci_dev *pdev);
> +int iommu_enable_device(struct pci_dev *pdev);
>  int iommu_remove_device(struct pci_dev *pdev);  int 
> iommu_domain_init(struct domain *d);  void iommu_dom0_init(struct domain *d); 
> @@ -120,6 +121,7 @@ struct iommu_ops {
>      int (*init)(struct domain *d);
>      void (*dom0_init)(struct domain *d);
>      int (*add_device)(struct pci_dev *pdev);
> +    int (*enable_device)(struct pci_dev *pdev);
>      int (*remove_device)(struct pci_dev *pdev);
>      int (*assign_device)(struct domain *d, u16 seg, u8 bus, u8 devfn);
>      void (*teardown)(struct domain *d);
> --- 2011-09-20.orig/xen/include/xen/pci.h
> +++ 2011-09-20/xen/include/xen/pci.h
> @@ -134,6 +134,5 @@ struct pirq;
>  int msixtbl_pt_register(struct domain *, struct pirq *, uint64_t gtable);  
> void msixtbl_pt_unregister(struct domain *, struct pirq *);  void 
> msixtbl_pt_cleanup(struct domain *d); -void pci_enable_acs(struct pci_dev 
> *pdev);
>  
>  #endif /* __XEN_PCI_H__ */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Resend: RE: enable_ats_device() call site
  2011-10-19  6:47           ` Jan Beulich
@ 2011-10-19 22:20             ` Kay, Allen M
  2011-10-20  7:24               ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Kay, Allen M @ 2011-10-19 22:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tian, Kevin, xen-devel, Shan, Haitao, Dugger, Donald D

> I reckon that the availability of device specifications in the ATSR data structure must be there for a purpose.
> If that's not correct, then I'll certainly remove that code again, but I'd like to understand what that data is meant
> to be for in that case.

The atsr leverages the same PCI device scope is used for drhd and rmrr so device and function comes along with bus number.  As far as I can tell, we only  need to check the bus number for atsr.

> Since enabling ATS on an already enabled device doesn't do any harm according to how enable_ats_device() is
> implemented I can't see any bad in doing so. If there is, then we're back to square one where I was asking you how
> to properly do ATS enabling given the described MMCFG restriction.

Yes, there should be no harm enabling ACS or ATS multiple times.  It would be good tested out to make sure though.

> Either we don't need to call it at all during discovery (which I doubt, since when the device is in use by Dom0, I
> suppose having ATS enabled is still desirable or even required), or we have to potentially do it twice (remember
> that older Dom0 kernels may fail to report all PCI devices to the hypervisor).

I see, calling enable_ats_device() in pci_add_device() will also solve the case where MMCFG might not work until after dom0 is initialized.

As I mentioned before, our QA team doesn't test ATS and ACS regularly.  It would good if you can coordinate with our QA team to test out these changes to make sure they don't break any ACS and ATS functionality.

Allen


-----Original Message-----
From: Jan Beulich [mailto:JBeulich@suse.com] 
Sent: Tuesday, October 18, 2011 11:47 PM
To: Kay, Allen M
Cc: Dugger, Donald D; Shan, Haitao; Tian, Kevin; xen-devel@lists.xensource.com
Subject: RE: Resend: RE: enable_ats_device() call site

>>> On 19.10.11 at 00:46, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
> Sorry for the late reply, I was trying to close something on another 
> project.  I have the following questions on the patches after 
> reviewing the
> paches:
> 
> 1) In acpi_find_matched_atsr_unit(), you added following code.  The 
> original code only tries to match the bus number.  What is the purpose 
> of this new additional code? Does it fix a problem on one of your systems?
> 
> +        for ( i = 0; i < atsr->scope.devices_cnt; ++i )
> +            if ( atsr->scope.devices[i] == bdf )
> +                return atsr;

I reckon that the availability of device specifications in the ATSR data structure must be there for a purpose. If that's not correct, then I'll certainly remove that code again, but I'd like to understand what that data is meant to be for in that case.

> 2)  In pci_add_device() function, the original code calls 
> pci_enable_acs() only if pdev->domain is not set.  The new code calls 
> pci_enable_acs() unconditionally, potentially more than once?  What is 
> the reason for the change?

That's the whole purpose of the change, so just to repeat: MMCFG accesses may not be possible at scan_pci_devices() time for some or all segments/busses. Hence enabling ATS may simply be impossible at that point, and must be attempted a second time after Dom0 reported whether using MMCFG is safe.

Since enabling ATS on an already enabled device doesn't do any harm according to how enable_ats_device() is implemented I can't see any bad in doing so. If there is, then we're back to square one where I was asking you how to properly do ATS enabling given the described MMCFG restriction.

> 3) In the same pci_add_device() function, the new code now also calls
> iommu_enable_device() which currently calls enable_ats_device().  This 
> means the new code will enable ATS as it is being discovered by the platform.
> However, I did not see any code that removing enable_ats_device() call 
> in domain_context_mapping().  Is this the intention?  If so, what is the reason?

You were telling me that this needs to be re-done after FLR, and hence has to remain there.

>  I see the reason the original code is still needed but I don't see 
> why we need to call enable_ats_device() during platform device 
> discovery since the enabling bit will get cleared by FLR.

Either we don't need to call it at all during discovery (which I doubt, since when the device is in use by Dom0, I suppose having ATS enabled is still desirable or even required), or we have to potentially do it twice (remember that older Dom0 kernels may fail to report all PCI devices to the hypervisor).

Jan

> Allen
> 
> 
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, October 11, 2011 5:54 AM
> To: Kay, Allen M
> Cc: xen-devel@lists.xensource.com
> Subject: RE: Resend: RE: enable_ats_device() call site
> 
>>>> On 08.10.11 at 04:09, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>>> For which I'd like to understand why this is being done in the 
>>> places it is
>> now
>>>(not the least why this is done in VT-d specific code in the first place).
>> 
>> The reason it is call by reassign_device_ownership() is because FLR 
>> clears ATS enabling bit on the device - I forgot about it when I 
>> wrote the last email so we still need to re-enable ATS on the device 
>> for each
> device assignment.
>> To summarize:
>> 
>> 1) Reason for difference in ATS and ACS handling
>>     a. ATS capability is in the PCIe endpoint - enabling bit is 
>> cleared by device FLR on the passthrough device.
>>     b. ACS capability is in the PCIe switch - not affected by FLR on 
>> the passthrough device.
>> 
>> 2) ATS enabling requirement
>>     a. VT-d engine serving the device has to be ATS capable.
>>     b. device has to be ATS capable
> 
> Okay, so how about the below then (with an attached prerequisite 
> cleanup patch)?
> 
> Jan
> 
> --- 2011-09-20.orig/xen/drivers/passthrough/iommu.c
> +++ 2011-09-20/xen/drivers/passthrough/iommu.c
> @@ -150,6 +150,23 @@ int iommu_add_device(struct pci_dev *pde
>      return hd->platform_ops->add_device(pdev);
>  }
>  
> +int iommu_enable_device(struct pci_dev *pdev) {
> +    struct hvm_iommu *hd;
> +
> +    if ( !pdev->domain )
> +        return -EINVAL;
> +
> +    ASSERT(spin_is_locked(&pcidevs_lock));
> +
> +    hd = domain_hvm_iommu(pdev->domain);
> +    if ( !iommu_enabled || !hd->platform_ops ||
> +         !hd->platform_ops->enable_device )
> +        return 0;
> +
> +    return hd->platform_ops->enable_device(pdev);
> +}
> +
>  int iommu_remove_device(struct pci_dev *pdev)  {
>      struct hvm_iommu *hd;
> --- 2011-09-20.orig/xen/drivers/passthrough/pci.c
> +++ 2011-09-20/xen/drivers/passthrough/pci.c
> @@ -258,7 +258,7 @@ struct pci_dev *pci_get_pdev_by_domain(
>   * pci_enable_acs - enable ACS if hardware support it
>   * @dev: the PCI device
>   */
> -void pci_enable_acs(struct pci_dev *pdev)
> +static void pci_enable_acs(struct pci_dev *pdev)
>  {
>      int pos;
>      u16 cap, ctrl, seg = pdev->seg;
> @@ -409,8 +409,11 @@ int pci_add_device(u16 seg, u8 bus, u8 d
>          }
>  
>          list_add(&pdev->domain_list, &dom0->arch.pdev_list);
> -        pci_enable_acs(pdev);
>      }
> +    else
> +        iommu_enable_device(pdev);
> +
> +    pci_enable_acs(pdev);
>  
>  out:
>      spin_unlock(&pcidevs_lock);
> --- 2011-09-20.orig/xen/drivers/passthrough/vtd/iommu.c
> +++ 2011-09-20/xen/drivers/passthrough/vtd/iommu.c
> @@ -1901,6 +1901,19 @@ static int intel_iommu_add_device(struct
>      return ret;
>  }
>  
> +static int intel_iommu_enable_device(struct pci_dev *pdev) {
> +    struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
> +    int ret = drhd ? ats_device(pdev, drhd) : -ENODEV;
> +
> +    if ( ret <= 0 )
> +        return ret;
> +
> +    ret = enable_ats_device(pdev->seg, pdev->bus, pdev->devfn);
> +
> +    return ret >= 0 ? 0 : ret;
> +}
> +
>  static int intel_iommu_remove_device(struct pci_dev *pdev)  {
>      struct acpi_rmrr_unit *rmrr;
> @@ -1931,7 +1944,6 @@ static int intel_iommu_remove_device(str  static 
> void __init setup_dom0_device(struct pci_dev *pdev)  {
>      domain_context_mapping(pdev->domain, pdev->seg, pdev->bus, pdev->devfn);
> -    pci_enable_acs(pdev);
>      pci_vtd_quirk(pdev);
>  }
>  
> @@ -2302,6 +2314,7 @@ const struct iommu_ops intel_iommu_ops =
>      .init = intel_iommu_domain_init,
>      .dom0_init = intel_iommu_dom0_init,
>      .add_device = intel_iommu_add_device,
> +    .enable_device = intel_iommu_enable_device,
>      .remove_device = intel_iommu_remove_device,
>      .assign_device  = intel_iommu_assign_device,
>      .teardown = iommu_domain_teardown,
> --- 2011-09-20.orig/xen/include/xen/iommu.h
> +++ 2011-09-20/xen/include/xen/iommu.h
> @@ -70,6 +70,7 @@ int iommu_enable_x2apic_IR(void);  void 
> iommu_disable_x2apic_IR(void);
>  
>  int iommu_add_device(struct pci_dev *pdev);
> +int iommu_enable_device(struct pci_dev *pdev);
>  int iommu_remove_device(struct pci_dev *pdev);  int 
> iommu_domain_init(struct domain *d);  void iommu_dom0_init(struct 
> domain *d); @@ -120,6 +121,7 @@ struct iommu_ops {
>      int (*init)(struct domain *d);
>      void (*dom0_init)(struct domain *d);
>      int (*add_device)(struct pci_dev *pdev);
> +    int (*enable_device)(struct pci_dev *pdev);
>      int (*remove_device)(struct pci_dev *pdev);
>      int (*assign_device)(struct domain *d, u16 seg, u8 bus, u8 devfn);
>      void (*teardown)(struct domain *d);
> --- 2011-09-20.orig/xen/include/xen/pci.h
> +++ 2011-09-20/xen/include/xen/pci.h
> @@ -134,6 +134,5 @@ struct pirq;
>  int msixtbl_pt_register(struct domain *, struct pirq *, uint64_t 
> gtable); void msixtbl_pt_unregister(struct domain *, struct pirq *);  
> void msixtbl_pt_cleanup(struct domain *d); -void pci_enable_acs(struct 
> pci_dev *pdev);
>  
>  #endif /* __XEN_PCI_H__ */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Resend: RE: enable_ats_device() call site
  2011-10-19 22:20             ` Kay, Allen M
@ 2011-10-20  7:24               ` Jan Beulich
  2011-10-21  1:59                 ` Kay, Allen M
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2011-10-20  7:24 UTC (permalink / raw)
  To: Allen M Kay; +Cc: Kevin Tian, xen-devel, Haitao Shan, Donald D Dugger

>>> On 20.10.11 at 00:20, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>>  I reckon that the availability of device specifications in the ATSR data 
> structure must be there for a purpose.
>> If that's not correct, then I'll certainly remove that code again, but I'd 
> like to understand what that data is meant
>> to be for in that case.
> 
> The atsr leverages the same PCI device scope is used for drhd and rmrr so 
> device and function comes along with bus number.  As far as I can tell, we 
> only  need to check the bus number for atsr.

So why does the capability to list individual devices then exist? And
why does it matter for DRHDs, but not for ATSRs?

>> Either we don't need to call it at all during discovery (which I doubt, 
> since when the device is in use by Dom0, I
>> suppose having ATS enabled is still desirable or even required), or we have 
> to potentially do it twice (remember
>> that older Dom0 kernels may fail to report all PCI devices to the 
> hypervisor).
> 
> I see, calling enable_ats_device() in pci_add_device() will also solve the 
> case where MMCFG might not work until after dom0 is initialized.
> 
> As I mentioned before, our QA team doesn't test ATS and ACS regularly.  It 
> would good if you can coordinate with our QA team to test out these changes 
> to make sure they don't break any ACS and ATS functionality.

How would I do that other than by getting the stuff committed and
wait for their bi-weekly(?) testing round?

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Resend: RE: enable_ats_device() call site
  2011-10-20  7:24               ` Jan Beulich
@ 2011-10-21  1:59                 ` Kay, Allen M
  2011-10-21  7:52                   ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Kay, Allen M @ 2011-10-21  1:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, xen-devel, Shan, Haitao, Dugger, Donald D, Li, Susie

> So why does the capability to list individual devices then exist? And why does it matter for DRHDs, but not for ATSRs?

The difference is ATSR only is meant to communicate PCIe root ports' ATS capability.  If the root port is capable, then downstream endpoints can enable ATS device translation cache.

acpi_find_matched_drhd_unit() is used to find out which VT-d hardware is servicing the endpoint device.  Given drhd lists either the actually PCI endpoints or include_all, we have to match the actual BDF of the device passed in with devices we recorded for that VT-d HW.

acpi_find_matched_astr_unit() is used to find out if the endpoint device is under a ATS capable PCIe root port or not.  ASTR information is remembered as secondary and subsidiary bus ranges.  All we have to do is the match the device's bus number with a root ports bus range.  Matching the actual device in this case, will only match the root port device itself, this is what we recorded in acpi_parse_dev_scope(), which should not happen since we don't  assign a root port to a guest.  Even if we do, checking for ATS capability is meaningless since root port will not have device translation cache.

Allen

-----Original Message-----
From: Jan Beulich [mailto:JBeulich@suse.com] 
Sent: Thursday, October 20, 2011 12:24 AM
To: Kay, Allen M
Cc: Dugger, Donald D; Shan, Haitao; Tian, Kevin; xen-devel@lists.xensource.com
Subject: RE: Resend: RE: enable_ats_device() call site

>>> On 20.10.11 at 00:20, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>>  I reckon that the availability of device specifications in the ATSR 
>> data
> structure must be there for a purpose.
>> If that's not correct, then I'll certainly remove that code again, 
>> but I'd
> like to understand what that data is meant
>> to be for in that case.
> 
> The atsr leverages the same PCI device scope is used for drhd and rmrr 
> so device and function comes along with bus number.  As far as I can 
> tell, we only  need to check the bus number for atsr.

So why does the capability to list individual devices then exist? And why does it matter for DRHDs, but not for ATSRs?

>> Either we don't need to call it at all during discovery (which I 
>> doubt,
> since when the device is in use by Dom0, I
>> suppose having ATS enabled is still desirable or even required), or 
>> we have
> to potentially do it twice (remember
>> that older Dom0 kernels may fail to report all PCI devices to the
> hypervisor).
> 
> I see, calling enable_ats_device() in pci_add_device() will also solve 
> the case where MMCFG might not work until after dom0 is initialized.
> 
> As I mentioned before, our QA team doesn't test ATS and ACS regularly.  
> It would good if you can coordinate with our QA team to test out these 
> changes to make sure they don't break any ACS and ATS functionality.

How would I do that other than by getting the stuff committed and wait for their bi-weekly(?) testing round?

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Resend: RE: enable_ats_device() call site
  2011-10-21  1:59                 ` Kay, Allen M
@ 2011-10-21  7:52                   ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2011-10-21  7:52 UTC (permalink / raw)
  To: Allen M Kay; +Cc: Kevin Tian, Susie Li, Haitao Shan, xen-devel, Donald D Dugger

>>> On 21.10.11 at 03:59, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>>  So why does the capability to list individual devices then exist? And why 
> does it matter for DRHDs, but not for ATSRs?
> 
> The difference is ATSR only is meant to communicate PCIe root ports' ATS 
> capability.  If the root port is capable, then downstream endpoints can 
> enable ATS device translation cache.
> 
> acpi_find_matched_drhd_unit() is used to find out which VT-d hardware is 
> servicing the endpoint device.  Given drhd lists either the actually PCI 
> endpoints or include_all, we have to match the actual BDF of the device 
> passed in with devices we recorded for that VT-d HW.
> 
> acpi_find_matched_astr_unit() is used to find out if the endpoint device is 
> under a ATS capable PCIe root port or not.  ASTR information is remembered as 
> secondary and subsidiary bus ranges.  All we have to do is the match the 
> device's bus number with a root ports bus range.  Matching the actual device 
> in this case, will only match the root port device itself, this is what we 
> recorded in acpi_parse_dev_scope(), which should not happen since we don't  
> assign a root port to a guest.  Even if we do, checking for ATS capability is 
> meaningless since root port will not have device translation cache.

Okay, so I'll remove that part then and re-submit both patches.

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-10-21  7:52 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <4E4E48360200007800051F65@nat28.tlf.novell.com>
2011-08-22 22:01 ` Resend: RE: enable_ats_device() call site Kay, Allen M
2011-08-23  8:27   ` Jan Beulich
2011-09-20 13:56     ` Jan Beulich
2011-09-20 18:02       ` Kay, Allen M
2011-09-21  6:45         ` Jan Beulich
     [not found] ` <987664A83D2D224EAE907B061CE93D5301EE709B5E@orsmsx505.amr.corp.intel.com>
2011-10-07  7:38   ` Jan Beulich
2011-10-08  2:09     ` Kay, Allen M
2011-10-11 12:54       ` Jan Beulich
2011-10-18 22:46         ` Kay, Allen M
2011-10-19  6:47           ` Jan Beulich
2011-10-19 22:20             ` Kay, Allen M
2011-10-20  7:24               ` Jan Beulich
2011-10-21  1:59                 ` Kay, Allen M
2011-10-21  7:52                   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.