All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/2] pcie: Add simple ACS "support" to the generic PCIe root port
@ 2019-02-10  6:52 Knut Omang
  2019-02-10  6:52 ` [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function Knut Omang
  2019-02-10  6:53 ` [Qemu-devel] [PATCH v3 2/2] gen_pcie_root_port: Add ACS (Access Control Services) capability Knut Omang
  0 siblings, 2 replies; 13+ messages in thread
From: Knut Omang @ 2019-02-10  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Stefan Hajnoczi, Elijah Shakkour, Tal Attaly, Knut Omang

These two patches together implements a PCIe capability
config space header for Access Control Services (ACS) for the
new Qemu specific generic root port. ACS support in the
associated root port is a prerequisite to be able to pass
individual functions of a device populating the port through to
an L2 guest from an unmodified kernel.

Without this, the IOMMU group the device belongs to will also
include the root port itself, and all functions the device provides.

It is necessary to support SR/IOV where the primary
purpose is to be able to share out individual VFs to different
guests, which will not be permitted by VFIO or the Windows Hyper-V equivalent
unless ACS is supported by the root port.

These patches can also be found as part of an updated version of
my SR/IOV emulation patch set at

  https://github.com/knuto/qemu/tree/sriov_patches_v11

The patches' basic operation with VFIO and iommu groups have
been tested with the above patch set and a rebased version of an
in progress igb ethernet device, which needs some more care
before I can let it go out.

Changes from v2:
----------------
- rebased to the latest qemu master

Incorporated further feedback from Alex:
- Make sure slot/downstream capability bits are only set for slots.
- Make acs reset callback do nothing if no acs capability exists
- Set correct acs version
- div simplification

Changes from v1:
----------------
Incorporated feedback from Alex Williamson:
- Make commit messages reflect a more correct understanding of how this
  affects VFIO operation.
- Implemented the CTRL register properly (reset callback + making non-implemented
  capabilities RO, default value 0)
- removed the egress ctrl vector parameter to the init function
- Fixed some whitespace issues

Knut Omang (2):
  pcie: Add a simple PCIe ACS (Access Control Services) helper function
  gen_pcie_root_port: Add ACS (Access Control Services) capability

 hw/pci-bridge/gen_pcie_root_port.c |  4 ++++
 hw/pci-bridge/pcie_root_port.c     |  4 ++++
 hw/pci/pcie.c                      | 29 +++++++++++++++++++++++++++++
 include/hw/pci/pcie.h              |  6 ++++++
 include/hw/pci/pcie_port.h         |  1 +
 include/hw/pci/pcie_regs.h         |  4 ++++
 6 files changed, 48 insertions(+)

base-commit: e47f81b617684c4546af286d307b69014a83538a
-- 
git-series 0.9.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function
  2019-02-10  6:52 [Qemu-devel] [PATCH v3 0/2] pcie: Add simple ACS "support" to the generic PCIe root port Knut Omang
@ 2019-02-10  6:52 ` Knut Omang
  2019-02-11  8:05   ` Marcel Apfelbaum
  2019-02-11 23:09   ` Alex Williamson
  2019-02-10  6:53 ` [Qemu-devel] [PATCH v3 2/2] gen_pcie_root_port: Add ACS (Access Control Services) capability Knut Omang
  1 sibling, 2 replies; 13+ messages in thread
From: Knut Omang @ 2019-02-10  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Stefan Hajnoczi, Elijah Shakkour, Tal Attaly, Knut Omang

Add a helper function to add PCIe capability for Access Control Services (ACS)
ACS support in the associated root port is a prerequisite to be able to do
passthrough of individual functions of a device with VFIO
without Alex Williamson's pcie_acs_override kernel patch or similar
in the guest.

Endpoints may also implement an ACS capability, but with
limited features.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
---
 hw/pci/pcie.c              | 29 +++++++++++++++++++++++++++++
 include/hw/pci/pcie.h      |  6 ++++++
 include/hw/pci/pcie_regs.h |  4 ++++
 3 files changed, 39 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 230478f..509632f 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -742,6 +742,14 @@ bool pcie_cap_is_arifwd_enabled(const PCIDevice *dev)
         PCI_EXP_DEVCTL2_ARI;
 }
 
+/* Access Control Services (ACS) */
+void pcie_cap_acs_reset(PCIDevice *dev)
+{
+    if (dev->exp.acs_cap) {
+        pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL, 0);
+    }
+}
+
 /**************************************************************************
  * pci express extended capability list management functions
  * uint16_t ext_cap_id (16 bit)
@@ -906,3 +914,24 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset)
 
     pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f);
 }
+
+/* ACS (Access Control Services) */
+void pcie_acs_init(PCIDevice *dev, uint16_t offset)
+{
+    bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev), TYPE_PCIE_SLOT);
+    uint16_t cap_bits = 0;
+
+    pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, offset,
+                        PCI_ACS_SIZEOF);
+    dev->exp.acs_cap = offset;
+
+    if (is_pcie_slot) {
+        /* Endpoints may also implement ACS, but these capabilities are */
+        /* only valid for slots: */
+        cap_bits =
+            PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF;
+    }
+
+    pci_set_word(dev->config + offset + PCI_ACS_CAP, cap_bits);
+    pci_set_word(dev->wmask + offset + PCI_ACS_CTRL, cap_bits);
+}
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 5b82a0d..4c40711 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -79,6 +79,9 @@ struct PCIExpressDevice {
 
     /* Offset of ATS capability in config space */
     uint16_t ats_cap;
+
+    /* ACS */
+    uint16_t acs_cap;
 };
 
 #define COMPAT_PROP_PCP "power_controller_present"
@@ -116,6 +119,8 @@ void pcie_cap_flr_init(PCIDevice *dev);
 void pcie_cap_flr_write_config(PCIDevice *dev,
                            uint32_t addr, uint32_t val, int len);
 
+void pcie_cap_acs_reset(PCIDevice *dev);
+
 /* ARI forwarding capability and control */
 void pcie_cap_arifwd_init(PCIDevice *dev);
 void pcie_cap_arifwd_reset(PCIDevice *dev);
@@ -129,6 +134,7 @@ void pcie_add_capability(PCIDevice *dev,
 void pcie_sync_bridge_lnk(PCIDevice *dev);
 
 void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
+void pcie_acs_init(PCIDevice *dev, uint16_t offset);
 void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num);
 void pcie_ats_init(PCIDevice *dev, uint16_t offset);
 
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index ad4e780..1db86b0 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth {
                                          PCI_ERR_COR_INTERNAL |         \
                                          PCI_ERR_COR_HL_OVERFLOW)
 
+/* ACS */
+#define PCI_ACS_VER                     0x1
+#define PCI_ACS_SIZEOF                  8
+
 #endif /* QEMU_PCIE_REGS_H */
-- 
git-series 0.9.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v3 2/2] gen_pcie_root_port: Add ACS (Access Control Services) capability
  2019-02-10  6:52 [Qemu-devel] [PATCH v3 0/2] pcie: Add simple ACS "support" to the generic PCIe root port Knut Omang
  2019-02-10  6:52 ` [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function Knut Omang
@ 2019-02-10  6:53 ` Knut Omang
  2019-02-11  8:07   ` Marcel Apfelbaum
  1 sibling, 1 reply; 13+ messages in thread
From: Knut Omang @ 2019-02-10  6:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Stefan Hajnoczi, Elijah Shakkour, Tal Attaly, Knut Omang

Claim ACS support in the generic PCIe root port to allow
passthrough of individual functions of a device to different
guests (in a nested virt.setting) with VFIO.
Without this patch, all functions of a device, such as all VFs of
an SR/IOV device, will end up in the same IOMMU group.
A similar situation occurs on Windows with Hyper-V.

In the single function device case, it also has a small cosmetic
benefit in that the root port itself is not grouped with
the device. VFIO handles that situation in that binding rules
only apply to endpoints, so it does not limit passthrough in
those cases.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
---
 hw/pci-bridge/gen_pcie_root_port.c | 4 ++++
 hw/pci-bridge/pcie_root_port.c     | 4 ++++
 include/hw/pci/pcie_port.h         | 1 +
 3 files changed, 9 insertions(+)

diff --git a/hw/pci-bridge/gen_pcie_root_port.c b/hw/pci-bridge/gen_pcie_root_port.c
index 9766edb..26bda73 100644
--- a/hw/pci-bridge/gen_pcie_root_port.c
+++ b/hw/pci-bridge/gen_pcie_root_port.c
@@ -20,6 +20,9 @@
         OBJECT_CHECK(GenPCIERootPort, (obj), TYPE_GEN_PCIE_ROOT_PORT)
 
 #define GEN_PCIE_ROOT_PORT_AER_OFFSET           0x100
+#define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
+        (GEN_PCIE_ROOT_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
+
 #define GEN_PCIE_ROOT_PORT_MSIX_NR_VECTOR       1
 
 typedef struct GenPCIERootPort {
@@ -149,6 +152,7 @@ static void gen_rp_dev_class_init(ObjectClass *klass, void *data)
     rpc->interrupts_init = gen_rp_interrupts_init;
     rpc->interrupts_uninit = gen_rp_interrupts_uninit;
     rpc->aer_offset = GEN_PCIE_ROOT_PORT_AER_OFFSET;
+    rpc->acs_offset = GEN_PCIE_ROOT_PORT_ACS_OFFSET;
 }
 
 static const TypeInfo gen_rp_dev_info = {
diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
index 34ad767..a0b4cf7 100644
--- a/hw/pci-bridge/pcie_root_port.c
+++ b/hw/pci-bridge/pcie_root_port.c
@@ -47,6 +47,7 @@ static void rp_reset(DeviceState *qdev)
     pcie_cap_deverr_reset(d);
     pcie_cap_slot_reset(d);
     pcie_cap_arifwd_reset(d);
+    pcie_cap_acs_reset(d);
     pcie_aer_root_reset(d);
     pci_bridge_reset(qdev);
     pci_bridge_disable_base_limit(d);
@@ -106,6 +107,9 @@ static void rp_realize(PCIDevice *d, Error **errp)
     pcie_aer_root_init(d);
     rp_aer_vector_update(d);
 
+    if (rpc->acs_offset) {
+        pcie_acs_init(d, rpc->acs_offset);
+    }
     return;
 
 err:
diff --git a/include/hw/pci/pcie_port.h b/include/hw/pci/pcie_port.h
index df242a0..09586f4 100644
--- a/include/hw/pci/pcie_port.h
+++ b/include/hw/pci/pcie_port.h
@@ -78,6 +78,7 @@ typedef struct PCIERootPortClass {
     int exp_offset;
     int aer_offset;
     int ssvid_offset;
+    int acs_offset;    /* If nonzero, optional ACS capability offset */
     int ssid;
 } PCIERootPortClass;
 
-- 
git-series 0.9.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function
  2019-02-10  6:52 ` [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function Knut Omang
@ 2019-02-11  8:05   ` Marcel Apfelbaum
  2019-02-11 23:09   ` Alex Williamson
  1 sibling, 0 replies; 13+ messages in thread
From: Marcel Apfelbaum @ 2019-02-11  8:05 UTC (permalink / raw)
  To: Knut Omang, qemu-devel
  Cc: Michael S . Tsirkin, Alex Williamson, Stefan Hajnoczi,
	Elijah Shakkour, Tal Attaly

Hi Knut,

On 2/10/19 8:52 AM, Knut Omang wrote:
> Add a helper function to add PCIe capability for Access Control Services (ACS)
> ACS support in the associated root port is a prerequisite to be able to do
> passthrough of individual functions of a device with VFIO
> without Alex Williamson's pcie_acs_override kernel patch or similar
> in the guest.
>
> Endpoints may also implement an ACS capability, but with
> limited features.
>
> Signed-off-by: Knut Omang <knut.omang@oracle.com>
> ---
>   hw/pci/pcie.c              | 29 +++++++++++++++++++++++++++++
>   include/hw/pci/pcie.h      |  6 ++++++
>   include/hw/pci/pcie_regs.h |  4 ++++
>   3 files changed, 39 insertions(+)
>
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 230478f..509632f 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -742,6 +742,14 @@ bool pcie_cap_is_arifwd_enabled(const PCIDevice *dev)
>           PCI_EXP_DEVCTL2_ARI;
>   }
>   
> +/* Access Control Services (ACS) */
> +void pcie_cap_acs_reset(PCIDevice *dev)
> +{
> +    if (dev->exp.acs_cap) {
> +        pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL, 0);
> +    }
> +}
> +
>   /**************************************************************************
>    * pci express extended capability list management functions
>    * uint16_t ext_cap_id (16 bit)
> @@ -906,3 +914,24 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset)
>   
>       pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f);
>   }
> +
> +/* ACS (Access Control Services) */
> +void pcie_acs_init(PCIDevice *dev, uint16_t offset)
> +{
> +    bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev), TYPE_PCIE_SLOT);
> +    uint16_t cap_bits = 0;
> +
> +    pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, offset,
> +                        PCI_ACS_SIZEOF);
> +    dev->exp.acs_cap = offset;
> +
> +    if (is_pcie_slot) {
> +        /* Endpoints may also implement ACS, but these capabilities are */
> +        /* only valid for slots: */
> +        cap_bits =
> +            PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF;
> +    }
> +
> +    pci_set_word(dev->config + offset + PCI_ACS_CAP, cap_bits);
> +    pci_set_word(dev->wmask + offset + PCI_ACS_CTRL, cap_bits);
> +}
> diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> index 5b82a0d..4c40711 100644
> --- a/include/hw/pci/pcie.h
> +++ b/include/hw/pci/pcie.h
> @@ -79,6 +79,9 @@ struct PCIExpressDevice {
>   
>       /* Offset of ATS capability in config space */
>       uint16_t ats_cap;
> +
> +    /* ACS */
> +    uint16_t acs_cap;
>   };
>   
>   #define COMPAT_PROP_PCP "power_controller_present"
> @@ -116,6 +119,8 @@ void pcie_cap_flr_init(PCIDevice *dev);
>   void pcie_cap_flr_write_config(PCIDevice *dev,
>                              uint32_t addr, uint32_t val, int len);
>   
> +void pcie_cap_acs_reset(PCIDevice *dev);
> +
>   /* ARI forwarding capability and control */
>   void pcie_cap_arifwd_init(PCIDevice *dev);
>   void pcie_cap_arifwd_reset(PCIDevice *dev);
> @@ -129,6 +134,7 @@ void pcie_add_capability(PCIDevice *dev,
>   void pcie_sync_bridge_lnk(PCIDevice *dev);
>   
>   void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
> +void pcie_acs_init(PCIDevice *dev, uint16_t offset);
>   void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num);
>   void pcie_ats_init(PCIDevice *dev, uint16_t offset);
>   
> diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> index ad4e780..1db86b0 100644
> --- a/include/hw/pci/pcie_regs.h
> +++ b/include/hw/pci/pcie_regs.h
> @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth {
>                                            PCI_ERR_COR_INTERNAL |         \
>                                            PCI_ERR_COR_HL_OVERFLOW)
>   
> +/* ACS */
> +#define PCI_ACS_VER                     0x1
> +#define PCI_ACS_SIZEOF                  8
> +
>   #endif /* QEMU_PCIE_REGS_H */

Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>

Thanks,
Marcel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/2] gen_pcie_root_port: Add ACS (Access Control Services) capability
  2019-02-10  6:53 ` [Qemu-devel] [PATCH v3 2/2] gen_pcie_root_port: Add ACS (Access Control Services) capability Knut Omang
@ 2019-02-11  8:07   ` Marcel Apfelbaum
  2019-02-13  9:34     ` Knut Omang
  0 siblings, 1 reply; 13+ messages in thread
From: Marcel Apfelbaum @ 2019-02-11  8:07 UTC (permalink / raw)
  To: Knut Omang, qemu-devel
  Cc: Michael S . Tsirkin, Alex Williamson, Stefan Hajnoczi,
	Elijah Shakkour, Tal Attaly



On 2/10/19 8:53 AM, Knut Omang wrote:
> Claim ACS support in the generic PCIe root port to allow
> passthrough of individual functions of a device to different
> guests (in a nested virt.setting) with VFIO.
> Without this patch, all functions of a device, such as all VFs of
> an SR/IOV device, will end up in the same IOMMU group.
> A similar situation occurs on Windows with Hyper-V.
>
> In the single function device case, it also has a small cosmetic
> benefit in that the root port itself is not grouped with
> the device. VFIO handles that situation in that binding rules
> only apply to endpoints, so it does not limit passthrough in
> those cases.
>
> Signed-off-by: Knut Omang <knut.omang@oracle.com>
> ---
>   hw/pci-bridge/gen_pcie_root_port.c | 4 ++++
>   hw/pci-bridge/pcie_root_port.c     | 4 ++++
>   include/hw/pci/pcie_port.h         | 1 +
>   3 files changed, 9 insertions(+)
>
> diff --git a/hw/pci-bridge/gen_pcie_root_port.c b/hw/pci-bridge/gen_pcie_root_port.c
> index 9766edb..26bda73 100644
> --- a/hw/pci-bridge/gen_pcie_root_port.c
> +++ b/hw/pci-bridge/gen_pcie_root_port.c
> @@ -20,6 +20,9 @@
>           OBJECT_CHECK(GenPCIERootPort, (obj), TYPE_GEN_PCIE_ROOT_PORT)
>   
>   #define GEN_PCIE_ROOT_PORT_AER_OFFSET           0x100
> +#define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
> +        (GEN_PCIE_ROOT_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
> +
>   #define GEN_PCIE_ROOT_PORT_MSIX_NR_VECTOR       1
>   
>   typedef struct GenPCIERootPort {
> @@ -149,6 +152,7 @@ static void gen_rp_dev_class_init(ObjectClass *klass, void *data)
>       rpc->interrupts_init = gen_rp_interrupts_init;
>       rpc->interrupts_uninit = gen_rp_interrupts_uninit;
>       rpc->aer_offset = GEN_PCIE_ROOT_PORT_AER_OFFSET;
> +    rpc->acs_offset = GEN_PCIE_ROOT_PORT_ACS_OFFSET;
>   }
>   
>   static const TypeInfo gen_rp_dev_info = {
> diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
> index 34ad767..a0b4cf7 100644
> --- a/hw/pci-bridge/pcie_root_port.c
> +++ b/hw/pci-bridge/pcie_root_port.c
> @@ -47,6 +47,7 @@ static void rp_reset(DeviceState *qdev)
>       pcie_cap_deverr_reset(d);
>       pcie_cap_slot_reset(d);
>       pcie_cap_arifwd_reset(d);
> +    pcie_cap_acs_reset(d);
>       pcie_aer_root_reset(d);
>       pci_bridge_reset(qdev);
>       pci_bridge_disable_base_limit(d);
> @@ -106,6 +107,9 @@ static void rp_realize(PCIDevice *d, Error **errp)
>       pcie_aer_root_init(d);
>       rp_aer_vector_update(d);
>   
> +    if (rpc->acs_offset) {
> +        pcie_acs_init(d, rpc->acs_offset);
> +    }
>       return;
>   
>   err:
> diff --git a/include/hw/pci/pcie_port.h b/include/hw/pci/pcie_port.h
> index df242a0..09586f4 100644
> --- a/include/hw/pci/pcie_port.h
> +++ b/include/hw/pci/pcie_port.h
> @@ -78,6 +78,7 @@ typedef struct PCIERootPortClass {
>       int exp_offset;
>       int aer_offset;
>       int ssvid_offset;
> +    int acs_offset;    /* If nonzero, optional ACS capability offset */
>       int ssid;
>   } PCIERootPortClass;
>   

Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>

Thanks,
Marcel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function
  2019-02-10  6:52 ` [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function Knut Omang
  2019-02-11  8:05   ` Marcel Apfelbaum
@ 2019-02-11 23:09   ` Alex Williamson
  2019-02-12  8:07     ` Knut Omang
  1 sibling, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2019-02-11 23:09 UTC (permalink / raw)
  To: Knut Omang
  Cc: qemu-devel, Michael S . Tsirkin, Marcel Apfelbaum,
	Stefan Hajnoczi, Elijah Shakkour, Tal Attaly

On Sun, 10 Feb 2019 07:52:59 +0100
Knut Omang <knut.omang@oracle.com> wrote:

> Add a helper function to add PCIe capability for Access Control Services (ACS)
> ACS support in the associated root port is a prerequisite to be able to do
> passthrough of individual functions of a device with VFIO
> without Alex Williamson's pcie_acs_override kernel patch or similar
> in the guest.

This is still incorrect, the ACS override patch is only required for
separating multifunction endpoints or multifunction root ports.  Single
function endpoints are assignable without ACS simply by placing them
downstream of a single function root port or directly on the root
complex.

> Endpoints may also implement an ACS capability, but with
> limited features.
> 
> Signed-off-by: Knut Omang <knut.omang@oracle.com>
> ---
>  hw/pci/pcie.c              | 29 +++++++++++++++++++++++++++++
>  include/hw/pci/pcie.h      |  6 ++++++
>  include/hw/pci/pcie_regs.h |  4 ++++
>  3 files changed, 39 insertions(+)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 230478f..509632f 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -742,6 +742,14 @@ bool pcie_cap_is_arifwd_enabled(const PCIDevice *dev)
>          PCI_EXP_DEVCTL2_ARI;
>  }
>  
> +/* Access Control Services (ACS) */
> +void pcie_cap_acs_reset(PCIDevice *dev)
> +{
> +    if (dev->exp.acs_cap) {
> +        pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL, 0);
> +    }
> +}
> +
>  /**************************************************************************
>   * pci express extended capability list management functions
>   * uint16_t ext_cap_id (16 bit)
> @@ -906,3 +914,24 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset)
>  
>      pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f);
>  }
> +
> +/* ACS (Access Control Services) */
> +void pcie_acs_init(PCIDevice *dev, uint16_t offset)
> +{
> +    bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev), TYPE_PCIE_SLOT);
> +    uint16_t cap_bits = 0;
> +
> +    pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, offset,
> +                        PCI_ACS_SIZEOF);
> +    dev->exp.acs_cap = offset;
> +
> +    if (is_pcie_slot) {
> +        /* Endpoints may also implement ACS, but these capabilities are */
> +        /* only valid for slots: */

Not quite, SV, TB, and UF must not be implemented by endpoints, but RR
and CR must be implemented by multifunction endpoints that support p2p
if they provide an ACS capability.  Linux therefore infers that if ACS
is supported by an endpoint and RR and CR are not implemented, the
device does not support p2p.  We could just say that nothing supports
p2p yet, but single function endpoints (except those implementing
SR-IOV) must not implement an ACS capability per the spec, which could
be difficult to exclude since the multifunction bit is handled
separately from the device model.

Also comment style:
https://git.qemu.org/?p=qemu.git;a=blob;f=CODING_STYLE;#l127

> +        cap_bits =
> +            PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF;
> +    }
> +
> +    pci_set_word(dev->config + offset + PCI_ACS_CAP, cap_bits);
> +    pci_set_word(dev->wmask + offset + PCI_ACS_CTRL, cap_bits);
> +}
> diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> index 5b82a0d..4c40711 100644
> --- a/include/hw/pci/pcie.h
> +++ b/include/hw/pci/pcie.h
> @@ -79,6 +79,9 @@ struct PCIExpressDevice {
>  
>      /* Offset of ATS capability in config space */
>      uint16_t ats_cap;
> +
> +    /* ACS */
> +    uint16_t acs_cap;
>  };
>  
>  #define COMPAT_PROP_PCP "power_controller_present"
> @@ -116,6 +119,8 @@ void pcie_cap_flr_init(PCIDevice *dev);
>  void pcie_cap_flr_write_config(PCIDevice *dev,
>                             uint32_t addr, uint32_t val, int len);
>  
> +void pcie_cap_acs_reset(PCIDevice *dev);
> +
>  /* ARI forwarding capability and control */
>  void pcie_cap_arifwd_init(PCIDevice *dev);
>  void pcie_cap_arifwd_reset(PCIDevice *dev);
> @@ -129,6 +134,7 @@ void pcie_add_capability(PCIDevice *dev,
>  void pcie_sync_bridge_lnk(PCIDevice *dev);
>  
>  void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
> +void pcie_acs_init(PCIDevice *dev, uint16_t offset);

nit, why aren't the reset and init declarations together?

>  void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num);
>  void pcie_ats_init(PCIDevice *dev, uint16_t offset);
>  
> diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> index ad4e780..1db86b0 100644
> --- a/include/hw/pci/pcie_regs.h
> +++ b/include/hw/pci/pcie_regs.h
> @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth {
>                                           PCI_ERR_COR_INTERNAL |         \
>                                           PCI_ERR_COR_HL_OVERFLOW)
>  
> +/* ACS */
> +#define PCI_ACS_VER                     0x1
> +#define PCI_ACS_SIZEOF                  8
> +
>  #endif /* QEMU_PCIE_REGS_H */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function
  2019-02-11 23:09   ` Alex Williamson
@ 2019-02-12  8:07     ` Knut Omang
  2019-02-12  9:48       ` Knut Omang
  2019-02-12 15:59       ` Alex Williamson
  0 siblings, 2 replies; 13+ messages in thread
From: Knut Omang @ 2019-02-12  8:07 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Michael S . Tsirkin, Marcel Apfelbaum,
	Stefan Hajnoczi, Elijah Shakkour, Tal Attaly

On Mon, 2019-02-11 at 16:09 -0700, Alex Williamson wrote:
> On Sun, 10 Feb 2019 07:52:59 +0100
> Knut Omang <knut.omang@oracle.com> wrote:
> 
> > Add a helper function to add PCIe capability for Access Control Services
> > (ACS)
> > ACS support in the associated root port is a prerequisite to be able to do
> > passthrough of individual functions of a device with VFIO
> > without Alex Williamson's pcie_acs_override kernel patch or similar
> > in the guest.
> 
> This is still incorrect, the ACS override patch is only required for
> separating multifunction endpoints or multifunction root ports.  Single
> function endpoints are assignable without ACS simply by placing them
> downstream of a single function root port or directly on the root
> complex.

Hmm - that was the intended meaning of the comment, but I'll see if I can make
it more clear by saying it explicitly.

> > Endpoints may also implement an ACS capability, but with
> > limited features.
> > 
> > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > ---
> >  hw/pci/pcie.c              | 29 +++++++++++++++++++++++++++++
> >  include/hw/pci/pcie.h      |  6 ++++++
> >  include/hw/pci/pcie_regs.h |  4 ++++
> >  3 files changed, 39 insertions(+)
> > 
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index 230478f..509632f 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -742,6 +742,14 @@ bool pcie_cap_is_arifwd_enabled(const PCIDevice *dev)
> >          PCI_EXP_DEVCTL2_ARI;
> >  }
> >  
> > +/* Access Control Services (ACS) */
> > +void pcie_cap_acs_reset(PCIDevice *dev)
> > +{
> > +    if (dev->exp.acs_cap) {
> > +        pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL, 0);
> > +    }
> > +}
> > +
> >  /**************************************************************************
> >   * pci express extended capability list management functions
> >   * uint16_t ext_cap_id (16 bit)
> > @@ -906,3 +914,24 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset)
> >  
> >      pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f);
> >  }
> > +
> > +/* ACS (Access Control Services) */
> > +void pcie_acs_init(PCIDevice *dev, uint16_t offset)
> > +{
> > +    bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev), TYPE_PCIE_SLOT);
> > +    uint16_t cap_bits = 0;
> > +
> > +    pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, offset,
> > +                        PCI_ACS_SIZEOF);
> > +    dev->exp.acs_cap = offset;
> > +
> > +    if (is_pcie_slot) {
> > +        /* Endpoints may also implement ACS, but these capabilities are */
> > +        /* only valid for slots: */
> 
> Not quite, SV, TB, and UF must not be implemented by endpoints, but RR
> and CR must be implemented by multifunction endpoints that support p2p
> if they provide an ACS capability.  

Hmm - are you ok with setting 0 here as I have done, just amending your 
description to the comment? Then any future emulation that do support p2p 
would have to set the needed bits after calling the init function.

After your previous comments on this, I had a look at Mellanox CX4 and CX5 which
are the only devices I could find in the lab that are endpoints and implement an
ACS capability, and neither seems to implement any extra capabilities:

        Capabilities: [230 v1] Access Control Services
                ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

that was the reason for my choice of value here - after skimming through the
spec (with my still very limited understanding of the details)

> Linux therefore infers that if ACS
> is supported by an endpoint and RR and CR are not implemented, the
> device does not support p2p.  

Interesting - I thought the CX5 supported p2p, but I have not kept up with what
happens on the RDMA list on that front.

> We could just say that nothing supports
> p2p yet, but single function endpoints (except those implementing
> SR-IOV) must not implement an ACS capability per the spec, which could
> be difficult to exclude since the multifunction bit is handled
> separately from the device model.

Hmm - the older SR/IOV devices I know of, some Intel Ethernet devices and any of
the older Mellanox devices, and our own cancelled Infiniband device, all seem
not to implement ACS?

> Also comment style:
> https://git.qemu.org/?p=qemu.git;a=blob;f=CODING_STYLE;#l127

I see - will fix,

> > +        cap_bits =
> > +            PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF;
> > +    }
> > +
> > +    pci_set_word(dev->config + offset + PCI_ACS_CAP, cap_bits);
> > +    pci_set_word(dev->wmask + offset + PCI_ACS_CTRL, cap_bits);
> > +}
> > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > index 5b82a0d..4c40711 100644
> > --- a/include/hw/pci/pcie.h
> > +++ b/include/hw/pci/pcie.h
> > @@ -79,6 +79,9 @@ struct PCIExpressDevice {
> >  
> >      /* Offset of ATS capability in config space */
> >      uint16_t ats_cap;
> > +
> > +    /* ACS */
> > +    uint16_t acs_cap;
> >  };
> >  
> >  #define COMPAT_PROP_PCP "power_controller_present"
> > @@ -116,6 +119,8 @@ void pcie_cap_flr_init(PCIDevice *dev);
> >  void pcie_cap_flr_write_config(PCIDevice *dev,
> >                             uint32_t addr, uint32_t val, int len);
> >  
> > +void pcie_cap_acs_reset(PCIDevice *dev);
> > +
> >  /* ARI forwarding capability and control */
> >  void pcie_cap_arifwd_init(PCIDevice *dev);
> >  void pcie_cap_arifwd_reset(PCIDevice *dev);
> > @@ -129,6 +134,7 @@ void pcie_add_capability(PCIDevice *dev,
> >  void pcie_sync_bridge_lnk(PCIDevice *dev);
> >  
> >  void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
> > +void pcie_acs_init(PCIDevice *dev, uint16_t offset);
> 
> nit, why aren't the reset and init declarations together?

Good point, will fix.

Thanks!
Knut

> >  void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t
> > ser_num);
> >  void pcie_ats_init(PCIDevice *dev, uint16_t offset);
> >  
> > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> > index ad4e780..1db86b0 100644
> > --- a/include/hw/pci/pcie_regs.h
> > +++ b/include/hw/pci/pcie_regs.h
> > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth {
> >                                           PCI_ERR_COR_INTERNAL |         \
> >                                           PCI_ERR_COR_HL_OVERFLOW)
> >  
> > +/* ACS */
> > +#define PCI_ACS_VER                     0x1
> > +#define PCI_ACS_SIZEOF                  8
> > +
> >  #endif /* QEMU_PCIE_REGS_H */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function
  2019-02-12  8:07     ` Knut Omang
@ 2019-02-12  9:48       ` Knut Omang
  2019-02-12 15:59       ` Alex Williamson
  1 sibling, 0 replies; 13+ messages in thread
From: Knut Omang @ 2019-02-12  9:48 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Michael S . Tsirkin, Marcel Apfelbaum,
	Stefan Hajnoczi, Elijah Shakkour, Tal Attaly

On Tue, 2019-02-12 at 09:07 +0100, Knut Omang wrote:
> On Mon, 2019-02-11 at 16:09 -0700, Alex Williamson wrote:
> > On Sun, 10 Feb 2019 07:52:59 +0100
> > Knut Omang <knut.omang@oracle.com> wrote:
> > 
> > > Add a helper function to add PCIe capability for Access Control Services
> > > (ACS)
> > > ACS support in the associated root port is a prerequisite to be able to do
> > > passthrough of individual functions of a device with VFIO
> > > without Alex Williamson's pcie_acs_override kernel patch or similar
> > > in the guest.
> > 
> > This is still incorrect, the ACS override patch is only required for
> > separating multifunction endpoints or multifunction root ports.  Single
> > function endpoints are assignable without ACS simply by placing them
> > downstream of a single function root port or directly on the root
> > complex.
> 
> Hmm - that was the intended meaning of the comment, but I'll see if I can make
> it more clear by saying it explicitly.
> 
> > > Endpoints may also implement an ACS capability, but with
> > > limited features.
> > > 
> > > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > > ---
> > >  hw/pci/pcie.c              | 29 +++++++++++++++++++++++++++++
> > >  include/hw/pci/pcie.h      |  6 ++++++
> > >  include/hw/pci/pcie_regs.h |  4 ++++
> > >  3 files changed, 39 insertions(+)
> > > 
> > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > index 230478f..509632f 100644
> > > --- a/hw/pci/pcie.c
> > > +++ b/hw/pci/pcie.c
> > > @@ -742,6 +742,14 @@ bool pcie_cap_is_arifwd_enabled(const PCIDevice *dev)
> > >          PCI_EXP_DEVCTL2_ARI;
> > >  }
> > >  
> > > +/* Access Control Services (ACS) */
> > > +void pcie_cap_acs_reset(PCIDevice *dev)
> > > +{
> > > +    if (dev->exp.acs_cap) {
> > > +        pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL, 0);
> > > +    }
> > > +}
> > > +
> > >  /************************************************************************
> > > **
> > >   * pci express extended capability list management functions
> > >   * uint16_t ext_cap_id (16 bit)
> > > @@ -906,3 +914,24 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset)
> > >  
> > >      pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f);
> > >  }
> > > +
> > > +/* ACS (Access Control Services) */
> > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset)
> > > +{
> > > +    bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev),
> > > TYPE_PCIE_SLOT);
> > > +    uint16_t cap_bits = 0;
> > > +
> > > +    pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, offset,
> > > +                        PCI_ACS_SIZEOF);
> > > +    dev->exp.acs_cap = offset;
> > > +
> > > +    if (is_pcie_slot) {
> > > +        /* Endpoints may also implement ACS, but these capabilities are
> > > */
> > > +        /* only valid for slots: */
> > 
> > Not quite, SV, TB, and UF must not be implemented by endpoints, but RR
> > and CR must be implemented by multifunction endpoints that support p2p
> > if they provide an ACS capability.  
> 
> Hmm - are you ok with setting 0 here as I have done, just amending your 
> description to the comment? Then any future emulation that do support p2p 
> would have to set the needed bits after calling the init function.
> 
> After your previous comments on this, I had a look at Mellanox CX4 and CX5
> which
> are the only devices I could find in the lab that are endpoints and implement
> an
> ACS capability, and neither seems to implement any extra capabilities:
> 
>         Capabilities: [230 v1] Access Control Services
>                 ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-
> EgressCtrl- DirectTrans-
>                 ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-
> EgressCtrl- DirectTrans-
> 
> that was the reason for my choice of value here - after skimming through the
> spec (with my still very limited understanding of the details)
> 
> > Linux therefore infers that if ACS
> > is supported by an endpoint and RR and CR are not implemented, the
> > device does not support p2p.  
> 
> Interesting - I thought the CX5 supported p2p, but I have not kept up with
> what
> happens on the RDMA list on that front.
> 
> > We could just say that nothing supports
> > p2p yet, but single function endpoints (except those implementing
> > SR-IOV) must not implement an ACS capability per the spec, which could
> > be difficult to exclude since the multifunction bit is handled
> > separately from the device model.
> 
> Hmm - the older SR/IOV devices I know of, some Intel Ethernet devices and any
> of
> the older Mellanox devices, and our own cancelled Infiniband device, all seem
> not to implement ACS?
> 
> > Also comment style:
> > https://git.qemu.org/?p=qemu.git;a=blob;f=CODING_STYLE;#l127
> 
> I see - will fix,
> 
> > > +        cap_bits =
> > > +            PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
> > > PCI_ACS_UF;
> > > +    }
> > > +
> > > +    pci_set_word(dev->config + offset + PCI_ACS_CAP, cap_bits);
> > > +    pci_set_word(dev->wmask + offset + PCI_ACS_CTRL, cap_bits);
> > > +}
> > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > > index 5b82a0d..4c40711 100644
> > > --- a/include/hw/pci/pcie.h
> > > +++ b/include/hw/pci/pcie.h
> > > @@ -79,6 +79,9 @@ struct PCIExpressDevice {
> > >  
> > >      /* Offset of ATS capability in config space */
> > >      uint16_t ats_cap;
> > > +
> > > +    /* ACS */
> > > +    uint16_t acs_cap;
> > >  };
> > >  
> > >  #define COMPAT_PROP_PCP "power_controller_present"
> > > @@ -116,6 +119,8 @@ void pcie_cap_flr_init(PCIDevice *dev);
> > >  void pcie_cap_flr_write_config(PCIDevice *dev,
> > >                             uint32_t addr, uint32_t val, int len);
> > >  
> > > +void pcie_cap_acs_reset(PCIDevice *dev);
> > > +
> > >  /* ARI forwarding capability and control */
> > >  void pcie_cap_arifwd_init(PCIDevice *dev);
> > >  void pcie_cap_arifwd_reset(PCIDevice *dev);
> > > @@ -129,6 +134,7 @@ void pcie_add_capability(PCIDevice *dev,
> > >  void pcie_sync_bridge_lnk(PCIDevice *dev);
> > >  
> > >  void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
> > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset);
> > 
> > nit, why aren't the reset and init declarations together?
> 
> Good point, will fix.

Actually I realize I have broken the naming conventions with the reset function
too and that the same grouping applies in pcie.c. I inadvertently looked at 
ARI for comparison, but the reset function there applies to the ARIfwd flag
inside the PCIE capability, not the ARI capability itself - 
will fix accordingly.

Knut

> Thanks!
> Knut
> 
> > >  void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t
> > > ser_num);
> > >  void pcie_ats_init(PCIDevice *dev, uint16_t offset);
> > >  
> > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> > > index ad4e780..1db86b0 100644
> > > --- a/include/hw/pci/pcie_regs.h
> > > +++ b/include/hw/pci/pcie_regs.h
> > > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth {
> > >                                           PCI_ERR_COR_INTERNAL |         \
> > >                                           PCI_ERR_COR_HL_OVERFLOW)
> > >  
> > > +/* ACS */
> > > +#define PCI_ACS_VER                     0x1
> > > +#define PCI_ACS_SIZEOF                  8
> > > +
> > >  #endif /* QEMU_PCIE_REGS_H */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function
  2019-02-12  8:07     ` Knut Omang
  2019-02-12  9:48       ` Knut Omang
@ 2019-02-12 15:59       ` Alex Williamson
  2019-02-12 16:25         ` Knut Omang
  1 sibling, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2019-02-12 15:59 UTC (permalink / raw)
  To: Knut Omang
  Cc: qemu-devel, Michael S . Tsirkin, Marcel Apfelbaum,
	Stefan Hajnoczi, Elijah Shakkour, Tal Attaly

On Tue, 12 Feb 2019 09:07:43 +0100
Knut Omang <knut.omang@oracle.com> wrote:

> On Mon, 2019-02-11 at 16:09 -0700, Alex Williamson wrote:
> > On Sun, 10 Feb 2019 07:52:59 +0100
> > Knut Omang <knut.omang@oracle.com> wrote:
> >   
> > > Add a helper function to add PCIe capability for Access Control Services
> > > (ACS)
> > > ACS support in the associated root port is a prerequisite to be able to do
> > > passthrough of individual functions of a device with VFIO
> > > without Alex Williamson's pcie_acs_override kernel patch or similar
> > > in the guest.  
> > 
> > This is still incorrect, the ACS override patch is only required for
> > separating multifunction endpoints or multifunction root ports.  Single
> > function endpoints are assignable without ACS simply by placing them
> > downstream of a single function root port or directly on the root
> > complex.  
> 
> Hmm - that was the intended meaning of the comment, but I'll see if I can make
> it more clear by saying it explicitly.

"ACS support... is a prerequisite".  Prerequisite: a thing that is
required as a prior condition for something else to happen or exist.

Assignment of individual functions exists today, as is, by using QEMU
to define a PCIe topology that allows the desired grouping.  The code
here enables specific topologies, it is clearly not a prerequisite.

> > > Endpoints may also implement an ACS capability, but with
> > > limited features.
> > > 
> > > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > > ---
> > >  hw/pci/pcie.c              | 29 +++++++++++++++++++++++++++++
> > >  include/hw/pci/pcie.h      |  6 ++++++
> > >  include/hw/pci/pcie_regs.h |  4 ++++
> > >  3 files changed, 39 insertions(+)
> > > 
> > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > index 230478f..509632f 100644
> > > --- a/hw/pci/pcie.c
> > > +++ b/hw/pci/pcie.c
> > > @@ -742,6 +742,14 @@ bool pcie_cap_is_arifwd_enabled(const PCIDevice *dev)
> > >          PCI_EXP_DEVCTL2_ARI;
> > >  }
> > >  
> > > +/* Access Control Services (ACS) */
> > > +void pcie_cap_acs_reset(PCIDevice *dev)
> > > +{
> > > +    if (dev->exp.acs_cap) {
> > > +        pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL, 0);
> > > +    }
> > > +}
> > > +
> > >  /**************************************************************************
> > >   * pci express extended capability list management functions
> > >   * uint16_t ext_cap_id (16 bit)
> > > @@ -906,3 +914,24 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset)
> > >  
> > >      pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f);
> > >  }
> > > +
> > > +/* ACS (Access Control Services) */
> > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset)
> > > +{
> > > +    bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev), TYPE_PCIE_SLOT);
> > > +    uint16_t cap_bits = 0;
> > > +
> > > +    pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, offset,
> > > +                        PCI_ACS_SIZEOF);
> > > +    dev->exp.acs_cap = offset;
> > > +
> > > +    if (is_pcie_slot) {
> > > +        /* Endpoints may also implement ACS, but these capabilities are */
> > > +        /* only valid for slots: */  
> > 
> > Not quite, SV, TB, and UF must not be implemented by endpoints, but RR
> > and CR must be implemented by multifunction endpoints that support p2p
> > if they provide an ACS capability.    
> 
> Hmm - are you ok with setting 0 here as I have done, just amending your 
> description to the comment? Then any future emulation that do support p2p 
> would have to set the needed bits after calling the init function.

The comment definitely needs work, but I don't know what to do about
single function, non-SR-IOV capable devices calling into this.  Per the
spec, these devices cannot implement an ACS capability at all, but will
anyone care if they do?  Maybe endpoints need a different init
function.  Maybe the capability ID needs to be obscured until
additional functions or an SR-IOV capability are added.  I'm not really
concerned that this helper can only indicate lack of p2p between
functions for endpoints, other than the comments being wrong.

> After your previous comments on this, I had a look at Mellanox CX4 and CX5 which
> are the only devices I could find in the lab that are endpoints and implement an
> ACS capability, and neither seems to implement any extra capabilities:
> 
>         Capabilities: [230 v1] Access Control Services
>                 ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
>                 ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> 
> that was the reason for my choice of value here - after skimming through the
> spec (with my still very limited understanding of the details)

If the above device is a multifunction device or SR-IOV capable device,
this is the correct way to indicate there is no p2p between functions.
This takes advantage of the wording in the spec that certain
capabilities must be implemented by functions that support p2p
traffic.  Therefore if the capability is not implemented then p2p is
not supported and requires no control.

> > Linux therefore infers that if ACS
> > is supported by an endpoint and RR and CR are not implemented, the
> > device does not support p2p.    
> 
> Interesting - I thought the CX5 supported p2p, but I have not kept up with what
> happens on the RDMA list on that front.

ACS can only indicate p2p capabilities within the device reporting it.
The above example indicates that the device does not do p2p internally
between functions.  It does not preclude p2p between devices or even
functions, but it does require that the p2p traffic occurs on the link
and therefore allows routing to be managed through the ACS capabilities
of the interconnect components.

For instance if we have a multifunction endpoint with the above
capability, we know that a DMA from function 1 targeting an address
range on function 0 will not complete the DMA internally, it will go
through the egress port to the downstream port above the endpoint.  The
ACS capability on that downstream port then controls whether that DMA
request is redirected back to the ingress port of the endpoint or
forwarded upstream.  If a redirect occurs here then the IOMMU is not
fully in control of the IOVA space of the device and the IOMMU group is
extended to indicate this.

> > We could just say that nothing supports
> > p2p yet, but single function endpoints (except those implementing
> > SR-IOV) must not implement an ACS capability per the spec, which could
> > be difficult to exclude since the multifunction bit is handled
> > separately from the device model.  
> 
> Hmm - the older SR/IOV devices I know of, some Intel Ethernet devices and any of
> the older Mellanox devices, and our own cancelled Infiniband device, all seem
> not to implement ACS?

Linux effectively assumes that there is no p2p between VFs or between
the PF and VF, so the result is that a multifunction endpoint without
ACS supporting SR-IOV would have the PFs grouped together, but the VFs
would be grouped individually, assuming the upstream topology provides
isolation.  Intel has since provided verification that nearly none of
their multi-port NICs implement internal p2p and they now use quirks to
provide the ACS equivalent information for grouping for those older
devices.  Newer Intel NICs implement ACS capabilities as in your
example above to indicate this without the need for quirks.  Thanks,

Alex

> > Also comment style:
> > https://git.qemu.org/?p=qemu.git;a=blob;f=CODING_STYLE;#l127  
> 
> I see - will fix,
> 
> > > +        cap_bits =
> > > +            PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF;
> > > +    }
> > > +
> > > +    pci_set_word(dev->config + offset + PCI_ACS_CAP, cap_bits);
> > > +    pci_set_word(dev->wmask + offset + PCI_ACS_CTRL, cap_bits);
> > > +}
> > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > > index 5b82a0d..4c40711 100644
> > > --- a/include/hw/pci/pcie.h
> > > +++ b/include/hw/pci/pcie.h
> > > @@ -79,6 +79,9 @@ struct PCIExpressDevice {
> > >  
> > >      /* Offset of ATS capability in config space */
> > >      uint16_t ats_cap;
> > > +
> > > +    /* ACS */
> > > +    uint16_t acs_cap;
> > >  };
> > >  
> > >  #define COMPAT_PROP_PCP "power_controller_present"
> > > @@ -116,6 +119,8 @@ void pcie_cap_flr_init(PCIDevice *dev);
> > >  void pcie_cap_flr_write_config(PCIDevice *dev,
> > >                             uint32_t addr, uint32_t val, int len);
> > >  
> > > +void pcie_cap_acs_reset(PCIDevice *dev);
> > > +
> > >  /* ARI forwarding capability and control */
> > >  void pcie_cap_arifwd_init(PCIDevice *dev);
> > >  void pcie_cap_arifwd_reset(PCIDevice *dev);
> > > @@ -129,6 +134,7 @@ void pcie_add_capability(PCIDevice *dev,
> > >  void pcie_sync_bridge_lnk(PCIDevice *dev);
> > >  
> > >  void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
> > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset);  
> > 
> > nit, why aren't the reset and init declarations together?  
> 
> Good point, will fix.
> 
> Thanks!
> Knut
> 
> > >  void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t
> > > ser_num);
> > >  void pcie_ats_init(PCIDevice *dev, uint16_t offset);
> > >  
> > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> > > index ad4e780..1db86b0 100644
> > > --- a/include/hw/pci/pcie_regs.h
> > > +++ b/include/hw/pci/pcie_regs.h
> > > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth {
> > >                                           PCI_ERR_COR_INTERNAL |         \
> > >                                           PCI_ERR_COR_HL_OVERFLOW)
> > >  
> > > +/* ACS */
> > > +#define PCI_ACS_VER                     0x1
> > > +#define PCI_ACS_SIZEOF                  8
> > > +
> > >  #endif /* QEMU_PCIE_REGS_H */  
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function
  2019-02-12 15:59       ` Alex Williamson
@ 2019-02-12 16:25         ` Knut Omang
  2019-02-12 17:14           ` Alex Williamson
  0 siblings, 1 reply; 13+ messages in thread
From: Knut Omang @ 2019-02-12 16:25 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Michael S . Tsirkin, Marcel Apfelbaum,
	Stefan Hajnoczi, Elijah Shakkour, Tal Attaly

On Tue, 2019-02-12 at 08:59 -0700, Alex Williamson wrote:
> On Tue, 12 Feb 2019 09:07:43 +0100
> Knut Omang <knut.omang@oracle.com> wrote:
> 
> > On Mon, 2019-02-11 at 16:09 -0700, Alex Williamson wrote:
> > > On Sun, 10 Feb 2019 07:52:59 +0100
> > > Knut Omang <knut.omang@oracle.com> wrote:
> > >   
> > > > Add a helper function to add PCIe capability for Access Control Services
> > > > (ACS)
> > > > ACS support in the associated root port is a prerequisite to be able to
> > > > do
> > > > passthrough of individual functions of a device with VFIO
> > > > without Alex Williamson's pcie_acs_override kernel patch or similar
> > > > in the guest.  
> > > 
> > > This is still incorrect, the ACS override patch is only required for
> > > separating multifunction endpoints or multifunction root ports.  Single
> > > function endpoints are assignable without ACS simply by placing them
> > > downstream of a single function root port or directly on the root
> > > complex.  
> > 
> > Hmm - that was the intended meaning of the comment, but I'll see if I can
> > make
> > it more clear by saying it explicitly.
> 
> "ACS support... is a prerequisite".  Prerequisite: a thing that is
> required as a prior condition for something else to happen or exist.
> 
> Assignment of individual functions exists today, as is, by using QEMU
> to define a PCIe topology that allows the desired grouping.  The code
> here enables specific topologies, it is clearly not a prerequisite.
> 
> > > > Endpoints may also implement an ACS capability, but with
> > > > limited features.
> > > > 
> > > > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > > > ---
> > > >  hw/pci/pcie.c              | 29 +++++++++++++++++++++++++++++
> > > >  include/hw/pci/pcie.h      |  6 ++++++
> > > >  include/hw/pci/pcie_regs.h |  4 ++++
> > > >  3 files changed, 39 insertions(+)
> > > > 
> > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > index 230478f..509632f 100644
> > > > --- a/hw/pci/pcie.c
> > > > +++ b/hw/pci/pcie.c
> > > > @@ -742,6 +742,14 @@ bool pcie_cap_is_arifwd_enabled(const PCIDevice
> > > > *dev)
> > > >          PCI_EXP_DEVCTL2_ARI;
> > > >  }
> > > >  
> > > > +/* Access Control Services (ACS) */
> > > > +void pcie_cap_acs_reset(PCIDevice *dev)
> > > > +{
> > > > +    if (dev->exp.acs_cap) {
> > > > +        pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL, 0);
> > > > +    }
> > > > +}
> > > > +
> > > >  /**********************************************************************
> > > > ****
> > > >   * pci express extended capability list management functions
> > > >   * uint16_t ext_cap_id (16 bit)
> > > > @@ -906,3 +914,24 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset)
> > > >  
> > > >      pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f);
> > > >  }
> > > > +
> > > > +/* ACS (Access Control Services) */
> > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset)
> > > > +{
> > > > +    bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev),
> > > > TYPE_PCIE_SLOT);
> > > > +    uint16_t cap_bits = 0;
> > > > +
> > > > +    pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, offset,
> > > > +                        PCI_ACS_SIZEOF);
> > > > +    dev->exp.acs_cap = offset;
> > > > +
> > > > +    if (is_pcie_slot) {
> > > > +        /* Endpoints may also implement ACS, but these capabilities are
> > > > */
> > > > +        /* only valid for slots: */  
> > > 
> > > Not quite, SV, TB, and UF must not be implemented by endpoints, but RR
> > > and CR must be implemented by multifunction endpoints that support p2p
> > > if they provide an ACS capability.    
> > 
> > Hmm - are you ok with setting 0 here as I have done, just amending your 
> > description to the comment? Then any future emulation that do support p2p 
> > would have to set the needed bits after calling the init function.
> 
> The comment definitely needs work, but I don't know what to do about
> single function, non-SR-IOV capable devices calling into this.  

I agree - I have only been thinking about multifunction devices, I should 
probably assert on not multifunction (or SR/IOV with my SR/IOV patch set) 
to avoid misuse.

And what about someone passing a PF for SR/IOV through to a VM - 
Has that use case showed up (yet) ? 
I have so far only tested with my emulated SR/IOV.

> Per the
> spec, these devices cannot implement an ACS capability at all, but will
> anyone care if they do?  Maybe endpoints need a different init
> function.  Maybe the capability ID needs to be obscured until
> additional functions or an SR-IOV capability are added.  I'm not really
> concerned that this helper can only indicate lack of p2p between
> functions for endpoints, other than the comments being wrong.

Ok, I think I am in the clear then with my v4 comment :-)

> > After your previous comments on this, I had a look at Mellanox CX4 and CX5
> > which
> > are the only devices I could find in the lab that are endpoints and
> > implement an
> > ACS capability, and neither seems to implement any extra capabilities:
> > 
> >         Capabilities: [230 v1] Access Control Services
> >                 ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir-
> > UpstreamFwd- EgressCtrl- DirectTrans-
> >                 ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir-
> > UpstreamFwd- EgressCtrl- DirectTrans-
> > 
> > that was the reason for my choice of value here - after skimming through the
> > spec (with my still very limited understanding of the details)
> 
> If the above device is a multifunction device or SR-IOV capable device,
> this is the correct way to indicate there is no p2p between functions.
> This takes advantage of the wording in the spec that certain
> capabilities must be implemented by functions that support p2p
> traffic.  Therefore if the capability is not implemented then p2p is
> not supported and requires no control.

I see, that makes sense, and corresponds with the way the device I know best 
worked.

> > > Linux therefore infers that if ACS
> > > is supported by an endpoint and RR and CR are not implemented, the
> > > device does not support p2p.    
> > 
> > Interesting - I thought the CX5 supported p2p, but I have not kept up with
> > what
> > happens on the RDMA list on that front.
> 
> ACS can only indicate p2p capabilities within the device reporting it.
> The above example indicates that the device does not do p2p internally
> between functions.  It does not preclude p2p between devices or even
> functions, but it does require that the p2p traffic occurs on the link
> and therefore allows routing to be managed through the ACS capabilities
> of the interconnect components.
> 
> For instance if we have a multifunction endpoint with the above
> capability, we know that a DMA from function 1 targeting an address
> range on function 0 will not complete the DMA internally, it will go
> through the egress port to the downstream port above the endpoint.  The
> ACS capability on that downstream port then controls whether that DMA
> request is redirected back to the ingress port of the endpoint or
> forwarded upstream.  If a redirect occurs here then the IOMMU is not
> fully in control of the IOVA space of the device and the IOMMU group is
> extended to indicate this.

Thanks, this definitely clarifies!

> > > We could just say that nothing supports
> > > p2p yet, but single function endpoints (except those implementing
> > > SR-IOV) must not implement an ACS capability per the spec, which could
> > > be difficult to exclude since the multifunction bit is handled
> > > separately from the device model.  
> > 
> > Hmm - the older SR/IOV devices I know of, some Intel Ethernet devices and
> > any of
> > the older Mellanox devices, and our own cancelled Infiniband device, all
> > seem
> > not to implement ACS?
> 
> Linux effectively assumes that there is no p2p between VFs or between
> the PF and VF, so the result is that a multifunction endpoint without
> ACS supporting SR-IOV would have the PFs grouped together, but the VFs
> would be grouped individually, assuming the upstream topology provides
> isolation.  Intel has since provided verification that nearly none of
> their multi-port NICs implement internal p2p and they now use quirks to
> provide the ACS equivalent information for grouping for those older
> devices.  Newer Intel NICs implement ACS capabilities as in your
> example above to indicate this without the need for quirks.  Thanks,

Yes, so for devices like the CX4 and CX5 which exposes more than one PF,
they need the ACS capability to indicate that individual PFs are isolated from 
each other, while the single PF SR/IOV devices can do without.

Thanks for the excellent explanations, this makes it all much clearer!
Knut

> Alex
> 
> > > Also comment style:
> > > https://git.qemu.org/?p=qemu.git;a=blob;f=CODING_STYLE;#l127  
> > 
> > I see - will fix,
> > 
> > > > +        cap_bits =
> > > > +            PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
> > > > PCI_ACS_UF;
> > > > +    }
> > > > +
> > > > +    pci_set_word(dev->config + offset + PCI_ACS_CAP, cap_bits);
> > > > +    pci_set_word(dev->wmask + offset + PCI_ACS_CTRL, cap_bits);
> > > > +}
> > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > > > index 5b82a0d..4c40711 100644
> > > > --- a/include/hw/pci/pcie.h
> > > > +++ b/include/hw/pci/pcie.h
> > > > @@ -79,6 +79,9 @@ struct PCIExpressDevice {
> > > >  
> > > >      /* Offset of ATS capability in config space */
> > > >      uint16_t ats_cap;
> > > > +
> > > > +    /* ACS */
> > > > +    uint16_t acs_cap;
> > > >  };
> > > >  
> > > >  #define COMPAT_PROP_PCP "power_controller_present"
> > > > @@ -116,6 +119,8 @@ void pcie_cap_flr_init(PCIDevice *dev);
> > > >  void pcie_cap_flr_write_config(PCIDevice *dev,
> > > >                             uint32_t addr, uint32_t val, int len);
> > > >  
> > > > +void pcie_cap_acs_reset(PCIDevice *dev);
> > > > +
> > > >  /* ARI forwarding capability and control */
> > > >  void pcie_cap_arifwd_init(PCIDevice *dev);
> > > >  void pcie_cap_arifwd_reset(PCIDevice *dev);
> > > > @@ -129,6 +134,7 @@ void pcie_add_capability(PCIDevice *dev,
> > > >  void pcie_sync_bridge_lnk(PCIDevice *dev);
> > > >  
> > > >  void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
> > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset);  
> > > 
> > > nit, why aren't the reset and init declarations together?  
> > 
> > Good point, will fix.
> > 
> > Thanks!
> > Knut
> > 
> > > >  void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t
> > > > ser_num);
> > > >  void pcie_ats_init(PCIDevice *dev, uint16_t offset);
> > > >  
> > > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> > > > index ad4e780..1db86b0 100644
> > > > --- a/include/hw/pci/pcie_regs.h
> > > > +++ b/include/hw/pci/pcie_regs.h
> > > > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth {
> > > >                                           PCI_ERR_COR_INTERNAL
> > > > |         \
> > > >                                           PCI_ERR_COR_HL_OVERFLOW)
> > > >  
> > > > +/* ACS */
> > > > +#define PCI_ACS_VER                     0x1
> > > > +#define PCI_ACS_SIZEOF                  8
> > > > +
> > > >  #endif /* QEMU_PCIE_REGS_H */  

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function
  2019-02-12 16:25         ` Knut Omang
@ 2019-02-12 17:14           ` Alex Williamson
  2019-02-12 19:52             ` Knut Omang
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2019-02-12 17:14 UTC (permalink / raw)
  To: Knut Omang
  Cc: qemu-devel, Michael S . Tsirkin, Marcel Apfelbaum,
	Stefan Hajnoczi, Elijah Shakkour, Tal Attaly

On Tue, 12 Feb 2019 17:25:46 +0100
Knut Omang <knut.omang@oracle.com> wrote:

> On Tue, 2019-02-12 at 08:59 -0700, Alex Williamson wrote:
> > On Tue, 12 Feb 2019 09:07:43 +0100
> > Knut Omang <knut.omang@oracle.com> wrote:
> >   
> > > On Mon, 2019-02-11 at 16:09 -0700, Alex Williamson wrote:  
> > > > On Sun, 10 Feb 2019 07:52:59 +0100
> > > > Knut Omang <knut.omang@oracle.com> wrote:
> > > >     
> > > > > Add a helper function to add PCIe capability for Access Control Services
> > > > > (ACS)
> > > > > ACS support in the associated root port is a prerequisite to be able to
> > > > > do
> > > > > passthrough of individual functions of a device with VFIO
> > > > > without Alex Williamson's pcie_acs_override kernel patch or similar
> > > > > in the guest.    
> > > > 
> > > > This is still incorrect, the ACS override patch is only required for
> > > > separating multifunction endpoints or multifunction root ports.  Single
> > > > function endpoints are assignable without ACS simply by placing them
> > > > downstream of a single function root port or directly on the root
> > > > complex.    
> > > 
> > > Hmm - that was the intended meaning of the comment, but I'll see if I can
> > > make
> > > it more clear by saying it explicitly.  
> > 
> > "ACS support... is a prerequisite".  Prerequisite: a thing that is
> > required as a prior condition for something else to happen or exist.
> > 
> > Assignment of individual functions exists today, as is, by using QEMU
> > to define a PCIe topology that allows the desired grouping.  The code
> > here enables specific topologies, it is clearly not a prerequisite.
> >   
> > > > > Endpoints may also implement an ACS capability, but with
> > > > > limited features.
> > > > > 
> > > > > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > > > > ---
> > > > >  hw/pci/pcie.c              | 29 +++++++++++++++++++++++++++++
> > > > >  include/hw/pci/pcie.h      |  6 ++++++
> > > > >  include/hw/pci/pcie_regs.h |  4 ++++
> > > > >  3 files changed, 39 insertions(+)
> > > > > 
> > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > index 230478f..509632f 100644
> > > > > --- a/hw/pci/pcie.c
> > > > > +++ b/hw/pci/pcie.c
> > > > > @@ -742,6 +742,14 @@ bool pcie_cap_is_arifwd_enabled(const PCIDevice
> > > > > *dev)
> > > > >          PCI_EXP_DEVCTL2_ARI;
> > > > >  }
> > > > >  
> > > > > +/* Access Control Services (ACS) */
> > > > > +void pcie_cap_acs_reset(PCIDevice *dev)
> > > > > +{
> > > > > +    if (dev->exp.acs_cap) {
> > > > > +        pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL, 0);
> > > > > +    }
> > > > > +}
> > > > > +
> > > > >  /**********************************************************************
> > > > > ****
> > > > >   * pci express extended capability list management functions
> > > > >   * uint16_t ext_cap_id (16 bit)
> > > > > @@ -906,3 +914,24 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset)
> > > > >  
> > > > >      pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f);
> > > > >  }
> > > > > +
> > > > > +/* ACS (Access Control Services) */
> > > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset)
> > > > > +{
> > > > > +    bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev),
> > > > > TYPE_PCIE_SLOT);
> > > > > +    uint16_t cap_bits = 0;
> > > > > +
> > > > > +    pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, offset,
> > > > > +                        PCI_ACS_SIZEOF);
> > > > > +    dev->exp.acs_cap = offset;
> > > > > +
> > > > > +    if (is_pcie_slot) {
> > > > > +        /* Endpoints may also implement ACS, but these capabilities are
> > > > > */
> > > > > +        /* only valid for slots: */    
> > > > 
> > > > Not quite, SV, TB, and UF must not be implemented by endpoints, but RR
> > > > and CR must be implemented by multifunction endpoints that support p2p
> > > > if they provide an ACS capability.      
> > > 
> > > Hmm - are you ok with setting 0 here as I have done, just amending your 
> > > description to the comment? Then any future emulation that do support p2p 
> > > would have to set the needed bits after calling the init function.  
> > 
> > The comment definitely needs work, but I don't know what to do about
> > single function, non-SR-IOV capable devices calling into this.    
> 
> I agree - I have only been thinking about multifunction devices, I should 
> probably assert on not multifunction (or SR/IOV with my SR/IOV patch set) 
> to avoid misuse.

Be sure to check both for (dev->cap_present &
QEMU_PCI_CAP_MULTIFUNCTION) and PCI_FUNC(dev->devfn) since only
function 0 is required to have the header bit set.

> And what about someone passing a PF for SR/IOV through to a VM - 
> Has that use case showed up (yet) ? 
> I have so far only tested with my emulated SR/IOV.

The SR-IOV capability is masked to the L1 guest.  The issue is that an
assigned SR-IOV PF could create VFs *on the host* which has security
implications unless those VFs can be quarantined from general use by
the host kernel.  There have been patches and proposals, but none yet
that sufficiently address this issue.  Thanks,

Alex

> > Per the
> > spec, these devices cannot implement an ACS capability at all, but will
> > anyone care if they do?  Maybe endpoints need a different init
> > function.  Maybe the capability ID needs to be obscured until
> > additional functions or an SR-IOV capability are added.  I'm not really
> > concerned that this helper can only indicate lack of p2p between
> > functions for endpoints, other than the comments being wrong.  
> 
> Ok, I think I am in the clear then with my v4 comment :-)
> 
> > > After your previous comments on this, I had a look at Mellanox CX4 and CX5
> > > which
> > > are the only devices I could find in the lab that are endpoints and
> > > implement an
> > > ACS capability, and neither seems to implement any extra capabilities:
> > > 
> > >         Capabilities: [230 v1] Access Control Services
> > >                 ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir-
> > > UpstreamFwd- EgressCtrl- DirectTrans-
> > >                 ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir-
> > > UpstreamFwd- EgressCtrl- DirectTrans-
> > > 
> > > that was the reason for my choice of value here - after skimming through the
> > > spec (with my still very limited understanding of the details)  
> > 
> > If the above device is a multifunction device or SR-IOV capable device,
> > this is the correct way to indicate there is no p2p between functions.
> > This takes advantage of the wording in the spec that certain
> > capabilities must be implemented by functions that support p2p
> > traffic.  Therefore if the capability is not implemented then p2p is
> > not supported and requires no control.  
> 
> I see, that makes sense, and corresponds with the way the device I know best 
> worked.
> 
> > > > Linux therefore infers that if ACS
> > > > is supported by an endpoint and RR and CR are not implemented, the
> > > > device does not support p2p.      
> > > 
> > > Interesting - I thought the CX5 supported p2p, but I have not kept up with
> > > what
> > > happens on the RDMA list on that front.  
> > 
> > ACS can only indicate p2p capabilities within the device reporting it.
> > The above example indicates that the device does not do p2p internally
> > between functions.  It does not preclude p2p between devices or even
> > functions, but it does require that the p2p traffic occurs on the link
> > and therefore allows routing to be managed through the ACS capabilities
> > of the interconnect components.
> > 
> > For instance if we have a multifunction endpoint with the above
> > capability, we know that a DMA from function 1 targeting an address
> > range on function 0 will not complete the DMA internally, it will go
> > through the egress port to the downstream port above the endpoint.  The
> > ACS capability on that downstream port then controls whether that DMA
> > request is redirected back to the ingress port of the endpoint or
> > forwarded upstream.  If a redirect occurs here then the IOMMU is not
> > fully in control of the IOVA space of the device and the IOMMU group is
> > extended to indicate this.  
> 
> Thanks, this definitely clarifies!
> 
> > > > We could just say that nothing supports
> > > > p2p yet, but single function endpoints (except those implementing
> > > > SR-IOV) must not implement an ACS capability per the spec, which could
> > > > be difficult to exclude since the multifunction bit is handled
> > > > separately from the device model.    
> > > 
> > > Hmm - the older SR/IOV devices I know of, some Intel Ethernet devices and
> > > any of
> > > the older Mellanox devices, and our own cancelled Infiniband device, all
> > > seem
> > > not to implement ACS?  
> > 
> > Linux effectively assumes that there is no p2p between VFs or between
> > the PF and VF, so the result is that a multifunction endpoint without
> > ACS supporting SR-IOV would have the PFs grouped together, but the VFs
> > would be grouped individually, assuming the upstream topology provides
> > isolation.  Intel has since provided verification that nearly none of
> > their multi-port NICs implement internal p2p and they now use quirks to
> > provide the ACS equivalent information for grouping for those older
> > devices.  Newer Intel NICs implement ACS capabilities as in your
> > example above to indicate this without the need for quirks.  Thanks,  
> 
> Yes, so for devices like the CX4 and CX5 which exposes more than one PF,
> they need the ACS capability to indicate that individual PFs are isolated from 
> each other, while the single PF SR/IOV devices can do without.
> 
> Thanks for the excellent explanations, this makes it all much clearer!
> Knut
> 
> > Alex
> >   
> > > > Also comment style:
> > > > https://git.qemu.org/?p=qemu.git;a=blob;f=CODING_STYLE;#l127    
> > > 
> > > I see - will fix,
> > >   
> > > > > +        cap_bits =
> > > > > +            PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
> > > > > PCI_ACS_UF;
> > > > > +    }
> > > > > +
> > > > > +    pci_set_word(dev->config + offset + PCI_ACS_CAP, cap_bits);
> > > > > +    pci_set_word(dev->wmask + offset + PCI_ACS_CTRL, cap_bits);
> > > > > +}
> > > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > > > > index 5b82a0d..4c40711 100644
> > > > > --- a/include/hw/pci/pcie.h
> > > > > +++ b/include/hw/pci/pcie.h
> > > > > @@ -79,6 +79,9 @@ struct PCIExpressDevice {
> > > > >  
> > > > >      /* Offset of ATS capability in config space */
> > > > >      uint16_t ats_cap;
> > > > > +
> > > > > +    /* ACS */
> > > > > +    uint16_t acs_cap;
> > > > >  };
> > > > >  
> > > > >  #define COMPAT_PROP_PCP "power_controller_present"
> > > > > @@ -116,6 +119,8 @@ void pcie_cap_flr_init(PCIDevice *dev);
> > > > >  void pcie_cap_flr_write_config(PCIDevice *dev,
> > > > >                             uint32_t addr, uint32_t val, int len);
> > > > >  
> > > > > +void pcie_cap_acs_reset(PCIDevice *dev);
> > > > > +
> > > > >  /* ARI forwarding capability and control */
> > > > >  void pcie_cap_arifwd_init(PCIDevice *dev);
> > > > >  void pcie_cap_arifwd_reset(PCIDevice *dev);
> > > > > @@ -129,6 +134,7 @@ void pcie_add_capability(PCIDevice *dev,
> > > > >  void pcie_sync_bridge_lnk(PCIDevice *dev);
> > > > >  
> > > > >  void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
> > > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset);    
> > > > 
> > > > nit, why aren't the reset and init declarations together?    
> > > 
> > > Good point, will fix.
> > > 
> > > Thanks!
> > > Knut
> > >   
> > > > >  void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t
> > > > > ser_num);
> > > > >  void pcie_ats_init(PCIDevice *dev, uint16_t offset);
> > > > >  
> > > > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> > > > > index ad4e780..1db86b0 100644
> > > > > --- a/include/hw/pci/pcie_regs.h
> > > > > +++ b/include/hw/pci/pcie_regs.h
> > > > > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth {
> > > > >                                           PCI_ERR_COR_INTERNAL
> > > > > |         \
> > > > >                                           PCI_ERR_COR_HL_OVERFLOW)
> > > > >  
> > > > > +/* ACS */
> > > > > +#define PCI_ACS_VER                     0x1
> > > > > +#define PCI_ACS_SIZEOF                  8
> > > > > +
> > > > >  #endif /* QEMU_PCIE_REGS_H */    
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function
  2019-02-12 17:14           ` Alex Williamson
@ 2019-02-12 19:52             ` Knut Omang
  0 siblings, 0 replies; 13+ messages in thread
From: Knut Omang @ 2019-02-12 19:52 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Michael S . Tsirkin, Marcel Apfelbaum,
	Stefan Hajnoczi, Elijah Shakkour, Tal Attaly

On Tue, 2019-02-12 at 10:14 -0700, Alex Williamson wrote:
> On Tue, 12 Feb 2019 17:25:46 +0100
> Knut Omang <knut.omang@oracle.com> wrote:
> 
> > On Tue, 2019-02-12 at 08:59 -0700, Alex Williamson wrote:
> > > On Tue, 12 Feb 2019 09:07:43 +0100
> > > Knut Omang <knut.omang@oracle.com> wrote:
> > >   
> > > > On Mon, 2019-02-11 at 16:09 -0700, Alex Williamson wrote:  
> > > > > On Sun, 10 Feb 2019 07:52:59 +0100
> > > > > Knut Omang <knut.omang@oracle.com> wrote:
> > > > >     
> > > > > > Add a helper function to add PCIe capability for Access Control
> > > > > > Services
> > > > > > (ACS)
> > > > > > ACS support in the associated root port is a prerequisite to be able
> > > > > > to
> > > > > > do
> > > > > > passthrough of individual functions of a device with VFIO
> > > > > > without Alex Williamson's pcie_acs_override kernel patch or similar
> > > > > > in the guest.    
> > > > > 
> > > > > This is still incorrect, the ACS override patch is only required for
> > > > > separating multifunction endpoints or multifunction root
> > > > > ports.  Single
> > > > > function endpoints are assignable without ACS simply by placing them
> > > > > downstream of a single function root port or directly on the root
> > > > > complex.    
> > > > 
> > > > Hmm - that was the intended meaning of the comment, but I'll see if I
> > > > can
> > > > make
> > > > it more clear by saying it explicitly.  
> > > 
> > > "ACS support... is a prerequisite".  Prerequisite: a thing that is
> > > required as a prior condition for something else to happen or exist.
> > > 
> > > Assignment of individual functions exists today, as is, by using QEMU
> > > to define a PCIe topology that allows the desired grouping.  The code
> > > here enables specific topologies, it is clearly not a prerequisite.
> > >   
> > > > > > Endpoints may also implement an ACS capability, but with
> > > > > > limited features.
> > > > > > 
> > > > > > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > > > > > ---
> > > > > >  hw/pci/pcie.c              | 29 +++++++++++++++++++++++++++++
> > > > > >  include/hw/pci/pcie.h      |  6 ++++++
> > > > > >  include/hw/pci/pcie_regs.h |  4 ++++
> > > > > >  3 files changed, 39 insertions(+)
> > > > > > 
> > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > > index 230478f..509632f 100644
> > > > > > --- a/hw/pci/pcie.c
> > > > > > +++ b/hw/pci/pcie.c
> > > > > > @@ -742,6 +742,14 @@ bool pcie_cap_is_arifwd_enabled(const PCIDevice
> > > > > > *dev)
> > > > > >          PCI_EXP_DEVCTL2_ARI;
> > > > > >  }
> > > > > >  
> > > > > > +/* Access Control Services (ACS) */
> > > > > > +void pcie_cap_acs_reset(PCIDevice *dev)
> > > > > > +{
> > > > > > +    if (dev->exp.acs_cap) {
> > > > > > +        pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL,
> > > > > > 0);
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > >  /******************************************************************
> > > > > > ****
> > > > > > ****
> > > > > >   * pci express extended capability list management functions
> > > > > >   * uint16_t ext_cap_id (16 bit)
> > > > > > @@ -906,3 +914,24 @@ void pcie_ats_init(PCIDevice *dev, uint16_t
> > > > > > offset)
> > > > > >  
> > > > > >      pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL,
> > > > > > 0x800f);
> > > > > >  }
> > > > > > +
> > > > > > +/* ACS (Access Control Services) */
> > > > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset)
> > > > > > +{
> > > > > > +    bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev),
> > > > > > TYPE_PCIE_SLOT);
> > > > > > +    uint16_t cap_bits = 0;
> > > > > > +
> > > > > > +    pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER,
> > > > > > offset,
> > > > > > +                        PCI_ACS_SIZEOF);
> > > > > > +    dev->exp.acs_cap = offset;
> > > > > > +
> > > > > > +    if (is_pcie_slot) {
> > > > > > +        /* Endpoints may also implement ACS, but these capabilities
> > > > > > are
> > > > > > */
> > > > > > +        /* only valid for slots: */    
> > > > > 
> > > > > Not quite, SV, TB, and UF must not be implemented by endpoints, but RR
> > > > > and CR must be implemented by multifunction endpoints that support p2p
> > > > > if they provide an ACS capability.      
> > > > 
> > > > Hmm - are you ok with setting 0 here as I have done, just amending your 
> > > > description to the comment? Then any future emulation that do support
> > > > p2p 
> > > > would have to set the needed bits after calling the init function.  
> > > 
> > > The comment definitely needs work, but I don't know what to do about
> > > single function, non-SR-IOV capable devices calling into this.    
> > 
> > I agree - I have only been thinking about multifunction devices, I should 
> > probably assert on not multifunction (or SR/IOV with my SR/IOV patch set) 
> > to avoid misuse.
> 
> Be sure to check both for (dev->cap_present &
> QEMU_PCI_CAP_MULTIFUNCTION) and PCI_FUNC(dev->devfn) since only
> function 0 is required to have the header bit set.

done thanks,

> > And what about someone passing a PF for SR/IOV through to a VM - 
> > Has that use case showed up (yet) ? 
> > I have so far only tested with my emulated SR/IOV.
> 
> The SR-IOV capability is masked to the L1 guest.  The issue is that an
> assigned SR-IOV PF could create VFs *on the host* which has security
> implications unless those VFs can be quarantined from general use by
> the host kernel.  There have been patches and proposals, but none yet
> that sufficiently address this issue.  Thanks,

Ah, yes, that makes sense.
It's getting late here, so will send v4 tomorrow..

Thanks!
Knut

> Alex
> 
> > > Per the
> > > spec, these devices cannot implement an ACS capability at all, but will
> > > anyone care if they do?  Maybe endpoints need a different init
> > > function.  Maybe the capability ID needs to be obscured until
> > > additional functions or an SR-IOV capability are added.  I'm not really
> > > concerned that this helper can only indicate lack of p2p between
> > > functions for endpoints, other than the comments being wrong.  
> > 
> > Ok, I think I am in the clear then with my v4 comment :-)
> > 
> > > > After your previous comments on this, I had a look at Mellanox CX4 and
> > > > CX5
> > > > which
> > > > are the only devices I could find in the lab that are endpoints and
> > > > implement an
> > > > ACS capability, and neither seems to implement any extra capabilities:
> > > > 
> > > >         Capabilities: [230 v1] Access Control Services
> > > >                 ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir-
> > > > UpstreamFwd- EgressCtrl- DirectTrans-
> > > >                 ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir-
> > > > UpstreamFwd- EgressCtrl- DirectTrans-
> > > > 
> > > > that was the reason for my choice of value here - after skimming through
> > > > the
> > > > spec (with my still very limited understanding of the details)  
> > > 
> > > If the above device is a multifunction device or SR-IOV capable device,
> > > this is the correct way to indicate there is no p2p between functions.
> > > This takes advantage of the wording in the spec that certain
> > > capabilities must be implemented by functions that support p2p
> > > traffic.  Therefore if the capability is not implemented then p2p is
> > > not supported and requires no control.  
> > 
> > I see, that makes sense, and corresponds with the way the device I know
> > best 
> > worked.
> > 
> > > > > Linux therefore infers that if ACS
> > > > > is supported by an endpoint and RR and CR are not implemented, the
> > > > > device does not support p2p.      
> > > > 
> > > > Interesting - I thought the CX5 supported p2p, but I have not kept up
> > > > with
> > > > what
> > > > happens on the RDMA list on that front.  
> > > 
> > > ACS can only indicate p2p capabilities within the device reporting it.
> > > The above example indicates that the device does not do p2p internally
> > > between functions.  It does not preclude p2p between devices or even
> > > functions, but it does require that the p2p traffic occurs on the link
> > > and therefore allows routing to be managed through the ACS capabilities
> > > of the interconnect components.
> > > 
> > > For instance if we have a multifunction endpoint with the above
> > > capability, we know that a DMA from function 1 targeting an address
> > > range on function 0 will not complete the DMA internally, it will go
> > > through the egress port to the downstream port above the endpoint.  The
> > > ACS capability on that downstream port then controls whether that DMA
> > > request is redirected back to the ingress port of the endpoint or
> > > forwarded upstream.  If a redirect occurs here then the IOMMU is not
> > > fully in control of the IOVA space of the device and the IOMMU group is
> > > extended to indicate this.  
> > 
> > Thanks, this definitely clarifies!
> > 
> > > > > We could just say that nothing supports
> > > > > p2p yet, but single function endpoints (except those implementing
> > > > > SR-IOV) must not implement an ACS capability per the spec, which could
> > > > > be difficult to exclude since the multifunction bit is handled
> > > > > separately from the device model.    
> > > > 
> > > > Hmm - the older SR/IOV devices I know of, some Intel Ethernet devices
> > > > and
> > > > any of
> > > > the older Mellanox devices, and our own cancelled Infiniband device, all
> > > > seem
> > > > not to implement ACS?  
> > > 
> > > Linux effectively assumes that there is no p2p between VFs or between
> > > the PF and VF, so the result is that a multifunction endpoint without
> > > ACS supporting SR-IOV would have the PFs grouped together, but the VFs
> > > would be grouped individually, assuming the upstream topology provides
> > > isolation.  Intel has since provided verification that nearly none of
> > > their multi-port NICs implement internal p2p and they now use quirks to
> > > provide the ACS equivalent information for grouping for those older
> > > devices.  Newer Intel NICs implement ACS capabilities as in your
> > > example above to indicate this without the need for quirks.  Thanks,  
> > 
> > Yes, so for devices like the CX4 and CX5 which exposes more than one PF,
> > they need the ACS capability to indicate that individual PFs are isolated
> > from 
> > each other, while the single PF SR/IOV devices can do without.
> > 
> > Thanks for the excellent explanations, this makes it all much clearer!
> > Knut
> > 
> > > Alex
> > >   
> > > > > Also comment style:
> > > > > https://git.qemu.org/?p=qemu.git;a=blob;f=CODING_STYLE;#l127    
> > > > 
> > > > I see - will fix,
> > > >   
> > > > > > +        cap_bits =
> > > > > > +            PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
> > > > > > PCI_ACS_UF;
> > > > > > +    }
> > > > > > +
> > > > > > +    pci_set_word(dev->config + offset + PCI_ACS_CAP, cap_bits);
> > > > > > +    pci_set_word(dev->wmask + offset + PCI_ACS_CTRL, cap_bits);
> > > > > > +}
> > > > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > > > > > index 5b82a0d..4c40711 100644
> > > > > > --- a/include/hw/pci/pcie.h
> > > > > > +++ b/include/hw/pci/pcie.h
> > > > > > @@ -79,6 +79,9 @@ struct PCIExpressDevice {
> > > > > >  
> > > > > >      /* Offset of ATS capability in config space */
> > > > > >      uint16_t ats_cap;
> > > > > > +
> > > > > > +    /* ACS */
> > > > > > +    uint16_t acs_cap;
> > > > > >  };
> > > > > >  
> > > > > >  #define COMPAT_PROP_PCP "power_controller_present"
> > > > > > @@ -116,6 +119,8 @@ void pcie_cap_flr_init(PCIDevice *dev);
> > > > > >  void pcie_cap_flr_write_config(PCIDevice *dev,
> > > > > >                             uint32_t addr, uint32_t val, int len);
> > > > > >  
> > > > > > +void pcie_cap_acs_reset(PCIDevice *dev);
> > > > > > +
> > > > > >  /* ARI forwarding capability and control */
> > > > > >  void pcie_cap_arifwd_init(PCIDevice *dev);
> > > > > >  void pcie_cap_arifwd_reset(PCIDevice *dev);
> > > > > > @@ -129,6 +134,7 @@ void pcie_add_capability(PCIDevice *dev,
> > > > > >  void pcie_sync_bridge_lnk(PCIDevice *dev);
> > > > > >  
> > > > > >  void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t
> > > > > > nextfn);
> > > > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset);    
> > > > > 
> > > > > nit, why aren't the reset and init declarations together?    
> > > > 
> > > > Good point, will fix.
> > > > 
> > > > Thanks!
> > > > Knut
> > > >   
> > > > > >  void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset,
> > > > > > uint64_t
> > > > > > ser_num);
> > > > > >  void pcie_ats_init(PCIDevice *dev, uint16_t offset);
> > > > > >  
> > > > > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> > > > > > index ad4e780..1db86b0 100644
> > > > > > --- a/include/hw/pci/pcie_regs.h
> > > > > > +++ b/include/hw/pci/pcie_regs.h
> > > > > > @@ -175,4 +175,8 @@ typedef enum PCIExpLinkWidth {
> > > > > >                                           PCI_ERR_COR_INTERNAL
> > > > > > >         \
> > > > > >                                           PCI_ERR_COR_HL_OVERFLOW)
> > > > > >  
> > > > > > +/* ACS */
> > > > > > +#define PCI_ACS_VER                     0x1
> > > > > > +#define PCI_ACS_SIZEOF                  8
> > > > > > +
> > > > > >  #endif /* QEMU_PCIE_REGS_H */    

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/2] gen_pcie_root_port: Add ACS (Access Control Services) capability
  2019-02-11  8:07   ` Marcel Apfelbaum
@ 2019-02-13  9:34     ` Knut Omang
  0 siblings, 0 replies; 13+ messages in thread
From: Knut Omang @ 2019-02-13  9:34 UTC (permalink / raw)
  To: Marcel Apfelbaum, qemu-devel
  Cc: Michael S . Tsirkin, Alex Williamson, Stefan Hajnoczi,
	Elijah Shakkour, Tal Attaly

On Mon, 2019-02-11 at 10:07 +0200, Marcel Apfelbaum wrote:
> 
> On 2/10/19 8:53 AM, Knut Omang wrote:
> > Claim ACS support in the generic PCIe root port to allow
> > passthrough of individual functions of a device to different
> > guests (in a nested virt.setting) with VFIO.
> > Without this patch, all functions of a device, such as all VFs of
> > an SR/IOV device, will end up in the same IOMMU group.
> > A similar situation occurs on Windows with Hyper-V.
> > 
> > In the single function device case, it also has a small cosmetic
> > benefit in that the root port itself is not grouped with
> > the device. VFIO handles that situation in that binding rules
> > only apply to endpoints, so it does not limit passthrough in
> > those cases.
> > 
> > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > ---
> >   hw/pci-bridge/gen_pcie_root_port.c | 4 ++++
> >   hw/pci-bridge/pcie_root_port.c     | 4 ++++
> >   include/hw/pci/pcie_port.h         | 1 +
> >   3 files changed, 9 insertions(+)
> > 
> > diff --git a/hw/pci-bridge/gen_pcie_root_port.c b/hw/pci-
> > bridge/gen_pcie_root_port.c
> > index 9766edb..26bda73 100644
> > --- a/hw/pci-bridge/gen_pcie_root_port.c
> > +++ b/hw/pci-bridge/gen_pcie_root_port.c
> > @@ -20,6 +20,9 @@
> >           OBJECT_CHECK(GenPCIERootPort, (obj), TYPE_GEN_PCIE_ROOT_PORT)
> >   
> >   #define GEN_PCIE_ROOT_PORT_AER_OFFSET           0x100
> > +#define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
> > +        (GEN_PCIE_ROOT_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
> > +
> >   #define GEN_PCIE_ROOT_PORT_MSIX_NR_VECTOR       1
> >   
> >   typedef struct GenPCIERootPort {
> > @@ -149,6 +152,7 @@ static void gen_rp_dev_class_init(ObjectClass *klass,
> > void *data)
> >       rpc->interrupts_init = gen_rp_interrupts_init;
> >       rpc->interrupts_uninit = gen_rp_interrupts_uninit;
> >       rpc->aer_offset = GEN_PCIE_ROOT_PORT_AER_OFFSET;
> > +    rpc->acs_offset = GEN_PCIE_ROOT_PORT_ACS_OFFSET;
> >   }
> >   
> >   static const TypeInfo gen_rp_dev_info = {
> > diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
> > index 34ad767..a0b4cf7 100644
> > --- a/hw/pci-bridge/pcie_root_port.c
> > +++ b/hw/pci-bridge/pcie_root_port.c
> > @@ -47,6 +47,7 @@ static void rp_reset(DeviceState *qdev)
> >       pcie_cap_deverr_reset(d);
> >       pcie_cap_slot_reset(d);
> >       pcie_cap_arifwd_reset(d);
> > +    pcie_cap_acs_reset(d);
> >       pcie_aer_root_reset(d);
> >       pci_bridge_reset(qdev);
> >       pci_bridge_disable_base_limit(d);
> > @@ -106,6 +107,9 @@ static void rp_realize(PCIDevice *d, Error **errp)
> >       pcie_aer_root_init(d);
> >       rp_aer_vector_update(d);
> >   
> > +    if (rpc->acs_offset) {
> > +        pcie_acs_init(d, rpc->acs_offset);
> > +    }
> >       return;
> >   
> >   err:
> > diff --git a/include/hw/pci/pcie_port.h b/include/hw/pci/pcie_port.h
> > index df242a0..09586f4 100644
> > --- a/include/hw/pci/pcie_port.h
> > +++ b/include/hw/pci/pcie_port.h
> > @@ -78,6 +78,7 @@ typedef struct PCIERootPortClass {
> >       int exp_offset;
> >       int aer_offset;
> >       int ssvid_offset;
> > +    int acs_offset;    /* If nonzero, optional ACS capability offset */
> >       int ssid;
> >   } PCIERootPortClass;
> >   
> 
> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>

Thanks! I added this r-b to v4, but left out the one for patch 1 as it got some
more changes based on Alex's feedback and our fruitful discussion afterwards,
so you might want to recheck it.

Knut
> 
> Thanks,
> Marcel

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-02-13  9:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-10  6:52 [Qemu-devel] [PATCH v3 0/2] pcie: Add simple ACS "support" to the generic PCIe root port Knut Omang
2019-02-10  6:52 ` [Qemu-devel] [PATCH v3 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function Knut Omang
2019-02-11  8:05   ` Marcel Apfelbaum
2019-02-11 23:09   ` Alex Williamson
2019-02-12  8:07     ` Knut Omang
2019-02-12  9:48       ` Knut Omang
2019-02-12 15:59       ` Alex Williamson
2019-02-12 16:25         ` Knut Omang
2019-02-12 17:14           ` Alex Williamson
2019-02-12 19:52             ` Knut Omang
2019-02-10  6:53 ` [Qemu-devel] [PATCH v3 2/2] gen_pcie_root_port: Add ACS (Access Control Services) capability Knut Omang
2019-02-11  8:07   ` Marcel Apfelbaum
2019-02-13  9:34     ` Knut Omang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.