All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/4] pcie: Add support for Single Root I/O Virtualization
@ 2014-09-02 11:00 Knut Omang
  2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 1/4] pci: Make use of the devfn property when registering new devices Knut Omang
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Knut Omang @ 2014-09-02 11:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Peter Crosthwaite, Stefan Hajnoczi,
	Michael S. Tsirkin, Laszlo Ersek, Gabriel Somlo, Knut Omang,
	Alexander Graf, Fabien Chouteau, Luiz Capitulino,
	Beniamino Galvani, Alex Williamson, Gonglei (Arei),
	Jan Kiszka, Anthony Liguori, Paolo Bonzini, Amos Kong

This patch set implements generic support for SR/IOV as an extension to the 
core PCIe functionality, similar to the way other capabilities such as AER
is implemented. 

There is no implementation of any device that provides
SR/IOV support included, but I have implemented a test 
example which can be found together with this patch set here:

  git://github.com/knuto/qemu.git sriov_patches_v2

Testing with this example device documented here:

  http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg05110.html 

Changes since v1:
  - Rebased on top of latest master, eliminating prereqs.
  - Implement proper support for VF_STRIDE, VF_OFFSET and SUP_PGSIZE
    Time better spent fixing it than explaining what the previous 
    limitations were.
    - Added new first patch to fix pci bug related to this
  - Split out patch to pci_default_config_write to a separate patch 2
    to highlight bug fix.
  - Refactored out logic into new source files 
    hw/pci/pcie_sriov.c include/hw/pci/pcie_sriov.h
    similar to pcie_aer.c/h.
  - Rename functions and introduce structs to better separate 
    pf and vf functionality.
  - Replaced is_vf member with pci_is_vf() function abstraction
  - Fix numerous syntax, whitespace and comment issues
    according to Michael's review.
  - Fix memory leaks.
  - Removed igb example device - a rebased version available
    on github instead.

Knut Omang (4):
  pci: Make use of the devfn property when registering new devices
  pci: Avoid losing config updates to MSI/MSIX cap regs
  pci: Update pci_regs header
  pcie: Add support for Single Root I/O Virtualization (SR/IOV)

 hw/i386/kvm/pci-assign.c    |   4 +-
 hw/misc/vfio.c              |   8 +-
 hw/pci/Makefile.objs        |   2 +-
 hw/pci/msi.c                |   4 -
 hw/pci/msix.c               |   2 +-
 hw/pci/pci.c                | 106 +++++++++----
 hw/pci/pcie.c               |   9 +-
 hw/pci/pcie_sriov.c         | 267 +++++++++++++++++++++++++++++++
 include/hw/pci/pci.h        |  11 +-
 include/hw/pci/pci_regs.h   | 371 +++++++++++++++++++++++++++++++++-----------
 include/hw/pci/pcie.h       |   6 +
 include/hw/pci/pcie_sriov.h |  55 +++++++
 include/qemu/typedefs.h     |   2 +
 13 files changed, 708 insertions(+), 139 deletions(-)
 create mode 100644 hw/pci/pcie_sriov.c
 create mode 100644 include/hw/pci/pcie_sriov.h

-- 
1.9.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH v2 1/4] pci: Make use of the devfn property when registering new devices
  2014-09-02 11:00 [Qemu-devel] [PATCH v2 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
@ 2014-09-02 11:00 ` Knut Omang
  2014-09-02 13:03   ` Michael S. Tsirkin
  2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 2/4] pci: Avoid losing config updates to MSI/MSIX cap regs Knut Omang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Knut Omang @ 2014-09-02 11:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Peter Crosthwaite, Stefan Hajnoczi,
	Michael S. Tsirkin, Laszlo Ersek, Gabriel Somlo, Knut Omang,
	Alexander Graf, Fabien Chouteau, Luiz Capitulino,
	Beniamino Galvani, Alex Williamson, Gonglei (Arei),
	Jan Kiszka, Anthony Liguori, Paolo Bonzini, Amos Kong

Without this, the devfn argument to pci_create_*()
does not affect the assigned devfn.

Needed to support (VF_STRIDE,VF_OFFSET) values other than (1,1)
for SR/IOV.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
---
 hw/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index daeaeac..6b21dee 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1757,7 +1757,7 @@ static int pci_qdev_init(DeviceState *qdev)
     bus = PCI_BUS(qdev_get_parent_bus(qdev));
     pci_dev = do_pci_register_device(pci_dev, bus,
                                      object_get_typename(OBJECT(qdev)),
-                                     pci_dev->devfn);
+                                     object_property_get_int(OBJECT(qdev), "addr", NULL));
     if (pci_dev == NULL)
         return -1;
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH v2 2/4] pci: Avoid losing config updates to MSI/MSIX cap regs
  2014-09-02 11:00 [Qemu-devel] [PATCH v2 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
  2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 1/4] pci: Make use of the devfn property when registering new devices Knut Omang
@ 2014-09-02 11:00 ` Knut Omang
  2014-09-02 12:57   ` Michael S. Tsirkin
  2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 3/4] pci: Update pci_regs header Knut Omang
  2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 4/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV) Knut Omang
  3 siblings, 1 reply; 10+ messages in thread
From: Knut Omang @ 2014-09-02 11:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Peter Crosthwaite, Stefan Hajnoczi,
	Michael S. Tsirkin, Laszlo Ersek, Gabriel Somlo, Knut Omang,
	Alexander Graf, Fabien Chouteau, Luiz Capitulino,
	Beniamino Galvani, Alex Williamson, Gonglei (Arei),
	Jan Kiszka, Anthony Liguori, Paolo Bonzini, Amos Kong

Introduce an extra local variable to avoid that the
shifting in the for loop causes updates to msi/msix capability
registers to get lost in pci_default_write_config.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
---
 hw/pci/pci.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 6b21dee..af547dc 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1146,9 +1146,10 @@ uint32_t pci_default_read_config(PCIDevice *d,
     return le32_to_cpu(val);
 }
 
-void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
+void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int l)
 {
     int i, was_irq_disabled = pci_irq_disabled(d);
+    uint32_t val = val_in;
 
     for (i = 0; i < l; val >>= 8, ++i) {
         uint8_t wmask = d->wmask[addr + i];
@@ -1170,8 +1171,8 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
                                     & PCI_COMMAND_MASTER);
     }
 
-    msi_write_config(d, addr, val, l);
-    msix_write_config(d, addr, val, l);
+    msi_write_config(d, addr, val_in, l);
+    msix_write_config(d, addr, val_in, l);
 }
 
 /***********************************************************/
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH v2 3/4] pci: Update pci_regs header
  2014-09-02 11:00 [Qemu-devel] [PATCH v2 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
  2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 1/4] pci: Make use of the devfn property when registering new devices Knut Omang
  2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 2/4] pci: Avoid losing config updates to MSI/MSIX cap regs Knut Omang
@ 2014-09-02 11:00 ` Knut Omang
  2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 4/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV) Knut Omang
  3 siblings, 0 replies; 10+ messages in thread
From: Knut Omang @ 2014-09-02 11:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Peter Crosthwaite, Stefan Hajnoczi,
	Michael S. Tsirkin, Laszlo Ersek, Gabriel Somlo, Knut Omang,
	Alexander Graf, Fabien Chouteau, Luiz Capitulino,
	Beniamino Galvani, Alex Williamson, Gonglei (Arei),
	Jan Kiszka, Anthony Liguori, Paolo Bonzini, Amos Kong

Pulls in latest version from kernel v3.16
msi: Removed local definitions now in pci_regs.h

Also replace usage of PCI_MSIX_FLAGS_BIRMASK with PCI_MSIX_TABLE_BIR
to keep in sync with the kernel for future updates.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
---
 hw/i386/kvm/pci-assign.c  |   4 +-
 hw/misc/vfio.c            |   8 +-
 hw/pci/msi.c              |   4 -
 hw/pci/msix.c             |   2 +-
 include/hw/pci/pci_regs.h | 371 ++++++++++++++++++++++++++++++++++------------
 5 files changed, 285 insertions(+), 104 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 17c7d6dc..a2285c4 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -1321,8 +1321,8 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev, Error **errp)
                      PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
 
         msix_table_entry = pci_get_long(pci_dev->config + pos + PCI_MSIX_TABLE);
-        bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
-        msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
+        bar_nr = msix_table_entry & PCI_MSIX_TABLE_BIR;
+        msix_table_entry &= ~PCI_MSIX_TABLE_BIR;
         dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
         dev->msix_max = msix_max;
     }
diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 40dcaa6..f275084 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -2799,10 +2799,10 @@ static int vfio_early_setup_msix(VFIODevice *vdev)
     pba = le32_to_cpu(pba);
 
     vdev->msix = g_malloc0(sizeof(*(vdev->msix)));
-    vdev->msix->table_bar = table & PCI_MSIX_FLAGS_BIRMASK;
-    vdev->msix->table_offset = table & ~PCI_MSIX_FLAGS_BIRMASK;
-    vdev->msix->pba_bar = pba & PCI_MSIX_FLAGS_BIRMASK;
-    vdev->msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
+    vdev->msix->table_bar = table & PCI_MSIX_TABLE_BIR;
+    vdev->msix->table_offset = table & ~PCI_MSIX_TABLE_BIR;
+    vdev->msix->pba_bar = pba & PCI_MSIX_TABLE_BIR;
+    vdev->msix->pba_offset = pba & ~PCI_MSIX_TABLE_BIR;
     vdev->msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
 
     DPRINTF("%04x:%02x:%02x.%x "
diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index 52d2313..1d25b62 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -21,10 +21,6 @@
 #include "hw/pci/msi.h"
 #include "qemu/range.h"
 
-/* Eventually those constants should go to Linux pci_regs.h */
-#define PCI_MSI_PENDING_32      0x10
-#define PCI_MSI_PENDING_64      0x14
-
 /* PCI_MSI_ADDRESS_LO */
 #define PCI_MSI_ADDRESS_LO_MASK         (~0x3)
 
diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 24de260..8206264 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -250,7 +250,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
          ranges_overlap(table_offset, table_size, pba_offset, pba_size)) ||
         table_offset + table_size > memory_region_size(table_bar) ||
         pba_offset + pba_size > memory_region_size(pba_bar) ||
-        (table_offset | pba_offset) & PCI_MSIX_FLAGS_BIRMASK) {
+        (table_offset | pba_offset) & PCI_MSIX_TABLE_BIR) {
         return -EINVAL;
     }
 
diff --git a/include/hw/pci/pci_regs.h b/include/hw/pci/pci_regs.h
index 56a404b..30db069 100644
--- a/include/hw/pci/pci_regs.h
+++ b/include/hw/pci/pci_regs.h
@@ -13,10 +13,10 @@
  *	PCI to PCI Bridge Specification
  *	PCI System Design Guide
  *
- * 	For hypertransport information, please consult the following manuals
- * 	from http://www.hypertransport.org
+ *	For HyperTransport information, please consult the following manuals
+ *	from http://www.hypertransport.org
  *
- *	The Hypertransport I/O Link Specification
+ *	The HyperTransport I/O Link Specification
  */
 
 #ifndef LINUX_PCI_REGS_H
@@ -26,6 +26,7 @@
  * Under PCI, each device has 256 bytes of configuration address space,
  * of which the first 64 bytes are standardized as follows:
  */
+#define PCI_STD_HEADER_SIZEOF	64
 #define PCI_VENDOR_ID		0x00	/* 16 bits */
 #define PCI_DEVICE_ID		0x02	/* 16 bits */
 #define PCI_COMMAND		0x04	/* 16 bits */
@@ -36,7 +37,7 @@
 #define  PCI_COMMAND_INVALIDATE	0x10	/* Use memory write and invalidate */
 #define  PCI_COMMAND_VGA_PALETTE 0x20	/* Enable palette snooping */
 #define  PCI_COMMAND_PARITY	0x40	/* Enable parity checking */
-#define  PCI_COMMAND_WAIT 	0x80	/* Enable address/data stepping */
+#define  PCI_COMMAND_WAIT	0x80	/* Enable address/data stepping */
 #define  PCI_COMMAND_SERR	0x100	/* Enable SERR */
 #define  PCI_COMMAND_FAST_BACK	0x200	/* Enable back-to-back writes */
 #define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
@@ -44,7 +45,7 @@
 #define PCI_STATUS		0x06	/* 16 bits */
 #define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
 #define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
-#define  PCI_STATUS_66MHZ	0x20	/* Support 66 Mhz PCI 2.1 bus */
+#define  PCI_STATUS_66MHZ	0x20	/* Support 66 MHz PCI 2.1 bus */
 #define  PCI_STATUS_UDF		0x40	/* Support User Definable Features [obsolete] */
 #define  PCI_STATUS_FAST_BACK	0x80	/* Accept fast-back to back */
 #define  PCI_STATUS_PARITY	0x100	/* Detected parity error */
@@ -125,7 +126,8 @@
 #define  PCI_IO_RANGE_TYPE_MASK	0x0fUL	/* I/O bridging type */
 #define  PCI_IO_RANGE_TYPE_16	0x00
 #define  PCI_IO_RANGE_TYPE_32	0x01
-#define  PCI_IO_RANGE_MASK	(~0x0fUL)
+#define  PCI_IO_RANGE_MASK	(~0x0fUL) /* Standard 4K I/O windows */
+#define  PCI_IO_1K_RANGE_MASK	(~0x03UL) /* Intel 1K I/O windows */
 #define PCI_SEC_STATUS		0x1e	/* Secondary status register, only bit 14 used */
 #define PCI_MEMORY_BASE		0x20	/* Memory range behind */
 #define PCI_MEMORY_LIMIT	0x22
@@ -203,16 +205,18 @@
 #define  PCI_CAP_ID_CHSWP	0x06	/* CompactPCI HotSwap */
 #define  PCI_CAP_ID_PCIX	0x07	/* PCI-X */
 #define  PCI_CAP_ID_HT		0x08	/* HyperTransport */
-#define  PCI_CAP_ID_VNDR	0x09	/* Vendor specific */
+#define  PCI_CAP_ID_VNDR	0x09	/* Vendor-Specific */
 #define  PCI_CAP_ID_DBG		0x0A	/* Debug port */
 #define  PCI_CAP_ID_CCRC	0x0B	/* CompactPCI Central Resource Control */
-#define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
+#define  PCI_CAP_ID_SHPC	0x0C	/* PCI Standard Hot-Plug Controller */
 #define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
 #define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
-#define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
+#define  PCI_CAP_ID_SECDEV	0x0F	/* Secure Device */
+#define  PCI_CAP_ID_EXP		0x10	/* PCI Express */
 #define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
-#define  PCI_CAP_ID_SATA	0x12	/* Serial ATA */
+#define  PCI_CAP_ID_SATA	0x12	/* SATA Data/Index Conf. */
 #define  PCI_CAP_ID_AF		0x13	/* PCI Advanced Features */
+#define  PCI_CAP_ID_MAX		PCI_CAP_ID_AF
 #define PCI_CAP_LIST_NEXT	1	/* Next capability in the list */
 #define PCI_CAP_FLAGS		2	/* Capability defined flags (16 bits) */
 #define PCI_CAP_SIZEOF		4
@@ -264,8 +268,8 @@
 #define  PCI_AGP_COMMAND_RQ_MASK 0xff000000  /* Master: Maximum number of requests */
 #define  PCI_AGP_COMMAND_SBA	0x0200	/* Sideband addressing enabled */
 #define  PCI_AGP_COMMAND_AGP	0x0100	/* Allow processing of AGP transactions */
-#define  PCI_AGP_COMMAND_64BIT	0x0020 	/* Allow processing of 64-bit addresses */
-#define  PCI_AGP_COMMAND_FW	0x0010 	/* Force FW transfers */
+#define  PCI_AGP_COMMAND_64BIT	0x0020	/* Allow processing of 64-bit addresses */
+#define  PCI_AGP_COMMAND_FW	0x0010	/* Force FW transfers */
 #define  PCI_AGP_COMMAND_RATE4	0x0004	/* Use 4x rate */
 #define  PCI_AGP_COMMAND_RATE2	0x0002	/* Use 2x rate */
 #define  PCI_AGP_COMMAND_RATE1	0x0001	/* Use 1x rate */
@@ -277,6 +281,7 @@
 #define  PCI_VPD_ADDR_MASK	0x7fff	/* Address mask */
 #define  PCI_VPD_ADDR_F		0x8000	/* Write 0, 1 indicates completion */
 #define PCI_VPD_DATA		4	/* 32-bits of data returned here */
+#define PCI_CAP_VPD_SIZEOF	8
 
 /* Slot Identification */
 
@@ -287,30 +292,36 @@
 
 /* Message Signalled Interrupts registers */
 
-#define PCI_MSI_FLAGS		2	/* Various flags */
-#define  PCI_MSI_FLAGS_64BIT	0x80	/* 64-bit addresses allowed */
-#define  PCI_MSI_FLAGS_QSIZE	0x70	/* Message queue size configured */
-#define  PCI_MSI_FLAGS_QMASK	0x0e	/* Maximum queue size available */
-#define  PCI_MSI_FLAGS_ENABLE	0x01	/* MSI feature enabled */
-#define  PCI_MSI_FLAGS_MASKBIT	0x100	/* 64-bit mask bits allowed */
+#define PCI_MSI_FLAGS		2	/* Message Control */
+#define  PCI_MSI_FLAGS_ENABLE	0x0001	/* MSI feature enabled */
+#define  PCI_MSI_FLAGS_QMASK	0x000e	/* Maximum queue size available */
+#define  PCI_MSI_FLAGS_QSIZE	0x0070	/* Message queue size configured */
+#define  PCI_MSI_FLAGS_64BIT	0x0080	/* 64-bit addresses allowed */
+#define  PCI_MSI_FLAGS_MASKBIT	0x0100	/* Per-vector masking capable */
 #define PCI_MSI_RFU		3	/* Rest of capability flags */
 #define PCI_MSI_ADDRESS_LO	4	/* Lower 32 bits */
 #define PCI_MSI_ADDRESS_HI	8	/* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
 #define PCI_MSI_DATA_32		8	/* 16 bits of data for 32-bit devices */
 #define PCI_MSI_MASK_32		12	/* Mask bits register for 32-bit devices */
+#define PCI_MSI_PENDING_32	16	/* Pending intrs for 32-bit devices */
 #define PCI_MSI_DATA_64		12	/* 16 bits of data for 64-bit devices */
 #define PCI_MSI_MASK_64		16	/* Mask bits register for 64-bit devices */
+#define PCI_MSI_PENDING_64	20	/* Pending intrs for 64-bit devices */
 
 /* MSI-X registers */
-#define PCI_MSIX_FLAGS		2
-#define  PCI_MSIX_FLAGS_QSIZE	0x7FF
-#define  PCI_MSIX_FLAGS_ENABLE	(1 << 15)
-#define  PCI_MSIX_FLAGS_MASKALL	(1 << 14)
-#define PCI_MSIX_TABLE		4
-#define PCI_MSIX_PBA		8
-#define  PCI_MSIX_FLAGS_BIRMASK	(7 << 0)
-
-/* MSI-X entry's format */
+#define PCI_MSIX_FLAGS		2	/* Message Control */
+#define  PCI_MSIX_FLAGS_QSIZE	0x07FF	/* Table size */
+#define  PCI_MSIX_FLAGS_MASKALL	0x4000	/* Mask all vectors for this function */
+#define  PCI_MSIX_FLAGS_ENABLE	0x8000	/* MSI-X enable */
+#define PCI_MSIX_TABLE		4	/* Table offset */
+#define  PCI_MSIX_TABLE_BIR	0x00000007 /* BAR index */
+#define  PCI_MSIX_TABLE_OFFSET	0xfffffff8 /* Offset into specified BAR */
+#define PCI_MSIX_PBA		8	/* Pending Bit Array offset */
+#define  PCI_MSIX_PBA_BIR	0x00000007 /* BAR index */
+#define  PCI_MSIX_PBA_OFFSET	0xfffffff8 /* Offset into specified BAR */
+#define PCI_CAP_MSIX_SIZEOF	12	/* size of MSIX registers */
+
+/* MSI-X Table entry format */
 #define PCI_MSIX_ENTRY_SIZE		16
 #define  PCI_MSIX_ENTRY_LOWER_ADDR	0
 #define  PCI_MSIX_ENTRY_UPPER_ADDR	4
@@ -339,8 +350,9 @@
 #define  PCI_AF_CTRL_FLR	0x01
 #define PCI_AF_STATUS		5
 #define  PCI_AF_STATUS_TP	0x01
+#define PCI_CAP_AF_SIZEOF	6	/* size of AF registers */
 
-/* PCI-X registers */
+/* PCI-X registers (Type 0 (non-bridge) devices) */
 
 #define PCI_X_CMD		2	/* Modes & Features */
 #define  PCI_X_CMD_DPERR_E	0x0001	/* Data Parity Error Recovery Enable */
@@ -360,7 +372,7 @@
 #define  PCI_X_CMD_SPLIT_16	0x0060	/* Max 16 */
 #define  PCI_X_CMD_SPLIT_32	0x0070	/* Max 32 */
 #define  PCI_X_CMD_MAX_SPLIT	0x0070	/* Max Outstanding Split Transactions */
-#define  PCI_X_CMD_VERSION(x) 	(((x) >> 12) & 3) /* Version */
+#define  PCI_X_CMD_VERSION(x)	(((x) >> 12) & 3) /* Version */
 #define PCI_X_STATUS		4	/* PCI-X capabilities */
 #define  PCI_X_STATUS_DEVFN	0x000000ff	/* A copy of devfn */
 #define  PCI_X_STATUS_BUS	0x0000ff00	/* A copy of bus nr */
@@ -375,11 +387,28 @@
 #define  PCI_X_STATUS_SPL_ERR	0x20000000	/* Rcvd Split Completion Error Msg */
 #define  PCI_X_STATUS_266MHZ	0x40000000	/* 266 MHz capable */
 #define  PCI_X_STATUS_533MHZ	0x80000000	/* 533 MHz capable */
+#define PCI_X_ECC_CSR		8	/* ECC control and status */
+#define PCI_CAP_PCIX_SIZEOF_V0	8	/* size of registers for Version 0 */
+#define PCI_CAP_PCIX_SIZEOF_V1	24	/* size for Version 1 */
+#define PCI_CAP_PCIX_SIZEOF_V2	PCI_CAP_PCIX_SIZEOF_V1	/* Same for v2 */
+
+/* PCI-X registers (Type 1 (bridge) devices) */
+
+#define PCI_X_BRIDGE_SSTATUS	2	/* Secondary Status */
+#define  PCI_X_SSTATUS_64BIT	0x0001	/* Secondary AD interface is 64 bits */
+#define  PCI_X_SSTATUS_133MHZ	0x0002	/* 133 MHz capable */
+#define  PCI_X_SSTATUS_FREQ	0x03c0	/* Secondary Bus Mode and Frequency */
+#define  PCI_X_SSTATUS_VERS	0x3000	/* PCI-X Capability Version */
+#define  PCI_X_SSTATUS_V1	0x1000	/* Mode 2, not Mode 1 */
+#define  PCI_X_SSTATUS_V2	0x2000	/* Mode 1 or Modes 1 and 2 */
+#define  PCI_X_SSTATUS_266MHZ	0x4000	/* 266 MHz capable */
+#define  PCI_X_SSTATUS_533MHZ	0x8000	/* 533 MHz capable */
+#define PCI_X_BRIDGE_STATUS	4	/* Bridge Status */
 
 /* PCI Bridge Subsystem ID registers */
 
-#define PCI_SSVID_VENDOR_ID     4	/* PCI-Bridge subsystem vendor id register */
-#define PCI_SSVID_DEVICE_ID     6	/* PCI-Bridge subsystem device id register */
+#define PCI_SSVID_VENDOR_ID     4	/* PCI Bridge subsystem vendor ID */
+#define PCI_SSVID_DEVICE_ID     6	/* PCI Bridge subsystem device ID */
 
 /* PCI Express capability registers */
 
@@ -391,24 +420,24 @@
 #define  PCI_EXP_TYPE_ROOT_PORT 0x4	/* Root Port */
 #define  PCI_EXP_TYPE_UPSTREAM	0x5	/* Upstream Port */
 #define  PCI_EXP_TYPE_DOWNSTREAM 0x6	/* Downstream Port */
-#define  PCI_EXP_TYPE_PCI_BRIDGE 0x7	/* PCI/PCI-X Bridge */
-#define  PCI_EXP_TYPE_PCIE_BRIDGE 0x8   /* PCI/PCI-X to PCIE Bridge */
+#define  PCI_EXP_TYPE_PCI_BRIDGE 0x7	/* PCIe to PCI/PCI-X Bridge */
+#define  PCI_EXP_TYPE_PCIE_BRIDGE 0x8	/* PCI/PCI-X to PCIe Bridge */
 #define  PCI_EXP_TYPE_RC_END	0x9	/* Root Complex Integrated Endpoint */
-#define  PCI_EXP_TYPE_RC_EC     0xa     /* Root Complex Event Collector */
+#define  PCI_EXP_TYPE_RC_EC	0xa	/* Root Complex Event Collector */
 #define PCI_EXP_FLAGS_SLOT	0x0100	/* Slot implemented */
 #define PCI_EXP_FLAGS_IRQ	0x3e00	/* Interrupt message number */
 #define PCI_EXP_DEVCAP		4	/* Device capabilities */
-#define  PCI_EXP_DEVCAP_PAYLOAD	0x07	/* Max_Payload_Size */
-#define  PCI_EXP_DEVCAP_PHANTOM	0x18	/* Phantom functions */
-#define  PCI_EXP_DEVCAP_EXT_TAG	0x20	/* Extended tags */
-#define  PCI_EXP_DEVCAP_L0S	0x1c0	/* L0s Acceptable Latency */
-#define  PCI_EXP_DEVCAP_L1	0xe00	/* L1 Acceptable Latency */
-#define  PCI_EXP_DEVCAP_ATN_BUT	0x1000	/* Attention Button Present */
-#define  PCI_EXP_DEVCAP_ATN_IND	0x2000	/* Attention Indicator Present */
-#define  PCI_EXP_DEVCAP_PWR_IND	0x4000	/* Power Indicator Present */
-#define  PCI_EXP_DEVCAP_RBER	0x8000	/* Role-Based Error Reporting */
-#define  PCI_EXP_DEVCAP_PWR_VAL	0x3fc0000 /* Slot Power Limit Value */
-#define  PCI_EXP_DEVCAP_PWR_SCL	0xc000000 /* Slot Power Limit Scale */
+#define  PCI_EXP_DEVCAP_PAYLOAD	0x00000007 /* Max_Payload_Size */
+#define  PCI_EXP_DEVCAP_PHANTOM	0x00000018 /* Phantom functions */
+#define  PCI_EXP_DEVCAP_EXT_TAG	0x00000020 /* Extended tags */
+#define  PCI_EXP_DEVCAP_L0S	0x000001c0 /* L0s Acceptable Latency */
+#define  PCI_EXP_DEVCAP_L1	0x00000e00 /* L1 Acceptable Latency */
+#define  PCI_EXP_DEVCAP_ATN_BUT	0x00001000 /* Attention Button Present */
+#define  PCI_EXP_DEVCAP_ATN_IND	0x00002000 /* Attention Indicator Present */
+#define  PCI_EXP_DEVCAP_PWR_IND	0x00004000 /* Power Indicator Present */
+#define  PCI_EXP_DEVCAP_RBER	0x00008000 /* Role-Based Error Reporting */
+#define  PCI_EXP_DEVCAP_PWR_VAL	0x03fc0000 /* Slot Power Limit Value */
+#define  PCI_EXP_DEVCAP_PWR_SCL	0x0c000000 /* Slot Power Limit Scale */
 #define  PCI_EXP_DEVCAP_FLR     0x10000000 /* Function Level Reset */
 #define PCI_EXP_DEVCTL		8	/* Device Control */
 #define  PCI_EXP_DEVCTL_CERE	0x0001	/* Correctable Error Reporting En. */
@@ -424,45 +453,55 @@
 #define  PCI_EXP_DEVCTL_READRQ	0x7000	/* Max_Read_Request_Size */
 #define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR */
 #define PCI_EXP_DEVSTA		10	/* Device Status */
-#define  PCI_EXP_DEVSTA_CED	0x01	/* Correctable Error Detected */
-#define  PCI_EXP_DEVSTA_NFED	0x02	/* Non-Fatal Error Detected */
-#define  PCI_EXP_DEVSTA_FED	0x04	/* Fatal Error Detected */
-#define  PCI_EXP_DEVSTA_URD	0x08	/* Unsupported Request Detected */
-#define  PCI_EXP_DEVSTA_AUXPD	0x10	/* AUX Power Detected */
-#define  PCI_EXP_DEVSTA_TRPND	0x20	/* Transactions Pending */
+#define  PCI_EXP_DEVSTA_CED	0x0001	/* Correctable Error Detected */
+#define  PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
+#define  PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */
+#define  PCI_EXP_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
+#define  PCI_EXP_DEVSTA_AUXPD	0x0010	/* AUX Power Detected */
+#define  PCI_EXP_DEVSTA_TRPND	0x0020	/* Transactions Pending */
 #define PCI_EXP_LNKCAP		12	/* Link Capabilities */
 #define  PCI_EXP_LNKCAP_SLS	0x0000000f /* Supported Link Speeds */
+#define  PCI_EXP_LNKCAP_SLS_2_5GB 0x00000001 /* LNKCAP2 SLS Vector bit 0 */
+#define  PCI_EXP_LNKCAP_SLS_5_0GB 0x00000002 /* LNKCAP2 SLS Vector bit 1 */
 #define  PCI_EXP_LNKCAP_MLW	0x000003f0 /* Maximum Link Width */
 #define  PCI_EXP_LNKCAP_ASPMS	0x00000c00 /* ASPM Support */
 #define  PCI_EXP_LNKCAP_L0SEL	0x00007000 /* L0s Exit Latency */
 #define  PCI_EXP_LNKCAP_L1EL	0x00038000 /* L1 Exit Latency */
-#define  PCI_EXP_LNKCAP_CLKPM	0x00040000 /* L1 Clock Power Management */
+#define  PCI_EXP_LNKCAP_CLKPM	0x00040000 /* Clock Power Management */
 #define  PCI_EXP_LNKCAP_SDERC	0x00080000 /* Surprise Down Error Reporting Capable */
 #define  PCI_EXP_LNKCAP_DLLLARC	0x00100000 /* Data Link Layer Link Active Reporting Capable */
 #define  PCI_EXP_LNKCAP_LBNC	0x00200000 /* Link Bandwidth Notification Capability */
 #define  PCI_EXP_LNKCAP_PN	0xff000000 /* Port Number */
 #define PCI_EXP_LNKCTL		16	/* Link Control */
 #define  PCI_EXP_LNKCTL_ASPMC	0x0003	/* ASPM Control */
+#define  PCI_EXP_LNKCTL_ASPM_L0S 0x0001	/* L0s Enable */
+#define  PCI_EXP_LNKCTL_ASPM_L1  0x0002	/* L1 Enable */
 #define  PCI_EXP_LNKCTL_RCB	0x0008	/* Read Completion Boundary */
 #define  PCI_EXP_LNKCTL_LD	0x0010	/* Link Disable */
 #define  PCI_EXP_LNKCTL_RL	0x0020	/* Retrain Link */
 #define  PCI_EXP_LNKCTL_CCC	0x0040	/* Common Clock Configuration */
 #define  PCI_EXP_LNKCTL_ES	0x0080	/* Extended Synch */
-#define  PCI_EXP_LNKCTL_CLKREQ_EN 0x100	/* Enable clkreq */
+#define  PCI_EXP_LNKCTL_CLKREQ_EN 0x0100 /* Enable clkreq */
 #define  PCI_EXP_LNKCTL_HAWD	0x0200	/* Hardware Autonomous Width Disable */
 #define  PCI_EXP_LNKCTL_LBMIE	0x0400	/* Link Bandwidth Management Interrupt Enable */
-#define  PCI_EXP_LNKCTL_LABIE	0x0800	/* Lnk Autonomous Bandwidth Interrupt Enable */
+#define  PCI_EXP_LNKCTL_LABIE	0x0800	/* Link Autonomous Bandwidth Interrupt Enable */
 #define PCI_EXP_LNKSTA		18	/* Link Status */
 #define  PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
-#define  PCI_EXP_LNKSTA_CLS_2_5GB 0x01	/* Current Link Speed 2.5GT/s */
-#define  PCI_EXP_LNKSTA_CLS_5_0GB 0x02	/* Current Link Speed 5.0GT/s */
-#define  PCI_EXP_LNKSTA_NLW	0x03f0	/* Nogotiated Link Width */
+#define  PCI_EXP_LNKSTA_CLS_2_5GB 0x0001 /* Current Link Speed 2.5GT/s */
+#define  PCI_EXP_LNKSTA_CLS_5_0GB 0x0002 /* Current Link Speed 5.0GT/s */
+#define  PCI_EXP_LNKSTA_CLS_8_0GB 0x0003 /* Current Link Speed 8.0GT/s */
+#define  PCI_EXP_LNKSTA_NLW	0x03f0	/* Negotiated Link Width */
+#define  PCI_EXP_LNKSTA_NLW_X1	0x0010	/* Current Link Width x1 */
+#define  PCI_EXP_LNKSTA_NLW_X2	0x0020	/* Current Link Width x2 */
+#define  PCI_EXP_LNKSTA_NLW_X4	0x0040	/* Current Link Width x4 */
+#define  PCI_EXP_LNKSTA_NLW_X8	0x0080	/* Current Link Width x8 */
 #define  PCI_EXP_LNKSTA_NLW_SHIFT 4	/* start of NLW mask in link status */
 #define  PCI_EXP_LNKSTA_LT	0x0800	/* Link Training */
 #define  PCI_EXP_LNKSTA_SLC	0x1000	/* Slot Clock Configuration */
 #define  PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
 #define  PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */
 #define  PCI_EXP_LNKSTA_LABS	0x8000	/* Link Autonomous Bandwidth Status */
+#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V1	20	/* v1 endpoints end here */
 #define PCI_EXP_SLTCAP		20	/* Slot Capabilities */
 #define  PCI_EXP_SLTCAP_ABP	0x00000001 /* Attention Button Present */
 #define  PCI_EXP_SLTCAP_PCP	0x00000002 /* Power Controller Present */
@@ -484,8 +523,16 @@
 #define  PCI_EXP_SLTCTL_CCIE	0x0010	/* Command Completed Interrupt Enable */
 #define  PCI_EXP_SLTCTL_HPIE	0x0020	/* Hot-Plug Interrupt Enable */
 #define  PCI_EXP_SLTCTL_AIC	0x00c0	/* Attention Indicator Control */
+#define  PCI_EXP_SLTCTL_ATTN_IND_ON    0x0040 /* Attention Indicator on */
+#define  PCI_EXP_SLTCTL_ATTN_IND_BLINK 0x0080 /* Attention Indicator blinking */
+#define  PCI_EXP_SLTCTL_ATTN_IND_OFF   0x00c0 /* Attention Indicator off */
 #define  PCI_EXP_SLTCTL_PIC	0x0300	/* Power Indicator Control */
+#define  PCI_EXP_SLTCTL_PWR_IND_ON     0x0100 /* Power Indicator on */
+#define  PCI_EXP_SLTCTL_PWR_IND_BLINK  0x0200 /* Power Indicator blinking */
+#define  PCI_EXP_SLTCTL_PWR_IND_OFF    0x0300 /* Power Indicator off */
 #define  PCI_EXP_SLTCTL_PCC	0x0400	/* Power Controller Control */
+#define  PCI_EXP_SLTCTL_PWR_ON         0x0000 /* Power On */
+#define  PCI_EXP_SLTCTL_PWR_OFF        0x0400 /* Power Off */
 #define  PCI_EXP_SLTCTL_EIC	0x0800	/* Electromechanical Interlock Control */
 #define  PCI_EXP_SLTCTL_DLLSCE	0x1000	/* Data Link Layer State Changed Enable */
 #define PCI_EXP_SLTSTA		26	/* Slot Status */
@@ -499,52 +546,93 @@
 #define  PCI_EXP_SLTSTA_EIS	0x0080	/* Electromechanical Interlock Status */
 #define  PCI_EXP_SLTSTA_DLLSC	0x0100	/* Data Link Layer State Changed */
 #define PCI_EXP_RTCTL		28	/* Root Control */
-#define  PCI_EXP_RTCTL_SECEE	0x01	/* System Error on Correctable Error */
-#define  PCI_EXP_RTCTL_SENFEE	0x02	/* System Error on Non-Fatal Error */
-#define  PCI_EXP_RTCTL_SEFEE	0x04	/* System Error on Fatal Error */
-#define  PCI_EXP_RTCTL_PMEIE	0x08	/* PME Interrupt Enable */
-#define  PCI_EXP_RTCTL_CRSSVE	0x10	/* CRS Software Visibility Enable */
+#define  PCI_EXP_RTCTL_SECEE	0x0001	/* System Error on Correctable Error */
+#define  PCI_EXP_RTCTL_SENFEE	0x0002	/* System Error on Non-Fatal Error */
+#define  PCI_EXP_RTCTL_SEFEE	0x0004	/* System Error on Fatal Error */
+#define  PCI_EXP_RTCTL_PMEIE	0x0008	/* PME Interrupt Enable */
+#define  PCI_EXP_RTCTL_CRSSVE	0x0010	/* CRS Software Visibility Enable */
 #define PCI_EXP_RTCAP		30	/* Root Capabilities */
 #define PCI_EXP_RTSTA		32	/* Root Status */
-#define PCI_EXP_RTSTA_PME	0x10000 /* PME status */
-#define PCI_EXP_RTSTA_PENDING	0x20000 /* PME pending */
+#define PCI_EXP_RTSTA_PME	0x00010000 /* PME status */
+#define PCI_EXP_RTSTA_PENDING	0x00020000 /* PME pending */
+/*
+ * The Device Capabilities 2, Device Status 2, Device Control 2,
+ * Link Capabilities 2, Link Status 2, Link Control 2,
+ * Slot Capabilities 2, Slot Status 2, and Slot Control 2 registers
+ * are only present on devices with PCIe Capability version 2.
+ * Use pcie_capability_read_word() and similar interfaces to use them
+ * safely.
+ */
 #define PCI_EXP_DEVCAP2		36	/* Device Capabilities 2 */
-#define  PCI_EXP_DEVCAP2_ARI	0x20	/* Alternative Routing-ID */
-#define  PCI_EXP_DEVCAP2_LTR	0x800	/* Latency tolerance reporting */
-#define  PCI_EXP_OBFF_MASK	0xc0000 /* OBFF support mechanism */
-#define  PCI_EXP_OBFF_MSG	0x40000 /* New message signaling */
-#define  PCI_EXP_OBFF_WAKE	0x80000 /* Re-use WAKE# for OBFF */
+#define  PCI_EXP_DEVCAP2_ARI		0x00000020 /* Alternative Routing-ID */
+#define  PCI_EXP_DEVCAP2_LTR		0x00000800 /* Latency tolerance reporting */
+#define  PCI_EXP_DEVCAP2_OBFF_MASK	0x000c0000 /* OBFF support mechanism */
+#define  PCI_EXP_DEVCAP2_OBFF_MSG	0x00040000 /* New message signaling */
+#define  PCI_EXP_DEVCAP2_OBFF_WAKE	0x00080000 /* Re-use WAKE# for OBFF */
 #define PCI_EXP_DEVCTL2		40	/* Device Control 2 */
-#define  PCI_EXP_DEVCTL2_ARI	0x20	/* Alternative Routing-ID */
-#define  PCI_EXP_IDO_REQ_EN	0x100	/* ID-based ordering request enable */
-#define  PCI_EXP_IDO_CMP_EN	0x200	/* ID-based ordering completion enable */
-#define  PCI_EXP_LTR_EN		0x400	/* Latency tolerance reporting */
-#define  PCI_EXP_OBFF_MSGA_EN	0x2000	/* OBFF enable with Message type A */
-#define  PCI_EXP_OBFF_MSGB_EN	0x4000	/* OBFF enable with Message type B */
-#define  PCI_EXP_OBFF_WAKE_EN	0x6000	/* OBFF using WAKE# signaling */
+#define  PCI_EXP_DEVCTL2_COMP_TIMEOUT	0x000f	/* Completion Timeout Value */
+#define  PCI_EXP_DEVCTL2_ARI		0x0020	/* Alternative Routing-ID */
+#define  PCI_EXP_DEVCTL2_IDO_REQ_EN	0x0100	/* Allow IDO for requests */
+#define  PCI_EXP_DEVCTL2_IDO_CMP_EN	0x0200	/* Allow IDO for completions */
+#define  PCI_EXP_DEVCTL2_LTR_EN		0x0400	/* Enable LTR mechanism */
+#define  PCI_EXP_DEVCTL2_OBFF_MSGA_EN	0x2000	/* Enable OBFF Message type A */
+#define  PCI_EXP_DEVCTL2_OBFF_MSGB_EN	0x4000	/* Enable OBFF Message type B */
+#define  PCI_EXP_DEVCTL2_OBFF_WAKE_EN	0x6000	/* OBFF using WAKE# signaling */
+#define PCI_EXP_DEVSTA2		42	/* Device Status 2 */
+#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	44	/* v2 endpoints end here */
+#define PCI_EXP_LNKCAP2		44	/* Link Capabilities 2 */
+#define  PCI_EXP_LNKCAP2_SLS_2_5GB	0x00000002 /* Supported Speed 2.5GT/s */
+#define  PCI_EXP_LNKCAP2_SLS_5_0GB	0x00000004 /* Supported Speed 5.0GT/s */
+#define  PCI_EXP_LNKCAP2_SLS_8_0GB	0x00000008 /* Supported Speed 8.0GT/s */
+#define  PCI_EXP_LNKCAP2_CROSSLINK	0x00000100 /* Crosslink supported */
 #define PCI_EXP_LNKCTL2		48	/* Link Control 2 */
+#define PCI_EXP_LNKSTA2		50	/* Link Status 2 */
+#define PCI_EXP_SLTCAP2		52	/* Slot Capabilities 2 */
 #define PCI_EXP_SLTCTL2		56	/* Slot Control 2 */
+#define PCI_EXP_SLTSTA2		58	/* Slot Status 2 */
 
 /* Extended Capabilities (PCI-X 2.0 and Express) */
 #define PCI_EXT_CAP_ID(header)		(header & 0x0000ffff)
 #define PCI_EXT_CAP_VER(header)		((header >> 16) & 0xf)
 #define PCI_EXT_CAP_NEXT(header)	((header >> 20) & 0xffc)
 
-#define PCI_EXT_CAP_ID_ERR	1
-#define PCI_EXT_CAP_ID_VC	2
-#define PCI_EXT_CAP_ID_DSN	3
-#define PCI_EXT_CAP_ID_PWR	4
-#define PCI_EXT_CAP_ID_VNDR	11
-#define PCI_EXT_CAP_ID_ACS	13
-#define PCI_EXT_CAP_ID_ARI	14
-#define PCI_EXT_CAP_ID_ATS	15
-#define PCI_EXT_CAP_ID_SRIOV	16
-#define PCI_EXT_CAP_ID_LTR	24
+#define PCI_EXT_CAP_ID_ERR	0x01	/* Advanced Error Reporting */
+#define PCI_EXT_CAP_ID_VC	0x02	/* Virtual Channel Capability */
+#define PCI_EXT_CAP_ID_DSN	0x03	/* Device Serial Number */
+#define PCI_EXT_CAP_ID_PWR	0x04	/* Power Budgeting */
+#define PCI_EXT_CAP_ID_RCLD	0x05	/* Root Complex Link Declaration */
+#define PCI_EXT_CAP_ID_RCILC	0x06	/* Root Complex Internal Link Control */
+#define PCI_EXT_CAP_ID_RCEC	0x07	/* Root Complex Event Collector */
+#define PCI_EXT_CAP_ID_MFVC	0x08	/* Multi-Function VC Capability */
+#define PCI_EXT_CAP_ID_VC9	0x09	/* same as _VC */
+#define PCI_EXT_CAP_ID_RCRB	0x0A	/* Root Complex RB? */
+#define PCI_EXT_CAP_ID_VNDR	0x0B	/* Vendor-Specific */
+#define PCI_EXT_CAP_ID_CAC	0x0C	/* Config Access - obsolete */
+#define PCI_EXT_CAP_ID_ACS	0x0D	/* Access Control Services */
+#define PCI_EXT_CAP_ID_ARI	0x0E	/* Alternate Routing ID */
+#define PCI_EXT_CAP_ID_ATS	0x0F	/* Address Translation Services */
+#define PCI_EXT_CAP_ID_SRIOV	0x10	/* Single Root I/O Virtualization */
+#define PCI_EXT_CAP_ID_MRIOV	0x11	/* Multi Root I/O Virtualization */
+#define PCI_EXT_CAP_ID_MCAST	0x12	/* Multicast */
+#define PCI_EXT_CAP_ID_PRI	0x13	/* Page Request Interface */
+#define PCI_EXT_CAP_ID_AMD_XXX	0x14	/* Reserved for AMD */
+#define PCI_EXT_CAP_ID_REBAR	0x15	/* Resizable BAR */
+#define PCI_EXT_CAP_ID_DPA	0x16	/* Dynamic Power Allocation */
+#define PCI_EXT_CAP_ID_TPH	0x17	/* TPH Requester */
+#define PCI_EXT_CAP_ID_LTR	0x18	/* Latency Tolerance Reporting */
+#define PCI_EXT_CAP_ID_SECPCI	0x19	/* Secondary PCIe Capability */
+#define PCI_EXT_CAP_ID_PMUX	0x1A	/* Protocol Multiplexing */
+#define PCI_EXT_CAP_ID_PASID	0x1B	/* Process Address Space ID */
+#define PCI_EXT_CAP_ID_MAX	PCI_EXT_CAP_ID_PASID
+
+#define PCI_EXT_CAP_DSN_SIZEOF	12
+#define PCI_EXT_CAP_MCAST_ENDPOINT_SIZEOF 40
 
 /* Advanced Error Reporting */
 #define PCI_ERR_UNCOR_STATUS	4	/* Uncorrectable Error Status */
 #define  PCI_ERR_UNC_TRAIN	0x00000001	/* Training */
 #define  PCI_ERR_UNC_DLP	0x00000010	/* Data Link Protocol */
+#define  PCI_ERR_UNC_SURPDN	0x00000020	/* Surprise Down */
 #define  PCI_ERR_UNC_POISON_TLP	0x00001000	/* Poisoned TLP */
 #define  PCI_ERR_UNC_FCP	0x00002000	/* Flow Control Protocol */
 #define  PCI_ERR_UNC_COMP_TIME	0x00004000	/* Completion Timeout */
@@ -554,6 +642,11 @@
 #define  PCI_ERR_UNC_MALF_TLP	0x00040000	/* Malformed TLP */
 #define  PCI_ERR_UNC_ECRC	0x00080000	/* ECRC Error Status */
 #define  PCI_ERR_UNC_UNSUP	0x00100000	/* Unsupported Request */
+#define  PCI_ERR_UNC_ACSV	0x00200000	/* ACS Violation */
+#define  PCI_ERR_UNC_INTN	0x00400000	/* internal error */
+#define  PCI_ERR_UNC_MCBTLP	0x00800000	/* MC blocked TLP */
+#define  PCI_ERR_UNC_ATOMEG	0x01000000	/* Atomic egress blocked */
+#define  PCI_ERR_UNC_TLPPRE	0x02000000	/* TLP prefix blocked */
 #define PCI_ERR_UNCOR_MASK	8	/* Uncorrectable Error Mask */
 	/* Same bits as above */
 #define PCI_ERR_UNCOR_SEVER	12	/* Uncorrectable Error Severity */
@@ -564,6 +657,9 @@
 #define  PCI_ERR_COR_BAD_DLLP	0x00000080	/* Bad DLLP Status */
 #define  PCI_ERR_COR_REP_ROLL	0x00000100	/* REPLAY_NUM Rollover */
 #define  PCI_ERR_COR_REP_TIMER	0x00001000	/* Replay Timer Timeout */
+#define  PCI_ERR_COR_ADV_NFAT	0x00002000	/* Advisory Non-Fatal */
+#define  PCI_ERR_COR_INTERNAL	0x00004000	/* Corrected Internal */
+#define  PCI_ERR_COR_LOG_OVER	0x00008000	/* Header Log Overflow */
 #define PCI_ERR_COR_MASK	20	/* Correctable Error Mask */
 	/* Same bits as above */
 #define PCI_ERR_CAP		24	/* Advanced Error Capabilities */
@@ -584,9 +680,9 @@
 #define PCI_ERR_ROOT_COR_RCV		0x00000001	/* ERR_COR Received */
 /* Multi ERR_COR Received */
 #define PCI_ERR_ROOT_MULTI_COR_RCV	0x00000002
-/* ERR_FATAL/NONFATAL Recevied */
+/* ERR_FATAL/NONFATAL Received */
 #define PCI_ERR_ROOT_UNCOR_RCV		0x00000004
-/* Multi ERR_FATAL/NONFATAL Recevied */
+/* Multi ERR_FATAL/NONFATAL Received */
 #define PCI_ERR_ROOT_MULTI_UNCOR_RCV	0x00000008
 #define PCI_ERR_ROOT_FIRST_FATAL	0x00000010	/* First Fatal */
 #define PCI_ERR_ROOT_NONFATAL_RCV	0x00000020	/* Non-Fatal Received */
@@ -594,13 +690,36 @@
 #define PCI_ERR_ROOT_ERR_SRC	52	/* Error Source Identification */
 
 /* Virtual Channel */
-#define PCI_VC_PORT_REG1	4
-#define PCI_VC_PORT_REG2	8
+#define PCI_VC_PORT_CAP1	4
+#define  PCI_VC_CAP1_EVCC	0x00000007	/* extended VC count */
+#define  PCI_VC_CAP1_LPEVCC	0x00000070	/* low prio extended VC count */
+#define  PCI_VC_CAP1_ARB_SIZE	0x00000c00
+#define PCI_VC_PORT_CAP2	8
+#define  PCI_VC_CAP2_32_PHASE		0x00000002
+#define  PCI_VC_CAP2_64_PHASE		0x00000004
+#define  PCI_VC_CAP2_128_PHASE		0x00000008
+#define  PCI_VC_CAP2_ARB_OFF		0xff000000
 #define PCI_VC_PORT_CTRL	12
+#define  PCI_VC_PORT_CTRL_LOAD_TABLE	0x00000001
 #define PCI_VC_PORT_STATUS	14
+#define  PCI_VC_PORT_STATUS_TABLE	0x00000001
 #define PCI_VC_RES_CAP		16
+#define  PCI_VC_RES_CAP_32_PHASE	0x00000002
+#define  PCI_VC_RES_CAP_64_PHASE	0x00000004
+#define  PCI_VC_RES_CAP_128_PHASE	0x00000008
+#define  PCI_VC_RES_CAP_128_PHASE_TB	0x00000010
+#define  PCI_VC_RES_CAP_256_PHASE	0x00000020
+#define  PCI_VC_RES_CAP_ARB_OFF		0xff000000
 #define PCI_VC_RES_CTRL		20
+#define  PCI_VC_RES_CTRL_LOAD_TABLE	0x00010000
+#define  PCI_VC_RES_CTRL_ARB_SELECT	0x000e0000
+#define  PCI_VC_RES_CTRL_ID		0x07000000
+#define  PCI_VC_RES_CTRL_ENABLE		0x80000000
 #define PCI_VC_RES_STATUS	26
+#define  PCI_VC_RES_STATUS_TABLE	0x00000001
+#define  PCI_VC_RES_STATUS_NEGO		0x00000002
+#define PCI_CAP_VC_BASE_SIZEOF		0x10
+#define PCI_CAP_VC_PER_VC_SIZEOF	0x0C
 
 /* Power Budgeting */
 #define PCI_PWR_DSR		4	/* Data Select Register */
@@ -613,9 +732,16 @@
 #define  PCI_PWR_DATA_RAIL(x)	(((x) >> 18) & 7)   /* Power Rail */
 #define PCI_PWR_CAP		12	/* Capability */
 #define  PCI_PWR_CAP_BUDGET(x)	((x) & 1)	/* Included in system budget */
+#define PCI_EXT_CAP_PWR_SIZEOF	16
+
+/* Vendor-Specific (VSEC, PCI_EXT_CAP_ID_VNDR) */
+#define PCI_VNDR_HEADER		4	/* Vendor-Specific Header */
+#define  PCI_VNDR_HEADER_ID(x)	((x) & 0xffff)
+#define  PCI_VNDR_HEADER_REV(x)	(((x) >> 16) & 0xf)
+#define  PCI_VNDR_HEADER_LEN(x)	(((x) >> 20) & 0xfff)
 
 /*
- * Hypertransport sub capability types
+ * HyperTransport sub capability types
  *
  * Unfortunately there are both 3 bit and 5 bit capability types defined
  * in the HT spec, catering for that is a little messy. You probably don't
@@ -643,8 +769,10 @@
 #define HT_CAPTYPE_DIRECT_ROUTE	0xB0	/* Direct routing configuration */
 #define HT_CAPTYPE_VCSET	0xB8	/* Virtual Channel configuration */
 #define HT_CAPTYPE_ERROR_RETRY	0xC0	/* Retry on error configuration */
-#define HT_CAPTYPE_GEN3		0xD0	/* Generation 3 hypertransport configuration */
-#define HT_CAPTYPE_PM		0xE0	/* Hypertransport powermanagement configuration */
+#define HT_CAPTYPE_GEN3		0xD0	/* Generation 3 HyperTransport configuration */
+#define HT_CAPTYPE_PM		0xE0	/* HyperTransport power management configuration */
+#define HT_CAP_SIZEOF_LONG	28	/* slave & primary */
+#define HT_CAP_SIZEOF_SHORT	24	/* host & secondary */
 
 /* Alternative Routing-ID Interpretation */
 #define PCI_ARI_CAP		0x04	/* ARI Capability Register */
@@ -655,6 +783,7 @@
 #define  PCI_ARI_CTRL_MFVC	0x0001	/* MFVC Function Groups Enable */
 #define  PCI_ARI_CTRL_ACS	0x0002	/* ACS Function Groups Enable */
 #define  PCI_ARI_CTRL_FG(x)	(((x) >> 4) & 7) /* Function Group */
+#define PCI_EXT_CAP_ARI_SIZEOF	8
 
 /* Address Translation Service */
 #define PCI_ATS_CAP		0x04	/* ATS Capability Register */
@@ -664,6 +793,29 @@
 #define  PCI_ATS_CTRL_ENABLE	0x8000	/* ATS Enable */
 #define  PCI_ATS_CTRL_STU(x)	((x) & 0x1f)	/* Smallest Translation Unit */
 #define  PCI_ATS_MIN_STU	12	/* shift of minimum STU block */
+#define PCI_EXT_CAP_ATS_SIZEOF	8
+
+/* Page Request Interface */
+#define PCI_PRI_CTRL		0x04	/* PRI control register */
+#define  PCI_PRI_CTRL_ENABLE	0x01	/* Enable */
+#define  PCI_PRI_CTRL_RESET	0x02	/* Reset */
+#define PCI_PRI_STATUS		0x06	/* PRI status register */
+#define  PCI_PRI_STATUS_RF	0x001	/* Response Failure */
+#define  PCI_PRI_STATUS_UPRGI	0x002	/* Unexpected PRG index */
+#define  PCI_PRI_STATUS_STOPPED	0x100	/* PRI Stopped */
+#define PCI_PRI_MAX_REQ		0x08	/* PRI max reqs supported */
+#define PCI_PRI_ALLOC_REQ	0x0c	/* PRI max reqs allowed */
+#define PCI_EXT_CAP_PRI_SIZEOF	16
+
+/* Process Address Space ID */
+#define PCI_PASID_CAP		0x04    /* PASID feature register */
+#define  PCI_PASID_CAP_EXEC	0x02	/* Exec permissions Supported */
+#define  PCI_PASID_CAP_PRIV	0x04	/* Privilege Mode Supported */
+#define PCI_PASID_CTRL		0x06    /* PASID control register */
+#define  PCI_PASID_CTRL_ENABLE	0x01	/* Enable bit */
+#define  PCI_PASID_CTRL_EXEC	0x02	/* Exec permissions Enable */
+#define  PCI_PASID_CTRL_PRIV	0x04	/* Privilege Mode Enable */
+#define PCI_EXT_CAP_PASID_SIZEOF	8
 
 /* Single Root I/O Virtualization */
 #define PCI_SRIOV_CAP		0x04	/* SR-IOV Capabilities */
@@ -695,12 +847,14 @@
 #define  PCI_SRIOV_VFM_MI	0x1	/* Dormant.MigrateIn */
 #define  PCI_SRIOV_VFM_MO	0x2	/* Active.MigrateOut */
 #define  PCI_SRIOV_VFM_AV	0x3	/* Active.Available */
+#define PCI_EXT_CAP_SRIOV_SIZEOF 64
 
 #define PCI_LTR_MAX_SNOOP_LAT	0x4
 #define PCI_LTR_MAX_NOSNOOP_LAT	0x6
 #define  PCI_LTR_VALUE_MASK	0x000003ff
 #define  PCI_LTR_SCALE_MASK	0x00001c00
 #define  PCI_LTR_SCALE_SHIFT	10
+#define PCI_EXT_CAP_LTR_SIZEOF	8
 
 /* Access Control Service */
 #define PCI_ACS_CAP		0x04	/* ACS Capability Register */
@@ -711,7 +865,38 @@
 #define  PCI_ACS_UF		0x10	/* Upstream Forwarding */
 #define  PCI_ACS_EC		0x20	/* P2P Egress Control */
 #define  PCI_ACS_DT		0x40	/* Direct Translated P2P */
+#define PCI_ACS_EGRESS_BITS	0x05	/* ACS Egress Control Vector Size */
 #define PCI_ACS_CTRL		0x06	/* ACS Control Register */
 #define PCI_ACS_EGRESS_CTL_V	0x08	/* ACS Egress Control Vector */
 
+#define PCI_VSEC_HDR		4	/* extended cap - vendor-specific */
+#define  PCI_VSEC_HDR_LEN_SHIFT	20	/* shift for length field */
+
+/* SATA capability */
+#define PCI_SATA_REGS		4	/* SATA REGs specifier */
+#define  PCI_SATA_REGS_MASK	0xF	/* location - BAR#/inline */
+#define  PCI_SATA_REGS_INLINE	0xF	/* REGS in config space */
+#define PCI_SATA_SIZEOF_SHORT	8
+#define PCI_SATA_SIZEOF_LONG	16
+
+/* Resizable BARs */
+#define PCI_REBAR_CTRL		8	/* control register */
+#define  PCI_REBAR_CTRL_NBAR_MASK	(7 << 5)	/* mask for # bars */
+#define  PCI_REBAR_CTRL_NBAR_SHIFT	5	/* shift for # bars */
+
+/* Dynamic Power Allocation */
+#define PCI_DPA_CAP		4	/* capability register */
+#define  PCI_DPA_CAP_SUBSTATE_MASK	0x1F	/* # substates - 1 */
+#define PCI_DPA_BASE_SIZEOF	16	/* size with 0 substates */
+
+/* TPH Requester */
+#define PCI_TPH_CAP		4	/* capability register */
+#define  PCI_TPH_CAP_LOC_MASK	0x600	/* location mask */
+#define   PCI_TPH_LOC_NONE	0x000	/* no location */
+#define   PCI_TPH_LOC_CAP	0x200	/* in capability */
+#define   PCI_TPH_LOC_MSIX	0x400	/* in MSI-X */
+#define PCI_TPH_CAP_ST_MASK	0x07FF0000	/* st table mask */
+#define PCI_TPH_CAP_ST_SHIFT	16	/* st table shift */
+#define PCI_TPH_BASE_SIZEOF	12	/* size with no st table */
+
 #endif /* LINUX_PCI_REGS_H */
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH v2 4/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV)
  2014-09-02 11:00 [Qemu-devel] [PATCH v2 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
                   ` (2 preceding siblings ...)
  2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 3/4] pci: Update pci_regs header Knut Omang
@ 2014-09-02 11:00 ` Knut Omang
  3 siblings, 0 replies; 10+ messages in thread
From: Knut Omang @ 2014-09-02 11:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Peter Crosthwaite, Stefan Hajnoczi,
	Michael S. Tsirkin, Laszlo Ersek, Gabriel Somlo, Knut Omang,
	Alexander Graf, Fabien Chouteau, Luiz Capitulino,
	Beniamino Galvani, Alex Williamson, Gonglei (Arei),
	Jan Kiszka, Anthony Liguori, Paolo Bonzini, Amos Kong

This patch provides the building blocks for creating an SR/IOV
PCIe Extended Capability header and register/unregister
SR/IOV Virtual Functions.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
---
 hw/pci/Makefile.objs        |   2 +-
 hw/pci/pci.c                |  97 ++++++++++++----
 hw/pci/pcie.c               |   9 +-
 hw/pci/pcie_sriov.c         | 267 ++++++++++++++++++++++++++++++++++++++++++++
 include/hw/pci/pci.h        |  11 +-
 include/hw/pci/pcie.h       |   6 +
 include/hw/pci/pcie_sriov.h |  55 +++++++++
 include/qemu/typedefs.h     |   2 +
 8 files changed, 418 insertions(+), 31 deletions(-)
 create mode 100644 hw/pci/pcie_sriov.c
 create mode 100644 include/hw/pci/pcie_sriov.h

diff --git a/hw/pci/Makefile.objs b/hw/pci/Makefile.objs
index 80f8aa6..4ed4205 100644
--- a/hw/pci/Makefile.objs
+++ b/hw/pci/Makefile.objs
@@ -3,7 +3,7 @@ common-obj-$(CONFIG_PCI) += msix.o msi.o
 common-obj-$(CONFIG_PCI) += shpc.o
 common-obj-$(CONFIG_PCI) += slotid_cap.o
 common-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
-common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o
+common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o pcie_sriov.o
 
 common-obj-$(call lnot,$(CONFIG_PCI)) += pci-stub.o
 common-obj-$(CONFIG_ALL) += pci-stub.o
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index af547dc..c8335b7 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -35,7 +35,6 @@
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
 #include "exec/address-spaces.h"
-#include "hw/hotplug.h"
 
 //#define DEBUG_PCI
 #ifdef DEBUG_PCI
@@ -126,6 +125,9 @@ static int pci_bar(PCIDevice *d, int reg)
 {
     uint8_t type;
 
+    /* PCIe virtual functions do not have their own BARs */
+    assert(!pci_is_vf(d));
+
     if (reg != PCI_ROM_SLOT)
         return PCI_BASE_ADDRESS_0 + reg * 4;
 
@@ -184,22 +186,13 @@ void pci_device_deassert_intx(PCIDevice *dev)
     }
 }
 
-static void pci_do_device_reset(PCIDevice *dev)
+static void pci_reset_regions(PCIDevice *dev)
 {
     int r;
+    if (pci_is_vf(dev)) {
+        return;
+    }
 
-    pci_device_deassert_intx(dev);
-    assert(dev->irq_state == 0);
-
-    /* Clear all writable bits */
-    pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
-                                 pci_get_word(dev->wmask + PCI_COMMAND) |
-                                 pci_get_word(dev->w1cmask + PCI_COMMAND));
-    pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
-                                 pci_get_word(dev->wmask + PCI_STATUS) |
-                                 pci_get_word(dev->w1cmask + PCI_STATUS));
-    dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
-    dev->config[PCI_INTERRUPT_LINE] = 0x0;
     for (r = 0; r < PCI_NUM_REGIONS; ++r) {
         PCIIORegion *region = &dev->io_regions[r];
         if (!region->size) {
@@ -213,6 +206,27 @@ static void pci_do_device_reset(PCIDevice *dev)
             pci_set_long(dev->config + pci_bar(dev, r), region->type);
         }
     }
+}
+
+static void pci_do_device_reset(PCIDevice *dev)
+{
+    qdev_reset_all(&dev->qdev);
+
+    dev->irq_state = 0;
+    pci_update_irq_status(dev);
+    pci_device_deassert_intx(dev);
+    assert(dev->irq_state == 0);
+
+    /* Clear all writable bits */
+    pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
+                                 pci_get_word(dev->wmask + PCI_COMMAND) |
+                                 pci_get_word(dev->w1cmask + PCI_COMMAND));
+    pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
+                                 pci_get_word(dev->wmask + PCI_STATUS) |
+                                 pci_get_word(dev->w1cmask + PCI_STATUS));
+    dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
+    dev->config[PCI_INTERRUPT_LINE] = 0x0;
+    pci_reset_regions(dev);
     pci_update_mappings(dev);
 
     msi_reset(dev);
@@ -734,6 +748,15 @@ static int pci_init_multifunction(PCIBus *bus, PCIDevice *dev)
         dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
     }
 
+    /* With SR/IOV and ARI, a device at function 0 need not be a multifunction
+     * device, as it may just be a VF that ended up with function 0 in
+     * the legacy PCI interpretation. Avoid failing in such cases:
+     */
+    if (pci_is_vf(dev) &&
+        dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
+        return 0;
+    }
+
     /*
      * multifunction bit is interpreted in two ways as follows.
      *   - all functions must set the bit to 1.
@@ -920,6 +943,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
     uint64_t wmask;
     pcibus_t size = memory_region_size(memory);
 
+    assert(!pci_is_vf(pci_dev)); /* VFs must use pcie_register_vf_bar */
     assert(region_num >= 0);
     assert(region_num < PCI_NUM_REGIONS);
     if (size & (size-1)) {
@@ -1018,18 +1042,47 @@ pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
     return pci_dev->io_regions[region_num].addr;
 }
 
-static pcibus_t pci_bar_address(PCIDevice *d,
-				int reg, uint8_t type, pcibus_t size)
+
+static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg,
+                                        uint8_t type, pcibus_t size)
+{
+    pcibus_t new_addr;
+    if (!pci_is_vf(d)) {
+        int bar = pci_bar(d, reg);
+        if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+            new_addr = pci_get_quad(d->config + bar);
+        } else {
+            new_addr = pci_get_long(d->config + bar);
+        }
+    } else {
+        PCIDevice *pf = d->exp.sriov_vf.pf;
+        int bar = pf->exp.sriov_cap + PCI_SRIOV_BAR + reg * 4;
+        uint32_t vf_num = d->devfn - (pf->devfn + 1);
+        if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+            new_addr = pci_get_quad(pf->config + bar);
+        } else {
+            new_addr = pci_get_long(pf->config + bar);
+        }
+        new_addr += vf_num * size;
+    }
+    if (reg != PCI_ROM_SLOT) {
+        /* Preserve the rom enable bit */
+        new_addr &= ~(size - 1);
+    }
+    return new_addr;
+}
+
+pcibus_t pci_bar_address(PCIDevice *d,
+                         int reg, uint8_t type, pcibus_t size)
 {
     pcibus_t new_addr, last_addr;
-    int bar = pci_bar(d, reg);
     uint16_t cmd = pci_get_word(d->config + PCI_COMMAND);
 
     if (type & PCI_BASE_ADDRESS_SPACE_IO) {
         if (!(cmd & PCI_COMMAND_IO)) {
             return PCI_BAR_UNMAPPED;
         }
-        new_addr = pci_get_long(d->config + bar) & ~(size - 1);
+        new_addr = pci_config_get_bar_addr(d, reg, type, size);
         last_addr = new_addr + size - 1;
         /* Check if 32 bit BAR wraps around explicitly.
          * TODO: make priorities correct and remove this work around.
@@ -1043,11 +1096,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
     if (!(cmd & PCI_COMMAND_MEMORY)) {
         return PCI_BAR_UNMAPPED;
     }
-    if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
-        new_addr = pci_get_quad(d->config + bar);
-    } else {
-        new_addr = pci_get_long(d->config + bar);
-    }
+    new_addr = pci_config_get_bar_addr(d, reg, type, size);
     /* the ROM slot has a specific enable bit */
     if (reg == PCI_ROM_SLOT && !(new_addr & PCI_ROM_ADDRESS_ENABLE)) {
         return PCI_BAR_UNMAPPED;
@@ -1173,6 +1222,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
 
     msi_write_config(d, addr, val_in, l);
     msix_write_config(d, addr, val_in, l);
+    pcie_sriov_config_write(d, addr, val_in, l);
 }
 
 /***********************************************************/
@@ -1777,7 +1827,6 @@ static int pci_qdev_init(DeviceState *qdev)
         is_default_rom = true;
     }
     pci_add_option_rom(pci_dev, is_default_rom);
-
     return 0;
 }
 
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 1babddf..a2ac110 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -254,7 +254,7 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
      * Right now, only a device of function = 0 is allowed to be
      * hot plugged/unplugged.
      */
-    assert(PCI_FUNC(pci_dev->devfn) == 0);
+    assert(PCI_FUNC(pci_dev->devfn) == 0 || pci_is_vf(pci_dev));
 
     pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
                                PCI_EXP_SLTSTA_PDS);
@@ -266,10 +266,11 @@ void pcie_cap_slot_hot_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
                                  Error **errp)
 {
     uint8_t *exp_cap;
+    PCIDevice *pdev = PCI_DEVICE(hotplug_dev);
 
-    pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, errp);
+    pcie_cap_slot_hotplug_common(pdev, dev, &exp_cap, errp);
 
-    pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
+    pcie_cap_slot_push_attention_button(pdev);
 }
 
 /* pci express slot for pci express root/downstream port
@@ -409,7 +410,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
     }
 
     /*
-     * If the slot is polulated, power indicator is off and power
+     * If the slot is populated, power indicator is off and power
      * controller is off, it is safe to detach the devices.
      */
     if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
new file mode 100644
index 0000000..2393c69
--- /dev/null
+++ b/hw/pci/pcie_sriov.c
@@ -0,0 +1,267 @@
+/*
+ * pcie_sriov.c:
+ *
+ * Implementation of SR/IOV emulation support.
+ *
+ * Copyright (c) 2014 Knut Omang <knut.omang@oracle.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/pci/pci.h"
+#include "hw/pci/pcie.h"
+#include "hw/pci/pci_bus.h"
+#include "qemu/range.h"
+
+//#define DEBUG_SRIOV
+#ifdef DEBUG_SRIOV
+# define SRIOV_DPRINTF(fmt, ...)                                         \
+    fprintf(stderr, "%s:%d " fmt, __func__, __LINE__, ## __VA_ARGS__)
+#else
+# define SRIOV_DPRINTF(fmt, ...) do {} while (0)
+#endif
+#define SRIOV_DEV_PRINTF(dev, fmt, ...)                                  \
+    SRIOV_DPRINTF("%s:%x "fmt, (dev)->name, (dev)->devfn, ## __VA_ARGS__)
+
+static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name);
+static void unregister_vfs(PCIDevice *dev);
+
+void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+                        const char *vfname, uint16_t vf_dev_id,
+                        uint16_t init_vfs, uint16_t total_vfs,
+                        uint16_t vf_offset, uint16_t vf_stride)
+{
+    uint8_t *cfg = dev->config + offset;
+    uint8_t *wmask;
+
+    pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
+                        offset, PCI_EXT_CAP_SRIOV_SIZEOF);
+    dev->exp.sriov_cap = offset;
+    dev->exp.sriov_pf.num_vfs = 0;
+    dev->exp.sriov_pf.vfname = g_strdup(vfname);
+    dev->exp.sriov_pf.vf = NULL;
+
+    pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
+    pci_set_word(cfg + PCI_SRIOV_VF_STRIDE, vf_stride);
+
+    /* Minimal page size support values required by the SR/IOV standard:
+     * 0x553 << 12 = 0x553000 = 4K + 8K + 64K + 256K + 1M + 4M
+     * Hard coded for simplicity in this version
+     */
+    pci_set_word(cfg + PCI_SRIOV_SUP_PGSIZE, 0x553);
+
+    /* Default is to use 4K pages, software can modify it
+     * to any of the supported bits
+     */
+    pci_set_word(cfg + PCI_SRIOV_SYS_PGSIZE, 0x1);
+
+    /* Set up device ID and initial/total number of VFs available */
+    pci_set_word(cfg + PCI_SRIOV_VF_DID, vf_dev_id);
+    pci_set_word(cfg + PCI_SRIOV_INITIAL_VF, init_vfs);
+    pci_set_word(cfg + PCI_SRIOV_TOTAL_VF, total_vfs);
+
+    /* Write enable control bits */
+    wmask = dev->wmask + offset;
+    pci_set_word(wmask + PCI_SRIOV_CTRL,
+                 PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI);
+    pci_set_word(wmask + PCI_SRIOV_NUM_VF, 0xffff);
+    pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553);
+
+    qdev_prop_set_bit(&dev->qdev, "multifunction", true);
+}
+
+void pcie_sriov_pf_exit(PCIDevice *dev)
+{
+    SRIOV_DPRINTF("\n");
+    unregister_vfs(dev);
+    g_free((char *)dev->exp.sriov_pf.vfname);
+    dev->exp.sriov_pf.vfname = NULL;
+}
+
+void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
+                               uint8_t type, dma_addr_t size)
+{
+    uint32_t addr;
+    uint64_t wmask;
+    uint16_t sriov_cap = dev->exp.sriov_cap;
+
+    assert(sriov_cap > 0);
+    assert(region_num >= 0);
+    assert(region_num < PCI_NUM_REGIONS);
+    assert(region_num != PCI_ROM_SLOT);
+
+    wmask = ~(size - 1);
+    addr = sriov_cap + PCI_SRIOV_BAR + region_num * 4;
+
+    pci_set_long(dev->config + addr, type);
+    if (!(type & PCI_BASE_ADDRESS_SPACE_IO) &&
+        type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+        pci_set_quad(dev->wmask + addr, wmask);
+        pci_set_quad(dev->cmask + addr, ~0ULL);
+    } else {
+        pci_set_long(dev->wmask + addr, wmask & 0xffffffff);
+        pci_set_long(dev->cmask + addr, 0xffffffff);
+    }
+    dev->exp.sriov_pf.vf_bar_type[region_num] = type;
+}
+
+void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
+                                MemoryRegion *memory)
+{
+    PCIIORegion *r;
+    uint8_t type;
+    pcibus_t size = memory_region_size(memory);
+
+    assert(pci_is_vf(dev)); /* PFs must use pci_register_bar */
+    assert(region_num >= 0);
+    assert(region_num < PCI_NUM_REGIONS);
+    type = dev->exp.sriov_vf.pf->exp.sriov_pf.vf_bar_type[region_num];
+
+    if (size & (size-1)) {
+        fprintf(stderr, "ERROR: PCI region size must be a power"
+                " of two - type=0x%x, size=0x%"FMT_PCIBUS"\n", type, size);
+        exit(1);
+    }
+
+    r = &dev->io_regions[region_num];
+    r->memory = memory;
+    r->address_space =
+        type & PCI_BASE_ADDRESS_SPACE_IO
+        ? dev->bus->address_space_io
+        : dev->bus->address_space_mem;
+    r->size = size;
+    r->type = type;
+
+    r->addr = pci_bar_address(dev, region_num, r->type, r->size);
+    if (r->addr != PCI_BAR_UNMAPPED) {
+        memory_region_add_subregion_overlap(r->address_space,
+                                            r->addr, r->memory, 1);
+    }
+}
+
+static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name)
+{
+    int ret;
+    PCIDevice *dev = pci_create(pf->bus, devfn, name);
+    dev->exp.sriov_vf.pf = pf;
+
+    ret = qdev_init(&dev->qdev);
+    if (ret) {
+        return NULL;
+    }
+
+    /* set vid/did according to sr/iov spec - they are not used */
+    pci_config_set_vendor_id(dev->config, 0xffff);
+    pci_config_set_device_id(dev->config, 0xffff);
+    return dev;
+}
+
+static void register_vfs(PCIDevice *dev)
+{
+    uint16_t num_vfs;
+    uint16_t i;
+    uint16_t sriov_cap = dev->exp.sriov_cap;
+    uint16_t vf_offset = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
+    uint16_t vf_stride = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
+    int32_t devfn = dev->devfn + vf_offset;
+
+    assert(sriov_cap > 0);
+    num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
+
+    dev->exp.sriov_pf.vf = g_malloc(sizeof(PCIDevice *) * num_vfs);
+    assert(dev->exp.sriov_pf.vf);
+
+    SRIOV_DEV_PRINTF(dev, "creating %d vf devs\n", num_vfs);
+    for (i = 0; i < num_vfs; i++) {
+        dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn, dev->exp.sriov_pf.vfname);
+        if (!dev->exp.sriov_pf.vf[i]) {
+            SRIOV_DEV_PRINTF(dev, "Failed to create VF %d at devfn %x.%x\n", i,
+                             PCI_SLOT(devfn), PCI_FUNC(devfn));
+            num_vfs = i;
+            break;
+        }
+        devfn += vf_stride;
+    }
+    dev->exp.sriov_pf.num_vfs = num_vfs;
+}
+
+static void unregister_vfs(PCIDevice *dev)
+{
+    Error *local_err = NULL;
+    uint16_t num_vfs = dev->exp.sriov_pf.num_vfs;
+    uint16_t i;
+    SRIOV_DEV_PRINTF(dev, "Unregistering %d vf devs\n", num_vfs);
+    for (i = 0; i < num_vfs; i++) {
+        qdev_unplug(&dev->exp.sriov_pf.vf[i]->qdev, &local_err);
+        if (local_err) {
+            fprintf(stderr, "Failed to unplug: %s\n",
+                    error_get_pretty(local_err));
+            error_free(local_err);
+        }
+    }
+    g_free(dev->exp.sriov_pf.vf);
+    dev->exp.sriov_pf.vf = NULL;
+    dev->exp.sriov_pf.num_vfs = 0;
+}
+
+void pcie_sriov_config_write(PCIDevice *dev, uint32_t address, uint32_t val, int len)
+{
+    uint32_t off;
+    uint16_t sriov_cap = dev->exp.sriov_cap;
+
+    if (!sriov_cap || address < sriov_cap) {
+        return;
+    }
+    off = address - sriov_cap;
+    if (off >= PCI_EXT_CAP_SRIOV_SIZEOF) {
+        return;
+    }
+
+    SRIOV_DEV_PRINTF(dev, "cap at 0x%x sriov offset 0x%x val 0x%x len %d\n",
+                 sriov_cap, off, val, len);
+
+    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
+        if (dev->exp.sriov_pf.num_vfs) {
+            if (!(val & PCI_SRIOV_CTRL_VFE)) {
+                unregister_vfs(dev);
+            }
+        } else {
+            if (val & PCI_SRIOV_CTRL_VFE) {
+                register_vfs(dev);
+            }
+        }
+    }
+}
+
+
+/* Reset SR/IOV VF Enable bit to trigger an unregister of all VFs */
+void pcie_sriov_pf_disable_vfs(PCIDevice *dev)
+{
+    uint16_t sriov_cap = dev->exp.sriov_cap;
+    if (sriov_cap) {
+        uint32_t val = pci_get_byte(dev->config + sriov_cap + PCI_SRIOV_CTRL);
+        if (val & PCI_SRIOV_CTRL_VFE) {
+            val &= ~PCI_SRIOV_CTRL_VFE;
+            pcie_sriov_config_write(dev, sriov_cap + PCI_SRIOV_CTRL, val, 1);
+        }
+    }
+}
+
+/* Add optional supported page sizes to the mask of supported page sizes */
+void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize)
+{
+    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
+    uint8_t *wmask = dev->wmask + dev->exp.sriov_cap;
+
+    uint16_t sup_pgsize = pci_get_word(cfg + PCI_SRIOV_SUP_PGSIZE);
+
+    sup_pgsize |= opt_sup_pgsize;
+
+    /* Make sure the new bits are set, and that system page size
+     * also can be set to any of the new values according to spec:
+     */
+    pci_set_word(cfg + PCI_SRIOV_SUP_PGSIZE, sup_pgsize);
+    pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, sup_pgsize);
+}
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index c352c7b..ebda70f 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -11,8 +11,6 @@
 /* PCI includes legacy ISA access.  */
 #include "hw/isa/isa.h"
 
-#include "hw/pci/pcie.h"
-
 /* PCI bus */
 
 #define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
@@ -127,6 +125,7 @@ enum {
 #define QEMU_PCI_VGA_IO_HI_SIZE 0x20
 
 #include "hw/pci/pci_regs.h"
+#include "hw/pci/pcie.h"
 
 /* PCI HEADER_TYPE */
 #define  PCI_HEADER_TYPE_MULTI_FUNCTION 0x80
@@ -413,6 +412,9 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
 void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
 
+pcibus_t pci_bar_address(PCIDevice *d,
+                         int reg, uint8_t type, pcibus_t size);
+
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
 {
@@ -664,6 +666,11 @@ static inline int pci_is_express(const PCIDevice *d)
     return d->cap_present & QEMU_PCI_CAP_EXPRESS;
 }
 
+static inline int pci_is_vf(const PCIDevice *d)
+{
+    return d->exp.sriov_vf.pf != NULL;
+}
+
 static inline uint32_t pci_config_size(const PCIDevice *d)
 {
     return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE : PCI_CONFIG_SPACE_SIZE;
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index d139d58..68578e2 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -25,6 +25,7 @@
 #include "hw/pci/pci_regs.h"
 #include "hw/pci/pcie_regs.h"
 #include "hw/pci/pcie_aer.h"
+#include "hw/pci/pcie_sriov.h"
 #include "hw/hotplug.h"
 
 typedef enum {
@@ -74,6 +75,11 @@ struct PCIExpressDevice {
     /* AER */
     uint16_t aer_cap;
     PCIEAERLog aer_log;
+
+    /* SR/IOV */
+    uint16_t sriov_cap;
+    PCIESriovPF sriov_pf;
+    PCIESriovVF sriov_vf;
 };
 
 #define COMPAT_PROP_PCP "power_controller_present"
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
new file mode 100644
index 0000000..df8729f
--- /dev/null
+++ b/include/hw/pci/pcie_sriov.h
@@ -0,0 +1,55 @@
+/*
+ * pcie_sriov.h:
+ *
+ * Implementation of SR/IOV emulation support.
+ *
+ * Copyright (c) 2014 Knut Omang <knut.omang@oracle.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_PCIE_SRIOV_H
+#define QEMU_PCIE_SRIOV_H
+
+
+
+struct PCIESriovPF {
+    uint16_t num_vfs;           /* Number of virtual functions created */
+    uint8_t vf_bar_type[PCI_NUM_REGIONS];  /* Store type for each VF bar */
+    const char *vfname;         /* Reference to the device type used for the VFs */
+    PCIDevice **vf;             /* Pointer to an array of num_vfs VF devices */
+};
+
+struct PCIESriovVF { 
+    PCIDevice *pf;              /* Pointer back to owner physical function */
+};
+
+void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+                        const char *vfname, uint16_t vf_dev_id,
+                        uint16_t init_vfs, uint16_t total_vfs,
+                        uint16_t vf_offset, uint16_t vf_stride);
+void pcie_sriov_pf_exit(PCIDevice *dev);
+
+/* Set up a VF bar in the SR/IOV bar area */
+void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
+                               uint8_t type, dma_addr_t size);
+
+/* Instantiate a bar for a VF */
+void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
+                                MemoryRegion *memory);
+
+/* Add optional supported page sizes to the mask of supported page sizes 
+ * Page size values are interpreted as opt_sup_pgsize << 12.
+ */
+void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize);
+
+/* SR/IOV capability config write handler */
+void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
+                             uint32_t val, int len);
+
+/* Reset SR/IOV VF Enable bit to unregister all VFs */
+void pcie_sriov_pf_disable_vfs(PCIDevice *dev);
+
+#endif /* QEMU_PCIE_SRIOV_H */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 5f20b0e..1954818 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -58,6 +58,8 @@ typedef struct PCIBridge PCIBridge;
 typedef struct PCIEAERMsg PCIEAERMsg;
 typedef struct PCIEAERLog PCIEAERLog;
 typedef struct PCIEAERErr PCIEAERErr;
+typedef struct PCIESriovPF PCIESriovPF;
+typedef struct PCIESriovVF PCIESriovVF;
 typedef struct PCIEPort PCIEPort;
 typedef struct PCIESlot PCIESlot;
 typedef struct MSIMessage MSIMessage;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/4] pci: Avoid losing config updates to MSI/MSIX cap regs
  2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 2/4] pci: Avoid losing config updates to MSI/MSIX cap regs Knut Omang
@ 2014-09-02 12:57   ` Michael S. Tsirkin
  0 siblings, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2014-09-02 12:57 UTC (permalink / raw)
  To: Knut Omang
  Cc: Peter Maydell, Peter Crosthwaite, Stefan Hajnoczi, Laszlo Ersek,
	Alexander Graf, qemu-devel, Fabien Chouteau, Luiz Capitulino,
	Beniamino Galvani, Alex Williamson, Gonglei (Arei),
	Jan Kiszka, Anthony Liguori, Paolo Bonzini, Amos Kong,
	Gabriel Somlo, qemu-stable

On Tue, Sep 02, 2014 at 01:00:04PM +0200, Knut Omang wrote:
> Introduce an extra local variable to avoid that the
> shifting in the for loop causes updates to msi/msix capability
> registers to get lost in pci_default_write_config.
> 
> Signed-off-by: Knut Omang <knut.omang@oracle.com>

OK
So this is a regression introduced by:

commit 95d658002401e2e47a5404298ebe9508846e8a39
Author: Jan Kiszka <jan.kiszka@siemens.com>
Date:   Fri May 11 11:42:40 2012 -0300
    msi: Invoke msi/msix_write_config from PCI core

Will queue to qemu-stable as well.


> ---
>  hw/pci/pci.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 6b21dee..af547dc 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1146,9 +1146,10 @@ uint32_t pci_default_read_config(PCIDevice *d,
>      return le32_to_cpu(val);
>  }
>  
> -void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> +void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int l)
>  {
>      int i, was_irq_disabled = pci_irq_disabled(d);
> +    uint32_t val = val_in;
>  
>      for (i = 0; i < l; val >>= 8, ++i) {
>          uint8_t wmask = d->wmask[addr + i];
> @@ -1170,8 +1171,8 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
>                                      & PCI_COMMAND_MASTER);
>      }
>  
> -    msi_write_config(d, addr, val, l);
> -    msix_write_config(d, addr, val, l);
> +    msi_write_config(d, addr, val_in, l);
> +    msix_write_config(d, addr, val_in, l);
>  }
>  
>  /***********************************************************/
> -- 
> 1.9.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/4] pci: Make use of the devfn property when registering new devices
  2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 1/4] pci: Make use of the devfn property when registering new devices Knut Omang
@ 2014-09-02 13:03   ` Michael S. Tsirkin
  2014-09-02 13:44     ` Knut Omang
  0 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2014-09-02 13:03 UTC (permalink / raw)
  To: Knut Omang
  Cc: Peter Maydell, Peter Crosthwaite, Stefan Hajnoczi, Laszlo Ersek,
	Alexander Graf, qemu-devel, Fabien Chouteau, Luiz Capitulino,
	Beniamino Galvani, Alex Williamson, Gonglei (Arei),
	Jan Kiszka, Anthony Liguori, Paolo Bonzini, Amos Kong,
	Gabriel Somlo

On Tue, Sep 02, 2014 at 01:00:03PM +0200, Knut Omang wrote:
> Without this, the devfn argument to pci_create_*()
> does not affect the assigned devfn.
> 
> Needed to support (VF_STRIDE,VF_OFFSET) values other than (1,1)
> for SR/IOV.
> 
> Signed-off-by: Knut Omang <knut.omang@oracle.com>

Sorry, I don't understand the explanation exactly.
pci_dev->devfn is not set correctly? why?

> ---
>  hw/pci/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index daeaeac..6b21dee 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1757,7 +1757,7 @@ static int pci_qdev_init(DeviceState *qdev)
>      bus = PCI_BUS(qdev_get_parent_bus(qdev));
>      pci_dev = do_pci_register_device(pci_dev, bus,
>                                       object_get_typename(OBJECT(qdev)),
> -                                     pci_dev->devfn);
> +                                     object_property_get_int(OBJECT(qdev), "addr", NULL));
>      if (pci_dev == NULL)
>          return -1;
>  
> -- 
> 1.9.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/4] pci: Make use of the devfn property when registering new devices
  2014-09-02 13:03   ` Michael S. Tsirkin
@ 2014-09-02 13:44     ` Knut Omang
  2014-09-02 13:55       ` Michael S. Tsirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Knut Omang @ 2014-09-02 13:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Peter Crosthwaite, Stefan Hajnoczi, Laszlo Ersek,
	Alexander Graf, qemu-devel, Fabien Chouteau, Luiz Capitulino,
	Beniamino Galvani, Alex Williamson, Gonglei (Arei),
	Jan Kiszka, Anthony Liguori, Paolo Bonzini, Amos Kong,
	Gabriel Somlo

On Tue, 2014-09-02 at 16:03 +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 02, 2014 at 01:00:03PM +0200, Knut Omang wrote:
> > Without this, the devfn argument to pci_create_*()
> > does not affect the assigned devfn.
> > 
> > Needed to support (VF_STRIDE,VF_OFFSET) values other than (1,1)
> > for SR/IOV.
> > 
> > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> 
> Sorry, I don't understand the explanation exactly.
> pci_dev->devfn is not set correctly? why?

This probably has been broken by some of the qom adaptation if I
understand it right: The devfn parameter in pci_create_multifunction()
is only used to set the "addr" property, which in turn is not used
anywhere as far as I can see. So the pci_dev->devfn has it's default
value.

What I observe is that when I enable VFs with the igb example device,
now with stride = 2 and offset 0x80 (as my real lab hardware implements)
and enable some VFs, this yields devfn's of 0x80, 0x82, 0x84... sent as
parameters to pci_create..(), I still end up with this (eg. 
the "natural" next function will be selected instead of the provided
devfn):

01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
01:00.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
01:00.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
01:00.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

With this patch applied, I get the expected:

01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
01:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
01:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
01:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
01:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

> > ---
> >  hw/pci/pci.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index daeaeac..6b21dee 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -1757,7 +1757,7 @@ static int pci_qdev_init(DeviceState *qdev)
> >      bus = PCI_BUS(qdev_get_parent_bus(qdev));
> >      pci_dev = do_pci_register_device(pci_dev, bus,
> >                                       object_get_typename(OBJECT(qdev)),
> > -                                     pci_dev->devfn);
> > +                                     object_property_get_int(OBJECT(qdev), "addr", NULL));
> >      if (pci_dev == NULL)
> >          return -1;
> >  
> > -- 
> > 1.9.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/4] pci: Make use of the devfn property when registering new devices
  2014-09-02 13:44     ` Knut Omang
@ 2014-09-02 13:55       ` Michael S. Tsirkin
  2014-10-03 11:59         ` Knut Omang
  0 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2014-09-02 13:55 UTC (permalink / raw)
  To: Knut Omang
  Cc: Peter Maydell, Peter Crosthwaite, Stefan Hajnoczi, Laszlo Ersek,
	Alexander Graf, qemu-devel, Fabien Chouteau, Luiz Capitulino,
	Beniamino Galvani, Alex Williamson, Gonglei (Arei),
	Jan Kiszka, Anthony Liguori, Paolo Bonzini, Amos Kong,
	Gabriel Somlo

On Tue, Sep 02, 2014 at 03:44:28PM +0200, Knut Omang wrote:
> On Tue, 2014-09-02 at 16:03 +0300, Michael S. Tsirkin wrote:
> > On Tue, Sep 02, 2014 at 01:00:03PM +0200, Knut Omang wrote:
> > > Without this, the devfn argument to pci_create_*()
> > > does not affect the assigned devfn.
> > > 
> > > Needed to support (VF_STRIDE,VF_OFFSET) values other than (1,1)
> > > for SR/IOV.
> > > 
> > > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > 
> > Sorry, I don't understand the explanation exactly.
> > pci_dev->devfn is not set correctly? why?
> 
> This probably has been broken by some of the qom adaptation if I
> understand it right: The devfn parameter in pci_create_multifunction()
> is only used to set the "addr" property, which in turn is not used
> anywhere as far as I can see. So the pci_dev->devfn has it's default
> value.

There are a bunch of places that depend on devfn being correct
though so I think this needs to be fixed.


> 
> What I observe is that when I enable VFs with the igb example device,
> now with stride = 2 and offset 0x80 (as my real lab hardware implements)
> and enable some VFs, this yields devfn's of 0x80, 0x82, 0x84... sent as
> parameters to pci_create..(), I still end up with this (eg. 
> the "natural" next function will be selected instead of the provided
> devfn):
> 
> 01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 01:00.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 01:00.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 01:00.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 01:00.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 
> With this patch applied, I get the expected:
> 
> 01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 01:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 01:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 01:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 01:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

> > > ---
> > >  hw/pci/pci.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > index daeaeac..6b21dee 100644
> > > --- a/hw/pci/pci.c
> > > +++ b/hw/pci/pci.c
> > > @@ -1757,7 +1757,7 @@ static int pci_qdev_init(DeviceState *qdev)
> > >      bus = PCI_BUS(qdev_get_parent_bus(qdev));
> > >      pci_dev = do_pci_register_device(pci_dev, bus,
> > >                                       object_get_typename(OBJECT(qdev)),
> > > -                                     pci_dev->devfn);
> > > +                                     object_property_get_int(OBJECT(qdev), "addr", NULL));
> > >      if (pci_dev == NULL)
> > >          return -1;
> > >  
> > > -- 
> > > 1.9.0
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/4] pci: Make use of the devfn property when registering new devices
  2014-09-02 13:55       ` Michael S. Tsirkin
@ 2014-10-03 11:59         ` Knut Omang
  0 siblings, 0 replies; 10+ messages in thread
From: Knut Omang @ 2014-10-03 11:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Peter Crosthwaite, Stefan Hajnoczi, Laszlo Ersek,
	Alexander Graf, qemu-devel, Fabien Chouteau, Luiz Capitulino,
	Beniamino Galvani, Alex Williamson, Gonglei (Arei),
	Jan Kiszka, Anthony Liguori, Paolo Bonzini, Amos Kong,
	Gabriel Somlo

Michael,
 
I just realized this patch set might have got "stuck" because I did not
respond to this?

Or did I miss something else?

On Tue, 2014-09-02 at 16:55 +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 02, 2014 at 03:44:28PM +0200, Knut Omang wrote:
> > On Tue, 2014-09-02 at 16:03 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Sep 02, 2014 at 01:00:03PM +0200, Knut Omang wrote:
> > > > Without this, the devfn argument to pci_create_*()
> > > > does not affect the assigned devfn.
> > > > 
> > > > Needed to support (VF_STRIDE,VF_OFFSET) values other than (1,1)
> > > > for SR/IOV.
> > > > 
> > > > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > > 
> > > Sorry, I don't understand the explanation exactly.
> > > pci_dev->devfn is not set correctly? why?
> > 
> > This probably has been broken by some of the qom adaptation if I
> > understand it right: The devfn parameter in pci_create_multifunction()
> > is only used to set the "addr" property, which in turn is not used
> > anywhere as far as I can see. So the pci_dev->devfn has it's default
> > value.
> 
> There are a bunch of places that depend on devfn being correct
> though so I think this needs to be fixed.

The problem with the current code is that pci_dev->devfn is used as an
argument to the function that is the one that sets pci_dev->devfn,
anything that uses devfn after it has been registered (the normal case)
should be fine as far as I can see?

Thanks,

Knut

> > 
> > What I observe is that when I enable VFs with the igb example device,
> > now with stride = 2 and offset 0x80 (as my real lab hardware implements)
> > and enable some VFs, this yields devfn's of 0x80, 0x82, 0x84... sent as
> > parameters to pci_create..(), I still end up with this (eg. 
> > the "natural" next function will be selected instead of the provided
> > devfn):
> > 
> > 01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> > 01:00.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> > 01:00.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> > 01:00.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> > 01:00.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> > 
> > With this patch applied, I get the expected:
> > 
> > 01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> > 01:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> > 01:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> > 01:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> > 01:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 
> > > > ---
> > > >  hw/pci/pci.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > index daeaeac..6b21dee 100644
> > > > --- a/hw/pci/pci.c
> > > > +++ b/hw/pci/pci.c
> > > > @@ -1757,7 +1757,7 @@ static int pci_qdev_init(DeviceState *qdev)
> > > >      bus = PCI_BUS(qdev_get_parent_bus(qdev));
> > > >      pci_dev = do_pci_register_device(pci_dev, bus,
> > > >                                       object_get_typename(OBJECT(qdev)),
> > > > -                                     pci_dev->devfn);
> > > > +                                     object_property_get_int(OBJECT(qdev), "addr", NULL));
> > > >      if (pci_dev == NULL)
> > > >          return -1;
> > > >  
> > > > -- 
> > > > 1.9.0
> > 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-10-03 12:00 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-02 11:00 [Qemu-devel] [PATCH v2 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 1/4] pci: Make use of the devfn property when registering new devices Knut Omang
2014-09-02 13:03   ` Michael S. Tsirkin
2014-09-02 13:44     ` Knut Omang
2014-09-02 13:55       ` Michael S. Tsirkin
2014-10-03 11:59         ` Knut Omang
2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 2/4] pci: Avoid losing config updates to MSI/MSIX cap regs Knut Omang
2014-09-02 12:57   ` Michael S. Tsirkin
2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 3/4] pci: Update pci_regs header Knut Omang
2014-09-02 11:00 ` [Qemu-devel] [PATCH v2 4/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV) Knut Omang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.