All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 00/13]  pcie port switch emulators
@ 2010-09-15  5:38 Isaku Yamahata
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 01/13] msi: implemented msi Isaku Yamahata
                   ` (13 more replies)
  0 siblings, 14 replies; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

Here is v3 of the patch series.
I didn't address the pcie_init() issue yet with v3 because
there are already many changes. So I'd like to get feed back
before going too far. The issue would be addressed with the next spin
if necessary.

new patches: 2, 3, 4, 5, 6
Other patches are (almost) same as before except adjustment to compile.
The patches of 1 and 13 can be harmlessly merged, I think.

Patch description:
This patch series implements pcie port switch emulators
which is basic part for pcie/q35 support.
This is for mst/pci tree.

changes v2 -> v3:
- msi: improved commant and simplified shift/ffs dance
- pci w1c config register framework
- split pcie.[ch] into pcie_regs.h, pcie.[ch] and pcie_aer.[ch]
- pcie, aer: many changes by following reviews.

changes v1 -> v2:
- update msi
- dropped already pushed out patches.
- added msix patches.

Isaku Yamahata (13):
  msi: implemented msi.
  pci: implement RW1C register framework.
  pci: introduce helper function pci_shift_word/long which returns
    shifted value.
  pcie: add pcie constants to pcie_regs.h
  pcie: helper functions for pcie capability and extended capability.
  pcie/aer: helper functions for pcie aer capability.
  pcie port: define struct PCIEPort/PCIESlot and helper functions
  pcie root port: implement pcie root port.
  pcie upstream port: pci express switch upstream port.
  pcie downstream port: pci express switch downstream port.
  pcie/hotplug: glue pushing attention button command. pcie_abp
  pcie/aer: glue aer error injection into qemu monitor.
  msix: clear not only INTA, but all INTx when MSI-X is enabled.

 Makefile.objs        |    6 +-
 hw/msi.c             |  358 ++++++++++++++++++++
 hw/msi.h             |   41 +++
 hw/msix.c            |    5 +-
 hw/pci.c             |    5 +
 hw/pci.h             |   51 +++-
 hw/pcie.c            |  638 ++++++++++++++++++++++++++++++++++++
 hw/pcie.h            |  102 ++++++
 hw/pcie_aer.c        |  881 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pcie_aer.h        |  105 ++++++
 hw/pcie_downstream.c |  218 +++++++++++++
 hw/pcie_downstream.h |   33 ++
 hw/pcie_port.c       |  188 +++++++++++
 hw/pcie_port.h       |   51 +++
 hw/pcie_regs.h       |  170 ++++++++++
 hw/pcie_root.c       |  240 ++++++++++++++
 hw/pcie_root.h       |   32 ++
 hw/pcie_upstream.c   |  200 ++++++++++++
 hw/pcie_upstream.h   |   32 ++
 qemu-common.h        |    6 +
 qemu-monitor.hx      |   36 ++
 sysemu.h             |    9 +
 22 files changed, 3401 insertions(+), 6 deletions(-)
 create mode 100644 hw/msi.c
 create mode 100644 hw/msi.h
 create mode 100644 hw/pcie.c
 create mode 100644 hw/pcie.h
 create mode 100644 hw/pcie_aer.c
 create mode 100644 hw/pcie_aer.h
 create mode 100644 hw/pcie_downstream.c
 create mode 100644 hw/pcie_downstream.h
 create mode 100644 hw/pcie_port.c
 create mode 100644 hw/pcie_port.h
 create mode 100644 hw/pcie_regs.h
 create mode 100644 hw/pcie_root.c
 create mode 100644 hw/pcie_root.h
 create mode 100644 hw/pcie_upstream.c
 create mode 100644 hw/pcie_upstream.h

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 01/13] msi: implemented msi.
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-15 13:03   ` [Qemu-devel] " Michael S. Tsirkin
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 02/13] pci: implement RW1C register framework Isaku Yamahata
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

implemented msi support functions.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>

---
Changes v2 -> v3:
- improved comment wording.
- simplified shift/ffs dance.

Changes v1 -> v2:
- opencode some oneline helper function/macros for readability
- use ffs where appropriate
- rename some functions/variables as suggested.
- added assert()
- 1 -> 1U
- clear INTx# when MSI is enabled
- clear pending bits for freed vectors.
- check the requested number of vectors.
---
 Makefile.objs |    2 +-
 hw/msi.c      |  358 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/msi.h      |   41 +++++++
 hw/pci.h      |   10 +-
 4 files changed, 407 insertions(+), 4 deletions(-)
 create mode 100644 hw/msi.c
 create mode 100644 hw/msi.h

diff --git a/Makefile.objs b/Makefile.objs
index 594894b..5f5a4c5 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
 # PCI watchdog devices
 hw-obj-y += wdt_i6300esb.o
 
-hw-obj-y += msix.o
+hw-obj-y += msix.o msi.o
 
 # PCI network cards
 hw-obj-y += ne2000.o
diff --git a/hw/msi.c b/hw/msi.c
new file mode 100644
index 0000000..65c163f
--- /dev/null
+++ b/hw/msi.c
@@ -0,0 +1,358 @@
+/*
+ * msi.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "msi.h"
+
+/* Eventually those constants should go to Linux pci_regs.h */
+#define PCI_MSI_PENDING_32      0x10
+#define PCI_MSI_PENDING_64      0x14
+
+/* PCI_MSI_ADDRESS_LO */
+#define PCI_MSI_ADDRESS_LO_MASK         (~0x3)
+
+/* If we get rid of cap allocator, we won't need those. */
+#define PCI_MSI_32_SIZEOF       0x0a
+#define PCI_MSI_64_SIZEOF       0x0e
+#define PCI_MSI_32M_SIZEOF      0x14
+#define PCI_MSI_64M_SIZEOF      0x18
+
+/* If we get rid of cap allocator, we won't need this. */
+static inline uint8_t msi_cap_sizeof(uint16_t flags)
+{
+    switch (flags & (PCI_MSI_FLAGS_MASKBIT | PCI_MSI_FLAGS_64BIT)) {
+    case PCI_MSI_FLAGS_MASKBIT | PCI_MSI_FLAGS_64BIT:
+        return PCI_MSI_64M_SIZEOF;
+    case PCI_MSI_FLAGS_64BIT:
+        return PCI_MSI_64_SIZEOF;
+    case PCI_MSI_FLAGS_MASKBIT:
+        return PCI_MSI_32M_SIZEOF;
+    case 0:
+        return PCI_MSI_32_SIZEOF;
+    default:
+        abort();
+        break;
+    }
+    return 0;
+}
+
+//#define MSI_DEBUG
+
+#ifdef MSI_DEBUG
+# define MSI_DPRINTF(fmt, ...)                                          \
+    fprintf(stderr, "%s:%d " fmt, __func__, __LINE__, ## __VA_ARGS__)
+#else
+# define MSI_DPRINTF(fmt, ...)  do { } while (0)
+#endif
+#define MSI_DEV_PRINTF(dev, fmt, ...)                                   \
+    MSI_DPRINTF("%s:%x " fmt, (dev)->name, (dev)->devfn, ## __VA_ARGS__)
+
+static inline uint8_t msi_nr_vectors(uint16_t flags)
+{
+    return 1U <<
+        ((flags & PCI_MSI_FLAGS_QSIZE) >> (ffs(PCI_MSI_FLAGS_QSIZE) - 1));
+}
+
+static inline uint8_t msi_flags_off(const PCIDevice* dev)
+{
+    return dev->msi_cap + PCI_MSI_FLAGS;
+}
+
+static inline uint8_t msi_address_lo_off(const PCIDevice* dev)
+{
+    return dev->msi_cap + PCI_MSI_ADDRESS_LO;
+}
+
+static inline uint8_t msi_address_hi_off(const PCIDevice* dev)
+{
+    return dev->msi_cap + PCI_MSI_ADDRESS_HI;
+}
+
+static inline uint8_t msi_data_off(const PCIDevice* dev, bool msi64bit)
+{
+    return dev->msi_cap + (msi64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32);
+}
+
+static inline uint8_t msi_mask_off(const PCIDevice* dev, bool msi64bit)
+{
+    return dev->msi_cap + (msi64bit ? PCI_MSI_MASK_64 : PCI_MSI_MASK_32);
+}
+
+static inline uint8_t msi_pending_off(const PCIDevice* dev, bool msi64bit)
+{
+    return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32);
+}
+
+bool msi_enabled(const PCIDevice *dev)
+{
+    return msi_present(dev) &&
+        (pci_get_word(dev->config + msi_flags_off(dev)) &
+         PCI_MSI_FLAGS_ENABLE);
+}
+
+int msi_init(struct PCIDevice *dev, uint8_t offset,
+             uint8_t nr_vectors, bool msi64bit, bool msi_per_vector_mask)
+{
+    uint8_t vectors_order;
+    uint16_t flags;
+    uint8_t cap_size;
+    int config_offset;
+    MSI_DEV_PRINTF(dev,
+                   "init offset: 0x%"PRIx8" vector: %"PRId8
+                   " 64bit %d mask %d\n",
+                   offset, nr_vectors, msi64bit, msi_per_vector_mask);
+
+    assert(!(nr_vectors & (nr_vectors - 1)));   /* power of 2 */
+    assert(nr_vectors > 0);
+    assert(nr_vectors <= 32);   /* the nr of MSI vectors is up to 32 */
+    vectors_order = ffs(nr_vectors) - 1;
+
+    flags = vectors_order << (ffs(PCI_MSI_FLAGS_QMASK) - 1);
+    if (msi64bit) {
+        flags |= PCI_MSI_FLAGS_64BIT;
+    }
+    if (msi_per_vector_mask) {
+        flags |= PCI_MSI_FLAGS_MASKBIT;
+    }
+
+    cap_size = msi_cap_sizeof(flags);
+    config_offset = pci_add_capability(dev, PCI_CAP_ID_MSI, offset, cap_size);
+    if (config_offset < 0) {
+        return config_offset;
+    }
+
+    dev->msi_cap = config_offset;
+    dev->cap_present |= QEMU_PCI_CAP_MSI;
+
+    pci_set_word(dev->config + msi_flags_off(dev), flags);
+    pci_set_word(dev->wmask + msi_flags_off(dev),
+                 PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
+    pci_set_long(dev->wmask + msi_address_lo_off(dev),
+                 PCI_MSI_ADDRESS_LO_MASK);
+    if (msi64bit) {
+        pci_set_long(dev->wmask + msi_address_hi_off(dev), 0xffffffff);
+    }
+    pci_set_word(dev->wmask + msi_data_off(dev, msi64bit), 0xffff);
+
+    if (msi_per_vector_mask) {
+        pci_set_long(dev->wmask + msi_mask_off(dev, msi64bit),
+                     (1U << nr_vectors) - 1);
+    }
+    return config_offset;
+}
+
+void msi_uninit(struct PCIDevice *dev)
+{
+    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
+    uint8_t cap_size = msi_cap_sizeof(flags);
+    pci_del_capability(dev, PCI_CAP_ID_MSIX, cap_size);
+    MSI_DEV_PRINTF(dev, "uninit\n");
+}
+
+void msi_reset(PCIDevice *dev)
+{
+    uint16_t flags;
+    bool msi64bit;
+
+    flags = pci_get_word(dev->config + msi_flags_off(dev));
+    flags &= ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
+    msi64bit = flags & PCI_MSI_FLAGS_64BIT;
+
+    pci_set_word(dev->config + msi_flags_off(dev), flags);
+    pci_set_long(dev->config + msi_address_lo_off(dev), 0);
+    if (msi64bit) {
+        pci_set_long(dev->config + msi_address_hi_off(dev), 0);
+    }
+    pci_set_word(dev->config + msi_data_off(dev, msi64bit), 0);
+    if (flags & PCI_MSI_FLAGS_MASKBIT) {
+        pci_set_long(dev->config + msi_mask_off(dev, msi64bit), 0);
+        pci_set_long(dev->config + msi_pending_off(dev, msi64bit), 0);
+    }
+    MSI_DEV_PRINTF(dev, "reset\n");
+}
+
+static bool msi_is_masked(const PCIDevice *dev, uint8_t vector)
+{
+    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
+    uint32_t mask;
+
+    if (!(flags & PCI_MSI_FLAGS_MASKBIT)) {
+        return false;
+    }
+
+    mask = pci_get_long(dev->config +
+                        msi_mask_off(dev, flags & PCI_MSI_FLAGS_64BIT));
+    return mask & (1U << vector);
+}
+
+static void msi_set_pending(PCIDevice *dev, uint8_t vector)
+{
+    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
+    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
+    uint32_t pending;
+
+    assert(flags & PCI_MSI_FLAGS_MASKBIT);
+
+    pending = pci_get_long(dev->config + msi_pending_off(dev, msi64bit));
+    pending |= 1U << vector;
+    pci_set_long(dev->config + msi_pending_off(dev, msi64bit), pending);
+    MSI_DEV_PRINTF(dev, "pending vector 0x%"PRIx8"\n", vector);
+}
+
+void msi_notify(PCIDevice *dev, uint8_t vector)
+{
+    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
+    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
+    uint8_t nr_vectors = msi_nr_vectors(flags);
+    uint64_t address;
+    uint32_t data;
+
+    assert(vector < nr_vectors);
+    if (msi_is_masked(dev, vector)) {
+        msi_set_pending(dev, vector);
+        return;
+    }
+
+    if (msi64bit){
+        address = pci_get_quad(dev->config + msi_address_lo_off(dev));
+    } else {
+        address = pci_get_long(dev->config + msi_address_lo_off(dev));
+    }
+
+    /* upper bit 31:16 is zero */
+    data = pci_get_word(dev->config + msi_data_off(dev, msi64bit));
+    if (nr_vectors > 1) {
+        data &= ~(nr_vectors - 1);
+        data |= vector;
+    }
+
+    MSI_DEV_PRINTF(dev,
+                   "notify vector 0x%"PRIx8
+                   " address: 0x%"PRIx64" data: 0x%"PRIx32"\n",
+                   vector, address, data);
+    stl_phys(address, data);
+}
+
+/* call this function after updating configs by pci_default_write_config(). */
+void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
+{
+    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
+    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
+    bool msi_per_vector_mask = flags & PCI_MSI_FLAGS_MASKBIT;
+    uint8_t nr_vectors;
+    uint8_t log_num_vecs;
+    uint8_t log_max_vecs;
+    uint8_t vector;
+    uint32_t pending;
+    int i;
+
+#ifdef MSI_DEBUG
+    if (ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
+        MSI_DEV_PRINTF(dev, "addr 0x%"PRIx32" val 0x%"PRIx32" len %d\n",
+                       addr, val, len);
+        MSI_DEV_PRINTF(dev, "ctrl: 0x%"PRIx16" address: 0x%"PRIx32,
+                       flags,
+                       pci_get_long(dev->config + msi_address_lo_off(dev)));
+        if (msi64bit) {
+            fprintf(stderr, " addrss-hi: 0x%"PRIx32,
+                    pci_get_long(dev->config + msi_address_hi_off(dev)));
+        }
+        fprintf(stderr, " data: 0x%"PRIx16,
+                pci_get_word(dev->config + msi_data_off(dev, msi64bit)));
+        if (flags & PCI_MSI_FLAGS_MASKBIT) {
+            fprintf(stderr, " mask 0x%"PRIx32" pending 0x%"PRIx32,
+                    pci_get_long(dev->config + msi_mask_off(dev, msi64bit)),
+                    pci_get_long(dev->config + msi_pending_off(dev, msi64bit)));
+        }
+        fprintf(stderr, "\n");
+    }
+#endif
+
+    /* Are we modified? */
+    if (!(ranges_overlap(addr, len, msi_flags_off(dev), 2) ||
+          (msi_per_vector_mask &&
+           ranges_overlap(addr, len, msi_mask_off(dev, msi64bit), 4)))) {
+        return;
+    }
+
+    if (!(flags & PCI_MSI_FLAGS_ENABLE)) {
+        return;
+    }
+
+    /*
+     * Now MSI is enabled, clear INTx# interrupts.
+     * the driver is prohibited from writing enable bit to mask
+     * a service request. But the guest OS could do this.
+     * So we just discard the interrupts as moderate fallback.
+     *
+     * 6.8.3.3. Enabling Operation
+     *   While enabled for MSI or MSI-X operation, a function is prohibited
+     *   from using its INTx# pin (if implemented) to request
+     *   service (MSI, MSI-X, and INTx# are mutually exclusive).
+     */
+    for (i = 0; i < PCI_NUM_PINS; ++i) {
+        qemu_set_irq(dev->irq[i], 0);
+    }
+
+    /*
+     * nr_vectors might be set bigger than capable. So clamp it.
+     * This is not legal by spec, so we can do anything we like,
+     * just don't crash the host
+     */
+    log_num_vecs =
+        (flags & PCI_MSI_FLAGS_QSIZE) >> (ffs(PCI_MSI_FLAGS_QSIZE) - 1);
+    log_max_vecs =
+        (flags & PCI_MSI_FLAGS_QMASK) >> (ffs(PCI_MSI_FLAGS_QMASK) - 1);
+    if (log_num_vecs > log_max_vecs) {
+        flags &= ~PCI_MSI_FLAGS_QSIZE;
+        flags |= log_max_vecs << (ffs(PCI_MSI_FLAGS_QSIZE) - 1);
+        pci_set_word(dev->config + msi_flags_off(dev), flags);
+    }
+
+    if (!msi_per_vector_mask) {
+        /* if per vector masking isn't supported,
+           there is no pending interrupt. */
+        return;
+    }
+
+    nr_vectors = msi_nr_vectors(flags);
+
+    /* This will discard pending interrupts, if any. */
+    pending = pci_get_long(dev->config + msi_pending_off(dev, msi64bit));
+    pending &= (1U << nr_vectors) - 1;
+    pci_set_long(dev->config + msi_pending_off(dev, msi64bit), pending);
+
+    /* deliver pending interrupts which are unmasked */
+    for (vector = 0; vector < nr_vectors; ++vector) {
+        if (msi_is_masked(dev, vector) || !(pending & (1U << vector))) {
+            continue;
+        }
+
+        pending &= ~(1U << vector);
+        pci_set_long(dev->config + msi_pending_off(dev, msi64bit),
+                     pending);
+        msi_notify(dev, vector);
+    }
+}
+
+uint8_t msi_nr_vectors_allocated(const PCIDevice *dev)
+{
+    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
+    return msi_nr_vectors(flags);
+}
diff --git a/hw/msi.h b/hw/msi.h
new file mode 100644
index 0000000..eac9c78
--- /dev/null
+++ b/hw/msi.h
@@ -0,0 +1,41 @@
+/*
+ * msi.h
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_MSI_H
+#define QEMU_MSI_H
+
+#include "qemu-common.h"
+#include "pci.h"
+
+bool msi_enabled(const PCIDevice *dev);
+int msi_init(struct PCIDevice *dev, uint8_t offset,
+             uint8_t nr_vectors, bool msi64bit, bool msi_per_vector_mask);
+void msi_uninit(struct PCIDevice *dev);
+void msi_reset(PCIDevice *dev);
+void msi_notify(PCIDevice *dev, uint8_t vector);
+void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len);
+uint8_t msi_nr_vectors_allocated(const PCIDevice *dev);
+
+static inline bool msi_present(const PCIDevice *dev)
+{
+    return dev->cap_present & QEMU_PCI_CAP_MSI;
+}
+
+#endif /* QEMU_MSI_H */
diff --git a/hw/pci.h b/hw/pci.h
index 1c6075e..3879708 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -109,11 +109,12 @@ typedef struct PCIIORegion {
 
 /* Bits in cap_present field. */
 enum {
-    QEMU_PCI_CAP_MSIX = 0x1,
-    QEMU_PCI_CAP_EXPRESS = 0x2,
+    QEMU_PCI_CAP_MSI = 0x1,
+    QEMU_PCI_CAP_MSIX = 0x2,
+    QEMU_PCI_CAP_EXPRESS = 0x4,
 
     /* multifunction capable device */
-#define QEMU_PCI_CAP_MULTIFUNCTION_BITNR        2
+#define QEMU_PCI_CAP_MULTIFUNCTION_BITNR        3
     QEMU_PCI_CAP_MULTIFUNCTION = (1 << QEMU_PCI_CAP_MULTIFUNCTION_BITNR),
 };
 
@@ -168,6 +169,9 @@ struct PCIDevice {
     /* Version id needed for VMState */
     int32_t version_id;
 
+    /* Offset of MSI capability in config space */
+    uint8_t msi_cap;
+
     /* Location of option rom */
     char *romfile;
     ram_addr_t rom_offset;
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 02/13] pci: implement RW1C register framework.
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 01/13] msi: implemented msi Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 03/13] pci: introduce helper function pci_shift_word/long which returns shifted value Isaku Yamahata
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

Implement RW1C register framework.
With this patch, it would be easy to implement
W1C(Write 1 to Clear) register by just setting w1cmask.
Later RW1C register will be used by pcie.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 hw/pci.c |    5 +++++
 hw/pci.h |    3 +++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index ceee291..afb52dd 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -627,6 +627,7 @@ static void pci_config_alloc(PCIDevice *pci_dev)
     pci_dev->config = qemu_mallocz(config_size);
     pci_dev->cmask = qemu_mallocz(config_size);
     pci_dev->wmask = qemu_mallocz(config_size);
+    pci_dev->w1cmask = qemu_mallocz(config_size);
     pci_dev->used = qemu_mallocz(config_size);
 }
 
@@ -635,6 +636,7 @@ static void pci_config_free(PCIDevice *pci_dev)
     qemu_free(pci_dev->config);
     qemu_free(pci_dev->cmask);
     qemu_free(pci_dev->wmask);
+    qemu_free(pci_dev->w1cmask);
     qemu_free(pci_dev->used);
 }
 
@@ -997,7 +999,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 
     for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
         uint8_t wmask = d->wmask[addr + i];
+        uint8_t w1cmask = d->w1cmask[addr + i];
+        assert(!(wmask & w1cmask));
         d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
+        d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
     }
     if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
         ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
diff --git a/hw/pci.h b/hw/pci.h
index 3879708..f4ea97a 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -130,6 +130,9 @@ struct PCIDevice {
     /* Used to implement R/W bytes */
     uint8_t *wmask;
 
+    /* Used to implement RW1C(Write 1 to Clear) bytes */
+    uint8_t *w1cmask;
+
     /* Used to allocate config space for capabilities. */
     uint8_t *used;
 
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 03/13] pci: introduce helper function pci_shift_word/long which returns shifted value.
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 01/13] msi: implemented msi Isaku Yamahata
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 02/13] pci: implement RW1C register framework Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-15 12:49   ` [Qemu-devel] " Michael S. Tsirkin
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 04/13] pcie: add pcie constants to pcie_regs.h Isaku Yamahata
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

introduce helper function pci_shift_{word, long}() which returns
returns shifted word/long of given position and range.
They will be used later.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 hw/pci.h |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/hw/pci.h b/hw/pci.h
index f4ea97a..630631b 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -327,6 +327,25 @@ pci_config_set_interrupt_pin(uint8_t *pci_config, uint8_t val)
     pci_set_byte(&pci_config[PCI_INTERRUPT_PIN], val);
 }
 
+static inline uint32_t
+pci_shift_long(uint32_t addr, uint32_t val, uint32_t pos)
+{
+    if (addr >= pos) {
+        assert(addr - pos <= 32 / 8);
+        val <<= (addr - pos) * 8;
+    } else {
+        assert(pos - addr <= 32 / 8);
+        val >>= (pos - addr) * 8;
+    }
+    return val;
+}
+
+static inline uint16_t
+pci_shift_word(uint32_t addr, uint32_t val, uint32_t pos)
+{
+    return pci_shift_long(addr, val, pos);
+}
+
 typedef int (*pci_qdev_initfn)(PCIDevice *dev);
 typedef struct {
     DeviceInfo qdev;
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 04/13] pcie: add pcie constants to pcie_regs.h
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
                   ` (2 preceding siblings ...)
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 03/13] pci: introduce helper function pci_shift_word/long which returns shifted value Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-20 18:14   ` [Qemu-devel] " Michael S. Tsirkin
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 05/13] pcie: helper functions for pcie capability and extended capability Isaku Yamahata
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

add pcie constants to pcie_regs.h.
Those constants should go to Linux pci_regs.h and then the file should
go away eventually.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v2 -> v3:
- moved out pcie constants from pcie.c to pcie_regs.h.
- removed unused macros
---
 hw/pcie_regs.h |  170 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 170 insertions(+), 0 deletions(-)
 create mode 100644 hw/pcie_regs.h

diff --git a/hw/pcie_regs.h b/hw/pcie_regs.h
new file mode 100644
index 0000000..abd39ef
--- /dev/null
+++ b/hw/pcie_regs.h
@@ -0,0 +1,170 @@
+/*
+ * pcie.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef QEMU_PCIE_REGS_H
+#define QEMU_PCIE_REGS_H
+
+/*
+ * TODO:
+ * Those constants and macros should be go to Linux pci_regs.h
+ * Once they're merged, they will go away.
+ */
+
+/* express capability */
+
+#define PCI_EXP_VER2_SIZEOF             0x3c /* express capability of ver. 2 */
+#define PCI_EXT_CAP_VER_SHIFT           16
+#define PCI_EXT_CAP_NEXT_SHIFT          20
+#define PCI_EXT_CAP_NEXT_MASK           (0xffc << PCI_EXT_CAP_NEXT_SHIFT)
+
+#define PCI_EXT_CAP(id, ver, next)                                      \
+    ((id) |                                                             \
+     ((ver) << PCI_EXT_CAP_VER_SHIFT) |                                 \
+     ((next) << PCI_EXT_CAP_NEXT_SHIFT))
+
+#define PCI_EXT_CAP_ALIGN               4
+#define PCI_EXT_CAP_ALIGNUP(x)                                  \
+    (((x) + PCI_EXT_CAP_ALIGN - 1) & ~(PCI_EXT_CAP_ALIGN - 1))
+
+/* PCI_EXP_FLAGS */
+#define PCI_EXP_FLAGS_VER2              2 /* for now, supports only ver. 2 */
+#define PCI_EXP_FLAGS_IRQ_SHIFT         (ffs(PCI_EXP_FLAGS_IRQ) - 1)
+#define PCI_EXP_FLAGS_TYPE_SHIFT        (ffs(PCI_EXP_FLAGS_TYPE) - 1)
+
+
+/* PCI_EXP_LINK{CAP, STA} */
+/* link speed */
+#define PCI_EXP_LNK_LS_25               1
+
+#define PCI_EXP_LNK_MLW_SHIFT           (ffs(PCI_EXP_LNKCAP_MLW) - 1)
+#define PCI_EXP_LNK_MLW_1               (1 << PCI_EXP_LNK_MLW_SHIFT)
+
+/* PCI_EXP_LINKCAP */
+#define PCI_EXP_LNKCAP_ASPMS_SHIFT      (ffs(PCI_EXP_LNKCAP_ASPMS) - 1)
+#define PCI_EXP_LNKCAP_ASPMS_0S         (1 << PCI_EXP_LNKCAP_ASPMS_SHIFT)
+
+#define PCI_EXP_LNKCAP_PN_SHIFT         (ffs(PCI_EXP_LNKCAP_PN) - 1)
+
+#define PCI_EXP_SLTCAP_PSN_SHIFT        (ffs(PCI_EXP_SLTCAP_PSN) - 1)
+
+#define PCI_EXP_SLTCTL_IND_RESERVED     0x0
+#define PCI_EXP_SLTCTL_IND_ON           0x1
+#define PCI_EXP_SLTCTL_IND_BLINK        0x2
+#define PCI_EXP_SLTCTL_IND_OFF          0x3
+#define PCI_EXP_SLTCTL_AIC_SHIFT        (ffs(PCI_EXP_SLTCTL_AIC) - 1)
+#define PCI_EXP_SLTCTL_AIC_OFF                          \
+    (PCI_EXP_SLTCTL_IND_OFF << PCI_EXP_SLTCTL_AIC_SHIFT)
+
+#define PCI_EXP_SLTCTL_PIC_SHIFT        (ffs(PCI_EXP_SLTCTL_PIC) - 1)
+#define PCI_EXP_SLTCTL_PIC_OFF                          \
+    (PCI_EXP_SLTCTL_IND_OFF << PCI_EXP_SLTCTL_PIC_SHIFT)
+
+#define PCI_EXP_SLTCTL_SUPPORTED        \
+            (PCI_EXP_SLTCTL_ABPE |      \
+             PCI_EXP_SLTCTL_PDCE |      \
+             PCI_EXP_SLTCTL_CCIE |      \
+             PCI_EXP_SLTCTL_HPIE |      \
+             PCI_EXP_SLTCTL_AIC |       \
+             PCI_EXP_SLTCTL_PCC |       \
+             PCI_EXP_SLTCTL_EIC)
+
+#define PCI_EXP_DEVCAP2_EFF             0x100000
+#define PCI_EXP_DEVCAP2_EETLPP          0x200000
+
+#define PCI_EXP_DEVCTL2_EETLPPB         0x80
+
+/* ARI */
+#define PCI_ARI_VER                     1
+#define PCI_ARI_SIZEOF                  8
+
+/* AER */
+#define PCI_ERR_VER                     2
+#define PCI_ERR_SIZEOF                  0x48
+
+#define PCI_ERR_UNC_SDN                 0x00000020      /* surprise down */
+#define PCI_ERR_UNC_ACSV                0x00200000      /* ACS Violation */
+#define PCI_ERR_UNC_INTN                0x00400000      /* Internal Error */
+#define PCI_ERR_UNC_MCBTLP              0x00800000      /* MC Blcoked TLP */
+#define PCI_ERR_UNC_ATOP_EBLOCKED       0x01000000      /* atomic op egress blocked */
+#define PCI_ERR_UNC_TLP_PRF_BLOCKED     0x02000000      /* TLP Prefix Blocked */
+#define PCI_ERR_COR_ADV_NONFATAL        0x00002000      /* Advisory Non-Fatal */
+#define PCI_ERR_COR_INTERNAL            0x00004000      /* Corrected Internal */
+#define PCI_ERR_COR_HL_OVERFLOW         0x00008000      /* Header Long Overflow */
+#define PCI_ERR_CAP_FEP_MASK            0x0000001f
+#define PCI_ERR_CAP_MHRC                0x00000200
+#define PCI_ERR_CAP_MHRE                0x00000400
+#define PCI_ERR_CAP_TLP                 0x00000800
+
+#define PCI_ERR_TLP_PREFIX_LOG          0x38
+
+#define PCI_SEC_STATUS_RCV_SYSTEM_ERROR         0x4000
+
+/* aer root error command/status */
+#define PCI_ERR_ROOT_CMD_EN_MASK        (PCI_ERR_ROOT_CMD_COR_EN |      \
+                                         PCI_ERR_ROOT_CMD_NONFATAL_EN | \
+                                         PCI_ERR_ROOT_CMD_FATAL_EN)
+
+#define PCI_ERR_ROOT_IRQ                0xf8000000
+#define PCI_ERR_ROOT_IRQ_SHIFT          (ffs(PCI_ERR_ROOT_IRQ) - 1)
+#define PCI_ERR_ROOT_STATUS_REPORT_MASK (PCI_ERR_ROOT_COR_RCV |         \
+                                         PCI_ERR_ROOT_MULTI_COR_RCV |   \
+                                         PCI_ERR_ROOT_UNCOR_RCV |       \
+                                         PCI_ERR_ROOT_MULTI_UNCOR_RCV | \
+                                         PCI_ERR_ROOT_FIRST_FATAL |     \
+                                         PCI_ERR_ROOT_NONFATAL_RCV |    \
+                                         PCI_ERR_ROOT_FATAL_RCV)
+
+#define PCI_ERR_UNC_SUPPORTED           (PCI_ERR_UNC_DLP |              \
+                                         PCI_ERR_UNC_SDN |              \
+                                         PCI_ERR_UNC_POISON_TLP |       \
+                                         PCI_ERR_UNC_FCP |              \
+                                         PCI_ERR_UNC_COMP_TIME |        \
+                                         PCI_ERR_UNC_COMP_ABORT |       \
+                                         PCI_ERR_UNC_UNX_COMP |         \
+                                         PCI_ERR_UNC_RX_OVER |          \
+                                         PCI_ERR_UNC_MALF_TLP |         \
+                                         PCI_ERR_UNC_ECRC |             \
+                                         PCI_ERR_UNC_UNSUP |            \
+                                         PCI_ERR_UNC_ACSV |             \
+                                         PCI_ERR_UNC_INTN |             \
+                                         PCI_ERR_UNC_MCBTLP |           \
+                                         PCI_ERR_UNC_ATOP_EBLOCKED |    \
+                                         PCI_ERR_UNC_TLP_PRF_BLOCKED)
+
+#define PCI_ERR_UNC_SEVERITY_DEFAULT    (PCI_ERR_UNC_DLP |              \
+                                         PCI_ERR_UNC_SDN |              \
+                                         PCI_ERR_UNC_FCP |              \
+                                         PCI_ERR_UNC_RX_OVER |          \
+                                         PCI_ERR_UNC_MALF_TLP |         \
+                                         PCI_ERR_UNC_INTN)
+
+#define PCI_ERR_COR_SUPPORTED           (PCI_ERR_COR_RCVR |             \
+                                         PCI_ERR_COR_BAD_TLP |          \
+                                         PCI_ERR_COR_BAD_DLLP |         \
+                                         PCI_ERR_COR_REP_ROLL |         \
+                                         PCI_ERR_COR_REP_TIMER |        \
+                                         PCI_ERR_COR_ADV_NONFATAL |     \
+                                         PCI_ERR_COR_INTERNAL |         \
+                                         PCI_ERR_COR_HL_OVERFLOW)
+
+#define PCI_ERR_COR_MASK_DEFAULT        (PCI_ERR_COR_ADV_NONFATAL |     \
+                                         PCI_ERR_COR_INTERNAL |         \
+                                         PCI_ERR_COR_HL_OVERFLOW)
+
+#endif /* QEMU_PCIE_REGS_H */
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 05/13] pcie: helper functions for pcie capability and extended capability.
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
                   ` (3 preceding siblings ...)
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 04/13] pcie: add pcie constants to pcie_regs.h Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-15 12:43   ` [Qemu-devel] " Michael S. Tsirkin
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 06/13] pcie/aer: helper functions for pcie aer capability Isaku Yamahata
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

This patch implements helper functions for pci express capability
and pci express extended capability allocation.
NOTE: presence detection depends on pci_qdev_init() change.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>

---
Changes v2 -> v3:
- don't use 0b gcc extension. use 0x instead.
- split out constants into pcie_regs.h for linux merge.
- export some helpers for pcie-aer split.
- split out aer helper functions from pcie.c to pcie_aer.c
- embed PCIExpressDevice into PCIDevice.
---
 Makefile.objs |    1 +
 hw/pci.h      |   12 +
 hw/pcie.c     |  638 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pcie.h     |   96 +++++++++
 qemu-common.h |    1 +
 5 files changed, 748 insertions(+), 0 deletions(-)
 create mode 100644 hw/pcie.c
 create mode 100644 hw/pcie.h

diff --git a/Makefile.objs b/Makefile.objs
index 5f5a4c5..eeb5134 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -186,6 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
 # PCI watchdog devices
 hw-obj-y += wdt_i6300esb.o
 
+hw-obj-y += pcie.o
 hw-obj-y += msix.o msi.o
 
 # PCI network cards
diff --git a/hw/pci.h b/hw/pci.h
index 630631b..19e85f5 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -9,6 +9,8 @@
 /* PCI includes legacy ISA access.  */
 #include "isa.h"
 
+#include "pcie.h"
+
 /* PCI bus */
 
 #define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
@@ -175,6 +177,9 @@ struct PCIDevice {
     /* Offset of MSI capability in config space */
     uint8_t msi_cap;
 
+    /* PCI Express */
+    PCIExpressDevice exp;
+
     /* Location of option rom */
     char *romfile;
     ram_addr_t rom_offset;
@@ -389,6 +394,13 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
     return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE : PCI_CONFIG_SPACE_SIZE;
 }
 
+/* These are pci express specific, so should belong to pcie.h.
+   they're here to avoid mutual header dependency. */
+static inline uint8_t pci_pcie_cap(const PCIDevice *d)
+{
+    return pci_is_express(d) ? d->exp.exp_cap : 0;
+}
+
 /* These are not pci specific. Should move into a separate header.
  * Only pci.c uses them, so keep them here for now.
  */
diff --git a/hw/pcie.c b/hw/pcie.c
new file mode 100644
index 0000000..a6f396b
--- /dev/null
+++ b/hw/pcie.c
@@ -0,0 +1,638 @@
+/*
+ * pcie.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "sysemu.h"
+#include "pci_bridge.h"
+#include "pcie.h"
+#include "msix.h"
+#include "msi.h"
+#include "pci_internals.h"
+#include "pcie_regs.h"
+
+//#define DEBUG_PCIE
+#ifdef DEBUG_PCIE
+# define PCIE_DPRINTF(fmt, ...)                                         \
+    fprintf(stderr, "%s:%d " fmt, __func__, __LINE__, ## __VA_ARGS__)
+#else
+# define PCIE_DPRINTF(fmt, ...) do {} while (0)
+#endif
+#define PCIE_DEV_PRINTF(dev, fmt, ...)                                  \
+    PCIE_DPRINTF("%s:%x "fmt, (dev)->name, (dev)->devfn, ## __VA_ARGS__)
+
+static inline const char *pcie_hp_event_name(enum PCIExpressHotPlugEvent event)
+{
+    switch (event) {
+    case PCI_EXP_HP_EV_ABP:
+        return "attention button pushed";
+    case PCI_EXP_HP_EV_PDC:
+        return "present detection changed";
+    case PCI_EXP_HP_EV_CCI:
+        return "command completed";
+    default:
+        break;
+    }
+    return "Unknown event";
+}
+
+/***************************************************************************
+ * pci express capability helper functions
+ */
+void pcie_notify(PCIDevice *dev, uint16_t vector, bool trigger, int level)
+{
+    /* masking/masking interrupt is handled by upper layer.
+     * i.e. msix_notify() for MSI-X
+     *      msi_notify()  for MSI
+     *      pci_set_irq() for INTx
+     */
+    PCIE_DEV_PRINTF(dev, "noitfy vector %d tirgger:%d level:%d\n",
+                    vector, trigger, level);
+    if (msix_enabled(dev)) {
+        if (trigger) {
+            msix_notify(dev, vector);
+        }
+    } else if (msi_enabled(dev)) {
+        if (trigger){
+            msi_notify(dev, vector);
+        }
+    } else {
+        qemu_set_irq(dev->irq[0], level);
+    }
+}
+
+int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
+{
+    int exp_cap;
+    uint8_t *pcie_cap;
+
+    assert(pci_is_express(dev));
+
+    exp_cap = pci_add_capability(dev, PCI_CAP_ID_EXP, offset,
+                                 PCI_EXP_VER2_SIZEOF);
+    if (exp_cap < 0) {
+        return exp_cap;
+    }
+    dev->exp.exp_cap = exp_cap;
+
+    /* already done in pci_qdev_init() */
+    assert(dev->cap_present & QEMU_PCI_CAP_EXPRESS);
+
+    pcie_cap = dev->config + pci_pcie_cap(dev);
+
+    /* capability register
+       interrupt message number defaults to 0 */
+    pci_set_word(pcie_cap + PCI_EXP_FLAGS,
+                 ((type << PCI_EXP_FLAGS_TYPE_SHIFT) & PCI_EXP_FLAGS_TYPE) |
+                 PCI_EXP_FLAGS_VER2);
+
+    /* device capability register
+     * table 7-12:
+     * roll based error reporting bit must be set by all
+     * Functions conforming to the ECN, PCI Express Base
+     * Specification, Revision 1.1., or subsequent PCI Express Base
+     * Specification revisions.
+     */
+    pci_set_long(pcie_cap + PCI_EXP_DEVCAP, PCI_EXP_DEVCAP_RBER);
+
+    pci_set_long(pcie_cap + PCI_EXP_LNKCAP,
+                 (port << PCI_EXP_LNKCAP_PN_SHIFT) |
+                 PCI_EXP_LNKCAP_ASPMS_0S |
+                 PCI_EXP_LNK_MLW_1 |
+                 PCI_EXP_LNK_LS_25);
+
+    pci_set_word(pcie_cap + PCI_EXP_LNKSTA,
+                 PCI_EXP_LNK_MLW_1 | PCI_EXP_LNK_LS_25);
+
+    pci_set_long(pcie_cap + PCI_EXP_DEVCAP2,
+                 PCI_EXP_DEVCAP2_EFF | PCI_EXP_DEVCAP2_EETLPP);
+
+    pci_set_word(dev->wmask + exp_cap, PCI_EXP_DEVCTL2_EETLPPB);
+    return exp_cap;
+}
+
+void pcie_cap_exit(PCIDevice *dev)
+{
+    pci_del_capability(dev, PCI_CAP_ID_EXP, PCI_EXP_VER2_SIZEOF);
+}
+
+uint8_t pcie_cap_get_type(const PCIDevice *dev)
+{
+    uint32_t pos = pci_pcie_cap(dev);
+    assert(pos > 0);
+    return (pci_get_word(dev->config + pos + PCI_EXP_FLAGS) &
+            PCI_EXP_FLAGS_TYPE) >> PCI_EXP_FLAGS_TYPE_SHIFT;
+}
+
+/* MSI/MSI-X */
+/* pci express interrupt message number */
+void pcie_cap_flags_set_vector(PCIDevice *dev, uint8_t vector)
+{
+    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
+    uint16_t tmp;
+
+    assert(vector <= 32);
+    tmp = pci_get_word(pcie_cap + PCI_EXP_FLAGS);
+    tmp &= ~PCI_EXP_FLAGS_IRQ;
+    tmp |= vector << PCI_EXP_FLAGS_IRQ_SHIFT;
+    pci_set_word(pcie_cap + PCI_EXP_FLAGS, tmp);
+}
+
+uint8_t pcie_cap_flags_get_vector(PCIDevice *dev)
+{
+    return (pci_get_word(dev->config + pci_pcie_cap(dev) + PCI_EXP_FLAGS) &
+            PCI_EXP_FLAGS_IRQ) >> PCI_EXP_FLAGS_IRQ_SHIFT;
+}
+
+void pcie_cap_deverr_init(PCIDevice *dev)
+{
+    uint32_t pos = pci_pcie_cap(dev);
+    uint8_t *pcie_cap = dev->config + pos;
+    uint8_t *pcie_wmask = dev->wmask + pos;
+    uint8_t *pcie_w1cmask = dev->wmask + pos;
+
+    pci_set_long(pcie_cap + PCI_EXP_DEVCAP,
+                 pci_get_long(pcie_cap + PCI_EXP_DEVCAP) |
+                 PCI_EXP_DEVCAP_RBER);
+
+    pci_set_long(pcie_wmask + PCI_EXP_DEVCTL,
+                 pci_get_long(pcie_wmask + PCI_EXP_DEVCTL) |
+                 PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE |
+                 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE);
+
+    pci_set_long(pcie_w1cmask + PCI_EXP_DEVSTA,
+                 pci_get_long(pcie_w1cmask + PCI_EXP_DEVSTA) |
+                 PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED |
+                 PCI_EXP_DEVSTA_URD | PCI_EXP_DEVSTA_URD);
+}
+
+void pcie_cap_deverr_reset(PCIDevice *dev)
+{
+    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
+    pci_set_long(pcie_cap + PCI_EXP_DEVCTL,
+                 pci_get_long(pcie_cap + PCI_EXP_DEVCTL) &
+                 ~(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE |
+                   PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE));
+}
+
+/*
+ * events: PCI_EXP_HP_EV_xxx
+ * status: bit or of PCI_EXP_SLTSTA_xxx
+ */
+static void pcie_cap_slot_event(PCIDevice *dev,
+                                enum PCIExpressHotPlugEvent events,
+                                uint16_t status)
+{
+    bool trigger;
+    int level;
+    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
+    uint16_t sltctl = pci_get_word(pcie_cap + PCI_EXP_SLTCTL);
+    uint16_t sltsta = pci_get_word(pcie_cap + PCI_EXP_SLTSTA);
+
+    PCIE_DEV_PRINTF(dev,
+                    "sltctl: 0x%0x2 sltsta: 0x%02x event:%x %s status:%d\n",
+                    sltctl, sltsta,
+                    events, pcie_hp_event_name(events), status);
+    events &= PCI_EXP_HP_EV_SUPPORTED;
+    if ((sltctl & PCI_EXP_SLTCTL_HPIE) && (sltctl & events) &&
+        ((sltsta ^ events) & events) /* 0 -> 1 */) {
+        trigger = true;
+    } else {
+        trigger = false;
+    }
+
+    if (events & PCI_EXP_HP_EV_PDC) {
+        sltsta &= ~PCI_EXP_SLTSTA_PDS;
+        sltsta |= (status & PCI_EXP_SLTSTA_PDS);
+    }
+    sltsta |= events;
+    pci_set_word(pcie_cap + PCI_EXP_SLTSTA, sltsta);
+    PCIE_DEV_PRINTF(dev, "sltsta -> %02xn", sltsta);
+
+    if ((sltctl & PCI_EXP_SLTCTL_HPIE) && (sltsta & PCI_EXP_HP_EV_SUPPORTED)) {
+        level = 1;
+    } else {
+        level = 0;
+    }
+
+    pcie_notify(dev, pcie_cap_flags_get_vector(dev), trigger, level);
+}
+
+static int pcie_cap_slot_hotplug(DeviceState *qdev,
+                                 PCIDevice *pci_dev, int state)
+{
+    PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
+    uint8_t *pcie_cap = d->config + pci_pcie_cap(d);
+    uint16_t sltsta = pci_get_word(pcie_cap + PCI_EXP_SLTSTA);
+
+    if (!pci_dev->qdev.hotplugged) {
+        assert(state); /* this case only happens machine creation. */
+        sltsta |= PCI_EXP_SLTSTA_PDS;
+        pci_set_word(pcie_cap + PCI_EXP_SLTSTA, sltsta);
+        return 0;
+    }
+
+    PCIE_DEV_PRINTF(pci_dev, "hotplug state: %d\n", state);
+    if (sltsta & PCI_EXP_SLTSTA_EIS) {
+        /* the slot is electromechanically locked. */
+        return -EBUSY;
+    }
+
+    if (state) {
+        if (PCI_FUNC(pci_dev->devfn) == 0) {
+            /* event is per slot. Not per function
+             * only generates event for function = 0.
+             * When hot plug, populate functions > 0
+             * and then add function = 0 last.
+             */
+            pcie_cap_slot_event(d, PCI_EXP_HP_EV_PDC, PCI_EXP_SLTSTA_PDS);
+        }
+    } else {
+        PCIBridge *br;
+        PCIBus *bus;
+        DeviceState *next;
+        if (PCI_FUNC(pci_dev->devfn) != 0) {
+            /* event is per slot. Not per function.
+               accepts function = 0 only. */
+            return -EINVAL;
+        }
+
+        /* zap all functions. */
+        br = DO_UPCAST(PCIBridge, dev, d);
+        bus = pci_bridge_get_sec_bus(br);
+        QLIST_FOREACH_SAFE(qdev, &bus->qbus.children, sibling, next) {
+            qdev_free(qdev);
+        }
+
+        pcie_cap_slot_event(d, PCI_EXP_HP_EV_PDC, 0);
+    }
+    return 0;
+}
+
+/* pci express slot for pci express root/downstream port
+   PCI express capability slot registers */
+void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot)
+{
+    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
+    uint8_t *pcie_wmask = dev->wmask + pci_pcie_cap(dev);
+    uint8_t *pcie_w1cmask = dev->w1cmask + pci_pcie_cap(dev);
+    uint32_t tmp;
+
+    pci_set_word(pcie_cap + PCI_EXP_FLAGS,
+                 pci_get_word(pcie_cap + PCI_EXP_FLAGS) | PCI_EXP_FLAGS_SLOT);
+
+    tmp = pci_get_long(pcie_cap + PCI_EXP_SLTCAP);
+    tmp &= PCI_EXP_SLTCAP_PSN;
+    tmp |=
+        (slot << PCI_EXP_SLTCAP_PSN_SHIFT) |
+        PCI_EXP_SLTCAP_EIP |
+        PCI_EXP_SLTCAP_HPS |
+        PCI_EXP_SLTCAP_HPC |
+        PCI_EXP_SLTCAP_PIP |
+        PCI_EXP_SLTCAP_AIP |
+        PCI_EXP_SLTCAP_ABP;
+    pci_set_long(pcie_cap + PCI_EXP_SLTCAP, tmp);
+
+    tmp = pci_get_word(pcie_cap + PCI_EXP_SLTCTL);
+    tmp &= ~(PCI_EXP_SLTCTL_PIC | PCI_EXP_SLTCTL_AIC);
+    tmp |= PCI_EXP_SLTCTL_PIC_OFF | PCI_EXP_SLTCTL_AIC_OFF;
+    pci_set_word(pcie_cap + PCI_EXP_SLTCTL, tmp);
+    pci_set_word(pcie_wmask + PCI_EXP_SLTCTL,
+                 pci_get_word(pcie_wmask + PCI_EXP_SLTCTL) |
+                 PCI_EXP_SLTCTL_PIC |
+                 PCI_EXP_SLTCTL_AIC |
+                 PCI_EXP_SLTCTL_HPIE |
+                 PCI_EXP_SLTCTL_CCIE |
+                 PCI_EXP_SLTCTL_PDCE |
+                 PCI_EXP_SLTCTL_ABPE);
+
+    pci_set_word(pcie_w1cmask + PCI_EXP_SLTSTA,
+                 pci_get_word(pcie_w1cmask + PCI_EXP_SLTSTA) |
+                 PCI_EXP_HP_EV_SUPPORTED);
+
+    pci_bus_hotplug(pci_bridge_get_sec_bus(DO_UPCAST(PCIBridge, dev, dev)),
+                    pcie_cap_slot_hotplug, &dev->qdev);
+}
+
+void pcie_cap_slot_reset(PCIDevice *dev)
+{
+    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
+    uint32_t tmp;
+
+    PCIE_DEV_PRINTF(dev, "reset\n");
+
+    tmp = pci_get_word(pcie_cap + PCI_EXP_SLTCTL);
+    tmp &= ~(PCI_EXP_SLTCTL_EIC |
+             PCI_EXP_SLTCTL_PIC |
+             PCI_EXP_SLTCTL_AIC |
+             PCI_EXP_SLTCTL_HPIE |
+             PCI_EXP_SLTCTL_CCIE |
+             PCI_EXP_SLTCTL_PDCE |
+             PCI_EXP_SLTCTL_ABPE);
+    tmp |= PCI_EXP_SLTCTL_PIC_OFF | PCI_EXP_SLTCTL_AIC_OFF;
+    pci_set_word(pcie_cap + PCI_EXP_SLTCTL, tmp);
+
+    tmp = pci_get_word(pcie_cap + PCI_EXP_SLTSTA);
+    tmp &= ~(PCI_EXP_SLTSTA_EIS | /* by reset, the lock is released */
+             PCI_EXP_SLTSTA_CC |
+             PCI_EXP_SLTSTA_PDC |
+             PCI_EXP_SLTSTA_ABP);
+    pci_set_word(pcie_cap + PCI_EXP_SLTSTA, tmp);
+}
+
+void pcie_cap_slot_write_config(PCIDevice *dev,
+                                uint32_t addr, uint32_t val, int len,
+                                uint16_t sltctl_prev)
+{
+    uint32_t pos = pci_pcie_cap(dev);
+    uint8_t *pcie_cap = dev->config + pos;
+    uint16_t sltctl = pci_get_word(pcie_cap + PCI_EXP_SLTCTL);
+    uint16_t sltsta = pci_get_word(pcie_cap + PCI_EXP_SLTSTA);
+
+    PCIE_DEV_PRINTF(dev,
+                    "addr: 0x%x val: 0x%x len: %d\n"
+                    "\tsltctl_prev: 0x%02x sltctl: 0x%02x sltsta 0x%02x\n",
+                    addr, val, len, sltctl_prev, sltctl, sltsta);
+    /* SLTSTA: process SLTSTA before SLTCTL to avoid spurious interrupt */
+    if (ranges_overlap(addr, len, pos + PCI_EXP_SLTSTA, 2)) {
+        sltsta = pci_get_word(pcie_cap + PCI_EXP_SLTSTA);
+
+        /* write to stlsta results in clearing bits,
+           so new interrupts won't be generated. */
+        PCIE_DEV_PRINTF(dev, "sltsta -> 0x%02x\n", sltsta);
+    }
+
+    /* SLTCTL */
+    if (ranges_overlap(addr, len, pos + PCI_EXP_SLTCTL, 2)) {
+        PCIE_DEV_PRINTF(dev, "sltctl: 0x%02x -> 0x%02x\n",
+                        sltctl_prev, sltctl);
+        if (pci_shift_word(addr, val, pos + PCI_EXP_SLTCTL) &
+            PCI_EXP_SLTCTL_EIC) {
+            /* toggle PCI_EXP_SLTSTA_EIS */
+            sltsta = (sltsta & ~PCI_EXP_SLTSTA_EIS) |
+                ((sltsta ^ PCI_EXP_SLTSTA_EIS) & PCI_EXP_SLTSTA_EIS);
+            pci_set_word(pcie_cap + PCI_EXP_SLTSTA, sltsta);
+            PCIE_DEV_PRINTF(dev, "PCI_EXP_SLTCTL_EIC: sltsta -> 0x%02x\n",
+                            sltsta);
+        }
+
+        if (sltctl & PCI_EXP_SLTCTL_HPIE) {
+            bool trigger;
+            int level;
+
+            if (((sltctl_prev ^ sltctl) & sltctl) & PCI_EXP_HP_EV_SUPPORTED) {
+                /* 0 -> 1 */
+                trigger = true;
+            } else {
+                trigger = false;
+            }
+            if ((sltctl & sltsta) & PCI_EXP_HP_EV_SUPPORTED) {
+                level = 1;
+            } else {
+                level = 0;
+            }
+            pcie_notify(dev, pcie_cap_flags_get_vector(dev), trigger, level);
+        }
+
+        if (!((sltctl_prev ^ sltctl) & PCI_EXP_SLTCTL_SUPPORTED)) {
+            PCIE_DEV_PRINTF(dev,
+                            "sprious command completion slctl 0x%x -> 0x%x\n",
+                            sltctl_prev, sltctl);
+        }
+
+        /* command completion.
+         * Real hardware might take a while to complete
+         * requested command because physical movement would be involved
+         * like locking the electromechanical lock.
+         * However in our case, command is completed instantaneously above,
+         * so send a command completion event right now.
+         */
+        /* set command completed bit */
+        pcie_cap_slot_event(dev, PCI_EXP_HP_EV_CCI, 0);
+    }
+}
+
+void pcie_cap_slot_push_attention_button(PCIDevice *dev)
+{
+    pcie_cap_slot_event(dev, PCI_EXP_HP_EV_ABP, 0);
+}
+
+/* root control/capabilities/status. PME isn't emulated for now */
+void pcie_cap_root_init(PCIDevice *dev)
+{
+    uint8_t pos = pci_pcie_cap(dev);
+    pci_set_word(dev->wmask + pos + PCI_EXP_RTCTL,
+                 PCI_EXP_RTCTL_SECEE | PCI_EXP_RTCTL_SENFEE |
+                 PCI_EXP_RTCTL_SEFEE);
+}
+
+void pcie_cap_root_reset(PCIDevice *dev)
+{
+    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
+    pci_set_word(pcie_cap + PCI_EXP_RTCTL, 0);
+}
+
+/* function level reset(FLR) */
+void pcie_cap_flr_init(PCIDevice *dev, pcie_flr_fn flr)
+{
+    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
+    pci_set_word(pcie_cap + PCI_EXP_DEVCAP,
+                 pci_get_word(pcie_cap + PCI_EXP_DEVCAP) | PCI_EXP_DEVCAP_FLR);
+    dev->exp.flr = flr;
+}
+
+void pcie_cap_flr_write_config(PCIDevice *dev,
+                               uint32_t addr, uint32_t val, int len)
+{
+    uint32_t pos = pci_pcie_cap(dev);
+    if (ranges_overlap(addr, len, pos + PCI_EXP_DEVCTL, 2)) {
+        uint16_t val16 = pci_shift_word(addr, val, pos + PCI_EXP_DEVCTL);
+        if ((val16 & PCI_EXP_DEVCTL_BCR_FLR) && dev->exp.flr) {
+            dev->exp.flr(dev);
+        }
+    }
+}
+
+/* Alternative Routing-ID Interpretation (ARI) */
+/* ari forwarding support for down stream port */
+void pcie_cap_ari_init(PCIDevice *dev)
+{
+    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
+    uint8_t *pcie_wmask = dev->wmask + pci_pcie_cap(dev);
+
+    pci_set_long(pcie_cap + PCI_EXP_DEVCAP2,
+                 pci_get_long(pcie_cap + PCI_EXP_DEVCAP2) |
+                 PCI_EXP_DEVCAP2_ARI);
+
+    pci_set_long(pcie_wmask + PCI_EXP_DEVCTL2,
+                 pci_get_long(pcie_wmask + PCI_EXP_DEVCTL2) |
+                 PCI_EXP_DEVCTL2_ARI);
+}
+
+void pcie_cap_ari_reset(PCIDevice *dev)
+{
+    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
+
+    pci_set_long(pcie_cap + PCI_EXP_DEVCTL2,
+                 pci_get_long(pcie_cap + PCI_EXP_DEVCTL2) &
+                 ~PCI_EXP_DEVCTL2_ARI);
+}
+
+bool pcie_cap_is_ari_enabled(const PCIDevice *dev)
+{
+    if (!pci_is_express(dev)) {
+        return false;
+    }
+    if (!pci_pcie_cap(dev)) {
+        return false;
+    }
+
+    return pci_get_long(dev->config + pci_pcie_cap(dev) + PCI_EXP_DEVCTL2) &
+        PCI_EXP_DEVCTL2_ARI;
+}
+
+/**************************************************************************
+ * pci express extended capability allocation functions
+ * uint16_t ext_cap_id (16 bit)
+ * uint8_t cap_ver (4 bit)
+ * uint16_t cap_offset (12 bit)
+ * uint16_t ext_cap_size
+ */
+
+#define PCI_EXT_CAP_NO_ID       ((uint16_t)0)   /* 0 is reserved cap id.
+                                                 * use internally to find the
+                                                 * last capability in the
+                                                 * linked list
+                                                 */
+
+static uint16_t pcie_find_capability_list(PCIDevice *dev, uint16_t cap_id,
+                                          uint16_t *prev_p)
+{
+    uint16_t prev = 0;
+    uint16_t next = PCI_CONFIG_SPACE_SIZE;
+    uint32_t header = pci_get_long(dev->config + next);
+
+    if (!header) {
+        /* no extended capability */
+        next = 0;
+        goto out;
+    }
+
+    while (next) {
+        assert(next >= PCI_CONFIG_SPACE_SIZE);
+        assert(next <= PCIE_CONFIG_SPACE_SIZE - 8);
+
+        header = pci_get_long(dev->config + next);
+        if (PCI_EXT_CAP_ID(header) == cap_id) {
+            break;
+        }
+        prev = next;
+        next = PCI_EXT_CAP_NEXT(header);
+    }
+
+out:
+    if (prev_p) {
+        *prev_p = prev;
+    }
+    return next;
+}
+
+uint16_t pcie_find_capability(PCIDevice *dev, uint16_t cap_id)
+{
+    return pcie_find_capability_list(dev, cap_id, NULL);
+}
+
+static void pcie_ext_cap_set_next(PCIDevice *dev, uint16_t pos, uint16_t next)
+{
+    uint16_t header = pci_get_long(dev->config + pos);
+    assert(!(next & (PCI_EXT_CAP_ALIGN - 1)));
+    header = (header & ~PCI_EXT_CAP_NEXT_MASK) |
+        ((next << PCI_EXT_CAP_NEXT_SHIFT) & PCI_EXT_CAP_NEXT_MASK);
+    pci_set_long(dev->config + pos, header);
+}
+
+/*
+ * caller must supply valid (offset, size) * such that the range shouldn't
+ * overlap with other capability or other registers.
+ * This function doesn't check it.
+ */
+void pcie_add_capability(PCIDevice *dev,
+                         uint16_t cap_id, uint8_t cap_ver,
+                         uint16_t offset, uint16_t size)
+{
+    uint32_t header;
+    uint16_t next;
+
+    assert(offset >= PCI_CONFIG_SPACE_SIZE);
+    assert(offset < offset + size);
+    assert(offset + size < PCIE_CONFIG_SPACE_SIZE);
+    assert(size >= 8);
+    assert(pci_is_express(dev));
+
+    if (offset == PCI_CONFIG_SPACE_SIZE) {
+        header = pci_get_long(dev->config + offset);
+        next = PCI_EXT_CAP_NEXT(header);
+    } else {
+        uint16_t prev;
+        next = pcie_find_capability_list(dev, PCI_EXT_CAP_NO_ID, &prev);
+        assert(next == 0);
+        pcie_ext_cap_set_next(dev, prev, offset);
+    }
+    pci_set_long(dev->config + offset, PCI_EXT_CAP(cap_id, cap_ver, next));
+
+    /* Make capability read-only by default */
+    memset(dev->wmask + offset, 0, size);
+    /* Check capability by default */
+    memset(dev->cmask + offset, 0xFF, size);
+}
+
+void pcie_del_capability(PCIDevice *dev, uint16_t cap_id, uint16_t size)
+{
+    uint16_t prev;
+    uint16_t offset = pcie_find_capability_list(dev, cap_id, &prev);
+    uint32_t header;
+
+    assert(offset >= PCI_CONFIG_SPACE_SIZE);
+    header = pci_get_long(dev->config + offset);
+    if (prev) {
+        pcie_ext_cap_set_next(dev, prev, PCI_EXT_CAP_NEXT(header));
+    } else {
+        /* move up next ext cap to PCI_CONFIG_SPACE_SIZE? */
+        assert(offset == PCI_CONFIG_SPACE_SIZE);
+        pci_set_long(dev->config + offset,
+                     PCI_EXT_CAP(0, 0, PCI_EXT_CAP_NEXT(header)));
+    }
+
+    /* Make those registers read-only reserved zero */
+    memset(dev->config + offset, 0, size);
+    memset(dev->wmask + offset, 0, size);
+    /* Clear cmask as device-specific registers can't be checked */
+    memset(dev->cmask + offset, 0, size);
+}
+
+/**************************************************************************
+ * pci express extended capability helper functions
+ */
+
+/* ARI */
+void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn)
+{
+    pcie_add_capability(dev, PCI_EXT_CAP_ID_ARI, PCI_ARI_VER,
+                        offset, PCI_ARI_SIZEOF);
+    pci_set_long(dev->config + offset + PCI_ARI_CAP, PCI_ARI_CAP_NFN(nextfn));
+}
diff --git a/hw/pcie.h b/hw/pcie.h
new file mode 100644
index 0000000..37713dc
--- /dev/null
+++ b/hw/pcie.h
@@ -0,0 +1,96 @@
+/*
+ * pcie.h
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_PCIE_H
+#define QEMU_PCIE_H
+
+#include "hw.h"
+#include "pcie_regs.h"
+
+enum PCIExpressIndicator {
+    /* for attention and power indicator */
+    PCI_EXP_HP_IND_RESERVED     = PCI_EXP_SLTCTL_IND_RESERVED,
+    PCI_EXP_HP_IND_ON           = PCI_EXP_SLTCTL_IND_ON,
+    PCI_EXP_HP_IND_BLINK        = PCI_EXP_SLTCTL_IND_BLINK,
+    PCI_EXP_HP_IND_OFF          = PCI_EXP_SLTCTL_IND_OFF,
+};
+
+enum PCIExpressHotPlugEvent {
+    /* the bits match the bits in Slot Control/Status registers.
+     * PCI_EXP_HP_EV_xxx = PCI_EXP_SLTCTL_xxxE = PCI_EXP_SLTSTA_xxx
+     */
+    PCI_EXP_HP_EV_ABP           = 0x01,         /* attention button preseed */
+    PCI_EXP_HP_EV_PDC           = 0x08,         /* presence detect changed */
+    PCI_EXP_HP_EV_CCI           = 0x10,         /* command completed */
+
+    PCI_EXP_HP_EV_SUPPORTED     = 0x19,         /* supported event mask  */
+    /* events not listed aren't supported */
+};
+
+typedef void (*pcie_flr_fn)(PCIDevice *dev);
+
+struct PCIExpressDevice {
+    /* Offset of express capability in config space */
+    uint8_t exp_cap;
+
+    /* FLR */
+    pcie_flr_fn flr;
+};
+
+void pcie_notify(PCIDevice *dev, uint16_t vector, bool trigger, int level);
+
+/* PCI express capability helper functions */
+int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port);
+void pcie_cap_exit(PCIDevice *dev);
+uint8_t pcie_cap_get_type(const PCIDevice *dev);
+void pcie_cap_flags_set_vector(PCIDevice *dev, uint8_t vector);
+uint8_t pcie_cap_flags_get_vector(PCIDevice *dev);
+
+void pcie_cap_deverr_init(PCIDevice *dev);
+void pcie_cap_deverr_reset(PCIDevice *dev);
+
+void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot);
+void pcie_cap_slot_reset(PCIDevice *dev);
+void pcie_cap_slot_write_config(PCIDevice *dev,
+                                uint32_t addr, uint32_t val, int len,
+                                uint16_t sltctl_prev);
+void pcie_cap_slot_push_attention_button(PCIDevice *dev);
+
+void pcie_cap_root_init(PCIDevice *dev);
+void pcie_cap_root_reset(PCIDevice *dev);
+
+void pcie_cap_flr_init(PCIDevice *dev, pcie_flr_fn flr);
+void pcie_cap_flr_write_config(PCIDevice *dev,
+                           uint32_t addr, uint32_t val, int len);
+
+void pcie_cap_ari_init(PCIDevice *dev);
+void pcie_cap_ari_reset(PCIDevice *dev);
+bool pcie_cap_is_ari_enabled(const PCIDevice *dev);
+
+/* PCI express extended capability helper functions */
+uint16_t pcie_find_capability(PCIDevice *dev, uint16_t cap_id);
+void pcie_add_capability(PCIDevice *dev,
+                         uint16_t cap_id, uint8_t cap_ver,
+                         uint16_t offset, uint16_t size);
+void pcie_del_capability(PCIDevice *dev, uint16_t cap_id, uint16_t size);
+
+void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
+
+#endif /* QEMU_PCIE_H */
diff --git a/qemu-common.h b/qemu-common.h
index d735235..6d9ee26 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -219,6 +219,7 @@ typedef struct PCIHostState PCIHostState;
 typedef struct PCIExpressHost PCIExpressHost;
 typedef struct PCIBus PCIBus;
 typedef struct PCIDevice PCIDevice;
+typedef struct PCIExpressDevice PCIExpressDevice;
 typedef struct PCIBridge PCIBridge;
 typedef struct SerialState SerialState;
 typedef struct IRQState *qemu_irq;
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 06/13] pcie/aer: helper functions for pcie aer capability.
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
                   ` (4 preceding siblings ...)
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 05/13] pcie: helper functions for pcie capability and extended capability Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-22 11:50   ` [Qemu-devel] " Michael S. Tsirkin
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 07/13] pcie port: define struct PCIEPort/PCIESlot and helper functions Isaku Yamahata
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

This patch implements helper functions for pcie aer capability
which will be used later.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v2 -> v3:
- split out from pcie.[ch] to pcie_aer.[ch] to make the files sorter.
- embeded PCIExpressDevice into PCIDevice.
- CodingStyle fix
---
 Makefile.objs |    2 +-
 hw/pci.h      |    7 +
 hw/pcie.h     |    6 +
 hw/pcie_aer.c |  796 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pcie_aer.h |  105 ++++++++
 qemu-common.h |    3 +
 6 files changed, 918 insertions(+), 1 deletions(-)
 create mode 100644 hw/pcie_aer.c
 create mode 100644 hw/pcie_aer.h

diff --git a/Makefile.objs b/Makefile.objs
index eeb5134..68bcc48 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
 # PCI watchdog devices
 hw-obj-y += wdt_i6300esb.o
 
-hw-obj-y += pcie.o
+hw-obj-y += pcie.o pcie_aer.o
 hw-obj-y += msix.o msi.o
 
 # PCI network cards
diff --git a/hw/pci.h b/hw/pci.h
index 19e85f5..73bf901 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -401,6 +401,13 @@ static inline uint8_t pci_pcie_cap(const PCIDevice *d)
     return pci_is_express(d) ? d->exp.exp_cap : 0;
 }
 
+/* AER */
+static inline uint16_t pcie_aer_cap(const PCIDevice *d)
+{
+    assert(pci_is_express(d));
+    return d->exp.aer_cap;
+}
+
 /* These are not pci specific. Should move into a separate header.
  * Only pci.c uses them, so keep them here for now.
  */
diff --git a/hw/pcie.h b/hw/pcie.h
index 37713dc..febcbc2 100644
--- a/hw/pcie.h
+++ b/hw/pcie.h
@@ -23,6 +23,7 @@
 
 #include "hw.h"
 #include "pcie_regs.h"
+#include "pcie_aer.h"
 
 enum PCIExpressIndicator {
     /* for attention and power indicator */
@@ -52,6 +53,11 @@ struct PCIExpressDevice {
 
     /* FLR */
     pcie_flr_fn flr;
+
+    /* AER */
+    uint16_t aer_cap;
+    pcie_aer_errmsg_fn aer_errmsg;
+    PCIE_AERLog aer_log;
 };
 
 void pcie_notify(PCIDevice *dev, uint16_t vector, bool trigger, int level);
diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
new file mode 100644
index 0000000..9e3f48e
--- /dev/null
+++ b/hw/pcie_aer.c
@@ -0,0 +1,796 @@
+/*
+ * pcie_aer.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "sysemu.h"
+#include "pci_bridge.h"
+#include "pcie.h"
+#include "msix.h"
+#include "msi.h"
+#include "pci_internals.h"
+#include "pcie_regs.h"
+
+//#define DEBUG_PCIE
+#ifdef DEBUG_PCIE
+# define PCIE_DPRINTF(fmt, ...)                                         \
+    fprintf(stderr, "%s:%d " fmt, __func__, __LINE__, ## __VA_ARGS__)
+#else
+# define PCIE_DPRINTF(fmt, ...) do {} while (0)
+#endif
+#define PCIE_DEV_PRINTF(dev, fmt, ...)                                  \
+    PCIE_DPRINTF("%s:%x "fmt, (dev)->name, (dev)->devfn, ## __VA_ARGS__)
+
+static void pcie_aer_clear_error(PCIDevice *dev);
+static uint8_t pcie_aer_root_get_vector(PCIDevice *dev);
+static AER_ERR_MSG_RESULT
+pcie_aer_errmsg_alldev(PCIDevice *dev, const PCIE_AERErrMsg *msg);
+static AER_ERR_MSG_RESULT
+pcie_aer_errmsg_vbridge(PCIDevice *dev, const PCIE_AERErrMsg *msg);
+
+/* From 6.2.7 Error Listing and Rules. Table 6-2, 6-3 and 6-4 */
+static enum PCIE_AER_SEVERITY pcie_aer_uncor_default_severity(uint32_t status)
+{
+    switch (status) {
+    case PCI_ERR_UNC_INTN:
+    case PCI_ERR_UNC_DLP:
+    case PCI_ERR_UNC_SDN:
+    case PCI_ERR_UNC_RX_OVER:
+    case PCI_ERR_UNC_FCP:
+    case PCI_ERR_UNC_MALF_TLP:
+        return AER_ERR_FATAL;
+    case PCI_ERR_UNC_POISON_TLP:
+    case PCI_ERR_UNC_ECRC:
+    case PCI_ERR_UNC_UNSUP:
+    case PCI_ERR_UNC_COMP_TIME:
+    case PCI_ERR_UNC_COMP_ABORT:
+    case PCI_ERR_UNC_UNX_COMP:
+    case PCI_ERR_UNC_ACSV:
+    case PCI_ERR_UNC_MCBTLP:
+    case PCI_ERR_UNC_ATOP_EBLOCKED:
+    case PCI_ERR_UNC_TLP_PRF_BLOCKED:
+        return AER_ERR_NONFATAL;
+    default:
+        break;
+    }
+    abort();
+    return AER_ERR_FATAL;
+}
+
+static uint32_t pcie_aer_log_next(uint32_t i, uint32_t max)
+{
+    return (i + 1) % max;
+}
+
+static bool pcie_aer_log_empty_index(uint32_t producer, uint32_t consumer)
+{
+    return producer == consumer;
+}
+
+static bool pcie_aer_log_empty(PCIE_AERLog *aer_log)
+{
+    return pcie_aer_log_empty_index(aer_log->producer, aer_log->consumer);
+}
+
+static bool pcie_aer_log_full(PCIE_AERLog *aer_log)
+{
+    return pcie_aer_log_next(aer_log->producer, aer_log->log_max) ==
+        aer_log->consumer;
+}
+
+static uint32_t pcie_aer_log_add(PCIE_AERLog *aer_log)
+{
+    uint32_t i = aer_log->producer;
+    aer_log->producer = pcie_aer_log_next(aer_log->producer, aer_log->log_max);
+    return i;
+}
+
+static uint32_t pcie_aer_log_del(PCIE_AERLog *aer_log)
+{
+    uint32_t i = aer_log->consumer;
+    aer_log->consumer = pcie_aer_log_next(aer_log->consumer, aer_log->log_max);
+    return i;
+}
+
+static int pcie_aer_log_add_err(PCIE_AERLog *aer_log, const PCIE_AERErr *err)
+{
+    uint32_t i;
+    if (pcie_aer_log_full(aer_log)) {
+        return -1;
+    }
+    i = pcie_aer_log_add(aer_log);
+    memcpy(&aer_log->log[i], err, sizeof(*err));
+    return 0;
+}
+
+static const PCIE_AERErr* pcie_aer_log_del_err(PCIE_AERLog *aer_log)
+{
+    uint32_t i;
+    assert(!pcie_aer_log_empty(aer_log));
+    i = pcie_aer_log_del(aer_log);
+    return &aer_log->log[i];
+}
+
+static void pcie_aer_log_clear_all_err(PCIE_AERLog *aer_log)
+{
+    aer_log->producer = 0;
+    aer_log->consumer = 0;
+}
+
+void pcie_aer_init(PCIDevice *dev, uint16_t offset)
+{
+    PCIExpressDevice *exp;
+
+    pci_set_word(dev->wmask + PCI_COMMAND,
+                 pci_get_word(dev->wmask + PCI_COMMAND) | PCI_COMMAND_SERR);
+    pci_set_word(dev->w1cmask + PCI_STATUS,
+                 pci_get_word(dev->w1cmask + PCI_STATUS) |
+                 PCI_STATUS_SIG_SYSTEM_ERROR);
+
+    pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, PCI_ERR_VER,
+                        offset, PCI_ERR_SIZEOF);
+    exp = &dev->exp;
+    exp->aer_cap = offset;
+    if (dev->exp.aer_log.log_max == PCIE_AER_LOG_MAX_UNSET) {
+        dev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
+    }
+    if (dev->exp.aer_log.log_max > PCIE_AER_LOG_MAX_MAX) {
+        dev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_MAX;
+    }
+    dev->exp.aer_log.log = qemu_mallocz(sizeof(dev->exp.aer_log.log[0]) *
+                                        dev->exp.aer_log.log_max);
+
+    /* On reset PCI_ERR_CAP_MHRE is disabled
+     * PCI_ERR_CAP_MHRE is RWS so that reset doesn't affect related
+     * registers
+     */
+    pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
+                 PCI_ERR_UNC_SUPPORTED);
+
+    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
+                 PCI_ERR_UNC_SUPPORTED);
+
+    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
+                 PCI_ERR_UNC_SEVERITY_DEFAULT);
+    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_SEVER,
+                 PCI_ERR_UNC_SUPPORTED);
+
+    pci_set_long(dev->w1cmask + offset + PCI_ERR_COR_STATUS,
+                 pci_get_long(dev->w1cmask + offset + PCI_ERR_COR_STATUS) |
+                 PCI_ERR_COR_STATUS);
+
+    pci_set_long(dev->config + offset + PCI_ERR_COR_MASK,
+                 PCI_ERR_COR_MASK_DEFAULT);
+    pci_set_long(dev->wmask + offset + PCI_ERR_COR_MASK,
+                 PCI_ERR_COR_SUPPORTED);
+
+    /* capabilities and control. multiple header logging is supported */
+    if (dev->exp.aer_log.log_max > 0) {
+        pci_set_long(dev->config + offset + PCI_ERR_CAP,
+                     PCI_ERR_CAP_ECRC_GENC | PCI_ERR_CAP_ECRC_CHKC |
+                     PCI_ERR_CAP_MHRC);
+        pci_set_long(dev->wmask + offset + PCI_ERR_CAP,
+                     PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE |
+                     PCI_ERR_CAP_MHRE);
+    } else {
+        pci_set_long(dev->config + offset + PCI_ERR_CAP,
+                     PCI_ERR_CAP_ECRC_GENC | PCI_ERR_CAP_ECRC_CHKC);
+        pci_set_long(dev->wmask + offset + PCI_ERR_CAP,
+                     PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE);
+    }
+
+    switch (pcie_cap_get_type(dev)) {
+    case PCI_EXP_TYPE_ROOT_PORT:
+        /* this case will be set by pcie_aer_root_init() */
+        /* fallthrough */
+    case PCI_EXP_TYPE_DOWNSTREAM:
+    case PCI_EXP_TYPE_UPSTREAM:
+        pci_set_word(dev->wmask + PCI_BRIDGE_CONTROL,
+                     pci_get_word(dev->wmask + PCI_BRIDGE_CONTROL) |
+                     PCI_BRIDGE_CTL_SERR);
+        pci_set_long(dev->w1cmask + PCI_STATUS,
+                     pci_get_long(dev->w1cmask + PCI_STATUS) |
+                     PCI_SEC_STATUS_RCV_SYSTEM_ERROR);
+        exp->aer_errmsg = pcie_aer_errmsg_vbridge;
+        break;
+    default:
+        exp->aer_errmsg = pcie_aer_errmsg_alldev;
+        break;
+    }
+}
+
+void pcie_aer_exit(PCIDevice *dev)
+{
+    pci_del_capability(dev, PCI_EXT_CAP_ID_ERR, PCI_ERR_SIZEOF);
+    qemu_free(dev->exp.aer_log.log);
+}
+
+/* Multiple Header recording isn't implemented. Is it wanted? */
+void pcie_aer_write_config(PCIDevice *dev,
+                           uint32_t addr, uint32_t val, int len,
+                           uint32_t uncorsta_prev)
+{
+    uint32_t pos = dev->exp.aer_cap;
+
+    /* uncorrectable */
+    if (ranges_overlap(addr, len, pos + PCI_ERR_UNCOR_STATUS, 4)) {
+        uint32_t written =
+            pci_shift_long(addr, val, pos + PCI_ERR_UNCOR_STATUS) &
+            PCI_ERR_UNC_SUPPORTED;
+        uint32_t errcap = pci_get_long(dev->config + pos + PCI_ERR_CAP);
+        uint32_t first_error = (1 << PCI_ERR_CAP_FEP(errcap));
+
+        if ((uncorsta_prev & first_error) && (written & first_error)) {
+            pcie_aer_clear_error(dev);
+        }
+    }
+
+    /* capability & control */
+    if (ranges_overlap(addr, len, pos + PCI_ERR_CAP, 4)) {
+        uint32_t err_cap = pci_get_long(dev->config + pos + PCI_ERR_CAP);
+        if (!(err_cap & PCI_ERR_CAP_MHRE)) {
+            pcie_aer_log_clear_all_err(&dev->exp.aer_log);
+            pci_set_long(dev->w1cmask + pos + PCI_ERR_UNCOR_STATUS,
+                         PCI_ERR_UNC_SUPPORTED);
+        } else {
+            /* When multiple header recording is enabled, only the bit that
+             * first error pointer indicates is cleared.
+             * that is handled specifically.
+             */
+            pci_set_long(dev->w1cmask + pos + PCI_ERR_UNCOR_STATUS, 0);
+        }
+    }
+}
+
+static inline void pcie_aer_errmsg(PCIDevice *dev, const PCIE_AERErrMsg *msg)
+{
+    assert(pci_is_express(dev));
+    assert(dev->exp.aer_errmsg);
+    dev->exp.aer_errmsg(dev, msg);
+}
+
+static AER_ERR_MSG_RESULT
+pcie_aer_errmsg_alldev(PCIDevice *dev, const PCIE_AERErrMsg *msg)
+{
+    uint16_t cmd = pci_get_word(dev->config + PCI_COMMAND);
+    bool transmit1 =
+        pcie_aer_err_msg_is_uncor(msg) && (cmd & PCI_COMMAND_SERR);
+    uint32_t pos = pci_pcie_cap(dev);
+    uint32_t devctl = pci_get_word(dev->config + pos + PCI_EXP_DEVCTL);
+    bool transmit2 = msg->severity & devctl;
+    PCIDevice *parent_port;
+
+    if (transmit1) {
+        if (pcie_aer_err_msg_is_uncor(msg)) {
+            /* Signaled System Error */
+            uint8_t *status = dev->config + PCI_STATUS;
+            pci_set_word(status,
+                         pci_get_word(status) | PCI_STATUS_SIG_SYSTEM_ERROR);
+        }
+    }
+
+    if (!(transmit1 || transmit2)) {
+        return AER_ERR_MSG_MASKED;
+    }
+
+    /* send up error message */
+    if (pci_is_express(dev) &&
+        pcie_cap_get_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
+        /* Root port notify system itself,
+           or send the error message to root complex event collector. */
+        /*
+         * if root port is associated to event collector, set
+         * parent_port = root complex event collector
+         * For now root complex event collector isn't supported.
+         */
+        parent_port = NULL;
+    } else {
+        parent_port = pci_bridge_get_device(dev->bus);
+    }
+    if (parent_port) {
+        if (!pci_is_express(parent_port)) {
+            /* What to do? */
+            return AER_ERR_MSG_MASKED;
+        }
+        pcie_aer_errmsg(parent_port, msg);
+    }
+    return AER_ERR_MSG_SENT;
+}
+
+static AER_ERR_MSG_RESULT
+pcie_aer_errmsg_vbridge(PCIDevice *dev, const PCIE_AERErrMsg *msg)
+{
+    uint16_t bridge_control = pci_get_word(dev->config + PCI_BRIDGE_CONTROL);
+
+    if (pcie_aer_err_msg_is_uncor(msg)) {
+        /* Received System Error */
+        uint8_t *sec_status = dev->config + PCI_SEC_STATUS;
+        pci_set_word(sec_status,
+                     pci_get_word(sec_status) |
+                     PCI_SEC_STATUS_RCV_SYSTEM_ERROR);
+    }
+
+    if (!(bridge_control & PCI_BRIDGE_CTL_SERR)) {
+        return AER_ERR_MSG_MASKED;
+    }
+    return pcie_aer_errmsg_alldev(dev, msg);
+}
+
+static AER_ERR_MSG_RESULT
+pcie_aer_errmsg_root_port(PCIDevice *dev, const PCIE_AERErrMsg *msg)
+{
+    AER_ERR_MSG_RESULT ret;
+    uint16_t cmd;
+    uint8_t *aer_cap;
+    uint32_t root_cmd;
+    uint32_t root_sta;
+    bool trigger;
+
+    ret = pcie_aer_errmsg_vbridge(dev, msg);
+    if (ret != AER_ERR_MSG_SENT) {
+        return ret;
+    }
+
+    ret = AER_ERR_MSG_MASKED;
+    cmd = pci_get_word(dev->config + PCI_COMMAND);
+    aer_cap = dev->config + pcie_aer_cap(dev);
+    root_cmd = pci_get_long(aer_cap + PCI_ERR_ROOT_COMMAND);
+    root_sta = pci_get_long(aer_cap + PCI_ERR_ROOT_STATUS);
+    trigger = false;
+
+    if (cmd & PCI_COMMAND_SERR) {
+        /* System Error. Platform Specific */
+        /* ret = AER_ERR_MSG_SENT; */
+    }
+
+    /* Errro Message Received: Root Error Status register */
+    switch (msg->severity) {
+    case AER_ERR_COR:
+        if (root_sta & PCI_ERR_ROOT_COR_RCV) {
+            root_sta |= PCI_ERR_ROOT_MULTI_COR_RCV;
+        } else {
+            if (root_cmd & PCI_ERR_ROOT_CMD_COR_EN) {
+                trigger = true;
+            }
+            pci_set_word(aer_cap + PCI_ERR_ROOT_COR_SRC, msg->source_id);
+        }
+        root_sta |= PCI_ERR_ROOT_COR_RCV;
+        break;
+    case AER_ERR_NONFATAL:
+        if (!(root_sta & PCI_ERR_ROOT_NONFATAL_RCV) &&
+            root_cmd & PCI_ERR_ROOT_CMD_NONFATAL_EN) {
+            trigger = true;
+        }
+        root_sta |= PCI_ERR_ROOT_NONFATAL_RCV;
+        break;
+    case AER_ERR_FATAL:
+        if (!(root_sta & PCI_ERR_ROOT_FATAL_RCV) &&
+            root_cmd & PCI_ERR_ROOT_CMD_FATAL_EN) {
+            trigger = true;
+        }
+        if (!(root_sta & PCI_ERR_ROOT_UNCOR_RCV)) {
+            root_sta |= PCI_ERR_ROOT_FIRST_FATAL;
+        }
+        root_sta |= PCI_ERR_ROOT_FATAL_RCV;
+        break;
+    }
+    if (pcie_aer_err_msg_is_uncor(msg)) {
+        if (root_sta & PCI_ERR_ROOT_UNCOR_RCV) {
+            root_sta |= PCI_ERR_ROOT_MULTI_UNCOR_RCV;
+        } else {
+            pci_set_word(aer_cap + PCI_ERR_ROOT_SRC, msg->source_id);
+        }
+        root_sta |= PCI_ERR_ROOT_UNCOR_RCV;
+    }
+    pci_set_long(aer_cap + PCI_ERR_ROOT_STATUS, root_sta);
+
+    if (root_cmd & msg->severity) {
+        /* Error Interrupt(INTx or MSI) */
+        pcie_notify(dev, pcie_aer_root_get_vector(dev), trigger, 1);
+        ret = AER_ERR_MSG_SENT;
+    }
+    return ret;
+}
+
+static void pcie_aer_update_log(PCIDevice *dev, const PCIE_AERErr *err)
+{
+    uint8_t *aer_cap = dev->config + pcie_aer_cap(dev);
+    uint8_t first_bit = ffsl(err->status) - 1;
+    uint32_t errcap = pci_get_long(aer_cap + PCI_ERR_CAP);
+    int i;
+    uint32_t dw;
+
+    errcap &= ~(PCI_ERR_CAP_FEP_MASK | PCI_ERR_CAP_TLP);
+    errcap |= PCI_ERR_CAP_FEP(first_bit);
+
+    if (err->flags & PCIE_AER_ERR_HEADER_VALID) {
+        for (i = 0; i < ARRAY_SIZE(err->header); ++i) {
+            /* 7.10.8 Header Log Register */
+            cpu_to_be32wu(&dw, err->header[i]);
+            memcpy(aer_cap + PCI_ERR_HEADER_LOG + sizeof(err->header[0]) * i,
+                   &dw, sizeof(dw));
+        }
+    } else {
+        assert(!(err->flags & PCIE_AER_ERR_TLP_PRESENT));
+        memset(aer_cap + PCI_ERR_HEADER_LOG, 0, sizeof(err->header));
+    }
+
+    if ((err->flags & PCIE_AER_ERR_TLP_PRESENT) &&
+        (pci_get_long(dev->config + pci_pcie_cap(dev) + PCI_EXP_DEVCTL2) &
+         PCI_EXP_DEVCAP2_EETLPP)) {
+        for (i = 0; i < ARRAY_SIZE(err->prefix); ++i) {
+            /* 7.10.12 tlp prefix log register */
+            cpu_to_be32wu(&dw, err->prefix[i]);
+            memcpy(aer_cap + PCI_ERR_TLP_PREFIX_LOG +
+                   sizeof(err->prefix[0]) * i, &dw, sizeof(dw));
+        }
+        errcap |= PCI_ERR_CAP_TLP;
+    } else {
+        memset(aer_cap + PCI_ERR_TLP_PREFIX_LOG, 0, sizeof(err->prefix));
+    }
+    pci_set_long(aer_cap + PCI_ERR_CAP, errcap);
+}
+
+static void pcie_aer_clear_log(PCIDevice *dev)
+{
+    PCIE_AERErr *err;
+    uint8_t *aer_cap = dev->config + pcie_aer_cap(dev);
+    uint32_t errcap = pci_get_long(aer_cap + PCI_ERR_CAP);
+
+    errcap &= ~(PCI_ERR_CAP_FEP_MASK | PCI_ERR_CAP_TLP);
+    pci_set_long(aer_cap + PCI_ERR_CAP, errcap);
+
+    memset(aer_cap + PCI_ERR_HEADER_LOG, 0, sizeof(err->header));
+    memset(aer_cap + PCI_ERR_TLP_PREFIX_LOG, 0, sizeof(err->prefix));
+}
+
+static int pcie_aer_record_error(PCIDevice *dev,
+                                 const PCIE_AERErr *err)
+{
+    uint8_t *aer_cap = dev->config + pcie_aer_cap(dev);
+    uint32_t errcap = pci_get_long(aer_cap + PCI_ERR_CAP);
+    int fep = PCI_ERR_CAP_FEP(errcap);
+
+    if (errcap & PCI_ERR_CAP_MHRE &&
+        (pci_get_long(aer_cap + PCI_ERR_UNCOR_STATUS) & (1ULL << fep))) {
+        /*  Not first error. queue error */
+        if (pcie_aer_log_add_err(&dev->exp.aer_log, err) < 0) {
+            /* overflow */
+            return -1;
+        }
+        return 0;
+    }
+
+    pcie_aer_update_log(dev, err);
+    return 0;
+}
+
+static void pcie_aer_clear_error(PCIDevice *dev)
+{
+    uint8_t *aer_cap = dev->config + pcie_aer_cap(dev);
+    uint32_t errcap = pci_get_long(aer_cap + PCI_ERR_CAP);
+    uint32_t old_err = (1UL << PCI_ERR_CAP_FEP(errcap));
+    PCIE_AERLog *aer_log = &dev->exp.aer_log;
+    const PCIE_AERErr *err;
+    uint32_t consumer;
+
+    if (!(errcap & PCI_ERR_CAP_MHRE) || pcie_aer_log_empty(aer_log)) {
+        pcie_aer_clear_log(dev);
+        pci_set_long(aer_cap + PCI_ERR_UNCOR_STATUS,
+                     pci_get_long(aer_cap + PCI_ERR_UNCOR_STATUS) & ~old_err);
+        return;
+    }
+
+    /* if no same error is queued, clear bit in uncorrectable error status */
+    for (consumer = dev->exp.aer_log.consumer;
+         !pcie_aer_log_empty_index(dev->exp.aer_log.producer, consumer);
+         consumer = pcie_aer_log_next(consumer, dev->exp.aer_log.log_max)) {
+        if (dev->exp.aer_log.log[consumer].status & old_err) {
+            old_err = 0;
+            break;
+        }
+    }
+    if (old_err) {
+        pci_set_long(aer_cap + PCI_ERR_UNCOR_STATUS,
+                     pci_get_long(aer_cap + PCI_ERR_UNCOR_STATUS) & ~old_err);
+    }
+
+    err = pcie_aer_log_del_err(aer_log);
+    pcie_aer_update_log(dev, err);
+}
+
+/*
+ * non-Function specific error must be recorded in all functions.
+ * It is the responsibility of the caller of this function.
+ * It is also caller's responsiblity to determine which function should
+ * report the rerror.
+ *
+ * 6.2.4 Error Logging
+ * 6.2.5 Sqeucne of Device Error Signaling and Logging Operations
+ * table 6-2: Flowchard Showing Sequence of Device Error Signaling and Logging
+ *            Operations
+ *
+ * Although this implementation can be shortened/optimized, this is kept
+ * parallel to table 6-2.
+ */
+void pcie_aer_inject_error(PCIDevice *dev, const PCIE_AERErr *err)
+{
+    uint8_t *exp_cap;
+    uint8_t *aer_cap = NULL;
+    uint32_t devctl = 0;
+    uint32_t devsta = 0;
+    uint32_t status = err->status;
+    uint32_t mask;
+    bool is_unsupported_request =
+        (!(err->flags & PCIE_AER_ERR_IS_CORRECTABLE) &&
+         err->status == PCI_ERR_UNC_UNSUP);
+    bool is_advisory_nonfatal = false;  /* for advisory non-fatal error */
+    uint32_t uncor_status = 0;          /* for advisory non-fatal error */
+    PCIE_AERErrMsg msg;
+    int is_header_log_overflowed = 0;
+
+    if (!pci_is_express(dev)) {
+        /* What to do? */
+        return;
+    }
+
+    if (err->flags & PCIE_AER_ERR_IS_CORRECTABLE) {
+        status &= PCI_ERR_COR_SUPPORTED;
+    } else {
+        status &= PCI_ERR_UNC_SUPPORTED;
+    }
+    if (!status || status & (status - 1)) {
+        /* invalid status bit. one and only one bit must be set */
+        return;
+    }
+
+    exp_cap = dev->config + pci_pcie_cap(dev);
+    if (dev->exp.aer_cap) {
+        aer_cap = dev->config + pcie_aer_cap(dev);
+        devctl = pci_get_long(exp_cap + PCI_EXP_DEVCTL);
+        devsta = pci_get_long(exp_cap + PCI_EXP_DEVSTA);
+    }
+    if (err->flags & PCIE_AER_ERR_IS_CORRECTABLE) {
+    correctable_error:
+        devsta |= PCI_EXP_DEVSTA_CED;
+        if (is_unsupported_request) {
+            devsta |= PCI_EXP_DEVSTA_URD;
+        }
+        pci_set_word(exp_cap + PCI_EXP_DEVSTA, devsta);
+
+        if (aer_cap) {
+            pci_set_long(aer_cap + PCI_ERR_COR_STATUS,
+                         pci_get_long(aer_cap + PCI_ERR_COR_STATUS) | status);
+            mask = pci_get_long(aer_cap + PCI_ERR_COR_MASK);
+            if (mask & status) {
+                return;
+            }
+            if (is_advisory_nonfatal) {
+                uint32_t uncor_mask =
+                    pci_get_long(aer_cap + PCI_ERR_UNCOR_MASK);
+                if (!(uncor_mask & uncor_status)) {
+                    is_header_log_overflowed = pcie_aer_record_error(dev, err);
+                }
+                pci_set_long(aer_cap + PCI_ERR_UNCOR_STATUS,
+                             pci_get_long(aer_cap + PCI_ERR_UNCOR_STATUS) |
+                             uncor_status);
+            }
+        }
+
+        if (is_unsupported_request && !(devctl & PCI_EXP_DEVCTL_URRE)) {
+            return;
+        }
+        if (!(devctl & PCI_EXP_DEVCTL_CERE)) {
+            return;
+        }
+        msg.severity = AER_ERR_COR;
+    } else {
+        bool is_fatal =
+            (pcie_aer_uncor_default_severity(status) == AER_ERR_FATAL);
+        uint16_t cmd;
+
+        if (aer_cap) {
+            is_fatal = status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
+        }
+        if (!is_fatal && (err->flags & PCIE_AER_ERR_MAYBE_ADVISORY)) {
+            is_advisory_nonfatal = true;
+            uncor_status = status;
+            status = PCI_ERR_COR_ADV_NONFATAL;
+            goto correctable_error;
+        }
+        if (is_fatal) {
+            devsta |= PCI_EXP_DEVSTA_FED;
+        } else {
+            devsta |= PCI_EXP_DEVSTA_NFED;
+        }
+        if (is_unsupported_request) {
+            devsta |= PCI_EXP_DEVSTA_URD;
+        }
+        pci_set_long(exp_cap + PCI_EXP_DEVSTA, devsta);
+
+        if (aer_cap) {
+            mask = pci_get_long(aer_cap + PCI_ERR_UNCOR_MASK);
+            if (mask & status) {
+                pci_set_long(aer_cap + PCI_ERR_UNCOR_STATUS,
+                             pci_get_long(aer_cap + PCI_ERR_UNCOR_STATUS) |
+                             status);
+                return;
+            }
+
+            is_header_log_overflowed = pcie_aer_record_error(dev, err);
+            pci_set_long(aer_cap + PCI_ERR_UNCOR_STATUS,
+                         pci_get_long(aer_cap + PCI_ERR_UNCOR_STATUS) |
+                         status);
+        }
+
+        cmd = pci_get_word(dev->config + PCI_COMMAND);
+        if (is_unsupported_request &&
+            !(devctl & PCI_EXP_DEVCTL_URRE) && !(cmd & PCI_COMMAND_SERR)) {
+            return;
+        }
+        if (is_fatal) {
+            if (!((cmd & PCI_COMMAND_SERR) ||
+                  (devctl & PCI_EXP_DEVCTL_FERE))) {
+                return;
+            }
+            msg.severity = AER_ERR_FATAL;
+        } else {
+            if (!((cmd & PCI_COMMAND_SERR) ||
+                  (devctl & PCI_EXP_DEVCTL_NFERE))) {
+                return;
+            }
+            msg.severity = AER_ERR_NONFATAL;
+        }
+    }
+
+    /* send up error message */
+    msg.source_id = err->source_id;
+    pcie_aer_errmsg(dev, &msg);
+
+    if (is_header_log_overflowed) {
+        PCIE_AERErr header_log_overflow = {
+            .status = PCI_ERR_COR_HL_OVERFLOW,
+            .flags = PCIE_AER_ERR_IS_CORRECTABLE,
+            .header = {0, 0, 0, 0},
+            .prefix = {0, 0, 0, 0},
+        };
+        pcie_aer_inject_error(dev, &header_log_overflow);
+    }
+}
+
+void pcie_aer_root_set_vector(PCIDevice *dev, uint8_t vector)
+{
+    uint8_t *aer_cap = dev->config + pcie_aer_cap(dev);
+    uint32_t root_status = pci_get_long(aer_cap + PCI_ERR_ROOT_STATUS);
+    root_status &= ~PCI_ERR_ROOT_IRQ;
+    root_status |=
+        (((uint32_t)vector) << PCI_ERR_ROOT_IRQ_SHIFT) & PCI_ERR_ROOT_IRQ;
+    pci_set_long(aer_cap + PCI_ERR_ROOT_STATUS, root_status);
+}
+
+static uint8_t pcie_aer_root_get_vector(PCIDevice *dev)
+{
+    uint8_t *aer_cap = dev->config + pcie_aer_cap(dev);
+    uint32_t root_status = pci_get_long(aer_cap + PCI_ERR_ROOT_STATUS);
+    return (root_status & PCI_ERR_ROOT_IRQ) >> PCI_ERR_ROOT_IRQ_SHIFT;
+}
+
+void pcie_aer_root_init(PCIDevice *dev)
+{
+    uint16_t pos = pcie_aer_cap(dev);
+
+    pci_set_long(dev->wmask + pos + PCI_ERR_ROOT_COMMAND,
+                 PCI_ERR_ROOT_CMD_EN_MASK);
+    pci_set_long(dev->w1cmask + pos + PCI_ERR_ROOT_STATUS,
+                 PCI_ERR_ROOT_STATUS_REPORT_MASK);
+    dev->exp.aer_errmsg = pcie_aer_errmsg_root_port;
+}
+
+void pcie_aer_root_reset(PCIDevice *dev)
+{
+    uint8_t* aer_cap = dev->config + pcie_aer_cap(dev);
+
+    pci_set_long(aer_cap + PCI_ERR_ROOT_COMMAND, 0);
+
+    /*
+     * Advanced Error Interrupt Message Number in Root Error Status Register
+     * must be updated by chip dependent code.
+     */
+}
+
+static bool pcie_aer_root_does_trigger(uint32_t cmd, uint32_t sta)
+{
+    return
+        ((cmd & PCI_ERR_ROOT_CMD_COR_EN) && (sta & PCI_ERR_ROOT_COR_RCV)) ||
+        ((cmd & PCI_ERR_ROOT_CMD_NONFATAL_EN) &&
+         (sta & PCI_ERR_ROOT_NONFATAL_RCV)) ||
+        ((cmd & PCI_ERR_ROOT_CMD_FATAL_EN) && (sta & PCI_ERR_ROOT_FATAL_RCV));
+}
+
+void pcie_aer_root_write_config(PCIDevice *dev,
+                                uint32_t addr, uint32_t val, int len,
+                                uint32_t root_cmd_prev)
+{
+    uint16_t pos = pcie_aer_cap(dev);
+    uint8_t *aer_cap = dev->config + pos;
+    uint32_t root_status;
+
+    /* root command */
+    if (ranges_overlap(addr, len, pos + PCI_ERR_ROOT_COMMAND, 4)) {
+        uint32_t root_cmd = pci_get_long(aer_cap + PCI_ERR_ROOT_COMMAND);
+        if (root_cmd & PCI_ERR_ROOT_CMD_EN_MASK) {
+            bool trigger;
+            int level;
+            uint32_t root_cmd_set = (root_cmd_prev ^ root_cmd) & root_cmd;
+
+            /* 0 -> 1 */
+            root_status = pci_get_long(aer_cap + PCI_ERR_ROOT_STATUS);
+            if (pcie_aer_root_does_trigger(root_cmd_set, root_status)) {
+                trigger = true;
+            } else {
+                trigger = false;
+            }
+            if (pcie_aer_root_does_trigger(root_cmd, root_status)) {
+                level = 1;
+            } else {
+                level = 0;
+            }
+            pcie_notify(dev, pcie_aer_root_get_vector(dev), trigger, level);
+        }
+    }
+}
+
+static const VMStateDescription vmstate_pcie_aer_err = {
+    .name = "PCIE_AER_ERROR",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields     = (VMStateField[]) {
+        VMSTATE_UINT32(status, PCIE_AERErr),
+        VMSTATE_UINT16(source_id, PCIE_AERErr),
+        VMSTATE_UINT16(flags, PCIE_AERErr),
+        VMSTATE_UINT32_ARRAY(header, PCIE_AERErr, 4),
+        VMSTATE_UINT32_ARRAY(prefix, PCIE_AERErr, 4),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+#define VMSTATE_PCIE_AER_ERRS(_field, _state, _field_num, _vmsd, _type) { \
+    .name       = (stringify(_field)),                                    \
+    .version_id = 0,                                                      \
+    .num_offset = vmstate_offset_value(_state, _field_num, uint16_t),     \
+    .size       = sizeof(_type),                                          \
+    .vmsd       = &(_vmsd),                                               \
+    .flags      = VMS_POINTER | VMS_VARRAY_UINT16 | VMS_STRUCT,           \
+    .offset     = vmstate_offset_pointer(_state, _field, _type),          \
+}
+
+const VMStateDescription vmstate_pcie_aer_log = {
+    .name = "PCIE_AER_ERROR_LOG",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields     = (VMStateField[]) {
+        VMSTATE_UINT32(producer, PCIE_AERLog),
+        VMSTATE_UINT32(consumer, PCIE_AERLog),
+        VMSTATE_UINT16(log_max, PCIE_AERLog),
+        VMSTATE_PCIE_AER_ERRS(log, PCIE_AERLog, log_max,
+                              vmstate_pcie_aer_err, PCIE_AERErr),
+        VMSTATE_END_OF_LIST()
+    }
+};
diff --git a/hw/pcie_aer.h b/hw/pcie_aer.h
new file mode 100644
index 0000000..5a72bee
--- /dev/null
+++ b/hw/pcie_aer.h
@@ -0,0 +1,105 @@
+/*
+ * pcie_aer.h
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_PCIE_AER_H
+#define QEMU_PCIE_AER_H
+
+#include "hw.h"
+
+/* definitions which PCIExpressDevice uses */
+enum AER_ERR_MSG_RESULT {
+    AER_ERR_MSG_MASKED,
+    AER_ERR_MSG_SENT,
+};
+typedef enum AER_ERR_MSG_RESULT AER_ERR_MSG_RESULT;
+typedef AER_ERR_MSG_RESULT (*pcie_aer_errmsg_fn)(PCIDevice *dev, const PCIE_AERErrMsg *msg);
+
+/* AER log */
+struct PCIE_AERLog {
+    uint32_t producer;
+    uint32_t consumer;
+
+#define PCIE_AER_LOG_MAX_DEFAULT        8
+#define PCIE_AER_LOG_MAX_MAX            128 /* what is appropriate? */
+#define PCIE_AER_LOG_MAX_UNSET          (~(uint16_t)0)
+    uint16_t log_max;
+
+    PCIE_AERErr *log;
+};
+
+/* aer error severity */
+enum PCIE_AER_SEVERITY {
+    /* those value are same as
+     * Root error command register in aer extended cap and
+     * root control register in pci express cap.
+     */
+    AER_ERR_COR         = 0x1,
+    AER_ERR_NONFATAL    = 0x2,
+    AER_ERR_FATAL       = 0x4,
+};
+
+/* aer error message: error signaling message has only error sevirity and
+   source id. See 2.2.8.3 error signaling messages */
+struct PCIE_AERErrMsg {
+    enum PCIE_AER_SEVERITY severity;
+    uint16_t source_id; /* bdf */
+};
+
+static inline bool
+pcie_aer_err_msg_is_uncor(const PCIE_AERErrMsg *msg)
+{
+    return msg->severity == AER_ERR_NONFATAL || msg->severity == AER_ERR_FATAL;
+}
+
+/* error */
+struct PCIE_AERErr {
+    uint32_t status;    /* error status bits */
+    uint16_t source_id; /* bdf */
+
+#define PCIE_AER_ERR_IS_CORRECTABLE     0x1     /* correctable/uncorrectable */
+#define PCIE_AER_ERR_MAYBE_ADVISORY     0x2     /* maybe advisory non-fatal */
+#define PCIE_AER_ERR_HEADER_VALID       0x4     /* TLP header is logged */
+#define PCIE_AER_ERR_TLP_PRESENT        0x8     /* TLP Prefix is logged */
+    uint16_t flags;
+
+    uint32_t header[4]; /* TLP header */
+    uint32_t prefix[4]; /* TLP header prefix */
+};
+
+extern const VMStateDescription vmstate_pcie_aer_log;
+
+void pcie_aer_init(PCIDevice *dev, uint16_t offset);
+void pcie_aer_exit(PCIDevice *dev);
+void pcie_aer_write_config(PCIDevice *dev,
+                           uint32_t addr, uint32_t val, int len,
+                           uint32_t uncorsta_prev);
+
+/* aer root port */
+void pcie_aer_root_set_vector(PCIDevice *dev, uint8_t vector);
+void pcie_aer_root_init(PCIDevice *dev);
+void pcie_aer_root_reset(PCIDevice *dev);
+void pcie_aer_root_write_config(PCIDevice *dev,
+                                uint32_t addr, uint32_t val, int len,
+                                uint32_t root_cmd_prev);
+
+/* error injection */
+void pcie_aer_inject_error(PCIDevice *dev, const PCIE_AERErr *err);
+
+#endif /* QEMU_PCIE_AER_H */
diff --git a/qemu-common.h b/qemu-common.h
index 6d9ee26..fee772e 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -221,6 +221,9 @@ typedef struct PCIBus PCIBus;
 typedef struct PCIDevice PCIDevice;
 typedef struct PCIExpressDevice PCIExpressDevice;
 typedef struct PCIBridge PCIBridge;
+typedef struct PCIE_AERErrMsg PCIE_AERErrMsg;
+typedef struct PCIE_AERLog PCIE_AERLog;
+typedef struct PCIE_AERErr PCIE_AERErr;
 typedef struct SerialState SerialState;
 typedef struct IRQState *qemu_irq;
 typedef struct PCMCIACardState PCMCIACardState;
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 07/13] pcie port: define struct PCIEPort/PCIESlot and helper functions
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
                   ` (5 preceding siblings ...)
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 06/13] pcie/aer: helper functions for pcie aer capability Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 08/13] pcie root port: implement pcie root port Isaku Yamahata
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

define struct PCIEPort which represents common part
of pci express port.(root, upstream and downstream.)
add a helper function for pcie port which can be used commonly by
root/upstream/downstream port.
define struct PCIESlot which represents common part of
pcie slot.(root and downstream.) and helper functions for it.
helper functions for chassis, slot -> PCIESlot conversion.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v2 -> v3:
- static'fy chassis.
- compilation adjustment.
---
 Makefile.objs  |    2 +-
 hw/pcie_port.c |  106 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pcie_port.h |   51 +++++++++++++++++++++++++++
 qemu-common.h  |    2 +
 4 files changed, 160 insertions(+), 1 deletions(-)
 create mode 100644 hw/pcie_port.c
 create mode 100644 hw/pcie_port.h

diff --git a/Makefile.objs b/Makefile.objs
index 68bcc48..6c3b84a 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
 # PCI watchdog devices
 hw-obj-y += wdt_i6300esb.o
 
-hw-obj-y += pcie.o pcie_aer.o
+hw-obj-y += pcie.o pcie_aer.o pcie_port.o
 hw-obj-y += msix.o msi.o
 
 # PCI network cards
diff --git a/hw/pcie_port.c b/hw/pcie_port.c
new file mode 100644
index 0000000..e7c3cef
--- /dev/null
+++ b/hw/pcie_port.c
@@ -0,0 +1,106 @@
+/*
+ * pcie_port.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "pcie_port.h"
+
+void pcie_port_init_reg(PCIDevice *d)
+{
+    /* Unlike pci bridge,
+       66MHz and fast back to back don't apply to pci express port. */
+    pci_set_word(d->config + PCI_STATUS, 0);
+    pci_set_word(d->config + PCI_SEC_STATUS, 0);
+}
+
+/**************************************************************************
+ * (chassis number, pcie physical slot number) -> pcie slot conversion
+ */
+struct PCIEChassis {
+    uint8_t     number;
+
+    QLIST_HEAD(, PCIESlot) slots;
+    QLIST_ENTRY(PCIEChassis) next;
+};
+
+static QLIST_HEAD(, PCIEChassis) chassis = QLIST_HEAD_INITIALIZER(chassis);
+
+static struct PCIEChassis *pcie_chassis_find(uint8_t chassis_number)
+{
+    struct PCIEChassis *c;
+    QLIST_FOREACH(c, &chassis, next) {
+        if (c->number == chassis_number) {
+            break;
+        }
+    }
+    return c;
+}
+
+void pcie_chassis_create(uint8_t chassis_number)
+{
+    struct PCIEChassis *c;
+    c = pcie_chassis_find(chassis_number);
+    if (c) {
+        return;
+    }
+    c = qemu_mallocz(sizeof(*c));
+    c->number = chassis_number;
+    QLIST_INIT(&c->slots);
+    QLIST_INSERT_HEAD(&chassis, c, next);
+}
+
+static PCIESlot *pcie_chassis_find_slot_with_chassis(struct PCIEChassis *c,
+                                                     uint8_t slot)
+{
+    PCIESlot *s;
+    QLIST_FOREACH(s, &c->slots, next) {
+        if (s->slot == slot) {
+            break;
+        }
+    }
+    return s;
+}
+
+PCIESlot *pcie_chassis_find_slot(uint8_t chassis_number, uint16_t slot)
+{
+    struct PCIEChassis *c;
+    c = pcie_chassis_find(chassis_number);
+    if (!c) {
+        return NULL;
+    }
+    return pcie_chassis_find_slot_with_chassis(c, slot);
+}
+
+int pcie_chassis_add_slot(struct PCIESlot *slot)
+{
+    struct PCIEChassis *c;
+    c = pcie_chassis_find(slot->chassis);
+    if (!c) {
+        return -ENODEV;
+    }
+    if (pcie_chassis_find_slot_with_chassis(c, slot->slot)) {
+        return -EBUSY;
+    }
+    QLIST_INSERT_HEAD(&c->slots, slot, next);
+    return 0;
+}
+
+void pcie_chassis_del_slot(PCIESlot *s)
+{
+    QLIST_REMOVE(s, next);
+}
diff --git a/hw/pcie_port.h b/hw/pcie_port.h
new file mode 100644
index 0000000..3709583
--- /dev/null
+++ b/hw/pcie_port.h
@@ -0,0 +1,51 @@
+/*
+ * pcie_port.h
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_PCIE_PORT_H
+#define QEMU_PCIE_PORT_H
+
+#include "pci_bridge.h"
+#include "pci_internals.h"
+
+struct PCIEPort {
+    PCIBridge   br;
+
+    /* pci express switch port */
+    uint8_t     port;
+};
+
+void pcie_port_init_reg(PCIDevice *d);
+
+struct PCIESlot {
+    PCIEPort    port;
+
+    /* pci express switch port with slot */
+    uint8_t     chassis;
+    uint16_t    slot;
+    QLIST_ENTRY(PCIESlot) next;
+};
+
+void pcie_chassis_create(uint8_t chassis_number);
+void pcie_main_chassis_create(void);
+PCIESlot *pcie_chassis_find_slot(uint8_t chassis, uint16_t slot);
+int pcie_chassis_add_slot(struct PCIESlot *slot);
+void pcie_chassis_del_slot(PCIESlot *s);
+
+#endif /* QEMU_PCIE_PORT_H */
diff --git a/qemu-common.h b/qemu-common.h
index fee772e..e3c4acb 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -224,6 +224,8 @@ typedef struct PCIBridge PCIBridge;
 typedef struct PCIE_AERErrMsg PCIE_AERErrMsg;
 typedef struct PCIE_AERLog PCIE_AERLog;
 typedef struct PCIE_AERErr PCIE_AERErr;
+typedef struct PCIEPort PCIEPort;
+typedef struct PCIESlot PCIESlot;
 typedef struct SerialState SerialState;
 typedef struct IRQState *qemu_irq;
 typedef struct PCMCIACardState PCMCIACardState;
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 08/13] pcie root port: implement pcie root port.
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
                   ` (6 preceding siblings ...)
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 07/13] pcie port: define struct PCIEPort/PCIESlot and helper functions Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-22 11:25   ` [Qemu-devel] " Michael S. Tsirkin
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 09/13] pcie upstream port: pci express switch upstream port Isaku Yamahata
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

pcie root port.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v2 -> v3:
- compilation adjustment.
---
 Makefile.objs  |    2 +-
 hw/pcie_root.c |  240 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pcie_root.h |   32 ++++++++
 3 files changed, 273 insertions(+), 1 deletions(-)
 create mode 100644 hw/pcie_root.c
 create mode 100644 hw/pcie_root.h

diff --git a/Makefile.objs b/Makefile.objs
index 6c3b84a..7e81b57 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
 # PCI watchdog devices
 hw-obj-y += wdt_i6300esb.o
 
-hw-obj-y += pcie.o pcie_aer.o pcie_port.o
+hw-obj-y += pcie.o pcie_aer.o pcie_port.o pcie_root.o
 hw-obj-y += msix.o msi.o
 
 # PCI network cards
diff --git a/hw/pcie_root.c b/hw/pcie_root.c
new file mode 100644
index 0000000..9255bed
--- /dev/null
+++ b/hw/pcie_root.c
@@ -0,0 +1,240 @@
+/*
+ * pcie_root.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "pci_ids.h"
+#include "msi.h"
+#include "pcie.h"
+#include "pcie_root.h"
+
+/* For now, Intel X58 IOH corporate deice exporess* root port.
+   need to get its own id? */
+#define PCI_DEVICE_ID_IOH_EPORT         0x3420  /* D0:F0 express mode */
+#define PCI_DEVICE_ID_IOH_REV           0x2
+#define IOH_EP_SSVID_OFFSET             0x40
+#define IOH_EP_SSVID_SVID               PCI_VENDOR_ID_INTEL
+#define IOH_EP_SSVID_SSID               0
+#define IOH_EP_MSI_OFFSET               0x60
+#define IOH_EP_MSI_SUPPORTED_FLAGS      PCI_MSI_FLAGS_MASKBIT
+#define IOH_EP_MSI_NR_VECTOR            2
+#define IOH_EP_EXP_OFFSET               0x90
+#define IOH_EP_AER_OFFSET               0x100
+
+#define PCIE_ROOT_VID                   PCI_VENDOR_ID_INTEL
+#define PCIE_ROOT_DID                   PCI_DEVICE_ID_IOH_EPORT
+#define PCIE_ROOT_REV                   PCI_DEVICE_ID_IOH_REV
+#define PCIE_ROOT_SSVID_OFFSET          IOH_EP_SSVID_OFFSET
+#define PCIE_ROOT_SVID                  IOH_EP_SSVID_SVID
+#define PCIE_ROOT_SSID                  IOH_EP_SSVID_SSID
+#define PCIE_ROOT_MSI_SUPPORTED_FLAGS   IOH_EP_MSI_SUPPORTED_FLAGS
+#define PCIE_ROOT_MSI_NR_VECTOR         IOH_EP_MSI_NR_VECTOR
+#define PCIE_ROOT_MSI_OFFSET            IOH_EP_MSI_OFFSET
+#define PCIE_ROOT_EXP_OFFSET            IOH_EP_EXP_OFFSET
+#define PCIE_ROOT_AER_OFFSET            IOH_EP_AER_OFFSET
+
+/*
+ * If two MSI vector are allocated, Advanced Error Interrupt Message Number
+ * is 1. otherwise 0.
+ * 17.12.5.10 RPERRSTS,  32:27 bit Advanced Error Interrupt Message Number.
+ */
+static uint8_t pcie_root_aer_vector(const PCIDevice *d)
+{
+    switch (msi_nr_vectors_allocated(d)) {
+    case 1:
+        return 0;
+    case 2:
+        return 1;
+    case 4:
+    case 8:
+    case 16:
+    case 32:
+    default:
+        break;
+    }
+    abort();
+    return 0;
+}
+
+static void pcie_root_aer_vector_update(PCIDevice *d)
+{
+    pcie_aer_root_set_vector(d, pcie_root_aer_vector(d));
+}
+
+static void pcie_root_write_config(PCIDevice *d,
+                                   uint32_t address, uint32_t val, int len)
+{
+    uint16_t sltctl =
+        pci_get_word(d->config + pci_pcie_cap(d) + PCI_EXP_SLTCTL);
+    uint32_t uncorsta =
+        pci_get_long(d->config + pcie_aer_cap(d) + PCI_ERR_UNCOR_STATUS);
+    uint32_t root_cmd =
+        pci_get_long(d->config + pcie_aer_cap(d) + PCI_ERR_ROOT_COMMAND);
+
+    pci_bridge_write_config(d, address, val, len);
+    msi_write_config(d, address, val, len);
+    pcie_root_aer_vector_update(d);
+    pcie_cap_slot_write_config(d, address, val, len, sltctl);
+    pcie_aer_write_config(d, address, val, len, uncorsta);
+    pcie_aer_root_write_config(d, address, val, len, root_cmd);
+}
+
+static void pcie_root_reset(DeviceState *qdev)
+{
+    PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
+    msi_reset(d);
+    pcie_root_aer_vector_update(d);
+    pcie_cap_root_reset(d);
+    pcie_cap_deverr_reset(d);
+    pcie_cap_slot_reset(d);
+    pcie_aer_root_reset(d);
+    pci_bridge_reset(qdev);
+}
+
+static int pcie_root_initfn(PCIDevice *d)
+{
+    PCIBridge* br = DO_UPCAST(PCIBridge, dev, d);
+    PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
+    PCIESlot *s = DO_UPCAST(PCIESlot, port, p);
+    int rc;
+
+    rc = pci_bridge_initfn(d);
+    if (rc < 0) {
+        return rc;
+    }
+
+    d->config[PCI_REVISION_ID] = PCIE_ROOT_REV;
+    pcie_port_init_reg(d);
+
+    pci_config_set_vendor_id(d->config, PCIE_ROOT_VID);
+    pci_config_set_device_id(d->config, PCIE_ROOT_DID);
+
+    rc = pci_bridge_ssvid_init(d, PCIE_ROOT_SSVID_OFFSET,
+                               PCIE_ROOT_SVID, PCIE_ROOT_SSID);
+    if (rc < 0) {
+        return rc;
+    }
+    rc = msi_init(d, PCIE_ROOT_MSI_OFFSET, PCIE_ROOT_MSI_NR_VECTOR,
+                  PCIE_ROOT_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
+                  PCIE_ROOT_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
+    if (rc < 0) {
+        return rc;
+    }
+    rc = pcie_cap_init(d, PCIE_ROOT_EXP_OFFSET, PCI_EXP_TYPE_ROOT_PORT,
+                       p->port);
+    if (rc < 0) {
+        return rc;
+    }
+    pcie_cap_deverr_init(d);
+    pcie_cap_slot_init(d, s->slot);
+    pcie_chassis_create(s->chassis);
+    rc = pcie_chassis_add_slot(s);
+    if (rc < 0) {
+        return rc;
+    }
+    pcie_cap_root_init(d);
+    pcie_aer_init(d, PCIE_ROOT_AER_OFFSET);
+    pcie_aer_root_init(d);
+    pcie_root_aer_vector_update(d);
+    return 0;
+}
+
+static int pcie_root_exitfn(PCIDevice *d)
+{
+    pcie_aer_exit(d);
+    msi_uninit(d);
+    pcie_cap_exit(d);
+    return pci_bridge_exitfn(d);
+}
+
+PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
+                         const char *bus_name, pci_map_irq_fn map_irq,
+                         uint8_t port, uint8_t chassis, uint16_t slot)
+{
+    PCIDevice *d;
+    PCIBridge *br;
+    DeviceState *qdev;
+
+    d = pci_create_multifunction(bus, devfn, multifunction, PCIE_ROOT_PORT);
+    if (!d) {
+        return NULL;
+    }
+    br = DO_UPCAST(PCIBridge, dev, d);
+
+    qdev = &br->dev.qdev;
+    pci_bridge_map_irq(br, bus_name, map_irq);
+    qdev_prop_set_uint8(qdev, "port", port);
+    qdev_prop_set_uint8(qdev, "chassis", chassis);
+    qdev_prop_set_uint16(qdev, "slot", slot);
+    qdev_init_nofail(qdev);
+
+    return DO_UPCAST(PCIESlot, port, DO_UPCAST(PCIEPort, br, br));
+}
+
+static const VMStateDescription vmstate_pcie_root = {
+    .name = "pcie-root-port",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_PCIE_DEVICE(port.br.dev, PCIESlot),
+        VMSTATE_STRUCT(port.br.dev.exp.aer_log, PCIESlot, 0,
+                       vmstate_pcie_aer_log, PCIE_AERLog),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static PCIDeviceInfo pcie_root_info = {
+    .qdev.name = PCIE_ROOT_PORT,
+    .qdev.desc = "Root Port of PCI Express Switch",
+    .qdev.size = sizeof(PCIESlot),
+    .qdev.reset = pcie_root_reset,
+    .qdev.vmsd = &vmstate_pcie_root,
+
+    .is_express = 1,
+    .is_bridge = 1,
+    .config_write = pcie_root_write_config,
+    .init = pcie_root_initfn,
+    .exit = pcie_root_exitfn,
+
+    .qdev.props = (Property[]) {
+        DEFINE_PROP_UINT8("port", PCIESlot, port.port, 0),
+        DEFINE_PROP_UINT8("chassis", PCIESlot, chassis, 0),
+        DEFINE_PROP_UINT16("slot", PCIESlot, slot, 0),
+        DEFINE_PROP_UINT16("aer_log_max", PCIESlot,
+                           port.br.dev.exp.aer_log.log_max,
+                           PCIE_AER_LOG_MAX_DEFAULT),
+        DEFINE_PROP_END_OF_LIST(),
+    }
+};
+
+static void pcie_root_register(void)
+{
+    pci_qdev_register(&pcie_root_info);
+}
+
+device_init(pcie_root_register);
+
+/*
+ * Local variables:
+ *  c-indent-level: 4
+ *  c-basic-offset: 4
+ *  tab-width: 8
+ *  indent-tab-mode: nil
+ * End:
+ */
diff --git a/hw/pcie_root.h b/hw/pcie_root.h
new file mode 100644
index 0000000..9c5d4d0
--- /dev/null
+++ b/hw/pcie_root.h
@@ -0,0 +1,32 @@
+/*
+ * pcie_root.h
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_PCIE_ROOT_H
+#define QEMU_PCIE_ROOT_H
+
+#include "pcie_port.h"
+
+#define PCIE_ROOT_PORT    "pcie-root-port"
+
+PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
+                         const char *bus_name, pci_map_irq_fn map_irq,
+                         uint8_t port, uint8_t chassis, uint16_t slot);
+
+#endif /* QEMU_PCIE_ROOT_H */
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 09/13] pcie upstream port: pci express switch upstream port.
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
                   ` (7 preceding siblings ...)
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 08/13] pcie root port: implement pcie root port Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-22 11:22   ` [Qemu-devel] " Michael S. Tsirkin
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 10/13] pcie downstream port: pci express switch downstream port Isaku Yamahata
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

pci express switch upstream port.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v2 -> v3:
- compilation adjustment.
---
 Makefile.objs      |    2 +-
 hw/pcie_upstream.c |  200 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pcie_upstream.h |   32 ++++++++
 3 files changed, 233 insertions(+), 1 deletions(-)
 create mode 100644 hw/pcie_upstream.c
 create mode 100644 hw/pcie_upstream.h

diff --git a/Makefile.objs b/Makefile.objs
index 7e81b57..72ca8be 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -139,7 +139,7 @@ user-obj-y += cutils.o cache-utils.o
 hw-obj-y =
 hw-obj-y += vl.o loader.o
 hw-obj-y += virtio.o virtio-console.o
-hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o
+hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o pcie_upstream.o
 hw-obj-y += watchdog.o
 hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
 hw-obj-$(CONFIG_ECC) += ecc.o
diff --git a/hw/pcie_upstream.c b/hw/pcie_upstream.c
new file mode 100644
index 0000000..a08fce1
--- /dev/null
+++ b/hw/pcie_upstream.c
@@ -0,0 +1,200 @@
+/*
+ * pcie_upstream.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "pci_ids.h"
+#include "msi.h"
+#include "pcie.h"
+#include "pcie_upstream.h"
+
+/* For now, TI XIO3130 is borrowed. need to get its own id? */
+#define PCI_DEVICE_ID_TI_XIO3130U       0x8232  /* upstream port */
+#define XIO3130_REVISION                0x2
+#define XIO3130_MSI_OFFSET              0x70
+#define XIO3130_MSI_SUPPORTED_FLAGS     PCI_MSI_FLAGS_64BIT
+#define XIO3130_MSI_NR_VECTOR           1
+#define XIO3130_SSVID_OFFSET            0x80
+#define XIO3130_SSVID_SVID              0
+#define XIO3130_SSVID_SSID              0
+#define XIO3130_EXP_OFFSET              0x90
+#define XIO3130_AER_OFFSET              0x100
+
+#define PCIE_UPSTREAM_VID               PCI_VENDOR_ID_TI
+#define PCIE_UPSTREAM_DID               PCI_DEVICE_ID_TI_XIO3130U
+#define PCIE_UPSTREAM_REVISION          XIO3130_REVISION
+#define PCIE_UPSTREAM_MSI_SUPPORTED_FLAGS       XIO3130_MSI_SUPPORTED_FLAGS
+#define PCIE_UPSTREAM_MSI_NR_VECTOR     XIO3130_MSI_NR_VECTOR
+#define PCIE_UPSTREAM_MSI_OFFSET        XIO3130_MSI_OFFSET
+#define PCIE_UPSTREAM_SSVID_OFFSET      XIO3130_SSVID_OFFSET
+#define PCIE_UPSTREAM_SVID              XIO3130_SSVID_SVID
+#define PCIE_UPSTREAM_SSID              XIO3130_SSVID_SSID
+#define PCIE_UPSTREAM_EXP_OFFSET        XIO3130_EXP_OFFSET
+#define PCIE_UPSTREAM_AER_OFFSET        XIO3130_AER_OFFSET
+
+static void pcie_upstream_write_config(PCIDevice *d,
+                                       uint32_t address, uint32_t val, int len)
+{
+    uint32_t uncorsta =
+        pci_get_long(d->config + pcie_aer_cap(d) + PCI_ERR_UNCOR_STATUS);
+
+    pci_bridge_write_config(d, address, val, len);
+    pcie_cap_flr_write_config(d, address, val, len);
+    msi_write_config(d, address, val, len);
+    pcie_aer_write_config(d, address, val, len, uncorsta);
+}
+
+static void pcie_upstream_reset(DeviceState *qdev)
+{
+    PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
+    msi_reset(d);
+    pci_bridge_reset(qdev);
+    pcie_cap_deverr_reset(d);
+}
+
+static void pcie_upstream_flr(PCIDevice *d)
+{
+    /* TODO: not enabled until qdev reset clean up
+       waiting for Anthony's qdev cealn up */
+#if 0
+    /* So far, sticky bit registers or register which must be preserved
+       over FLR aren't emulated. So just reset this device. */
+    pci_device_reset(d);
+#endif
+}
+
+static int pcie_upstream_initfn(PCIDevice *d)
+{
+    PCIBridge* br = DO_UPCAST(PCIBridge, dev, d);
+    PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
+    int rc;
+
+    rc = pci_bridge_initfn(d);
+    if (rc < 0) {
+        return rc;
+    }
+
+    pcie_port_init_reg(d);
+    pci_config_set_vendor_id(d->config, PCIE_UPSTREAM_VID);
+    pci_config_set_device_id(d->config, PCIE_UPSTREAM_DID);
+    d->config[PCI_REVISION_ID] = PCIE_UPSTREAM_REVISION;
+
+    rc = msi_init(d, PCIE_UPSTREAM_MSI_OFFSET, PCIE_UPSTREAM_MSI_NR_VECTOR,
+                  PCIE_UPSTREAM_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
+                  PCIE_UPSTREAM_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
+    if (rc < 0) {
+        return rc;
+    }
+    rc = pci_bridge_ssvid_init(d, PCIE_UPSTREAM_SSVID_OFFSET,
+                               PCIE_UPSTREAM_SVID, PCIE_UPSTREAM_SSID);
+    if (rc < 0) {
+        return rc;
+    }
+    rc = pcie_cap_init(d, PCIE_UPSTREAM_EXP_OFFSET, PCI_EXP_TYPE_UPSTREAM,
+                       p->port);
+    if (rc < 0) {
+        return rc;
+    }
+    pcie_cap_flr_init(d, &pcie_upstream_flr);
+    pcie_cap_deverr_init(d);
+    pcie_aer_init(d, PCIE_UPSTREAM_AER_OFFSET);
+
+    return 0;
+}
+
+static int pcie_upstream_exitfn(PCIDevice *d)
+{
+    pcie_aer_exit(d);
+    msi_uninit(d);
+    pcie_cap_exit(d);
+    return pci_bridge_exitfn(d);
+}
+
+PCIEPort *pcie_upstream_init(PCIBus *bus, int devfn, bool multifunction,
+                             const char *bus_name, pci_map_irq_fn map_irq,
+                             uint8_t port)
+{
+    PCIDevice *d;
+    PCIBridge *br;
+    DeviceState *qdev;
+
+    d = pci_create_multifunction(bus, devfn, multifunction,
+                                 PCIE_UPSTREAM_PORT);
+    if (!d) {
+        return NULL;
+    }
+    br = DO_UPCAST(PCIBridge, dev, d);
+
+    qdev = &br->dev.qdev;
+    pci_bridge_map_irq(br, bus_name, map_irq);
+    qdev_prop_set_uint8(qdev, "port", port);
+    qdev_init_nofail(qdev);
+
+    return DO_UPCAST(PCIEPort, br, br);
+}
+
+static const VMStateDescription vmstate_pcie_upstream = {
+    .name = "pcie-upstream-port",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_PCIE_DEVICE(br.dev, PCIEPort),
+        VMSTATE_STRUCT(br.dev.exp.aer_log, PCIEPort, 0, vmstate_pcie_aer_log,
+                       PCIE_AERLog),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static PCIDeviceInfo pcie_upstream_info = {
+    .qdev.name = PCIE_UPSTREAM_PORT,
+    .qdev.desc = "Upstream Port of PCI Express Switch",
+    .qdev.size = sizeof(PCIEPort),
+    .qdev.reset = pcie_upstream_reset,
+    .qdev.vmsd = &vmstate_pcie_upstream,
+
+    .is_express = 1,
+    .is_bridge = 1,
+    .config_write = pcie_upstream_write_config,
+    .init = pcie_upstream_initfn,
+    .exit = pcie_upstream_exitfn,
+
+    .qdev.props = (Property[]) {
+        DEFINE_PROP_UINT8("port", PCIEPort, port, 0),
+        DEFINE_PROP_UINT16("aer_log_max", PCIEPort, br.dev.exp.aer_log.log_max,
+                           PCIE_AER_LOG_MAX_DEFAULT),
+        DEFINE_PROP_END_OF_LIST(),
+    }
+};
+
+static void pcie_upstream_register(void)
+{
+    pci_qdev_register(&pcie_upstream_info);
+}
+
+device_init(pcie_upstream_register);
+
+
+/*
+ * Local variables:
+ *  c-indent-level: 4
+ *  c-basic-offset: 4
+ *  tab-width: 8
+ *  indent-tab-mode: nil
+ * End:
+ */
diff --git a/hw/pcie_upstream.h b/hw/pcie_upstream.h
new file mode 100644
index 0000000..1d36317
--- /dev/null
+++ b/hw/pcie_upstream.h
@@ -0,0 +1,32 @@
+/*
+ * pcie_upstream.h
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_PCIE_UPSTREAM_H
+#define QEMU_PCIE_UPSTREAM_H
+
+#include "pcie_port.h"
+
+#define PCIE_UPSTREAM_PORT      "pcie-upstream-port"
+
+PCIEPort *pcie_upstream_init(PCIBus *bus, int devfn, bool multifunction,
+                             const char *bus_name, pci_map_irq_fn map_irq,
+                             uint8_t port);
+
+#endif /* QEMU_PCIE_UPSTREAM_H */
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 10/13] pcie downstream port: pci express switch downstream port.
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
                   ` (8 preceding siblings ...)
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 09/13] pcie upstream port: pci express switch upstream port Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-22 11:22   ` [Qemu-devel] " Michael S. Tsirkin
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 11/13] pcie/hotplug: glue pushing attention button command. pcie_abp Isaku Yamahata
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

pcie switch downstream port.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v2 -> v3:
- compilation adjustment.
---
 Makefile.objs        |    1 +
 hw/pcie_downstream.c |  218 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pcie_downstream.h |   33 ++++++++
 3 files changed, 252 insertions(+), 0 deletions(-)
 create mode 100644 hw/pcie_downstream.c
 create mode 100644 hw/pcie_downstream.h

diff --git a/Makefile.objs b/Makefile.objs
index 72ca8be..baff9ec 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -140,6 +140,7 @@ hw-obj-y =
 hw-obj-y += vl.o loader.o
 hw-obj-y += virtio.o virtio-console.o
 hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o pcie_upstream.o
+hw-obj-y += pcie_downstream.o
 hw-obj-y += watchdog.o
 hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
 hw-obj-$(CONFIG_ECC) += ecc.o
diff --git a/hw/pcie_downstream.c b/hw/pcie_downstream.c
new file mode 100644
index 0000000..7a629ea
--- /dev/null
+++ b/hw/pcie_downstream.c
@@ -0,0 +1,218 @@
+/*
+ * pcie_downstream.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "pci_ids.h"
+#include "msi.h"
+#include "pcie.h"
+#include "pcie_downstream.h"
+
+/* For now, TI XIO3130 is borrowed. need to get its own id? */
+#define PCI_DEVICE_ID_TI_XIO3130D       0x8233  /* downstream port */
+#define XIO3130_REVISION                0x1
+#define XIO3130_MSI_OFFSET              0x70
+#define XIO3130_MSI_SUPPORTED_FLAGS     PCI_MSI_FLAGS_64BIT
+#define XIO3130_MSI_NR_VECTOR           1
+#define XIO3130_SSVID_OFFSET            0x80
+#define XIO3130_SSVID_SVID              0
+#define XIO3130_SSVID_SSID              0
+#define XIO3130_EXP_OFFSET              0x90
+#define XIO3130_AER_OFFSET              0x100
+
+#define PCIE_DOWNSTREAM_VID             PCI_VENDOR_ID_TI
+#define PCIE_DOWNSTREAM_DID             PCI_DEVICE_ID_TI_XIO3130D
+#define PCIE_DOWNSTREAM_REVISION        XIO3130_REVISION
+#define PCIE_DOWNSTREAM_MSI_SUPPORTED_FLAGS     XIO3130_MSI_SUPPORTED_FLAGS
+#define PCIE_DOWNSTREAM_MSI_NR_VECTOR   XIO3130_MSI_NR_VECTOR
+#define PCIE_DOWNSTREAM_MSI_OFFSET      XIO3130_MSI_OFFSET
+#define PCIE_DOWNSTREAM_SSVID_OFFSET    XIO3130_SSVID_OFFSET
+#define PCIE_DOWNSTREAM_SVID            XIO3130_SSVID_SVID
+#define PCIE_DOWNSTREAM_SSID            XIO3130_SSVID_SSID
+#define PCIE_DOWNSTREAM_EXP_OFFSET      XIO3130_EXP_OFFSET
+#define PCIE_DOWNSTREAM_AER_OFFSET      XIO3130_AER_OFFSET
+
+static void pcie_downstream_write_config(PCIDevice *d, uint32_t address,
+                                         uint32_t val, int len)
+{
+    uint16_t sltctl =
+        pci_get_word(d->config + pci_pcie_cap(d) + PCI_EXP_SLTCTL);
+    uint32_t uncorsta =
+        pci_get_long(d->config + pcie_aer_cap(d) + PCI_ERR_UNCOR_STATUS);
+
+    pci_bridge_write_config(d, address, val, len);
+    pcie_cap_flr_write_config(d, address, val, len);
+    pcie_cap_slot_write_config(d, address, val, len, sltctl);
+    msi_write_config(d, address, val, len);
+    pcie_aer_write_config(d, address, val, len, uncorsta);
+}
+
+static void pcie_downstream_reset(DeviceState *qdev)
+{
+    PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
+    msi_reset(d);
+    pcie_cap_deverr_reset(d);
+    pcie_cap_slot_reset(d);
+    pcie_cap_ari_reset(d);
+    pci_bridge_reset(qdev);
+}
+
+static void pcie_downstream_flr(PCIDevice *d)
+{
+    /* TODO: not enabled until qdev reset clean up
+       waiting for Anthony's qdev cealn up */
+#if 0
+    /* So far, sticky bit registers or register which must be preserved
+       over FLR aren't emulated. So just reset this device. */
+    pci_device_reset(d);
+#endif
+}
+
+static int pcie_downstream_initfn(PCIDevice *d)
+{
+    PCIBridge* br = DO_UPCAST(PCIBridge, dev, d);
+    PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
+    PCIESlot *s = DO_UPCAST(PCIESlot, port, p);
+    int rc;
+
+    rc = pci_bridge_initfn(d);
+    if (rc < 0) {
+        return rc;
+    }
+
+    pcie_port_init_reg(d);
+    pci_config_set_vendor_id(d->config, PCIE_DOWNSTREAM_VID);
+    pci_config_set_device_id(d->config, PCIE_DOWNSTREAM_DID);
+    d->config[PCI_REVISION_ID] = PCIE_DOWNSTREAM_REVISION;
+
+    rc = msi_init(d, PCIE_DOWNSTREAM_MSI_OFFSET, PCIE_DOWNSTREAM_MSI_NR_VECTOR,
+                  PCIE_DOWNSTREAM_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
+                  PCIE_DOWNSTREAM_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
+    if (rc < 0) {
+        return rc;
+    }
+    rc = pci_bridge_ssvid_init(d, PCIE_DOWNSTREAM_SSVID_OFFSET,
+                               PCIE_DOWNSTREAM_SVID, PCIE_DOWNSTREAM_SSID);
+    if (rc < 0) {
+        return rc;
+    }
+    rc = pcie_cap_init(d, PCIE_DOWNSTREAM_EXP_OFFSET, PCI_EXP_TYPE_DOWNSTREAM,
+                       p->port);
+    if (rc < 0) {
+        return rc;
+    }
+    pcie_cap_flr_init(d, &pcie_downstream_flr);
+    pcie_cap_deverr_init(d);
+    pcie_cap_slot_init(d, s->slot);
+    pcie_chassis_create(s->chassis);
+    rc = pcie_chassis_add_slot(s);
+    if (rc < 0) {
+        return rc;
+    }
+    pcie_cap_ari_init(d);
+    pcie_aer_init(d, PCIE_DOWNSTREAM_AER_OFFSET);
+
+    return 0;
+}
+
+static int pcie_downstream_exitfn(PCIDevice *d)
+{
+    pcie_aer_exit(d);
+    msi_uninit(d);
+    pcie_cap_exit(d);
+    return pci_bridge_exitfn(d);
+}
+
+PCIESlot *pcie_downstream_init(PCIBus *bus,
+                               int devfn, bool multifunction,
+                               const char *bus_name, pci_map_irq_fn map_irq,
+                               uint8_t port, uint8_t chassis, uint16_t slot)
+{
+    PCIDevice *d;
+    PCIBridge *br;
+    DeviceState *qdev;
+
+    d = pci_create_multifunction(bus, devfn, multifunction,
+                                 PCIE_DOWNSTREAM_PORT);
+    if (!d) {
+        return NULL;
+    }
+    br = DO_UPCAST(PCIBridge, dev, d);
+
+    qdev = &br->dev.qdev;
+    pci_bridge_map_irq(br, bus_name, map_irq);
+    qdev_prop_set_uint8(qdev, "port", port);
+    qdev_prop_set_uint8(qdev, "chassis", chassis);
+    qdev_prop_set_uint16(qdev, "slot", slot);
+    qdev_init_nofail(qdev);
+
+    return DO_UPCAST(PCIESlot, port, DO_UPCAST(PCIEPort, br, br));
+}
+
+static const VMStateDescription vmstate_pcie_downstream = {
+    .name = "pcie-downstream-port",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_PCIE_DEVICE(port.br.dev, PCIESlot),
+        VMSTATE_STRUCT(port.br.dev.exp.aer_log, PCIESlot, 0,
+                       vmstate_pcie_aer_log, PCIE_AERLog),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static PCIDeviceInfo pcie_downstream_info = {
+    .qdev.name = PCIE_DOWNSTREAM_PORT,
+    .qdev.desc = "Downstream Port of PCI Express Switch",
+    .qdev.size = sizeof(PCIESlot),
+    .qdev.reset = pcie_downstream_reset,
+    .qdev.vmsd = &vmstate_pcie_downstream,
+
+    .is_express = 1,
+    .is_bridge = 1,
+    .config_write = pcie_downstream_write_config,
+    .init = pcie_downstream_initfn,
+    .exit = pcie_downstream_exitfn,
+
+    .qdev.props = (Property[]) {
+        DEFINE_PROP_UINT8("port", PCIESlot, port.port, 0),
+        DEFINE_PROP_UINT8("chassis", PCIESlot, chassis, 0),
+        DEFINE_PROP_UINT16("slot", PCIESlot, slot, 0),
+        DEFINE_PROP_UINT16("aer_log_max", PCIESlot,
+                           port.br.dev.exp.aer_log.log_max,
+                           PCIE_AER_LOG_MAX_DEFAULT),
+        DEFINE_PROP_END_OF_LIST(),
+    }
+};
+
+static void pcie_downstream_register(void)
+{
+    pci_qdev_register(&pcie_downstream_info);
+}
+
+device_init(pcie_downstream_register);
+
+/*
+ * Local variables:
+ *  c-indent-level: 4
+ *  c-basic-offset: 4
+ *  tab-width: 8
+ *  indent-tab-mode: nil
+ * End:
+ */
diff --git a/hw/pcie_downstream.h b/hw/pcie_downstream.h
new file mode 100644
index 0000000..686fdac
--- /dev/null
+++ b/hw/pcie_downstream.h
@@ -0,0 +1,33 @@
+/*
+ * pcie_downstream.h
+ *
+ * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_PCIE_DOWNSTREAM_H
+#define QEMU_PCIE_DOWNSTREAM_H
+
+#include "pcie_port.h"
+
+#define PCIE_DOWNSTREAM_PORT    "pcie-downstream-port"
+
+PCIESlot *pcie_downstream_init(PCIBus *bus,
+                               int devfn, bool multifunction,
+                               const char *bus_name, pci_map_irq_fn map_irq,
+                               uint8_t port, uint8_t chassis, uint16_t slot);
+
+#endif /* QEMU_PCIE_DOWNSTREAM_H */
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 11/13] pcie/hotplug: glue pushing attention button command. pcie_abp
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
                   ` (9 preceding siblings ...)
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 10/13] pcie downstream port: pci express switch downstream port Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-22 11:30   ` [Qemu-devel] " Michael S. Tsirkin
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 12/13] pcie/aer: glue aer error injection into qemu monitor Isaku Yamahata
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

glue to pcie_abp monitor command.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 hw/pcie_port.c  |   82 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-monitor.hx |   14 +++++++++
 sysemu.h        |    4 +++
 3 files changed, 100 insertions(+), 0 deletions(-)

diff --git a/hw/pcie_port.c b/hw/pcie_port.c
index e7c3cef..641c458 100644
--- a/hw/pcie_port.c
+++ b/hw/pcie_port.c
@@ -18,6 +18,10 @@
  * with this program; if not, see <http://www.gnu.org/licenses/>.
  */
 
+#include "qemu-objects.h"
+#include "sysemu.h"
+#include "monitor.h"
+#include "pcie.h"
 #include "pcie_port.h"
 
 void pcie_port_init_reg(PCIDevice *d)
@@ -104,3 +108,81 @@ void pcie_chassis_del_slot(PCIESlot *s)
 {
     QLIST_REMOVE(s, next);
 }
+
+/**************************************************************************
+ * glue for qemu monitor
+ */
+
+/* Parse [<chassis>.]<slot>, return -1 on error */
+static int pcie_parse_slot_addr(const char* slot_addr,
+                                uint8_t *chassisp, uint16_t *slotp)
+{
+    const char *p;
+    char *e;
+    unsigned long val;
+    unsigned long chassis = 0;
+    unsigned long slot;
+
+    p = slot_addr;
+    val = strtoul(p, &e, 0);
+    if (e == p) {
+        return -1;
+    }
+    if (*e == '.') {
+        chassis = val;
+        p = e + 1;
+        val = strtoul(p, &e, 0);
+        if (e == p) {
+            return -1;
+        }
+    }
+    slot = val;
+
+    if (*e) {
+        return -1;
+    }
+
+    if (chassis > 0xff || slot > 0xffff) {
+        return -1;
+    }
+
+    *chassisp = chassis;
+    *slotp = slot;
+    return 0;
+}
+
+void pcie_attention_button_push_print(Monitor *mon, const QObject *data)
+{
+    QDict *qdict;
+
+    assert(qobject_type(data) == QTYPE_QDICT);
+    qdict = qobject_to_qdict(data);
+
+    monitor_printf(mon, "OK chassis %d, slot %d\n",
+                   (int) qdict_get_int(qdict, "chassis"),
+                   (int) qdict_get_int(qdict, "slot"));
+}
+
+int pcie_attention_button_push(Monitor *mon, const QDict *qdict,
+                               QObject **ret_data)
+{
+    const char* pcie_slot = qdict_get_str(qdict, "pcie_slot");
+    uint8_t chassis;
+    uint16_t slot;
+    PCIESlot *s;
+
+    if (pcie_parse_slot_addr(pcie_slot, &chassis, &slot) < 0) {
+        monitor_printf(mon, "invalid pcie slot address %s\n", pcie_slot);
+        return -1;
+    }
+    s = pcie_chassis_find_slot(chassis, slot);
+    if (!s) {
+        monitor_printf(mon, "slot is not found. %s\n", pcie_slot);
+        return -1;
+    }
+    pcie_cap_slot_push_attention_button(&s->port.br.dev);
+    *ret_data = qobject_from_jsonf("{ 'chassis': %d, 'slot': %d}",
+                                   chassis, slot);
+    assert(*ret_data);
+    return 0;
+}
diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index 2af3de6..02fbda1 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -1154,6 +1154,20 @@ Hot remove PCI device.
 ETEXI
 
     {
+        .name       = "pcie_abp",
+        .args_type  = "pcie_slot:s",
+        .params     = "[<chassis>.]<slot>",
+        .help       = "push pci express attention button",
+        .user_print  = pcie_attention_button_push_print,
+        .mhandler.cmd_new = pcie_attention_button_push,
+    },
+
+STEXI
+@item pcie_abp
+Push PCI express attention button
+ETEXI
+
+    {
         .name       = "host_net_add",
         .args_type  = "device:s,opts:s?",
         .params     = "tap|user|socket|vde|dump [options]",
diff --git a/sysemu.h b/sysemu.h
index 9c988bb..cca411d 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -150,6 +150,10 @@ extern unsigned int nb_prom_envs;
 void pci_device_hot_add(Monitor *mon, const QDict *qdict);
 void drive_hot_add(Monitor *mon, const QDict *qdict);
 void do_pci_device_hot_remove(Monitor *mon, const QDict *qdict);
+/* pcie hotplug */
+void pcie_attention_button_push_print(Monitor *mon, const QObject *data);
+int pcie_attention_button_push(Monitor *mon, const QDict *qdict,
+                               QObject **ret_data);
 
 /* serial ports */
 
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 12/13] pcie/aer: glue aer error injection into qemu monitor.
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
                   ` (10 preceding siblings ...)
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 11/13] pcie/hotplug: glue pushing attention button command. pcie_abp Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 13/13] msix: clear not only INTA, but all INTx when MSI-X is enabled Isaku Yamahata
  2010-09-20 18:18 ` [Qemu-devel] Re: [PATCH v3 00/13] pcie port switch emulators Michael S. Tsirkin
  13 siblings, 0 replies; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

glue aer error injection into qemu monitor.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v2 -> v3:
- compilation adjustment.
---
 hw/pcie_aer.c   |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-monitor.hx |   22 ++++++++++++++
 sysemu.h        |    5 +++
 3 files changed, 112 insertions(+), 0 deletions(-)

diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
index 9e3f48e..fa1f66c 100644
--- a/hw/pcie_aer.c
+++ b/hw/pcie_aer.c
@@ -19,6 +19,8 @@
  */
 
 #include "sysemu.h"
+#include "qemu-objects.h"
+#include "monitor.h"
 #include "pci_bridge.h"
 #include "pcie.h"
 #include "msix.h"
@@ -794,3 +796,86 @@ const VMStateDescription vmstate_pcie_aer_log = {
         VMSTATE_END_OF_LIST()
     }
 };
+
+void pcie_aer_inject_error_print(Monitor *mon, const QObject *data)
+{
+    QDict *qdict;
+    int devfn;
+    assert(qobject_type(data) == QTYPE_QDICT);
+    qdict = qobject_to_qdict(data);
+
+    devfn = (int)qdict_get_int(qdict, "devfn");
+    monitor_printf(mon, "OK domain: %x, bus: %x devfn: %x.%x\n",
+                   (int) qdict_get_int(qdict, "domain"),
+                   (int) qdict_get_int(qdict, "bus"),
+                   PCI_SLOT(devfn), PCI_FUNC(devfn));
+}
+
+int do_pcie_aer_inejct_error(Monitor *mon,
+                             const QDict *qdict, QObject **ret_data)
+{
+    const char *pci_addr = qdict_get_str(qdict, "pci_addr");
+    int dom;
+    int bus;
+    unsigned int slot;
+    unsigned int func;
+    PCIDevice *dev;
+    PCIE_AERErr err;
+
+    /* Ideally qdev device path should be used.
+     * However at the moment there is no reliable way to determine
+     * wheher a given qdev is pci device or not.
+     * so pci_addr is used.
+     */
+    if (pci_parse_devaddr(pci_addr, &dom, &bus, &slot, &func)) {
+        monitor_printf(mon, "invalid pci address %s\n", pci_addr);
+        return -1;
+    }
+    dev = pci_find_device(pci_find_root_bus(dom), bus, slot, func);
+    if (!dev) {
+        monitor_printf(mon, "device is not found. 0x%x:0x%x.0x%x\n",
+                       bus, slot, func);
+        return -1;
+    }
+    if (!pci_is_express(dev)) {
+        monitor_printf(mon, "the device doesn't support pci express. "
+                       "0x%x:0x%x.0x%x\n",
+                       bus, slot, func);
+        return -1;
+    }
+
+    err.status = qdict_get_int(qdict, "error_status");
+    err.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn;
+
+    err.flags = 0;
+    if (qdict_get_int(qdict, "is_correctable")) {
+        err.flags |= PCIE_AER_ERR_IS_CORRECTABLE;
+    }
+    if (qdict_get_int(qdict, "advisory_non_fatal")) {
+        err.flags |= PCIE_AER_ERR_MAYBE_ADVISORY;
+    }
+    if (qdict_haskey(qdict, "tlph0")) {
+        err.flags |= PCIE_AER_ERR_HEADER_VALID;
+    }
+    if (qdict_haskey(qdict, "hpfx0")) {
+        err.flags |= PCIE_AER_ERR_TLP_PRESENT;
+    }
+
+    err.header[0] = qdict_get_try_int(qdict, "tlph0", 0);
+    err.header[1] = qdict_get_try_int(qdict, "tlph1", 0);
+    err.header[2] = qdict_get_try_int(qdict, "tlph2", 0);
+    err.header[3] = qdict_get_try_int(qdict, "tlph3", 0);
+
+    err.prefix[0] = qdict_get_try_int(qdict, "hpfx0", 0);
+    err.prefix[1] = qdict_get_try_int(qdict, "hpfx1", 0);
+    err.prefix[2] = qdict_get_try_int(qdict, "hpfx2", 0);
+    err.prefix[3] = qdict_get_try_int(qdict, "hpfx3", 0);
+
+    pcie_aer_inject_error(dev, &err);
+    *ret_data = qobject_from_jsonf("{ 'domain': %d, 'bus': %d, 'devfn': %d }",
+                                   pci_find_domain(dev->bus),
+                                   pci_bus_num(dev->bus), dev->devfn);
+    assert(*ret_data);
+
+    return 0;
+}
diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index 02fbda1..080c90e 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -1168,6 +1168,28 @@ Push PCI express attention button
 ETEXI
 
     {
+        .name       = "pcie_aer_inject_error",
+        .args_type  = "advisory_non_fatal:-a,is_correctable:-c,"
+	              "pci_addr:s,error_status:i,"
+	              "tlph0:i?,tlph1:i?,tlph2:i?,tlph3:i?,"
+	              "hpfx0:i?,hpfx1:i?,hpfx2:i?,hpfx3:i?",
+        .params     = "[-a] [-c] [[<domain>:]<bus>:]<slot>.<func> "
+	              "<error status:32bit> "
+	              "[<tlp header:(32bit x 4)>] "
+	              "[<tlp header prefix:(32bit x 4)>]",
+        .help       = "inject pcie aer error "
+	               "(use -a for advisory non fatal error) "
+	               "(use -c for correctrable error)",
+        .user_print  = pcie_aer_inject_error_print,
+        .mhandler.cmd_new = do_pcie_aer_inejct_error,
+    },
+
+STEXI
+@item pcie_abp
+Push PCI express attention button
+ETEXI
+
+    {
         .name       = "host_net_add",
         .args_type  = "device:s,opts:s?",
         .params     = "tap|user|socket|vde|dump [options]",
diff --git a/sysemu.h b/sysemu.h
index cca411d..2f7157c 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -155,6 +155,11 @@ void pcie_attention_button_push_print(Monitor *mon, const QObject *data);
 int pcie_attention_button_push(Monitor *mon, const QDict *qdict,
                                QObject **ret_data);
 
+/* pcie aer error injection */
+void pcie_aer_inject_error_print(Monitor *mon, const QObject *data);
+int do_pcie_aer_inejct_error(Monitor *mon,
+                             const QDict *qdict, QObject **ret_data);
+
 /* serial ports */
 
 #define MAX_SERIAL_PORTS 4
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH v3 13/13] msix: clear not only INTA, but all INTx when MSI-X is enabled.
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
                   ` (11 preceding siblings ...)
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 12/13] pcie/aer: glue aer error injection into qemu monitor Isaku Yamahata
@ 2010-09-15  5:38 ` Isaku Yamahata
  2010-09-20 18:18 ` [Qemu-devel] Re: [PATCH v3 00/13] pcie port switch emulators Michael S. Tsirkin
  13 siblings, 0 replies; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-15  5:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: skandasa, yamahata, etmartin, wexu2, mst

clear not only INTA, but all INTx when MSI-X is enabled.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 hw/msix.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 7ce63eb..b202ff7 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -158,6 +158,7 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
 {
     unsigned enable_pos = dev->msix_cap + MSIX_CONTROL_OFFSET;
     int vector;
+    int i;
 
     if (!range_covers_byte(addr, len, enable_pos)) {
         return;
@@ -167,7 +168,9 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
         return;
     }
 
-    qemu_set_irq(dev->irq[0], 0);
+    for (i = 0; i < PCI_NUM_PINS; ++i) {
+        qemu_set_irq(dev->irq[i], 0);
+    }
 
     if (msix_function_masked(dev)) {
         return;
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 05/13] pcie: helper functions for pcie capability and extended capability.
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 05/13] pcie: helper functions for pcie capability and extended capability Isaku Yamahata
@ 2010-09-15 12:43   ` Michael S. Tsirkin
  2010-09-19  4:56     ` Isaku Yamahata
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-15 12:43 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 15, 2010 at 02:38:18PM +0900, Isaku Yamahata wrote:
> This patch implements helper functions for pci express capability
> and pci express extended capability allocation.
> NOTE: presence detection depends on pci_qdev_init() change.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> 
> ---
> Changes v2 -> v3:
> - don't use 0b gcc extension. use 0x instead.
> - split out constants into pcie_regs.h for linux merge.
> - export some helpers for pcie-aer split.
> - split out aer helper functions from pcie.c to pcie_aer.c
> - embed PCIExpressDevice into PCIDevice.
> ---
>  Makefile.objs |    1 +
>  hw/pci.h      |   12 +
>  hw/pcie.c     |  638 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/pcie.h     |   96 +++++++++
>  qemu-common.h |    1 +
>  5 files changed, 748 insertions(+), 0 deletions(-)
>  create mode 100644 hw/pcie.c
>  create mode 100644 hw/pcie.h
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index 5f5a4c5..eeb5134 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -186,6 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
>  # PCI watchdog devices
>  hw-obj-y += wdt_i6300esb.o
>  
> +hw-obj-y += pcie.o
>  hw-obj-y += msix.o msi.o
>  
>  # PCI network cards
> diff --git a/hw/pci.h b/hw/pci.h
> index 630631b..19e85f5 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -9,6 +9,8 @@
>  /* PCI includes legacy ISA access.  */
>  #include "isa.h"
>  
> +#include "pcie.h"
> +
>  /* PCI bus */
>  
>  #define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
> @@ -175,6 +177,9 @@ struct PCIDevice {
>      /* Offset of MSI capability in config space */
>      uint8_t msi_cap;
>  
> +    /* PCI Express */
> +    PCIExpressDevice exp;
> +
>      /* Location of option rom */
>      char *romfile;
>      ram_addr_t rom_offset;
> @@ -389,6 +394,13 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
>      return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE : PCI_CONFIG_SPACE_SIZE;
>  }
>  
> +/* These are pci express specific, so should belong to pcie.h.
> +   they're here to avoid mutual header dependency. */
> +static inline uint8_t pci_pcie_cap(const PCIDevice *d)
> +{
> +    return pci_is_express(d) ? d->exp.exp_cap : 0;
> +}
> +

This one seems useless: 0 is not right for how you use
this function. Just use the field directly?

>  /* These are not pci specific. Should move into a separate header.
>   * Only pci.c uses them, so keep them here for now.
>   */
> diff --git a/hw/pcie.c b/hw/pcie.c
> new file mode 100644
> index 0000000..a6f396b
> --- /dev/null
> +++ b/hw/pcie.c
> @@ -0,0 +1,638 @@
> +/*
> + * pcie.c
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "sysemu.h"
> +#include "pci_bridge.h"
> +#include "pcie.h"
> +#include "msix.h"
> +#include "msi.h"
> +#include "pci_internals.h"
> +#include "pcie_regs.h"
> +
> +//#define DEBUG_PCIE
> +#ifdef DEBUG_PCIE
> +# define PCIE_DPRINTF(fmt, ...)                                         \
> +    fprintf(stderr, "%s:%d " fmt, __func__, __LINE__, ## __VA_ARGS__)
> +#else
> +# define PCIE_DPRINTF(fmt, ...) do {} while (0)
> +#endif
> +#define PCIE_DEV_PRINTF(dev, fmt, ...)                                  \
> +    PCIE_DPRINTF("%s:%x "fmt, (dev)->name, (dev)->devfn, ## __VA_ARGS__)
> +
> +static inline const char *pcie_hp_event_name(enum PCIExpressHotPlugEvent event)
> +{
> +    switch (event) {
> +    case PCI_EXP_HP_EV_ABP:
> +        return "attention button pushed";
> +    case PCI_EXP_HP_EV_PDC:
> +        return "present detection changed";
> +    case PCI_EXP_HP_EV_CCI:
> +        return "command completed";
> +    default:
> +        break;
> +    }
> +    return "Unknown event";
> +}
> +

Nice, but so much code just for debug? print the code out inx hex ...

> +/***************************************************************************
> + * pci express capability helper functions
> + */
> +void pcie_notify(PCIDevice *dev, uint16_t vector, bool trigger, int level)

Why is this not static? It makes sense for internal stuff possibly,
but I think functions will need to know what to do: they can't
treat msi/msix/irq identically anyway.

The API seems confusing, I think this is what is creating
code for you. Specifically level = 0 does not notify at all.
So I think we need two:
1. pcie_assert_interrupt which sends msi or sets level to 1
2. pcie_deassert_interrupt which sets level to 0, or nothing
   for non msi.

Then below you can e.g.
if (!sltctrl) {
	pcie_deassert(...);
	return;
}

> +{
> +    /* masking/masking interrupt is handled by upper layer.
> +     * i.e. msix_notify() for MSI-X
> +     *      msi_notify()  for MSI
> +     *      pci_set_irq() for INTx
> +     */
> +    PCIE_DEV_PRINTF(dev, "noitfy vector %d tirgger:%d level:%d\n",

typo

> +                    vector, trigger, level);
> +    if (msix_enabled(dev)) {
> +        if (trigger) {
> +            msix_notify(dev, vector);
> +        }
> +    } else if (msi_enabled(dev)) {
> +        if (trigger){
> +            msi_notify(dev, vector);
> +        }
> +    } else {
> +        qemu_set_irq(dev->irq[0], level);

always 0? really? This is INTA# - is this what the spec says?

> +    }
> +}
> +
> +int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
> +{
> +    int exp_cap;
> +    uint8_t *pcie_cap;
> +
> +    assert(pci_is_express(dev));
> +
> +    exp_cap = pci_add_capability(dev, PCI_CAP_ID_EXP, offset,
> +                                 PCI_EXP_VER2_SIZEOF);
> +    if (exp_cap < 0) {
> +        return exp_cap;
> +    }
> +    dev->exp.exp_cap = exp_cap;
> +
> +    /* already done in pci_qdev_init() */
> +    assert(dev->cap_present & QEMU_PCI_CAP_EXPRESS);

Hmm. Why do we set this flag in qdev init but do the
rest of it here?

> +
> +    pcie_cap = dev->config + pci_pcie_cap(dev);

come on, just use exp_cap.

> +
> +    /* capability register
> +       interrupt message number defaults to 0 */
> +    pci_set_word(pcie_cap + PCI_EXP_FLAGS,
> +                 ((type << PCI_EXP_FLAGS_TYPE_SHIFT) & PCI_EXP_FLAGS_TYPE) |
> +                 PCI_EXP_FLAGS_VER2);
> +
> +    /* device capability register
> +     * table 7-12:
> +     * roll based error reporting bit must be set by all
> +     * Functions conforming to the ECN, PCI Express Base
> +     * Specification, Revision 1.1., or subsequent PCI Express Base
> +     * Specification revisions.
> +     */
> +    pci_set_long(pcie_cap + PCI_EXP_DEVCAP, PCI_EXP_DEVCAP_RBER);
> +
> +    pci_set_long(pcie_cap + PCI_EXP_LNKCAP,
> +                 (port << PCI_EXP_LNKCAP_PN_SHIFT) |
> +                 PCI_EXP_LNKCAP_ASPMS_0S |
> +                 PCI_EXP_LNK_MLW_1 |
> +                 PCI_EXP_LNK_LS_25);
> +
> +    pci_set_word(pcie_cap + PCI_EXP_LNKSTA,
> +                 PCI_EXP_LNK_MLW_1 | PCI_EXP_LNK_LS_25);
> +
> +    pci_set_long(pcie_cap + PCI_EXP_DEVCAP2,
> +                 PCI_EXP_DEVCAP2_EFF | PCI_EXP_DEVCAP2_EETLPP);
> +
> +    pci_set_word(dev->wmask + exp_cap, PCI_EXP_DEVCTL2_EETLPPB);
> +    return exp_cap;
> +}
> +
> +void pcie_cap_exit(PCIDevice *dev)
> +{
> +    pci_del_capability(dev, PCI_CAP_ID_EXP, PCI_EXP_VER2_SIZEOF);
> +}
> +
> +uint8_t pcie_cap_get_type(const PCIDevice *dev)
> +{
> +    uint32_t pos = pci_pcie_cap(dev);
> +    assert(pos > 0);
> +    return (pci_get_word(dev->config + pos + PCI_EXP_FLAGS) &
> +            PCI_EXP_FLAGS_TYPE) >> PCI_EXP_FLAGS_TYPE_SHIFT;
> +}
> +
> +/* MSI/MSI-X */
> +/* pci express interrupt message number */
> +void pcie_cap_flags_set_vector(PCIDevice *dev, uint8_t vector)
> +{
> +    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
> +    uint16_t tmp;
> +
> +    assert(vector <= 32);
> +    tmp = pci_get_word(pcie_cap + PCI_EXP_FLAGS);
> +    tmp &= ~PCI_EXP_FLAGS_IRQ;
> +    tmp |= vector << PCI_EXP_FLAGS_IRQ_SHIFT;
> +    pci_set_word(pcie_cap + PCI_EXP_FLAGS, tmp);
> +}
> +
> +uint8_t pcie_cap_flags_get_vector(PCIDevice *dev)
> +{
> +    return (pci_get_word(dev->config + pci_pcie_cap(dev) + PCI_EXP_FLAGS) &
> +            PCI_EXP_FLAGS_IRQ) >> PCI_EXP_FLAGS_IRQ_SHIFT;
> +}
> +
> +void pcie_cap_deverr_init(PCIDevice *dev)
> +{
> +    uint32_t pos = pci_pcie_cap(dev);
> +    uint8_t *pcie_cap = dev->config + pos;
> +    uint8_t *pcie_wmask = dev->wmask + pos;
> +    uint8_t *pcie_w1cmask = dev->wmask + pos;
> +
> +    pci_set_long(pcie_cap + PCI_EXP_DEVCAP,
> +                 pci_get_long(pcie_cap + PCI_EXP_DEVCAP) |
> +                 PCI_EXP_DEVCAP_RBER);
> +
> +    pci_set_long(pcie_wmask + PCI_EXP_DEVCTL,
> +                 pci_get_long(pcie_wmask + PCI_EXP_DEVCTL) |
> +                 PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE |
> +                 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE);
> +
> +    pci_set_long(pcie_w1cmask + PCI_EXP_DEVSTA,
> +                 pci_get_long(pcie_w1cmask + PCI_EXP_DEVSTA) |
> +                 PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED |
> +                 PCI_EXP_DEVSTA_URD | PCI_EXP_DEVSTA_URD);
> +}
> +
> +void pcie_cap_deverr_reset(PCIDevice *dev)
> +{
> +    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
> +    pci_set_long(pcie_cap + PCI_EXP_DEVCTL,
> +                 pci_get_long(pcie_cap + PCI_EXP_DEVCTL) &
> +                 ~(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE |
> +                   PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE));
> +}
> +
> +/*
> + * events: PCI_EXP_HP_EV_xxx
> + * status: bit or of PCI_EXP_SLTSTA_xxx

or of?

also - make status the right enum then?


Could you replace the comment above with one describing what this
is supposed to do?
Why can't users simply call pcie_assert_interrupt directly?
The below comments are based on an incomplete understanding
of the below function.

> + */
> +static void pcie_cap_slot_event(PCIDevice *dev,
> +                                enum PCIExpressHotPlugEvent events,
> +                                uint16_t status)
> +{

Most of the code seems to simply validate inputs. But why?
You always pass in valid numbers
also events -> event, you always pass a single one.
It looks like it'll be simpler if you just assume
a single event, and move the status bit
tweaking outside of this function, it is
only useful for PDS ...

The we will get a straightforward

> +    bool trigger;
> +    int level;
> +    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
> +    uint16_t sltctl = pci_get_word(pcie_cap + PCI_EXP_SLTCTL);
> +    uint16_t sltsta = pci_get_word(pcie_cap + PCI_EXP_SLTSTA);
> +
> +    PCIE_DEV_PRINTF(dev,
> +                    "sltctl: 0x%0x2 sltsta: 0x%02x event:%x %s status:%d\n",
> +                    sltctl, sltsta,
> +                    events, pcie_hp_event_name(events), status);
> +    events &= PCI_EXP_HP_EV_SUPPORTED;
> +    if ((sltctl & PCI_EXP_SLTCTL_HPIE) && (sltctl & events) &&
> +        ((sltsta ^ events) & events) /* 0 -> 1 */) {
> +        trigger = true;
> +    } else {
> +        trigger = false;
> +    }
> +
> +    if (events & PCI_EXP_HP_EV_PDC) {
> +        sltsta &= ~PCI_EXP_SLTSTA_PDS;
> +        sltsta |= (status & PCI_EXP_SLTSTA_PDS);
> +    }
> +    sltsta |= events;
> +    pci_set_word(pcie_cap + PCI_EXP_SLTSTA, sltsta);
> +    PCIE_DEV_PRINTF(dev, "sltsta -> %02xn", sltsta);
> +
> +    if ((sltctl & PCI_EXP_SLTCTL_HPIE) && (sltsta & PCI_EXP_HP_EV_SUPPORTED)) {
> +        level = 1;
> +    } else {
> +        level = 0;
> +    }


you can replace if with assignment here and elsewhere:
	level = (sltctl & PCI_EXP_SLTCTL_HPIE) && (sltsta & PCI_EXP_HP_EV_SUPPORTED);


> +
> +    pcie_notify(dev, pcie_cap_flags_get_vector(dev), trigger, level);
> +}
> +
> +static int pcie_cap_slot_hotplug(DeviceState *qdev,
> +                                 PCIDevice *pci_dev, int state)
> +{
> +    PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
> +    uint8_t *pcie_cap = d->config + pci_pcie_cap(d);
> +    uint16_t sltsta = pci_get_word(pcie_cap + PCI_EXP_SLTSTA);
> +
> +    if (!pci_dev->qdev.hotplugged) {
> +        assert(state); /* this case only happens machine creation. */

at machine creation?

> +        sltsta |= PCI_EXP_SLTSTA_PDS;
> +        pci_set_word(pcie_cap + PCI_EXP_SLTSTA, sltsta);
> +        return 0;
> +    }
> +
> +    PCIE_DEV_PRINTF(pci_dev, "hotplug state: %d\n", state);
> +    if (sltsta & PCI_EXP_SLTSTA_EIS) {
> +        /* the slot is electromechanically locked. */

We'll need to produce some error here.

> +        return -EBUSY;
> +    }
> +
> +    if (state) {
> +        if (PCI_FUNC(pci_dev->devfn) == 0) {
> +            /* event is per slot. Not per function
> +             * only generates event for function = 0.
> +             * When hot plug, populate functions > 0
> +             * and then add function = 0 last.
> +             */
> +            pcie_cap_slot_event(d, PCI_EXP_HP_EV_PDC, PCI_EXP_SLTSTA_PDS);
> +        }
> +    } else {
> +        PCIBridge *br;
> +        PCIBus *bus;
> +        DeviceState *next;
> +        if (PCI_FUNC(pci_dev->devfn) != 0) {
> +            /* event is per slot. Not per function.
> +               accepts function = 0 only. */
> +            return -EINVAL;

Can user or guest trigger this?
If yes print an error.
IF no, assert.

> +        }
> +
> +        /* zap all functions. */
> +        br = DO_UPCAST(PCIBridge, dev, d);
> +        bus = pci_bridge_get_sec_bus(br);
> +        QLIST_FOREACH_SAFE(qdev, &bus->qbus.children, sibling, next) {
> +            qdev_free(qdev);
> +        }
> +
> +        pcie_cap_slot_event(d, PCI_EXP_HP_EV_PDC, 0);
> +    }
> +    return 0;
> +}
> +
> +/* pci express slot for pci express root/downstream port
> +   PCI express capability slot registers */
> +void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot)
> +{
> +    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
> +    uint8_t *pcie_wmask = dev->wmask + pci_pcie_cap(dev);
> +    uint8_t *pcie_w1cmask = dev->w1cmask + pci_pcie_cap(dev);
> +    uint32_t tmp;
> +
> +    pci_set_word(pcie_cap + PCI_EXP_FLAGS,
> +                 pci_get_word(pcie_cap + PCI_EXP_FLAGS) | PCI_EXP_FLAGS_SLOT);
> +
> +    tmp = pci_get_long(pcie_cap + PCI_EXP_SLTCAP);
> +    tmp &= PCI_EXP_SLTCAP_PSN;
> +    tmp |=
> +        (slot << PCI_EXP_SLTCAP_PSN_SHIFT) |
> +        PCI_EXP_SLTCAP_EIP |
> +        PCI_EXP_SLTCAP_HPS |
> +        PCI_EXP_SLTCAP_HPC |
> +        PCI_EXP_SLTCAP_PIP |
> +        PCI_EXP_SLTCAP_AIP |
> +        PCI_EXP_SLTCAP_ABP;
> +    pci_set_long(pcie_cap + PCI_EXP_SLTCAP, tmp);
> +
> +    tmp = pci_get_word(pcie_cap + PCI_EXP_SLTCTL);
> +    tmp &= ~(PCI_EXP_SLTCTL_PIC | PCI_EXP_SLTCTL_AIC);
> +    tmp |= PCI_EXP_SLTCTL_PIC_OFF | PCI_EXP_SLTCTL_AIC_OFF;
> +    pci_set_word(pcie_cap + PCI_EXP_SLTCTL, tmp);
> +    pci_set_word(pcie_wmask + PCI_EXP_SLTCTL,
> +                 pci_get_word(pcie_wmask + PCI_EXP_SLTCTL) |
> +                 PCI_EXP_SLTCTL_PIC |
> +                 PCI_EXP_SLTCTL_AIC |
> +                 PCI_EXP_SLTCTL_HPIE |
> +                 PCI_EXP_SLTCTL_CCIE |
> +                 PCI_EXP_SLTCTL_PDCE |
> +                 PCI_EXP_SLTCTL_ABPE);
> +
> +    pci_set_word(pcie_w1cmask + PCI_EXP_SLTSTA,
> +                 pci_get_word(pcie_w1cmask + PCI_EXP_SLTSTA) |
> +                 PCI_EXP_HP_EV_SUPPORTED);
> +
> +    pci_bus_hotplug(pci_bridge_get_sec_bus(DO_UPCAST(PCIBridge, dev, dev)),
> +                    pcie_cap_slot_hotplug, &dev->qdev);
> +}
> +
> +void pcie_cap_slot_reset(PCIDevice *dev)
> +{
> +    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
> +    uint32_t tmp;
> +
> +    PCIE_DEV_PRINTF(dev, "reset\n");
> +
> +    tmp = pci_get_word(pcie_cap + PCI_EXP_SLTCTL);
> +    tmp &= ~(PCI_EXP_SLTCTL_EIC |
> +             PCI_EXP_SLTCTL_PIC |
> +             PCI_EXP_SLTCTL_AIC |
> +             PCI_EXP_SLTCTL_HPIE |
> +             PCI_EXP_SLTCTL_CCIE |
> +             PCI_EXP_SLTCTL_PDCE |
> +             PCI_EXP_SLTCTL_ABPE);
> +    tmp |= PCI_EXP_SLTCTL_PIC_OFF | PCI_EXP_SLTCTL_AIC_OFF;
> +    pci_set_word(pcie_cap + PCI_EXP_SLTCTL, tmp);
> +
> +    tmp = pci_get_word(pcie_cap + PCI_EXP_SLTSTA);
> +    tmp &= ~(PCI_EXP_SLTSTA_EIS | /* by reset, the lock is released */
> +             PCI_EXP_SLTSTA_CC |
> +             PCI_EXP_SLTSTA_PDC |
> +             PCI_EXP_SLTSTA_ABP);
> +    pci_set_word(pcie_cap + PCI_EXP_SLTSTA, tmp);
> +}
> +
> +void pcie_cap_slot_write_config(PCIDevice *dev,
> +                                uint32_t addr, uint32_t val, int len,
> +                                uint16_t sltctl_prev)
> +{
> +    uint32_t pos = pci_pcie_cap(dev);
> +    uint8_t *pcie_cap = dev->config + pos;
> +    uint16_t sltctl = pci_get_word(pcie_cap + PCI_EXP_SLTCTL);
> +    uint16_t sltsta = pci_get_word(pcie_cap + PCI_EXP_SLTSTA);
> +
> +    PCIE_DEV_PRINTF(dev,
> +                    "addr: 0x%x val: 0x%x len: %d\n"
> +                    "\tsltctl_prev: 0x%02x sltctl: 0x%02x sltsta 0x%02x\n",
> +                    addr, val, len, sltctl_prev, sltctl, sltsta);
> +    /* SLTSTA: process SLTSTA before SLTCTL to avoid spurious interrupt */
> +    if (ranges_overlap(addr, len, pos + PCI_EXP_SLTSTA, 2)) {
> +        sltsta = pci_get_word(pcie_cap + PCI_EXP_SLTSTA);
> +
> +        /* write to stlsta results in clearing bits,
> +           so new interrupts won't be generated. */
> +        PCIE_DEV_PRINTF(dev, "sltsta -> 0x%02x\n", sltsta);
> +    }
> +
> +    /* SLTCTL */
> +    if (ranges_overlap(addr, len, pos + PCI_EXP_SLTCTL, 2)) {
> +        PCIE_DEV_PRINTF(dev, "sltctl: 0x%02x -> 0x%02x\n",
> +                        sltctl_prev, sltctl);
> +        if (pci_shift_word(addr, val, pos + PCI_EXP_SLTCTL) &

This is too complex. Just make the bit writeable,
test and clear, then change status.

> +            PCI_EXP_SLTCTL_EIC) {
> +            /* toggle PCI_EXP_SLTSTA_EIS */
> +            sltsta = (sltsta & ~PCI_EXP_SLTSTA_EIS) |
> +                ((sltsta ^ PCI_EXP_SLTSTA_EIS) & PCI_EXP_SLTSTA_EIS);

You mean
		sltsta ^= PCI_EXP_SLTSTA_EIS
?

> +            pci_set_word(pcie_cap + PCI_EXP_SLTSTA, sltsta);
> +            PCIE_DEV_PRINTF(dev, "PCI_EXP_SLTCTL_EIC: sltsta -> 0x%02x\n",
> +                            sltsta);
> +        }
> +
> +        if (sltctl & PCI_EXP_SLTCTL_HPIE) {
> +            bool trigger;
> +            int level;
> +
> +            if (((sltctl_prev ^ sltctl) & sltctl) & PCI_EXP_HP_EV_SUPPORTED) {
kill extra () they are not helpful:
            if ((sltctl_prev ^ sltctl) & sltctl & PCI_EXP_HP_EV_SUPPORTED)


> +                /* 0 -> 1 */
> +                trigger = true;
> +            } else {
> +                trigger = false;
> +            }
> +            if ((sltctl & sltsta) & PCI_EXP_HP_EV_SUPPORTED) {

kill extra () they are not helpful

> +                level = 1;
> +            } else {
> +                level = 0;
> +            }

What is this trying to implement?


> +            pcie_notify(dev, pcie_cap_flags_get_vector(dev), trigger, level);
> +        }
> +
> +        if (!((sltctl_prev ^ sltctl) & PCI_EXP_SLTCTL_SUPPORTED)) {
> +            PCIE_DEV_PRINTF(dev,
> +                            "sprious command completion slctl 0x%x -> 0x%x\n",
> +                            sltctl_prev, sltctl);
> +        }
> +
> +        /* command completion.
> +         * Real hardware might take a while to complete
> +         * requested command because physical movement would be involved
> +         * like locking the electromechanical lock.
> +         * However in our case, command is completed instantaneously above,
> +         * so send a command completion event right now.
> +         */
> +        /* set command completed bit */
> +        pcie_cap_slot_event(dev, PCI_EXP_HP_EV_CCI, 0);
> +    }
> +}
> +
> +void pcie_cap_slot_push_attention_button(PCIDevice *dev)
> +{
> +    pcie_cap_slot_event(dev, PCI_EXP_HP_EV_ABP, 0);
> +}
> +
> +/* root control/capabilities/status. PME isn't emulated for now */
> +void pcie_cap_root_init(PCIDevice *dev)
> +{
> +    uint8_t pos = pci_pcie_cap(dev);
> +    pci_set_word(dev->wmask + pos + PCI_EXP_RTCTL,
> +                 PCI_EXP_RTCTL_SECEE | PCI_EXP_RTCTL_SENFEE |
> +                 PCI_EXP_RTCTL_SEFEE);
> +}
> +
> +void pcie_cap_root_reset(PCIDevice *dev)
> +{
> +    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
> +    pci_set_word(pcie_cap + PCI_EXP_RTCTL, 0);
> +}
> +
> +/* function level reset(FLR) */
> +void pcie_cap_flr_init(PCIDevice *dev, pcie_flr_fn flr)
> +{
> +    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
> +    pci_set_word(pcie_cap + PCI_EXP_DEVCAP,
> +                 pci_get_word(pcie_cap + PCI_EXP_DEVCAP) | PCI_EXP_DEVCAP_FLR);
> +    dev->exp.flr = flr;
> +}
> +
> +void pcie_cap_flr_write_config(PCIDevice *dev,
> +                               uint32_t addr, uint32_t val, int len)
> +{
> +    uint32_t pos = pci_pcie_cap(dev);
> +    if (ranges_overlap(addr, len, pos + PCI_EXP_DEVCTL, 2)) {
> +        uint16_t val16 = pci_shift_word(addr, val, pos + PCI_EXP_DEVCTL);
> +        if ((val16 & PCI_EXP_DEVCTL_BCR_FLR) && dev->exp.flr) {
> +            dev->exp.flr(dev);
> +        }
> +    }

Just make FLR writeable, and clear it after calling reset.
This will also make it possible for devices to detect
that they are reset because of FLR.

> +}
> +
> +/* Alternative Routing-ID Interpretation (ARI) */
> +/* ari forwarding support for down stream port */
> +void pcie_cap_ari_init(PCIDevice *dev)
> +{
> +    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
> +    uint8_t *pcie_wmask = dev->wmask + pci_pcie_cap(dev);
> +
> +    pci_set_long(pcie_cap + PCI_EXP_DEVCAP2,
> +                 pci_get_long(pcie_cap + PCI_EXP_DEVCAP2) |
> +                 PCI_EXP_DEVCAP2_ARI);
> +
> +    pci_set_long(pcie_wmask + PCI_EXP_DEVCTL2,
> +                 pci_get_long(pcie_wmask + PCI_EXP_DEVCTL2) |
> +                 PCI_EXP_DEVCTL2_ARI);
> +}
> +
> +void pcie_cap_ari_reset(PCIDevice *dev)
> +{
> +    uint8_t *pcie_cap = dev->config + pci_pcie_cap(dev);
> +
> +    pci_set_long(pcie_cap + PCI_EXP_DEVCTL2,
> +                 pci_get_long(pcie_cap + PCI_EXP_DEVCTL2) &
> +                 ~PCI_EXP_DEVCTL2_ARI);
> +}
> +
> +bool pcie_cap_is_ari_enabled(const PCIDevice *dev)
> +{
> +    if (!pci_is_express(dev)) {
> +        return false;
> +    }
> +    if (!pci_pcie_cap(dev)) {
> +        return false;
> +    }
> +
> +    return pci_get_long(dev->config + pci_pcie_cap(dev) + PCI_EXP_DEVCTL2) &
> +        PCI_EXP_DEVCTL2_ARI;
> +}
> +
> +/**************************************************************************
> + * pci express extended capability allocation functions
> + * uint16_t ext_cap_id (16 bit)
> + * uint8_t cap_ver (4 bit)
> + * uint16_t cap_offset (12 bit)
> + * uint16_t ext_cap_size
> + */
> +
> +#define PCI_EXT_CAP_NO_ID       ((uint16_t)0)   /* 0 is reserved cap id.
> +                                                 * use internally to find the
> +                                                 * last capability in the
> +                                                 * linked list
> +                                                 */


better remove the macro and move the comment to where
it's used.

> +
> +static uint16_t pcie_find_capability_list(PCIDevice *dev, uint16_t cap_id,
> +                                          uint16_t *prev_p)
> +{
> +    uint16_t prev = 0;
> +    uint16_t next = PCI_CONFIG_SPACE_SIZE;
> +    uint32_t header = pci_get_long(dev->config + next);

next -> PCI_CONFIG_SPACE_SIZE in line above.

> +
> +    if (!header) {
> +        /* no extended capability */
> +        next = 0;
> +        goto out;
> +    }
> +
> +    while (next) {

will be clearer as a for loop?
	for (next = PCI_CONFIG_SPACE_SIZE; next;
	 (prev = next),(next = PCI_EXT_CAP_NEXT(header)))

> +        assert(next >= PCI_CONFIG_SPACE_SIZE);
> +        assert(next <= PCIE_CONFIG_SPACE_SIZE - 8);
> +
> +        header = pci_get_long(dev->config + next);
> +        if (PCI_EXT_CAP_ID(header) == cap_id) {
> +            break;
> +        }
> +        prev = next;
> +        next = PCI_EXT_CAP_NEXT(header);
> +    }
> +
> +out:
> +    if (prev_p) {
> +        *prev_p = prev;
> +    }
> +    return next;
> +}
> +
> +uint16_t pcie_find_capability(PCIDevice *dev, uint16_t cap_id)
> +{
> +    return pcie_find_capability_list(dev, cap_id, NULL);
> +}
> +
> +static void pcie_ext_cap_set_next(PCIDevice *dev, uint16_t pos, uint16_t next)
> +{
> +    uint16_t header = pci_get_long(dev->config + pos);
> +    assert(!(next & (PCI_EXT_CAP_ALIGN - 1)));
> +    header = (header & ~PCI_EXT_CAP_NEXT_MASK) |
> +        ((next << PCI_EXT_CAP_NEXT_SHIFT) & PCI_EXT_CAP_NEXT_MASK);
> +    pci_set_long(dev->config + pos, header);
> +}
> +
> +/*
> + * caller must supply valid (offset, size) * such that the range shouldn't
> + * overlap with other capability or other registers.
> + * This function doesn't check it.
> + */
> +void pcie_add_capability(PCIDevice *dev,
> +                         uint16_t cap_id, uint8_t cap_ver,
> +                         uint16_t offset, uint16_t size)
> +{
> +    uint32_t header;
> +    uint16_t next;
> +
> +    assert(offset >= PCI_CONFIG_SPACE_SIZE);
> +    assert(offset < offset + size);
> +    assert(offset + size < PCIE_CONFIG_SPACE_SIZE);
> +    assert(size >= 8);
> +    assert(pci_is_express(dev));
> +
> +    if (offset == PCI_CONFIG_SPACE_SIZE) {
> +        header = pci_get_long(dev->config + offset);
> +        next = PCI_EXT_CAP_NEXT(header);
> +    } else {
> +        uint16_t prev;
> +        next = pcie_find_capability_list(dev, PCI_EXT_CAP_NO_ID, &prev);
> +        assert(next == 0);
> +        pcie_ext_cap_set_next(dev, prev, offset);
> +    }
> +    pci_set_long(dev->config + offset, PCI_EXT_CAP(cap_id, cap_ver, next));
> +
> +    /* Make capability read-only by default */
> +    memset(dev->wmask + offset, 0, size);
> +    /* Check capability by default */
> +    memset(dev->cmask + offset, 0xFF, size);
> +}
> +
> +void pcie_del_capability(PCIDevice *dev, uint16_t cap_id, uint16_t size)
> +{
> +    uint16_t prev;
> +    uint16_t offset = pcie_find_capability_list(dev, cap_id, &prev);
> +    uint32_t header;
> +
> +    assert(offset >= PCI_CONFIG_SPACE_SIZE);

Should be assert(offset). This is what we return on error, right?

> +    header = pci_get_long(dev->config + offset);
> +    if (prev) {
> +        pcie_ext_cap_set_next(dev, prev, PCI_EXT_CAP_NEXT(header));
> +    } else {
> +        /* move up next ext cap to PCI_CONFIG_SPACE_SIZE? */

Since we don't now, add assert that next is 0.

> +        assert(offset == PCI_CONFIG_SPACE_SIZE);
> +        pci_set_long(dev->config + offset,
> +                     PCI_EXT_CAP(0, 0, PCI_EXT_CAP_NEXT(header)));
> +    }
> +
> +    /* Make those registers read-only reserved zero */

So you make them readonly in both add and delete?
delete should revert add: let's put the
masks back the way they were: writeable.

> +    memset(dev->config + offset, 0, size);
> +    memset(dev->wmask + offset, 0, size);
> +    /* Clear cmask as device-specific registers can't be checked */
> +    memset(dev->cmask + offset, 0, size);
> +}
> +
> +/**************************************************************************
> + * pci express extended capability helper functions
> + */
> +
> +/* ARI */
> +void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn)
> +{
> +    pcie_add_capability(dev, PCI_EXT_CAP_ID_ARI, PCI_ARI_VER,
> +                        offset, PCI_ARI_SIZEOF);
> +    pci_set_long(dev->config + offset + PCI_ARI_CAP, PCI_ARI_CAP_NFN(nextfn));
> +}
> diff --git a/hw/pcie.h b/hw/pcie.h
> new file mode 100644
> index 0000000..37713dc
> --- /dev/null
> +++ b/hw/pcie.h
> @@ -0,0 +1,96 @@
> +/*
> + * pcie.h
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef QEMU_PCIE_H
> +#define QEMU_PCIE_H
> +
> +#include "hw.h"
> +#include "pcie_regs.h"
> +
> +enum PCIExpressIndicator {
> +    /* for attention and power indicator */
> +    PCI_EXP_HP_IND_RESERVED     = PCI_EXP_SLTCTL_IND_RESERVED,
> +    PCI_EXP_HP_IND_ON           = PCI_EXP_SLTCTL_IND_ON,
> +    PCI_EXP_HP_IND_BLINK        = PCI_EXP_SLTCTL_IND_BLINK,
> +    PCI_EXP_HP_IND_OFF          = PCI_EXP_SLTCTL_IND_OFF,
> +};
> +
> +enum PCIExpressHotPlugEvent {

this is not how we name types, I think?
Either typedef enum {} PCIExpressHotPlugEvent, or
enum pcie_hot_plug_event { }.
Same for other types.

> +    /* the bits match the bits in Slot Control/Status registers.
> +     * PCI_EXP_HP_EV_xxx = PCI_EXP_SLTCTL_xxxE = PCI_EXP_SLTSTA_xxx

Is it important that they match?
We don't assume this in code, do we?

> +     */
> +    PCI_EXP_HP_EV_ABP           = 0x01,         /* attention button preseed */

typo

> +    PCI_EXP_HP_EV_PDC           = 0x08,         /* presence detect changed */
> +    PCI_EXP_HP_EV_CCI           = 0x10,         /* command completed */
> +
> +    PCI_EXP_HP_EV_SUPPORTED     = 0x19,         /* supported event mask  */

Gave me pause until I saw
	PCI_EXP_HP_EV_SUPPORTED = (PCI_EXP_HP_EV_ABP | PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_CCI)
so make this explicit?

Also - non supported bits would always be readonly, right?
So why do we need this and all the masking?
I think we should be able to get away with checking
the whole register is not 0, and get rid of this?


> +    /* events not listed aren't supported */
> +};
> +
> +typedef void (*pcie_flr_fn)(PCIDevice *dev);

Is flr special?  Can't we use the generic reset handlers?
If not why?

> +
> +struct PCIExpressDevice {
> +    /* Offset of express capability in config space */
> +    uint8_t exp_cap;
> +
> +    /* FLR */
> +    pcie_flr_fn flr;
> +};
> +
> +void pcie_notify(PCIDevice *dev, uint16_t vector, bool trigger, int level);
> +
> +/* PCI express capability helper functions */
> +int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port);
> +void pcie_cap_exit(PCIDevice *dev);
> +uint8_t pcie_cap_get_type(const PCIDevice *dev);
> +void pcie_cap_flags_set_vector(PCIDevice *dev, uint8_t vector);
> +uint8_t pcie_cap_flags_get_vector(PCIDevice *dev);
> +
> +void pcie_cap_deverr_init(PCIDevice *dev);
> +void pcie_cap_deverr_reset(PCIDevice *dev);
> +
> +void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot);
> +void pcie_cap_slot_reset(PCIDevice *dev);
> +void pcie_cap_slot_write_config(PCIDevice *dev,
> +                                uint32_t addr, uint32_t val, int len,
> +                                uint16_t sltctl_prev);
> +void pcie_cap_slot_push_attention_button(PCIDevice *dev);
> +
> +void pcie_cap_root_init(PCIDevice *dev);
> +void pcie_cap_root_reset(PCIDevice *dev);
> +
> +void pcie_cap_flr_init(PCIDevice *dev, pcie_flr_fn flr);
> +void pcie_cap_flr_write_config(PCIDevice *dev,
> +                           uint32_t addr, uint32_t val, int len);
> +
> +void pcie_cap_ari_init(PCIDevice *dev);
> +void pcie_cap_ari_reset(PCIDevice *dev);
> +bool pcie_cap_is_ari_enabled(const PCIDevice *dev);
> +
> +/* PCI express extended capability helper functions */
> +uint16_t pcie_find_capability(PCIDevice *dev, uint16_t cap_id);
> +void pcie_add_capability(PCIDevice *dev,
> +                         uint16_t cap_id, uint8_t cap_ver,
> +                         uint16_t offset, uint16_t size);
> +void pcie_del_capability(PCIDevice *dev, uint16_t cap_id, uint16_t size);
> +
> +void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
> +
> +#endif /* QEMU_PCIE_H */
> diff --git a/qemu-common.h b/qemu-common.h
> index d735235..6d9ee26 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -219,6 +219,7 @@ typedef struct PCIHostState PCIHostState;
>  typedef struct PCIExpressHost PCIExpressHost;
>  typedef struct PCIBus PCIBus;
>  typedef struct PCIDevice PCIDevice;
> +typedef struct PCIExpressDevice PCIExpressDevice;
>  typedef struct PCIBridge PCIBridge;
>  typedef struct SerialState SerialState;
>  typedef struct IRQState *qemu_irq;
> -- 
> 1.7.1.1

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 03/13] pci: introduce helper function pci_shift_word/long which returns shifted value.
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 03/13] pci: introduce helper function pci_shift_word/long which returns shifted value Isaku Yamahata
@ 2010-09-15 12:49   ` Michael S. Tsirkin
  2010-09-19  4:13     ` Isaku Yamahata
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-15 12:49 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 15, 2010 at 02:38:16PM +0900, Isaku Yamahata wrote:
> introduce helper function pci_shift_{word, long}() which returns
> returns shifted word/long of given position and range.
> They will be used later.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>

So I think the reason you *think* you need these
is because you set the wmask wrong: you make all
capability readonly and then write a ton of
custom code to tweak it. Instead,
Make writeable registers writeable, even if
read always returns 0.
Then in your handler do
if (dev->config[offset] & mask) {
	handle bit write
	dev->config[offset] &= ~mask;
}

no range checks necessary.

If you need to do something on register change,
just keep the old state in your structure,
then

	dev->exp.a != dev->config[offset]
tells you there was a change.


BTW if you need to do this to words or longs,
not just bytes, maybe pci_set_word/pci_clear_word would be helpful?

> ---
>  hw/pci.h |   19 +++++++++++++++++++
>  1 files changed, 19 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pci.h b/hw/pci.h
> index f4ea97a..630631b 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -327,6 +327,25 @@ pci_config_set_interrupt_pin(uint8_t *pci_config, uint8_t val)
>      pci_set_byte(&pci_config[PCI_INTERRUPT_PIN], val);
>  }
>  
> +static inline uint32_t
> +pci_shift_long(uint32_t addr, uint32_t val, uint32_t pos)
> +{
> +    if (addr >= pos) {
> +        assert(addr - pos <= 32 / 8);
> +        val <<= (addr - pos) * 8;
> +    } else {
> +        assert(pos - addr <= 32 / 8);
> +        val >>= (pos - addr) * 8;
> +    }
> +    return val;
> +}
> +
> +static inline uint16_t
> +pci_shift_word(uint32_t addr, uint32_t val, uint32_t pos)
> +{
> +    return pci_shift_long(addr, val, pos);
> +}
> +
>  typedef int (*pci_qdev_initfn)(PCIDevice *dev);
>  typedef struct {
>      DeviceInfo qdev;
> -- 
> 1.7.1.1

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 01/13] msi: implemented msi.
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 01/13] msi: implemented msi Isaku Yamahata
@ 2010-09-15 13:03   ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-15 13:03 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 15, 2010 at 02:38:14PM +0900, Isaku Yamahata wrote:
> implemented msi support functions.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> 
> ---
> Changes v2 -> v3:
> - improved comment wording.
> - simplified shift/ffs dance.
> 
> Changes v1 -> v2:
> - opencode some oneline helper function/macros for readability
> - use ffs where appropriate
> - rename some functions/variables as suggested.
> - added assert()
> - 1 -> 1U
> - clear INTx# when MSI is enabled
> - clear pending bits for freed vectors.
> - check the requested number of vectors.
> ---
>  Makefile.objs |    2 +-
>  hw/msi.c      |  358 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/msi.h      |   41 +++++++
>  hw/pci.h      |   10 +-
>  4 files changed, 407 insertions(+), 4 deletions(-)
>  create mode 100644 hw/msi.c
>  create mode 100644 hw/msi.h
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index 594894b..5f5a4c5 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
>  # PCI watchdog devices
>  hw-obj-y += wdt_i6300esb.o
>  
> -hw-obj-y += msix.o
> +hw-obj-y += msix.o msi.o
>  
>  # PCI network cards
>  hw-obj-y += ne2000.o
> diff --git a/hw/msi.c b/hw/msi.c
> new file mode 100644
> index 0000000..65c163f
> --- /dev/null
> +++ b/hw/msi.c
> @@ -0,0 +1,358 @@
> +/*
> + * msi.c
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "msi.h"
> +
> +/* Eventually those constants should go to Linux pci_regs.h */
> +#define PCI_MSI_PENDING_32      0x10
> +#define PCI_MSI_PENDING_64      0x14
> +
> +/* PCI_MSI_ADDRESS_LO */
> +#define PCI_MSI_ADDRESS_LO_MASK         (~0x3)
> +
> +/* If we get rid of cap allocator, we won't need those. */
> +#define PCI_MSI_32_SIZEOF       0x0a
> +#define PCI_MSI_64_SIZEOF       0x0e
> +#define PCI_MSI_32M_SIZEOF      0x14
> +#define PCI_MSI_64M_SIZEOF      0x18
> +
> +/* If we get rid of cap allocator, we won't need this. */
> +static inline uint8_t msi_cap_sizeof(uint16_t flags)
> +{
> +    switch (flags & (PCI_MSI_FLAGS_MASKBIT | PCI_MSI_FLAGS_64BIT)) {
> +    case PCI_MSI_FLAGS_MASKBIT | PCI_MSI_FLAGS_64BIT:
> +        return PCI_MSI_64M_SIZEOF;
> +    case PCI_MSI_FLAGS_64BIT:
> +        return PCI_MSI_64_SIZEOF;
> +    case PCI_MSI_FLAGS_MASKBIT:
> +        return PCI_MSI_32M_SIZEOF;
> +    case 0:
> +        return PCI_MSI_32_SIZEOF;
> +    default:
> +        abort();
> +        break;
> +    }
> +    return 0;
> +}
> +
> +//#define MSI_DEBUG
> +
> +#ifdef MSI_DEBUG
> +# define MSI_DPRINTF(fmt, ...)                                          \
> +    fprintf(stderr, "%s:%d " fmt, __func__, __LINE__, ## __VA_ARGS__)
> +#else
> +# define MSI_DPRINTF(fmt, ...)  do { } while (0)
> +#endif
> +#define MSI_DEV_PRINTF(dev, fmt, ...)                                   \
> +    MSI_DPRINTF("%s:%x " fmt, (dev)->name, (dev)->devfn, ## __VA_ARGS__)
> +
> +static inline uint8_t msi_nr_vectors(uint16_t flags)
> +{
> +    return 1U <<
> +        ((flags & PCI_MSI_FLAGS_QSIZE) >> (ffs(PCI_MSI_FLAGS_QSIZE) - 1));
> +}
> +
> +static inline uint8_t msi_flags_off(const PCIDevice* dev)
> +{
> +    return dev->msi_cap + PCI_MSI_FLAGS;
> +}
> +
> +static inline uint8_t msi_address_lo_off(const PCIDevice* dev)
> +{
> +    return dev->msi_cap + PCI_MSI_ADDRESS_LO;
> +}
> +
> +static inline uint8_t msi_address_hi_off(const PCIDevice* dev)
> +{
> +    return dev->msi_cap + PCI_MSI_ADDRESS_HI;
> +}
> +
> +static inline uint8_t msi_data_off(const PCIDevice* dev, bool msi64bit)
> +{
> +    return dev->msi_cap + (msi64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32);
> +}
> +
> +static inline uint8_t msi_mask_off(const PCIDevice* dev, bool msi64bit)
> +{
> +    return dev->msi_cap + (msi64bit ? PCI_MSI_MASK_64 : PCI_MSI_MASK_32);
> +}
> +
> +static inline uint8_t msi_pending_off(const PCIDevice* dev, bool msi64bit)
> +{
> +    return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32);
> +}
> +
> +bool msi_enabled(const PCIDevice *dev)
> +{
> +    return msi_present(dev) &&
> +        (pci_get_word(dev->config + msi_flags_off(dev)) &
> +         PCI_MSI_FLAGS_ENABLE);
> +}
> +
> +int msi_init(struct PCIDevice *dev, uint8_t offset,
> +             uint8_t nr_vectors, bool msi64bit, bool msi_per_vector_mask)

I think that you want simply unsigned nr_vectors and unsigned vector here
and elsewhere (e.g. same for return type of nr_vectors_allocated).
There's no advanatage to u8 here that I can see: the value is not 0-255
as with offset, so it does not help readability, and it does make you
use weird macros to print values instead of plain %x.
Generally better use standard types when width is not relevant.
This makes it easier to notice where it *is*.
Likewise most functions here should just work with unsigned.
config handlers are an exception as they assume specific
signature and value must have specific size (32 bit) for things to work.


> +{
> +    uint8_t vectors_order;
> +    uint16_t flags;
> +    uint8_t cap_size;
> +    int config_offset;
> +    MSI_DEV_PRINTF(dev,
> +                   "init offset: 0x%"PRIx8" vector: %"PRId8
> +                   " 64bit %d mask %d\n",
> +                   offset, nr_vectors, msi64bit, msi_per_vector_mask);
> +
> +    assert(!(nr_vectors & (nr_vectors - 1)));   /* power of 2 */
> +    assert(nr_vectors > 0);
> +    assert(nr_vectors <= 32);   /* the nr of MSI vectors is up to 32 */
> +    vectors_order = ffs(nr_vectors) - 1;
> +
> +    flags = vectors_order << (ffs(PCI_MSI_FLAGS_QMASK) - 1);
> +    if (msi64bit) {
> +        flags |= PCI_MSI_FLAGS_64BIT;
> +    }
> +    if (msi_per_vector_mask) {
> +        flags |= PCI_MSI_FLAGS_MASKBIT;
> +    }
> +
> +    cap_size = msi_cap_sizeof(flags);
> +    config_offset = pci_add_capability(dev, PCI_CAP_ID_MSI, offset, cap_size);
> +    if (config_offset < 0) {
> +        return config_offset;
> +    }
> +
> +    dev->msi_cap = config_offset;
> +    dev->cap_present |= QEMU_PCI_CAP_MSI;
> +
> +    pci_set_word(dev->config + msi_flags_off(dev), flags);
> +    pci_set_word(dev->wmask + msi_flags_off(dev),
> +                 PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
> +    pci_set_long(dev->wmask + msi_address_lo_off(dev),
> +                 PCI_MSI_ADDRESS_LO_MASK);
> +    if (msi64bit) {
> +        pci_set_long(dev->wmask + msi_address_hi_off(dev), 0xffffffff);
> +    }
> +    pci_set_word(dev->wmask + msi_data_off(dev, msi64bit), 0xffff);
> +
> +    if (msi_per_vector_mask) {
> +        pci_set_long(dev->wmask + msi_mask_off(dev, msi64bit),
> +                     (1U << nr_vectors) - 1);

this seems wrong. shift by 32 is undefined in C, isn't it?
You want this I think:
	0xffffffff >> (32 - nr_vectors)

> +    }
> +    return config_offset;
> +}
> +
> +void msi_uninit(struct PCIDevice *dev)
> +{
> +    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
> +    uint8_t cap_size = msi_cap_sizeof(flags);
> +    pci_del_capability(dev, PCI_CAP_ID_MSIX, cap_size);
> +    MSI_DEV_PRINTF(dev, "uninit\n");
> +}
> +
> +void msi_reset(PCIDevice *dev)
> +{
> +    uint16_t flags;
> +    bool msi64bit;
> +
> +    flags = pci_get_word(dev->config + msi_flags_off(dev));
> +    flags &= ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
> +    msi64bit = flags & PCI_MSI_FLAGS_64BIT;
> +
> +    pci_set_word(dev->config + msi_flags_off(dev), flags);
> +    pci_set_long(dev->config + msi_address_lo_off(dev), 0);
> +    if (msi64bit) {
> +        pci_set_long(dev->config + msi_address_hi_off(dev), 0);
> +    }
> +    pci_set_word(dev->config + msi_data_off(dev, msi64bit), 0);
> +    if (flags & PCI_MSI_FLAGS_MASKBIT) {
> +        pci_set_long(dev->config + msi_mask_off(dev, msi64bit), 0);
> +        pci_set_long(dev->config + msi_pending_off(dev, msi64bit), 0);
> +    }
> +    MSI_DEV_PRINTF(dev, "reset\n");
> +}
> +
> +static bool msi_is_masked(const PCIDevice *dev, uint8_t vector)
> +{
> +    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
> +    uint32_t mask;
> +
> +    if (!(flags & PCI_MSI_FLAGS_MASKBIT)) {
> +        return false;
> +    }
> +
> +    mask = pci_get_long(dev->config +
> +                        msi_mask_off(dev, flags & PCI_MSI_FLAGS_64BIT));
> +    return mask & (1U << vector);
> +}
> +
> +static void msi_set_pending(PCIDevice *dev, uint8_t vector)
> +{
> +    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
> +    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
> +    uint32_t pending;
> +
> +    assert(flags & PCI_MSI_FLAGS_MASKBIT);
> +
> +    pending = pci_get_long(dev->config + msi_pending_off(dev, msi64bit));
> +    pending |= 1U << vector;
> +    pci_set_long(dev->config + msi_pending_off(dev, msi64bit), pending);
> +    MSI_DEV_PRINTF(dev, "pending vector 0x%"PRIx8"\n", vector);
> +}
> +
> +void msi_notify(PCIDevice *dev, uint8_t vector)
> +{
> +    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
> +    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
> +    uint8_t nr_vectors = msi_nr_vectors(flags);
> +    uint64_t address;
> +    uint32_t data;
> +
> +    assert(vector < nr_vectors);
> +    if (msi_is_masked(dev, vector)) {
> +        msi_set_pending(dev, vector);
> +        return;
> +    }
> +
> +    if (msi64bit){
> +        address = pci_get_quad(dev->config + msi_address_lo_off(dev));
> +    } else {
> +        address = pci_get_long(dev->config + msi_address_lo_off(dev));
> +    }
> +
> +    /* upper bit 31:16 is zero */
> +    data = pci_get_word(dev->config + msi_data_off(dev, msi64bit));
> +    if (nr_vectors > 1) {
> +        data &= ~(nr_vectors - 1);
> +        data |= vector;
> +    }
> +
> +    MSI_DEV_PRINTF(dev,
> +                   "notify vector 0x%"PRIx8
> +                   " address: 0x%"PRIx64" data: 0x%"PRIx32"\n",
> +                   vector, address, data);
> +    stl_phys(address, data);
> +}
> +
> +/* call this function after updating configs by pci_default_write_config(). */
> +void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
> +{
> +    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
> +    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
> +    bool msi_per_vector_mask = flags & PCI_MSI_FLAGS_MASKBIT;
> +    uint8_t nr_vectors;
> +    uint8_t log_num_vecs;
> +    uint8_t log_max_vecs;
> +    uint8_t vector;
> +    uint32_t pending;
> +    int i;
> +
> +#ifdef MSI_DEBUG
> +    if (ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
> +        MSI_DEV_PRINTF(dev, "addr 0x%"PRIx32" val 0x%"PRIx32" len %d\n",
> +                       addr, val, len);
> +        MSI_DEV_PRINTF(dev, "ctrl: 0x%"PRIx16" address: 0x%"PRIx32,
> +                       flags,
> +                       pci_get_long(dev->config + msi_address_lo_off(dev)));
> +        if (msi64bit) {
> +            fprintf(stderr, " addrss-hi: 0x%"PRIx32,
> +                    pci_get_long(dev->config + msi_address_hi_off(dev)));
> +        }
> +        fprintf(stderr, " data: 0x%"PRIx16,
> +                pci_get_word(dev->config + msi_data_off(dev, msi64bit)));
> +        if (flags & PCI_MSI_FLAGS_MASKBIT) {
> +            fprintf(stderr, " mask 0x%"PRIx32" pending 0x%"PRIx32,
> +                    pci_get_long(dev->config + msi_mask_off(dev, msi64bit)),
> +                    pci_get_long(dev->config + msi_pending_off(dev, msi64bit)));
> +        }
> +        fprintf(stderr, "\n");
> +    }
> +#endif
> +
> +    /* Are we modified? */
> +    if (!(ranges_overlap(addr, len, msi_flags_off(dev), 2) ||
> +          (msi_per_vector_mask &&
> +           ranges_overlap(addr, len, msi_mask_off(dev, msi64bit), 4)))) {
> +        return;
> +    }
> +
> +    if (!(flags & PCI_MSI_FLAGS_ENABLE)) {
> +        return;
> +    }
> +
> +    /*
> +     * Now MSI is enabled, clear INTx# interrupts.
> +     * the driver is prohibited from writing enable bit to mask
> +     * a service request. But the guest OS could do this.
> +     * So we just discard the interrupts as moderate fallback.
> +     *
> +     * 6.8.3.3. Enabling Operation
> +     *   While enabled for MSI or MSI-X operation, a function is prohibited
> +     *   from using its INTx# pin (if implemented) to request
> +     *   service (MSI, MSI-X, and INTx# are mutually exclusive).
> +     */
> +    for (i = 0; i < PCI_NUM_PINS; ++i) {
> +        qemu_set_irq(dev->irq[i], 0);
> +    }
> +
> +    /*
> +     * nr_vectors might be set bigger than capable. So clamp it.
> +     * This is not legal by spec, so we can do anything we like,
> +     * just don't crash the host
> +     */
> +    log_num_vecs =
> +        (flags & PCI_MSI_FLAGS_QSIZE) >> (ffs(PCI_MSI_FLAGS_QSIZE) - 1);
> +    log_max_vecs =
> +        (flags & PCI_MSI_FLAGS_QMASK) >> (ffs(PCI_MSI_FLAGS_QMASK) - 1);
> +    if (log_num_vecs > log_max_vecs) {
> +        flags &= ~PCI_MSI_FLAGS_QSIZE;
> +        flags |= log_max_vecs << (ffs(PCI_MSI_FLAGS_QSIZE) - 1);
> +        pci_set_word(dev->config + msi_flags_off(dev), flags);
> +    }
> +
> +    if (!msi_per_vector_mask) {
> +        /* if per vector masking isn't supported,
> +           there is no pending interrupt. */
> +        return;
> +    }
> +
> +    nr_vectors = msi_nr_vectors(flags);
> +
> +    /* This will discard pending interrupts, if any. */
> +    pending = pci_get_long(dev->config + msi_pending_off(dev, msi64bit));
> +    pending &= (1U << nr_vectors) - 1;

as above, this is wrong for nr_vectors == 32
You want:
	0xffffffff >> nr_vectors

> +    pci_set_long(dev->config + msi_pending_off(dev, msi64bit), pending);
> +
> +    /* deliver pending interrupts which are unmasked */
> +    for (vector = 0; vector < nr_vectors; ++vector) {
> +        if (msi_is_masked(dev, vector) || !(pending & (1U << vector))) {
> +            continue;
> +        }
> +
> +        pending &= ~(1U << vector);
> +        pci_set_long(dev->config + msi_pending_off(dev, msi64bit),
> +                     pending);
> +        msi_notify(dev, vector);
> +    }
> +}
> +
> +uint8_t msi_nr_vectors_allocated(const PCIDevice *dev)
> +{
> +    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
> +    return msi_nr_vectors(flags);
> +}
> diff --git a/hw/msi.h b/hw/msi.h
> new file mode 100644
> index 0000000..eac9c78
> --- /dev/null
> +++ b/hw/msi.h
> @@ -0,0 +1,41 @@
> +/*
> + * msi.h
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef QEMU_MSI_H
> +#define QEMU_MSI_H
> +
> +#include "qemu-common.h"
> +#include "pci.h"
> +
> +bool msi_enabled(const PCIDevice *dev);
> +int msi_init(struct PCIDevice *dev, uint8_t offset,
> +             uint8_t nr_vectors, bool msi64bit, bool msi_per_vector_mask);
> +void msi_uninit(struct PCIDevice *dev);
> +void msi_reset(PCIDevice *dev);
> +void msi_notify(PCIDevice *dev, uint8_t vector);
> +void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len);
> +uint8_t msi_nr_vectors_allocated(const PCIDevice *dev);
> +
> +static inline bool msi_present(const PCIDevice *dev)
> +{
> +    return dev->cap_present & QEMU_PCI_CAP_MSI;
> +}
> +
> +#endif /* QEMU_MSI_H */
> diff --git a/hw/pci.h b/hw/pci.h
> index 1c6075e..3879708 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -109,11 +109,12 @@ typedef struct PCIIORegion {
>  
>  /* Bits in cap_present field. */
>  enum {
> -    QEMU_PCI_CAP_MSIX = 0x1,
> -    QEMU_PCI_CAP_EXPRESS = 0x2,
> +    QEMU_PCI_CAP_MSI = 0x1,
> +    QEMU_PCI_CAP_MSIX = 0x2,
> +    QEMU_PCI_CAP_EXPRESS = 0x4,
>  
>      /* multifunction capable device */
> -#define QEMU_PCI_CAP_MULTIFUNCTION_BITNR        2
> +#define QEMU_PCI_CAP_MULTIFUNCTION_BITNR        3
>      QEMU_PCI_CAP_MULTIFUNCTION = (1 << QEMU_PCI_CAP_MULTIFUNCTION_BITNR),
>  };
>  
> @@ -168,6 +169,9 @@ struct PCIDevice {
>      /* Version id needed for VMState */
>      int32_t version_id;
>  
> +    /* Offset of MSI capability in config space */
> +    uint8_t msi_cap;
> +
>      /* Location of option rom */
>      char *romfile;
>      ram_addr_t rom_offset;
> -- 
> 1.7.1.1

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 03/13] pci: introduce helper function pci_shift_word/long which returns shifted value.
  2010-09-15 12:49   ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-09-19  4:13     ` Isaku Yamahata
  0 siblings, 0 replies; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-19  4:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 15, 2010 at 02:49:50PM +0200, Michael S. Tsirkin wrote:
> On Wed, Sep 15, 2010 at 02:38:16PM +0900, Isaku Yamahata wrote:
> > introduce helper function pci_shift_{word, long}() which returns
> > returns shifted word/long of given position and range.
> > They will be used later.
> > 
> > Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> 
> So I think the reason you *think* you need these
> is because you set the wmask wrong: you make all
> capability readonly and then write a ton of
> custom code to tweak it. Instead,
> Make writeable registers writeable, even if
> read always returns 0.
> Then in your handler do
> if (dev->config[offset] & mask) {
> 	handle bit write
> 	dev->config[offset] &= ~mask;
> }
> 
> no range checks necessary.
> 
> If you need to do something on register change,
> just keep the old state in your structure,
> then
> 
> 	dev->exp.a != dev->config[offset]
> tells you there was a change.

Okay, I'll give it a try.


> BTW if you need to do this to words or longs,
> not just bytes, maybe pci_set_word/pci_clear_word would be helpful?

Agreed.

thanks,

> > ---
> >  hw/pci.h |   19 +++++++++++++++++++
> >  1 files changed, 19 insertions(+), 0 deletions(-)
> > 
> > diff --git a/hw/pci.h b/hw/pci.h
> > index f4ea97a..630631b 100644
> > --- a/hw/pci.h
> > +++ b/hw/pci.h
> > @@ -327,6 +327,25 @@ pci_config_set_interrupt_pin(uint8_t *pci_config, uint8_t val)
> >      pci_set_byte(&pci_config[PCI_INTERRUPT_PIN], val);
> >  }
> >  
> > +static inline uint32_t
> > +pci_shift_long(uint32_t addr, uint32_t val, uint32_t pos)
> > +{
> > +    if (addr >= pos) {
> > +        assert(addr - pos <= 32 / 8);
> > +        val <<= (addr - pos) * 8;
> > +    } else {
> > +        assert(pos - addr <= 32 / 8);
> > +        val >>= (pos - addr) * 8;
> > +    }
> > +    return val;
> > +}
> > +
> > +static inline uint16_t
> > +pci_shift_word(uint32_t addr, uint32_t val, uint32_t pos)
> > +{
> > +    return pci_shift_long(addr, val, pos);
> > +}
> > +
> >  typedef int (*pci_qdev_initfn)(PCIDevice *dev);
> >  typedef struct {
> >      DeviceInfo qdev;
> > -- 
> > 1.7.1.1
> 

-- 
yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 05/13] pcie: helper functions for pcie capability and extended capability.
  2010-09-15 12:43   ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-09-19  4:56     ` Isaku Yamahata
  2010-09-19 11:45       ` Michael S. Tsirkin
  0 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-19  4:56 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 15, 2010 at 02:43:10PM +0200, Michael S. Tsirkin wrote:
> > +/***************************************************************************
> > + * pci express capability helper functions
> > + */
> > +void pcie_notify(PCIDevice *dev, uint16_t vector, bool trigger, int level)
> 
> Why is this not static? It makes sense for internal stuff possibly,
> but I think functions will need to know what to do: they can't
> treat msi/msix/irq identically anyway.

The aer code which I split out uses it.


> The API seems confusing, I think this is what is creating
> code for you. Specifically level = 0 does not notify at all.
> So I think we need two:
> 1. pcie_assert_interrupt which sends msi or sets level to 1
> 2. pcie_deassert_interrupt which sets level to 0, or nothing
>    for non msi.
> 
> Then below you can e.g.
> if (!sltctrl) {
> 	pcie_deassert(...);
> 	return;
> }

As I already mentioned in the other mail, when to assert MSI
can be different from INTx. The express spec utilizes it.
For example hot plug, aer, and so on.


> > +                    vector, trigger, level);
> > +    if (msix_enabled(dev)) {
> > +        if (trigger) {
> > +            msix_notify(dev, vector);
> > +        }
> > +    } else if (msi_enabled(dev)) {
> > +        if (trigger){
> > +            msi_notify(dev, vector);
> > +        }
> > +    } else {
> > +        qemu_set_irq(dev->irq[0], level);
> 
> always 0? really? This is INTA# - is this what the spec says?

It depends on each device implementation. So I just picked INTA#.
Okay, I'll make it customizable by using property.


> > +    }
> > +}
> > +
> > +int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
> > +{
> > +    int exp_cap;
> > +    uint8_t *pcie_cap;
> > +
> > +    assert(pci_is_express(dev));
> > +
> > +    exp_cap = pci_add_capability(dev, PCI_CAP_ID_EXP, offset,
> > +                                 PCI_EXP_VER2_SIZEOF);
> > +    if (exp_cap < 0) {
> > +        return exp_cap;
> > +    }
> > +    dev->exp.exp_cap = exp_cap;
> > +
> > +    /* already done in pci_qdev_init() */
> > +    assert(dev->cap_present & QEMU_PCI_CAP_EXPRESS);
> 
> Hmm. Why do we set this flag in qdev init but do the
> rest of it here?

pci_qdev_init() needs to know it for allocating config array.


> > +        return -EBUSY;
> > +    }
> > +
> > +    if (state) {
> > +        if (PCI_FUNC(pci_dev->devfn) == 0) {
> > +            /* event is per slot. Not per function
> > +             * only generates event for function = 0.
> > +             * When hot plug, populate functions > 0
> > +             * and then add function = 0 last.
> > +             */
> > +            pcie_cap_slot_event(d, PCI_EXP_HP_EV_PDC, PCI_EXP_SLTSTA_PDS);
> > +        }
> > +    } else {
> > +        PCIBridge *br;
> > +        PCIBus *bus;
> > +        DeviceState *next;
> > +        if (PCI_FUNC(pci_dev->devfn) != 0) {
> > +            /* event is per slot. Not per function.
> > +               accepts function = 0 only. */
> > +            return -EINVAL;
> 
> Can user or guest trigger this?
> If yes print an error.
> IF no, assert.

Not yet. When multi function device hot plug is supported in some sense,
it will be triggered in some way. The code is just place holder until then.
Some TODO comment should have been there.


> > +                level = 1;
> > +            } else {
> > +                level = 0;
> > +            }
> 
> What is this trying to implement?

hot plut event notification. Please refer to
6.7. PCI Express Hot-Plug Support
6.7.3. PCI Express Hot-Plug Events
6.7.3.4. Software Notification of Hot-Plug Events

> > +        assert(offset == PCI_CONFIG_SPACE_SIZE);
> > +        pci_set_long(dev->config + offset,
> > +                     PCI_EXT_CAP(0, 0, PCI_EXT_CAP_NEXT(header)));
> > +    }
> > +
> > +    /* Make those registers read-only reserved zero */
> 
> So you make them readonly in both add and delete?
> delete should revert add: let's put the
> masks back the way they were: writeable.

In fact zeroing in add is redundant, but I added it following msix code.

The usage model is
- At first the registers are unused, so should be read only zero.
- add capability
  (zeroing is redundant)
- the device specific code sets config/mask/cmask/w1cmask as it likes
- del capability
  This makes the register unused, i.e. read only zero.

Maybe it's possible to make zeroing them caller's responsible.


> > +    /* the bits match the bits in Slot Control/Status registers.
> > +     * PCI_EXP_HP_EV_xxx = PCI_EXP_SLTCTL_xxxE = PCI_EXP_SLTSTA_xxx
> 
> Is it important that they match?
> We don't assume this in code, do we?

Yes, we're using it when setting stlsta.


> > +    /* events not listed aren't supported */
> > +};
> > +
> > +typedef void (*pcie_flr_fn)(PCIDevice *dev);
> 
> Is flr special?  Can't we use the generic reset handlers?
> If not why?

Reset(cold reset/warm reset) in generic sense corresponds to
conventional reset in express sense which corresponds to PCI RST#.
On the other hand FLR is different from the conventional one.

Cited from the spec

6.6. PCI Express Reset - Rules
6.6.1. Conventional Reset
 Conventional Reset includes all reset mechanisms other than Function
 Level Reset.
6.6.2. Function-Level Reset (FLR)

Most devices would implement FLR as just calling something like
qdev_reset. But the spec differentiates FLR from conventional reset,
so the generic pcie layer should do.
-- 
yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 05/13] pcie: helper functions for pcie capability and extended capability.
  2010-09-19  4:56     ` Isaku Yamahata
@ 2010-09-19 11:45       ` Michael S. Tsirkin
  2010-09-24  2:24         ` Isaku Yamahata
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-19 11:45 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Sun, Sep 19, 2010 at 01:56:23PM +0900, Isaku Yamahata wrote:
> On Wed, Sep 15, 2010 at 02:43:10PM +0200, Michael S. Tsirkin wrote:
> > > +/***************************************************************************
> > > + * pci express capability helper functions
> > > + */
> > > +void pcie_notify(PCIDevice *dev, uint16_t vector, bool trigger, int level)
> > 
> > Why is this not static? It makes sense for internal stuff possibly,
> > but I think functions will need to know what to do: they can't
> > treat msi/msix/irq identically anyway.
> 
> The aer code which I split out uses it.

Move it there?

> > The API seems confusing, I think this is what is creating
> > code for you. Specifically level = 0 does not notify at all.
> > So I think we need two:
> > 1. pcie_assert_interrupt which sends msi or sets level to 1
> > 2. pcie_deassert_interrupt which sets level to 0, or nothing
> >    for non msi.
> > 
> > Then below you can e.g.
> > if (!sltctrl) {
> > 	pcie_deassert(...);
> > 	return;
> > }
> 
> As I already mentioned in the other mail, when to assert MSI
> can be different from INTx. The express spec utilizes it.
> For example hot plug, aer, and so on.

My comment really has to do with the API.
The API that gets level and trigger values is confusing.
Two functions assert and deassert would be clearer.

> 
> > > +                    vector, trigger, level);
> > > +    if (msix_enabled(dev)) {
> > > +        if (trigger) {
> > > +            msix_notify(dev, vector);
> > > +        }
> > > +    } else if (msi_enabled(dev)) {
> > > +        if (trigger){
> > > +            msi_notify(dev, vector);
> > > +        }
> > > +    } else {
> > > +        qemu_set_irq(dev->irq[0], level);
> > 
> > always 0? really? This is INTA# - is this what the spec says?
> 
> It depends on each device implementation. So I just picked INTA#.
> Okay, I'll make it customizable by using property.

This might not even be static.
What if device can assert multiple INTx# pins?
I think a saner way is just to expose this in the API.

> 
> > > +    }
> > > +}
> > > +
> > > +int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
> > > +{
> > > +    int exp_cap;
> > > +    uint8_t *pcie_cap;
> > > +
> > > +    assert(pci_is_express(dev));
> > > +
> > > +    exp_cap = pci_add_capability(dev, PCI_CAP_ID_EXP, offset,
> > > +                                 PCI_EXP_VER2_SIZEOF);
> > > +    if (exp_cap < 0) {
> > > +        return exp_cap;
> > > +    }
> > > +    dev->exp.exp_cap = exp_cap;
> > > +
> > > +    /* already done in pci_qdev_init() */
> > > +    assert(dev->cap_present & QEMU_PCI_CAP_EXPRESS);
> > 
> > Hmm. Why do we set this flag in qdev init but do the
> > rest of it here?
> 
> pci_qdev_init() needs to know it for allocating config array.
> 

Can we do everything in qdev init function?

> > > +        return -EBUSY;
> > > +    }
> > > +
> > > +    if (state) {
> > > +        if (PCI_FUNC(pci_dev->devfn) == 0) {
> > > +            /* event is per slot. Not per function
> > > +             * only generates event for function = 0.
> > > +             * When hot plug, populate functions > 0
> > > +             * and then add function = 0 last.
> > > +             */
> > > +            pcie_cap_slot_event(d, PCI_EXP_HP_EV_PDC, PCI_EXP_SLTSTA_PDS);
> > > +        }
> > > +    } else {
> > > +        PCIBridge *br;
> > > +        PCIBus *bus;
> > > +        DeviceState *next;
> > > +        if (PCI_FUNC(pci_dev->devfn) != 0) {
> > > +            /* event is per slot. Not per function.
> > > +               accepts function = 0 only. */
> > > +            return -EINVAL;
> > 
> > Can user or guest trigger this?
> > If yes print an error.
> > IF no, assert.
> 
> Not yet. When multi function device hot plug is supported in some sense,
> it will be triggered in some way. The code is just place holder until then.
> Some TODO comment should have been there.

+ assert for now.

> 
> > > +                level = 1;
> > > +            } else {
> > > +                level = 0;
> > > +            }
> > 
> > What is this trying to implement?
> 
> hot plut event notification. Please refer to
> 6.7. PCI Express Hot-Plug Support
> 6.7.3. PCI Express Hot-Plug Events
> 6.7.3.4. Software Notification of Hot-Plug Events
> 
> > > +        assert(offset == PCI_CONFIG_SPACE_SIZE);
> > > +        pci_set_long(dev->config + offset,
> > > +                     PCI_EXT_CAP(0, 0, PCI_EXT_CAP_NEXT(header)));
> > > +    }
> > > +
> > > +    /* Make those registers read-only reserved zero */
> > 
> > So you make them readonly in both add and delete?
> > delete should revert add: let's put the
> > masks back the way they were: writeable.
> 
> In fact zeroing in add is redundant, but I added it following msix code.

It is not redundand there as registers are writeable by default:
    memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff,
           config_size - PCI_CONFIG_HEADER_SIZE);


> 
> The usage model is
> - At first the registers are unused, so should be read only zero.

You can't know they are unused: in PCI spec everything
outside capability list is vendor specific so it
is writeable to let drivers do their thing.

> - add capability
>   (zeroing is redundant)
> - the device specific code sets config/mask/cmask/w1cmask as it likes
> - del capability
>   This makes the register unused, i.e. read only zero.

as above.

> Maybe it's possible to make zeroing them caller's responsible.
> 
> 
> > > +    /* the bits match the bits in Slot Control/Status registers.
> > > +     * PCI_EXP_HP_EV_xxx = PCI_EXP_SLTCTL_xxxE = PCI_EXP_SLTSTA_xxx
> > 
> > Is it important that they match?
> > We don't assume this in code, do we?
> 
> Yes, we're using it when setting stlsta.

Then they should be defined one through the other,
or just use one set of macros and have a comment in code
explaining that layout is identical.

> 
> > > +    /* events not listed aren't supported */
> > > +};
> > > +
> > > +typedef void (*pcie_flr_fn)(PCIDevice *dev);
> > 
> > Is flr special?  Can't we use the generic reset handlers?
> > If not why?
> 
> Reset(cold reset/warm reset) in generic sense corresponds to
> conventional reset in express sense which corresponds to PCI RST#.
> On the other hand FLR is different from the conventional one.
> 
> Cited from the spec
> 
> 6.6. PCI Express Reset - Rules
> 6.6.1. Conventional Reset
>  Conventional Reset includes all reset mechanisms other than Function
>  Level Reset.
> 6.6.2. Function-Level Reset (FLR)
> 
> Most devices would implement FLR as just calling something like
> qdev_reset. But the spec differentiates FLR from conventional reset,
> so the generic pcie layer should do.

I am not sure I agree. If most devices don't care, or
behave almost identically, it's ok to just call qdev_reset:
devices can find out that FLR is in progress by looking
at the config space (bit will be set there) - and we can
add a helper pcie_flr_in_progress() to test it.

This way most devices will work out of box.
Only if behaviour is mostly different would it make sense
to have a completely separate reset path.
What happens with devices you implemented?

> -- 
> yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 04/13] pcie: add pcie constants to pcie_regs.h
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 04/13] pcie: add pcie constants to pcie_regs.h Isaku Yamahata
@ 2010-09-20 18:14   ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-20 18:14 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 15, 2010 at 02:38:17PM +0900, Isaku Yamahata wrote:
> add pcie constants to pcie_regs.h.
> Those constants should go to Linux pci_regs.h and then the file should
> go away eventually.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> ---
> Changes v2 -> v3:
> - moved out pcie constants from pcie.c to pcie_regs.h.
> - removed unused macros
> ---
>  hw/pcie_regs.h |  170 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 170 insertions(+), 0 deletions(-)
>  create mode 100644 hw/pcie_regs.h
> 
> diff --git a/hw/pcie_regs.h b/hw/pcie_regs.h
> new file mode 100644
> index 0000000..abd39ef
> --- /dev/null
> +++ b/hw/pcie_regs.h
> @@ -0,0 +1,170 @@
> +/*
> + * pcie.c
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.

I don't think we want these comments in this header.
The constants are from spec, we don't want to copyright them.
Copyright protecting headers seems generally not useful, and these notices do
take up the most valuable space in header: at the beginning.

> + */
> +#ifndef QEMU_PCIE_REGS_H
> +#define QEMU_PCIE_REGS_H
> +
> +/*
> + * TODO:
> + * Those constants and macros should be go to Linux pci_regs.h
> + * Once they're merged, they will go away.
> + */
> +
> +/* express capability */
> +
> +#define PCI_EXP_VER2_SIZEOF             0x3c /* express capability of ver. 2 */
> +#define PCI_EXT_CAP_VER_SHIFT           16
> +#define PCI_EXT_CAP_NEXT_SHIFT          20
> +#define PCI_EXT_CAP_NEXT_MASK           (0xffc << PCI_EXT_CAP_NEXT_SHIFT)
> +
> +#define PCI_EXT_CAP(id, ver, next)                                      \
> +    ((id) |                                                             \
> +     ((ver) << PCI_EXT_CAP_VER_SHIFT) |                                 \
> +     ((next) << PCI_EXT_CAP_NEXT_SHIFT))
> +
> +#define PCI_EXT_CAP_ALIGN               4
> +#define PCI_EXT_CAP_ALIGNUP(x)                                  \
> +    (((x) + PCI_EXT_CAP_ALIGN - 1) & ~(PCI_EXT_CAP_ALIGN - 1))
> +
> +/* PCI_EXP_FLAGS */
> +#define PCI_EXP_FLAGS_VER2              2 /* for now, supports only ver. 2 */
> +#define PCI_EXP_FLAGS_IRQ_SHIFT         (ffs(PCI_EXP_FLAGS_IRQ) - 1)
> +#define PCI_EXP_FLAGS_TYPE_SHIFT        (ffs(PCI_EXP_FLAGS_TYPE) - 1)
> +
> +
> +/* PCI_EXP_LINK{CAP, STA} */
> +/* link speed */
> +#define PCI_EXP_LNK_LS_25               1
> +
> +#define PCI_EXP_LNK_MLW_SHIFT           (ffs(PCI_EXP_LNKCAP_MLW) - 1)
> +#define PCI_EXP_LNK_MLW_1               (1 << PCI_EXP_LNK_MLW_SHIFT)
> +
> +/* PCI_EXP_LINKCAP */
> +#define PCI_EXP_LNKCAP_ASPMS_SHIFT      (ffs(PCI_EXP_LNKCAP_ASPMS) - 1)
> +#define PCI_EXP_LNKCAP_ASPMS_0S         (1 << PCI_EXP_LNKCAP_ASPMS_SHIFT)
> +
> +#define PCI_EXP_LNKCAP_PN_SHIFT         (ffs(PCI_EXP_LNKCAP_PN) - 1)
> +
> +#define PCI_EXP_SLTCAP_PSN_SHIFT        (ffs(PCI_EXP_SLTCAP_PSN) - 1)
> +
> +#define PCI_EXP_SLTCTL_IND_RESERVED     0x0
> +#define PCI_EXP_SLTCTL_IND_ON           0x1
> +#define PCI_EXP_SLTCTL_IND_BLINK        0x2
> +#define PCI_EXP_SLTCTL_IND_OFF          0x3
> +#define PCI_EXP_SLTCTL_AIC_SHIFT        (ffs(PCI_EXP_SLTCTL_AIC) - 1)
> +#define PCI_EXP_SLTCTL_AIC_OFF                          \
> +    (PCI_EXP_SLTCTL_IND_OFF << PCI_EXP_SLTCTL_AIC_SHIFT)
> +
> +#define PCI_EXP_SLTCTL_PIC_SHIFT        (ffs(PCI_EXP_SLTCTL_PIC) - 1)
> +#define PCI_EXP_SLTCTL_PIC_OFF                          \
> +    (PCI_EXP_SLTCTL_IND_OFF << PCI_EXP_SLTCTL_PIC_SHIFT)
> +
> +#define PCI_EXP_SLTCTL_SUPPORTED        \
> +            (PCI_EXP_SLTCTL_ABPE |      \
> +             PCI_EXP_SLTCTL_PDCE |      \
> +             PCI_EXP_SLTCTL_CCIE |      \
> +             PCI_EXP_SLTCTL_HPIE |      \
> +             PCI_EXP_SLTCTL_AIC |       \
> +             PCI_EXP_SLTCTL_PCC |       \
> +             PCI_EXP_SLTCTL_EIC)
> +
> +#define PCI_EXP_DEVCAP2_EFF             0x100000
> +#define PCI_EXP_DEVCAP2_EETLPP          0x200000
> +
> +#define PCI_EXP_DEVCTL2_EETLPPB         0x80
> +
> +/* ARI */
> +#define PCI_ARI_VER                     1
> +#define PCI_ARI_SIZEOF                  8
> +
> +/* AER */
> +#define PCI_ERR_VER                     2
> +#define PCI_ERR_SIZEOF                  0x48
> +
> +#define PCI_ERR_UNC_SDN                 0x00000020      /* surprise down */
> +#define PCI_ERR_UNC_ACSV                0x00200000      /* ACS Violation */
> +#define PCI_ERR_UNC_INTN                0x00400000      /* Internal Error */
> +#define PCI_ERR_UNC_MCBTLP              0x00800000      /* MC Blcoked TLP */
> +#define PCI_ERR_UNC_ATOP_EBLOCKED       0x01000000      /* atomic op egress blocked */
> +#define PCI_ERR_UNC_TLP_PRF_BLOCKED     0x02000000      /* TLP Prefix Blocked */
> +#define PCI_ERR_COR_ADV_NONFATAL        0x00002000      /* Advisory Non-Fatal */
> +#define PCI_ERR_COR_INTERNAL            0x00004000      /* Corrected Internal */
> +#define PCI_ERR_COR_HL_OVERFLOW         0x00008000      /* Header Long Overflow */
> +#define PCI_ERR_CAP_FEP_MASK            0x0000001f
> +#define PCI_ERR_CAP_MHRC                0x00000200
> +#define PCI_ERR_CAP_MHRE                0x00000400
> +#define PCI_ERR_CAP_TLP                 0x00000800
> +
> +#define PCI_ERR_TLP_PREFIX_LOG          0x38
> +
> +#define PCI_SEC_STATUS_RCV_SYSTEM_ERROR         0x4000
> +
> +/* aer root error command/status */
> +#define PCI_ERR_ROOT_CMD_EN_MASK        (PCI_ERR_ROOT_CMD_COR_EN |      \
> +                                         PCI_ERR_ROOT_CMD_NONFATAL_EN | \
> +                                         PCI_ERR_ROOT_CMD_FATAL_EN)
> +
> +#define PCI_ERR_ROOT_IRQ                0xf8000000
> +#define PCI_ERR_ROOT_IRQ_SHIFT          (ffs(PCI_ERR_ROOT_IRQ) - 1)
> +#define PCI_ERR_ROOT_STATUS_REPORT_MASK (PCI_ERR_ROOT_COR_RCV |         \
> +                                         PCI_ERR_ROOT_MULTI_COR_RCV |   \
> +                                         PCI_ERR_ROOT_UNCOR_RCV |       \
> +                                         PCI_ERR_ROOT_MULTI_UNCOR_RCV | \
> +                                         PCI_ERR_ROOT_FIRST_FATAL |     \
> +                                         PCI_ERR_ROOT_NONFATAL_RCV |    \
> +                                         PCI_ERR_ROOT_FATAL_RCV)
> +
> +#define PCI_ERR_UNC_SUPPORTED           (PCI_ERR_UNC_DLP |              \
> +                                         PCI_ERR_UNC_SDN |              \
> +                                         PCI_ERR_UNC_POISON_TLP |       \
> +                                         PCI_ERR_UNC_FCP |              \
> +                                         PCI_ERR_UNC_COMP_TIME |        \
> +                                         PCI_ERR_UNC_COMP_ABORT |       \
> +                                         PCI_ERR_UNC_UNX_COMP |         \
> +                                         PCI_ERR_UNC_RX_OVER |          \
> +                                         PCI_ERR_UNC_MALF_TLP |         \
> +                                         PCI_ERR_UNC_ECRC |             \
> +                                         PCI_ERR_UNC_UNSUP |            \
> +                                         PCI_ERR_UNC_ACSV |             \
> +                                         PCI_ERR_UNC_INTN |             \
> +                                         PCI_ERR_UNC_MCBTLP |           \
> +                                         PCI_ERR_UNC_ATOP_EBLOCKED |    \
> +                                         PCI_ERR_UNC_TLP_PRF_BLOCKED)
> +
> +#define PCI_ERR_UNC_SEVERITY_DEFAULT    (PCI_ERR_UNC_DLP |              \
> +                                         PCI_ERR_UNC_SDN |              \
> +                                         PCI_ERR_UNC_FCP |              \
> +                                         PCI_ERR_UNC_RX_OVER |          \
> +                                         PCI_ERR_UNC_MALF_TLP |         \
> +                                         PCI_ERR_UNC_INTN)
> +
> +#define PCI_ERR_COR_SUPPORTED           (PCI_ERR_COR_RCVR |             \
> +                                         PCI_ERR_COR_BAD_TLP |          \
> +                                         PCI_ERR_COR_BAD_DLLP |         \
> +                                         PCI_ERR_COR_REP_ROLL |         \
> +                                         PCI_ERR_COR_REP_TIMER |        \
> +                                         PCI_ERR_COR_ADV_NONFATAL |     \
> +                                         PCI_ERR_COR_INTERNAL |         \
> +                                         PCI_ERR_COR_HL_OVERFLOW)
> +
> +#define PCI_ERR_COR_MASK_DEFAULT        (PCI_ERR_COR_ADV_NONFATAL |     \
> +                                         PCI_ERR_COR_INTERNAL |         \
> +                                         PCI_ERR_COR_HL_OVERFLOW)
> +
> +#endif /* QEMU_PCIE_REGS_H */
> -- 
> 1.7.1.1

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 00/13]  pcie port switch emulators
  2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
                   ` (12 preceding siblings ...)
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 13/13] msix: clear not only INTA, but all INTx when MSI-X is enabled Isaku Yamahata
@ 2010-09-20 18:18 ` Michael S. Tsirkin
  13 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-20 18:18 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 15, 2010 at 02:38:13PM +0900, Isaku Yamahata wrote:
> Here is v3 of the patch series.
> I didn't address the pcie_init() issue yet with v3 because
> there are already many changes. So I'd like to get feed back
> before going too far. The issue would be addressed with the next spin
> if necessary.
> 
>   pci: implement RW1C register framework.
>   msix: clear not only INTA, but all INTx when MSI-X is enabled.

Applied these two.
Thanks!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 09/13] pcie upstream port: pci express switch upstream port.
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 09/13] pcie upstream port: pci express switch upstream port Isaku Yamahata
@ 2010-09-22 11:22   ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-22 11:22 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 15, 2010 at 02:38:22PM +0900, Isaku Yamahata wrote:
> pci express switch upstream port.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> ---
> Changes v2 -> v3:
> - compilation adjustment.


This is in fact a specific upstream port, isn't it?
If so rename all PCIE names here to specific port model?
Also, this is small enough to avoid splitting it out
from the switch code: if we don't, we won't have
APIs like pcie_upstream_init which is only useful
as part of switch.

> ---
>  Makefile.objs      |    2 +-
>  hw/pcie_upstream.c |  200 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/pcie_upstream.h |   32 ++++++++
>  3 files changed, 233 insertions(+), 1 deletions(-)
>  create mode 100644 hw/pcie_upstream.c
>  create mode 100644 hw/pcie_upstream.h
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index 7e81b57..72ca8be 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -139,7 +139,7 @@ user-obj-y += cutils.o cache-utils.o
>  hw-obj-y =
>  hw-obj-y += vl.o loader.o
>  hw-obj-y += virtio.o virtio-console.o
> -hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o
> +hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o pcie_upstream.o
>  hw-obj-y += watchdog.o
>  hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
>  hw-obj-$(CONFIG_ECC) += ecc.o
> diff --git a/hw/pcie_upstream.c b/hw/pcie_upstream.c
> new file mode 100644
> index 0000000..a08fce1
> --- /dev/null
> +++ b/hw/pcie_upstream.c
> @@ -0,0 +1,200 @@
> +/*
> + * pcie_upstream.c
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "pci_ids.h"
> +#include "msi.h"
> +#include "pcie.h"
> +#include "pcie_upstream.h"
> +
> +/* For now, TI XIO3130 is borrowed. need to get its own id? */
> +#define PCI_DEVICE_ID_TI_XIO3130U       0x8232  /* upstream port */
> +#define XIO3130_REVISION                0x2
> +#define XIO3130_MSI_OFFSET              0x70
> +#define XIO3130_MSI_SUPPORTED_FLAGS     PCI_MSI_FLAGS_64BIT
> +#define XIO3130_MSI_NR_VECTOR           1
> +#define XIO3130_SSVID_OFFSET            0x80
> +#define XIO3130_SSVID_SVID              0
> +#define XIO3130_SSVID_SSID              0
> +#define XIO3130_EXP_OFFSET              0x90
> +#define XIO3130_AER_OFFSET              0x100
> +
> +#define PCIE_UPSTREAM_VID               PCI_VENDOR_ID_TI
> +#define PCIE_UPSTREAM_DID               PCI_DEVICE_ID_TI_XIO3130U
> +#define PCIE_UPSTREAM_REVISION          XIO3130_REVISION
> +#define PCIE_UPSTREAM_MSI_SUPPORTED_FLAGS       XIO3130_MSI_SUPPORTED_FLAGS
> +#define PCIE_UPSTREAM_MSI_NR_VECTOR     XIO3130_MSI_NR_VECTOR
> +#define PCIE_UPSTREAM_MSI_OFFSET        XIO3130_MSI_OFFSET
> +#define PCIE_UPSTREAM_SSVID_OFFSET      XIO3130_SSVID_OFFSET
> +#define PCIE_UPSTREAM_SVID              XIO3130_SSVID_SVID
> +#define PCIE_UPSTREAM_SSID              XIO3130_SSVID_SSID
> +#define PCIE_UPSTREAM_EXP_OFFSET        XIO3130_EXP_OFFSET
> +#define PCIE_UPSTREAM_AER_OFFSET        XIO3130_AER_OFFSET
> +
> +static void pcie_upstream_write_config(PCIDevice *d,
> +                                       uint32_t address, uint32_t val, int len)
> +{
> +    uint32_t uncorsta =
> +        pci_get_long(d->config + pcie_aer_cap(d) + PCI_ERR_UNCOR_STATUS);
> +
> +    pci_bridge_write_config(d, address, val, len);
> +    pcie_cap_flr_write_config(d, address, val, len);
> +    msi_write_config(d, address, val, len);
> +    pcie_aer_write_config(d, address, val, len, uncorsta);
> +}
> +
> +static void pcie_upstream_reset(DeviceState *qdev)
> +{
> +    PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
> +    msi_reset(d);
> +    pci_bridge_reset(qdev);
> +    pcie_cap_deverr_reset(d);
> +}
> +
> +static void pcie_upstream_flr(PCIDevice *d)
> +{
> +    /* TODO: not enabled until qdev reset clean up
> +       waiting for Anthony's qdev cealn up */
> +#if 0
> +    /* So far, sticky bit registers or register which must be preserved
> +       over FLR aren't emulated. So just reset this device. */
> +    pci_device_reset(d);
> +#endif
> +}
> +
> +static int pcie_upstream_initfn(PCIDevice *d)
> +{
> +    PCIBridge* br = DO_UPCAST(PCIBridge, dev, d);
> +    PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
> +    int rc;
> +
> +    rc = pci_bridge_initfn(d);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +
> +    pcie_port_init_reg(d);
> +    pci_config_set_vendor_id(d->config, PCIE_UPSTREAM_VID);
> +    pci_config_set_device_id(d->config, PCIE_UPSTREAM_DID);
> +    d->config[PCI_REVISION_ID] = PCIE_UPSTREAM_REVISION;
> +
> +    rc = msi_init(d, PCIE_UPSTREAM_MSI_OFFSET, PCIE_UPSTREAM_MSI_NR_VECTOR,
> +                  PCIE_UPSTREAM_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
> +                  PCIE_UPSTREAM_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +    rc = pci_bridge_ssvid_init(d, PCIE_UPSTREAM_SSVID_OFFSET,
> +                               PCIE_UPSTREAM_SVID, PCIE_UPSTREAM_SSID);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +    rc = pcie_cap_init(d, PCIE_UPSTREAM_EXP_OFFSET, PCI_EXP_TYPE_UPSTREAM,
> +                       p->port);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +    pcie_cap_flr_init(d, &pcie_upstream_flr);
> +    pcie_cap_deverr_init(d);
> +    pcie_aer_init(d, PCIE_UPSTREAM_AER_OFFSET);
> +
> +    return 0;
> +}
> +
> +static int pcie_upstream_exitfn(PCIDevice *d)
> +{
> +    pcie_aer_exit(d);
> +    msi_uninit(d);
> +    pcie_cap_exit(d);
> +    return pci_bridge_exitfn(d);
> +}
> +
> +PCIEPort *pcie_upstream_init(PCIBus *bus, int devfn, bool multifunction,
> +                             const char *bus_name, pci_map_irq_fn map_irq,
> +                             uint8_t port)
> +{
> +    PCIDevice *d;
> +    PCIBridge *br;
> +    DeviceState *qdev;
> +
> +    d = pci_create_multifunction(bus, devfn, multifunction,
> +                                 PCIE_UPSTREAM_PORT);
> +    if (!d) {
> +        return NULL;
> +    }
> +    br = DO_UPCAST(PCIBridge, dev, d);
> +
> +    qdev = &br->dev.qdev;
> +    pci_bridge_map_irq(br, bus_name, map_irq);
> +    qdev_prop_set_uint8(qdev, "port", port);
> +    qdev_init_nofail(qdev);
> +
> +    return DO_UPCAST(PCIEPort, br, br);
> +}
> +
> +static const VMStateDescription vmstate_pcie_upstream = {
> +    .name = "pcie-upstream-port",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_PCIE_DEVICE(br.dev, PCIEPort),
> +        VMSTATE_STRUCT(br.dev.exp.aer_log, PCIEPort, 0, vmstate_pcie_aer_log,
> +                       PCIE_AERLog),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static PCIDeviceInfo pcie_upstream_info = {
> +    .qdev.name = PCIE_UPSTREAM_PORT,
> +    .qdev.desc = "Upstream Port of PCI Express Switch",
> +    .qdev.size = sizeof(PCIEPort),
> +    .qdev.reset = pcie_upstream_reset,
> +    .qdev.vmsd = &vmstate_pcie_upstream,
> +
> +    .is_express = 1,
> +    .is_bridge = 1,
> +    .config_write = pcie_upstream_write_config,
> +    .init = pcie_upstream_initfn,
> +    .exit = pcie_upstream_exitfn,
> +
> +    .qdev.props = (Property[]) {
> +        DEFINE_PROP_UINT8("port", PCIEPort, port, 0),
> +        DEFINE_PROP_UINT16("aer_log_max", PCIEPort, br.dev.exp.aer_log.log_max,
> +                           PCIE_AER_LOG_MAX_DEFAULT),
> +        DEFINE_PROP_END_OF_LIST(),
> +    }
> +};
> +
> +static void pcie_upstream_register(void)
> +{
> +    pci_qdev_register(&pcie_upstream_info);
> +}
> +
> +device_init(pcie_upstream_register);
> +
> +
> +/*
> + * Local variables:
> + *  c-indent-level: 4
> + *  c-basic-offset: 4
> + *  tab-width: 8
> + *  indent-tab-mode: nil
> + * End:
> + */
> diff --git a/hw/pcie_upstream.h b/hw/pcie_upstream.h
> new file mode 100644
> index 0000000..1d36317
> --- /dev/null
> +++ b/hw/pcie_upstream.h
> @@ -0,0 +1,32 @@
> +/*
> + * pcie_upstream.h
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */

There's nothing 

> +
> +#ifndef QEMU_PCIE_UPSTREAM_H
> +#define QEMU_PCIE_UPSTREAM_H
> +
> +#include "pcie_port.h"
> +
> +#define PCIE_UPSTREAM_PORT      "pcie-upstream-port"
> +
> +PCIEPort *pcie_upstream_init(PCIBus *bus, int devfn, bool multifunction,
> +                             const char *bus_name, pci_map_irq_fn map_irq,
> +                             uint8_t port);
> +
> +#endif /* QEMU_PCIE_UPSTREAM_H */
> -- 
> 1.7.1.1

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 10/13] pcie downstream port: pci express switch downstream port.
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 10/13] pcie downstream port: pci express switch downstream port Isaku Yamahata
@ 2010-09-22 11:22   ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-22 11:22 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 15, 2010 at 02:38:23PM +0900, Isaku Yamahata wrote:
> pcie switch downstream port.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> ---
> Changes v2 -> v3:
> - compilation adjustment.

Same comments as for upstream apply here.

> ---
>  Makefile.objs        |    1 +
>  hw/pcie_downstream.c |  218 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/pcie_downstream.h |   33 ++++++++
>  3 files changed, 252 insertions(+), 0 deletions(-)
>  create mode 100644 hw/pcie_downstream.c
>  create mode 100644 hw/pcie_downstream.h
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index 72ca8be..baff9ec 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -140,6 +140,7 @@ hw-obj-y =
>  hw-obj-y += vl.o loader.o
>  hw-obj-y += virtio.o virtio-console.o
>  hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o pcie_upstream.o
> +hw-obj-y += pcie_downstream.o
>  hw-obj-y += watchdog.o
>  hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
>  hw-obj-$(CONFIG_ECC) += ecc.o
> diff --git a/hw/pcie_downstream.c b/hw/pcie_downstream.c
> new file mode 100644
> index 0000000..7a629ea
> --- /dev/null
> +++ b/hw/pcie_downstream.c
> @@ -0,0 +1,218 @@
> +/*
> + * pcie_downstream.c
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "pci_ids.h"
> +#include "msi.h"
> +#include "pcie.h"
> +#include "pcie_downstream.h"
> +
> +/* For now, TI XIO3130 is borrowed. need to get its own id? */
> +#define PCI_DEVICE_ID_TI_XIO3130D       0x8233  /* downstream port */
> +#define XIO3130_REVISION                0x1
> +#define XIO3130_MSI_OFFSET              0x70
> +#define XIO3130_MSI_SUPPORTED_FLAGS     PCI_MSI_FLAGS_64BIT
> +#define XIO3130_MSI_NR_VECTOR           1
> +#define XIO3130_SSVID_OFFSET            0x80
> +#define XIO3130_SSVID_SVID              0
> +#define XIO3130_SSVID_SSID              0
> +#define XIO3130_EXP_OFFSET              0x90
> +#define XIO3130_AER_OFFSET              0x100
> +
> +#define PCIE_DOWNSTREAM_VID             PCI_VENDOR_ID_TI
> +#define PCIE_DOWNSTREAM_DID             PCI_DEVICE_ID_TI_XIO3130D
> +#define PCIE_DOWNSTREAM_REVISION        XIO3130_REVISION
> +#define PCIE_DOWNSTREAM_MSI_SUPPORTED_FLAGS     XIO3130_MSI_SUPPORTED_FLAGS
> +#define PCIE_DOWNSTREAM_MSI_NR_VECTOR   XIO3130_MSI_NR_VECTOR
> +#define PCIE_DOWNSTREAM_MSI_OFFSET      XIO3130_MSI_OFFSET
> +#define PCIE_DOWNSTREAM_SSVID_OFFSET    XIO3130_SSVID_OFFSET
> +#define PCIE_DOWNSTREAM_SVID            XIO3130_SSVID_SVID
> +#define PCIE_DOWNSTREAM_SSID            XIO3130_SSVID_SSID
> +#define PCIE_DOWNSTREAM_EXP_OFFSET      XIO3130_EXP_OFFSET
> +#define PCIE_DOWNSTREAM_AER_OFFSET      XIO3130_AER_OFFSET
> +
> +static void pcie_downstream_write_config(PCIDevice *d, uint32_t address,
> +                                         uint32_t val, int len)
> +{
> +    uint16_t sltctl =
> +        pci_get_word(d->config + pci_pcie_cap(d) + PCI_EXP_SLTCTL);
> +    uint32_t uncorsta =
> +        pci_get_long(d->config + pcie_aer_cap(d) + PCI_ERR_UNCOR_STATUS);
> +
> +    pci_bridge_write_config(d, address, val, len);
> +    pcie_cap_flr_write_config(d, address, val, len);
> +    pcie_cap_slot_write_config(d, address, val, len, sltctl);
> +    msi_write_config(d, address, val, len);
> +    pcie_aer_write_config(d, address, val, len, uncorsta);
> +}
> +
> +static void pcie_downstream_reset(DeviceState *qdev)
> +{
> +    PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
> +    msi_reset(d);
> +    pcie_cap_deverr_reset(d);
> +    pcie_cap_slot_reset(d);
> +    pcie_cap_ari_reset(d);
> +    pci_bridge_reset(qdev);
> +}
> +
> +static void pcie_downstream_flr(PCIDevice *d)
> +{
> +    /* TODO: not enabled until qdev reset clean up
> +       waiting for Anthony's qdev cealn up */
> +#if 0
> +    /* So far, sticky bit registers or register which must be preserved
> +       over FLR aren't emulated. So just reset this device. */
> +    pci_device_reset(d);
> +#endif
> +}
> +
> +static int pcie_downstream_initfn(PCIDevice *d)
> +{
> +    PCIBridge* br = DO_UPCAST(PCIBridge, dev, d);
> +    PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
> +    PCIESlot *s = DO_UPCAST(PCIESlot, port, p);
> +    int rc;
> +
> +    rc = pci_bridge_initfn(d);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +
> +    pcie_port_init_reg(d);
> +    pci_config_set_vendor_id(d->config, PCIE_DOWNSTREAM_VID);
> +    pci_config_set_device_id(d->config, PCIE_DOWNSTREAM_DID);
> +    d->config[PCI_REVISION_ID] = PCIE_DOWNSTREAM_REVISION;
> +
> +    rc = msi_init(d, PCIE_DOWNSTREAM_MSI_OFFSET, PCIE_DOWNSTREAM_MSI_NR_VECTOR,
> +                  PCIE_DOWNSTREAM_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
> +                  PCIE_DOWNSTREAM_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +    rc = pci_bridge_ssvid_init(d, PCIE_DOWNSTREAM_SSVID_OFFSET,
> +                               PCIE_DOWNSTREAM_SVID, PCIE_DOWNSTREAM_SSID);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +    rc = pcie_cap_init(d, PCIE_DOWNSTREAM_EXP_OFFSET, PCI_EXP_TYPE_DOWNSTREAM,
> +                       p->port);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +    pcie_cap_flr_init(d, &pcie_downstream_flr);
> +    pcie_cap_deverr_init(d);
> +    pcie_cap_slot_init(d, s->slot);
> +    pcie_chassis_create(s->chassis);
> +    rc = pcie_chassis_add_slot(s);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +    pcie_cap_ari_init(d);
> +    pcie_aer_init(d, PCIE_DOWNSTREAM_AER_OFFSET);
> +
> +    return 0;
> +}
> +
> +static int pcie_downstream_exitfn(PCIDevice *d)
> +{
> +    pcie_aer_exit(d);
> +    msi_uninit(d);
> +    pcie_cap_exit(d);
> +    return pci_bridge_exitfn(d);
> +}
> +
> +PCIESlot *pcie_downstream_init(PCIBus *bus,
> +                               int devfn, bool multifunction,
> +                               const char *bus_name, pci_map_irq_fn map_irq,
> +                               uint8_t port, uint8_t chassis, uint16_t slot)
> +{
> +    PCIDevice *d;
> +    PCIBridge *br;
> +    DeviceState *qdev;
> +
> +    d = pci_create_multifunction(bus, devfn, multifunction,
> +                                 PCIE_DOWNSTREAM_PORT);
> +    if (!d) {
> +        return NULL;
> +    }
> +    br = DO_UPCAST(PCIBridge, dev, d);
> +
> +    qdev = &br->dev.qdev;
> +    pci_bridge_map_irq(br, bus_name, map_irq);
> +    qdev_prop_set_uint8(qdev, "port", port);
> +    qdev_prop_set_uint8(qdev, "chassis", chassis);
> +    qdev_prop_set_uint16(qdev, "slot", slot);
> +    qdev_init_nofail(qdev);
> +
> +    return DO_UPCAST(PCIESlot, port, DO_UPCAST(PCIEPort, br, br));
> +}
> +
> +static const VMStateDescription vmstate_pcie_downstream = {
> +    .name = "pcie-downstream-port",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_PCIE_DEVICE(port.br.dev, PCIESlot),
> +        VMSTATE_STRUCT(port.br.dev.exp.aer_log, PCIESlot, 0,
> +                       vmstate_pcie_aer_log, PCIE_AERLog),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static PCIDeviceInfo pcie_downstream_info = {
> +    .qdev.name = PCIE_DOWNSTREAM_PORT,
> +    .qdev.desc = "Downstream Port of PCI Express Switch",
> +    .qdev.size = sizeof(PCIESlot),
> +    .qdev.reset = pcie_downstream_reset,
> +    .qdev.vmsd = &vmstate_pcie_downstream,
> +
> +    .is_express = 1,
> +    .is_bridge = 1,
> +    .config_write = pcie_downstream_write_config,
> +    .init = pcie_downstream_initfn,
> +    .exit = pcie_downstream_exitfn,
> +
> +    .qdev.props = (Property[]) {
> +        DEFINE_PROP_UINT8("port", PCIESlot, port.port, 0),
> +        DEFINE_PROP_UINT8("chassis", PCIESlot, chassis, 0),
> +        DEFINE_PROP_UINT16("slot", PCIESlot, slot, 0),
> +        DEFINE_PROP_UINT16("aer_log_max", PCIESlot,
> +                           port.br.dev.exp.aer_log.log_max,
> +                           PCIE_AER_LOG_MAX_DEFAULT),
> +        DEFINE_PROP_END_OF_LIST(),
> +    }
> +};
> +
> +static void pcie_downstream_register(void)
> +{
> +    pci_qdev_register(&pcie_downstream_info);
> +}
> +
> +device_init(pcie_downstream_register);
> +
> +/*
> + * Local variables:
> + *  c-indent-level: 4
> + *  c-basic-offset: 4
> + *  tab-width: 8
> + *  indent-tab-mode: nil
> + * End:
> + */
> diff --git a/hw/pcie_downstream.h b/hw/pcie_downstream.h
> new file mode 100644
> index 0000000..686fdac
> --- /dev/null
> +++ b/hw/pcie_downstream.h
> @@ -0,0 +1,33 @@
> +/*
> + * pcie_downstream.h
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef QEMU_PCIE_DOWNSTREAM_H
> +#define QEMU_PCIE_DOWNSTREAM_H
> +
> +#include "pcie_port.h"
> +
> +#define PCIE_DOWNSTREAM_PORT    "pcie-downstream-port"
> +
> +PCIESlot *pcie_downstream_init(PCIBus *bus,
> +                               int devfn, bool multifunction,
> +                               const char *bus_name, pci_map_irq_fn map_irq,
> +                               uint8_t port, uint8_t chassis, uint16_t slot);
> +
> +#endif /* QEMU_PCIE_DOWNSTREAM_H */
> -- 
> 1.7.1.1

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 08/13] pcie root port: implement pcie root port.
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 08/13] pcie root port: implement pcie root port Isaku Yamahata
@ 2010-09-22 11:25   ` Michael S. Tsirkin
  2010-09-24  5:38     ` Isaku Yamahata
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-22 11:25 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 15, 2010 at 02:38:21PM +0900, Isaku Yamahata wrote:
> pcie root port.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> ---
> Changes v2 -> v3:
> - compilation adjustment.

so this is a specific intel root port, lets
name file and rotines appropriately.

> ---
>  Makefile.objs  |    2 +-
>  hw/pcie_root.c |  240 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/pcie_root.h |   32 ++++++++
>  3 files changed, 273 insertions(+), 1 deletions(-)
>  create mode 100644 hw/pcie_root.c
>  create mode 100644 hw/pcie_root.h
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index 6c3b84a..7e81b57 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
>  # PCI watchdog devices
>  hw-obj-y += wdt_i6300esb.o
>  
> -hw-obj-y += pcie.o pcie_aer.o pcie_port.o
> +hw-obj-y += pcie.o pcie_aer.o pcie_port.o pcie_root.o
>  hw-obj-y += msix.o msi.o
>  
>  # PCI network cards
> diff --git a/hw/pcie_root.c b/hw/pcie_root.c
> new file mode 100644
> index 0000000..9255bed
> --- /dev/null
> +++ b/hw/pcie_root.c
> @@ -0,0 +1,240 @@
> +/*
> + * pcie_root.c
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "pci_ids.h"
> +#include "msi.h"
> +#include "pcie.h"
> +#include "pcie_root.h"
> +
> +/* For now, Intel X58 IOH corporate deice exporess* root port.
> +   need to get its own id? */
> +#define PCI_DEVICE_ID_IOH_EPORT         0x3420  /* D0:F0 express mode */
> +#define PCI_DEVICE_ID_IOH_REV           0x2
> +#define IOH_EP_SSVID_OFFSET             0x40
> +#define IOH_EP_SSVID_SVID               PCI_VENDOR_ID_INTEL
> +#define IOH_EP_SSVID_SSID               0
> +#define IOH_EP_MSI_OFFSET               0x60
> +#define IOH_EP_MSI_SUPPORTED_FLAGS      PCI_MSI_FLAGS_MASKBIT
> +#define IOH_EP_MSI_NR_VECTOR            2
> +#define IOH_EP_EXP_OFFSET               0x90
> +#define IOH_EP_AER_OFFSET               0x100
> +
> +#define PCIE_ROOT_VID                   PCI_VENDOR_ID_INTEL
> +#define PCIE_ROOT_DID                   PCI_DEVICE_ID_IOH_EPORT
> +#define PCIE_ROOT_REV                   PCI_DEVICE_ID_IOH_REV
> +#define PCIE_ROOT_SSVID_OFFSET          IOH_EP_SSVID_OFFSET
> +#define PCIE_ROOT_SVID                  IOH_EP_SSVID_SVID
> +#define PCIE_ROOT_SSID                  IOH_EP_SSVID_SSID
> +#define PCIE_ROOT_MSI_SUPPORTED_FLAGS   IOH_EP_MSI_SUPPORTED_FLAGS
> +#define PCIE_ROOT_MSI_NR_VECTOR         IOH_EP_MSI_NR_VECTOR
> +#define PCIE_ROOT_MSI_OFFSET            IOH_EP_MSI_OFFSET
> +#define PCIE_ROOT_EXP_OFFSET            IOH_EP_EXP_OFFSET
> +#define PCIE_ROOT_AER_OFFSET            IOH_EP_AER_OFFSET
> +
> +/*
> + * If two MSI vector are allocated, Advanced Error Interrupt Message Number
> + * is 1. otherwise 0.
> + * 17.12.5.10 RPERRSTS,  32:27 bit Advanced Error Interrupt Message Number.
> + */
> +static uint8_t pcie_root_aer_vector(const PCIDevice *d)
> +{
> +    switch (msi_nr_vectors_allocated(d)) {
> +    case 1:
> +        return 0;
> +    case 2:
> +        return 1;
> +    case 4:
> +    case 8:
> +    case 16:
> +    case 32:
> +    default:
> +        break;
> +    }
> +    abort();
> +    return 0;
> +}
> +
> +static void pcie_root_aer_vector_update(PCIDevice *d)
> +{
> +    pcie_aer_root_set_vector(d, pcie_root_aer_vector(d));
> +}
> +
> +static void pcie_root_write_config(PCIDevice *d,
> +                                   uint32_t address, uint32_t val, int len)
> +{
> +    uint16_t sltctl =
> +        pci_get_word(d->config + pci_pcie_cap(d) + PCI_EXP_SLTCTL);
> +    uint32_t uncorsta =
> +        pci_get_long(d->config + pcie_aer_cap(d) + PCI_ERR_UNCOR_STATUS);
> +    uint32_t root_cmd =
> +        pci_get_long(d->config + pcie_aer_cap(d) + PCI_ERR_ROOT_COMMAND);
> +
> +    pci_bridge_write_config(d, address, val, len);
> +    msi_write_config(d, address, val, len);
> +    pcie_root_aer_vector_update(d);
> +    pcie_cap_slot_write_config(d, address, val, len, sltctl);
> +    pcie_aer_write_config(d, address, val, len, uncorsta);
> +    pcie_aer_root_write_config(d, address, val, len, root_cmd);
> +}
> +
> +static void pcie_root_reset(DeviceState *qdev)
> +{
> +    PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
> +    msi_reset(d);
> +    pcie_root_aer_vector_update(d);
> +    pcie_cap_root_reset(d);
> +    pcie_cap_deverr_reset(d);
> +    pcie_cap_slot_reset(d);
> +    pcie_aer_root_reset(d);
> +    pci_bridge_reset(qdev);
> +}
> +
> +static int pcie_root_initfn(PCIDevice *d)
> +{
> +    PCIBridge* br = DO_UPCAST(PCIBridge, dev, d);
> +    PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
> +    PCIESlot *s = DO_UPCAST(PCIESlot, port, p);
> +    int rc;
> +
> +    rc = pci_bridge_initfn(d);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +
> +    d->config[PCI_REVISION_ID] = PCIE_ROOT_REV;
> +    pcie_port_init_reg(d);
> +
> +    pci_config_set_vendor_id(d->config, PCIE_ROOT_VID);
> +    pci_config_set_device_id(d->config, PCIE_ROOT_DID);
> +
> +    rc = pci_bridge_ssvid_init(d, PCIE_ROOT_SSVID_OFFSET,
> +                               PCIE_ROOT_SVID, PCIE_ROOT_SSID);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +    rc = msi_init(d, PCIE_ROOT_MSI_OFFSET, PCIE_ROOT_MSI_NR_VECTOR,
> +                  PCIE_ROOT_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
> +                  PCIE_ROOT_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +    rc = pcie_cap_init(d, PCIE_ROOT_EXP_OFFSET, PCI_EXP_TYPE_ROOT_PORT,
> +                       p->port);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +    pcie_cap_deverr_init(d);
> +    pcie_cap_slot_init(d, s->slot);
> +    pcie_chassis_create(s->chassis);
> +    rc = pcie_chassis_add_slot(s);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +    pcie_cap_root_init(d);
> +    pcie_aer_init(d, PCIE_ROOT_AER_OFFSET);
> +    pcie_aer_root_init(d);
> +    pcie_root_aer_vector_update(d);
> +    return 0;
> +}
> +
> +static int pcie_root_exitfn(PCIDevice *d)
> +{
> +    pcie_aer_exit(d);
> +    msi_uninit(d);
> +    pcie_cap_exit(d);
> +    return pci_bridge_exitfn(d);
> +}
> +
> +PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
> +                         const char *bus_name, pci_map_irq_fn map_irq,
> +                         uint8_t port, uint8_t chassis, uint16_t slot)
> +{
> +    PCIDevice *d;
> +    PCIBridge *br;
> +    DeviceState *qdev;
> +
> +    d = pci_create_multifunction(bus, devfn, multifunction, PCIE_ROOT_PORT);
> +    if (!d) {
> +        return NULL;
> +    }
> +    br = DO_UPCAST(PCIBridge, dev, d);
> +
> +    qdev = &br->dev.qdev;
> +    pci_bridge_map_irq(br, bus_name, map_irq);
> +    qdev_prop_set_uint8(qdev, "port", port);
> +    qdev_prop_set_uint8(qdev, "chassis", chassis);
> +    qdev_prop_set_uint16(qdev, "slot", slot);
> +    qdev_init_nofail(qdev);
> +
> +    return DO_UPCAST(PCIESlot, port, DO_UPCAST(PCIEPort, br, br));
> +}
> +
> +static const VMStateDescription vmstate_pcie_root = {
> +    .name = "pcie-root-port",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_PCIE_DEVICE(port.br.dev, PCIESlot),
> +        VMSTATE_STRUCT(port.br.dev.exp.aer_log, PCIESlot, 0,
> +                       vmstate_pcie_aer_log, PCIE_AERLog),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static PCIDeviceInfo pcie_root_info = {
> +    .qdev.name = PCIE_ROOT_PORT,
> +    .qdev.desc = "Root Port of PCI Express Switch",
> +    .qdev.size = sizeof(PCIESlot),
> +    .qdev.reset = pcie_root_reset,
> +    .qdev.vmsd = &vmstate_pcie_root,
> +
> +    .is_express = 1,
> +    .is_bridge = 1,
> +    .config_write = pcie_root_write_config,
> +    .init = pcie_root_initfn,
> +    .exit = pcie_root_exitfn,
> +
> +    .qdev.props = (Property[]) {
> +        DEFINE_PROP_UINT8("port", PCIESlot, port.port, 0),
> +        DEFINE_PROP_UINT8("chassis", PCIESlot, chassis, 0),
> +        DEFINE_PROP_UINT16("slot", PCIESlot, slot, 0),
> +        DEFINE_PROP_UINT16("aer_log_max", PCIESlot,
> +                           port.br.dev.exp.aer_log.log_max,
> +                           PCIE_AER_LOG_MAX_DEFAULT),
> +        DEFINE_PROP_END_OF_LIST(),
> +    }
> +};
> +
> +static void pcie_root_register(void)
> +{
> +    pci_qdev_register(&pcie_root_info);
> +}
> +
> +device_init(pcie_root_register);
> +
> +/*
> + * Local variables:
> + *  c-indent-level: 4
> + *  c-basic-offset: 4
> + *  tab-width: 8
> + *  indent-tab-mode: nil
> + * End:
> + */
> diff --git a/hw/pcie_root.h b/hw/pcie_root.h
> new file mode 100644
> index 0000000..9c5d4d0
> --- /dev/null
> +++ b/hw/pcie_root.h
> @@ -0,0 +1,32 @@
> +/*
> + * pcie_root.h
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef QEMU_PCIE_ROOT_H
> +#define QEMU_PCIE_ROOT_H
> +
> +#include "pcie_port.h"
> +
> +#define PCIE_ROOT_PORT    "pcie-root-port"
> +
> +PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
> +                         const char *bus_name, pci_map_irq_fn map_irq,
> +                         uint8_t port, uint8_t chassis, uint16_t slot);
> +

I am a bit unhappy about all these _init functions.
Can devices be created with qdev? If they were
it would be possible to configure the system completely
from qemu command line.

> +#endif /* QEMU_PCIE_ROOT_H */
> -- 
> 1.7.1.1

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 11/13] pcie/hotplug: glue pushing attention button command. pcie_abp
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 11/13] pcie/hotplug: glue pushing attention button command. pcie_abp Isaku Yamahata
@ 2010-09-22 11:30   ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-22 11:30 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 15, 2010 at 02:38:24PM +0900, Isaku Yamahata wrote:
> glue to pcie_abp monitor command.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>

Let's make the name a bit more descriptive, ok?
Also: this seems related to pci_plug/pci_unplug
commands Anthony proposed.
These could work generally for pci and pcie.

Want to take a stub at implementing?

Given these, will we still want the low level commands?

> ---
>  hw/pcie_port.c  |   82 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qemu-monitor.hx |   14 +++++++++
>  sysemu.h        |    4 +++
>  3 files changed, 100 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pcie_port.c b/hw/pcie_port.c
> index e7c3cef..641c458 100644
> --- a/hw/pcie_port.c
> +++ b/hw/pcie_port.c
> @@ -18,6 +18,10 @@
>   * with this program; if not, see <http://www.gnu.org/licenses/>.
>   */
>  
> +#include "qemu-objects.h"
> +#include "sysemu.h"
> +#include "monitor.h"
> +#include "pcie.h"
>  #include "pcie_port.h"
>  
>  void pcie_port_init_reg(PCIDevice *d)
> @@ -104,3 +108,81 @@ void pcie_chassis_del_slot(PCIESlot *s)
>  {
>      QLIST_REMOVE(s, next);
>  }
> +
> +/**************************************************************************
> + * glue for qemu monitor
> + */
> +
> +/* Parse [<chassis>.]<slot>, return -1 on error */
> +static int pcie_parse_slot_addr(const char* slot_addr,
> +                                uint8_t *chassisp, uint16_t *slotp)
> +{
> +    const char *p;
> +    char *e;
> +    unsigned long val;
> +    unsigned long chassis = 0;
> +    unsigned long slot;
> +
> +    p = slot_addr;
> +    val = strtoul(p, &e, 0);
> +    if (e == p) {
> +        return -1;
> +    }
> +    if (*e == '.') {
> +        chassis = val;
> +        p = e + 1;
> +        val = strtoul(p, &e, 0);
> +        if (e == p) {
> +            return -1;
> +        }
> +    }
> +    slot = val;
> +
> +    if (*e) {
> +        return -1;
> +    }
> +
> +    if (chassis > 0xff || slot > 0xffff) {
> +        return -1;
> +    }
> +
> +    *chassisp = chassis;
> +    *slotp = slot;
> +    return 0;
> +}
> +
> +void pcie_attention_button_push_print(Monitor *mon, const QObject *data)
> +{
> +    QDict *qdict;
> +
> +    assert(qobject_type(data) == QTYPE_QDICT);
> +    qdict = qobject_to_qdict(data);
> +
> +    monitor_printf(mon, "OK chassis %d, slot %d\n",
> +                   (int) qdict_get_int(qdict, "chassis"),
> +                   (int) qdict_get_int(qdict, "slot"));
> +}
> +
> +int pcie_attention_button_push(Monitor *mon, const QDict *qdict,
> +                               QObject **ret_data)
> +{
> +    const char* pcie_slot = qdict_get_str(qdict, "pcie_slot");
> +    uint8_t chassis;
> +    uint16_t slot;
> +    PCIESlot *s;
> +
> +    if (pcie_parse_slot_addr(pcie_slot, &chassis, &slot) < 0) {
> +        monitor_printf(mon, "invalid pcie slot address %s\n", pcie_slot);
> +        return -1;
> +    }
> +    s = pcie_chassis_find_slot(chassis, slot);
> +    if (!s) {
> +        monitor_printf(mon, "slot is not found. %s\n", pcie_slot);
> +        return -1;
> +    }
> +    pcie_cap_slot_push_attention_button(&s->port.br.dev);
> +    *ret_data = qobject_from_jsonf("{ 'chassis': %d, 'slot': %d}",
> +                                   chassis, slot);
> +    assert(*ret_data);
> +    return 0;
> +}
> diff --git a/qemu-monitor.hx b/qemu-monitor.hx
> index 2af3de6..02fbda1 100644
> --- a/qemu-monitor.hx
> +++ b/qemu-monitor.hx
> @@ -1154,6 +1154,20 @@ Hot remove PCI device.
>  ETEXI
>  
>      {
> +        .name       = "pcie_abp",
> +        .args_type  = "pcie_slot:s",
> +        .params     = "[<chassis>.]<slot>",
> +        .help       = "push pci express attention button",
> +        .user_print  = pcie_attention_button_push_print,
> +        .mhandler.cmd_new = pcie_attention_button_push,
> +    },
> +
> +STEXI
> +@item pcie_abp
> +Push PCI express attention button
> +ETEXI
> +
> +    {
>          .name       = "host_net_add",
>          .args_type  = "device:s,opts:s?",
>          .params     = "tap|user|socket|vde|dump [options]",
> diff --git a/sysemu.h b/sysemu.h
> index 9c988bb..cca411d 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -150,6 +150,10 @@ extern unsigned int nb_prom_envs;
>  void pci_device_hot_add(Monitor *mon, const QDict *qdict);
>  void drive_hot_add(Monitor *mon, const QDict *qdict);
>  void do_pci_device_hot_remove(Monitor *mon, const QDict *qdict);
> +/* pcie hotplug */
> +void pcie_attention_button_push_print(Monitor *mon, const QObject *data);
> +int pcie_attention_button_push(Monitor *mon, const QDict *qdict,
> +                               QObject **ret_data);
>  
>  /* serial ports */
>  
> -- 
> 1.7.1.1

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 06/13] pcie/aer: helper functions for pcie aer capability.
  2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 06/13] pcie/aer: helper functions for pcie aer capability Isaku Yamahata
@ 2010-09-22 11:50   ` Michael S. Tsirkin
  2010-09-24  2:50     ` Isaku Yamahata
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-22 11:50 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 15, 2010 at 02:38:19PM +0900, Isaku Yamahata wrote:
> This patch implements helper functions for pcie aer capability
> which will be used later.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> ---
> Changes v2 -> v3:
> - split out from pcie.[ch] to pcie_aer.[ch] to make the files sorter.
> - embeded PCIExpressDevice into PCIDevice.
> - CodingStyle fix
> ---
>  Makefile.objs |    2 +-
>  hw/pci.h      |    7 +
>  hw/pcie.h     |    6 +
>  hw/pcie_aer.c |  796 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/pcie_aer.h |  105 ++++++++
>  qemu-common.h |    3 +
>  6 files changed, 918 insertions(+), 1 deletions(-)
>  create mode 100644 hw/pcie_aer.c
>  create mode 100644 hw/pcie_aer.h
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index eeb5134..68bcc48 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
>  # PCI watchdog devices
>  hw-obj-y += wdt_i6300esb.o
>  
> -hw-obj-y += pcie.o
> +hw-obj-y += pcie.o pcie_aer.o
>  hw-obj-y += msix.o msi.o
>  
>  # PCI network cards
> diff --git a/hw/pci.h b/hw/pci.h
> index 19e85f5..73bf901 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -401,6 +401,13 @@ static inline uint8_t pci_pcie_cap(const PCIDevice *d)
>      return pci_is_express(d) ? d->exp.exp_cap : 0;
>  }
>  
> +/* AER */
> +static inline uint16_t pcie_aer_cap(const PCIDevice *d)
> +{
> +    assert(pci_is_express(d));
> +    return d->exp.aer_cap;
> +}
> +

So looking at how this is used, I think this is
a wrong API. We should have higher level APIs: pcie_get_uncor_err
or something like that. this is what devices really need, right?


>  /* These are not pci specific. Should move into a separate header.
>   * Only pci.c uses them, so keep them here for now.
>   */
> diff --git a/hw/pcie.h b/hw/pcie.h
> index 37713dc..febcbc2 100644
> --- a/hw/pcie.h
> +++ b/hw/pcie.h
> @@ -23,6 +23,7 @@
>  
>  #include "hw.h"
>  #include "pcie_regs.h"
> +#include "pcie_aer.h"
>  
>  enum PCIExpressIndicator {
>      /* for attention and power indicator */
> @@ -52,6 +53,11 @@ struct PCIExpressDevice {
>  
>      /* FLR */
>      pcie_flr_fn flr;
> +
> +    /* AER */
> +    uint16_t aer_cap;
> +    pcie_aer_errmsg_fn aer_errmsg;
> +    PCIE_AERLog aer_log;
>  };
>  
>  void pcie_notify(PCIDevice *dev, uint16_t vector, bool trigger, int level);
> diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
> new file mode 100644
> index 0000000..9e3f48e
> --- /dev/null
> +++ b/hw/pcie_aer.c
> @@ -0,0 +1,796 @@
> +/*
> + * pcie_aer.c
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "sysemu.h"
> +#include "pci_bridge.h"
> +#include "pcie.h"
> +#include "msix.h"
> +#include "msi.h"
> +#include "pci_internals.h"
> +#include "pcie_regs.h"
> +
> +//#define DEBUG_PCIE
> +#ifdef DEBUG_PCIE
> +# define PCIE_DPRINTF(fmt, ...)                                         \
> +    fprintf(stderr, "%s:%d " fmt, __func__, __LINE__, ## __VA_ARGS__)
> +#else
> +# define PCIE_DPRINTF(fmt, ...) do {} while (0)
> +#endif
> +#define PCIE_DEV_PRINTF(dev, fmt, ...)                                  \
> +    PCIE_DPRINTF("%s:%x "fmt, (dev)->name, (dev)->devfn, ## __VA_ARGS__)
> +
> +static void pcie_aer_clear_error(PCIDevice *dev);
> +static uint8_t pcie_aer_root_get_vector(PCIDevice *dev);
> +static AER_ERR_MSG_RESULT
> +pcie_aer_errmsg_alldev(PCIDevice *dev, const PCIE_AERErrMsg *msg);
> +static AER_ERR_MSG_RESULT
> +pcie_aer_errmsg_vbridge(PCIDevice *dev, const PCIE_AERErrMsg *msg);
> +
> +/* From 6.2.7 Error Listing and Rules. Table 6-2, 6-3 and 6-4 */
> +static enum PCIE_AER_SEVERITY pcie_aer_uncor_default_severity(uint32_t status)
> +{
> +    switch (status) {
> +    case PCI_ERR_UNC_INTN:
> +    case PCI_ERR_UNC_DLP:
> +    case PCI_ERR_UNC_SDN:
> +    case PCI_ERR_UNC_RX_OVER:
> +    case PCI_ERR_UNC_FCP:
> +    case PCI_ERR_UNC_MALF_TLP:
> +        return AER_ERR_FATAL;
> +    case PCI_ERR_UNC_POISON_TLP:
> +    case PCI_ERR_UNC_ECRC:
> +    case PCI_ERR_UNC_UNSUP:
> +    case PCI_ERR_UNC_COMP_TIME:
> +    case PCI_ERR_UNC_COMP_ABORT:
> +    case PCI_ERR_UNC_UNX_COMP:
> +    case PCI_ERR_UNC_ACSV:
> +    case PCI_ERR_UNC_MCBTLP:
> +    case PCI_ERR_UNC_ATOP_EBLOCKED:
> +    case PCI_ERR_UNC_TLP_PRF_BLOCKED:
> +        return AER_ERR_NONFATAL;
> +    default:
> +        break;
> +    }
> +    abort();
> +    return AER_ERR_FATAL;
> +}
> +
> +static uint32_t pcie_aer_log_next(uint32_t i, uint32_t max)


This is internal function, give it a short name like
aer_log_next and

> +{
> +    return (i + 1) % max;
> +}
> +
> +static bool pcie_aer_log_empty_index(uint32_t producer, uint32_t consumer)
> +{
> +    return producer == consumer;
> +}
> +
> +static bool pcie_aer_log_empty(PCIE_AERLog *aer_log)
> +{
> +    return pcie_aer_log_empty_index(aer_log->producer, aer_log->consumer);
> +}
> +
> +static bool pcie_aer_log_full(PCIE_AERLog *aer_log)
> +{
> +    return pcie_aer_log_next(aer_log->producer, aer_log->log_max) ==
> +        aer_log->consumer;
> +}
> +
> +static uint32_t pcie_aer_log_add(PCIE_AERLog *aer_log)
> +{
> +    uint32_t i = aer_log->producer;
> +    aer_log->producer = pcie_aer_log_next(aer_log->producer, aer_log->log_max);
> +    return i;
> +}
> +
> +static uint32_t pcie_aer_log_del(PCIE_AERLog *aer_log)
> +{
> +    uint32_t i = aer_log->consumer;
> +    aer_log->consumer = pcie_aer_log_next(aer_log->consumer, aer_log->log_max);
> +    return i;
> +}
> +
> +static int pcie_aer_log_add_err(PCIE_AERLog *aer_log, const PCIE_AERErr *err)
> +{
> +    uint32_t i;
> +    if (pcie_aer_log_full(aer_log)) {
> +        return -1;
> +    }
> +    i = pcie_aer_log_add(aer_log);
> +    memcpy(&aer_log->log[i], err, sizeof(*err));
> +    return 0;
> +}
> +
> +static const PCIE_AERErr* pcie_aer_log_del_err(PCIE_AERLog *aer_log)
> +{
> +    uint32_t i;
> +    assert(!pcie_aer_log_empty(aer_log));
> +    i = pcie_aer_log_del(aer_log);
> +    return &aer_log->log[i];
> +}
> +
> +static void pcie_aer_log_clear_all_err(PCIE_AERLog *aer_log)
> +{
> +    aer_log->producer = 0;
> +    aer_log->consumer = 0;
> +}
> +
> +void pcie_aer_init(PCIDevice *dev, uint16_t offset)
> +{
> +    PCIExpressDevice *exp;
> +
> +    pci_set_word(dev->wmask + PCI_COMMAND,
> +                 pci_get_word(dev->wmask + PCI_COMMAND) | PCI_COMMAND_SERR);
> +    pci_set_word(dev->w1cmask + PCI_STATUS,
> +                 pci_get_word(dev->w1cmask + PCI_STATUS) |
> +                 PCI_STATUS_SIG_SYSTEM_ERROR);
> +
> +    pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, PCI_ERR_VER,
> +                        offset, PCI_ERR_SIZEOF);
> +    exp = &dev->exp;
> +    exp->aer_cap = offset;
> +    if (dev->exp.aer_log.log_max == PCIE_AER_LOG_MAX_UNSET) {
> +        dev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
> +    }
> +    if (dev->exp.aer_log.log_max > PCIE_AER_LOG_MAX_MAX) {
> +        dev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_MAX;
> +    }
> +    dev->exp.aer_log.log = qemu_mallocz(sizeof(dev->exp.aer_log.log[0]) *
> +                                        dev->exp.aer_log.log_max);
> +
> +    /* On reset PCI_ERR_CAP_MHRE is disabled
> +     * PCI_ERR_CAP_MHRE is RWS so that reset doesn't affect related
> +     * registers
> +     */
> +    pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
> +                 PCI_ERR_UNC_SUPPORTED);
> +
> +    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
> +                 PCI_ERR_UNC_SUPPORTED);
> +
> +    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
> +                 PCI_ERR_UNC_SEVERITY_DEFAULT);
> +    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_SEVER,
> +                 PCI_ERR_UNC_SUPPORTED);
> +
> +    pci_set_long(dev->w1cmask + offset + PCI_ERR_COR_STATUS,
> +                 pci_get_long(dev->w1cmask + offset + PCI_ERR_COR_STATUS) |
> +                 PCI_ERR_COR_STATUS);
> +
> +    pci_set_long(dev->config + offset + PCI_ERR_COR_MASK,
> +                 PCI_ERR_COR_MASK_DEFAULT);
> +    pci_set_long(dev->wmask + offset + PCI_ERR_COR_MASK,
> +                 PCI_ERR_COR_SUPPORTED);
> +
> +    /* capabilities and control. multiple header logging is supported */
> +    if (dev->exp.aer_log.log_max > 0) {
> +        pci_set_long(dev->config + offset + PCI_ERR_CAP,
> +                     PCI_ERR_CAP_ECRC_GENC | PCI_ERR_CAP_ECRC_CHKC |
> +                     PCI_ERR_CAP_MHRC);
> +        pci_set_long(dev->wmask + offset + PCI_ERR_CAP,
> +                     PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE |
> +                     PCI_ERR_CAP_MHRE);
> +    } else {
> +        pci_set_long(dev->config + offset + PCI_ERR_CAP,
> +                     PCI_ERR_CAP_ECRC_GENC | PCI_ERR_CAP_ECRC_CHKC);
> +        pci_set_long(dev->wmask + offset + PCI_ERR_CAP,
> +                     PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE);
> +    }
> +
> +    switch (pcie_cap_get_type(dev)) {
> +    case PCI_EXP_TYPE_ROOT_PORT:
> +        /* this case will be set by pcie_aer_root_init() */
> +        /* fallthrough */
> +    case PCI_EXP_TYPE_DOWNSTREAM:
> +    case PCI_EXP_TYPE_UPSTREAM:
> +        pci_set_word(dev->wmask + PCI_BRIDGE_CONTROL,
> +                     pci_get_word(dev->wmask + PCI_BRIDGE_CONTROL) |
> +                     PCI_BRIDGE_CTL_SERR);
> +        pci_set_long(dev->w1cmask + PCI_STATUS,
> +                     pci_get_long(dev->w1cmask + PCI_STATUS) |
> +                     PCI_SEC_STATUS_RCV_SYSTEM_ERROR);
> +        exp->aer_errmsg = pcie_aer_errmsg_vbridge;
> +        break;
> +    default:
> +        exp->aer_errmsg = pcie_aer_errmsg_alldev;
> +        break;
> +    }
> +}
> +
> +void pcie_aer_exit(PCIDevice *dev)
> +{
> +    pci_del_capability(dev, PCI_EXT_CAP_ID_ERR, PCI_ERR_SIZEOF);
> +    qemu_free(dev->exp.aer_log.log);
> +}
> +
> +/* Multiple Header recording isn't implemented. Is it wanted? */
> +void pcie_aer_write_config(PCIDevice *dev,
> +                           uint32_t addr, uint32_t val, int len,
> +                           uint32_t uncorsta_prev)

Pls rename uncorsta_prev so its name tells us what it is.

> +{
> +    uint32_t pos = dev->exp.aer_cap;
> +
> +    /* uncorrectable */
> +    if (ranges_overlap(addr, len, pos + PCI_ERR_UNCOR_STATUS, 4)) {
> +        uint32_t written =
> +            pci_shift_long(addr, val, pos + PCI_ERR_UNCOR_STATUS) &
> +            PCI_ERR_UNC_SUPPORTED;
> +        uint32_t errcap = pci_get_long(dev->config + pos + PCI_ERR_CAP);
> +        uint32_t first_error = (1 << PCI_ERR_CAP_FEP(errcap));
> +
> +        if ((uncorsta_prev & first_error) && (written & first_error)) {
> +            pcie_aer_clear_error(dev);
> +        }

Is not this simple W1C? If yes we do not need custom code anymore?
If not pls make it writeable to avoid range checks.

> +    }
> +
> +    /* capability & control */
> +    if (ranges_overlap(addr, len, pos + PCI_ERR_CAP, 4)) {
> +        uint32_t err_cap = pci_get_long(dev->config + pos + PCI_ERR_CAP);
> +        if (!(err_cap & PCI_ERR_CAP_MHRE)) {
> +            pcie_aer_log_clear_all_err(&dev->exp.aer_log);
> +            pci_set_long(dev->w1cmask + pos + PCI_ERR_UNCOR_STATUS,
> +                         PCI_ERR_UNC_SUPPORTED);
> +        } else {
> +            /* When multiple header recording is enabled, only the bit that
> +             * first error pointer indicates is cleared.
> +             * that is handled specifically.
> +             */
> +            pci_set_long(dev->w1cmask + pos + PCI_ERR_UNCOR_STATUS, 0);

This is wrong I think: you do not change mask on each write.
Do it on setup.

> +        }
> +    }
> +}
> +
> +static inline void pcie_aer_errmsg(PCIDevice *dev, const PCIE_AERErrMsg *msg)
> +{
> +    assert(pci_is_express(dev));
> +    assert(dev->exp.aer_errmsg);
> +    dev->exp.aer_errmsg(dev, msg);

Why do we want the indirection? Why not have users just call the function?

> +}
> +
> +static AER_ERR_MSG_RESULT
> +pcie_aer_errmsg_alldev(PCIDevice *dev, const PCIE_AERErrMsg *msg)
> +{
> +    uint16_t cmd = pci_get_word(dev->config + PCI_COMMAND);
> +    bool transmit1 =
> +        pcie_aer_err_msg_is_uncor(msg) && (cmd & PCI_COMMAND_SERR);
> +    uint32_t pos = pci_pcie_cap(dev);
> +    uint32_t devctl = pci_get_word(dev->config + pos + PCI_EXP_DEVCTL);
> +    bool transmit2 = msg->severity & devctl;
> +    PCIDevice *parent_port;
> +
> +    if (transmit1) {
> +        if (pcie_aer_err_msg_is_uncor(msg)) {
> +            /* Signaled System Error */
> +            uint8_t *status = dev->config + PCI_STATUS;
> +            pci_set_word(status,
> +                         pci_get_word(status) | PCI_STATUS_SIG_SYSTEM_ERROR);
> +        }
> +    }
> +
> +    if (!(transmit1 || transmit2)) {
> +        return AER_ERR_MSG_MASKED;
> +    }
> +
> +    /* send up error message */
> +    if (pci_is_express(dev) &&
> +        pcie_cap_get_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
> +        /* Root port notify system itself,
> +           or send the error message to root complex event collector. */
> +        /*
> +         * if root port is associated to event collector, set
> +         * parent_port = root complex event collector
> +         * For now root complex event collector isn't supported.
> +         */
> +        parent_port = NULL;
> +    } else {
> +        parent_port = pci_bridge_get_device(dev->bus);
> +    }
> +    if (parent_port) {
> +        if (!pci_is_express(parent_port)) {
> +            /* What to do? */
> +            return AER_ERR_MSG_MASKED;
> +        }
> +        pcie_aer_errmsg(parent_port, msg);
> +    }
> +    return AER_ERR_MSG_SENT;
> +}
> +
> +static AER_ERR_MSG_RESULT
> +pcie_aer_errmsg_vbridge(PCIDevice *dev, const PCIE_AERErrMsg *msg)
> +{
> +    uint16_t bridge_control = pci_get_word(dev->config + PCI_BRIDGE_CONTROL);
> +
> +    if (pcie_aer_err_msg_is_uncor(msg)) {
> +        /* Received System Error */
> +        uint8_t *sec_status = dev->config + PCI_SEC_STATUS;
> +        pci_set_word(sec_status,
> +                     pci_get_word(sec_status) |
> +                     PCI_SEC_STATUS_RCV_SYSTEM_ERROR);
> +    }
> +
> +    if (!(bridge_control & PCI_BRIDGE_CTL_SERR)) {
> +        return AER_ERR_MSG_MASKED;
> +    }
> +    return pcie_aer_errmsg_alldev(dev, msg);
> +}
> +
> +static AER_ERR_MSG_RESULT
> +pcie_aer_errmsg_root_port(PCIDevice *dev, const PCIE_AERErrMsg *msg)
> +{
> +    AER_ERR_MSG_RESULT ret;
> +    uint16_t cmd;
> +    uint8_t *aer_cap;
> +    uint32_t root_cmd;
> +    uint32_t root_sta;
> +    bool trigger;
> +
> +    ret = pcie_aer_errmsg_vbridge(dev, msg);
> +    if (ret != AER_ERR_MSG_SENT) {
> +        return ret;
> +    }
> +
> +    ret = AER_ERR_MSG_MASKED;
> +    cmd = pci_get_word(dev->config + PCI_COMMAND);
> +    aer_cap = dev->config + pcie_aer_cap(dev);
> +    root_cmd = pci_get_long(aer_cap + PCI_ERR_ROOT_COMMAND);
> +    root_sta = pci_get_long(aer_cap + PCI_ERR_ROOT_STATUS);
> +    trigger = false;
> +
> +    if (cmd & PCI_COMMAND_SERR) {
> +        /* System Error. Platform Specific */
> +        /* ret = AER_ERR_MSG_SENT; */
> +    }
> +
> +    /* Errro Message Received: Root Error Status register */
> +    switch (msg->severity) {
> +    case AER_ERR_COR:
> +        if (root_sta & PCI_ERR_ROOT_COR_RCV) {
> +            root_sta |= PCI_ERR_ROOT_MULTI_COR_RCV;
> +        } else {
> +            if (root_cmd & PCI_ERR_ROOT_CMD_COR_EN) {
> +                trigger = true;
> +            }
> +            pci_set_word(aer_cap + PCI_ERR_ROOT_COR_SRC, msg->source_id);
> +        }
> +        root_sta |= PCI_ERR_ROOT_COR_RCV;
> +        break;
> +    case AER_ERR_NONFATAL:
> +        if (!(root_sta & PCI_ERR_ROOT_NONFATAL_RCV) &&
> +            root_cmd & PCI_ERR_ROOT_CMD_NONFATAL_EN) {
> +            trigger = true;
> +        }
> +        root_sta |= PCI_ERR_ROOT_NONFATAL_RCV;
> +        break;
> +    case AER_ERR_FATAL:
> +        if (!(root_sta & PCI_ERR_ROOT_FATAL_RCV) &&
> +            root_cmd & PCI_ERR_ROOT_CMD_FATAL_EN) {
> +            trigger = true;
> +        }
> +        if (!(root_sta & PCI_ERR_ROOT_UNCOR_RCV)) {
> +            root_sta |= PCI_ERR_ROOT_FIRST_FATAL;
> +        }
> +        root_sta |= PCI_ERR_ROOT_FATAL_RCV;
> +        break;
> +    }
> +    if (pcie_aer_err_msg_is_uncor(msg)) {
> +        if (root_sta & PCI_ERR_ROOT_UNCOR_RCV) {
> +            root_sta |= PCI_ERR_ROOT_MULTI_UNCOR_RCV;
> +        } else {
> +            pci_set_word(aer_cap + PCI_ERR_ROOT_SRC, msg->source_id);
> +        }
> +        root_sta |= PCI_ERR_ROOT_UNCOR_RCV;
> +    }
> +    pci_set_long(aer_cap + PCI_ERR_ROOT_STATUS, root_sta);
> +
> +    if (root_cmd & msg->severity) {
> +        /* Error Interrupt(INTx or MSI) */
> +        pcie_notify(dev, pcie_aer_root_get_vector(dev), trigger, 1);
> +        ret = AER_ERR_MSG_SENT;
> +    }
> +    return ret;
> +}
> +
> +static void pcie_aer_update_log(PCIDevice *dev, const PCIE_AERErr *err)
> +{
> +    uint8_t *aer_cap = dev->config + pcie_aer_cap(dev);
> +    uint8_t first_bit = ffsl(err->status) - 1;
> +    uint32_t errcap = pci_get_long(aer_cap + PCI_ERR_CAP);
> +    int i;
> +    uint32_t dw;
> +
> +    errcap &= ~(PCI_ERR_CAP_FEP_MASK | PCI_ERR_CAP_TLP);
> +    errcap |= PCI_ERR_CAP_FEP(first_bit);
> +
> +    if (err->flags & PCIE_AER_ERR_HEADER_VALID) {
> +        for (i = 0; i < ARRAY_SIZE(err->header); ++i) {
> +            /* 7.10.8 Header Log Register */
> +            cpu_to_be32wu(&dw, err->header[i]);
> +            memcpy(aer_cap + PCI_ERR_HEADER_LOG + sizeof(err->header[0]) * i,
> +                   &dw, sizeof(dw));
> +        }
> +    } else {
> +        assert(!(err->flags & PCIE_AER_ERR_TLP_PRESENT));
> +        memset(aer_cap + PCI_ERR_HEADER_LOG, 0, sizeof(err->header));
> +    }
> +
> +    if ((err->flags & PCIE_AER_ERR_TLP_PRESENT) &&
> +        (pci_get_long(dev->config + pci_pcie_cap(dev) + PCI_EXP_DEVCTL2) &
> +         PCI_EXP_DEVCAP2_EETLPP)) {
> +        for (i = 0; i < ARRAY_SIZE(err->prefix); ++i) {
> +            /* 7.10.12 tlp prefix log register */
> +            cpu_to_be32wu(&dw, err->prefix[i]);
> +            memcpy(aer_cap + PCI_ERR_TLP_PREFIX_LOG +
> +                   sizeof(err->prefix[0]) * i, &dw, sizeof(dw));
> +        }
> +        errcap |= PCI_ERR_CAP_TLP;
> +    } else {
> +        memset(aer_cap + PCI_ERR_TLP_PREFIX_LOG, 0, sizeof(err->prefix));
> +    }
> +    pci_set_long(aer_cap + PCI_ERR_CAP, errcap);
> +}
> +
> +static void pcie_aer_clear_log(PCIDevice *dev)
> +{
> +    PCIE_AERErr *err;
> +    uint8_t *aer_cap = dev->config + pcie_aer_cap(dev);
> +    uint32_t errcap = pci_get_long(aer_cap + PCI_ERR_CAP);
> +
> +    errcap &= ~(PCI_ERR_CAP_FEP_MASK | PCI_ERR_CAP_TLP);
> +    pci_set_long(aer_cap + PCI_ERR_CAP, errcap);
> +
> +    memset(aer_cap + PCI_ERR_HEADER_LOG, 0, sizeof(err->header));
> +    memset(aer_cap + PCI_ERR_TLP_PREFIX_LOG, 0, sizeof(err->prefix));
> +}
> +
> +static int pcie_aer_record_error(PCIDevice *dev,
> +                                 const PCIE_AERErr *err)
> +{
> +    uint8_t *aer_cap = dev->config + pcie_aer_cap(dev);
> +    uint32_t errcap = pci_get_long(aer_cap + PCI_ERR_CAP);
> +    int fep = PCI_ERR_CAP_FEP(errcap);
> +
> +    if (errcap & PCI_ERR_CAP_MHRE &&
> +        (pci_get_long(aer_cap + PCI_ERR_UNCOR_STATUS) & (1ULL << fep))) {
> +        /*  Not first error. queue error */
> +        if (pcie_aer_log_add_err(&dev->exp.aer_log, err) < 0) {
> +            /* overflow */
> +            return -1;
> +        }
> +        return 0;
> +    }
> +
> +    pcie_aer_update_log(dev, err);
> +    return 0;
> +}
> +
> +static void pcie_aer_clear_error(PCIDevice *dev)
> +{
> +    uint8_t *aer_cap = dev->config + pcie_aer_cap(dev);
> +    uint32_t errcap = pci_get_long(aer_cap + PCI_ERR_CAP);
> +    uint32_t old_err = (1UL << PCI_ERR_CAP_FEP(errcap));
> +    PCIE_AERLog *aer_log = &dev->exp.aer_log;
> +    const PCIE_AERErr *err;
> +    uint32_t consumer;
> +
> +    if (!(errcap & PCI_ERR_CAP_MHRE) || pcie_aer_log_empty(aer_log)) {
> +        pcie_aer_clear_log(dev);
> +        pci_set_long(aer_cap + PCI_ERR_UNCOR_STATUS,
> +                     pci_get_long(aer_cap + PCI_ERR_UNCOR_STATUS) & ~old_err);
> +        return;
> +    }
> +
> +    /* if no same error is queued, clear bit in uncorrectable error status */
> +    for (consumer = dev->exp.aer_log.consumer;
> +         !pcie_aer_log_empty_index(dev->exp.aer_log.producer, consumer);
> +         consumer = pcie_aer_log_next(consumer, dev->exp.aer_log.log_max)) {
> +        if (dev->exp.aer_log.log[consumer].status & old_err) {
> +            old_err = 0;
> +            break;
> +        }
> +    }
> +    if (old_err) {
> +        pci_set_long(aer_cap + PCI_ERR_UNCOR_STATUS,
> +                     pci_get_long(aer_cap + PCI_ERR_UNCOR_STATUS) & ~old_err);
> +    }
> +
> +    err = pcie_aer_log_del_err(aer_log);
> +    pcie_aer_update_log(dev, err);
> +}
> +
> +/*
> + * non-Function specific error must be recorded in all functions.
> + * It is the responsibility of the caller of this function.
> + * It is also caller's responsiblity to determine which function should
> + * report the rerror.
> + *
> + * 6.2.4 Error Logging
> + * 6.2.5 Sqeucne of Device Error Signaling and Logging Operations
> + * table 6-2: Flowchard Showing Sequence of Device Error Signaling and Logging
> + *            Operations
> + *
> + * Although this implementation can be shortened/optimized, this is kept
> + * parallel to table 6-2.
> + */
> +void pcie_aer_inject_error(PCIDevice *dev, const PCIE_AERErr *err)
> +{
> +    uint8_t *exp_cap;
> +    uint8_t *aer_cap = NULL;
> +    uint32_t devctl = 0;
> +    uint32_t devsta = 0;
> +    uint32_t status = err->status;
> +    uint32_t mask;
> +    bool is_unsupported_request =
> +        (!(err->flags & PCIE_AER_ERR_IS_CORRECTABLE) &&
> +         err->status == PCI_ERR_UNC_UNSUP);
> +    bool is_advisory_nonfatal = false;  /* for advisory non-fatal error */
> +    uint32_t uncor_status = 0;          /* for advisory non-fatal error */
> +    PCIE_AERErrMsg msg;
> +    int is_header_log_overflowed = 0;
> +
> +    if (!pci_is_express(dev)) {
> +        /* What to do? */
> +        return;
> +    }
> +
> +    if (err->flags & PCIE_AER_ERR_IS_CORRECTABLE) {
> +        status &= PCI_ERR_COR_SUPPORTED;
> +    } else {
> +        status &= PCI_ERR_UNC_SUPPORTED;
> +    }
> +    if (!status || status & (status - 1)) {
> +        /* invalid status bit. one and only one bit must be set */
> +        return;
> +    }
> +
> +    exp_cap = dev->config + pci_pcie_cap(dev);
> +    if (dev->exp.aer_cap) {
> +        aer_cap = dev->config + pcie_aer_cap(dev);
> +        devctl = pci_get_long(exp_cap + PCI_EXP_DEVCTL);
> +        devsta = pci_get_long(exp_cap + PCI_EXP_DEVSTA);
> +    }
> +    if (err->flags & PCIE_AER_ERR_IS_CORRECTABLE) {
> +    correctable_error:
> +        devsta |= PCI_EXP_DEVSTA_CED;
> +        if (is_unsupported_request) {
> +            devsta |= PCI_EXP_DEVSTA_URD;
> +        }
> +        pci_set_word(exp_cap + PCI_EXP_DEVSTA, devsta);
> +
> +        if (aer_cap) {
> +            pci_set_long(aer_cap + PCI_ERR_COR_STATUS,
> +                         pci_get_long(aer_cap + PCI_ERR_COR_STATUS) | status);
> +            mask = pci_get_long(aer_cap + PCI_ERR_COR_MASK);
> +            if (mask & status) {
> +                return;
> +            }
> +            if (is_advisory_nonfatal) {
> +                uint32_t uncor_mask =
> +                    pci_get_long(aer_cap + PCI_ERR_UNCOR_MASK);
> +                if (!(uncor_mask & uncor_status)) {
> +                    is_header_log_overflowed = pcie_aer_record_error(dev, err);
> +                }
> +                pci_set_long(aer_cap + PCI_ERR_UNCOR_STATUS,
> +                             pci_get_long(aer_cap + PCI_ERR_UNCOR_STATUS) |
> +                             uncor_status);
> +            }
> +        }
> +
> +        if (is_unsupported_request && !(devctl & PCI_EXP_DEVCTL_URRE)) {
> +            return;
> +        }
> +        if (!(devctl & PCI_EXP_DEVCTL_CERE)) {
> +            return;
> +        }
> +        msg.severity = AER_ERR_COR;
> +    } else {
> +        bool is_fatal =
> +            (pcie_aer_uncor_default_severity(status) == AER_ERR_FATAL);
> +        uint16_t cmd;
> +
> +        if (aer_cap) {
> +            is_fatal = status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
> +        }
> +        if (!is_fatal && (err->flags & PCIE_AER_ERR_MAYBE_ADVISORY)) {
> +            is_advisory_nonfatal = true;
> +            uncor_status = status;
> +            status = PCI_ERR_COR_ADV_NONFATAL;
> +            goto correctable_error;
> +        }
> +        if (is_fatal) {
> +            devsta |= PCI_EXP_DEVSTA_FED;
> +        } else {
> +            devsta |= PCI_EXP_DEVSTA_NFED;
> +        }
> +        if (is_unsupported_request) {
> +            devsta |= PCI_EXP_DEVSTA_URD;
> +        }
> +        pci_set_long(exp_cap + PCI_EXP_DEVSTA, devsta);
> +
> +        if (aer_cap) {
> +            mask = pci_get_long(aer_cap + PCI_ERR_UNCOR_MASK);
> +            if (mask & status) {
> +                pci_set_long(aer_cap + PCI_ERR_UNCOR_STATUS,
> +                             pci_get_long(aer_cap + PCI_ERR_UNCOR_STATUS) |
> +                             status);
> +                return;
> +            }
> +
> +            is_header_log_overflowed = pcie_aer_record_error(dev, err);
> +            pci_set_long(aer_cap + PCI_ERR_UNCOR_STATUS,
> +                         pci_get_long(aer_cap + PCI_ERR_UNCOR_STATUS) |
> +                         status);
> +        }
> +
> +        cmd = pci_get_word(dev->config + PCI_COMMAND);
> +        if (is_unsupported_request &&
> +            !(devctl & PCI_EXP_DEVCTL_URRE) && !(cmd & PCI_COMMAND_SERR)) {
> +            return;
> +        }
> +        if (is_fatal) {
> +            if (!((cmd & PCI_COMMAND_SERR) ||
> +                  (devctl & PCI_EXP_DEVCTL_FERE))) {
> +                return;
> +            }
> +            msg.severity = AER_ERR_FATAL;
> +        } else {
> +            if (!((cmd & PCI_COMMAND_SERR) ||
> +                  (devctl & PCI_EXP_DEVCTL_NFERE))) {
> +                return;
> +            }
> +            msg.severity = AER_ERR_NONFATAL;
> +        }
> +    }
> +
> +    /* send up error message */
> +    msg.source_id = err->source_id;
> +    pcie_aer_errmsg(dev, &msg);
> +
> +    if (is_header_log_overflowed) {
> +        PCIE_AERErr header_log_overflow = {
> +            .status = PCI_ERR_COR_HL_OVERFLOW,
> +            .flags = PCIE_AER_ERR_IS_CORRECTABLE,
> +            .header = {0, 0, 0, 0},
> +            .prefix = {0, 0, 0, 0},
> +        };
> +        pcie_aer_inject_error(dev, &header_log_overflow);
> +    }
> +}
> +
> +void pcie_aer_root_set_vector(PCIDevice *dev, uint8_t vector)
> +{
> +    uint8_t *aer_cap = dev->config + pcie_aer_cap(dev);
> +    uint32_t root_status = pci_get_long(aer_cap + PCI_ERR_ROOT_STATUS);
> +    root_status &= ~PCI_ERR_ROOT_IRQ;
> +    root_status |=
> +        (((uint32_t)vector) << PCI_ERR_ROOT_IRQ_SHIFT) & PCI_ERR_ROOT_IRQ;
> +    pci_set_long(aer_cap + PCI_ERR_ROOT_STATUS, root_status);
> +}
> +
> +static uint8_t pcie_aer_root_get_vector(PCIDevice *dev)
> +{
> +    uint8_t *aer_cap = dev->config + pcie_aer_cap(dev);
> +    uint32_t root_status = pci_get_long(aer_cap + PCI_ERR_ROOT_STATUS);
> +    return (root_status & PCI_ERR_ROOT_IRQ) >> PCI_ERR_ROOT_IRQ_SHIFT;
> +}
> +
> +void pcie_aer_root_init(PCIDevice *dev)
> +{
> +    uint16_t pos = pcie_aer_cap(dev);
> +
> +    pci_set_long(dev->wmask + pos + PCI_ERR_ROOT_COMMAND,
> +                 PCI_ERR_ROOT_CMD_EN_MASK);
> +    pci_set_long(dev->w1cmask + pos + PCI_ERR_ROOT_STATUS,
> +                 PCI_ERR_ROOT_STATUS_REPORT_MASK);
> +    dev->exp.aer_errmsg = pcie_aer_errmsg_root_port;
> +}
> +
> +void pcie_aer_root_reset(PCIDevice *dev)
> +{
> +    uint8_t* aer_cap = dev->config + pcie_aer_cap(dev);
> +
> +    pci_set_long(aer_cap + PCI_ERR_ROOT_COMMAND, 0);
> +
> +    /*
> +     * Advanced Error Interrupt Message Number in Root Error Status Register
> +     * must be updated by chip dependent code.
> +     */

Why?

> +}
> +
> +static bool pcie_aer_root_does_trigger(uint32_t cmd, uint32_t sta)

sta -> status

> +{
> +    return
> +        ((cmd & PCI_ERR_ROOT_CMD_COR_EN) && (sta & PCI_ERR_ROOT_COR_RCV)) ||
> +        ((cmd & PCI_ERR_ROOT_CMD_NONFATAL_EN) &&
> +         (sta & PCI_ERR_ROOT_NONFATAL_RCV)) ||
> +        ((cmd & PCI_ERR_ROOT_CMD_FATAL_EN) && (sta & PCI_ERR_ROOT_FATAL_RCV));
> +}
> +
> +void pcie_aer_root_write_config(PCIDevice *dev,
> +                                uint32_t addr, uint32_t val, int len,
> +                                uint32_t root_cmd_prev)
> +{
> +    uint16_t pos = pcie_aer_cap(dev);
> +    uint8_t *aer_cap = dev->config + pos;
> +    uint32_t root_status;
> +
> +    /* root command */
> +    if (ranges_overlap(addr, len, pos + PCI_ERR_ROOT_COMMAND, 4)) {
> +        uint32_t root_cmd = pci_get_long(aer_cap + PCI_ERR_ROOT_COMMAND);

make command writeable and then you won't need the tricky range checks.

> +        if (root_cmd & PCI_ERR_ROOT_CMD_EN_MASK) {
> +            bool trigger;
> +            int level;
> +            uint32_t root_cmd_set = (root_cmd_prev ^ root_cmd) & root_cmd;
> +
> +            /* 0 -> 1 */
> +            root_status = pci_get_long(aer_cap + PCI_ERR_ROOT_STATUS);
> +            if (pcie_aer_root_does_trigger(root_cmd_set, root_status)) {
> +                trigger = true;
> +            } else {
> +                trigger = false;
> +            }
> +            if (pcie_aer_root_does_trigger(root_cmd, root_status)) {
> +                level = 1;
> +            } else {
> +                level = 0;
> +            }
> +            pcie_notify(dev, pcie_aer_root_get_vector(dev), trigger, level);
> +        }
> +    }
> +}
> +
> +static const VMStateDescription vmstate_pcie_aer_err = {
> +    .name = "PCIE_AER_ERROR",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields     = (VMStateField[]) {
> +        VMSTATE_UINT32(status, PCIE_AERErr),
> +        VMSTATE_UINT16(source_id, PCIE_AERErr),
> +        VMSTATE_UINT16(flags, PCIE_AERErr),
> +        VMSTATE_UINT32_ARRAY(header, PCIE_AERErr, 4),
> +        VMSTATE_UINT32_ARRAY(prefix, PCIE_AERErr, 4),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +#define VMSTATE_PCIE_AER_ERRS(_field, _state, _field_num, _vmsd, _type) { \
> +    .name       = (stringify(_field)),                                    \
> +    .version_id = 0,                                                      \
> +    .num_offset = vmstate_offset_value(_state, _field_num, uint16_t),     \
> +    .size       = sizeof(_type),                                          \
> +    .vmsd       = &(_vmsd),                                               \
> +    .flags      = VMS_POINTER | VMS_VARRAY_UINT16 | VMS_STRUCT,           \
> +    .offset     = vmstate_offset_pointer(_state, _field, _type),          \
> +}
> +
> +const VMStateDescription vmstate_pcie_aer_log = {
> +    .name = "PCIE_AER_ERROR_LOG",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields     = (VMStateField[]) {
> +        VMSTATE_UINT32(producer, PCIE_AERLog),
> +        VMSTATE_UINT32(consumer, PCIE_AERLog),
> +        VMSTATE_UINT16(log_max, PCIE_AERLog),
> +        VMSTATE_PCIE_AER_ERRS(log, PCIE_AERLog, log_max,
> +                              vmstate_pcie_aer_err, PCIE_AERErr),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> diff --git a/hw/pcie_aer.h b/hw/pcie_aer.h
> new file mode 100644
> index 0000000..5a72bee
> --- /dev/null
> +++ b/hw/pcie_aer.h
> @@ -0,0 +1,105 @@
> +/*
> + * pcie_aer.h
> + *
> + * Copyright (c) 2010 Isaku Yamahata <yamahata at valinux co jp>
> + *                    VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef QEMU_PCIE_AER_H
> +#define QEMU_PCIE_AER_H
> +
> +#include "hw.h"
> +
> +/* definitions which PCIExpressDevice uses */
> +enum AER_ERR_MSG_RESULT {
> +    AER_ERR_MSG_MASKED,
> +    AER_ERR_MSG_SENT,
> +};
> +typedef enum AER_ERR_MSG_RESULT AER_ERR_MSG_RESULT;
> +typedef AER_ERR_MSG_RESULT (*pcie_aer_errmsg_fn)(PCIDevice *dev, const PCIE_AERErrMsg *msg);
> +
> +/* AER log */
> +struct PCIE_AERLog {
> +    uint32_t producer;
> +    uint32_t consumer;

Make these unsigned, they are just numbers, right?
Their size does not matter.
And change to unsigned most of functions dealing with them.
Easier to print, and easier to follow to understand

> +
> +#define PCIE_AER_LOG_MAX_DEFAULT        8
> +#define PCIE_AER_LOG_MAX_MAX            128 /* what is appropriate? */
> +#define PCIE_AER_LOG_MAX_UNSET          (~(uint16_t)0)

This does not do what you want: you get ffffffff but I think you
wanted ffff.  So simply write 0xffff.

> +    uint16_t log_max;
> +
> +    PCIE_AERErr *log;
> +};

all enum names and struct names shoukd be changed: remove _
and make mixed case, add typedefs.

> +
> +/* aer error severity */
> +enum PCIE_AER_SEVERITY {

This is not how we name enums, is it? either enum pcie_aer_severity {},
or typedef enum {} PCIEAerSeverity.

> +    /* those value are same as
> +     * Root error command register in aer extended cap and
> +     * root control register in pci express cap.
> +     */
> +    AER_ERR_COR         = 0x1,
> +    AER_ERR_NONFATAL    = 0x2,
> +    AER_ERR_FATAL       = 0x4,
> +};
> +
> +/* aer error message: error signaling message has only error sevirity and
> +   source id. See 2.2.8.3 error signaling messages */
> +struct PCIE_AERErrMsg {

Same here: either struct pcie_aererrmsg  or struc PCIEAERErrMsg.
(Might be just Msg - AER includes error already).

> +    enum PCIE_AER_SEVERITY severity;
> +    uint16_t source_id; /* bdf */
> +};
> +
> +static inline bool
> +pcie_aer_err_msg_is_uncor(const PCIE_AERErrMsg *msg)
> +{
> +    return msg->severity == AER_ERR_NONFATAL || msg->severity == AER_ERR_FATAL;
> +}
> +
> +/* error */
> +struct PCIE_AERErr {
> +    uint32_t status;    /* error status bits */
> +    uint16_t source_id; /* bdf */
> +
> +#define PCIE_AER_ERR_IS_CORRECTABLE     0x1     /* correctable/uncorrectable */
> +#define PCIE_AER_ERR_MAYBE_ADVISORY     0x2     /* maybe advisory non-fatal */
> +#define PCIE_AER_ERR_HEADER_VALID       0x4     /* TLP header is logged */
> +#define PCIE_AER_ERR_TLP_PRESENT        0x8     /* TLP Prefix is logged */
> +    uint16_t flags;
> +
> +    uint32_t header[4]; /* TLP header */
> +    uint32_t prefix[4]; /* TLP header prefix */
> +};
> +
> +extern const VMStateDescription vmstate_pcie_aer_log;
> +
> +void pcie_aer_init(PCIDevice *dev, uint16_t offset);
> +void pcie_aer_exit(PCIDevice *dev);
> +void pcie_aer_write_config(PCIDevice *dev,
> +                           uint32_t addr, uint32_t val, int len,
> +                           uint32_t uncorsta_prev);
> +
> +/* aer root port */
> +void pcie_aer_root_set_vector(PCIDevice *dev, uint8_t vector);
> +void pcie_aer_root_init(PCIDevice *dev);
> +void pcie_aer_root_reset(PCIDevice *dev);
> +void pcie_aer_root_write_config(PCIDevice *dev,
> +                                uint32_t addr, uint32_t val, int len,
> +                                uint32_t root_cmd_prev);
> +
> +/* error injection */
> +void pcie_aer_inject_error(PCIDevice *dev, const PCIE_AERErr *err);
> +
> +#endif /* QEMU_PCIE_AER_H */
> diff --git a/qemu-common.h b/qemu-common.h
> index 6d9ee26..fee772e 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -221,6 +221,9 @@ typedef struct PCIBus PCIBus;
>  typedef struct PCIDevice PCIDevice;
>  typedef struct PCIExpressDevice PCIExpressDevice;
>  typedef struct PCIBridge PCIBridge;
> +typedef struct PCIE_AERErrMsg PCIE_AERErrMsg;
> +typedef struct PCIE_AERLog PCIE_AERLog;
> +typedef struct PCIE_AERErr PCIE_AERErr;
>  typedef struct SerialState SerialState;
>  typedef struct IRQState *qemu_irq;
>  typedef struct PCMCIACardState PCMCIACardState;
> -- 
> 1.7.1.1

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 05/13] pcie: helper functions for pcie capability and extended capability.
  2010-09-19 11:45       ` Michael S. Tsirkin
@ 2010-09-24  2:24         ` Isaku Yamahata
  2010-09-26 12:32           ` Michael S. Tsirkin
  0 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-24  2:24 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Sun, Sep 19, 2010 at 01:45:33PM +0200, Michael S. Tsirkin wrote:
> On Sun, Sep 19, 2010 at 01:56:23PM +0900, Isaku Yamahata wrote:
> > On Wed, Sep 15, 2010 at 02:43:10PM +0200, Michael S. Tsirkin wrote:
> > > > +/***************************************************************************
> > > > + * pci express capability helper functions
> > > > + */
> > > > +void pcie_notify(PCIDevice *dev, uint16_t vector, bool trigger, int level)
> > > 
> > > Why is this not static? It makes sense for internal stuff possibly,
> > > but I think functions will need to know what to do: they can't
> > > treat msi/msix/irq identically anyway.
> > 
> > The aer code which I split out uses it.
> 
> Move it there?

_Both_ pcie.c and pcie_aer.c use it.


> > > > +        assert(offset == PCI_CONFIG_SPACE_SIZE);
> > > > +        pci_set_long(dev->config + offset,
> > > > +                     PCI_EXT_CAP(0, 0, PCI_EXT_CAP_NEXT(header)));
> > > > +    }
> > > > +
> > > > +    /* Make those registers read-only reserved zero */
> > > 
> > > So you make them readonly in both add and delete?
> > > delete should revert add: let's put the
> > > masks back the way they were: writeable.
> > 
> > In fact zeroing in add is redundant, but I added it following msix code.
> 
> It is not redundand there as registers are writeable by default:
>     memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff,
>            config_size - PCI_CONFIG_HEADER_SIZE);
> 
> 
> > 
> > The usage model is
> > - At first the registers are unused, so should be read only zero.
> 
> You can't know they are unused: in PCI spec everything
> outside capability list is vendor specific so it
> is writeable to let drivers do their thing.

So why PCIDevice::wmask is zero when initialized by do_pci_register_device()?
Anyway pcie_capability_del() is only called via PCIDeviceInfo::exit,
so I don't see any reason why manipulating the capability linked
list just before destroying PCIDevice.
Let's eliminate pcie_capability_del() at all.


> > > > +    /* events not listed aren't supported */
> > > > +};
> > > > +
> > > > +typedef void (*pcie_flr_fn)(PCIDevice *dev);
> > > 
> > > Is flr special?  Can't we use the generic reset handlers?
> > > If not why?
> > 
> > Reset(cold reset/warm reset) in generic sense corresponds to
> > conventional reset in express sense which corresponds to PCI RST#.
> > On the other hand FLR is different from the conventional one.
> > 
> > Cited from the spec
> > 
> > 6.6. PCI Express Reset - Rules
> > 6.6.1. Conventional Reset
> >  Conventional Reset includes all reset mechanisms other than Function
> >  Level Reset.
> > 6.6.2. Function-Level Reset (FLR)
> > 
> > Most devices would implement FLR as just calling something like
> > qdev_reset. But the spec differentiates FLR from conventional reset,
> > so the generic pcie layer should do.
> 
> I am not sure I agree. If most devices don't care, or
> behave almost identically, it's ok to just call qdev_reset:
> devices can find out that FLR is in progress by looking
> at the config space (bit will be set there) - and we can
> add a helper pcie_flr_in_progress() to test it.
> 
> This way most devices will work out of box.
> Only if behaviour is mostly different would it make sense
> to have a completely separate reset path.
> What happens with devices you implemented?

With either approach, we can implement FLR.
The above approach will eventually result in passing passing parameter,
somthing like reset_kind, to callbacks. The argument tells what kind of
reset is triggered. cold reset, warm reset and many bus/device specific
resets. In fact it was my first approach.
But when we discussed on qdev reset() with Anthony, he claimed that handling
reset type should be handled by introducing other APIs because
it would just bloat qdev reset callback function with big switch.
So I introduced new flr callback.
-- 
yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 06/13] pcie/aer: helper functions for pcie aer capability.
  2010-09-22 11:50   ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-09-24  2:50     ` Isaku Yamahata
  2010-09-26 12:46       ` Michael S. Tsirkin
  0 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-24  2:50 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 22, 2010 at 01:50:16PM +0200, Michael S. Tsirkin wrote:

> > +    }
> > +
> > +    /* capability & control */
> > +    if (ranges_overlap(addr, len, pos + PCI_ERR_CAP, 4)) {
> > +        uint32_t err_cap = pci_get_long(dev->config + pos + PCI_ERR_CAP);
> > +        if (!(err_cap & PCI_ERR_CAP_MHRE)) {
> > +            pcie_aer_log_clear_all_err(&dev->exp.aer_log);
> > +            pci_set_long(dev->w1cmask + pos + PCI_ERR_UNCOR_STATUS,
> > +                         PCI_ERR_UNC_SUPPORTED);
> > +        } else {
> > +            /* When multiple header recording is enabled, only the bit that
> > +             * first error pointer indicates is cleared.
> > +             * that is handled specifically.
> > +             */
> > +            pci_set_long(dev->w1cmask + pos + PCI_ERR_UNCOR_STATUS, 0);
> 
> This is wrong I think: you do not change mask on each write.
> Do it on setup.

The register behaves quite differently depending on whether
multiple header recording(MHR) is enabled or not.
With MHR disabled, the register is w1c. It indicates which errors
have occurred.
With MHR enabled, it behaves quite complexly. It reports errors in order
which had occurred.
Fro details, please refer to
6.2.4.2. Multiple Error Handling (Advanced Error Reporting Capability).


> > +static inline void pcie_aer_errmsg(PCIDevice *dev, const PCIE_AERErrMsg *msg)
> > +{
> > +    assert(pci_is_express(dev));
> > +    assert(dev->exp.aer_errmsg);
> > +    dev->exp.aer_errmsg(dev, msg);
> 
> Why do we want the indirection? Why not have users just call the function?

To handle error signaling uniformly.
Please see 
6.2.5. Sequence of Device Error Signaling and Logging Operations
and figure 6-2 and 6-3.
-- 
yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 08/13] pcie root port: implement pcie root port.
  2010-09-22 11:25   ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-09-24  5:38     ` Isaku Yamahata
  2010-09-26 12:49       ` Michael S. Tsirkin
  2010-09-26 12:50       ` Michael S. Tsirkin
  0 siblings, 2 replies; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-24  5:38 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Wed, Sep 22, 2010 at 01:25:59PM +0200, Michael S. Tsirkin wrote:

> > +PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
> > +                         const char *bus_name, pci_map_irq_fn map_irq,
> > +                         uint8_t port, uint8_t chassis, uint16_t slot);
> > +
> 
> I am a bit unhappy about all these _init functions.
> Can devices be created with qdev? If they were
> it would be possible to configure the system completely
> from qemu command line.

That's very reasonable question.
Once machine configuration file is supported, those initialization
functions will go away.
I.e. when the initialization code like pc_init1() in pc_piix.c disappears,
those functions will also go away.

Until that, those initialization glues will stay like pci_create family
or other many initialization glues unfortunately.
This is the result of qdev missing a feature, not the cause.
It would be a long-term issue to add machine configuration file support.
-- 
yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 05/13] pcie: helper functions for pcie capability and extended capability.
  2010-09-24  2:24         ` Isaku Yamahata
@ 2010-09-26 12:32           ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-26 12:32 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Fri, Sep 24, 2010 at 11:24:50AM +0900, Isaku Yamahata wrote:
> On Sun, Sep 19, 2010 at 01:45:33PM +0200, Michael S. Tsirkin wrote:
> > On Sun, Sep 19, 2010 at 01:56:23PM +0900, Isaku Yamahata wrote:
> > > On Wed, Sep 15, 2010 at 02:43:10PM +0200, Michael S. Tsirkin wrote:
> > > > > +/***************************************************************************
> > > > > + * pci express capability helper functions
> > > > > + */
> > > > > +void pcie_notify(PCIDevice *dev, uint16_t vector, bool trigger, int level)
> > > > 
> > > > Why is this not static? It makes sense for internal stuff possibly,
> > > > but I think functions will need to know what to do: they can't
> > > > treat msi/msix/irq identically anyway.
> > > 
> > > The aer code which I split out uses it.
> > 
> > Move it there?
> 
> _Both_ pcie.c and pcie_aer.c use it.
> 
> 
> > > > > +        assert(offset == PCI_CONFIG_SPACE_SIZE);
> > > > > +        pci_set_long(dev->config + offset,
> > > > > +                     PCI_EXT_CAP(0, 0, PCI_EXT_CAP_NEXT(header)));
> > > > > +    }
> > > > > +
> > > > > +    /* Make those registers read-only reserved zero */
> > > > 
> > > > So you make them readonly in both add and delete?
> > > > delete should revert add: let's put the
> > > > masks back the way they were: writeable.
> > > 
> > > In fact zeroing in add is redundant, but I added it following msix code.
> > 
> > It is not redundand there as registers are writeable by default:
> >     memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff,
> >            config_size - PCI_CONFIG_HEADER_SIZE);
> > 
> > 
> > > 
> > > The usage model is
> > > - At first the registers are unused, so should be read only zero.
> > 
> > You can't know they are unused: in PCI spec everything
> > outside capability list is vendor specific so it
> > is writeable to let drivers do their thing.
> 
> So why PCIDevice::wmask is zero when initialized by do_pci_register_device()?

I think it isn't: only the header is 0: we have in pci_init_wmask
    memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff,
           config_size - PCI_CONFIG_HEADER_SIZE);
what am I missing?

> Anyway pcie_capability_del() is only called via PCIDeviceInfo::exit,
> so I don't see any reason why manipulating the capability linked
> list just before destroying PCIDevice.
> Let's eliminate pcie_capability_del() at all.

OK.

> > > > > +    /* events not listed aren't supported */
> > > > > +};
> > > > > +
> > > > > +typedef void (*pcie_flr_fn)(PCIDevice *dev);
> > > > 
> > > > Is flr special?  Can't we use the generic reset handlers?
> > > > If not why?
> > > 
> > > Reset(cold reset/warm reset) in generic sense corresponds to
> > > conventional reset in express sense which corresponds to PCI RST#.
> > > On the other hand FLR is different from the conventional one.
> > > 
> > > Cited from the spec
> > > 
> > > 6.6. PCI Express Reset - Rules
> > > 6.6.1. Conventional Reset
> > >  Conventional Reset includes all reset mechanisms other than Function
> > >  Level Reset.
> > > 6.6.2. Function-Level Reset (FLR)
> > > 
> > > Most devices would implement FLR as just calling something like
> > > qdev_reset. But the spec differentiates FLR from conventional reset,
> > > so the generic pcie layer should do.
> > 
> > I am not sure I agree. If most devices don't care, or
> > behave almost identically, it's ok to just call qdev_reset:
> > devices can find out that FLR is in progress by looking
> > at the config space (bit will be set there) - and we can
> > add a helper pcie_flr_in_progress() to test it.
> > 
> > This way most devices will work out of box.
> > Only if behaviour is mostly different would it make sense
> > to have a completely separate reset path.
> > What happens with devices you implemented?
> 
> With either approach, we can implement FLR.
> The above approach will eventually result in passing passing parameter,
> somthing like reset_kind, to callbacks. The argument tells what kind of
> reset is triggered. cold reset, warm reset and many bus/device specific
> resets. In fact it was my first approach.
> But when we discussed on qdev reset() with Anthony, he claimed that handling
> reset type should be handled by introducing other APIs because
> it would just bloat qdev reset callback function with big switch.
> So I introduced new flr callback.

So I think this addresses this in a different way: we don't need a
parameter as devices can check state to see what reset is in progress.
Most of them do not need to bother, so we expect that there will not be any
giant switch statements.

But - you are writing the code after all so you tell us whether the reset code
is mostly same or mostly different: if it's same we should probably
reuse existing callbacks, to avoid boilerplate code
in devices, if it is different maybe add a new one.

Anthony, makes sense?

> -- 
> yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 06/13] pcie/aer: helper functions for pcie aer capability.
  2010-09-24  2:50     ` Isaku Yamahata
@ 2010-09-26 12:46       ` Michael S. Tsirkin
  2010-09-27  6:03         ` Isaku Yamahata
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-26 12:46 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Fri, Sep 24, 2010 at 11:50:21AM +0900, Isaku Yamahata wrote:
> On Wed, Sep 22, 2010 at 01:50:16PM +0200, Michael S. Tsirkin wrote:
> 
> > > +    }
> > > +
> > > +    /* capability & control */
> > > +    if (ranges_overlap(addr, len, pos + PCI_ERR_CAP, 4)) {
> > > +        uint32_t err_cap = pci_get_long(dev->config + pos + PCI_ERR_CAP);
> > > +        if (!(err_cap & PCI_ERR_CAP_MHRE)) {
> > > +            pcie_aer_log_clear_all_err(&dev->exp.aer_log);
> > > +            pci_set_long(dev->w1cmask + pos + PCI_ERR_UNCOR_STATUS,
> > > +                         PCI_ERR_UNC_SUPPORTED);
> > > +        } else {
> > > +            /* When multiple header recording is enabled, only the bit that
> > > +             * first error pointer indicates is cleared.
> > > +             * that is handled specifically.
> > > +             */
> > > +            pci_set_long(dev->w1cmask + pos + PCI_ERR_UNCOR_STATUS, 0);
> > 
> > This is wrong I think: you do not change mask on each write.
> > Do it on setup.
> 
> The register behaves quite differently depending on whether
> multiple header recording(MHR) is enabled or not.
> With MHR disabled, the register is w1c. It indicates which errors
> have occurred.
> With MHR enabled, it behaves quite complexly. It reports errors in order
> which had occurred.
> Fro details, please refer to
> 6.2.4.2. Multiple Error Handling (Advanced Error Reporting Capability).
> 

I see. No bug then. However, I think the best way to implement this is this:

- always make the bit w1c
- after config write:
  if MHR is enabled, and you see that error log is not empty and that bit is 0,
  this means that someone has written 1b.
  so pop the first error from the log, and set bit to 1 if it's not empty.

This way we only touch w1c mask on setup.

> > > +static inline void pcie_aer_errmsg(PCIDevice *dev, const PCIE_AERErrMsg *msg)
> > > +{
> > > +    assert(pci_is_express(dev));
> > > +    assert(dev->exp.aer_errmsg);
> > > +    dev->exp.aer_errmsg(dev, msg);
> > 
> > Why do we want the indirection? Why not have users just call the function?
> 
> To handle error signaling uniformly.
> Please see 
> 6.2.5. Sequence of Device Error Signaling and Logging Operations
> and figure 6-2 and 6-3.

My question was: the only difference appears to be between bridge and
non-bridge devices: bridge has to do more stuff, but most code is
common.  So this seems to be a very roundabout way to do this.
Can't we just have a common function with an if (bridge) statement up front?
If we ever only expect 2 implementations, I think a function pointer
is overkill.


> -- 
> yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 08/13] pcie root port: implement pcie root port.
  2010-09-24  5:38     ` Isaku Yamahata
@ 2010-09-26 12:49       ` Michael S. Tsirkin
  2010-09-27  6:36         ` Isaku Yamahata
  2010-09-26 12:50       ` Michael S. Tsirkin
  1 sibling, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-26 12:49 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Fri, Sep 24, 2010 at 02:38:09PM +0900, Isaku Yamahata wrote:
> On Wed, Sep 22, 2010 at 01:25:59PM +0200, Michael S. Tsirkin wrote:
> 
> > > +PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
> > > +                         const char *bus_name, pci_map_irq_fn map_irq,
> > > +                         uint8_t port, uint8_t chassis, uint16_t slot);
> > > +
> > 
> > I am a bit unhappy about all these _init functions.
> > Can devices be created with qdev? If they were
> > it would be possible to configure the system completely
> > from qemu command line.
> 
> That's very reasonable question.
> Once machine configuration file is supported, those initialization
> functions will go away.
> I.e. when the initialization code like pc_init1() in pc_piix.c disappears,
> those functions will also go away.
> 
> Until that, those initialization glues will stay like pci_create family
> or other many initialization glues unfortunately.
> This is the result of qdev missing a feature, not the cause.
> It would be a long-term issue to add machine configuration file support.

Yes, but will it be better to do everything from qdev_init?

> -- 
> yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 08/13] pcie root port: implement pcie root port.
  2010-09-24  5:38     ` Isaku Yamahata
  2010-09-26 12:49       ` Michael S. Tsirkin
@ 2010-09-26 12:50       ` Michael S. Tsirkin
  2010-09-27  6:22         ` Isaku Yamahata
  1 sibling, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-26 12:50 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Fri, Sep 24, 2010 at 02:38:09PM +0900, Isaku Yamahata wrote:
> On Wed, Sep 22, 2010 at 01:25:59PM +0200, Michael S. Tsirkin wrote:
> 
> > > +PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
> > > +                         const char *bus_name, pci_map_irq_fn map_irq,
> > > +                         uint8_t port, uint8_t chassis, uint16_t slot);
> > > +
> > 
> > I am a bit unhappy about all these _init functions.
> > Can devices be created with qdev? If they were
> > it would be possible to configure the system completely
> > from qemu command line.
> 
> That's very reasonable question.
> Once machine configuration file is supported, those initialization
> functions will go away.
> I.e. when the initialization code like pc_init1() in pc_piix.c disappears,
> those functions will also go away.
> 
> Until that, those initialization glues will stay like pci_create family
> or other many initialization glues unfortunately.
> This is the result of qdev missing a feature, not the cause.
> It would be a long-term issue to add machine configuration file support.

Just to clarify, if I wanted to have a flag to make virtio-net
a pci express device, how would I do this?

> -- 
> yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 06/13] pcie/aer: helper functions for pcie aer capability.
  2010-09-26 12:46       ` Michael S. Tsirkin
@ 2010-09-27  6:03         ` Isaku Yamahata
  2010-09-27 10:36           ` Michael S. Tsirkin
  0 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-27  6:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Sun, Sep 26, 2010 at 02:46:51PM +0200, Michael S. Tsirkin wrote:
> > > > +static inline void pcie_aer_errmsg(PCIDevice *dev, const PCIE_AERErrMsg *msg)
> > > > +{
> > > > +    assert(pci_is_express(dev));
> > > > +    assert(dev->exp.aer_errmsg);
> > > > +    dev->exp.aer_errmsg(dev, msg);
> > > 
> > > Why do we want the indirection? Why not have users just call the function?
> > 
> > To handle error signaling uniformly.
> > Please see 
> > 6.2.5. Sequence of Device Error Signaling and Logging Operations
> > and figure 6-2 and 6-3.
> 
> My question was: the only difference appears to be between bridge and
> non-bridge devices: bridge has to do more stuff, but most code is
> common.  So this seems to be a very roundabout way to do this.
> Can't we just have a common function with an if (bridge) statement up front?
> If we ever only expect 2 implementations, I think a function pointer
> is overkill.

Not 2, but 3. root port, upstream or downstream and normal device.
So you want something like the following?
More than 2 is a good reason for me, I prefer function pointer.

switch (pcie_cap_get_type(dev))
    case PCI_EXP_TYPE_ROOT_PORT:
        pcie_aer_errmsg_root_port();
        break;
    case PCI_EXP_TYPE_DOWNSTREAM:
    case PCI_EXP_TYPE_UPSTREAM:
        pcie_aer_errmsg_vbridge();
        break;
    default:
       pcie_aer_errmsg_alldev();
       break;

-- 
yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 08/13] pcie root port: implement pcie root port.
  2010-09-26 12:50       ` Michael S. Tsirkin
@ 2010-09-27  6:22         ` Isaku Yamahata
  2010-09-27 10:40           ` Michael S. Tsirkin
  0 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-27  6:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Sun, Sep 26, 2010 at 02:50:42PM +0200, Michael S. Tsirkin wrote:
> On Fri, Sep 24, 2010 at 02:38:09PM +0900, Isaku Yamahata wrote:
> > On Wed, Sep 22, 2010 at 01:25:59PM +0200, Michael S. Tsirkin wrote:
> > 
> > > > +PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
> > > > +                         const char *bus_name, pci_map_irq_fn map_irq,
> > > > +                         uint8_t port, uint8_t chassis, uint16_t slot);
> > > > +
> > > 
> > > I am a bit unhappy about all these _init functions.
> > > Can devices be created with qdev? If they were
> > > it would be possible to configure the system completely
> > > from qemu command line.
> > 
> > That's very reasonable question.
> > Once machine configuration file is supported, those initialization
> > functions will go away.
> > I.e. when the initialization code like pc_init1() in pc_piix.c disappears,
> > those functions will also go away.
> > 
> > Until that, those initialization glues will stay like pci_create family
> > or other many initialization glues unfortunately.
> > This is the result of qdev missing a feature, not the cause.
> > It would be a long-term issue to add machine configuration file support.
> 
> Just to clarify, if I wanted to have a flag to make virtio-net
> a pci express device, how would I do this?

the following preparation is needed.
- register PCIDeviceInfo with name like "virtio-net-pci-xen"
  with PCIDeviceInfo::is_express = true.
- in initialization function, initialize express capability.
- in write config function, call related function.

And then,
if (express)
   create "virtio-net-pci-xen"
else
   create "virtio-net-pci"

-- 
yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 08/13] pcie root port: implement pcie root port.
  2010-09-26 12:49       ` Michael S. Tsirkin
@ 2010-09-27  6:36         ` Isaku Yamahata
  0 siblings, 0 replies; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-27  6:36 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Sun, Sep 26, 2010 at 02:49:40PM +0200, Michael S. Tsirkin wrote:
> On Fri, Sep 24, 2010 at 02:38:09PM +0900, Isaku Yamahata wrote:
> > On Wed, Sep 22, 2010 at 01:25:59PM +0200, Michael S. Tsirkin wrote:
> > 
> > > > +PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
> > > > +                         const char *bus_name, pci_map_irq_fn map_irq,
> > > > +                         uint8_t port, uint8_t chassis, uint16_t slot);
> > > > +
> > > 
> > > I am a bit unhappy about all these _init functions.
> > > Can devices be created with qdev? If they were
> > > it would be possible to configure the system completely
> > > from qemu command line.
> > 
> > That's very reasonable question.
> > Once machine configuration file is supported, those initialization
> > functions will go away.
> > I.e. when the initialization code like pc_init1() in pc_piix.c disappears,
> > those functions will also go away.
> > 
> > Until that, those initialization glues will stay like pci_create family
> > or other many initialization glues unfortunately.
> > This is the result of qdev missing a feature, not the cause.
> > It would be a long-term issue to add machine configuration file support.
> 
> Yes, but will it be better to do everything from qdev_init?

Yes, ideally.

-- 
yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 06/13] pcie/aer: helper functions for pcie aer capability.
  2010-09-27  6:03         ` Isaku Yamahata
@ 2010-09-27 10:36           ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-27 10:36 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Mon, Sep 27, 2010 at 03:03:24PM +0900, Isaku Yamahata wrote:
> On Sun, Sep 26, 2010 at 02:46:51PM +0200, Michael S. Tsirkin wrote:
> > > > > +static inline void pcie_aer_errmsg(PCIDevice *dev, const PCIE_AERErrMsg *msg)
> > > > > +{
> > > > > +    assert(pci_is_express(dev));
> > > > > +    assert(dev->exp.aer_errmsg);
> > > > > +    dev->exp.aer_errmsg(dev, msg);
> > > > 
> > > > Why do we want the indirection? Why not have users just call the function?
> > > 
> > > To handle error signaling uniformly.
> > > Please see 
> > > 6.2.5. Sequence of Device Error Signaling and Logging Operations
> > > and figure 6-2 and 6-3.
> > 
> > My question was: the only difference appears to be between bridge and
> > non-bridge devices: bridge has to do more stuff, but most code is
> > common.  So this seems to be a very roundabout way to do this.
> > Can't we just have a common function with an if (bridge) statement up front?
> > If we ever only expect 2 implementations, I think a function pointer
> > is overkill.
> 
> Not 2, but 3. root port, upstream or downstream and normal device.
> So you want something like the following?
> More than 2 is a good reason for me, I prefer function pointer.

Heh, shall we change all switch statements to pointers then?
That's not a good idea IMO, automated tools like ctags
become useless for navigation, you have to look up where it's
actually set. We do have these things in places where
it brings benefit, which is typically where it can take
different values at runtime.

> switch (pcie_cap_get_type(dev))
>     case PCI_EXP_TYPE_ROOT_PORT:
>         pcie_aer_errmsg_root_port();
>         break;
>     case PCI_EXP_TYPE_DOWNSTREAM:
>     case PCI_EXP_TYPE_UPSTREAM:
>         pcie_aer_errmsg_vbridge();
>         break;
>     default:
>        pcie_aer_errmsg_alldev();
>        break;

Yes, and note that in fact it's not even that:
all of root and ports are also devices, so we end up
with:
	if (pcie_is_root_port(dev)) {
         	pcie_aer_errmsg_root_port();
	} else if (pcie_is_bridge_port(dev)) {
	         pcie_aer_errmsg_vbridge();
	}
        pcie_aer_errmsg_alldev();

which is in fact cleaner than calling pcie_aer_errmsg_alldev
from pcie_aer_errmsg_vbridge, I think.

> 
> -- 
> yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 08/13] pcie root port: implement pcie root port.
  2010-09-27  6:22         ` Isaku Yamahata
@ 2010-09-27 10:40           ` Michael S. Tsirkin
  2010-09-27 23:01             ` Isaku Yamahata
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-27 10:40 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Mon, Sep 27, 2010 at 03:22:43PM +0900, Isaku Yamahata wrote:
> On Sun, Sep 26, 2010 at 02:50:42PM +0200, Michael S. Tsirkin wrote:
> > On Fri, Sep 24, 2010 at 02:38:09PM +0900, Isaku Yamahata wrote:
> > > On Wed, Sep 22, 2010 at 01:25:59PM +0200, Michael S. Tsirkin wrote:
> > > 
> > > > > +PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
> > > > > +                         const char *bus_name, pci_map_irq_fn map_irq,
> > > > > +                         uint8_t port, uint8_t chassis, uint16_t slot);
> > > > > +
> > > > 
> > > > I am a bit unhappy about all these _init functions.
> > > > Can devices be created with qdev? If they were
> > > > it would be possible to configure the system completely
> > > > from qemu command line.
> > > 
> > > That's very reasonable question.
> > > Once machine configuration file is supported, those initialization
> > > functions will go away.
> > > I.e. when the initialization code like pc_init1() in pc_piix.c disappears,
> > > those functions will also go away.
> > > 
> > > Until that, those initialization glues will stay like pci_create family
> > > or other many initialization glues unfortunately.
> > > This is the result of qdev missing a feature, not the cause.
> > > It would be a long-term issue to add machine configuration file support.
> > 
> > Just to clarify, if I wanted to have a flag to make virtio-net
> > a pci express device, how would I do this?
> 
> the following preparation is needed.
> - register PCIDeviceInfo with name like "virtio-net-pci-xen"
>   with PCIDeviceInfo::is_express = true.
> - in initialization function, initialize express capability.
> - in write config function, call related function.
> 
> And then,
> if (express)
>    create "virtio-net-pci-xen"
> else
>    create "virtio-net-pci"

Sounds pretty bad: we can't double the number of devices
with each capability we add.
Can we make it so setting is_express on command line
will convert the device to PCI express?


> -- 
> yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 08/13] pcie root port: implement pcie root port.
  2010-09-27 10:40           ` Michael S. Tsirkin
@ 2010-09-27 23:01             ` Isaku Yamahata
  2010-09-28  9:27               ` Michael S. Tsirkin
  0 siblings, 1 reply; 42+ messages in thread
From: Isaku Yamahata @ 2010-09-27 23:01 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Mon, Sep 27, 2010 at 12:40:12PM +0200, Michael S. Tsirkin wrote:
> On Mon, Sep 27, 2010 at 03:22:43PM +0900, Isaku Yamahata wrote:
> > On Sun, Sep 26, 2010 at 02:50:42PM +0200, Michael S. Tsirkin wrote:
> > > On Fri, Sep 24, 2010 at 02:38:09PM +0900, Isaku Yamahata wrote:
> > > > On Wed, Sep 22, 2010 at 01:25:59PM +0200, Michael S. Tsirkin wrote:
> > > > 
> > > > > > +PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
> > > > > > +                         const char *bus_name, pci_map_irq_fn map_irq,
> > > > > > +                         uint8_t port, uint8_t chassis, uint16_t slot);
> > > > > > +
> > > > > 
> > > > > I am a bit unhappy about all these _init functions.
> > > > > Can devices be created with qdev? If they were
> > > > > it would be possible to configure the system completely
> > > > > from qemu command line.
> > > > 
> > > > That's very reasonable question.
> > > > Once machine configuration file is supported, those initialization
> > > > functions will go away.
> > > > I.e. when the initialization code like pc_init1() in pc_piix.c disappears,
> > > > those functions will also go away.
> > > > 
> > > > Until that, those initialization glues will stay like pci_create family
> > > > or other many initialization glues unfortunately.
> > > > This is the result of qdev missing a feature, not the cause.
> > > > It would be a long-term issue to add machine configuration file support.
> > > 
> > > Just to clarify, if I wanted to have a flag to make virtio-net
> > > a pci express device, how would I do this?
> > 
> > the following preparation is needed.
> > - register PCIDeviceInfo with name like "virtio-net-pci-xen"
> >   with PCIDeviceInfo::is_express = true.
> > - in initialization function, initialize express capability.
> > - in write config function, call related function.
> > 
> > And then,
> > if (express)
> >    create "virtio-net-pci-xen"
> > else
> >    create "virtio-net-pci"
> 
> Sounds pretty bad: we can't double the number of devices
> with each capability we add.
> Can we make it so setting is_express on command line
> will convert the device to PCI express?

I don't see your point. Capability isn't something that should be
genericly customizable.
If any, it would not be a generic option, but one specific to device.
There is no generic way to automatically convert pci device into
express device. It means designing a new express device.

Maybe what you want is
- set is_express = true
  This result in always allocating 4k-sized configuration space.
  (Possibly we need more fine-grained parameter in PCIDeviceInfo
   than is_express.)
- introduce device specific property that a user can turn on/off
- In initialization function/config write function and where necessary,
  check the property.
-- 
yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 08/13] pcie root port: implement pcie root port.
  2010-09-27 23:01             ` Isaku Yamahata
@ 2010-09-28  9:27               ` Michael S. Tsirkin
  2010-09-28 10:38                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-28  9:27 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Tue, Sep 28, 2010 at 08:01:15AM +0900, Isaku Yamahata wrote:
> On Mon, Sep 27, 2010 at 12:40:12PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Sep 27, 2010 at 03:22:43PM +0900, Isaku Yamahata wrote:
> > > On Sun, Sep 26, 2010 at 02:50:42PM +0200, Michael S. Tsirkin wrote:
> > > > On Fri, Sep 24, 2010 at 02:38:09PM +0900, Isaku Yamahata wrote:
> > > > > On Wed, Sep 22, 2010 at 01:25:59PM +0200, Michael S. Tsirkin wrote:
> > > > > 
> > > > > > > +PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
> > > > > > > +                         const char *bus_name, pci_map_irq_fn map_irq,
> > > > > > > +                         uint8_t port, uint8_t chassis, uint16_t slot);
> > > > > > > +
> > > > > > 
> > > > > > I am a bit unhappy about all these _init functions.
> > > > > > Can devices be created with qdev? If they were
> > > > > > it would be possible to configure the system completely
> > > > > > from qemu command line.
> > > > > 
> > > > > That's very reasonable question.
> > > > > Once machine configuration file is supported, those initialization
> > > > > functions will go away.
> > > > > I.e. when the initialization code like pc_init1() in pc_piix.c disappears,
> > > > > those functions will also go away.
> > > > > 
> > > > > Until that, those initialization glues will stay like pci_create family
> > > > > or other many initialization glues unfortunately.
> > > > > This is the result of qdev missing a feature, not the cause.
> > > > > It would be a long-term issue to add machine configuration file support.
> > > > 
> > > > Just to clarify, if I wanted to have a flag to make virtio-net
> > > > a pci express device, how would I do this?
> > > 
> > > the following preparation is needed.
> > > - register PCIDeviceInfo with name like "virtio-net-pci-xen"
> > >   with PCIDeviceInfo::is_express = true.
> > > - in initialization function, initialize express capability.
> > > - in write config function, call related function.
> > > 
> > > And then,
> > > if (express)
> > >    create "virtio-net-pci-xen"
> > > else
> > >    create "virtio-net-pci"
> > 
> > Sounds pretty bad: we can't double the number of devices
> > with each capability we add.
> > Can we make it so setting is_express on command line
> > will convert the device to PCI express?
> 
> I don't see your point. Capability isn't something that should be
> genericly customizable.
> If any, it would not be a generic option, but one specific to device.
> There is no generic way to automatically convert pci device into
> express device. It means designing a new express device.
> Maybe what you want is
> - set is_express = true
>   This result in always allocating 4k-sized configuration space.
>   (Possibly we need more fine-grained parameter in PCIDeviceInfo
>    than is_express.)
> - introduce device specific property that a user can turn on/off
> - In initialization function/config write function and where necessary,
>   check the property.

I think what I want is (At some point) automatically convert all virtio
users to express devices, but have a fallback option for old machine
types.

> -- 
> yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] Re: [PATCH v3 08/13] pcie root port: implement pcie root port.
  2010-09-28  9:27               ` Michael S. Tsirkin
@ 2010-09-28 10:38                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-28 10:38 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: skandasa, etmartin, qemu-devel, wexu2

On Tue, Sep 28, 2010 at 11:27:03AM +0200, Michael S. Tsirkin wrote:
> On Tue, Sep 28, 2010 at 08:01:15AM +0900, Isaku Yamahata wrote:
> > On Mon, Sep 27, 2010 at 12:40:12PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Sep 27, 2010 at 03:22:43PM +0900, Isaku Yamahata wrote:
> > > > On Sun, Sep 26, 2010 at 02:50:42PM +0200, Michael S. Tsirkin wrote:
> > > > > On Fri, Sep 24, 2010 at 02:38:09PM +0900, Isaku Yamahata wrote:
> > > > > > On Wed, Sep 22, 2010 at 01:25:59PM +0200, Michael S. Tsirkin wrote:
> > > > > > 
> > > > > > > > +PCIESlot *pcie_root_init(PCIBus *bus, int devfn, bool multifunction,
> > > > > > > > +                         const char *bus_name, pci_map_irq_fn map_irq,
> > > > > > > > +                         uint8_t port, uint8_t chassis, uint16_t slot);
> > > > > > > > +
> > > > > > > 
> > > > > > > I am a bit unhappy about all these _init functions.
> > > > > > > Can devices be created with qdev? If they were
> > > > > > > it would be possible to configure the system completely
> > > > > > > from qemu command line.
> > > > > > 
> > > > > > That's very reasonable question.
> > > > > > Once machine configuration file is supported, those initialization
> > > > > > functions will go away.
> > > > > > I.e. when the initialization code like pc_init1() in pc_piix.c disappears,
> > > > > > those functions will also go away.
> > > > > > 
> > > > > > Until that, those initialization glues will stay like pci_create family
> > > > > > or other many initialization glues unfortunately.
> > > > > > This is the result of qdev missing a feature, not the cause.
> > > > > > It would be a long-term issue to add machine configuration file support.
> > > > > 
> > > > > Just to clarify, if I wanted to have a flag to make virtio-net
> > > > > a pci express device, how would I do this?
> > > > 
> > > > the following preparation is needed.
> > > > - register PCIDeviceInfo with name like "virtio-net-pci-xen"
> > > >   with PCIDeviceInfo::is_express = true.
> > > > - in initialization function, initialize express capability.
> > > > - in write config function, call related function.
> > > > 
> > > > And then,
> > > > if (express)
> > > >    create "virtio-net-pci-xen"
> > > > else
> > > >    create "virtio-net-pci"
> > > 
> > > Sounds pretty bad: we can't double the number of devices
> > > with each capability we add.
> > > Can we make it so setting is_express on command line
> > > will convert the device to PCI express?
> > 
> > I don't see your point. Capability isn't something that should be
> > genericly customizable.
> > If any, it would not be a generic option, but one specific to device.
> > There is no generic way to automatically convert pci device into
> > express device. It means designing a new express device.
> > Maybe what you want is
> > - set is_express = true
> >   This result in always allocating 4k-sized configuration space.
> >   (Possibly we need more fine-grained parameter in PCIDeviceInfo
> >    than is_express.)
> > - introduce device specific property that a user can turn on/off
> > - In initialization function/config write function and where necessary,
> >   check the property.
> 
> I think what I want is (At some point) automatically convert all virtio
> users to express devices, but have a fallback option for old machine
> types.

In any case, this is not a blocker for the merge.

> > -- 
> > yamahata

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2010-09-28 10:44 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-15  5:38 [Qemu-devel] [PATCH v3 00/13] pcie port switch emulators Isaku Yamahata
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 01/13] msi: implemented msi Isaku Yamahata
2010-09-15 13:03   ` [Qemu-devel] " Michael S. Tsirkin
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 02/13] pci: implement RW1C register framework Isaku Yamahata
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 03/13] pci: introduce helper function pci_shift_word/long which returns shifted value Isaku Yamahata
2010-09-15 12:49   ` [Qemu-devel] " Michael S. Tsirkin
2010-09-19  4:13     ` Isaku Yamahata
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 04/13] pcie: add pcie constants to pcie_regs.h Isaku Yamahata
2010-09-20 18:14   ` [Qemu-devel] " Michael S. Tsirkin
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 05/13] pcie: helper functions for pcie capability and extended capability Isaku Yamahata
2010-09-15 12:43   ` [Qemu-devel] " Michael S. Tsirkin
2010-09-19  4:56     ` Isaku Yamahata
2010-09-19 11:45       ` Michael S. Tsirkin
2010-09-24  2:24         ` Isaku Yamahata
2010-09-26 12:32           ` Michael S. Tsirkin
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 06/13] pcie/aer: helper functions for pcie aer capability Isaku Yamahata
2010-09-22 11:50   ` [Qemu-devel] " Michael S. Tsirkin
2010-09-24  2:50     ` Isaku Yamahata
2010-09-26 12:46       ` Michael S. Tsirkin
2010-09-27  6:03         ` Isaku Yamahata
2010-09-27 10:36           ` Michael S. Tsirkin
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 07/13] pcie port: define struct PCIEPort/PCIESlot and helper functions Isaku Yamahata
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 08/13] pcie root port: implement pcie root port Isaku Yamahata
2010-09-22 11:25   ` [Qemu-devel] " Michael S. Tsirkin
2010-09-24  5:38     ` Isaku Yamahata
2010-09-26 12:49       ` Michael S. Tsirkin
2010-09-27  6:36         ` Isaku Yamahata
2010-09-26 12:50       ` Michael S. Tsirkin
2010-09-27  6:22         ` Isaku Yamahata
2010-09-27 10:40           ` Michael S. Tsirkin
2010-09-27 23:01             ` Isaku Yamahata
2010-09-28  9:27               ` Michael S. Tsirkin
2010-09-28 10:38                 ` Michael S. Tsirkin
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 09/13] pcie upstream port: pci express switch upstream port Isaku Yamahata
2010-09-22 11:22   ` [Qemu-devel] " Michael S. Tsirkin
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 10/13] pcie downstream port: pci express switch downstream port Isaku Yamahata
2010-09-22 11:22   ` [Qemu-devel] " Michael S. Tsirkin
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 11/13] pcie/hotplug: glue pushing attention button command. pcie_abp Isaku Yamahata
2010-09-22 11:30   ` [Qemu-devel] " Michael S. Tsirkin
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 12/13] pcie/aer: glue aer error injection into qemu monitor Isaku Yamahata
2010-09-15  5:38 ` [Qemu-devel] [PATCH v3 13/13] msix: clear not only INTA, but all INTx when MSI-X is enabled Isaku Yamahata
2010-09-20 18:18 ` [Qemu-devel] Re: [PATCH v3 00/13] pcie port switch emulators Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.