* [PATCH 0/7] AMD IOMMU emulation patchset v4
@ 2010-08-28 14:54 ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-28 14:54 UTC (permalink / raw)
  To: mst
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

Hi,

I rebased my work on mst's PCI tree and, hopefully, fixed issues raised by
others. Here's a summary of the changes:
- made it apply to mst/pci
- moved some AMD IOMMU code into a reset handler
- dropped range_covers_range() (it wasn't the same as ranges_overlap(), but the
  latter was better anyway)
- used 'expand' to remove tabs in pci_regs.h before applying the useful changes
- fixed the endianness mistake spotted by Blue (though ldq_phys wasn't needed)
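
To illustrate why a "covers" check and an "overlaps" check are not
interchangeable, here is a minimal sketch. The function names, signatures and
the half-open [base, base + len) convention are assumptions made for this
example only; they are not the helpers from the patchset or from mst's tree,
and the sketch ignores wrap-around at the top of the address space.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only.
 * A range is [base, base + len) with len > 0; overflow is not handled.
 */

/* True if the two ranges share at least one byte. */
bool sketch_ranges_overlap(uint64_t a, uint64_t a_len,
                           uint64_t b, uint64_t b_len)
{
    return a < b + b_len && b < a + a_len;
}

/* True if the first range fully contains the second one. */
bool sketch_range_covers_range(uint64_t a, uint64_t a_len,
                               uint64_t b, uint64_t b_len)
{
    return a <= b && b + b_len <= a + a_len;
}
```

With these definitions, a range that only partially intersects another one
overlaps it without covering it, so the two predicates give different answers.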

As for Anthony's suggestion to simply sed-convert all devices, I'd rather go
through them one at a time and do it manually. 'sed' would not only mess up the
indentation, it also wouldn't make it straightforward to get the 'PCIDevice *'
you need to pass to the pci_* helpers. (I'll try to focus on conversion next so
we can poison the old stuff.)

I also added (read "spelled it out myself") malc's ACK to the ac97 patch.
Nothing changed since his last review.

Please have a look and merge if you like it.


    Thanks,
    Eduard


Eduard - Gabriel Munteanu (7):
  pci: expand tabs to spaces in pci_regs.h
  pci: memory access API and IOMMU support
  AMD IOMMU emulation
  ide: use the PCI memory access interface
  rtl8139: use the PCI memory access interface
  eepro100: use the PCI memory access interface
  ac97: use the PCI memory access interface

 Makefile.target    |    2 +-
 dma-helpers.c      |   46 ++-
 dma.h              |   21 +-
 hw/ac97.c          |    6 +-
 hw/amd_iommu.c     |  663 ++++++++++++++++++++++++++
 hw/eepro100.c      |   86 ++--
 hw/ide/core.c      |   15 +-
 hw/ide/internal.h  |   39 ++
 hw/ide/macio.c     |    4 +-
 hw/ide/pci.c       |    7 +
 hw/pc.c            |    2 +
 hw/pci.c           |  185 ++++++++-
 hw/pci.h           |   74 +++
 hw/pci_ids.h       |    2 +
 hw/pci_internals.h |   12 +
 hw/pci_regs.h      | 1331 ++++++++++++++++++++++++++--------------------------
 hw/rtl8139.c       |   99 +++--
 qemu-common.h      |    1 +
 18 files changed, 1827 insertions(+), 768 deletions(-)
 create mode 100644 hw/amd_iommu.c
 rewrite hw/pci_regs.h (90%)


* [PATCH 1/7] pci: expand tabs to spaces in pci_regs.h
  2010-08-28 14:54 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-28 14:54   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-28 14:54 UTC (permalink / raw)
  To: mst
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

The conversion was done using the GNU 'expand' tool (default settings)
to make the file obey the QEMU coding style.
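
For readers unfamiliar with what such a conversion does to each line, the
following is a rough sketch of tab expansion with fixed tab stops (GNU 'expand'
defaults to a tab stop every 8 columns). The function name and signature are
invented for this example; it handles a single line without embedded newlines
and is not how the patch itself was produced.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Expands tabs in one NUL-terminated line to spaces, using tab stops
 * every tab_width columns (8 is used if tab_width is 0).
 * Writes at most out_size - 1 characters plus a terminating NUL and
 * returns the number of characters written, excluding the NUL.
 */
size_t sketch_expand_tabs(const char *in, char *out, size_t out_size,
                          size_t tab_width)
{
    size_t col = 0; /* output index, which is also the current column */

    if (out_size == 0) {
        return 0;
    }
    if (tab_width == 0) {
        tab_width = 8;
    }
    for (; *in != '\0'; in++) {
        if (*in == '\t') {
            /* Pad with spaces up to the next tab stop. */
            size_t target = col + tab_width - (col % tab_width);
            while (col < target && col + 1 < out_size) {
                out[col++] = ' ';
            }
            if (col < target) {
                break; /* output buffer is full */
            }
        } else {
            if (col + 1 >= out_size) {
                break; /* output buffer is full */
            }
            out[col++] = *in;
        }
    }
    out[col] = '\0';
    return col;
}
```

Because every tab is padded to the next tab stop rather than replaced by a
fixed number of spaces, the visual alignment of the columns is preserved, but
nearly every line of the file shows up as changed in the diff.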

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/pci_regs.h | 1330 ++++++++++++++++++++++++++++----------------------------
 1 files changed, 665 insertions(+), 665 deletions(-)
 rewrite hw/pci_regs.h (90%)

diff --git a/hw/pci_regs.h b/hw/pci_regs.h
dissimilarity index 90%
index dd0bed4..0f9f84c 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -1,665 +1,665 @@
-/*
- *	pci_regs.h
- *
- *	PCI standard defines
- *	Copyright 1994, Drew Eckhardt
- *	Copyright 1997--1999 Martin Mares <mj@ucw.cz>
- *
- *	For more information, please consult the following manuals (look at
- *	http://www.pcisig.com/ for how to get them):
- *
- *	PCI BIOS Specification
- *	PCI Local Bus Specification
- *	PCI to PCI Bridge Specification
- *	PCI System Design Guide
- *
- * 	For hypertransport information, please consult the following manuals
- * 	from http://www.hypertransport.org
- *
- *	The Hypertransport I/O Link Specification
- */
-
-#ifndef LINUX_PCI_REGS_H
-#define LINUX_PCI_REGS_H
-
-/*
- * Under PCI, each device has 256 bytes of configuration address space,
- * of which the first 64 bytes are standardized as follows:
- */
-#define PCI_VENDOR_ID		0x00	/* 16 bits */
-#define PCI_DEVICE_ID		0x02	/* 16 bits */
-#define PCI_COMMAND		0x04	/* 16 bits */
-#define  PCI_COMMAND_IO		0x1	/* Enable response in I/O space */
-#define  PCI_COMMAND_MEMORY	0x2	/* Enable response in Memory space */
-#define  PCI_COMMAND_MASTER	0x4	/* Enable bus mastering */
-#define  PCI_COMMAND_SPECIAL	0x8	/* Enable response to special cycles */
-#define  PCI_COMMAND_INVALIDATE	0x10	/* Use memory write and invalidate */
-#define  PCI_COMMAND_VGA_PALETTE 0x20	/* Enable palette snooping */
-#define  PCI_COMMAND_PARITY	0x40	/* Enable parity checking */
-#define  PCI_COMMAND_WAIT 	0x80	/* Enable address/data stepping */
-#define  PCI_COMMAND_SERR	0x100	/* Enable SERR */
-#define  PCI_COMMAND_FAST_BACK	0x200	/* Enable back-to-back writes */
-#define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
-
-#define PCI_STATUS		0x06	/* 16 bits */
-#define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
-#define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
-#define  PCI_STATUS_66MHZ	0x20	/* Support 66 Mhz PCI 2.1 bus */
-#define  PCI_STATUS_UDF		0x40	/* Support User Definable Features [obsolete] */
-#define  PCI_STATUS_FAST_BACK	0x80	/* Accept fast-back to back */
-#define  PCI_STATUS_PARITY	0x100	/* Detected parity error */
-#define  PCI_STATUS_DEVSEL_MASK	0x600	/* DEVSEL timing */
-#define  PCI_STATUS_DEVSEL_FAST		0x000
-#define  PCI_STATUS_DEVSEL_MEDIUM	0x200
-#define  PCI_STATUS_DEVSEL_SLOW		0x400
-#define  PCI_STATUS_SIG_TARGET_ABORT	0x800 /* Set on target abort */
-#define  PCI_STATUS_REC_TARGET_ABORT	0x1000 /* Master ack of " */
-#define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
-#define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
-#define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
-
-#define PCI_CLASS_REVISION	0x08	/* High 24 bits are class, low 8 revision */
-#define PCI_REVISION_ID		0x08	/* Revision ID */
-#define PCI_CLASS_PROG		0x09	/* Reg. Level Programming Interface */
-#define PCI_CLASS_DEVICE	0x0a	/* Device class */
-
-#define PCI_CACHE_LINE_SIZE	0x0c	/* 8 bits */
-#define PCI_LATENCY_TIMER	0x0d	/* 8 bits */
-#define PCI_HEADER_TYPE		0x0e	/* 8 bits */
-#define  PCI_HEADER_TYPE_NORMAL		0
-#define  PCI_HEADER_TYPE_BRIDGE		1
-#define  PCI_HEADER_TYPE_CARDBUS	2
-
-#define PCI_BIST		0x0f	/* 8 bits */
-#define  PCI_BIST_CODE_MASK	0x0f	/* Return result */
-#define  PCI_BIST_START		0x40	/* 1 to start BIST, 2 secs or less */
-#define  PCI_BIST_CAPABLE	0x80	/* 1 if BIST capable */
-
-/*
- * Base addresses specify locations in memory or I/O space.
- * Decoded size can be determined by writing a value of
- * 0xffffffff to the register, and reading it back.  Only
- * 1 bits are decoded.
- */
-#define PCI_BASE_ADDRESS_0	0x10	/* 32 bits */
-#define PCI_BASE_ADDRESS_1	0x14	/* 32 bits [htype 0,1 only] */
-#define PCI_BASE_ADDRESS_2	0x18	/* 32 bits [htype 0 only] */
-#define PCI_BASE_ADDRESS_3	0x1c	/* 32 bits */
-#define PCI_BASE_ADDRESS_4	0x20	/* 32 bits */
-#define PCI_BASE_ADDRESS_5	0x24	/* 32 bits */
-#define  PCI_BASE_ADDRESS_SPACE		0x01	/* 0 = memory, 1 = I/O */
-#define  PCI_BASE_ADDRESS_SPACE_IO	0x01
-#define  PCI_BASE_ADDRESS_SPACE_MEMORY	0x00
-#define  PCI_BASE_ADDRESS_MEM_TYPE_MASK	0x06
-#define  PCI_BASE_ADDRESS_MEM_TYPE_32	0x00	/* 32 bit address */
-#define  PCI_BASE_ADDRESS_MEM_TYPE_1M	0x02	/* Below 1M [obsolete] */
-#define  PCI_BASE_ADDRESS_MEM_TYPE_64	0x04	/* 64 bit address */
-#define  PCI_BASE_ADDRESS_MEM_PREFETCH	0x08	/* prefetchable? */
-#define  PCI_BASE_ADDRESS_MEM_MASK	(~0x0fUL)
-#define  PCI_BASE_ADDRESS_IO_MASK	(~0x03UL)
-/* bit 1 is reserved if address_space = 1 */
-
-/* Header type 0 (normal devices) */
-#define PCI_CARDBUS_CIS		0x28
-#define PCI_SUBSYSTEM_VENDOR_ID	0x2c
-#define PCI_SUBSYSTEM_ID	0x2e
-#define PCI_ROM_ADDRESS		0x30	/* Bits 31..11 are address, 10..1 reserved */
-#define  PCI_ROM_ADDRESS_ENABLE	0x01
-#define PCI_ROM_ADDRESS_MASK	(~0x7ffUL)
-
-#define PCI_CAPABILITY_LIST	0x34	/* Offset of first capability list entry */
-
-/* 0x35-0x3b are reserved */
-#define PCI_INTERRUPT_LINE	0x3c	/* 8 bits */
-#define PCI_INTERRUPT_PIN	0x3d	/* 8 bits */
-#define PCI_MIN_GNT		0x3e	/* 8 bits */
-#define PCI_MAX_LAT		0x3f	/* 8 bits */
-
-/* Header type 1 (PCI-to-PCI bridges) */
-#define PCI_PRIMARY_BUS		0x18	/* Primary bus number */
-#define PCI_SECONDARY_BUS	0x19	/* Secondary bus number */
-#define PCI_SUBORDINATE_BUS	0x1a	/* Highest bus number behind the bridge */
-#define PCI_SEC_LATENCY_TIMER	0x1b	/* Latency timer for secondary interface */
-#define PCI_IO_BASE		0x1c	/* I/O range behind the bridge */
-#define PCI_IO_LIMIT		0x1d
-#define  PCI_IO_RANGE_TYPE_MASK	0x0fUL	/* I/O bridging type */
-#define  PCI_IO_RANGE_TYPE_16	0x00
-#define  PCI_IO_RANGE_TYPE_32	0x01
-#define  PCI_IO_RANGE_MASK	(~0x0fUL)
-#define PCI_SEC_STATUS		0x1e	/* Secondary status register, only bit 14 used */
-#define PCI_MEMORY_BASE		0x20	/* Memory range behind */
-#define PCI_MEMORY_LIMIT	0x22
-#define  PCI_MEMORY_RANGE_TYPE_MASK 0x0fUL
-#define  PCI_MEMORY_RANGE_MASK	(~0x0fUL)
-#define PCI_PREF_MEMORY_BASE	0x24	/* Prefetchable memory range behind */
-#define PCI_PREF_MEMORY_LIMIT	0x26
-#define  PCI_PREF_RANGE_TYPE_MASK 0x0fUL
-#define  PCI_PREF_RANGE_TYPE_32	0x00
-#define  PCI_PREF_RANGE_TYPE_64	0x01
-#define  PCI_PREF_RANGE_MASK	(~0x0fUL)
-#define PCI_PREF_BASE_UPPER32	0x28	/* Upper half of prefetchable memory range */
-#define PCI_PREF_LIMIT_UPPER32	0x2c
-#define PCI_IO_BASE_UPPER16	0x30	/* Upper half of I/O addresses */
-#define PCI_IO_LIMIT_UPPER16	0x32
-/* 0x34 same as for htype 0 */
-/* 0x35-0x3b is reserved */
-#define PCI_ROM_ADDRESS1	0x38	/* Same as PCI_ROM_ADDRESS, but for htype 1 */
-/* 0x3c-0x3d are same as for htype 0 */
-#define PCI_BRIDGE_CONTROL	0x3e
-#define  PCI_BRIDGE_CTL_PARITY	0x01	/* Enable parity detection on secondary interface */
-#define  PCI_BRIDGE_CTL_SERR	0x02	/* The same for SERR forwarding */
-#define  PCI_BRIDGE_CTL_ISA	0x04	/* Enable ISA mode */
-#define  PCI_BRIDGE_CTL_VGA	0x08	/* Forward VGA addresses */
-#define  PCI_BRIDGE_CTL_MASTER_ABORT	0x20  /* Report master aborts */
-#define  PCI_BRIDGE_CTL_BUS_RESET	0x40	/* Secondary bus reset */
-#define  PCI_BRIDGE_CTL_FAST_BACK	0x80	/* Fast Back2Back enabled on secondary interface */
-
-/* Header type 2 (CardBus bridges) */
-#define PCI_CB_CAPABILITY_LIST	0x14
-/* 0x15 reserved */
-#define PCI_CB_SEC_STATUS	0x16	/* Secondary status */
-#define PCI_CB_PRIMARY_BUS	0x18	/* PCI bus number */
-#define PCI_CB_CARD_BUS		0x19	/* CardBus bus number */
-#define PCI_CB_SUBORDINATE_BUS	0x1a	/* Subordinate bus number */
-#define PCI_CB_LATENCY_TIMER	0x1b	/* CardBus latency timer */
-#define PCI_CB_MEMORY_BASE_0	0x1c
-#define PCI_CB_MEMORY_LIMIT_0	0x20
-#define PCI_CB_MEMORY_BASE_1	0x24
-#define PCI_CB_MEMORY_LIMIT_1	0x28
-#define PCI_CB_IO_BASE_0	0x2c
-#define PCI_CB_IO_BASE_0_HI	0x2e
-#define PCI_CB_IO_LIMIT_0	0x30
-#define PCI_CB_IO_LIMIT_0_HI	0x32
-#define PCI_CB_IO_BASE_1	0x34
-#define PCI_CB_IO_BASE_1_HI	0x36
-#define PCI_CB_IO_LIMIT_1	0x38
-#define PCI_CB_IO_LIMIT_1_HI	0x3a
-#define  PCI_CB_IO_RANGE_MASK	(~0x03UL)
-/* 0x3c-0x3d are same as for htype 0 */
-#define PCI_CB_BRIDGE_CONTROL	0x3e
-#define  PCI_CB_BRIDGE_CTL_PARITY	0x01	/* Similar to standard bridge control register */
-#define  PCI_CB_BRIDGE_CTL_SERR		0x02
-#define  PCI_CB_BRIDGE_CTL_ISA		0x04
-#define  PCI_CB_BRIDGE_CTL_VGA		0x08
-#define  PCI_CB_BRIDGE_CTL_MASTER_ABORT	0x20
-#define  PCI_CB_BRIDGE_CTL_CB_RESET	0x40	/* CardBus reset */
-#define  PCI_CB_BRIDGE_CTL_16BIT_INT	0x80	/* Enable interrupt for 16-bit cards */
-#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM0 0x100	/* Prefetch enable for both memory regions */
-#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM1 0x200
-#define  PCI_CB_BRIDGE_CTL_POST_WRITES	0x400
-#define PCI_CB_SUBSYSTEM_VENDOR_ID	0x40
-#define PCI_CB_SUBSYSTEM_ID		0x42
-#define PCI_CB_LEGACY_MODE_BASE		0x44	/* 16-bit PC Card legacy mode base address (ExCa) */
-/* 0x48-0x7f reserved */
-
-/* Capability lists */
-
-#define PCI_CAP_LIST_ID		0	/* Capability ID */
-#define  PCI_CAP_ID_PM		0x01	/* Power Management */
-#define  PCI_CAP_ID_AGP		0x02	/* Accelerated Graphics Port */
-#define  PCI_CAP_ID_VPD		0x03	/* Vital Product Data */
-#define  PCI_CAP_ID_SLOTID	0x04	/* Slot Identification */
-#define  PCI_CAP_ID_MSI		0x05	/* Message Signalled Interrupts */
-#define  PCI_CAP_ID_CHSWP	0x06	/* CompactPCI HotSwap */
-#define  PCI_CAP_ID_PCIX	0x07	/* PCI-X */
-#define  PCI_CAP_ID_HT		0x08	/* HyperTransport */
-#define  PCI_CAP_ID_VNDR	0x09	/* Vendor specific */
-#define  PCI_CAP_ID_DBG		0x0A	/* Debug port */
-#define  PCI_CAP_ID_CCRC	0x0B	/* CompactPCI Central Resource Control */
-#define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
-#define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
-#define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
-#define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
-#define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
-#define  PCI_CAP_ID_AF		0x13	/* PCI Advanced Features */
-#define PCI_CAP_LIST_NEXT	1	/* Next capability in the list */
-#define PCI_CAP_FLAGS		2	/* Capability defined flags (16 bits) */
-#define PCI_CAP_SIZEOF		4
-
-/* Power Management Registers */
-
-#define PCI_PM_PMC		2	/* PM Capabilities Register */
-#define  PCI_PM_CAP_VER_MASK	0x0007	/* Version */
-#define  PCI_PM_CAP_PME_CLOCK	0x0008	/* PME clock required */
-#define  PCI_PM_CAP_RESERVED    0x0010  /* Reserved field */
-#define  PCI_PM_CAP_DSI		0x0020	/* Device specific initialization */
-#define  PCI_PM_CAP_AUX_POWER	0x01C0	/* Auxilliary power support mask */
-#define  PCI_PM_CAP_D1		0x0200	/* D1 power state support */
-#define  PCI_PM_CAP_D2		0x0400	/* D2 power state support */
-#define  PCI_PM_CAP_PME		0x0800	/* PME pin supported */
-#define  PCI_PM_CAP_PME_MASK	0xF800	/* PME Mask of all supported states */
-#define  PCI_PM_CAP_PME_D0	0x0800	/* PME# from D0 */
-#define  PCI_PM_CAP_PME_D1	0x1000	/* PME# from D1 */
-#define  PCI_PM_CAP_PME_D2	0x2000	/* PME# from D2 */
-#define  PCI_PM_CAP_PME_D3	0x4000	/* PME# from D3 (hot) */
-#define  PCI_PM_CAP_PME_D3cold	0x8000	/* PME# from D3 (cold) */
-#define  PCI_PM_CAP_PME_SHIFT	11	/* Start of the PME Mask in PMC */
-#define PCI_PM_CTRL		4	/* PM control and status register */
-#define  PCI_PM_CTRL_STATE_MASK	0x0003	/* Current power state (D0 to D3) */
-#define  PCI_PM_CTRL_NO_SOFT_RESET	0x0008	/* No reset for D3hot->D0 */
-#define  PCI_PM_CTRL_PME_ENABLE	0x0100	/* PME pin enable */
-#define  PCI_PM_CTRL_DATA_SEL_MASK	0x1e00	/* Data select (??) */
-#define  PCI_PM_CTRL_DATA_SCALE_MASK	0x6000	/* Data scale (??) */
-#define  PCI_PM_CTRL_PME_STATUS	0x8000	/* PME pin status */
-#define PCI_PM_PPB_EXTENSIONS	6	/* PPB support extensions (??) */
-#define  PCI_PM_PPB_B2_B3	0x40	/* Stop clock when in D3hot (??) */
-#define  PCI_PM_BPCC_ENABLE	0x80	/* Bus power/clock control enable (??) */
-#define PCI_PM_DATA_REGISTER	7	/* (??) */
-#define PCI_PM_SIZEOF		8
-
-/* AGP registers */
-
-#define PCI_AGP_VERSION		2	/* BCD version number */
-#define PCI_AGP_RFU		3	/* Rest of capability flags */
-#define PCI_AGP_STATUS		4	/* Status register */
-#define  PCI_AGP_STATUS_RQ_MASK	0xff000000	/* Maximum number of requests - 1 */
-#define  PCI_AGP_STATUS_SBA	0x0200	/* Sideband addressing supported */
-#define  PCI_AGP_STATUS_64BIT	0x0020	/* 64-bit addressing supported */
-#define  PCI_AGP_STATUS_FW	0x0010	/* FW transfers supported */
-#define  PCI_AGP_STATUS_RATE4	0x0004	/* 4x transfer rate supported */
-#define  PCI_AGP_STATUS_RATE2	0x0002	/* 2x transfer rate supported */
-#define  PCI_AGP_STATUS_RATE1	0x0001	/* 1x transfer rate supported */
-#define PCI_AGP_COMMAND		8	/* Control register */
-#define  PCI_AGP_COMMAND_RQ_MASK 0xff000000  /* Master: Maximum number of requests */
-#define  PCI_AGP_COMMAND_SBA	0x0200	/* Sideband addressing enabled */
-#define  PCI_AGP_COMMAND_AGP	0x0100	/* Allow processing of AGP transactions */
-#define  PCI_AGP_COMMAND_64BIT	0x0020 	/* Allow processing of 64-bit addresses */
-#define  PCI_AGP_COMMAND_FW	0x0010 	/* Force FW transfers */
-#define  PCI_AGP_COMMAND_RATE4	0x0004	/* Use 4x rate */
-#define  PCI_AGP_COMMAND_RATE2	0x0002	/* Use 2x rate */
-#define  PCI_AGP_COMMAND_RATE1	0x0001	/* Use 1x rate */
-#define PCI_AGP_SIZEOF		12
-
-/* Vital Product Data */
-
-#define PCI_VPD_ADDR		2	/* Address to access (15 bits!) */
-#define  PCI_VPD_ADDR_MASK	0x7fff	/* Address mask */
-#define  PCI_VPD_ADDR_F		0x8000	/* Write 0, 1 indicates completion */
-#define PCI_VPD_DATA		4	/* 32-bits of data returned here */
-
-/* Slot Identification */
-
-#define PCI_SID_ESR		2	/* Expansion Slot Register */
-#define  PCI_SID_ESR_NSLOTS	0x1f	/* Number of expansion slots available */
-#define  PCI_SID_ESR_FIC	0x20	/* First In Chassis Flag */
-#define PCI_SID_CHASSIS_NR	3	/* Chassis Number */
-
-/* Message Signalled Interrupts registers */
-
-#define PCI_MSI_FLAGS		2	/* Various flags */
-#define  PCI_MSI_FLAGS_64BIT	0x80	/* 64-bit addresses allowed */
-#define  PCI_MSI_FLAGS_QSIZE	0x70	/* Message queue size configured */
-#define  PCI_MSI_FLAGS_QMASK	0x0e	/* Maximum queue size available */
-#define  PCI_MSI_FLAGS_ENABLE	0x01	/* MSI feature enabled */
-#define  PCI_MSI_FLAGS_MASKBIT	0x100	/* 64-bit mask bits allowed */
-#define PCI_MSI_RFU		3	/* Rest of capability flags */
-#define PCI_MSI_ADDRESS_LO	4	/* Lower 32 bits */
-#define PCI_MSI_ADDRESS_HI	8	/* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
-#define PCI_MSI_DATA_32		8	/* 16 bits of data for 32-bit devices */
-#define PCI_MSI_MASK_32		12	/* Mask bits register for 32-bit devices */
-#define PCI_MSI_DATA_64		12	/* 16 bits of data for 64-bit devices */
-#define PCI_MSI_MASK_64		16	/* Mask bits register for 64-bit devices */
-
-/* MSI-X registers (these are at offset PCI_MSIX_FLAGS) */
-#define PCI_MSIX_FLAGS		2
-#define  PCI_MSIX_FLAGS_QSIZE	0x7FF
-#define  PCI_MSIX_FLAGS_ENABLE	(1 << 15)
-#define  PCI_MSIX_FLAGS_MASKALL	(1 << 14)
-#define PCI_MSIX_FLAGS_BIRMASK	(7 << 0)
-
-/* CompactPCI Hotswap Register */
-
-#define PCI_CHSWP_CSR		2	/* Control and Status Register */
-#define  PCI_CHSWP_DHA		0x01	/* Device Hiding Arm */
-#define  PCI_CHSWP_EIM		0x02	/* ENUM# Signal Mask */
-#define  PCI_CHSWP_PIE		0x04	/* Pending Insert or Extract */
-#define  PCI_CHSWP_LOO		0x08	/* LED On / Off */
-#define  PCI_CHSWP_PI		0x30	/* Programming Interface */
-#define  PCI_CHSWP_EXT		0x40	/* ENUM# status - extraction */
-#define  PCI_CHSWP_INS		0x80	/* ENUM# status - insertion */
-
-/* PCI Advanced Feature registers */
-
-#define PCI_AF_LENGTH		2
-#define PCI_AF_CAP		3
-#define  PCI_AF_CAP_TP		0x01
-#define  PCI_AF_CAP_FLR		0x02
-#define PCI_AF_CTRL		4
-#define  PCI_AF_CTRL_FLR	0x01
-#define PCI_AF_STATUS		5
-#define  PCI_AF_STATUS_TP	0x01
-
-/* PCI-X registers */
-
-#define PCI_X_CMD		2	/* Modes & Features */
-#define  PCI_X_CMD_DPERR_E	0x0001	/* Data Parity Error Recovery Enable */
-#define  PCI_X_CMD_ERO		0x0002	/* Enable Relaxed Ordering */
-#define  PCI_X_CMD_READ_512	0x0000	/* 512 byte maximum read byte count */
-#define  PCI_X_CMD_READ_1K	0x0004	/* 1Kbyte maximum read byte count */
-#define  PCI_X_CMD_READ_2K	0x0008	/* 2Kbyte maximum read byte count */
-#define  PCI_X_CMD_READ_4K	0x000c	/* 4Kbyte maximum read byte count */
-#define  PCI_X_CMD_MAX_READ	0x000c	/* Max Memory Read Byte Count */
-				/* Max # of outstanding split transactions */
-#define  PCI_X_CMD_SPLIT_1	0x0000	/* Max 1 */
-#define  PCI_X_CMD_SPLIT_2	0x0010	/* Max 2 */
-#define  PCI_X_CMD_SPLIT_3	0x0020	/* Max 3 */
-#define  PCI_X_CMD_SPLIT_4	0x0030	/* Max 4 */
-#define  PCI_X_CMD_SPLIT_8	0x0040	/* Max 8 */
-#define  PCI_X_CMD_SPLIT_12	0x0050	/* Max 12 */
-#define  PCI_X_CMD_SPLIT_16	0x0060	/* Max 16 */
-#define  PCI_X_CMD_SPLIT_32	0x0070	/* Max 32 */
-#define  PCI_X_CMD_MAX_SPLIT	0x0070	/* Max Outstanding Split Transactions */
-#define  PCI_X_CMD_VERSION(x) 	(((x) >> 12) & 3) /* Version */
-#define PCI_X_STATUS		4	/* PCI-X capabilities */
-#define  PCI_X_STATUS_DEVFN	0x000000ff	/* A copy of devfn */
-#define  PCI_X_STATUS_BUS	0x0000ff00	/* A copy of bus nr */
-#define  PCI_X_STATUS_64BIT	0x00010000	/* 64-bit device */
-#define  PCI_X_STATUS_133MHZ	0x00020000	/* 133 MHz capable */
-#define  PCI_X_STATUS_SPL_DISC	0x00040000	/* Split Completion Discarded */
-#define  PCI_X_STATUS_UNX_SPL	0x00080000	/* Unexpected Split Completion */
-#define  PCI_X_STATUS_COMPLEX	0x00100000	/* Device Complexity */
-#define  PCI_X_STATUS_MAX_READ	0x00600000	/* Designed Max Memory Read Count */
-#define  PCI_X_STATUS_MAX_SPLIT	0x03800000	/* Designed Max Outstanding Split Transactions */
-#define  PCI_X_STATUS_MAX_CUM	0x1c000000	/* Designed Max Cumulative Read Size */
-#define  PCI_X_STATUS_SPL_ERR	0x20000000	/* Rcvd Split Completion Error Msg */
-#define  PCI_X_STATUS_266MHZ	0x40000000	/* 266 MHz capable */
-#define  PCI_X_STATUS_533MHZ	0x80000000	/* 533 MHz capable */
-
-/* PCI Express capability registers */
-
-#define PCI_EXP_FLAGS		2	/* Capabilities register */
-#define PCI_EXP_FLAGS_VERS	0x000f	/* Capability version */
-#define PCI_EXP_FLAGS_TYPE	0x00f0	/* Device/Port type */
-#define  PCI_EXP_TYPE_ENDPOINT	0x0	/* Express Endpoint */
-#define  PCI_EXP_TYPE_LEG_END	0x1	/* Legacy Endpoint */
-#define  PCI_EXP_TYPE_ROOT_PORT 0x4	/* Root Port */
-#define  PCI_EXP_TYPE_UPSTREAM	0x5	/* Upstream Port */
-#define  PCI_EXP_TYPE_DOWNSTREAM 0x6	/* Downstream Port */
-#define  PCI_EXP_TYPE_PCI_BRIDGE 0x7	/* PCI/PCI-X Bridge */
-#define  PCI_EXP_TYPE_RC_END	0x9	/* Root Complex Integrated Endpoint */
-#define  PCI_EXP_TYPE_RC_EC	0x10	/* Root Complex Event Collector */
-#define PCI_EXP_FLAGS_SLOT	0x0100	/* Slot implemented */
-#define PCI_EXP_FLAGS_IRQ	0x3e00	/* Interrupt message number */
-#define PCI_EXP_DEVCAP		4	/* Device capabilities */
-#define  PCI_EXP_DEVCAP_PAYLOAD	0x07	/* Max_Payload_Size */
-#define  PCI_EXP_DEVCAP_PHANTOM	0x18	/* Phantom functions */
-#define  PCI_EXP_DEVCAP_EXT_TAG	0x20	/* Extended tags */
-#define  PCI_EXP_DEVCAP_L0S	0x1c0	/* L0s Acceptable Latency */
-#define  PCI_EXP_DEVCAP_L1	0xe00	/* L1 Acceptable Latency */
-#define  PCI_EXP_DEVCAP_ATN_BUT	0x1000	/* Attention Button Present */
-#define  PCI_EXP_DEVCAP_ATN_IND	0x2000	/* Attention Indicator Present */
-#define  PCI_EXP_DEVCAP_PWR_IND	0x4000	/* Power Indicator Present */
-#define  PCI_EXP_DEVCAP_RBER	0x8000	/* Role-Based Error Reporting */
-#define  PCI_EXP_DEVCAP_PWR_VAL	0x3fc0000 /* Slot Power Limit Value */
-#define  PCI_EXP_DEVCAP_PWR_SCL	0xc000000 /* Slot Power Limit Scale */
-#define  PCI_EXP_DEVCAP_FLR     0x10000000 /* Function Level Reset */
-#define PCI_EXP_DEVCTL		8	/* Device Control */
-#define  PCI_EXP_DEVCTL_CERE	0x0001	/* Correctable Error Reporting En. */
-#define  PCI_EXP_DEVCTL_NFERE	0x0002	/* Non-Fatal Error Reporting Enable */
-#define  PCI_EXP_DEVCTL_FERE	0x0004	/* Fatal Error Reporting Enable */
-#define  PCI_EXP_DEVCTL_URRE	0x0008	/* Unsupported Request Reporting En. */
-#define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
-#define  PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size */
-#define  PCI_EXP_DEVCTL_EXT_TAG	0x0100	/* Extended Tag Field Enable */
-#define  PCI_EXP_DEVCTL_PHANTOM	0x0200	/* Phantom Functions Enable */
-#define  PCI_EXP_DEVCTL_AUX_PME	0x0400	/* Auxiliary Power PM Enable */
-#define  PCI_EXP_DEVCTL_NOSNOOP_EN 0x0800  /* Enable No Snoop */
-#define  PCI_EXP_DEVCTL_READRQ	0x7000	/* Max_Read_Request_Size */
-#define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR */
-#define PCI_EXP_DEVSTA		10	/* Device Status */
-#define  PCI_EXP_DEVSTA_CED	0x01	/* Correctable Error Detected */
-#define  PCI_EXP_DEVSTA_NFED	0x02	/* Non-Fatal Error Detected */
-#define  PCI_EXP_DEVSTA_FED	0x04	/* Fatal Error Detected */
-#define  PCI_EXP_DEVSTA_URD	0x08	/* Unsupported Request Detected */
-#define  PCI_EXP_DEVSTA_AUXPD	0x10	/* AUX Power Detected */
-#define  PCI_EXP_DEVSTA_TRPND	0x20	/* Transactions Pending */
-#define PCI_EXP_LNKCAP		12	/* Link Capabilities */
-#define  PCI_EXP_LNKCAP_SLS	0x0000000f /* Supported Link Speeds */
-#define  PCI_EXP_LNKCAP_MLW	0x000003f0 /* Maximum Link Width */
-#define  PCI_EXP_LNKCAP_ASPMS	0x00000c00 /* ASPM Support */
-#define  PCI_EXP_LNKCAP_L0SEL	0x00007000 /* L0s Exit Latency */
-#define  PCI_EXP_LNKCAP_L1EL	0x00038000 /* L1 Exit Latency */
-#define  PCI_EXP_LNKCAP_CLKPM	0x00040000 /* L1 Clock Power Management */
-#define  PCI_EXP_LNKCAP_SDERC	0x00080000 /* Suprise Down Error Reporting Capable */
-#define  PCI_EXP_LNKCAP_DLLLARC	0x00100000 /* Data Link Layer Link Active Reporting Capable */
-#define  PCI_EXP_LNKCAP_LBNC	0x00200000 /* Link Bandwidth Notification Capability */
-#define  PCI_EXP_LNKCAP_PN	0xff000000 /* Port Number */
-#define PCI_EXP_LNKCTL		16	/* Link Control */
-#define  PCI_EXP_LNKCTL_ASPMC	0x0003	/* ASPM Control */
-#define  PCI_EXP_LNKCTL_RCB	0x0008	/* Read Completion Boundary */
-#define  PCI_EXP_LNKCTL_LD	0x0010	/* Link Disable */
-#define  PCI_EXP_LNKCTL_RL	0x0020	/* Retrain Link */
-#define  PCI_EXP_LNKCTL_CCC	0x0040	/* Common Clock Configuration */
-#define  PCI_EXP_LNKCTL_ES	0x0080	/* Extended Synch */
-#define  PCI_EXP_LNKCTL_CLKREQ_EN 0x100	/* Enable clkreq */
-#define  PCI_EXP_LNKCTL_HAWD	0x0200	/* Hardware Autonomous Width Disable */
-#define  PCI_EXP_LNKCTL_LBMIE	0x0400	/* Link Bandwidth Management Interrupt Enable */
-#define  PCI_EXP_LNKCTL_LABIE	0x0800	/* Lnk Autonomous Bandwidth Interrupt Enable */
-#define PCI_EXP_LNKSTA		18	/* Link Status */
-#define  PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
-#define  PCI_EXP_LNKSTA_NLW	0x03f0	/* Nogotiated Link Width */
-#define  PCI_EXP_LNKSTA_LT	0x0800	/* Link Training */
-#define  PCI_EXP_LNKSTA_SLC	0x1000	/* Slot Clock Configuration */
-#define  PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
-#define  PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */
-#define  PCI_EXP_LNKSTA_LABS	0x8000	/* Link Autonomous Bandwidth Status */
-#define PCI_EXP_SLTCAP		20	/* Slot Capabilities */
-#define  PCI_EXP_SLTCAP_ABP	0x00000001 /* Attention Button Present */
-#define  PCI_EXP_SLTCAP_PCP	0x00000002 /* Power Controller Present */
-#define  PCI_EXP_SLTCAP_MRLSP	0x00000004 /* MRL Sensor Present */
-#define  PCI_EXP_SLTCAP_AIP	0x00000008 /* Attention Indicator Present */
-#define  PCI_EXP_SLTCAP_PIP	0x00000010 /* Power Indicator Present */
-#define  PCI_EXP_SLTCAP_HPS	0x00000020 /* Hot-Plug Surprise */
-#define  PCI_EXP_SLTCAP_HPC	0x00000040 /* Hot-Plug Capable */
-#define  PCI_EXP_SLTCAP_SPLV	0x00007f80 /* Slot Power Limit Value */
-#define  PCI_EXP_SLTCAP_SPLS	0x00018000 /* Slot Power Limit Scale */
-#define  PCI_EXP_SLTCAP_EIP	0x00020000 /* Electromechanical Interlock Present */
-#define  PCI_EXP_SLTCAP_NCCS	0x00040000 /* No Command Completed Support */
-#define  PCI_EXP_SLTCAP_PSN	0xfff80000 /* Physical Slot Number */
-#define PCI_EXP_SLTCTL		24	/* Slot Control */
-#define  PCI_EXP_SLTCTL_ABPE	0x0001	/* Attention Button Pressed Enable */
-#define  PCI_EXP_SLTCTL_PFDE	0x0002	/* Power Fault Detected Enable */
-#define  PCI_EXP_SLTCTL_MRLSCE	0x0004	/* MRL Sensor Changed Enable */
-#define  PCI_EXP_SLTCTL_PDCE	0x0008	/* Presence Detect Changed Enable */
-#define  PCI_EXP_SLTCTL_CCIE	0x0010	/* Command Completed Interrupt Enable */
-#define  PCI_EXP_SLTCTL_HPIE	0x0020	/* Hot-Plug Interrupt Enable */
-#define  PCI_EXP_SLTCTL_AIC	0x00c0	/* Attention Indicator Control */
-#define  PCI_EXP_SLTCTL_PIC	0x0300	/* Power Indicator Control */
-#define  PCI_EXP_SLTCTL_PCC	0x0400	/* Power Controller Control */
-#define  PCI_EXP_SLTCTL_EIC	0x0800	/* Electromechanical Interlock Control */
-#define  PCI_EXP_SLTCTL_DLLSCE	0x1000	/* Data Link Layer State Changed Enable */
-#define PCI_EXP_SLTSTA		26	/* Slot Status */
-#define  PCI_EXP_SLTSTA_ABP	0x0001	/* Attention Button Pressed */
-#define  PCI_EXP_SLTSTA_PFD	0x0002	/* Power Fault Detected */
-#define  PCI_EXP_SLTSTA_MRLSC	0x0004	/* MRL Sensor Changed */
-#define  PCI_EXP_SLTSTA_PDC	0x0008	/* Presence Detect Changed */
-#define  PCI_EXP_SLTSTA_CC	0x0010	/* Command Completed */
-#define  PCI_EXP_SLTSTA_MRLSS	0x0020	/* MRL Sensor State */
-#define  PCI_EXP_SLTSTA_PDS	0x0040	/* Presence Detect State */
-#define  PCI_EXP_SLTSTA_EIS	0x0080	/* Electromechanical Interlock Status */
-#define  PCI_EXP_SLTSTA_DLLSC	0x0100	/* Data Link Layer State Changed */
-#define PCI_EXP_RTCTL		28	/* Root Control */
-#define  PCI_EXP_RTCTL_SECEE	0x01	/* System Error on Correctable Error */
-#define  PCI_EXP_RTCTL_SENFEE	0x02	/* System Error on Non-Fatal Error */
-#define  PCI_EXP_RTCTL_SEFEE	0x04	/* System Error on Fatal Error */
-#define  PCI_EXP_RTCTL_PMEIE	0x08	/* PME Interrupt Enable */
-#define  PCI_EXP_RTCTL_CRSSVE	0x10	/* CRS Software Visibility Enable */
-#define PCI_EXP_RTCAP		30	/* Root Capabilities */
-#define PCI_EXP_RTSTA		32	/* Root Status */
-#define PCI_EXP_DEVCAP2		36	/* Device Capabilities 2 */
-#define  PCI_EXP_DEVCAP2_ARI	0x20	/* Alternative Routing-ID */
-#define PCI_EXP_DEVCTL2		40	/* Device Control 2 */
-#define  PCI_EXP_DEVCTL2_ARI	0x20	/* Alternative Routing-ID */
-#define PCI_EXP_LNKCTL2		48	/* Link Control 2 */
-#define PCI_EXP_SLTCTL2		56	/* Slot Control 2 */
-
-/* Extended Capabilities (PCI-X 2.0 and Express) */
-#define PCI_EXT_CAP_ID(header)		(header & 0x0000ffff)
-#define PCI_EXT_CAP_VER(header)		((header >> 16) & 0xf)
-#define PCI_EXT_CAP_NEXT(header)	((header >> 20) & 0xffc)
-
-#define PCI_EXT_CAP_ID_ERR	1
-#define PCI_EXT_CAP_ID_VC	2
-#define PCI_EXT_CAP_ID_DSN	3
-#define PCI_EXT_CAP_ID_PWR	4
-#define PCI_EXT_CAP_ID_ARI	14
-#define PCI_EXT_CAP_ID_ATS	15
-#define PCI_EXT_CAP_ID_SRIOV	16
-
-/* Advanced Error Reporting */
-#define PCI_ERR_UNCOR_STATUS	4	/* Uncorrectable Error Status */
-#define  PCI_ERR_UNC_TRAIN	0x00000001	/* Training */
-#define  PCI_ERR_UNC_DLP	0x00000010	/* Data Link Protocol */
-#define  PCI_ERR_UNC_POISON_TLP	0x00001000	/* Poisoned TLP */
-#define  PCI_ERR_UNC_FCP	0x00002000	/* Flow Control Protocol */
-#define  PCI_ERR_UNC_COMP_TIME	0x00004000	/* Completion Timeout */
-#define  PCI_ERR_UNC_COMP_ABORT	0x00008000	/* Completer Abort */
-#define  PCI_ERR_UNC_UNX_COMP	0x00010000	/* Unexpected Completion */
-#define  PCI_ERR_UNC_RX_OVER	0x00020000	/* Receiver Overflow */
-#define  PCI_ERR_UNC_MALF_TLP	0x00040000	/* Malformed TLP */
-#define  PCI_ERR_UNC_ECRC	0x00080000	/* ECRC Error Status */
-#define  PCI_ERR_UNC_UNSUP	0x00100000	/* Unsupported Request */
-#define PCI_ERR_UNCOR_MASK	8	/* Uncorrectable Error Mask */
-	/* Same bits as above */
-#define PCI_ERR_UNCOR_SEVER	12	/* Uncorrectable Error Severity */
-	/* Same bits as above */
-#define PCI_ERR_COR_STATUS	16	/* Correctable Error Status */
-#define  PCI_ERR_COR_RCVR	0x00000001	/* Receiver Error Status */
-#define  PCI_ERR_COR_BAD_TLP	0x00000040	/* Bad TLP Status */
-#define  PCI_ERR_COR_BAD_DLLP	0x00000080	/* Bad DLLP Status */
-#define  PCI_ERR_COR_REP_ROLL	0x00000100	/* REPLAY_NUM Rollover */
-#define  PCI_ERR_COR_REP_TIMER	0x00001000	/* Replay Timer Timeout */
-#define PCI_ERR_COR_MASK	20	/* Correctable Error Mask */
-	/* Same bits as above */
-#define PCI_ERR_CAP		24	/* Advanced Error Capabilities */
-#define  PCI_ERR_CAP_FEP(x)	((x) & 31)	/* First Error Pointer */
-#define  PCI_ERR_CAP_ECRC_GENC	0x00000020	/* ECRC Generation Capable */
-#define  PCI_ERR_CAP_ECRC_GENE	0x00000040	/* ECRC Generation Enable */
-#define  PCI_ERR_CAP_ECRC_CHKC	0x00000080	/* ECRC Check Capable */
-#define  PCI_ERR_CAP_ECRC_CHKE	0x00000100	/* ECRC Check Enable */
-#define PCI_ERR_HEADER_LOG	28	/* Header Log Register (16 bytes) */
-#define PCI_ERR_ROOT_COMMAND	44	/* Root Error Command */
-/* Correctable Err Reporting Enable */
-#define PCI_ERR_ROOT_CMD_COR_EN		0x00000001
-/* Non-fatal Err Reporting Enable */
-#define PCI_ERR_ROOT_CMD_NONFATAL_EN	0x00000002
-/* Fatal Err Reporting Enable */
-#define PCI_ERR_ROOT_CMD_FATAL_EN	0x00000004
-#define PCI_ERR_ROOT_STATUS	48
-#define PCI_ERR_ROOT_COR_RCV		0x00000001	/* ERR_COR Received */
-/* Multi ERR_COR Received */
-#define PCI_ERR_ROOT_MULTI_COR_RCV	0x00000002
-/* ERR_FATAL/NONFATAL Recevied */
-#define PCI_ERR_ROOT_UNCOR_RCV		0x00000004
-/* Multi ERR_FATAL/NONFATAL Recevied */
-#define PCI_ERR_ROOT_MULTI_UNCOR_RCV	0x00000008
-#define PCI_ERR_ROOT_FIRST_FATAL	0x00000010	/* First Fatal */
-#define PCI_ERR_ROOT_NONFATAL_RCV	0x00000020	/* Non-Fatal Received */
-#define PCI_ERR_ROOT_FATAL_RCV		0x00000040	/* Fatal Received */
-#define PCI_ERR_ROOT_COR_SRC	52
-#define PCI_ERR_ROOT_SRC	54
-
-/* Virtual Channel */
-#define PCI_VC_PORT_REG1	4
-#define PCI_VC_PORT_REG2	8
-#define PCI_VC_PORT_CTRL	12
-#define PCI_VC_PORT_STATUS	14
-#define PCI_VC_RES_CAP		16
-#define PCI_VC_RES_CTRL		20
-#define PCI_VC_RES_STATUS	26
-
-/* Power Budgeting */
-#define PCI_PWR_DSR		4	/* Data Select Register */
-#define PCI_PWR_DATA		8	/* Data Register */
-#define  PCI_PWR_DATA_BASE(x)	((x) & 0xff)	    /* Base Power */
-#define  PCI_PWR_DATA_SCALE(x)	(((x) >> 8) & 3)    /* Data Scale */
-#define  PCI_PWR_DATA_PM_SUB(x)	(((x) >> 10) & 7)   /* PM Sub State */
-#define  PCI_PWR_DATA_PM_STATE(x) (((x) >> 13) & 3) /* PM State */
-#define  PCI_PWR_DATA_TYPE(x)	(((x) >> 15) & 7)   /* Type */
-#define  PCI_PWR_DATA_RAIL(x)	(((x) >> 18) & 7)   /* Power Rail */
-#define PCI_PWR_CAP		12	/* Capability */
-#define  PCI_PWR_CAP_BUDGET(x)	((x) & 1)	/* Included in system budget */
-
-/*
- * Hypertransport sub capability types
- *
- * Unfortunately there are both 3 bit and 5 bit capability types defined
- * in the HT spec, catering for that is a little messy. You probably don't
- * want to use these directly, just use pci_find_ht_capability() and it
- * will do the right thing for you.
- */
-#define HT_3BIT_CAP_MASK	0xE0
-#define HT_CAPTYPE_SLAVE	0x00	/* Slave/Primary link configuration */
-#define HT_CAPTYPE_HOST		0x20	/* Host/Secondary link configuration */
-
-#define HT_5BIT_CAP_MASK	0xF8
-#define HT_CAPTYPE_IRQ		0x80	/* IRQ Configuration */
-#define HT_CAPTYPE_REMAPPING_40	0xA0	/* 40 bit address remapping */
-#define HT_CAPTYPE_REMAPPING_64 0xA2	/* 64 bit address remapping */
-#define HT_CAPTYPE_UNITID_CLUMP	0x90	/* Unit ID clumping */
-#define HT_CAPTYPE_EXTCONF	0x98	/* Extended Configuration Space Access */
-#define HT_CAPTYPE_MSI_MAPPING	0xA8	/* MSI Mapping Capability */
-#define  HT_MSI_FLAGS		0x02		/* Offset to flags */
-#define  HT_MSI_FLAGS_ENABLE	0x1		/* Mapping enable */
-#define  HT_MSI_FLAGS_FIXED	0x2		/* Fixed mapping only */
-#define  HT_MSI_FIXED_ADDR	0x00000000FEE00000ULL	/* Fixed addr */
-#define  HT_MSI_ADDR_LO		0x04		/* Offset to low addr bits */
-#define  HT_MSI_ADDR_LO_MASK	0xFFF00000	/* Low address bit mask */
-#define  HT_MSI_ADDR_HI		0x08		/* Offset to high addr bits */
-#define HT_CAPTYPE_DIRECT_ROUTE	0xB0	/* Direct routing configuration */
-#define HT_CAPTYPE_VCSET	0xB8	/* Virtual Channel configuration */
-#define HT_CAPTYPE_ERROR_RETRY	0xC0	/* Retry on error configuration */
-#define HT_CAPTYPE_GEN3		0xD0	/* Generation 3 hypertransport configuration */
-#define HT_CAPTYPE_PM		0xE0	/* Hypertransport powermanagement configuration */
-
-/* Alternative Routing-ID Interpretation */
-#define PCI_ARI_CAP		0x04	/* ARI Capability Register */
-#define  PCI_ARI_CAP_MFVC	0x0001	/* MFVC Function Groups Capability */
-#define  PCI_ARI_CAP_ACS	0x0002	/* ACS Function Groups Capability */
-#define  PCI_ARI_CAP_NFN(x)	(((x) >> 8) & 0xff) /* Next Function Number */
-#define PCI_ARI_CTRL		0x06	/* ARI Control Register */
-#define  PCI_ARI_CTRL_MFVC	0x0001	/* MFVC Function Groups Enable */
-#define  PCI_ARI_CTRL_ACS	0x0002	/* ACS Function Groups Enable */
-#define  PCI_ARI_CTRL_FG(x)	(((x) >> 4) & 7) /* Function Group */
-
-/* Address Translation Service */
-#define PCI_ATS_CAP		0x04	/* ATS Capability Register */
-#define  PCI_ATS_CAP_QDEP(x)	((x) & 0x1f)	/* Invalidate Queue Depth */
-#define  PCI_ATS_MAX_QDEP	32	/* Max Invalidate Queue Depth */
-#define PCI_ATS_CTRL		0x06	/* ATS Control Register */
-#define  PCI_ATS_CTRL_ENABLE	0x8000	/* ATS Enable */
-#define  PCI_ATS_CTRL_STU(x)	((x) & 0x1f)	/* Smallest Translation Unit */
-#define  PCI_ATS_MIN_STU	12	/* shift of minimum STU block */
-
-/* Single Root I/O Virtualization */
-#define PCI_SRIOV_CAP		0x04	/* SR-IOV Capabilities */
-#define  PCI_SRIOV_CAP_VFM	0x01	/* VF Migration Capable */
-#define  PCI_SRIOV_CAP_INTR(x)	((x) >> 21) /* Interrupt Message Number */
-#define PCI_SRIOV_CTRL		0x08	/* SR-IOV Control */
-#define  PCI_SRIOV_CTRL_VFE	0x01	/* VF Enable */
-#define  PCI_SRIOV_CTRL_VFM	0x02	/* VF Migration Enable */
-#define  PCI_SRIOV_CTRL_INTR	0x04	/* VF Migration Interrupt Enable */
-#define  PCI_SRIOV_CTRL_MSE	0x08	/* VF Memory Space Enable */
-#define  PCI_SRIOV_CTRL_ARI	0x10	/* ARI Capable Hierarchy */
-#define PCI_SRIOV_STATUS	0x0a	/* SR-IOV Status */
-#define  PCI_SRIOV_STATUS_VFM	0x01	/* VF Migration Status */
-#define PCI_SRIOV_INITIAL_VF	0x0c	/* Initial VFs */
-#define PCI_SRIOV_TOTAL_VF	0x0e	/* Total VFs */
-#define PCI_SRIOV_NUM_VF	0x10	/* Number of VFs */
-#define PCI_SRIOV_FUNC_LINK	0x12	/* Function Dependency Link */
-#define PCI_SRIOV_VF_OFFSET	0x14	/* First VF Offset */
-#define PCI_SRIOV_VF_STRIDE	0x16	/* Following VF Stride */
-#define PCI_SRIOV_VF_DID	0x1a	/* VF Device ID */
-#define PCI_SRIOV_SUP_PGSIZE	0x1c	/* Supported Page Sizes */
-#define PCI_SRIOV_SYS_PGSIZE	0x20	/* System Page Size */
-#define PCI_SRIOV_BAR		0x24	/* VF BAR0 */
-#define  PCI_SRIOV_NUM_BARS	6	/* Number of VF BARs */
-#define PCI_SRIOV_VFM		0x3c	/* VF Migration State Array Offset*/
-#define  PCI_SRIOV_VFM_BIR(x)	((x) & 7)	/* State BIR */
-#define  PCI_SRIOV_VFM_OFFSET(x) ((x) & ~7)	/* State Offset */
-#define  PCI_SRIOV_VFM_UA	0x0	/* Inactive.Unavailable */
-#define  PCI_SRIOV_VFM_MI	0x1	/* Dormant.MigrateIn */
-#define  PCI_SRIOV_VFM_MO	0x2	/* Active.MigrateOut */
-#define  PCI_SRIOV_VFM_AV	0x3	/* Active.Available */
-
-#endif /* LINUX_PCI_REGS_H */
+/*
+ *      pci_regs.h
+ *
+ *      PCI standard defines
+ *      Copyright 1994, Drew Eckhardt
+ *      Copyright 1997--1999 Martin Mares <mj@ucw.cz>
+ *
+ *      For more information, please consult the following manuals (look at
+ *      http://www.pcisig.com/ for how to get them):
+ *
+ *      PCI BIOS Specification
+ *      PCI Local Bus Specification
+ *      PCI to PCI Bridge Specification
+ *      PCI System Design Guide
+ *
+ *      For hypertransport information, please consult the following manuals
+ *      from http://www.hypertransport.org
+ *
+ *      The Hypertransport I/O Link Specification
+ */
+
+#ifndef LINUX_PCI_REGS_H
+#define LINUX_PCI_REGS_H
+
+/*
+ * Under PCI, each device has 256 bytes of configuration address space,
+ * of which the first 64 bytes are standardized as follows:
+ */
+#define PCI_VENDOR_ID           0x00    /* 16 bits */
+#define PCI_DEVICE_ID           0x02    /* 16 bits */
+#define PCI_COMMAND             0x04    /* 16 bits */
+#define  PCI_COMMAND_IO         0x1     /* Enable response in I/O space */
+#define  PCI_COMMAND_MEMORY     0x2     /* Enable response in Memory space */
+#define  PCI_COMMAND_MASTER     0x4     /* Enable bus mastering */
+#define  PCI_COMMAND_SPECIAL    0x8     /* Enable response to special cycles */
+#define  PCI_COMMAND_INVALIDATE 0x10    /* Use memory write and invalidate */
+#define  PCI_COMMAND_VGA_PALETTE 0x20   /* Enable palette snooping */
+#define  PCI_COMMAND_PARITY     0x40    /* Enable parity checking */
+#define  PCI_COMMAND_WAIT       0x80    /* Enable address/data stepping */
+#define  PCI_COMMAND_SERR       0x100   /* Enable SERR */
+#define  PCI_COMMAND_FAST_BACK  0x200   /* Enable back-to-back writes */
+#define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
+
+#define PCI_STATUS              0x06    /* 16 bits */
+#define  PCI_STATUS_INTERRUPT   0x08    /* Interrupt status */
+#define  PCI_STATUS_CAP_LIST    0x10    /* Support Capability List */
+#define  PCI_STATUS_66MHZ       0x20    /* Support 66 Mhz PCI 2.1 bus */
+#define  PCI_STATUS_UDF         0x40    /* Support User Definable Features [obsolete] */
+#define  PCI_STATUS_FAST_BACK   0x80    /* Accept fast-back to back */
+#define  PCI_STATUS_PARITY      0x100   /* Detected parity error */
+#define  PCI_STATUS_DEVSEL_MASK 0x600   /* DEVSEL timing */
+#define  PCI_STATUS_DEVSEL_FAST         0x000
+#define  PCI_STATUS_DEVSEL_MEDIUM       0x200
+#define  PCI_STATUS_DEVSEL_SLOW         0x400
+#define  PCI_STATUS_SIG_TARGET_ABORT    0x800 /* Set on target abort */
+#define  PCI_STATUS_REC_TARGET_ABORT    0x1000 /* Master ack of " */
+#define  PCI_STATUS_REC_MASTER_ABORT    0x2000 /* Set on master abort */
+#define  PCI_STATUS_SIG_SYSTEM_ERROR    0x4000 /* Set when we drive SERR */
+#define  PCI_STATUS_DETECTED_PARITY     0x8000 /* Set on parity error */
+
+#define PCI_CLASS_REVISION      0x08    /* High 24 bits are class, low 8 revision */
+#define PCI_REVISION_ID         0x08    /* Revision ID */
+#define PCI_CLASS_PROG          0x09    /* Reg. Level Programming Interface */
+#define PCI_CLASS_DEVICE        0x0a    /* Device class */
+
+#define PCI_CACHE_LINE_SIZE     0x0c    /* 8 bits */
+#define PCI_LATENCY_TIMER       0x0d    /* 8 bits */
+#define PCI_HEADER_TYPE         0x0e    /* 8 bits */
+#define  PCI_HEADER_TYPE_NORMAL         0
+#define  PCI_HEADER_TYPE_BRIDGE         1
+#define  PCI_HEADER_TYPE_CARDBUS        2
+
+#define PCI_BIST                0x0f    /* 8 bits */
+#define  PCI_BIST_CODE_MASK     0x0f    /* Return result */
+#define  PCI_BIST_START         0x40    /* 1 to start BIST, 2 secs or less */
+#define  PCI_BIST_CAPABLE       0x80    /* 1 if BIST capable */
+
+/*
+ * Base addresses specify locations in memory or I/O space.
+ * Decoded size can be determined by writing a value of
+ * 0xffffffff to the register, and reading it back.  Only
+ * 1 bits are decoded.
+ */
+#define PCI_BASE_ADDRESS_0      0x10    /* 32 bits */
+#define PCI_BASE_ADDRESS_1      0x14    /* 32 bits [htype 0,1 only] */
+#define PCI_BASE_ADDRESS_2      0x18    /* 32 bits [htype 0 only] */
+#define PCI_BASE_ADDRESS_3      0x1c    /* 32 bits */
+#define PCI_BASE_ADDRESS_4      0x20    /* 32 bits */
+#define PCI_BASE_ADDRESS_5      0x24    /* 32 bits */
+#define  PCI_BASE_ADDRESS_SPACE         0x01    /* 0 = memory, 1 = I/O */
+#define  PCI_BASE_ADDRESS_SPACE_IO      0x01
+#define  PCI_BASE_ADDRESS_SPACE_MEMORY  0x00
+#define  PCI_BASE_ADDRESS_MEM_TYPE_MASK 0x06
+#define  PCI_BASE_ADDRESS_MEM_TYPE_32   0x00    /* 32 bit address */
+#define  PCI_BASE_ADDRESS_MEM_TYPE_1M   0x02    /* Below 1M [obsolete] */
+#define  PCI_BASE_ADDRESS_MEM_TYPE_64   0x04    /* 64 bit address */
+#define  PCI_BASE_ADDRESS_MEM_PREFETCH  0x08    /* prefetchable? */
+#define  PCI_BASE_ADDRESS_MEM_MASK      (~0x0fUL)
+#define  PCI_BASE_ADDRESS_IO_MASK       (~0x03UL)
+/* bit 1 is reserved if address_space = 1 */
+
+/* Header type 0 (normal devices) */
+#define PCI_CARDBUS_CIS         0x28
+#define PCI_SUBSYSTEM_VENDOR_ID 0x2c
+#define PCI_SUBSYSTEM_ID        0x2e
+#define PCI_ROM_ADDRESS         0x30    /* Bits 31..11 are address, 10..1 reserved */
+#define  PCI_ROM_ADDRESS_ENABLE 0x01
+#define PCI_ROM_ADDRESS_MASK    (~0x7ffUL)
+
+#define PCI_CAPABILITY_LIST     0x34    /* Offset of first capability list entry */
+
+/* 0x35-0x3b are reserved */
+#define PCI_INTERRUPT_LINE      0x3c    /* 8 bits */
+#define PCI_INTERRUPT_PIN       0x3d    /* 8 bits */
+#define PCI_MIN_GNT             0x3e    /* 8 bits */
+#define PCI_MAX_LAT             0x3f    /* 8 bits */
+
+/* Header type 1 (PCI-to-PCI bridges) */
+#define PCI_PRIMARY_BUS         0x18    /* Primary bus number */
+#define PCI_SECONDARY_BUS       0x19    /* Secondary bus number */
+#define PCI_SUBORDINATE_BUS     0x1a    /* Highest bus number behind the bridge */
+#define PCI_SEC_LATENCY_TIMER   0x1b    /* Latency timer for secondary interface */
+#define PCI_IO_BASE             0x1c    /* I/O range behind the bridge */
+#define PCI_IO_LIMIT            0x1d
+#define  PCI_IO_RANGE_TYPE_MASK 0x0fUL  /* I/O bridging type */
+#define  PCI_IO_RANGE_TYPE_16   0x00
+#define  PCI_IO_RANGE_TYPE_32   0x01
+#define  PCI_IO_RANGE_MASK      (~0x0fUL)
+#define PCI_SEC_STATUS          0x1e    /* Secondary status register, only bit 14 used */
+#define PCI_MEMORY_BASE         0x20    /* Memory range behind */
+#define PCI_MEMORY_LIMIT        0x22
+#define  PCI_MEMORY_RANGE_TYPE_MASK 0x0fUL
+#define  PCI_MEMORY_RANGE_MASK  (~0x0fUL)
+#define PCI_PREF_MEMORY_BASE    0x24    /* Prefetchable memory range behind */
+#define PCI_PREF_MEMORY_LIMIT   0x26
+#define  PCI_PREF_RANGE_TYPE_MASK 0x0fUL
+#define  PCI_PREF_RANGE_TYPE_32 0x00
+#define  PCI_PREF_RANGE_TYPE_64 0x01
+#define  PCI_PREF_RANGE_MASK    (~0x0fUL)
+#define PCI_PREF_BASE_UPPER32   0x28    /* Upper half of prefetchable memory range */
+#define PCI_PREF_LIMIT_UPPER32  0x2c
+#define PCI_IO_BASE_UPPER16     0x30    /* Upper half of I/O addresses */
+#define PCI_IO_LIMIT_UPPER16    0x32
+/* 0x34 same as for htype 0 */
+/* 0x35-0x3b is reserved */
+#define PCI_ROM_ADDRESS1        0x38    /* Same as PCI_ROM_ADDRESS, but for htype 1 */
+/* 0x3c-0x3d are same as for htype 0 */
+#define PCI_BRIDGE_CONTROL      0x3e
+#define  PCI_BRIDGE_CTL_PARITY  0x01    /* Enable parity detection on secondary interface */
+#define  PCI_BRIDGE_CTL_SERR    0x02    /* The same for SERR forwarding */
+#define  PCI_BRIDGE_CTL_ISA     0x04    /* Enable ISA mode */
+#define  PCI_BRIDGE_CTL_VGA     0x08    /* Forward VGA addresses */
+#define  PCI_BRIDGE_CTL_MASTER_ABORT    0x20  /* Report master aborts */
+#define  PCI_BRIDGE_CTL_BUS_RESET       0x40    /* Secondary bus reset */
+#define  PCI_BRIDGE_CTL_FAST_BACK       0x80    /* Fast Back2Back enabled on secondary interface */
+
+/* Header type 2 (CardBus bridges) */
+#define PCI_CB_CAPABILITY_LIST  0x14
+/* 0x15 reserved */
+#define PCI_CB_SEC_STATUS       0x16    /* Secondary status */
+#define PCI_CB_PRIMARY_BUS      0x18    /* PCI bus number */
+#define PCI_CB_CARD_BUS         0x19    /* CardBus bus number */
+#define PCI_CB_SUBORDINATE_BUS  0x1a    /* Subordinate bus number */
+#define PCI_CB_LATENCY_TIMER    0x1b    /* CardBus latency timer */
+#define PCI_CB_MEMORY_BASE_0    0x1c
+#define PCI_CB_MEMORY_LIMIT_0   0x20
+#define PCI_CB_MEMORY_BASE_1    0x24
+#define PCI_CB_MEMORY_LIMIT_1   0x28
+#define PCI_CB_IO_BASE_0        0x2c
+#define PCI_CB_IO_BASE_0_HI     0x2e
+#define PCI_CB_IO_LIMIT_0       0x30
+#define PCI_CB_IO_LIMIT_0_HI    0x32
+#define PCI_CB_IO_BASE_1        0x34
+#define PCI_CB_IO_BASE_1_HI     0x36
+#define PCI_CB_IO_LIMIT_1       0x38
+#define PCI_CB_IO_LIMIT_1_HI    0x3a
+#define  PCI_CB_IO_RANGE_MASK   (~0x03UL)
+/* 0x3c-0x3d are same as for htype 0 */
+#define PCI_CB_BRIDGE_CONTROL   0x3e
+#define  PCI_CB_BRIDGE_CTL_PARITY       0x01    /* Similar to standard bridge control register */
+#define  PCI_CB_BRIDGE_CTL_SERR         0x02
+#define  PCI_CB_BRIDGE_CTL_ISA          0x04
+#define  PCI_CB_BRIDGE_CTL_VGA          0x08
+#define  PCI_CB_BRIDGE_CTL_MASTER_ABORT 0x20
+#define  PCI_CB_BRIDGE_CTL_CB_RESET     0x40    /* CardBus reset */
+#define  PCI_CB_BRIDGE_CTL_16BIT_INT    0x80    /* Enable interrupt for 16-bit cards */
+#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM0 0x100  /* Prefetch enable for both memory regions */
+#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM1 0x200
+#define  PCI_CB_BRIDGE_CTL_POST_WRITES  0x400
+#define PCI_CB_SUBSYSTEM_VENDOR_ID      0x40
+#define PCI_CB_SUBSYSTEM_ID             0x42
+#define PCI_CB_LEGACY_MODE_BASE         0x44    /* 16-bit PC Card legacy mode base address (ExCa) */
+/* 0x48-0x7f reserved */
+
+/* Capability lists */
+
+#define PCI_CAP_LIST_ID         0       /* Capability ID */
+#define  PCI_CAP_ID_PM          0x01    /* Power Management */
+#define  PCI_CAP_ID_AGP         0x02    /* Accelerated Graphics Port */
+#define  PCI_CAP_ID_VPD         0x03    /* Vital Product Data */
+#define  PCI_CAP_ID_SLOTID      0x04    /* Slot Identification */
+#define  PCI_CAP_ID_MSI         0x05    /* Message Signalled Interrupts */
+#define  PCI_CAP_ID_CHSWP       0x06    /* CompactPCI HotSwap */
+#define  PCI_CAP_ID_PCIX        0x07    /* PCI-X */
+#define  PCI_CAP_ID_HT          0x08    /* HyperTransport */
+#define  PCI_CAP_ID_VNDR        0x09    /* Vendor specific */
+#define  PCI_CAP_ID_DBG         0x0A    /* Debug port */
+#define  PCI_CAP_ID_CCRC        0x0B    /* CompactPCI Central Resource Control */
+#define  PCI_CAP_ID_SHPC        0x0C    /* PCI Standard Hot-Plug Controller */
+#define  PCI_CAP_ID_SSVID       0x0D    /* Bridge subsystem vendor/device ID */
+#define  PCI_CAP_ID_AGP3        0x0E    /* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_EXP         0x10    /* PCI Express */
+#define  PCI_CAP_ID_MSIX        0x11    /* MSI-X */
+#define  PCI_CAP_ID_AF          0x13    /* PCI Advanced Features */
+#define PCI_CAP_LIST_NEXT       1       /* Next capability in the list */
+#define PCI_CAP_FLAGS           2       /* Capability defined flags (16 bits) */
+#define PCI_CAP_SIZEOF          4
+
+/* Power Management Registers */
+
+#define PCI_PM_PMC              2       /* PM Capabilities Register */
+#define  PCI_PM_CAP_VER_MASK    0x0007  /* Version */
+#define  PCI_PM_CAP_PME_CLOCK   0x0008  /* PME clock required */
+#define  PCI_PM_CAP_RESERVED    0x0010  /* Reserved field */
+#define  PCI_PM_CAP_DSI         0x0020  /* Device specific initialization */
+#define  PCI_PM_CAP_AUX_POWER   0x01C0  /* Auxilliary power support mask */
+#define  PCI_PM_CAP_D1          0x0200  /* D1 power state support */
+#define  PCI_PM_CAP_D2          0x0400  /* D2 power state support */
+#define  PCI_PM_CAP_PME         0x0800  /* PME pin supported */
+#define  PCI_PM_CAP_PME_MASK    0xF800  /* PME Mask of all supported states */
+#define  PCI_PM_CAP_PME_D0      0x0800  /* PME# from D0 */
+#define  PCI_PM_CAP_PME_D1      0x1000  /* PME# from D1 */
+#define  PCI_PM_CAP_PME_D2      0x2000  /* PME# from D2 */
+#define  PCI_PM_CAP_PME_D3      0x4000  /* PME# from D3 (hot) */
+#define  PCI_PM_CAP_PME_D3cold  0x8000  /* PME# from D3 (cold) */
+#define  PCI_PM_CAP_PME_SHIFT   11      /* Start of the PME Mask in PMC */
+#define PCI_PM_CTRL             4       /* PM control and status register */
+#define  PCI_PM_CTRL_STATE_MASK 0x0003  /* Current power state (D0 to D3) */
+#define  PCI_PM_CTRL_NO_SOFT_RESET      0x0008  /* No reset for D3hot->D0 */
+#define  PCI_PM_CTRL_PME_ENABLE 0x0100  /* PME pin enable */
+#define  PCI_PM_CTRL_DATA_SEL_MASK      0x1e00  /* Data select (??) */
+#define  PCI_PM_CTRL_DATA_SCALE_MASK    0x6000  /* Data scale (??) */
+#define  PCI_PM_CTRL_PME_STATUS 0x8000  /* PME pin status */
+#define PCI_PM_PPB_EXTENSIONS   6       /* PPB support extensions (??) */
+#define  PCI_PM_PPB_B2_B3       0x40    /* Stop clock when in D3hot (??) */
+#define  PCI_PM_BPCC_ENABLE     0x80    /* Bus power/clock control enable (??) */
+#define PCI_PM_DATA_REGISTER    7       /* (??) */
+#define PCI_PM_SIZEOF           8
+
+/* AGP registers */
+
+#define PCI_AGP_VERSION         2       /* BCD version number */
+#define PCI_AGP_RFU             3       /* Rest of capability flags */
+#define PCI_AGP_STATUS          4       /* Status register */
+#define  PCI_AGP_STATUS_RQ_MASK 0xff000000      /* Maximum number of requests - 1 */
+#define  PCI_AGP_STATUS_SBA     0x0200  /* Sideband addressing supported */
+#define  PCI_AGP_STATUS_64BIT   0x0020  /* 64-bit addressing supported */
+#define  PCI_AGP_STATUS_FW      0x0010  /* FW transfers supported */
+#define  PCI_AGP_STATUS_RATE4   0x0004  /* 4x transfer rate supported */
+#define  PCI_AGP_STATUS_RATE2   0x0002  /* 2x transfer rate supported */
+#define  PCI_AGP_STATUS_RATE1   0x0001  /* 1x transfer rate supported */
+#define PCI_AGP_COMMAND         8       /* Control register */
+#define  PCI_AGP_COMMAND_RQ_MASK 0xff000000  /* Master: Maximum number of requests */
+#define  PCI_AGP_COMMAND_SBA    0x0200  /* Sideband addressing enabled */
+#define  PCI_AGP_COMMAND_AGP    0x0100  /* Allow processing of AGP transactions */
+#define  PCI_AGP_COMMAND_64BIT  0x0020  /* Allow processing of 64-bit addresses */
+#define  PCI_AGP_COMMAND_FW     0x0010  /* Force FW transfers */
+#define  PCI_AGP_COMMAND_RATE4  0x0004  /* Use 4x rate */
+#define  PCI_AGP_COMMAND_RATE2  0x0002  /* Use 2x rate */
+#define  PCI_AGP_COMMAND_RATE1  0x0001  /* Use 1x rate */
+#define PCI_AGP_SIZEOF          12
+
+/* Vital Product Data */
+
+#define PCI_VPD_ADDR            2       /* Address to access (15 bits!) */
+#define  PCI_VPD_ADDR_MASK      0x7fff  /* Address mask */
+#define  PCI_VPD_ADDR_F         0x8000  /* Write 0, 1 indicates completion */
+#define PCI_VPD_DATA            4       /* 32-bits of data returned here */
+
+/* Slot Identification */
+
+#define PCI_SID_ESR             2       /* Expansion Slot Register */
+#define  PCI_SID_ESR_NSLOTS     0x1f    /* Number of expansion slots available */
+#define  PCI_SID_ESR_FIC        0x20    /* First In Chassis Flag */
+#define PCI_SID_CHASSIS_NR      3       /* Chassis Number */
+
+/* Message Signalled Interrupts registers */
+
+#define PCI_MSI_FLAGS           2       /* Various flags */
+#define  PCI_MSI_FLAGS_64BIT    0x80    /* 64-bit addresses allowed */
+#define  PCI_MSI_FLAGS_QSIZE    0x70    /* Message queue size configured */
+#define  PCI_MSI_FLAGS_QMASK    0x0e    /* Maximum queue size available */
+#define  PCI_MSI_FLAGS_ENABLE   0x01    /* MSI feature enabled */
+#define  PCI_MSI_FLAGS_MASKBIT  0x100   /* 64-bit mask bits allowed */
+#define PCI_MSI_RFU             3       /* Rest of capability flags */
+#define PCI_MSI_ADDRESS_LO      4       /* Lower 32 bits */
+#define PCI_MSI_ADDRESS_HI      8       /* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
+#define PCI_MSI_DATA_32         8       /* 16 bits of data for 32-bit devices */
+#define PCI_MSI_MASK_32         12      /* Mask bits register for 32-bit devices */
+#define PCI_MSI_DATA_64         12      /* 16 bits of data for 64-bit devices */
+#define PCI_MSI_MASK_64         16      /* Mask bits register for 64-bit devices */
+
+/* MSI-X registers (these are at offset PCI_MSIX_FLAGS) */
+#define PCI_MSIX_FLAGS          2
+#define  PCI_MSIX_FLAGS_QSIZE   0x7FF
+#define  PCI_MSIX_FLAGS_ENABLE  (1 << 15)
+#define  PCI_MSIX_FLAGS_MASKALL (1 << 14)
+#define PCI_MSIX_FLAGS_BIRMASK  (7 << 0)
+
+/* CompactPCI Hotswap Register */
+
+#define PCI_CHSWP_CSR           2       /* Control and Status Register */
+#define  PCI_CHSWP_DHA          0x01    /* Device Hiding Arm */
+#define  PCI_CHSWP_EIM          0x02    /* ENUM# Signal Mask */
+#define  PCI_CHSWP_PIE          0x04    /* Pending Insert or Extract */
+#define  PCI_CHSWP_LOO          0x08    /* LED On / Off */
+#define  PCI_CHSWP_PI           0x30    /* Programming Interface */
+#define  PCI_CHSWP_EXT          0x40    /* ENUM# status - extraction */
+#define  PCI_CHSWP_INS          0x80    /* ENUM# status - insertion */
+
+/* PCI Advanced Feature registers */
+
+#define PCI_AF_LENGTH           2
+#define PCI_AF_CAP              3
+#define  PCI_AF_CAP_TP          0x01
+#define  PCI_AF_CAP_FLR         0x02
+#define PCI_AF_CTRL             4
+#define  PCI_AF_CTRL_FLR        0x01
+#define PCI_AF_STATUS           5
+#define  PCI_AF_STATUS_TP       0x01
+
+/* PCI-X registers */
+
+#define PCI_X_CMD               2       /* Modes & Features */
+#define  PCI_X_CMD_DPERR_E      0x0001  /* Data Parity Error Recovery Enable */
+#define  PCI_X_CMD_ERO          0x0002  /* Enable Relaxed Ordering */
+#define  PCI_X_CMD_READ_512     0x0000  /* 512 byte maximum read byte count */
+#define  PCI_X_CMD_READ_1K      0x0004  /* 1Kbyte maximum read byte count */
+#define  PCI_X_CMD_READ_2K      0x0008  /* 2Kbyte maximum read byte count */
+#define  PCI_X_CMD_READ_4K      0x000c  /* 4Kbyte maximum read byte count */
+#define  PCI_X_CMD_MAX_READ     0x000c  /* Max Memory Read Byte Count */
+                                /* Max # of outstanding split transactions */
+#define  PCI_X_CMD_SPLIT_1      0x0000  /* Max 1 */
+#define  PCI_X_CMD_SPLIT_2      0x0010  /* Max 2 */
+#define  PCI_X_CMD_SPLIT_3      0x0020  /* Max 3 */
+#define  PCI_X_CMD_SPLIT_4      0x0030  /* Max 4 */
+#define  PCI_X_CMD_SPLIT_8      0x0040  /* Max 8 */
+#define  PCI_X_CMD_SPLIT_12     0x0050  /* Max 12 */
+#define  PCI_X_CMD_SPLIT_16     0x0060  /* Max 16 */
+#define  PCI_X_CMD_SPLIT_32     0x0070  /* Max 32 */
+#define  PCI_X_CMD_MAX_SPLIT    0x0070  /* Max Outstanding Split Transactions */
+#define  PCI_X_CMD_VERSION(x)   (((x) >> 12) & 3) /* Version */
+#define PCI_X_STATUS            4       /* PCI-X capabilities */
+#define  PCI_X_STATUS_DEVFN     0x000000ff      /* A copy of devfn */
+#define  PCI_X_STATUS_BUS       0x0000ff00      /* A copy of bus nr */
+#define  PCI_X_STATUS_64BIT     0x00010000      /* 64-bit device */
+#define  PCI_X_STATUS_133MHZ    0x00020000      /* 133 MHz capable */
+#define  PCI_X_STATUS_SPL_DISC  0x00040000      /* Split Completion Discarded */
+#define  PCI_X_STATUS_UNX_SPL   0x00080000      /* Unexpected Split Completion */
+#define  PCI_X_STATUS_COMPLEX   0x00100000      /* Device Complexity */
+#define  PCI_X_STATUS_MAX_READ  0x00600000      /* Designed Max Memory Read Count */
+#define  PCI_X_STATUS_MAX_SPLIT 0x03800000      /* Designed Max Outstanding Split Transactions */
+#define  PCI_X_STATUS_MAX_CUM   0x1c000000      /* Designed Max Cumulative Read Size */
+#define  PCI_X_STATUS_SPL_ERR   0x20000000      /* Rcvd Split Completion Error Msg */
+#define  PCI_X_STATUS_266MHZ    0x40000000      /* 266 MHz capable */
+#define  PCI_X_STATUS_533MHZ    0x80000000      /* 533 MHz capable */
+
+/* PCI Express capability registers */
+
+#define PCI_EXP_FLAGS           2       /* Capabilities register */
+#define PCI_EXP_FLAGS_VERS      0x000f  /* Capability version */
+#define PCI_EXP_FLAGS_TYPE      0x00f0  /* Device/Port type */
+#define  PCI_EXP_TYPE_ENDPOINT  0x0     /* Express Endpoint */
+#define  PCI_EXP_TYPE_LEG_END   0x1     /* Legacy Endpoint */
+#define  PCI_EXP_TYPE_ROOT_PORT 0x4     /* Root Port */
+#define  PCI_EXP_TYPE_UPSTREAM  0x5     /* Upstream Port */
+#define  PCI_EXP_TYPE_DOWNSTREAM 0x6    /* Downstream Port */
+#define  PCI_EXP_TYPE_PCI_BRIDGE 0x7    /* PCI/PCI-X Bridge */
+#define  PCI_EXP_TYPE_RC_END    0x9     /* Root Complex Integrated Endpoint */
+#define  PCI_EXP_TYPE_RC_EC     0x10    /* Root Complex Event Collector */
+#define PCI_EXP_FLAGS_SLOT      0x0100  /* Slot implemented */
+#define PCI_EXP_FLAGS_IRQ       0x3e00  /* Interrupt message number */
+#define PCI_EXP_DEVCAP          4       /* Device capabilities */
+#define  PCI_EXP_DEVCAP_PAYLOAD 0x07    /* Max_Payload_Size */
+#define  PCI_EXP_DEVCAP_PHANTOM 0x18    /* Phantom functions */
+#define  PCI_EXP_DEVCAP_EXT_TAG 0x20    /* Extended tags */
+#define  PCI_EXP_DEVCAP_L0S     0x1c0   /* L0s Acceptable Latency */
+#define  PCI_EXP_DEVCAP_L1      0xe00   /* L1 Acceptable Latency */
+#define  PCI_EXP_DEVCAP_ATN_BUT 0x1000  /* Attention Button Present */
+#define  PCI_EXP_DEVCAP_ATN_IND 0x2000  /* Attention Indicator Present */
+#define  PCI_EXP_DEVCAP_PWR_IND 0x4000  /* Power Indicator Present */
+#define  PCI_EXP_DEVCAP_RBER    0x8000  /* Role-Based Error Reporting */
+#define  PCI_EXP_DEVCAP_PWR_VAL 0x3fc0000 /* Slot Power Limit Value */
+#define  PCI_EXP_DEVCAP_PWR_SCL 0xc000000 /* Slot Power Limit Scale */
+#define  PCI_EXP_DEVCAP_FLR     0x10000000 /* Function Level Reset */
+#define PCI_EXP_DEVCTL          8       /* Device Control */
+#define  PCI_EXP_DEVCTL_CERE    0x0001  /* Correctable Error Reporting En. */
+#define  PCI_EXP_DEVCTL_NFERE   0x0002  /* Non-Fatal Error Reporting Enable */
+#define  PCI_EXP_DEVCTL_FERE    0x0004  /* Fatal Error Reporting Enable */
+#define  PCI_EXP_DEVCTL_URRE    0x0008  /* Unsupported Request Reporting En. */
+#define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
+#define  PCI_EXP_DEVCTL_PAYLOAD 0x00e0  /* Max_Payload_Size */
+#define  PCI_EXP_DEVCTL_EXT_TAG 0x0100  /* Extended Tag Field Enable */
+#define  PCI_EXP_DEVCTL_PHANTOM 0x0200  /* Phantom Functions Enable */
+#define  PCI_EXP_DEVCTL_AUX_PME 0x0400  /* Auxiliary Power PM Enable */
+#define  PCI_EXP_DEVCTL_NOSNOOP_EN 0x0800  /* Enable No Snoop */
+#define  PCI_EXP_DEVCTL_READRQ  0x7000  /* Max_Read_Request_Size */
+#define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR */
+#define PCI_EXP_DEVSTA          10      /* Device Status */
+#define  PCI_EXP_DEVSTA_CED     0x01    /* Correctable Error Detected */
+#define  PCI_EXP_DEVSTA_NFED    0x02    /* Non-Fatal Error Detected */
+#define  PCI_EXP_DEVSTA_FED     0x04    /* Fatal Error Detected */
+#define  PCI_EXP_DEVSTA_URD     0x08    /* Unsupported Request Detected */
+#define  PCI_EXP_DEVSTA_AUXPD   0x10    /* AUX Power Detected */
+#define  PCI_EXP_DEVSTA_TRPND   0x20    /* Transactions Pending */
+#define PCI_EXP_LNKCAP          12      /* Link Capabilities */
+#define  PCI_EXP_LNKCAP_SLS     0x0000000f /* Supported Link Speeds */
+#define  PCI_EXP_LNKCAP_MLW     0x000003f0 /* Maximum Link Width */
+#define  PCI_EXP_LNKCAP_ASPMS   0x00000c00 /* ASPM Support */
+#define  PCI_EXP_LNKCAP_L0SEL   0x00007000 /* L0s Exit Latency */
+#define  PCI_EXP_LNKCAP_L1EL    0x00038000 /* L1 Exit Latency */
+#define  PCI_EXP_LNKCAP_CLKPM   0x00040000 /* L1 Clock Power Management */
+#define  PCI_EXP_LNKCAP_SDERC   0x00080000 /* Suprise Down Error Reporting Capable */
+#define  PCI_EXP_LNKCAP_DLLLARC 0x00100000 /* Data Link Layer Link Active Reporting Capable */
+#define  PCI_EXP_LNKCAP_LBNC    0x00200000 /* Link Bandwidth Notification Capability */
+#define  PCI_EXP_LNKCAP_PN      0xff000000 /* Port Number */
+#define PCI_EXP_LNKCTL          16      /* Link Control */
+#define  PCI_EXP_LNKCTL_ASPMC   0x0003  /* ASPM Control */
+#define  PCI_EXP_LNKCTL_RCB     0x0008  /* Read Completion Boundary */
+#define  PCI_EXP_LNKCTL_LD      0x0010  /* Link Disable */
+#define  PCI_EXP_LNKCTL_RL      0x0020  /* Retrain Link */
+#define  PCI_EXP_LNKCTL_CCC     0x0040  /* Common Clock Configuration */
+#define  PCI_EXP_LNKCTL_ES      0x0080  /* Extended Synch */
+#define  PCI_EXP_LNKCTL_CLKREQ_EN 0x100 /* Enable clkreq */
+#define  PCI_EXP_LNKCTL_HAWD    0x0200  /* Hardware Autonomous Width Disable */
+#define  PCI_EXP_LNKCTL_LBMIE   0x0400  /* Link Bandwidth Management Interrupt Enable */
+#define  PCI_EXP_LNKCTL_LABIE   0x0800  /* Lnk Autonomous Bandwidth Interrupt Enable */
+#define PCI_EXP_LNKSTA          18      /* Link Status */
+#define  PCI_EXP_LNKSTA_CLS     0x000f  /* Current Link Speed */
+#define  PCI_EXP_LNKSTA_NLW     0x03f0  /* Nogotiated Link Width */
+#define  PCI_EXP_LNKSTA_LT      0x0800  /* Link Training */
+#define  PCI_EXP_LNKSTA_SLC     0x1000  /* Slot Clock Configuration */
+#define  PCI_EXP_LNKSTA_DLLLA   0x2000  /* Data Link Layer Link Active */
+#define  PCI_EXP_LNKSTA_LBMS    0x4000  /* Link Bandwidth Management Status */
+#define  PCI_EXP_LNKSTA_LABS    0x8000  /* Link Autonomous Bandwidth Status */
+#define PCI_EXP_SLTCAP          20      /* Slot Capabilities */
+#define  PCI_EXP_SLTCAP_ABP     0x00000001 /* Attention Button Present */
+#define  PCI_EXP_SLTCAP_PCP     0x00000002 /* Power Controller Present */
+#define  PCI_EXP_SLTCAP_MRLSP   0x00000004 /* MRL Sensor Present */
+#define  PCI_EXP_SLTCAP_AIP     0x00000008 /* Attention Indicator Present */
+#define  PCI_EXP_SLTCAP_PIP     0x00000010 /* Power Indicator Present */
+#define  PCI_EXP_SLTCAP_HPS     0x00000020 /* Hot-Plug Surprise */
+#define  PCI_EXP_SLTCAP_HPC     0x00000040 /* Hot-Plug Capable */
+#define  PCI_EXP_SLTCAP_SPLV    0x00007f80 /* Slot Power Limit Value */
+#define  PCI_EXP_SLTCAP_SPLS    0x00018000 /* Slot Power Limit Scale */
+#define  PCI_EXP_SLTCAP_EIP     0x00020000 /* Electromechanical Interlock Present */
+#define  PCI_EXP_SLTCAP_NCCS    0x00040000 /* No Command Completed Support */
+#define  PCI_EXP_SLTCAP_PSN     0xfff80000 /* Physical Slot Number */
+#define PCI_EXP_SLTCTL          24      /* Slot Control */
+#define  PCI_EXP_SLTCTL_ABPE    0x0001  /* Attention Button Pressed Enable */
+#define  PCI_EXP_SLTCTL_PFDE    0x0002  /* Power Fault Detected Enable */
+#define  PCI_EXP_SLTCTL_MRLSCE  0x0004  /* MRL Sensor Changed Enable */
+#define  PCI_EXP_SLTCTL_PDCE    0x0008  /* Presence Detect Changed Enable */
+#define  PCI_EXP_SLTCTL_CCIE    0x0010  /* Command Completed Interrupt Enable */
+#define  PCI_EXP_SLTCTL_HPIE    0x0020  /* Hot-Plug Interrupt Enable */
+#define  PCI_EXP_SLTCTL_AIC     0x00c0  /* Attention Indicator Control */
+#define  PCI_EXP_SLTCTL_PIC     0x0300  /* Power Indicator Control */
+#define  PCI_EXP_SLTCTL_PCC     0x0400  /* Power Controller Control */
+#define  PCI_EXP_SLTCTL_EIC     0x0800  /* Electromechanical Interlock Control */
+#define  PCI_EXP_SLTCTL_DLLSCE  0x1000  /* Data Link Layer State Changed Enable */
+#define PCI_EXP_SLTSTA          26      /* Slot Status */
+#define  PCI_EXP_SLTSTA_ABP     0x0001  /* Attention Button Pressed */
+#define  PCI_EXP_SLTSTA_PFD     0x0002  /* Power Fault Detected */
+#define  PCI_EXP_SLTSTA_MRLSC   0x0004  /* MRL Sensor Changed */
+#define  PCI_EXP_SLTSTA_PDC     0x0008  /* Presence Detect Changed */
+#define  PCI_EXP_SLTSTA_CC      0x0010  /* Command Completed */
+#define  PCI_EXP_SLTSTA_MRLSS   0x0020  /* MRL Sensor State */
+#define  PCI_EXP_SLTSTA_PDS     0x0040  /* Presence Detect State */
+#define  PCI_EXP_SLTSTA_EIS     0x0080  /* Electromechanical Interlock Status */
+#define  PCI_EXP_SLTSTA_DLLSC   0x0100  /* Data Link Layer State Changed */
+#define PCI_EXP_RTCTL           28      /* Root Control */
+#define  PCI_EXP_RTCTL_SECEE    0x01    /* System Error on Correctable Error */
+#define  PCI_EXP_RTCTL_SENFEE   0x02    /* System Error on Non-Fatal Error */
+#define  PCI_EXP_RTCTL_SEFEE    0x04    /* System Error on Fatal Error */
+#define  PCI_EXP_RTCTL_PMEIE    0x08    /* PME Interrupt Enable */
+#define  PCI_EXP_RTCTL_CRSSVE   0x10    /* CRS Software Visibility Enable */
+#define PCI_EXP_RTCAP           30      /* Root Capabilities */
+#define PCI_EXP_RTSTA           32      /* Root Status */
+#define PCI_EXP_DEVCAP2         36      /* Device Capabilities 2 */
+#define  PCI_EXP_DEVCAP2_ARI    0x20    /* Alternative Routing-ID */
+#define PCI_EXP_DEVCTL2         40      /* Device Control 2 */
+#define  PCI_EXP_DEVCTL2_ARI    0x20    /* Alternative Routing-ID */
+#define PCI_EXP_LNKCTL2         48      /* Link Control 2 */
+#define PCI_EXP_SLTCTL2         56      /* Slot Control 2 */
+
+/* Extended Capabilities (PCI-X 2.0 and Express) */
+#define PCI_EXT_CAP_ID(header)          (header & 0x0000ffff)
+#define PCI_EXT_CAP_VER(header)         ((header >> 16) & 0xf)
+#define PCI_EXT_CAP_NEXT(header)        ((header >> 20) & 0xffc)
+
+#define PCI_EXT_CAP_ID_ERR      1
+#define PCI_EXT_CAP_ID_VC       2
+#define PCI_EXT_CAP_ID_DSN      3
+#define PCI_EXT_CAP_ID_PWR      4
+#define PCI_EXT_CAP_ID_ARI      14
+#define PCI_EXT_CAP_ID_ATS      15
+#define PCI_EXT_CAP_ID_SRIOV    16
+
+/* Advanced Error Reporting */
+#define PCI_ERR_UNCOR_STATUS    4       /* Uncorrectable Error Status */
+#define  PCI_ERR_UNC_TRAIN      0x00000001      /* Training */
+#define  PCI_ERR_UNC_DLP        0x00000010      /* Data Link Protocol */
+#define  PCI_ERR_UNC_POISON_TLP 0x00001000      /* Poisoned TLP */
+#define  PCI_ERR_UNC_FCP        0x00002000      /* Flow Control Protocol */
+#define  PCI_ERR_UNC_COMP_TIME  0x00004000      /* Completion Timeout */
+#define  PCI_ERR_UNC_COMP_ABORT 0x00008000      /* Completer Abort */
+#define  PCI_ERR_UNC_UNX_COMP   0x00010000      /* Unexpected Completion */
+#define  PCI_ERR_UNC_RX_OVER    0x00020000      /* Receiver Overflow */
+#define  PCI_ERR_UNC_MALF_TLP   0x00040000      /* Malformed TLP */
+#define  PCI_ERR_UNC_ECRC       0x00080000      /* ECRC Error Status */
+#define  PCI_ERR_UNC_UNSUP      0x00100000      /* Unsupported Request */
+#define PCI_ERR_UNCOR_MASK      8       /* Uncorrectable Error Mask */
+        /* Same bits as above */
+#define PCI_ERR_UNCOR_SEVER     12      /* Uncorrectable Error Severity */
+        /* Same bits as above */
+#define PCI_ERR_COR_STATUS      16      /* Correctable Error Status */
+#define  PCI_ERR_COR_RCVR       0x00000001      /* Receiver Error Status */
+#define  PCI_ERR_COR_BAD_TLP    0x00000040      /* Bad TLP Status */
+#define  PCI_ERR_COR_BAD_DLLP   0x00000080      /* Bad DLLP Status */
+#define  PCI_ERR_COR_REP_ROLL   0x00000100      /* REPLAY_NUM Rollover */
+#define  PCI_ERR_COR_REP_TIMER  0x00001000      /* Replay Timer Timeout */
+#define PCI_ERR_COR_MASK        20      /* Correctable Error Mask */
+        /* Same bits as above */
+#define PCI_ERR_CAP             24      /* Advanced Error Capabilities */
+#define  PCI_ERR_CAP_FEP(x)     ((x) & 31)      /* First Error Pointer */
+#define  PCI_ERR_CAP_ECRC_GENC  0x00000020      /* ECRC Generation Capable */
+#define  PCI_ERR_CAP_ECRC_GENE  0x00000040      /* ECRC Generation Enable */
+#define  PCI_ERR_CAP_ECRC_CHKC  0x00000080      /* ECRC Check Capable */
+#define  PCI_ERR_CAP_ECRC_CHKE  0x00000100      /* ECRC Check Enable */
+#define PCI_ERR_HEADER_LOG      28      /* Header Log Register (16 bytes) */
+#define PCI_ERR_ROOT_COMMAND    44      /* Root Error Command */
+/* Correctable Err Reporting Enable */
+#define PCI_ERR_ROOT_CMD_COR_EN         0x00000001
+/* Non-fatal Err Reporting Enable */
+#define PCI_ERR_ROOT_CMD_NONFATAL_EN    0x00000002
+/* Fatal Err Reporting Enable */
+#define PCI_ERR_ROOT_CMD_FATAL_EN       0x00000004
+#define PCI_ERR_ROOT_STATUS     48
+#define PCI_ERR_ROOT_COR_RCV            0x00000001      /* ERR_COR Received */
+/* Multi ERR_COR Received */
+#define PCI_ERR_ROOT_MULTI_COR_RCV      0x00000002
+/* ERR_FATAL/NONFATAL Recevied */
+#define PCI_ERR_ROOT_UNCOR_RCV          0x00000004
+/* Multi ERR_FATAL/NONFATAL Recevied */
+#define PCI_ERR_ROOT_MULTI_UNCOR_RCV    0x00000008
+#define PCI_ERR_ROOT_FIRST_FATAL        0x00000010      /* First Fatal */
+#define PCI_ERR_ROOT_NONFATAL_RCV       0x00000020      /* Non-Fatal Received */
+#define PCI_ERR_ROOT_FATAL_RCV          0x00000040      /* Fatal Received */
+#define PCI_ERR_ROOT_COR_SRC    52
+#define PCI_ERR_ROOT_SRC        54
+
+/* Virtual Channel */
+#define PCI_VC_PORT_REG1        4
+#define PCI_VC_PORT_REG2        8
+#define PCI_VC_PORT_CTRL        12
+#define PCI_VC_PORT_STATUS      14
+#define PCI_VC_RES_CAP          16
+#define PCI_VC_RES_CTRL         20
+#define PCI_VC_RES_STATUS       26
+
+/* Power Budgeting */
+#define PCI_PWR_DSR             4       /* Data Select Register */
+#define PCI_PWR_DATA            8       /* Data Register */
+#define  PCI_PWR_DATA_BASE(x)   ((x) & 0xff)        /* Base Power */
+#define  PCI_PWR_DATA_SCALE(x)  (((x) >> 8) & 3)    /* Data Scale */
+#define  PCI_PWR_DATA_PM_SUB(x) (((x) >> 10) & 7)   /* PM Sub State */
+#define  PCI_PWR_DATA_PM_STATE(x) (((x) >> 13) & 3) /* PM State */
+#define  PCI_PWR_DATA_TYPE(x)   (((x) >> 15) & 7)   /* Type */
+#define  PCI_PWR_DATA_RAIL(x)   (((x) >> 18) & 7)   /* Power Rail */
+#define PCI_PWR_CAP             12      /* Capability */
+#define  PCI_PWR_CAP_BUDGET(x)  ((x) & 1)       /* Included in system budget */
+
+/*
+ * Hypertransport sub capability types
+ *
+ * Unfortunately there are both 3 bit and 5 bit capability types defined
+ * in the HT spec, catering for that is a little messy. You probably don't
+ * want to use these directly, just use pci_find_ht_capability() and it
+ * will do the right thing for you.
+ */
+#define HT_3BIT_CAP_MASK        0xE0
+#define HT_CAPTYPE_SLAVE        0x00    /* Slave/Primary link configuration */
+#define HT_CAPTYPE_HOST         0x20    /* Host/Secondary link configuration */
+
+#define HT_5BIT_CAP_MASK        0xF8
+#define HT_CAPTYPE_IRQ          0x80    /* IRQ Configuration */
+#define HT_CAPTYPE_REMAPPING_40 0xA0    /* 40 bit address remapping */
+#define HT_CAPTYPE_REMAPPING_64 0xA2    /* 64 bit address remapping */
+#define HT_CAPTYPE_UNITID_CLUMP 0x90    /* Unit ID clumping */
+#define HT_CAPTYPE_EXTCONF      0x98    /* Extended Configuration Space Access */
+#define HT_CAPTYPE_MSI_MAPPING  0xA8    /* MSI Mapping Capability */
+#define  HT_MSI_FLAGS           0x02            /* Offset to flags */
+#define  HT_MSI_FLAGS_ENABLE    0x1             /* Mapping enable */
+#define  HT_MSI_FLAGS_FIXED     0x2             /* Fixed mapping only */
+#define  HT_MSI_FIXED_ADDR      0x00000000FEE00000ULL   /* Fixed addr */
+#define  HT_MSI_ADDR_LO         0x04            /* Offset to low addr bits */
+#define  HT_MSI_ADDR_LO_MASK    0xFFF00000      /* Low address bit mask */
+#define  HT_MSI_ADDR_HI         0x08            /* Offset to high addr bits */
+#define HT_CAPTYPE_DIRECT_ROUTE 0xB0    /* Direct routing configuration */
+#define HT_CAPTYPE_VCSET        0xB8    /* Virtual Channel configuration */
+#define HT_CAPTYPE_ERROR_RETRY  0xC0    /* Retry on error configuration */
+#define HT_CAPTYPE_GEN3         0xD0    /* Generation 3 hypertransport configuration */
+#define HT_CAPTYPE_PM           0xE0    /* Hypertransport powermanagement configuration */
+
+/* Alternative Routing-ID Interpretation */
+#define PCI_ARI_CAP             0x04    /* ARI Capability Register */
+#define  PCI_ARI_CAP_MFVC       0x0001  /* MFVC Function Groups Capability */
+#define  PCI_ARI_CAP_ACS        0x0002  /* ACS Function Groups Capability */
+#define  PCI_ARI_CAP_NFN(x)     (((x) >> 8) & 0xff) /* Next Function Number */
+#define PCI_ARI_CTRL            0x06    /* ARI Control Register */
+#define  PCI_ARI_CTRL_MFVC      0x0001  /* MFVC Function Groups Enable */
+#define  PCI_ARI_CTRL_ACS       0x0002  /* ACS Function Groups Enable */
+#define  PCI_ARI_CTRL_FG(x)     (((x) >> 4) & 7) /* Function Group */
+
+/* Address Translation Service */
+#define PCI_ATS_CAP             0x04    /* ATS Capability Register */
+#define  PCI_ATS_CAP_QDEP(x)    ((x) & 0x1f)    /* Invalidate Queue Depth */
+#define  PCI_ATS_MAX_QDEP       32      /* Max Invalidate Queue Depth */
+#define PCI_ATS_CTRL            0x06    /* ATS Control Register */
+#define  PCI_ATS_CTRL_ENABLE    0x8000  /* ATS Enable */
+#define  PCI_ATS_CTRL_STU(x)    ((x) & 0x1f)    /* Smallest Translation Unit */
+#define  PCI_ATS_MIN_STU        12      /* shift of minimum STU block */
+
+/* Single Root I/O Virtualization */
+#define PCI_SRIOV_CAP           0x04    /* SR-IOV Capabilities */
+#define  PCI_SRIOV_CAP_VFM      0x01    /* VF Migration Capable */
+#define  PCI_SRIOV_CAP_INTR(x)  ((x) >> 21) /* Interrupt Message Number */
+#define PCI_SRIOV_CTRL          0x08    /* SR-IOV Control */
+#define  PCI_SRIOV_CTRL_VFE     0x01    /* VF Enable */
+#define  PCI_SRIOV_CTRL_VFM     0x02    /* VF Migration Enable */
+#define  PCI_SRIOV_CTRL_INTR    0x04    /* VF Migration Interrupt Enable */
+#define  PCI_SRIOV_CTRL_MSE     0x08    /* VF Memory Space Enable */
+#define  PCI_SRIOV_CTRL_ARI     0x10    /* ARI Capable Hierarchy */
+#define PCI_SRIOV_STATUS        0x0a    /* SR-IOV Status */
+#define  PCI_SRIOV_STATUS_VFM   0x01    /* VF Migration Status */
+#define PCI_SRIOV_INITIAL_VF    0x0c    /* Initial VFs */
+#define PCI_SRIOV_TOTAL_VF      0x0e    /* Total VFs */
+#define PCI_SRIOV_NUM_VF        0x10    /* Number of VFs */
+#define PCI_SRIOV_FUNC_LINK     0x12    /* Function Dependency Link */
+#define PCI_SRIOV_VF_OFFSET     0x14    /* First VF Offset */
+#define PCI_SRIOV_VF_STRIDE     0x16    /* Following VF Stride */
+#define PCI_SRIOV_VF_DID        0x1a    /* VF Device ID */
+#define PCI_SRIOV_SUP_PGSIZE    0x1c    /* Supported Page Sizes */
+#define PCI_SRIOV_SYS_PGSIZE    0x20    /* System Page Size */
+#define PCI_SRIOV_BAR           0x24    /* VF BAR0 */
+#define  PCI_SRIOV_NUM_BARS     6       /* Number of VF BARs */
+#define PCI_SRIOV_VFM           0x3c    /* VF Migration State Array Offset*/
+#define  PCI_SRIOV_VFM_BIR(x)   ((x) & 7)       /* State BIR */
+#define  PCI_SRIOV_VFM_OFFSET(x) ((x) & ~7)     /* State Offset */
+#define  PCI_SRIOV_VFM_UA       0x0     /* Inactive.Unavailable */
+#define  PCI_SRIOV_VFM_MI       0x1     /* Dormant.MigrateIn */
+#define  PCI_SRIOV_VFM_MO       0x2     /* Active.MigrateOut */
+#define  PCI_SRIOV_VFM_AV       0x3     /* Active.Available */
+
+#endif /* LINUX_PCI_REGS_H */
-- 
1.7.1
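
As a reading aid for the extended capability macros defined above, here is
a minimal sketch of walking the extended capability list in a raw copy of
a device's 4 KiB configuration space. The helpers cfg_read32() and
ext_cap_find() are hypothetical names invented for this illustration and
are not part of the patch or of QEMU. The sketch assumes the buffer is
little-endian and that the list starts at offset 0x100.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the hw/pci_regs.h hunk above. */
#define PCI_EXT_CAP_ID(header)          (header & 0x0000ffff)
#define PCI_EXT_CAP_VER(header)         ((header >> 16) & 0xf)
#define PCI_EXT_CAP_NEXT(header)        ((header >> 20) & 0xffc)

#define PCI_EXT_CAP_ID_ERR      1
#define PCI_EXT_CAP_ID_ARI      14
#define PCI_EXT_CAP_ID_SRIOV    16

/* Assumptions for this sketch: 4 KiB config space, list starts at 0x100. */
#define CFG_SPACE_SIZE          4096
#define EXT_CAP_START           0x100

/* Hypothetical helper: read a little-endian 32-bit value from the buffer. */
static uint32_t cfg_read32(const uint8_t *cfg, unsigned off)
{
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

/*
 * Hypothetical helper: return the offset of the extended capability with
 * the given ID, or 0 if it is not present. The ttl counter bounds the walk
 * so that a malformed (looping) list cannot hang the caller.
 */
static int ext_cap_find(const uint8_t *cfg, unsigned cap_id)
{
    unsigned pos = EXT_CAP_START;
    int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;

    assert(cfg != NULL);

    while (ttl-- > 0 && pos >= EXT_CAP_START && pos <= CFG_SPACE_SIZE - 4) {
        uint32_t header = cfg_read32(cfg, pos);

        /* An all-zeroes or all-ones header means there is no capability. */
        if (header == 0 || header == 0xffffffff) {
            break;
        }
        if (PCI_EXT_CAP_ID(header) == cap_id) {
            return (int)pos;
        }
        pos = PCI_EXT_CAP_NEXT(header);
    }
    return 0;
}
```

The returned offset is the base to which the per-capability register
offsets (for example PCI_ERR_UNCOR_STATUS or PCI_SRIOV_CTRL) would be
added.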



* [Qemu-devel] [PATCH 1/7] pci: expand tabs to spaces in pci_regs.h
@ 2010-08-28 14:54   ` Eduard - Gabriel Munteanu
From: Eduard - Gabriel Munteanu @ 2010-08-28 14:54 UTC (permalink / raw)
  To: mst
  Cc: kvm, joro, qemu-devel, blauwirbel, yamahata, paul,
	Eduard - Gabriel Munteanu, avi

The conversion was done with the GNU 'expand' tool (default settings,
i.e. tab stops every 8 columns) so that hw/pci_regs.h obeys the QEMU
coding style.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
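
For readers who want to see what the conversion does to each line, here
is a rough C sketch of tab expansion with fixed tab stops. It is not the
source of GNU 'expand'. expand_tabs() is a hypothetical helper written
for this note, and it handles a single line without embedded newlines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the NUL-terminated line 'in' to 'out',
 * replacing each tab with enough spaces to reach the next multiple of
 * 'tabstop' columns. Returns the length of the result, or -1 if 'out'
 * (of size 'outsz', including the terminator) is too small.
 */
static int expand_tabs(const char *in, char *out, size_t outsz,
                       unsigned tabstop)
{
    size_t col = 0;

    assert(tabstop > 0);
    if (outsz == 0) {
        return -1;
    }

    for (; *in; in++) {
        if (*in == '\t') {
            /* Distance from the current column to the next tab stop. */
            size_t pad = tabstop - (col % tabstop);

            if (col + pad >= outsz) {
                return -1;
            }
            memset(out + col, ' ', pad);
            col += pad;
        } else {
            if (col + 1 >= outsz) {
                return -1;
            }
            out[col++] = *in;
        }
    }
    out[col] = '\0';
    return (int)col;
}
```

Calling expand_tabs(line, buf, sizeof(buf), 8) on a removed line from the
hunk below should give the spacing of the corresponding added line, since
8 columns is the default tab stop of 'expand'.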
 hw/pci_regs.h | 1330 ++++++++++++++++++++++++++++----------------------------
 1 files changed, 665 insertions(+), 665 deletions(-)
 rewrite hw/pci_regs.h (90%)

diff --git a/hw/pci_regs.h b/hw/pci_regs.h
dissimilarity index 90%
index dd0bed4..0f9f84c 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -1,665 +1,665 @@
-/*
- *	pci_regs.h
- *
- *	PCI standard defines
- *	Copyright 1994, Drew Eckhardt
- *	Copyright 1997--1999 Martin Mares <mj@ucw.cz>
- *
- *	For more information, please consult the following manuals (look at
- *	http://www.pcisig.com/ for how to get them):
- *
- *	PCI BIOS Specification
- *	PCI Local Bus Specification
- *	PCI to PCI Bridge Specification
- *	PCI System Design Guide
- *
- * 	For hypertransport information, please consult the following manuals
- * 	from http://www.hypertransport.org
- *
- *	The Hypertransport I/O Link Specification
- */
-
-#ifndef LINUX_PCI_REGS_H
-#define LINUX_PCI_REGS_H
-
-/*
- * Under PCI, each device has 256 bytes of configuration address space,
- * of which the first 64 bytes are standardized as follows:
- */
-#define PCI_VENDOR_ID		0x00	/* 16 bits */
-#define PCI_DEVICE_ID		0x02	/* 16 bits */
-#define PCI_COMMAND		0x04	/* 16 bits */
-#define  PCI_COMMAND_IO		0x1	/* Enable response in I/O space */
-#define  PCI_COMMAND_MEMORY	0x2	/* Enable response in Memory space */
-#define  PCI_COMMAND_MASTER	0x4	/* Enable bus mastering */
-#define  PCI_COMMAND_SPECIAL	0x8	/* Enable response to special cycles */
-#define  PCI_COMMAND_INVALIDATE	0x10	/* Use memory write and invalidate */
-#define  PCI_COMMAND_VGA_PALETTE 0x20	/* Enable palette snooping */
-#define  PCI_COMMAND_PARITY	0x40	/* Enable parity checking */
-#define  PCI_COMMAND_WAIT 	0x80	/* Enable address/data stepping */
-#define  PCI_COMMAND_SERR	0x100	/* Enable SERR */
-#define  PCI_COMMAND_FAST_BACK	0x200	/* Enable back-to-back writes */
-#define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
-
-#define PCI_STATUS		0x06	/* 16 bits */
-#define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
-#define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
-#define  PCI_STATUS_66MHZ	0x20	/* Support 66 Mhz PCI 2.1 bus */
-#define  PCI_STATUS_UDF		0x40	/* Support User Definable Features [obsolete] */
-#define  PCI_STATUS_FAST_BACK	0x80	/* Accept fast-back to back */
-#define  PCI_STATUS_PARITY	0x100	/* Detected parity error */
-#define  PCI_STATUS_DEVSEL_MASK	0x600	/* DEVSEL timing */
-#define  PCI_STATUS_DEVSEL_FAST		0x000
-#define  PCI_STATUS_DEVSEL_MEDIUM	0x200
-#define  PCI_STATUS_DEVSEL_SLOW		0x400
-#define  PCI_STATUS_SIG_TARGET_ABORT	0x800 /* Set on target abort */
-#define  PCI_STATUS_REC_TARGET_ABORT	0x1000 /* Master ack of " */
-#define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
-#define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
-#define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
-
-#define PCI_CLASS_REVISION	0x08	/* High 24 bits are class, low 8 revision */
-#define PCI_REVISION_ID		0x08	/* Revision ID */
-#define PCI_CLASS_PROG		0x09	/* Reg. Level Programming Interface */
-#define PCI_CLASS_DEVICE	0x0a	/* Device class */
-
-#define PCI_CACHE_LINE_SIZE	0x0c	/* 8 bits */
-#define PCI_LATENCY_TIMER	0x0d	/* 8 bits */
-#define PCI_HEADER_TYPE		0x0e	/* 8 bits */
-#define  PCI_HEADER_TYPE_NORMAL		0
-#define  PCI_HEADER_TYPE_BRIDGE		1
-#define  PCI_HEADER_TYPE_CARDBUS	2
-
-#define PCI_BIST		0x0f	/* 8 bits */
-#define  PCI_BIST_CODE_MASK	0x0f	/* Return result */
-#define  PCI_BIST_START		0x40	/* 1 to start BIST, 2 secs or less */
-#define  PCI_BIST_CAPABLE	0x80	/* 1 if BIST capable */
-
-/*
- * Base addresses specify locations in memory or I/O space.
- * Decoded size can be determined by writing a value of
- * 0xffffffff to the register, and reading it back.  Only
- * 1 bits are decoded.
- */
-#define PCI_BASE_ADDRESS_0	0x10	/* 32 bits */
-#define PCI_BASE_ADDRESS_1	0x14	/* 32 bits [htype 0,1 only] */
-#define PCI_BASE_ADDRESS_2	0x18	/* 32 bits [htype 0 only] */
-#define PCI_BASE_ADDRESS_3	0x1c	/* 32 bits */
-#define PCI_BASE_ADDRESS_4	0x20	/* 32 bits */
-#define PCI_BASE_ADDRESS_5	0x24	/* 32 bits */
-#define  PCI_BASE_ADDRESS_SPACE		0x01	/* 0 = memory, 1 = I/O */
-#define  PCI_BASE_ADDRESS_SPACE_IO	0x01
-#define  PCI_BASE_ADDRESS_SPACE_MEMORY	0x00
-#define  PCI_BASE_ADDRESS_MEM_TYPE_MASK	0x06
-#define  PCI_BASE_ADDRESS_MEM_TYPE_32	0x00	/* 32 bit address */
-#define  PCI_BASE_ADDRESS_MEM_TYPE_1M	0x02	/* Below 1M [obsolete] */
-#define  PCI_BASE_ADDRESS_MEM_TYPE_64	0x04	/* 64 bit address */
-#define  PCI_BASE_ADDRESS_MEM_PREFETCH	0x08	/* prefetchable? */
-#define  PCI_BASE_ADDRESS_MEM_MASK	(~0x0fUL)
-#define  PCI_BASE_ADDRESS_IO_MASK	(~0x03UL)
-/* bit 1 is reserved if address_space = 1 */
-
-/* Header type 0 (normal devices) */
-#define PCI_CARDBUS_CIS		0x28
-#define PCI_SUBSYSTEM_VENDOR_ID	0x2c
-#define PCI_SUBSYSTEM_ID	0x2e
-#define PCI_ROM_ADDRESS		0x30	/* Bits 31..11 are address, 10..1 reserved */
-#define  PCI_ROM_ADDRESS_ENABLE	0x01
-#define PCI_ROM_ADDRESS_MASK	(~0x7ffUL)
-
-#define PCI_CAPABILITY_LIST	0x34	/* Offset of first capability list entry */
-
-/* 0x35-0x3b are reserved */
-#define PCI_INTERRUPT_LINE	0x3c	/* 8 bits */
-#define PCI_INTERRUPT_PIN	0x3d	/* 8 bits */
-#define PCI_MIN_GNT		0x3e	/* 8 bits */
-#define PCI_MAX_LAT		0x3f	/* 8 bits */
-
-/* Header type 1 (PCI-to-PCI bridges) */
-#define PCI_PRIMARY_BUS		0x18	/* Primary bus number */
-#define PCI_SECONDARY_BUS	0x19	/* Secondary bus number */
-#define PCI_SUBORDINATE_BUS	0x1a	/* Highest bus number behind the bridge */
-#define PCI_SEC_LATENCY_TIMER	0x1b	/* Latency timer for secondary interface */
-#define PCI_IO_BASE		0x1c	/* I/O range behind the bridge */
-#define PCI_IO_LIMIT		0x1d
-#define  PCI_IO_RANGE_TYPE_MASK	0x0fUL	/* I/O bridging type */
-#define  PCI_IO_RANGE_TYPE_16	0x00
-#define  PCI_IO_RANGE_TYPE_32	0x01
-#define  PCI_IO_RANGE_MASK	(~0x0fUL)
-#define PCI_SEC_STATUS		0x1e	/* Secondary status register, only bit 14 used */
-#define PCI_MEMORY_BASE		0x20	/* Memory range behind */
-#define PCI_MEMORY_LIMIT	0x22
-#define  PCI_MEMORY_RANGE_TYPE_MASK 0x0fUL
-#define  PCI_MEMORY_RANGE_MASK	(~0x0fUL)
-#define PCI_PREF_MEMORY_BASE	0x24	/* Prefetchable memory range behind */
-#define PCI_PREF_MEMORY_LIMIT	0x26
-#define  PCI_PREF_RANGE_TYPE_MASK 0x0fUL
-#define  PCI_PREF_RANGE_TYPE_32	0x00
-#define  PCI_PREF_RANGE_TYPE_64	0x01
-#define  PCI_PREF_RANGE_MASK	(~0x0fUL)
-#define PCI_PREF_BASE_UPPER32	0x28	/* Upper half of prefetchable memory range */
-#define PCI_PREF_LIMIT_UPPER32	0x2c
-#define PCI_IO_BASE_UPPER16	0x30	/* Upper half of I/O addresses */
-#define PCI_IO_LIMIT_UPPER16	0x32
-/* 0x34 same as for htype 0 */
-/* 0x35-0x3b is reserved */
-#define PCI_ROM_ADDRESS1	0x38	/* Same as PCI_ROM_ADDRESS, but for htype 1 */
-/* 0x3c-0x3d are same as for htype 0 */
-#define PCI_BRIDGE_CONTROL	0x3e
-#define  PCI_BRIDGE_CTL_PARITY	0x01	/* Enable parity detection on secondary interface */
-#define  PCI_BRIDGE_CTL_SERR	0x02	/* The same for SERR forwarding */
-#define  PCI_BRIDGE_CTL_ISA	0x04	/* Enable ISA mode */
-#define  PCI_BRIDGE_CTL_VGA	0x08	/* Forward VGA addresses */
-#define  PCI_BRIDGE_CTL_MASTER_ABORT	0x20  /* Report master aborts */
-#define  PCI_BRIDGE_CTL_BUS_RESET	0x40	/* Secondary bus reset */
-#define  PCI_BRIDGE_CTL_FAST_BACK	0x80	/* Fast Back2Back enabled on secondary interface */
-
-/* Header type 2 (CardBus bridges) */
-#define PCI_CB_CAPABILITY_LIST	0x14
-/* 0x15 reserved */
-#define PCI_CB_SEC_STATUS	0x16	/* Secondary status */
-#define PCI_CB_PRIMARY_BUS	0x18	/* PCI bus number */
-#define PCI_CB_CARD_BUS		0x19	/* CardBus bus number */
-#define PCI_CB_SUBORDINATE_BUS	0x1a	/* Subordinate bus number */
-#define PCI_CB_LATENCY_TIMER	0x1b	/* CardBus latency timer */
-#define PCI_CB_MEMORY_BASE_0	0x1c
-#define PCI_CB_MEMORY_LIMIT_0	0x20
-#define PCI_CB_MEMORY_BASE_1	0x24
-#define PCI_CB_MEMORY_LIMIT_1	0x28
-#define PCI_CB_IO_BASE_0	0x2c
-#define PCI_CB_IO_BASE_0_HI	0x2e
-#define PCI_CB_IO_LIMIT_0	0x30
-#define PCI_CB_IO_LIMIT_0_HI	0x32
-#define PCI_CB_IO_BASE_1	0x34
-#define PCI_CB_IO_BASE_1_HI	0x36
-#define PCI_CB_IO_LIMIT_1	0x38
-#define PCI_CB_IO_LIMIT_1_HI	0x3a
-#define  PCI_CB_IO_RANGE_MASK	(~0x03UL)
-/* 0x3c-0x3d are same as for htype 0 */
-#define PCI_CB_BRIDGE_CONTROL	0x3e
-#define  PCI_CB_BRIDGE_CTL_PARITY	0x01	/* Similar to standard bridge control register */
-#define  PCI_CB_BRIDGE_CTL_SERR		0x02
-#define  PCI_CB_BRIDGE_CTL_ISA		0x04
-#define  PCI_CB_BRIDGE_CTL_VGA		0x08
-#define  PCI_CB_BRIDGE_CTL_MASTER_ABORT	0x20
-#define  PCI_CB_BRIDGE_CTL_CB_RESET	0x40	/* CardBus reset */
-#define  PCI_CB_BRIDGE_CTL_16BIT_INT	0x80	/* Enable interrupt for 16-bit cards */
-#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM0 0x100	/* Prefetch enable for both memory regions */
-#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM1 0x200
-#define  PCI_CB_BRIDGE_CTL_POST_WRITES	0x400
-#define PCI_CB_SUBSYSTEM_VENDOR_ID	0x40
-#define PCI_CB_SUBSYSTEM_ID		0x42
-#define PCI_CB_LEGACY_MODE_BASE		0x44	/* 16-bit PC Card legacy mode base address (ExCa) */
-/* 0x48-0x7f reserved */
-
-/* Capability lists */
-
-#define PCI_CAP_LIST_ID		0	/* Capability ID */
-#define  PCI_CAP_ID_PM		0x01	/* Power Management */
-#define  PCI_CAP_ID_AGP		0x02	/* Accelerated Graphics Port */
-#define  PCI_CAP_ID_VPD		0x03	/* Vital Product Data */
-#define  PCI_CAP_ID_SLOTID	0x04	/* Slot Identification */
-#define  PCI_CAP_ID_MSI		0x05	/* Message Signalled Interrupts */
-#define  PCI_CAP_ID_CHSWP	0x06	/* CompactPCI HotSwap */
-#define  PCI_CAP_ID_PCIX	0x07	/* PCI-X */
-#define  PCI_CAP_ID_HT		0x08	/* HyperTransport */
-#define  PCI_CAP_ID_VNDR	0x09	/* Vendor specific */
-#define  PCI_CAP_ID_DBG		0x0A	/* Debug port */
-#define  PCI_CAP_ID_CCRC	0x0B	/* CompactPCI Central Resource Control */
-#define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
-#define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
-#define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
-#define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
-#define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
-#define  PCI_CAP_ID_AF		0x13	/* PCI Advanced Features */
-#define PCI_CAP_LIST_NEXT	1	/* Next capability in the list */
-#define PCI_CAP_FLAGS		2	/* Capability defined flags (16 bits) */
-#define PCI_CAP_SIZEOF		4
-
-/* Power Management Registers */
-
-#define PCI_PM_PMC		2	/* PM Capabilities Register */
-#define  PCI_PM_CAP_VER_MASK	0x0007	/* Version */
-#define  PCI_PM_CAP_PME_CLOCK	0x0008	/* PME clock required */
-#define  PCI_PM_CAP_RESERVED    0x0010  /* Reserved field */
-#define  PCI_PM_CAP_DSI		0x0020	/* Device specific initialization */
-#define  PCI_PM_CAP_AUX_POWER	0x01C0	/* Auxilliary power support mask */
-#define  PCI_PM_CAP_D1		0x0200	/* D1 power state support */
-#define  PCI_PM_CAP_D2		0x0400	/* D2 power state support */
-#define  PCI_PM_CAP_PME		0x0800	/* PME pin supported */
-#define  PCI_PM_CAP_PME_MASK	0xF800	/* PME Mask of all supported states */
-#define  PCI_PM_CAP_PME_D0	0x0800	/* PME# from D0 */
-#define  PCI_PM_CAP_PME_D1	0x1000	/* PME# from D1 */
-#define  PCI_PM_CAP_PME_D2	0x2000	/* PME# from D2 */
-#define  PCI_PM_CAP_PME_D3	0x4000	/* PME# from D3 (hot) */
-#define  PCI_PM_CAP_PME_D3cold	0x8000	/* PME# from D3 (cold) */
-#define  PCI_PM_CAP_PME_SHIFT	11	/* Start of the PME Mask in PMC */
-#define PCI_PM_CTRL		4	/* PM control and status register */
-#define  PCI_PM_CTRL_STATE_MASK	0x0003	/* Current power state (D0 to D3) */
-#define  PCI_PM_CTRL_NO_SOFT_RESET	0x0008	/* No reset for D3hot->D0 */
-#define  PCI_PM_CTRL_PME_ENABLE	0x0100	/* PME pin enable */
-#define  PCI_PM_CTRL_DATA_SEL_MASK	0x1e00	/* Data select (??) */
-#define  PCI_PM_CTRL_DATA_SCALE_MASK	0x6000	/* Data scale (??) */
-#define  PCI_PM_CTRL_PME_STATUS	0x8000	/* PME pin status */
-#define PCI_PM_PPB_EXTENSIONS	6	/* PPB support extensions (??) */
-#define  PCI_PM_PPB_B2_B3	0x40	/* Stop clock when in D3hot (??) */
-#define  PCI_PM_BPCC_ENABLE	0x80	/* Bus power/clock control enable (??) */
-#define PCI_PM_DATA_REGISTER	7	/* (??) */
-#define PCI_PM_SIZEOF		8
-
-/* AGP registers */
-
-#define PCI_AGP_VERSION		2	/* BCD version number */
-#define PCI_AGP_RFU		3	/* Rest of capability flags */
-#define PCI_AGP_STATUS		4	/* Status register */
-#define  PCI_AGP_STATUS_RQ_MASK	0xff000000	/* Maximum number of requests - 1 */
-#define  PCI_AGP_STATUS_SBA	0x0200	/* Sideband addressing supported */
-#define  PCI_AGP_STATUS_64BIT	0x0020	/* 64-bit addressing supported */
-#define  PCI_AGP_STATUS_FW	0x0010	/* FW transfers supported */
-#define  PCI_AGP_STATUS_RATE4	0x0004	/* 4x transfer rate supported */
-#define  PCI_AGP_STATUS_RATE2	0x0002	/* 2x transfer rate supported */
-#define  PCI_AGP_STATUS_RATE1	0x0001	/* 1x transfer rate supported */
-#define PCI_AGP_COMMAND		8	/* Control register */
-#define  PCI_AGP_COMMAND_RQ_MASK 0xff000000  /* Master: Maximum number of requests */
-#define  PCI_AGP_COMMAND_SBA	0x0200	/* Sideband addressing enabled */
-#define  PCI_AGP_COMMAND_AGP	0x0100	/* Allow processing of AGP transactions */
-#define  PCI_AGP_COMMAND_64BIT	0x0020 	/* Allow processing of 64-bit addresses */
-#define  PCI_AGP_COMMAND_FW	0x0010 	/* Force FW transfers */
-#define  PCI_AGP_COMMAND_RATE4	0x0004	/* Use 4x rate */
-#define  PCI_AGP_COMMAND_RATE2	0x0002	/* Use 2x rate */
-#define  PCI_AGP_COMMAND_RATE1	0x0001	/* Use 1x rate */
-#define PCI_AGP_SIZEOF		12
-
-/* Vital Product Data */
-
-#define PCI_VPD_ADDR		2	/* Address to access (15 bits!) */
-#define  PCI_VPD_ADDR_MASK	0x7fff	/* Address mask */
-#define  PCI_VPD_ADDR_F		0x8000	/* Write 0, 1 indicates completion */
-#define PCI_VPD_DATA		4	/* 32-bits of data returned here */
-
-/* Slot Identification */
-
-#define PCI_SID_ESR		2	/* Expansion Slot Register */
-#define  PCI_SID_ESR_NSLOTS	0x1f	/* Number of expansion slots available */
-#define  PCI_SID_ESR_FIC	0x20	/* First In Chassis Flag */
-#define PCI_SID_CHASSIS_NR	3	/* Chassis Number */
-
-/* Message Signalled Interrupts registers */
-
-#define PCI_MSI_FLAGS		2	/* Various flags */
-#define  PCI_MSI_FLAGS_64BIT	0x80	/* 64-bit addresses allowed */
-#define  PCI_MSI_FLAGS_QSIZE	0x70	/* Message queue size configured */
-#define  PCI_MSI_FLAGS_QMASK	0x0e	/* Maximum queue size available */
-#define  PCI_MSI_FLAGS_ENABLE	0x01	/* MSI feature enabled */
-#define  PCI_MSI_FLAGS_MASKBIT	0x100	/* 64-bit mask bits allowed */
-#define PCI_MSI_RFU		3	/* Rest of capability flags */
-#define PCI_MSI_ADDRESS_LO	4	/* Lower 32 bits */
-#define PCI_MSI_ADDRESS_HI	8	/* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
-#define PCI_MSI_DATA_32		8	/* 16 bits of data for 32-bit devices */
-#define PCI_MSI_MASK_32		12	/* Mask bits register for 32-bit devices */
-#define PCI_MSI_DATA_64		12	/* 16 bits of data for 64-bit devices */
-#define PCI_MSI_MASK_64		16	/* Mask bits register for 64-bit devices */
-
-/* MSI-X registers (these are at offset PCI_MSIX_FLAGS) */
-#define PCI_MSIX_FLAGS		2
-#define  PCI_MSIX_FLAGS_QSIZE	0x7FF
-#define  PCI_MSIX_FLAGS_ENABLE	(1 << 15)
-#define  PCI_MSIX_FLAGS_MASKALL	(1 << 14)
-#define PCI_MSIX_FLAGS_BIRMASK	(7 << 0)
-
-/* CompactPCI Hotswap Register */
-
-#define PCI_CHSWP_CSR		2	/* Control and Status Register */
-#define  PCI_CHSWP_DHA		0x01	/* Device Hiding Arm */
-#define  PCI_CHSWP_EIM		0x02	/* ENUM# Signal Mask */
-#define  PCI_CHSWP_PIE		0x04	/* Pending Insert or Extract */
-#define  PCI_CHSWP_LOO		0x08	/* LED On / Off */
-#define  PCI_CHSWP_PI		0x30	/* Programming Interface */
-#define  PCI_CHSWP_EXT		0x40	/* ENUM# status - extraction */
-#define  PCI_CHSWP_INS		0x80	/* ENUM# status - insertion */
-
-/* PCI Advanced Feature registers */
-
-#define PCI_AF_LENGTH		2
-#define PCI_AF_CAP		3
-#define  PCI_AF_CAP_TP		0x01
-#define  PCI_AF_CAP_FLR		0x02
-#define PCI_AF_CTRL		4
-#define  PCI_AF_CTRL_FLR	0x01
-#define PCI_AF_STATUS		5
-#define  PCI_AF_STATUS_TP	0x01
-
-/* PCI-X registers */
-
-#define PCI_X_CMD		2	/* Modes & Features */
-#define  PCI_X_CMD_DPERR_E	0x0001	/* Data Parity Error Recovery Enable */
-#define  PCI_X_CMD_ERO		0x0002	/* Enable Relaxed Ordering */
-#define  PCI_X_CMD_READ_512	0x0000	/* 512 byte maximum read byte count */
-#define  PCI_X_CMD_READ_1K	0x0004	/* 1Kbyte maximum read byte count */
-#define  PCI_X_CMD_READ_2K	0x0008	/* 2Kbyte maximum read byte count */
-#define  PCI_X_CMD_READ_4K	0x000c	/* 4Kbyte maximum read byte count */
-#define  PCI_X_CMD_MAX_READ	0x000c	/* Max Memory Read Byte Count */
-				/* Max # of outstanding split transactions */
-#define  PCI_X_CMD_SPLIT_1	0x0000	/* Max 1 */
-#define  PCI_X_CMD_SPLIT_2	0x0010	/* Max 2 */
-#define  PCI_X_CMD_SPLIT_3	0x0020	/* Max 3 */
-#define  PCI_X_CMD_SPLIT_4	0x0030	/* Max 4 */
-#define  PCI_X_CMD_SPLIT_8	0x0040	/* Max 8 */
-#define  PCI_X_CMD_SPLIT_12	0x0050	/* Max 12 */
-#define  PCI_X_CMD_SPLIT_16	0x0060	/* Max 16 */
-#define  PCI_X_CMD_SPLIT_32	0x0070	/* Max 32 */
-#define  PCI_X_CMD_MAX_SPLIT	0x0070	/* Max Outstanding Split Transactions */
-#define  PCI_X_CMD_VERSION(x) 	(((x) >> 12) & 3) /* Version */
-#define PCI_X_STATUS		4	/* PCI-X capabilities */
-#define  PCI_X_STATUS_DEVFN	0x000000ff	/* A copy of devfn */
-#define  PCI_X_STATUS_BUS	0x0000ff00	/* A copy of bus nr */
-#define  PCI_X_STATUS_64BIT	0x00010000	/* 64-bit device */
-#define  PCI_X_STATUS_133MHZ	0x00020000	/* 133 MHz capable */
-#define  PCI_X_STATUS_SPL_DISC	0x00040000	/* Split Completion Discarded */
-#define  PCI_X_STATUS_UNX_SPL	0x00080000	/* Unexpected Split Completion */
-#define  PCI_X_STATUS_COMPLEX	0x00100000	/* Device Complexity */
-#define  PCI_X_STATUS_MAX_READ	0x00600000	/* Designed Max Memory Read Count */
-#define  PCI_X_STATUS_MAX_SPLIT	0x03800000	/* Designed Max Outstanding Split Transactions */
-#define  PCI_X_STATUS_MAX_CUM	0x1c000000	/* Designed Max Cumulative Read Size */
-#define  PCI_X_STATUS_SPL_ERR	0x20000000	/* Rcvd Split Completion Error Msg */
-#define  PCI_X_STATUS_266MHZ	0x40000000	/* 266 MHz capable */
-#define  PCI_X_STATUS_533MHZ	0x80000000	/* 533 MHz capable */
-
-/* PCI Express capability registers */
-
-#define PCI_EXP_FLAGS		2	/* Capabilities register */
-#define PCI_EXP_FLAGS_VERS	0x000f	/* Capability version */
-#define PCI_EXP_FLAGS_TYPE	0x00f0	/* Device/Port type */
-#define  PCI_EXP_TYPE_ENDPOINT	0x0	/* Express Endpoint */
-#define  PCI_EXP_TYPE_LEG_END	0x1	/* Legacy Endpoint */
-#define  PCI_EXP_TYPE_ROOT_PORT 0x4	/* Root Port */
-#define  PCI_EXP_TYPE_UPSTREAM	0x5	/* Upstream Port */
-#define  PCI_EXP_TYPE_DOWNSTREAM 0x6	/* Downstream Port */
-#define  PCI_EXP_TYPE_PCI_BRIDGE 0x7	/* PCI/PCI-X Bridge */
-#define  PCI_EXP_TYPE_RC_END	0x9	/* Root Complex Integrated Endpoint */
-#define  PCI_EXP_TYPE_RC_EC	0x10	/* Root Complex Event Collector */
-#define PCI_EXP_FLAGS_SLOT	0x0100	/* Slot implemented */
-#define PCI_EXP_FLAGS_IRQ	0x3e00	/* Interrupt message number */
-#define PCI_EXP_DEVCAP		4	/* Device capabilities */
-#define  PCI_EXP_DEVCAP_PAYLOAD	0x07	/* Max_Payload_Size */
-#define  PCI_EXP_DEVCAP_PHANTOM	0x18	/* Phantom functions */
-#define  PCI_EXP_DEVCAP_EXT_TAG	0x20	/* Extended tags */
-#define  PCI_EXP_DEVCAP_L0S	0x1c0	/* L0s Acceptable Latency */
-#define  PCI_EXP_DEVCAP_L1	0xe00	/* L1 Acceptable Latency */
-#define  PCI_EXP_DEVCAP_ATN_BUT	0x1000	/* Attention Button Present */
-#define  PCI_EXP_DEVCAP_ATN_IND	0x2000	/* Attention Indicator Present */
-#define  PCI_EXP_DEVCAP_PWR_IND	0x4000	/* Power Indicator Present */
-#define  PCI_EXP_DEVCAP_RBER	0x8000	/* Role-Based Error Reporting */
-#define  PCI_EXP_DEVCAP_PWR_VAL	0x3fc0000 /* Slot Power Limit Value */
-#define  PCI_EXP_DEVCAP_PWR_SCL	0xc000000 /* Slot Power Limit Scale */
-#define  PCI_EXP_DEVCAP_FLR     0x10000000 /* Function Level Reset */
-#define PCI_EXP_DEVCTL		8	/* Device Control */
-#define  PCI_EXP_DEVCTL_CERE	0x0001	/* Correctable Error Reporting En. */
-#define  PCI_EXP_DEVCTL_NFERE	0x0002	/* Non-Fatal Error Reporting Enable */
-#define  PCI_EXP_DEVCTL_FERE	0x0004	/* Fatal Error Reporting Enable */
-#define  PCI_EXP_DEVCTL_URRE	0x0008	/* Unsupported Request Reporting En. */
-#define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
-#define  PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size */
-#define  PCI_EXP_DEVCTL_EXT_TAG	0x0100	/* Extended Tag Field Enable */
-#define  PCI_EXP_DEVCTL_PHANTOM	0x0200	/* Phantom Functions Enable */
-#define  PCI_EXP_DEVCTL_AUX_PME	0x0400	/* Auxiliary Power PM Enable */
-#define  PCI_EXP_DEVCTL_NOSNOOP_EN 0x0800  /* Enable No Snoop */
-#define  PCI_EXP_DEVCTL_READRQ	0x7000	/* Max_Read_Request_Size */
-#define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR */
-#define PCI_EXP_DEVSTA		10	/* Device Status */
-#define  PCI_EXP_DEVSTA_CED	0x01	/* Correctable Error Detected */
-#define  PCI_EXP_DEVSTA_NFED	0x02	/* Non-Fatal Error Detected */
-#define  PCI_EXP_DEVSTA_FED	0x04	/* Fatal Error Detected */
-#define  PCI_EXP_DEVSTA_URD	0x08	/* Unsupported Request Detected */
-#define  PCI_EXP_DEVSTA_AUXPD	0x10	/* AUX Power Detected */
-#define  PCI_EXP_DEVSTA_TRPND	0x20	/* Transactions Pending */
-#define PCI_EXP_LNKCAP		12	/* Link Capabilities */
-#define  PCI_EXP_LNKCAP_SLS	0x0000000f /* Supported Link Speeds */
-#define  PCI_EXP_LNKCAP_MLW	0x000003f0 /* Maximum Link Width */
-#define  PCI_EXP_LNKCAP_ASPMS	0x00000c00 /* ASPM Support */
-#define  PCI_EXP_LNKCAP_L0SEL	0x00007000 /* L0s Exit Latency */
-#define  PCI_EXP_LNKCAP_L1EL	0x00038000 /* L1 Exit Latency */
-#define  PCI_EXP_LNKCAP_CLKPM	0x00040000 /* L1 Clock Power Management */
-#define  PCI_EXP_LNKCAP_SDERC	0x00080000 /* Suprise Down Error Reporting Capable */
-#define  PCI_EXP_LNKCAP_DLLLARC	0x00100000 /* Data Link Layer Link Active Reporting Capable */
-#define  PCI_EXP_LNKCAP_LBNC	0x00200000 /* Link Bandwidth Notification Capability */
-#define  PCI_EXP_LNKCAP_PN	0xff000000 /* Port Number */
-#define PCI_EXP_LNKCTL		16	/* Link Control */
-#define  PCI_EXP_LNKCTL_ASPMC	0x0003	/* ASPM Control */
-#define  PCI_EXP_LNKCTL_RCB	0x0008	/* Read Completion Boundary */
-#define  PCI_EXP_LNKCTL_LD	0x0010	/* Link Disable */
-#define  PCI_EXP_LNKCTL_RL	0x0020	/* Retrain Link */
-#define  PCI_EXP_LNKCTL_CCC	0x0040	/* Common Clock Configuration */
-#define  PCI_EXP_LNKCTL_ES	0x0080	/* Extended Synch */
-#define  PCI_EXP_LNKCTL_CLKREQ_EN 0x100	/* Enable clkreq */
-#define  PCI_EXP_LNKCTL_HAWD	0x0200	/* Hardware Autonomous Width Disable */
-#define  PCI_EXP_LNKCTL_LBMIE	0x0400	/* Link Bandwidth Management Interrupt Enable */
-#define  PCI_EXP_LNKCTL_LABIE	0x0800	/* Lnk Autonomous Bandwidth Interrupt Enable */
-#define PCI_EXP_LNKSTA		18	/* Link Status */
-#define  PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
-#define  PCI_EXP_LNKSTA_NLW	0x03f0	/* Nogotiated Link Width */
-#define  PCI_EXP_LNKSTA_LT	0x0800	/* Link Training */
-#define  PCI_EXP_LNKSTA_SLC	0x1000	/* Slot Clock Configuration */
-#define  PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
-#define  PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */
-#define  PCI_EXP_LNKSTA_LABS	0x8000	/* Link Autonomous Bandwidth Status */
-#define PCI_EXP_SLTCAP		20	/* Slot Capabilities */
-#define  PCI_EXP_SLTCAP_ABP	0x00000001 /* Attention Button Present */
-#define  PCI_EXP_SLTCAP_PCP	0x00000002 /* Power Controller Present */
-#define  PCI_EXP_SLTCAP_MRLSP	0x00000004 /* MRL Sensor Present */
-#define  PCI_EXP_SLTCAP_AIP	0x00000008 /* Attention Indicator Present */
-#define  PCI_EXP_SLTCAP_PIP	0x00000010 /* Power Indicator Present */
-#define  PCI_EXP_SLTCAP_HPS	0x00000020 /* Hot-Plug Surprise */
-#define  PCI_EXP_SLTCAP_HPC	0x00000040 /* Hot-Plug Capable */
-#define  PCI_EXP_SLTCAP_SPLV	0x00007f80 /* Slot Power Limit Value */
-#define  PCI_EXP_SLTCAP_SPLS	0x00018000 /* Slot Power Limit Scale */
-#define  PCI_EXP_SLTCAP_EIP	0x00020000 /* Electromechanical Interlock Present */
-#define  PCI_EXP_SLTCAP_NCCS	0x00040000 /* No Command Completed Support */
-#define  PCI_EXP_SLTCAP_PSN	0xfff80000 /* Physical Slot Number */
-#define PCI_EXP_SLTCTL		24	/* Slot Control */
-#define  PCI_EXP_SLTCTL_ABPE	0x0001	/* Attention Button Pressed Enable */
-#define  PCI_EXP_SLTCTL_PFDE	0x0002	/* Power Fault Detected Enable */
-#define  PCI_EXP_SLTCTL_MRLSCE	0x0004	/* MRL Sensor Changed Enable */
-#define  PCI_EXP_SLTCTL_PDCE	0x0008	/* Presence Detect Changed Enable */
-#define  PCI_EXP_SLTCTL_CCIE	0x0010	/* Command Completed Interrupt Enable */
-#define  PCI_EXP_SLTCTL_HPIE	0x0020	/* Hot-Plug Interrupt Enable */
-#define  PCI_EXP_SLTCTL_AIC	0x00c0	/* Attention Indicator Control */
-#define  PCI_EXP_SLTCTL_PIC	0x0300	/* Power Indicator Control */
-#define  PCI_EXP_SLTCTL_PCC	0x0400	/* Power Controller Control */
-#define  PCI_EXP_SLTCTL_EIC	0x0800	/* Electromechanical Interlock Control */
-#define  PCI_EXP_SLTCTL_DLLSCE	0x1000	/* Data Link Layer State Changed Enable */
-#define PCI_EXP_SLTSTA		26	/* Slot Status */
-#define  PCI_EXP_SLTSTA_ABP	0x0001	/* Attention Button Pressed */
-#define  PCI_EXP_SLTSTA_PFD	0x0002	/* Power Fault Detected */
-#define  PCI_EXP_SLTSTA_MRLSC	0x0004	/* MRL Sensor Changed */
-#define  PCI_EXP_SLTSTA_PDC	0x0008	/* Presence Detect Changed */
-#define  PCI_EXP_SLTSTA_CC	0x0010	/* Command Completed */
-#define  PCI_EXP_SLTSTA_MRLSS	0x0020	/* MRL Sensor State */
-#define  PCI_EXP_SLTSTA_PDS	0x0040	/* Presence Detect State */
-#define  PCI_EXP_SLTSTA_EIS	0x0080	/* Electromechanical Interlock Status */
-#define  PCI_EXP_SLTSTA_DLLSC	0x0100	/* Data Link Layer State Changed */
-#define PCI_EXP_RTCTL		28	/* Root Control */
-#define  PCI_EXP_RTCTL_SECEE	0x01	/* System Error on Correctable Error */
-#define  PCI_EXP_RTCTL_SENFEE	0x02	/* System Error on Non-Fatal Error */
-#define  PCI_EXP_RTCTL_SEFEE	0x04	/* System Error on Fatal Error */
-#define  PCI_EXP_RTCTL_PMEIE	0x08	/* PME Interrupt Enable */
-#define  PCI_EXP_RTCTL_CRSSVE	0x10	/* CRS Software Visibility Enable */
-#define PCI_EXP_RTCAP		30	/* Root Capabilities */
-#define PCI_EXP_RTSTA		32	/* Root Status */
-#define PCI_EXP_DEVCAP2		36	/* Device Capabilities 2 */
-#define  PCI_EXP_DEVCAP2_ARI	0x20	/* Alternative Routing-ID */
-#define PCI_EXP_DEVCTL2		40	/* Device Control 2 */
-#define  PCI_EXP_DEVCTL2_ARI	0x20	/* Alternative Routing-ID */
-#define PCI_EXP_LNKCTL2		48	/* Link Control 2 */
-#define PCI_EXP_SLTCTL2		56	/* Slot Control 2 */
-
-/* Extended Capabilities (PCI-X 2.0 and Express) */
-#define PCI_EXT_CAP_ID(header)		(header & 0x0000ffff)
-#define PCI_EXT_CAP_VER(header)		((header >> 16) & 0xf)
-#define PCI_EXT_CAP_NEXT(header)	((header >> 20) & 0xffc)
-
-#define PCI_EXT_CAP_ID_ERR	1
-#define PCI_EXT_CAP_ID_VC	2
-#define PCI_EXT_CAP_ID_DSN	3
-#define PCI_EXT_CAP_ID_PWR	4
-#define PCI_EXT_CAP_ID_ARI	14
-#define PCI_EXT_CAP_ID_ATS	15
-#define PCI_EXT_CAP_ID_SRIOV	16
-
-/* Advanced Error Reporting */
-#define PCI_ERR_UNCOR_STATUS	4	/* Uncorrectable Error Status */
-#define  PCI_ERR_UNC_TRAIN	0x00000001	/* Training */
-#define  PCI_ERR_UNC_DLP	0x00000010	/* Data Link Protocol */
-#define  PCI_ERR_UNC_POISON_TLP	0x00001000	/* Poisoned TLP */
-#define  PCI_ERR_UNC_FCP	0x00002000	/* Flow Control Protocol */
-#define  PCI_ERR_UNC_COMP_TIME	0x00004000	/* Completion Timeout */
-#define  PCI_ERR_UNC_COMP_ABORT	0x00008000	/* Completer Abort */
-#define  PCI_ERR_UNC_UNX_COMP	0x00010000	/* Unexpected Completion */
-#define  PCI_ERR_UNC_RX_OVER	0x00020000	/* Receiver Overflow */
-#define  PCI_ERR_UNC_MALF_TLP	0x00040000	/* Malformed TLP */
-#define  PCI_ERR_UNC_ECRC	0x00080000	/* ECRC Error Status */
-#define  PCI_ERR_UNC_UNSUP	0x00100000	/* Unsupported Request */
-#define PCI_ERR_UNCOR_MASK	8	/* Uncorrectable Error Mask */
-	/* Same bits as above */
-#define PCI_ERR_UNCOR_SEVER	12	/* Uncorrectable Error Severity */
-	/* Same bits as above */
-#define PCI_ERR_COR_STATUS	16	/* Correctable Error Status */
-#define  PCI_ERR_COR_RCVR	0x00000001	/* Receiver Error Status */
-#define  PCI_ERR_COR_BAD_TLP	0x00000040	/* Bad TLP Status */
-#define  PCI_ERR_COR_BAD_DLLP	0x00000080	/* Bad DLLP Status */
-#define  PCI_ERR_COR_REP_ROLL	0x00000100	/* REPLAY_NUM Rollover */
-#define  PCI_ERR_COR_REP_TIMER	0x00001000	/* Replay Timer Timeout */
-#define PCI_ERR_COR_MASK	20	/* Correctable Error Mask */
-	/* Same bits as above */
-#define PCI_ERR_CAP		24	/* Advanced Error Capabilities */
-#define  PCI_ERR_CAP_FEP(x)	((x) & 31)	/* First Error Pointer */
-#define  PCI_ERR_CAP_ECRC_GENC	0x00000020	/* ECRC Generation Capable */
-#define  PCI_ERR_CAP_ECRC_GENE	0x00000040	/* ECRC Generation Enable */
-#define  PCI_ERR_CAP_ECRC_CHKC	0x00000080	/* ECRC Check Capable */
-#define  PCI_ERR_CAP_ECRC_CHKE	0x00000100	/* ECRC Check Enable */
-#define PCI_ERR_HEADER_LOG	28	/* Header Log Register (16 bytes) */
-#define PCI_ERR_ROOT_COMMAND	44	/* Root Error Command */
-/* Correctable Err Reporting Enable */
-#define PCI_ERR_ROOT_CMD_COR_EN		0x00000001
-/* Non-fatal Err Reporting Enable */
-#define PCI_ERR_ROOT_CMD_NONFATAL_EN	0x00000002
-/* Fatal Err Reporting Enable */
-#define PCI_ERR_ROOT_CMD_FATAL_EN	0x00000004
-#define PCI_ERR_ROOT_STATUS	48
-#define PCI_ERR_ROOT_COR_RCV		0x00000001	/* ERR_COR Received */
-/* Multi ERR_COR Received */
-#define PCI_ERR_ROOT_MULTI_COR_RCV	0x00000002
-/* ERR_FATAL/NONFATAL Recevied */
-#define PCI_ERR_ROOT_UNCOR_RCV		0x00000004
-/* Multi ERR_FATAL/NONFATAL Recevied */
-#define PCI_ERR_ROOT_MULTI_UNCOR_RCV	0x00000008
-#define PCI_ERR_ROOT_FIRST_FATAL	0x00000010	/* First Fatal */
-#define PCI_ERR_ROOT_NONFATAL_RCV	0x00000020	/* Non-Fatal Received */
-#define PCI_ERR_ROOT_FATAL_RCV		0x00000040	/* Fatal Received */
-#define PCI_ERR_ROOT_COR_SRC	52
-#define PCI_ERR_ROOT_SRC	54
-
-/* Virtual Channel */
-#define PCI_VC_PORT_REG1	4
-#define PCI_VC_PORT_REG2	8
-#define PCI_VC_PORT_CTRL	12
-#define PCI_VC_PORT_STATUS	14
-#define PCI_VC_RES_CAP		16
-#define PCI_VC_RES_CTRL		20
-#define PCI_VC_RES_STATUS	26
-
-/* Power Budgeting */
-#define PCI_PWR_DSR		4	/* Data Select Register */
-#define PCI_PWR_DATA		8	/* Data Register */
-#define  PCI_PWR_DATA_BASE(x)	((x) & 0xff)	    /* Base Power */
-#define  PCI_PWR_DATA_SCALE(x)	(((x) >> 8) & 3)    /* Data Scale */
-#define  PCI_PWR_DATA_PM_SUB(x)	(((x) >> 10) & 7)   /* PM Sub State */
-#define  PCI_PWR_DATA_PM_STATE(x) (((x) >> 13) & 3) /* PM State */
-#define  PCI_PWR_DATA_TYPE(x)	(((x) >> 15) & 7)   /* Type */
-#define  PCI_PWR_DATA_RAIL(x)	(((x) >> 18) & 7)   /* Power Rail */
-#define PCI_PWR_CAP		12	/* Capability */
-#define  PCI_PWR_CAP_BUDGET(x)	((x) & 1)	/* Included in system budget */
-
-/*
- * Hypertransport sub capability types
- *
- * Unfortunately there are both 3 bit and 5 bit capability types defined
- * in the HT spec, catering for that is a little messy. You probably don't
- * want to use these directly, just use pci_find_ht_capability() and it
- * will do the right thing for you.
- */
-#define HT_3BIT_CAP_MASK	0xE0
-#define HT_CAPTYPE_SLAVE	0x00	/* Slave/Primary link configuration */
-#define HT_CAPTYPE_HOST		0x20	/* Host/Secondary link configuration */
-
-#define HT_5BIT_CAP_MASK	0xF8
-#define HT_CAPTYPE_IRQ		0x80	/* IRQ Configuration */
-#define HT_CAPTYPE_REMAPPING_40	0xA0	/* 40 bit address remapping */
-#define HT_CAPTYPE_REMAPPING_64 0xA2	/* 64 bit address remapping */
-#define HT_CAPTYPE_UNITID_CLUMP	0x90	/* Unit ID clumping */
-#define HT_CAPTYPE_EXTCONF	0x98	/* Extended Configuration Space Access */
-#define HT_CAPTYPE_MSI_MAPPING	0xA8	/* MSI Mapping Capability */
-#define  HT_MSI_FLAGS		0x02		/* Offset to flags */
-#define  HT_MSI_FLAGS_ENABLE	0x1		/* Mapping enable */
-#define  HT_MSI_FLAGS_FIXED	0x2		/* Fixed mapping only */
-#define  HT_MSI_FIXED_ADDR	0x00000000FEE00000ULL	/* Fixed addr */
-#define  HT_MSI_ADDR_LO		0x04		/* Offset to low addr bits */
-#define  HT_MSI_ADDR_LO_MASK	0xFFF00000	/* Low address bit mask */
-#define  HT_MSI_ADDR_HI		0x08		/* Offset to high addr bits */
-#define HT_CAPTYPE_DIRECT_ROUTE	0xB0	/* Direct routing configuration */
-#define HT_CAPTYPE_VCSET	0xB8	/* Virtual Channel configuration */
-#define HT_CAPTYPE_ERROR_RETRY	0xC0	/* Retry on error configuration */
-#define HT_CAPTYPE_GEN3		0xD0	/* Generation 3 hypertransport configuration */
-#define HT_CAPTYPE_PM		0xE0	/* Hypertransport powermanagement configuration */
-
-/* Alternative Routing-ID Interpretation */
-#define PCI_ARI_CAP		0x04	/* ARI Capability Register */
-#define  PCI_ARI_CAP_MFVC	0x0001	/* MFVC Function Groups Capability */
-#define  PCI_ARI_CAP_ACS	0x0002	/* ACS Function Groups Capability */
-#define  PCI_ARI_CAP_NFN(x)	(((x) >> 8) & 0xff) /* Next Function Number */
-#define PCI_ARI_CTRL		0x06	/* ARI Control Register */
-#define  PCI_ARI_CTRL_MFVC	0x0001	/* MFVC Function Groups Enable */
-#define  PCI_ARI_CTRL_ACS	0x0002	/* ACS Function Groups Enable */
-#define  PCI_ARI_CTRL_FG(x)	(((x) >> 4) & 7) /* Function Group */
-
-/* Address Translation Service */
-#define PCI_ATS_CAP		0x04	/* ATS Capability Register */
-#define  PCI_ATS_CAP_QDEP(x)	((x) & 0x1f)	/* Invalidate Queue Depth */
-#define  PCI_ATS_MAX_QDEP	32	/* Max Invalidate Queue Depth */
-#define PCI_ATS_CTRL		0x06	/* ATS Control Register */
-#define  PCI_ATS_CTRL_ENABLE	0x8000	/* ATS Enable */
-#define  PCI_ATS_CTRL_STU(x)	((x) & 0x1f)	/* Smallest Translation Unit */
-#define  PCI_ATS_MIN_STU	12	/* shift of minimum STU block */
-
-/* Single Root I/O Virtualization */
-#define PCI_SRIOV_CAP		0x04	/* SR-IOV Capabilities */
-#define  PCI_SRIOV_CAP_VFM	0x01	/* VF Migration Capable */
-#define  PCI_SRIOV_CAP_INTR(x)	((x) >> 21) /* Interrupt Message Number */
-#define PCI_SRIOV_CTRL		0x08	/* SR-IOV Control */
-#define  PCI_SRIOV_CTRL_VFE	0x01	/* VF Enable */
-#define  PCI_SRIOV_CTRL_VFM	0x02	/* VF Migration Enable */
-#define  PCI_SRIOV_CTRL_INTR	0x04	/* VF Migration Interrupt Enable */
-#define  PCI_SRIOV_CTRL_MSE	0x08	/* VF Memory Space Enable */
-#define  PCI_SRIOV_CTRL_ARI	0x10	/* ARI Capable Hierarchy */
-#define PCI_SRIOV_STATUS	0x0a	/* SR-IOV Status */
-#define  PCI_SRIOV_STATUS_VFM	0x01	/* VF Migration Status */
-#define PCI_SRIOV_INITIAL_VF	0x0c	/* Initial VFs */
-#define PCI_SRIOV_TOTAL_VF	0x0e	/* Total VFs */
-#define PCI_SRIOV_NUM_VF	0x10	/* Number of VFs */
-#define PCI_SRIOV_FUNC_LINK	0x12	/* Function Dependency Link */
-#define PCI_SRIOV_VF_OFFSET	0x14	/* First VF Offset */
-#define PCI_SRIOV_VF_STRIDE	0x16	/* Following VF Stride */
-#define PCI_SRIOV_VF_DID	0x1a	/* VF Device ID */
-#define PCI_SRIOV_SUP_PGSIZE	0x1c	/* Supported Page Sizes */
-#define PCI_SRIOV_SYS_PGSIZE	0x20	/* System Page Size */
-#define PCI_SRIOV_BAR		0x24	/* VF BAR0 */
-#define  PCI_SRIOV_NUM_BARS	6	/* Number of VF BARs */
-#define PCI_SRIOV_VFM		0x3c	/* VF Migration State Array Offset*/
-#define  PCI_SRIOV_VFM_BIR(x)	((x) & 7)	/* State BIR */
-#define  PCI_SRIOV_VFM_OFFSET(x) ((x) & ~7)	/* State Offset */
-#define  PCI_SRIOV_VFM_UA	0x0	/* Inactive.Unavailable */
-#define  PCI_SRIOV_VFM_MI	0x1	/* Dormant.MigrateIn */
-#define  PCI_SRIOV_VFM_MO	0x2	/* Active.MigrateOut */
-#define  PCI_SRIOV_VFM_AV	0x3	/* Active.Available */
-
-#endif /* LINUX_PCI_REGS_H */
+/*
+ *      pci_regs.h
+ *
+ *      PCI standard defines
+ *      Copyright 1994, Drew Eckhardt
+ *      Copyright 1997--1999 Martin Mares <mj@ucw.cz>
+ *
+ *      For more information, please consult the following manuals (look at
+ *      http://www.pcisig.com/ for how to get them):
+ *
+ *      PCI BIOS Specification
+ *      PCI Local Bus Specification
+ *      PCI to PCI Bridge Specification
+ *      PCI System Design Guide
+ *
+ *      For hypertransport information, please consult the following manuals
+ *      from http://www.hypertransport.org
+ *
+ *      The Hypertransport I/O Link Specification
+ */
+
+#ifndef LINUX_PCI_REGS_H
+#define LINUX_PCI_REGS_H
+
+/*
+ * Under PCI, each device has 256 bytes of configuration address space,
+ * of which the first 64 bytes are standardized as follows:
+ */
+#define PCI_VENDOR_ID           0x00    /* 16 bits */
+#define PCI_DEVICE_ID           0x02    /* 16 bits */
+#define PCI_COMMAND             0x04    /* 16 bits */
+#define  PCI_COMMAND_IO         0x1     /* Enable response in I/O space */
+#define  PCI_COMMAND_MEMORY     0x2     /* Enable response in Memory space */
+#define  PCI_COMMAND_MASTER     0x4     /* Enable bus mastering */
+#define  PCI_COMMAND_SPECIAL    0x8     /* Enable response to special cycles */
+#define  PCI_COMMAND_INVALIDATE 0x10    /* Use memory write and invalidate */
+#define  PCI_COMMAND_VGA_PALETTE 0x20   /* Enable palette snooping */
+#define  PCI_COMMAND_PARITY     0x40    /* Enable parity checking */
+#define  PCI_COMMAND_WAIT       0x80    /* Enable address/data stepping */
+#define  PCI_COMMAND_SERR       0x100   /* Enable SERR */
+#define  PCI_COMMAND_FAST_BACK  0x200   /* Enable back-to-back writes */
+#define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
+
+#define PCI_STATUS              0x06    /* 16 bits */
+#define  PCI_STATUS_INTERRUPT   0x08    /* Interrupt status */
+#define  PCI_STATUS_CAP_LIST    0x10    /* Support Capability List */
+#define  PCI_STATUS_66MHZ       0x20    /* Support 66 MHz PCI 2.1 bus */
+#define  PCI_STATUS_UDF         0x40    /* Support User Definable Features [obsolete] */
+#define  PCI_STATUS_FAST_BACK   0x80    /* Accept fast-back to back */
+#define  PCI_STATUS_PARITY      0x100   /* Detected parity error */
+#define  PCI_STATUS_DEVSEL_MASK 0x600   /* DEVSEL timing */
+#define  PCI_STATUS_DEVSEL_FAST         0x000
+#define  PCI_STATUS_DEVSEL_MEDIUM       0x200
+#define  PCI_STATUS_DEVSEL_SLOW         0x400
+#define  PCI_STATUS_SIG_TARGET_ABORT    0x800 /* Set on target abort */
+#define  PCI_STATUS_REC_TARGET_ABORT    0x1000 /* Master ack of " */
+#define  PCI_STATUS_REC_MASTER_ABORT    0x2000 /* Set on master abort */
+#define  PCI_STATUS_SIG_SYSTEM_ERROR    0x4000 /* Set when we drive SERR */
+#define  PCI_STATUS_DETECTED_PARITY     0x8000 /* Set on parity error */
+
+#define PCI_CLASS_REVISION      0x08    /* High 24 bits are class, low 8 revision */
+#define PCI_REVISION_ID         0x08    /* Revision ID */
+#define PCI_CLASS_PROG          0x09    /* Reg. Level Programming Interface */
+#define PCI_CLASS_DEVICE        0x0a    /* Device class */
+
+#define PCI_CACHE_LINE_SIZE     0x0c    /* 8 bits */
+#define PCI_LATENCY_TIMER       0x0d    /* 8 bits */
+#define PCI_HEADER_TYPE         0x0e    /* 8 bits */
+#define  PCI_HEADER_TYPE_NORMAL         0
+#define  PCI_HEADER_TYPE_BRIDGE         1
+#define  PCI_HEADER_TYPE_CARDBUS        2
+
+#define PCI_BIST                0x0f    /* 8 bits */
+#define  PCI_BIST_CODE_MASK     0x0f    /* Return result */
+#define  PCI_BIST_START         0x40    /* 1 to start BIST, 2 secs or less */
+#define  PCI_BIST_CAPABLE       0x80    /* 1 if BIST capable */
+
+/*
+ * Base addresses specify locations in memory or I/O space.
+ * Decoded size can be determined by writing a value of
+ * 0xffffffff to the register, and reading it back.  Only
+ * 1 bits are decoded.
+ */
+#define PCI_BASE_ADDRESS_0      0x10    /* 32 bits */
+#define PCI_BASE_ADDRESS_1      0x14    /* 32 bits [htype 0,1 only] */
+#define PCI_BASE_ADDRESS_2      0x18    /* 32 bits [htype 0 only] */
+#define PCI_BASE_ADDRESS_3      0x1c    /* 32 bits */
+#define PCI_BASE_ADDRESS_4      0x20    /* 32 bits */
+#define PCI_BASE_ADDRESS_5      0x24    /* 32 bits */
+#define  PCI_BASE_ADDRESS_SPACE         0x01    /* 0 = memory, 1 = I/O */
+#define  PCI_BASE_ADDRESS_SPACE_IO      0x01
+#define  PCI_BASE_ADDRESS_SPACE_MEMORY  0x00
+#define  PCI_BASE_ADDRESS_MEM_TYPE_MASK 0x06
+#define  PCI_BASE_ADDRESS_MEM_TYPE_32   0x00    /* 32 bit address */
+#define  PCI_BASE_ADDRESS_MEM_TYPE_1M   0x02    /* Below 1M [obsolete] */
+#define  PCI_BASE_ADDRESS_MEM_TYPE_64   0x04    /* 64 bit address */
+#define  PCI_BASE_ADDRESS_MEM_PREFETCH  0x08    /* prefetchable? */
+#define  PCI_BASE_ADDRESS_MEM_MASK      (~0x0fUL)
+#define  PCI_BASE_ADDRESS_IO_MASK       (~0x03UL)
+/* bit 1 is reserved if address_space = 1 */
+
+/* Header type 0 (normal devices) */
+#define PCI_CARDBUS_CIS         0x28
+#define PCI_SUBSYSTEM_VENDOR_ID 0x2c
+#define PCI_SUBSYSTEM_ID        0x2e
+#define PCI_ROM_ADDRESS         0x30    /* Bits 31..11 are address, 10..1 reserved */
+#define  PCI_ROM_ADDRESS_ENABLE 0x01
+#define PCI_ROM_ADDRESS_MASK    (~0x7ffUL)
+
+#define PCI_CAPABILITY_LIST     0x34    /* Offset of first capability list entry */
+
+/* 0x35-0x3b are reserved */
+#define PCI_INTERRUPT_LINE      0x3c    /* 8 bits */
+#define PCI_INTERRUPT_PIN       0x3d    /* 8 bits */
+#define PCI_MIN_GNT             0x3e    /* 8 bits */
+#define PCI_MAX_LAT             0x3f    /* 8 bits */
+
+/* Header type 1 (PCI-to-PCI bridges) */
+#define PCI_PRIMARY_BUS         0x18    /* Primary bus number */
+#define PCI_SECONDARY_BUS       0x19    /* Secondary bus number */
+#define PCI_SUBORDINATE_BUS     0x1a    /* Highest bus number behind the bridge */
+#define PCI_SEC_LATENCY_TIMER   0x1b    /* Latency timer for secondary interface */
+#define PCI_IO_BASE             0x1c    /* I/O range behind the bridge */
+#define PCI_IO_LIMIT            0x1d
+#define  PCI_IO_RANGE_TYPE_MASK 0x0fUL  /* I/O bridging type */
+#define  PCI_IO_RANGE_TYPE_16   0x00
+#define  PCI_IO_RANGE_TYPE_32   0x01
+#define  PCI_IO_RANGE_MASK      (~0x0fUL)
+#define PCI_SEC_STATUS          0x1e    /* Secondary status register, only bit 14 used */
+#define PCI_MEMORY_BASE         0x20    /* Memory range behind */
+#define PCI_MEMORY_LIMIT        0x22
+#define  PCI_MEMORY_RANGE_TYPE_MASK 0x0fUL
+#define  PCI_MEMORY_RANGE_MASK  (~0x0fUL)
+#define PCI_PREF_MEMORY_BASE    0x24    /* Prefetchable memory range behind */
+#define PCI_PREF_MEMORY_LIMIT   0x26
+#define  PCI_PREF_RANGE_TYPE_MASK 0x0fUL
+#define  PCI_PREF_RANGE_TYPE_32 0x00
+#define  PCI_PREF_RANGE_TYPE_64 0x01
+#define  PCI_PREF_RANGE_MASK    (~0x0fUL)
+#define PCI_PREF_BASE_UPPER32   0x28    /* Upper half of prefetchable memory range */
+#define PCI_PREF_LIMIT_UPPER32  0x2c
+#define PCI_IO_BASE_UPPER16     0x30    /* Upper half of I/O addresses */
+#define PCI_IO_LIMIT_UPPER16    0x32
+/* 0x34 same as for htype 0 */
+/* 0x35-0x3b are reserved */
+#define PCI_ROM_ADDRESS1        0x38    /* Same as PCI_ROM_ADDRESS, but for htype 1 */
+/* 0x3c-0x3d are same as for htype 0 */
+#define PCI_BRIDGE_CONTROL      0x3e
+#define  PCI_BRIDGE_CTL_PARITY  0x01    /* Enable parity detection on secondary interface */
+#define  PCI_BRIDGE_CTL_SERR    0x02    /* The same for SERR forwarding */
+#define  PCI_BRIDGE_CTL_ISA     0x04    /* Enable ISA mode */
+#define  PCI_BRIDGE_CTL_VGA     0x08    /* Forward VGA addresses */
+#define  PCI_BRIDGE_CTL_MASTER_ABORT    0x20  /* Report master aborts */
+#define  PCI_BRIDGE_CTL_BUS_RESET       0x40    /* Secondary bus reset */
+#define  PCI_BRIDGE_CTL_FAST_BACK       0x80    /* Fast Back2Back enabled on secondary interface */
+
+/* Header type 2 (CardBus bridges) */
+#define PCI_CB_CAPABILITY_LIST  0x14
+/* 0x15 reserved */
+#define PCI_CB_SEC_STATUS       0x16    /* Secondary status */
+#define PCI_CB_PRIMARY_BUS      0x18    /* PCI bus number */
+#define PCI_CB_CARD_BUS         0x19    /* CardBus bus number */
+#define PCI_CB_SUBORDINATE_BUS  0x1a    /* Subordinate bus number */
+#define PCI_CB_LATENCY_TIMER    0x1b    /* CardBus latency timer */
+#define PCI_CB_MEMORY_BASE_0    0x1c
+#define PCI_CB_MEMORY_LIMIT_0   0x20
+#define PCI_CB_MEMORY_BASE_1    0x24
+#define PCI_CB_MEMORY_LIMIT_1   0x28
+#define PCI_CB_IO_BASE_0        0x2c
+#define PCI_CB_IO_BASE_0_HI     0x2e
+#define PCI_CB_IO_LIMIT_0       0x30
+#define PCI_CB_IO_LIMIT_0_HI    0x32
+#define PCI_CB_IO_BASE_1        0x34
+#define PCI_CB_IO_BASE_1_HI     0x36
+#define PCI_CB_IO_LIMIT_1       0x38
+#define PCI_CB_IO_LIMIT_1_HI    0x3a
+#define  PCI_CB_IO_RANGE_MASK   (~0x03UL)
+/* 0x3c-0x3d are same as for htype 0 */
+#define PCI_CB_BRIDGE_CONTROL   0x3e
+#define  PCI_CB_BRIDGE_CTL_PARITY       0x01    /* Similar to standard bridge control register */
+#define  PCI_CB_BRIDGE_CTL_SERR         0x02
+#define  PCI_CB_BRIDGE_CTL_ISA          0x04
+#define  PCI_CB_BRIDGE_CTL_VGA          0x08
+#define  PCI_CB_BRIDGE_CTL_MASTER_ABORT 0x20
+#define  PCI_CB_BRIDGE_CTL_CB_RESET     0x40    /* CardBus reset */
+#define  PCI_CB_BRIDGE_CTL_16BIT_INT    0x80    /* Enable interrupt for 16-bit cards */
+#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM0 0x100  /* Prefetch enable for both memory regions */
+#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM1 0x200
+#define  PCI_CB_BRIDGE_CTL_POST_WRITES  0x400
+#define PCI_CB_SUBSYSTEM_VENDOR_ID      0x40
+#define PCI_CB_SUBSYSTEM_ID             0x42
+#define PCI_CB_LEGACY_MODE_BASE         0x44    /* 16-bit PC Card legacy mode base address (ExCa) */
+/* 0x48-0x7f reserved */
+
+/* Capability lists */
+
+#define PCI_CAP_LIST_ID         0       /* Capability ID */
+#define  PCI_CAP_ID_PM          0x01    /* Power Management */
+#define  PCI_CAP_ID_AGP         0x02    /* Accelerated Graphics Port */
+#define  PCI_CAP_ID_VPD         0x03    /* Vital Product Data */
+#define  PCI_CAP_ID_SLOTID      0x04    /* Slot Identification */
+#define  PCI_CAP_ID_MSI         0x05    /* Message Signalled Interrupts */
+#define  PCI_CAP_ID_CHSWP       0x06    /* CompactPCI HotSwap */
+#define  PCI_CAP_ID_PCIX        0x07    /* PCI-X */
+#define  PCI_CAP_ID_HT          0x08    /* HyperTransport */
+#define  PCI_CAP_ID_VNDR        0x09    /* Vendor specific */
+#define  PCI_CAP_ID_DBG         0x0A    /* Debug port */
+#define  PCI_CAP_ID_CCRC        0x0B    /* CompactPCI Central Resource Control */
+#define  PCI_CAP_ID_SHPC        0x0C    /* PCI Standard Hot-Plug Controller */
+#define  PCI_CAP_ID_SSVID       0x0D    /* Bridge subsystem vendor/device ID */
+#define  PCI_CAP_ID_AGP3        0x0E    /* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_EXP         0x10    /* PCI Express */
+#define  PCI_CAP_ID_MSIX        0x11    /* MSI-X */
+#define  PCI_CAP_ID_AF          0x13    /* PCI Advanced Features */
+#define PCI_CAP_LIST_NEXT       1       /* Next capability in the list */
+#define PCI_CAP_FLAGS           2       /* Capability defined flags (16 bits) */
+#define PCI_CAP_SIZEOF          4
+
+/* Power Management Registers */
+
+#define PCI_PM_PMC              2       /* PM Capabilities Register */
+#define  PCI_PM_CAP_VER_MASK    0x0007  /* Version */
+#define  PCI_PM_CAP_PME_CLOCK   0x0008  /* PME clock required */
+#define  PCI_PM_CAP_RESERVED    0x0010  /* Reserved field */
+#define  PCI_PM_CAP_DSI         0x0020  /* Device specific initialization */
+#define  PCI_PM_CAP_AUX_POWER   0x01C0  /* Auxiliary power support mask */
+#define  PCI_PM_CAP_D1          0x0200  /* D1 power state support */
+#define  PCI_PM_CAP_D2          0x0400  /* D2 power state support */
+#define  PCI_PM_CAP_PME         0x0800  /* PME pin supported */
+#define  PCI_PM_CAP_PME_MASK    0xF800  /* PME Mask of all supported states */
+#define  PCI_PM_CAP_PME_D0      0x0800  /* PME# from D0 */
+#define  PCI_PM_CAP_PME_D1      0x1000  /* PME# from D1 */
+#define  PCI_PM_CAP_PME_D2      0x2000  /* PME# from D2 */
+#define  PCI_PM_CAP_PME_D3      0x4000  /* PME# from D3 (hot) */
+#define  PCI_PM_CAP_PME_D3cold  0x8000  /* PME# from D3 (cold) */
+#define  PCI_PM_CAP_PME_SHIFT   11      /* Start of the PME Mask in PMC */
+#define PCI_PM_CTRL             4       /* PM control and status register */
+#define  PCI_PM_CTRL_STATE_MASK 0x0003  /* Current power state (D0 to D3) */
+#define  PCI_PM_CTRL_NO_SOFT_RESET      0x0008  /* No reset for D3hot->D0 */
+#define  PCI_PM_CTRL_PME_ENABLE 0x0100  /* PME pin enable */
+#define  PCI_PM_CTRL_DATA_SEL_MASK      0x1e00  /* Data select (??) */
+#define  PCI_PM_CTRL_DATA_SCALE_MASK    0x6000  /* Data scale (??) */
+#define  PCI_PM_CTRL_PME_STATUS 0x8000  /* PME pin status */
+#define PCI_PM_PPB_EXTENSIONS   6       /* PPB support extensions (??) */
+#define  PCI_PM_PPB_B2_B3       0x40    /* Stop clock when in D3hot (??) */
+#define  PCI_PM_BPCC_ENABLE     0x80    /* Bus power/clock control enable (??) */
+#define PCI_PM_DATA_REGISTER    7       /* (??) */
+#define PCI_PM_SIZEOF           8
+
+/* AGP registers */
+
+#define PCI_AGP_VERSION         2       /* BCD version number */
+#define PCI_AGP_RFU             3       /* Rest of capability flags */
+#define PCI_AGP_STATUS          4       /* Status register */
+#define  PCI_AGP_STATUS_RQ_MASK 0xff000000      /* Maximum number of requests - 1 */
+#define  PCI_AGP_STATUS_SBA     0x0200  /* Sideband addressing supported */
+#define  PCI_AGP_STATUS_64BIT   0x0020  /* 64-bit addressing supported */
+#define  PCI_AGP_STATUS_FW      0x0010  /* FW transfers supported */
+#define  PCI_AGP_STATUS_RATE4   0x0004  /* 4x transfer rate supported */
+#define  PCI_AGP_STATUS_RATE2   0x0002  /* 2x transfer rate supported */
+#define  PCI_AGP_STATUS_RATE1   0x0001  /* 1x transfer rate supported */
+#define PCI_AGP_COMMAND         8       /* Control register */
+#define  PCI_AGP_COMMAND_RQ_MASK 0xff000000  /* Master: Maximum number of requests */
+#define  PCI_AGP_COMMAND_SBA    0x0200  /* Sideband addressing enabled */
+#define  PCI_AGP_COMMAND_AGP    0x0100  /* Allow processing of AGP transactions */
+#define  PCI_AGP_COMMAND_64BIT  0x0020  /* Allow processing of 64-bit addresses */
+#define  PCI_AGP_COMMAND_FW     0x0010  /* Force FW transfers */
+#define  PCI_AGP_COMMAND_RATE4  0x0004  /* Use 4x rate */
+#define  PCI_AGP_COMMAND_RATE2  0x0002  /* Use 2x rate */
+#define  PCI_AGP_COMMAND_RATE1  0x0001  /* Use 1x rate */
+#define PCI_AGP_SIZEOF          12
+
+/* Vital Product Data */
+
+#define PCI_VPD_ADDR            2       /* Address to access (15 bits!) */
+#define  PCI_VPD_ADDR_MASK      0x7fff  /* Address mask */
+#define  PCI_VPD_ADDR_F         0x8000  /* Write 0, 1 indicates completion */
+#define PCI_VPD_DATA            4       /* 32-bits of data returned here */
+
+/* Slot Identification */
+
+#define PCI_SID_ESR             2       /* Expansion Slot Register */
+#define  PCI_SID_ESR_NSLOTS     0x1f    /* Number of expansion slots available */
+#define  PCI_SID_ESR_FIC        0x20    /* First In Chassis Flag */
+#define PCI_SID_CHASSIS_NR      3       /* Chassis Number */
+
+/* Message Signalled Interrupts registers */
+
+#define PCI_MSI_FLAGS           2       /* Various flags */
+#define  PCI_MSI_FLAGS_64BIT    0x80    /* 64-bit addresses allowed */
+#define  PCI_MSI_FLAGS_QSIZE    0x70    /* Message queue size configured */
+#define  PCI_MSI_FLAGS_QMASK    0x0e    /* Maximum queue size available */
+#define  PCI_MSI_FLAGS_ENABLE   0x01    /* MSI feature enabled */
+#define  PCI_MSI_FLAGS_MASKBIT  0x100   /* 64-bit mask bits allowed */
+#define PCI_MSI_RFU             3       /* Rest of capability flags */
+#define PCI_MSI_ADDRESS_LO      4       /* Lower 32 bits */
+#define PCI_MSI_ADDRESS_HI      8       /* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
+#define PCI_MSI_DATA_32         8       /* 16 bits of data for 32-bit devices */
+#define PCI_MSI_MASK_32         12      /* Mask bits register for 32-bit devices */
+#define PCI_MSI_DATA_64         12      /* 16 bits of data for 64-bit devices */
+#define PCI_MSI_MASK_64         16      /* Mask bits register for 64-bit devices */
+
+/* MSI-X registers (these are at offset PCI_MSIX_FLAGS) */
+#define PCI_MSIX_FLAGS          2
+#define  PCI_MSIX_FLAGS_QSIZE   0x7FF
+#define  PCI_MSIX_FLAGS_ENABLE  (1 << 15)
+#define  PCI_MSIX_FLAGS_MASKALL (1 << 14)
+#define PCI_MSIX_FLAGS_BIRMASK  (7 << 0)
+
+/* CompactPCI Hotswap Register */
+
+#define PCI_CHSWP_CSR           2       /* Control and Status Register */
+#define  PCI_CHSWP_DHA          0x01    /* Device Hiding Arm */
+#define  PCI_CHSWP_EIM          0x02    /* ENUM# Signal Mask */
+#define  PCI_CHSWP_PIE          0x04    /* Pending Insert or Extract */
+#define  PCI_CHSWP_LOO          0x08    /* LED On / Off */
+#define  PCI_CHSWP_PI           0x30    /* Programming Interface */
+#define  PCI_CHSWP_EXT          0x40    /* ENUM# status - extraction */
+#define  PCI_CHSWP_INS          0x80    /* ENUM# status - insertion */
+
+/* PCI Advanced Feature registers */
+
+#define PCI_AF_LENGTH           2
+#define PCI_AF_CAP              3
+#define  PCI_AF_CAP_TP          0x01
+#define  PCI_AF_CAP_FLR         0x02
+#define PCI_AF_CTRL             4
+#define  PCI_AF_CTRL_FLR        0x01
+#define PCI_AF_STATUS           5
+#define  PCI_AF_STATUS_TP       0x01
+
+/* PCI-X registers */
+
+#define PCI_X_CMD               2       /* Modes & Features */
+#define  PCI_X_CMD_DPERR_E      0x0001  /* Data Parity Error Recovery Enable */
+#define  PCI_X_CMD_ERO          0x0002  /* Enable Relaxed Ordering */
+#define  PCI_X_CMD_READ_512     0x0000  /* 512 byte maximum read byte count */
+#define  PCI_X_CMD_READ_1K      0x0004  /* 1Kbyte maximum read byte count */
+#define  PCI_X_CMD_READ_2K      0x0008  /* 2Kbyte maximum read byte count */
+#define  PCI_X_CMD_READ_4K      0x000c  /* 4Kbyte maximum read byte count */
+#define  PCI_X_CMD_MAX_READ     0x000c  /* Max Memory Read Byte Count */
+                                /* Max # of outstanding split transactions */
+#define  PCI_X_CMD_SPLIT_1      0x0000  /* Max 1 */
+#define  PCI_X_CMD_SPLIT_2      0x0010  /* Max 2 */
+#define  PCI_X_CMD_SPLIT_3      0x0020  /* Max 3 */
+#define  PCI_X_CMD_SPLIT_4      0x0030  /* Max 4 */
+#define  PCI_X_CMD_SPLIT_8      0x0040  /* Max 8 */
+#define  PCI_X_CMD_SPLIT_12     0x0050  /* Max 12 */
+#define  PCI_X_CMD_SPLIT_16     0x0060  /* Max 16 */
+#define  PCI_X_CMD_SPLIT_32     0x0070  /* Max 32 */
+#define  PCI_X_CMD_MAX_SPLIT    0x0070  /* Max Outstanding Split Transactions */
+#define  PCI_X_CMD_VERSION(x)   (((x) >> 12) & 3) /* Version */
+#define PCI_X_STATUS            4       /* PCI-X capabilities */
+#define  PCI_X_STATUS_DEVFN     0x000000ff      /* A copy of devfn */
+#define  PCI_X_STATUS_BUS       0x0000ff00      /* A copy of bus nr */
+#define  PCI_X_STATUS_64BIT     0x00010000      /* 64-bit device */
+#define  PCI_X_STATUS_133MHZ    0x00020000      /* 133 MHz capable */
+#define  PCI_X_STATUS_SPL_DISC  0x00040000      /* Split Completion Discarded */
+#define  PCI_X_STATUS_UNX_SPL   0x00080000      /* Unexpected Split Completion */
+#define  PCI_X_STATUS_COMPLEX   0x00100000      /* Device Complexity */
+#define  PCI_X_STATUS_MAX_READ  0x00600000      /* Designed Max Memory Read Count */
+#define  PCI_X_STATUS_MAX_SPLIT 0x03800000      /* Designed Max Outstanding Split Transactions */
+#define  PCI_X_STATUS_MAX_CUM   0x1c000000      /* Designed Max Cumulative Read Size */
+#define  PCI_X_STATUS_SPL_ERR   0x20000000      /* Rcvd Split Completion Error Msg */
+#define  PCI_X_STATUS_266MHZ    0x40000000      /* 266 MHz capable */
+#define  PCI_X_STATUS_533MHZ    0x80000000      /* 533 MHz capable */
+
+/* PCI Express capability registers */
+
+#define PCI_EXP_FLAGS           2       /* Capabilities register */
+#define PCI_EXP_FLAGS_VERS      0x000f  /* Capability version */
+#define PCI_EXP_FLAGS_TYPE      0x00f0  /* Device/Port type */
+#define  PCI_EXP_TYPE_ENDPOINT  0x0     /* Express Endpoint */
+#define  PCI_EXP_TYPE_LEG_END   0x1     /* Legacy Endpoint */
+#define  PCI_EXP_TYPE_ROOT_PORT 0x4     /* Root Port */
+#define  PCI_EXP_TYPE_UPSTREAM  0x5     /* Upstream Port */
+#define  PCI_EXP_TYPE_DOWNSTREAM 0x6    /* Downstream Port */
+#define  PCI_EXP_TYPE_PCI_BRIDGE 0x7    /* PCI/PCI-X Bridge */
+#define  PCI_EXP_TYPE_RC_END    0x9     /* Root Complex Integrated Endpoint */
+#define  PCI_EXP_TYPE_RC_EC     0xa     /* Root Complex Event Collector */
+#define PCI_EXP_FLAGS_SLOT      0x0100  /* Slot implemented */
+#define PCI_EXP_FLAGS_IRQ       0x3e00  /* Interrupt message number */
+#define PCI_EXP_DEVCAP          4       /* Device capabilities */
+#define  PCI_EXP_DEVCAP_PAYLOAD 0x07    /* Max_Payload_Size */
+#define  PCI_EXP_DEVCAP_PHANTOM 0x18    /* Phantom functions */
+#define  PCI_EXP_DEVCAP_EXT_TAG 0x20    /* Extended tags */
+#define  PCI_EXP_DEVCAP_L0S     0x1c0   /* L0s Acceptable Latency */
+#define  PCI_EXP_DEVCAP_L1      0xe00   /* L1 Acceptable Latency */
+#define  PCI_EXP_DEVCAP_ATN_BUT 0x1000  /* Attention Button Present */
+#define  PCI_EXP_DEVCAP_ATN_IND 0x2000  /* Attention Indicator Present */
+#define  PCI_EXP_DEVCAP_PWR_IND 0x4000  /* Power Indicator Present */
+#define  PCI_EXP_DEVCAP_RBER    0x8000  /* Role-Based Error Reporting */
+#define  PCI_EXP_DEVCAP_PWR_VAL 0x3fc0000 /* Slot Power Limit Value */
+#define  PCI_EXP_DEVCAP_PWR_SCL 0xc000000 /* Slot Power Limit Scale */
+#define  PCI_EXP_DEVCAP_FLR     0x10000000 /* Function Level Reset */
+#define PCI_EXP_DEVCTL          8       /* Device Control */
+#define  PCI_EXP_DEVCTL_CERE    0x0001  /* Correctable Error Reporting En. */
+#define  PCI_EXP_DEVCTL_NFERE   0x0002  /* Non-Fatal Error Reporting Enable */
+#define  PCI_EXP_DEVCTL_FERE    0x0004  /* Fatal Error Reporting Enable */
+#define  PCI_EXP_DEVCTL_URRE    0x0008  /* Unsupported Request Reporting En. */
+#define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
+#define  PCI_EXP_DEVCTL_PAYLOAD 0x00e0  /* Max_Payload_Size */
+#define  PCI_EXP_DEVCTL_EXT_TAG 0x0100  /* Extended Tag Field Enable */
+#define  PCI_EXP_DEVCTL_PHANTOM 0x0200  /* Phantom Functions Enable */
+#define  PCI_EXP_DEVCTL_AUX_PME 0x0400  /* Auxiliary Power PM Enable */
+#define  PCI_EXP_DEVCTL_NOSNOOP_EN 0x0800  /* Enable No Snoop */
+#define  PCI_EXP_DEVCTL_READRQ  0x7000  /* Max_Read_Request_Size */
+#define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR */
+#define PCI_EXP_DEVSTA          10      /* Device Status */
+#define  PCI_EXP_DEVSTA_CED     0x01    /* Correctable Error Detected */
+#define  PCI_EXP_DEVSTA_NFED    0x02    /* Non-Fatal Error Detected */
+#define  PCI_EXP_DEVSTA_FED     0x04    /* Fatal Error Detected */
+#define  PCI_EXP_DEVSTA_URD     0x08    /* Unsupported Request Detected */
+#define  PCI_EXP_DEVSTA_AUXPD   0x10    /* AUX Power Detected */
+#define  PCI_EXP_DEVSTA_TRPND   0x20    /* Transactions Pending */
+#define PCI_EXP_LNKCAP          12      /* Link Capabilities */
+#define  PCI_EXP_LNKCAP_SLS     0x0000000f /* Supported Link Speeds */
+#define  PCI_EXP_LNKCAP_MLW     0x000003f0 /* Maximum Link Width */
+#define  PCI_EXP_LNKCAP_ASPMS   0x00000c00 /* ASPM Support */
+#define  PCI_EXP_LNKCAP_L0SEL   0x00007000 /* L0s Exit Latency */
+#define  PCI_EXP_LNKCAP_L1EL    0x00038000 /* L1 Exit Latency */
+#define  PCI_EXP_LNKCAP_CLKPM   0x00040000 /* L1 Clock Power Management */
+#define  PCI_EXP_LNKCAP_SDERC   0x00080000 /* Surprise Down Error Reporting Capable */
+#define  PCI_EXP_LNKCAP_DLLLARC 0x00100000 /* Data Link Layer Link Active Reporting Capable */
+#define  PCI_EXP_LNKCAP_LBNC    0x00200000 /* Link Bandwidth Notification Capability */
+#define  PCI_EXP_LNKCAP_PN      0xff000000 /* Port Number */
+#define PCI_EXP_LNKCTL          16      /* Link Control */
+#define  PCI_EXP_LNKCTL_ASPMC   0x0003  /* ASPM Control */
+#define  PCI_EXP_LNKCTL_RCB     0x0008  /* Read Completion Boundary */
+#define  PCI_EXP_LNKCTL_LD      0x0010  /* Link Disable */
+#define  PCI_EXP_LNKCTL_RL      0x0020  /* Retrain Link */
+#define  PCI_EXP_LNKCTL_CCC     0x0040  /* Common Clock Configuration */
+#define  PCI_EXP_LNKCTL_ES      0x0080  /* Extended Synch */
+#define  PCI_EXP_LNKCTL_CLKREQ_EN 0x100 /* Enable clkreq */
+#define  PCI_EXP_LNKCTL_HAWD    0x0200  /* Hardware Autonomous Width Disable */
+#define  PCI_EXP_LNKCTL_LBMIE   0x0400  /* Link Bandwidth Management Interrupt Enable */
+#define  PCI_EXP_LNKCTL_LABIE   0x0800  /* Lnk Autonomous Bandwidth Interrupt Enable */
+#define PCI_EXP_LNKSTA          18      /* Link Status */
+#define  PCI_EXP_LNKSTA_CLS     0x000f  /* Current Link Speed */
+#define  PCI_EXP_LNKSTA_NLW     0x03f0  /* Negotiated Link Width */
+#define  PCI_EXP_LNKSTA_LT      0x0800  /* Link Training */
+#define  PCI_EXP_LNKSTA_SLC     0x1000  /* Slot Clock Configuration */
+#define  PCI_EXP_LNKSTA_DLLLA   0x2000  /* Data Link Layer Link Active */
+#define  PCI_EXP_LNKSTA_LBMS    0x4000  /* Link Bandwidth Management Status */
+#define  PCI_EXP_LNKSTA_LABS    0x8000  /* Link Autonomous Bandwidth Status */
+#define PCI_EXP_SLTCAP          20      /* Slot Capabilities */
+#define  PCI_EXP_SLTCAP_ABP     0x00000001 /* Attention Button Present */
+#define  PCI_EXP_SLTCAP_PCP     0x00000002 /* Power Controller Present */
+#define  PCI_EXP_SLTCAP_MRLSP   0x00000004 /* MRL Sensor Present */
+#define  PCI_EXP_SLTCAP_AIP     0x00000008 /* Attention Indicator Present */
+#define  PCI_EXP_SLTCAP_PIP     0x00000010 /* Power Indicator Present */
+#define  PCI_EXP_SLTCAP_HPS     0x00000020 /* Hot-Plug Surprise */
+#define  PCI_EXP_SLTCAP_HPC     0x00000040 /* Hot-Plug Capable */
+#define  PCI_EXP_SLTCAP_SPLV    0x00007f80 /* Slot Power Limit Value */
+#define  PCI_EXP_SLTCAP_SPLS    0x00018000 /* Slot Power Limit Scale */
+#define  PCI_EXP_SLTCAP_EIP     0x00020000 /* Electromechanical Interlock Present */
+#define  PCI_EXP_SLTCAP_NCCS    0x00040000 /* No Command Completed Support */
+#define  PCI_EXP_SLTCAP_PSN     0xfff80000 /* Physical Slot Number */
+#define PCI_EXP_SLTCTL          24      /* Slot Control */
+#define  PCI_EXP_SLTCTL_ABPE    0x0001  /* Attention Button Pressed Enable */
+#define  PCI_EXP_SLTCTL_PFDE    0x0002  /* Power Fault Detected Enable */
+#define  PCI_EXP_SLTCTL_MRLSCE  0x0004  /* MRL Sensor Changed Enable */
+#define  PCI_EXP_SLTCTL_PDCE    0x0008  /* Presence Detect Changed Enable */
+#define  PCI_EXP_SLTCTL_CCIE    0x0010  /* Command Completed Interrupt Enable */
+#define  PCI_EXP_SLTCTL_HPIE    0x0020  /* Hot-Plug Interrupt Enable */
+#define  PCI_EXP_SLTCTL_AIC     0x00c0  /* Attention Indicator Control */
+#define  PCI_EXP_SLTCTL_PIC     0x0300  /* Power Indicator Control */
+#define  PCI_EXP_SLTCTL_PCC     0x0400  /* Power Controller Control */
+#define  PCI_EXP_SLTCTL_EIC     0x0800  /* Electromechanical Interlock Control */
+#define  PCI_EXP_SLTCTL_DLLSCE  0x1000  /* Data Link Layer State Changed Enable */
+#define PCI_EXP_SLTSTA          26      /* Slot Status */
+#define  PCI_EXP_SLTSTA_ABP     0x0001  /* Attention Button Pressed */
+#define  PCI_EXP_SLTSTA_PFD     0x0002  /* Power Fault Detected */
+#define  PCI_EXP_SLTSTA_MRLSC   0x0004  /* MRL Sensor Changed */
+#define  PCI_EXP_SLTSTA_PDC     0x0008  /* Presence Detect Changed */
+#define  PCI_EXP_SLTSTA_CC      0x0010  /* Command Completed */
+#define  PCI_EXP_SLTSTA_MRLSS   0x0020  /* MRL Sensor State */
+#define  PCI_EXP_SLTSTA_PDS     0x0040  /* Presence Detect State */
+#define  PCI_EXP_SLTSTA_EIS     0x0080  /* Electromechanical Interlock Status */
+#define  PCI_EXP_SLTSTA_DLLSC   0x0100  /* Data Link Layer State Changed */
+#define PCI_EXP_RTCTL           28      /* Root Control */
+#define  PCI_EXP_RTCTL_SECEE    0x01    /* System Error on Correctable Error */
+#define  PCI_EXP_RTCTL_SENFEE   0x02    /* System Error on Non-Fatal Error */
+#define  PCI_EXP_RTCTL_SEFEE    0x04    /* System Error on Fatal Error */
+#define  PCI_EXP_RTCTL_PMEIE    0x08    /* PME Interrupt Enable */
+#define  PCI_EXP_RTCTL_CRSSVE   0x10    /* CRS Software Visibility Enable */
+#define PCI_EXP_RTCAP           30      /* Root Capabilities */
+#define PCI_EXP_RTSTA           32      /* Root Status */
+#define PCI_EXP_DEVCAP2         36      /* Device Capabilities 2 */
+#define  PCI_EXP_DEVCAP2_ARI    0x20    /* Alternative Routing-ID */
+#define PCI_EXP_DEVCTL2         40      /* Device Control 2 */
+#define  PCI_EXP_DEVCTL2_ARI    0x20    /* Alternative Routing-ID */
+#define PCI_EXP_LNKCTL2         48      /* Link Control 2 */
+#define PCI_EXP_SLTCTL2         56      /* Slot Control 2 */
+
+/* Extended Capabilities (PCI-X 2.0 and Express) */
+#define PCI_EXT_CAP_ID(header)          ((header) & 0x0000ffff)
+#define PCI_EXT_CAP_VER(header)         (((header) >> 16) & 0xf)
+#define PCI_EXT_CAP_NEXT(header)        (((header) >> 20) & 0xffc)
+
+#define PCI_EXT_CAP_ID_ERR      1
+#define PCI_EXT_CAP_ID_VC       2
+#define PCI_EXT_CAP_ID_DSN      3
+#define PCI_EXT_CAP_ID_PWR      4
+#define PCI_EXT_CAP_ID_ARI      14
+#define PCI_EXT_CAP_ID_ATS      15
+#define PCI_EXT_CAP_ID_SRIOV    16
+
+/* Advanced Error Reporting */
+#define PCI_ERR_UNCOR_STATUS    4       /* Uncorrectable Error Status */
+#define  PCI_ERR_UNC_TRAIN      0x00000001      /* Training */
+#define  PCI_ERR_UNC_DLP        0x00000010      /* Data Link Protocol */
+#define  PCI_ERR_UNC_POISON_TLP 0x00001000      /* Poisoned TLP */
+#define  PCI_ERR_UNC_FCP        0x00002000      /* Flow Control Protocol */
+#define  PCI_ERR_UNC_COMP_TIME  0x00004000      /* Completion Timeout */
+#define  PCI_ERR_UNC_COMP_ABORT 0x00008000      /* Completer Abort */
+#define  PCI_ERR_UNC_UNX_COMP   0x00010000      /* Unexpected Completion */
+#define  PCI_ERR_UNC_RX_OVER    0x00020000      /* Receiver Overflow */
+#define  PCI_ERR_UNC_MALF_TLP   0x00040000      /* Malformed TLP */
+#define  PCI_ERR_UNC_ECRC       0x00080000      /* ECRC Error Status */
+#define  PCI_ERR_UNC_UNSUP      0x00100000      /* Unsupported Request */
+#define PCI_ERR_UNCOR_MASK      8       /* Uncorrectable Error Mask */
+        /* Same bits as above */
+#define PCI_ERR_UNCOR_SEVER     12      /* Uncorrectable Error Severity */
+        /* Same bits as above */
+#define PCI_ERR_COR_STATUS      16      /* Correctable Error Status */
+#define  PCI_ERR_COR_RCVR       0x00000001      /* Receiver Error Status */
+#define  PCI_ERR_COR_BAD_TLP    0x00000040      /* Bad TLP Status */
+#define  PCI_ERR_COR_BAD_DLLP   0x00000080      /* Bad DLLP Status */
+#define  PCI_ERR_COR_REP_ROLL   0x00000100      /* REPLAY_NUM Rollover */
+#define  PCI_ERR_COR_REP_TIMER  0x00001000      /* Replay Timer Timeout */
+#define PCI_ERR_COR_MASK        20      /* Correctable Error Mask */
+        /* Same bits as above */
+#define PCI_ERR_CAP             24      /* Advanced Error Capabilities */
+#define  PCI_ERR_CAP_FEP(x)     ((x) & 31)      /* First Error Pointer */
+#define  PCI_ERR_CAP_ECRC_GENC  0x00000020      /* ECRC Generation Capable */
+#define  PCI_ERR_CAP_ECRC_GENE  0x00000040      /* ECRC Generation Enable */
+#define  PCI_ERR_CAP_ECRC_CHKC  0x00000080      /* ECRC Check Capable */
+#define  PCI_ERR_CAP_ECRC_CHKE  0x00000100      /* ECRC Check Enable */
+#define PCI_ERR_HEADER_LOG      28      /* Header Log Register (16 bytes) */
+#define PCI_ERR_ROOT_COMMAND    44      /* Root Error Command */
+/* Correctable Err Reporting Enable */
+#define PCI_ERR_ROOT_CMD_COR_EN         0x00000001
+/* Non-fatal Err Reporting Enable */
+#define PCI_ERR_ROOT_CMD_NONFATAL_EN    0x00000002
+/* Fatal Err Reporting Enable */
+#define PCI_ERR_ROOT_CMD_FATAL_EN       0x00000004
+#define PCI_ERR_ROOT_STATUS     48
+#define PCI_ERR_ROOT_COR_RCV            0x00000001      /* ERR_COR Received */
+/* Multi ERR_COR Received */
+#define PCI_ERR_ROOT_MULTI_COR_RCV      0x00000002
+/* ERR_FATAL/NONFATAL Received */
+#define PCI_ERR_ROOT_UNCOR_RCV          0x00000004
+/* Multi ERR_FATAL/NONFATAL Received */
+#define PCI_ERR_ROOT_MULTI_UNCOR_RCV    0x00000008
+#define PCI_ERR_ROOT_FIRST_FATAL        0x00000010      /* First Fatal */
+#define PCI_ERR_ROOT_NONFATAL_RCV       0x00000020      /* Non-Fatal Received */
+#define PCI_ERR_ROOT_FATAL_RCV          0x00000040      /* Fatal Received */
+#define PCI_ERR_ROOT_COR_SRC    52
+#define PCI_ERR_ROOT_SRC        54
+
+/* Virtual Channel */
+#define PCI_VC_PORT_REG1        4
+#define PCI_VC_PORT_REG2        8
+#define PCI_VC_PORT_CTRL        12
+#define PCI_VC_PORT_STATUS      14
+#define PCI_VC_RES_CAP          16
+#define PCI_VC_RES_CTRL         20
+#define PCI_VC_RES_STATUS       26
+
+/* Power Budgeting */
+#define PCI_PWR_DSR             4       /* Data Select Register */
+#define PCI_PWR_DATA            8       /* Data Register */
+#define  PCI_PWR_DATA_BASE(x)   ((x) & 0xff)        /* Base Power */
+#define  PCI_PWR_DATA_SCALE(x)  (((x) >> 8) & 3)    /* Data Scale */
+#define  PCI_PWR_DATA_PM_SUB(x) (((x) >> 10) & 7)   /* PM Sub State */
+#define  PCI_PWR_DATA_PM_STATE(x) (((x) >> 13) & 3) /* PM State */
+#define  PCI_PWR_DATA_TYPE(x)   (((x) >> 15) & 7)   /* Type */
+#define  PCI_PWR_DATA_RAIL(x)   (((x) >> 18) & 7)   /* Power Rail */
+#define PCI_PWR_CAP             12      /* Capability */
+#define  PCI_PWR_CAP_BUDGET(x)  ((x) & 1)       /* Included in system budget */
+
+/*
+ * Hypertransport sub capability types
+ *
+ * Unfortunately there are both 3 bit and 5 bit capability types defined
+ * in the HT spec, catering for that is a little messy. You probably don't
+ * want to use these directly, just use pci_find_ht_capability() and it
+ * will do the right thing for you.
+ */
+#define HT_3BIT_CAP_MASK        0xE0
+#define HT_CAPTYPE_SLAVE        0x00    /* Slave/Primary link configuration */
+#define HT_CAPTYPE_HOST         0x20    /* Host/Secondary link configuration */
+
+#define HT_5BIT_CAP_MASK        0xF8
+#define HT_CAPTYPE_IRQ          0x80    /* IRQ Configuration */
+#define HT_CAPTYPE_REMAPPING_40 0xA0    /* 40 bit address remapping */
+#define HT_CAPTYPE_REMAPPING_64 0xA2    /* 64 bit address remapping */
+#define HT_CAPTYPE_UNITID_CLUMP 0x90    /* Unit ID clumping */
+#define HT_CAPTYPE_EXTCONF      0x98    /* Extended Configuration Space Access */
+#define HT_CAPTYPE_MSI_MAPPING  0xA8    /* MSI Mapping Capability */
+#define  HT_MSI_FLAGS           0x02            /* Offset to flags */
+#define  HT_MSI_FLAGS_ENABLE    0x1             /* Mapping enable */
+#define  HT_MSI_FLAGS_FIXED     0x2             /* Fixed mapping only */
+#define  HT_MSI_FIXED_ADDR      0x00000000FEE00000ULL   /* Fixed addr */
+#define  HT_MSI_ADDR_LO         0x04            /* Offset to low addr bits */
+#define  HT_MSI_ADDR_LO_MASK    0xFFF00000      /* Low address bit mask */
+#define  HT_MSI_ADDR_HI         0x08            /* Offset to high addr bits */
+#define HT_CAPTYPE_DIRECT_ROUTE 0xB0    /* Direct routing configuration */
+#define HT_CAPTYPE_VCSET        0xB8    /* Virtual Channel configuration */
+#define HT_CAPTYPE_ERROR_RETRY  0xC0    /* Retry on error configuration */
+#define HT_CAPTYPE_GEN3         0xD0    /* Generation 3 hypertransport configuration */
+#define HT_CAPTYPE_PM           0xE0    /* Hypertransport power management configuration */
+
+/* Alternative Routing-ID Interpretation */
+#define PCI_ARI_CAP             0x04    /* ARI Capability Register */
+#define  PCI_ARI_CAP_MFVC       0x0001  /* MFVC Function Groups Capability */
+#define  PCI_ARI_CAP_ACS        0x0002  /* ACS Function Groups Capability */
+#define  PCI_ARI_CAP_NFN(x)     (((x) >> 8) & 0xff) /* Next Function Number */
+#define PCI_ARI_CTRL            0x06    /* ARI Control Register */
+#define  PCI_ARI_CTRL_MFVC      0x0001  /* MFVC Function Groups Enable */
+#define  PCI_ARI_CTRL_ACS       0x0002  /* ACS Function Groups Enable */
+#define  PCI_ARI_CTRL_FG(x)     (((x) >> 4) & 7) /* Function Group */
+
+/* Address Translation Service */
+#define PCI_ATS_CAP             0x04    /* ATS Capability Register */
+#define  PCI_ATS_CAP_QDEP(x)    ((x) & 0x1f)    /* Invalidate Queue Depth */
+#define  PCI_ATS_MAX_QDEP       32      /* Max Invalidate Queue Depth */
+#define PCI_ATS_CTRL            0x06    /* ATS Control Register */
+#define  PCI_ATS_CTRL_ENABLE    0x8000  /* ATS Enable */
+#define  PCI_ATS_CTRL_STU(x)    ((x) & 0x1f)    /* Smallest Translation Unit */
+#define  PCI_ATS_MIN_STU        12      /* shift of minimum STU block */
+
+/* Single Root I/O Virtualization */
+#define PCI_SRIOV_CAP           0x04    /* SR-IOV Capabilities */
+#define  PCI_SRIOV_CAP_VFM      0x01    /* VF Migration Capable */
+#define  PCI_SRIOV_CAP_INTR(x)  ((x) >> 21) /* Interrupt Message Number */
+#define PCI_SRIOV_CTRL          0x08    /* SR-IOV Control */
+#define  PCI_SRIOV_CTRL_VFE     0x01    /* VF Enable */
+#define  PCI_SRIOV_CTRL_VFM     0x02    /* VF Migration Enable */
+#define  PCI_SRIOV_CTRL_INTR    0x04    /* VF Migration Interrupt Enable */
+#define  PCI_SRIOV_CTRL_MSE     0x08    /* VF Memory Space Enable */
+#define  PCI_SRIOV_CTRL_ARI     0x10    /* ARI Capable Hierarchy */
+#define PCI_SRIOV_STATUS        0x0a    /* SR-IOV Status */
+#define  PCI_SRIOV_STATUS_VFM   0x01    /* VF Migration Status */
+#define PCI_SRIOV_INITIAL_VF    0x0c    /* Initial VFs */
+#define PCI_SRIOV_TOTAL_VF      0x0e    /* Total VFs */
+#define PCI_SRIOV_NUM_VF        0x10    /* Number of VFs */
+#define PCI_SRIOV_FUNC_LINK     0x12    /* Function Dependency Link */
+#define PCI_SRIOV_VF_OFFSET     0x14    /* First VF Offset */
+#define PCI_SRIOV_VF_STRIDE     0x16    /* Following VF Stride */
+#define PCI_SRIOV_VF_DID        0x1a    /* VF Device ID */
+#define PCI_SRIOV_SUP_PGSIZE    0x1c    /* Supported Page Sizes */
+#define PCI_SRIOV_SYS_PGSIZE    0x20    /* System Page Size */
+#define PCI_SRIOV_BAR           0x24    /* VF BAR0 */
+#define  PCI_SRIOV_NUM_BARS     6       /* Number of VF BARs */
+#define PCI_SRIOV_VFM           0x3c    /* VF Migration State Array Offset */
+#define  PCI_SRIOV_VFM_BIR(x)   ((x) & 7)       /* State BIR */
+#define  PCI_SRIOV_VFM_OFFSET(x) ((x) & ~7)     /* State Offset */
+#define  PCI_SRIOV_VFM_UA       0x0     /* Inactive.Unavailable */
+#define  PCI_SRIOV_VFM_MI       0x1     /* Dormant.MigrateIn */
+#define  PCI_SRIOV_VFM_MO       0x2     /* Active.MigrateOut */
+#define  PCI_SRIOV_VFM_AV       0x3     /* Active.Available */
+
+#endif /* LINUX_PCI_REGS_H */
-- 
1.7.1

* [PATCH 2/7] pci: memory access API and IOMMU support
From: Eduard - Gabriel Munteanu @ 2010-08-28 14:54 UTC (permalink / raw)
  To: mst
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

PCI devices should access memory through pci_memory_*() instead of
cpu_physical_memory_*(). This also provides support for translation and
access checking in case an IOMMU is emulated.

Memory maps are treated as remote IOTLBs (that is, translation caches
belonging to the IOMMU-aware device itself). Clients (devices) must
provide callbacks for map invalidation in case these maps are
persistent beyond the current I/O context, e.g. AIO DMA transfers.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
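Note for reviewers: the chunked translate-then-copy loop described above can
be tried outside QEMU. What follows is only a minimal standalone sketch, not
the code in this patch. Every toy_* name, the 4 KiB page size and the
four-entry page table are assumptions made up for illustration, and unlike
pci_memory_rw() the sketch returns the number of bytes transferred so the
behaviour on a translation fault is observable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NPAGES    4u
#define TOY_UNMAPPED  UINT64_MAX

/* Hypothetical stand-in for guest RAM. */
static uint8_t toy_ram[TOY_NPAGES * TOY_PAGE_SIZE];

/* Hypothetical IOMMU page table: bus page -> physical page. */
static const uint64_t toy_page_table[TOY_NPAGES] = { 2, 0, TOY_UNMAPPED, 1 };

/*
 * Translate a bus address. On success, *paddr is the physical address and
 * *len is how many bytes the translation covers (up to the end of the page).
 * Returns non-zero on a translation fault.
 */
static int toy_translate(uint64_t addr, uint64_t *paddr, uint64_t *len)
{
    uint64_t page = addr / TOY_PAGE_SIZE;
    uint64_t off = addr % TOY_PAGE_SIZE;

    if (page >= TOY_NPAGES || toy_page_table[page] == TOY_UNMAPPED) {
        return -1;
    }
    *paddr = toy_page_table[page] * TOY_PAGE_SIZE + off;
    *len = TOY_PAGE_SIZE - off;
    return 0;
}

/*
 * Copy 'len' bytes between 'buf' and toy RAM, one translated chunk at a
 * time. Stops at the first fault and returns the bytes transferred so far.
 */
static uint64_t toy_memory_rw(uint64_t addr, uint8_t *buf, uint64_t len,
                              int is_write)
{
    uint64_t done = 0;

    assert(buf != NULL);
    while (len) {
        uint64_t paddr, plen;

        if (toy_translate(addr, &paddr, &plen)) {
            break;
        }
        /* The translation might be valid for more than we need. */
        if (plen > len) {
            plen = len;
        }
        if (is_write) {
            memcpy(&toy_ram[paddr], buf, plen);
        } else {
            memcpy(buf, &toy_ram[paddr], plen);
        }
        len -= plen;
        addr += plen;
        buf += plen;
        done += plen;
    }
    return done;
}
```

A transfer that straddles a page boundary ends up split across two
non-adjacent physical pages, which is the case the loop exists for.
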
 hw/pci.c           |  185 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/pci.h           |   74 +++++++++++++++++++++
 hw/pci_internals.h |   12 ++++
 qemu-common.h      |    1 +
 4 files changed, 271 insertions(+), 1 deletions(-)
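
Note on the map invalidation callbacks: the "maps as remote IOTLBs" idea from
the commit message can be sketched without any QEMU headers. This is an
illustration under stated assumptions, not the implementation below. The
ToyMap type, the toy_* functions and the hand-rolled singly linked list are
all hypothetical; the patch itself keeps the maps in a QLIST per PCIDevice.
The sketch unlinks each entry before freeing it, so walking the list stays
valid while entries are being dropped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void ToyInvalidateFunc(void *opaque);

/* Hypothetical record of one live mapping handed out to a device. */
typedef struct ToyMap {
    uint64_t addr;                  /* bus address of the mapping */
    uint64_t len;                   /* length in bytes, non-zero */
    ToyInvalidateFunc *invalidate;  /* called when the mapping goes stale */
    void *opaque;
    struct ToyMap *next;
} ToyMap;

static ToyMap *toy_maps;

/* Same test as ranges_overlap(); assumes non-zero lengths, no wraparound. */
static int toy_ranges_overlap(uint64_t first1, uint64_t len1,
                              uint64_t first2, uint64_t len2)
{
    uint64_t last1 = first1 + len1 - 1;
    uint64_t last2 = first2 + len2 - 1;

    return !(last2 < first1 || last1 < first2);
}

/* Example callback: counts how many times it was invoked. */
static void toy_count_invalidation(void *opaque)
{
    ++*(int *)opaque;
}

/* Remember a mapping. Returns 0 on success, -1 if allocation fails. */
static int toy_register_map(uint64_t addr, uint64_t len,
                            ToyInvalidateFunc *cb, void *opaque)
{
    ToyMap *map;

    assert(cb != NULL);
    map = malloc(sizeof(*map));
    if (!map) {
        return -1;
    }
    map->addr = addr;
    map->len = len;
    map->invalidate = cb;
    map->opaque = opaque;
    map->next = toy_maps;
    toy_maps = map;
    return 0;
}

/* Notify and drop every mapping overlapping [addr, addr + len). */
static unsigned toy_invalidate_range(uint64_t addr, uint64_t len)
{
    ToyMap **link = &toy_maps;
    unsigned dropped = 0;

    while (*link) {
        ToyMap *map = *link;

        if (toy_ranges_overlap(addr, len, map->addr, map->len)) {
            *link = map->next;      /* unlink first, then notify and free */
            map->invalidate(map->opaque);
            free(map);
            dropped++;
        } else {
            link = &map->next;
        }
    }
    return dropped;
}
```

A device doing AIO DMA would register its mapping with a callback that
cancels or restarts the transfer; an IOMMU model would then call the
invalidation routine whenever the guest flushes the corresponding
translations.
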

diff --git a/hw/pci.c b/hw/pci.c
index 2dc1577..b460905 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -158,6 +158,19 @@ static void pci_device_reset(PCIDevice *dev)
     pci_update_mappings(dev);
 }
 
+static int pci_no_translate(PCIDevice *iommu,
+                            PCIDevice *dev,
+                            pcibus_t addr,
+                            target_phys_addr_t *paddr,
+                            target_phys_addr_t *len,
+                            unsigned perms)
+{
+    *paddr = addr;
+    *len = -1;
+
+    return 0;
+}
+
 static void pci_bus_reset(void *opaque)
 {
     PCIBus *bus = opaque;
@@ -220,7 +233,10 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
 {
     qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
     assert(PCI_FUNC(devfn_min) == 0);
-    bus->devfn_min = devfn_min;
+
+    bus->devfn_min  = devfn_min;
+    bus->iommu      = NULL;
+    bus->translate  = pci_no_translate;
 
     /* host bridge */
     QLIST_INIT(&bus->child);
@@ -1789,3 +1805,170 @@ static char *pcibus_get_dev_path(DeviceState *dev)
     return strdup(path);
 }
 
+void pci_register_iommu(PCIDevice *iommu,
+                        PCITranslateFunc *translate)
+{
+    iommu->bus->iommu = iommu;
+    iommu->bus->translate = translate;
+}
+
+void pci_memory_rw(PCIDevice *dev,
+                   pcibus_t addr,
+                   uint8_t *buf,
+                   pcibus_t len,
+                   int is_write)
+{
+    int err;
+    unsigned perms;
+    PCIDevice *iommu = dev->bus->iommu;
+    target_phys_addr_t paddr, plen;
+
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    while (len) {
+        err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
+        if (err)
+            return;
+
+        /* The translation might be valid for larger regions. */
+        if (plen > len)
+            plen = len;
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    }
+}
+
+static void pci_memory_register_map(PCIDevice *dev,
+                                    pcibus_t addr,
+                                    pcibus_t len,
+                                    target_phys_addr_t paddr,
+                                    PCIInvalidateMapFunc *invalidate,
+                                    void *invalidate_opaque)
+{
+    PCIMemoryMap *map;
+
+    map = qemu_malloc(sizeof(PCIMemoryMap));
+    map->addr               = addr;
+    map->len                = len;
+    map->paddr              = paddr;
+    map->invalidate         = invalidate;
+    map->invalidate_opaque  = invalidate_opaque;
+
+    QLIST_INSERT_HEAD(&dev->memory_maps, map, list);
+}
+
+static void pci_memory_unregister_map(PCIDevice *dev,
+                                      target_phys_addr_t paddr,
+                                      target_phys_addr_t len)
+{
+    PCIMemoryMap *map, *next;
+
+    QLIST_FOREACH_SAFE(map, &dev->memory_maps, list, next) {
+        if (map->paddr == paddr && map->len == len) {
+            QLIST_REMOVE(map, list);
+            qemu_free(map);
+        }
+    }
+}
+
+void pci_memory_invalidate_range(PCIDevice *dev,
+                                 pcibus_t addr,
+                                 pcibus_t len)
+{
+    PCIMemoryMap *map, *next;
+
+    QLIST_FOREACH_SAFE(map, &dev->memory_maps, list, next) {
+        if (ranges_overlap(addr, len, map->addr, map->len)) {
+            map->invalidate(map->invalidate_opaque);
+            QLIST_REMOVE(map, list);
+            qemu_free(map);
+        }
+    }
+}
+
+void *pci_memory_map(PCIDevice *dev,
+                     PCIInvalidateMapFunc *cb,
+                     void *opaque,
+                     pcibus_t addr,
+                     target_phys_addr_t *len,
+                     int is_write)
+{
+    int err;
+    unsigned perms;
+    PCIDevice *iommu = dev->bus->iommu;
+    target_phys_addr_t paddr, plen;
+
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    plen = *len;
+    err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
+    if (err)
+        return NULL;
+
+    /*
+     * If this is true, the virtual region is contiguous,
+     * but the translated physical region isn't. We just
+     * clamp *len, much like cpu_physical_memory_map() does.
+     */
+    if (plen < *len)
+        *len = plen;
+
+    /* We treat maps as remote TLBs to cope with stuff like AIO. */
+    if (cb)
+        pci_memory_register_map(dev, addr, *len, paddr, cb, opaque);
+
+    return cpu_physical_memory_map(paddr, len, is_write);
+}
+
+void pci_memory_unmap(PCIDevice *dev,
+                      void *buffer,
+                      target_phys_addr_t len,
+                      int is_write,
+                      target_phys_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+    pci_memory_unregister_map(dev, (target_phys_addr_t) buffer, len);
+}
+
+#define DEFINE_PCI_LD(suffix, size)                                       \
+uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr)              \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    err = dev->bus->translate(dev->bus->iommu, dev,                       \
+                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
+    if (err || (plen < size / 8))                                         \
+        return 0;                                                         \
+                                                                          \
+    return ld##suffix##_phys(paddr);                                      \
+}
+
+#define DEFINE_PCI_ST(suffix, size)                                       \
+void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val)    \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    err = dev->bus->translate(dev->bus->iommu, dev,                       \
+                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
+    if (err || (plen < size / 8))                                         \
+        return;                                                           \
+                                                                          \
+    st##suffix##_phys(paddr, val);                                        \
+}
+
+DEFINE_PCI_LD(ub, 8)
+DEFINE_PCI_LD(uw, 16)
+DEFINE_PCI_LD(l, 32)
+DEFINE_PCI_LD(q, 64)
+
+DEFINE_PCI_ST(b, 8)
+DEFINE_PCI_ST(w, 16)
+DEFINE_PCI_ST(l, 32)
+DEFINE_PCI_ST(q, 64)
+
diff --git a/hw/pci.h b/hw/pci.h
index c551f96..3131016 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -172,6 +172,8 @@ struct PCIDevice {
     char *romfile;
     ram_addr_t rom_offset;
     uint32_t rom_bar;
+
+    QLIST_HEAD(, PCIMemoryMap) memory_maps;
 };
 
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
@@ -391,4 +393,76 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
     return !(last2 < first1 || last1 < first2);
 }
 
+/*
+ * Memory I/O and PCI IOMMU definitions.
+ */
+
+#define IOMMU_PERM_READ     (1 << 0)
+#define IOMMU_PERM_WRITE    (1 << 1)
+#define IOMMU_PERM_RW       (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
+
+typedef int PCIInvalidateMapFunc(void *opaque);
+typedef int PCITranslateFunc(PCIDevice *iommu,
+                             PCIDevice *dev,
+                             pcibus_t addr,
+                             target_phys_addr_t *paddr,
+                             target_phys_addr_t *len,
+                             unsigned perms);
+
+extern void pci_memory_rw(PCIDevice *dev,
+                          pcibus_t addr,
+                          uint8_t *buf,
+                          pcibus_t len,
+                          int is_write);
+extern void *pci_memory_map(PCIDevice *dev,
+                            PCIInvalidateMapFunc *cb,
+                            void *opaque,
+                            pcibus_t addr,
+                            target_phys_addr_t *len,
+                            int is_write);
+extern void pci_memory_unmap(PCIDevice *dev,
+                             void *buffer,
+                             target_phys_addr_t len,
+                             int is_write,
+                             target_phys_addr_t access_len);
+extern void pci_register_iommu(PCIDevice *dev,
+                               PCITranslateFunc *translate);
+extern void pci_memory_invalidate_range(PCIDevice *dev,
+                                        pcibus_t addr,
+                                        pcibus_t len);
+
+#define DECLARE_PCI_LD(suffix, size)                                    \
+extern uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
+
+#define DECLARE_PCI_ST(suffix, size)                                    \
+extern void pci_st##suffix(PCIDevice *dev,                              \
+                           pcibus_t addr,                               \
+                           uint##size##_t val);
+
+DECLARE_PCI_LD(ub, 8)
+DECLARE_PCI_LD(uw, 16)
+DECLARE_PCI_LD(l, 32)
+DECLARE_PCI_LD(q, 64)
+
+DECLARE_PCI_ST(b, 8)
+DECLARE_PCI_ST(w, 16)
+DECLARE_PCI_ST(l, 32)
+DECLARE_PCI_ST(q, 64)
+
+static inline void pci_memory_read(PCIDevice *dev,
+                                   pcibus_t addr,
+                                   uint8_t *buf,
+                                   pcibus_t len)
+{
+    pci_memory_rw(dev, addr, buf, len, 0);
+}
+
+static inline void pci_memory_write(PCIDevice *dev,
+                                    pcibus_t addr,
+                                    const uint8_t *buf,
+                                    pcibus_t len)
+{
+    pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
+}
+
 #endif
diff --git a/hw/pci_internals.h b/hw/pci_internals.h
index e3c93a3..fb134b9 100644
--- a/hw/pci_internals.h
+++ b/hw/pci_internals.h
@@ -33,6 +33,9 @@ struct PCIBus {
        Keep a count of the number of devices with raised IRQs.  */
     int nirq;
     int *irq_count;
+
+    PCIDevice                       *iommu;
+    PCITranslateFunc                *translate;
 };
 
 struct PCIBridge {
@@ -44,4 +47,13 @@ struct PCIBridge {
     const char *bus_name;
 };
 
+struct PCIMemoryMap {
+    pcibus_t                        addr;
+    pcibus_t                        len;
+    target_phys_addr_t              paddr;
+    PCIInvalidateMapFunc            *invalidate;
+    void                            *invalidate_opaque;
+    QLIST_ENTRY(PCIMemoryMap)       list;
+};
+
 #endif /* QEMU_PCI_INTERNALS_H */
diff --git a/qemu-common.h b/qemu-common.h
index d735235..8b060e8 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -218,6 +218,7 @@ typedef struct SMBusDevice SMBusDevice;
 typedef struct PCIHostState PCIHostState;
 typedef struct PCIExpressHost PCIExpressHost;
 typedef struct PCIBus PCIBus;
+typedef struct PCIMemoryMap PCIMemoryMap;
 typedef struct PCIDevice PCIDevice;
 typedef struct PCIBridge PCIBridge;
 typedef struct SerialState SerialState;
-- 
1.7.1
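
For readers following the design: pci_memory_rw() in this patch walks a bus address range one translation at a time and clamps each chunk to the length the translate callback reports as contiguous. Below is a minimal standalone sketch of that loop, not the patch's code. It assumes a toy single-level page table; toy_ram, toy_page_map, toy_translate() and toy_memory_rw() are hypothetical names, and plain uint64_t stands in for pcibus_t and target_phys_addr_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u   /* hypothetical IOMMU page size */
#define TOY_NPAGES    4u      /* hypothetical size of the bus address space */

/* Hypothetical stand-ins for guest RAM and a one-level I/O page table. */
static uint8_t toy_ram[TOY_NPAGES * TOY_PAGE_SIZE];
static const uint32_t toy_page_map[TOY_NPAGES] = { 2, 0, 3, 1 };

/*
 * Same contract as PCITranslateFunc in the patch: on success, *paddr is
 * the translated address and *len is the number of bytes, starting at
 * addr, covered by this translation. Returns nonzero on a fault.
 */
static int toy_translate(uint64_t addr, uint64_t *paddr, uint64_t *len)
{
    uint64_t page = addr / TOY_PAGE_SIZE;
    uint64_t off = addr % TOY_PAGE_SIZE;

    if (page >= TOY_NPAGES) {
        return -1;
    }

    *paddr = (uint64_t) toy_page_map[page] * TOY_PAGE_SIZE + off;
    *len = TOY_PAGE_SIZE - off;
    return 0;
}

/* Chunked copy in the style of pci_memory_rw(); returns bytes transferred. */
static uint64_t toy_memory_rw(uint64_t addr, uint8_t *buf,
                              uint64_t len, int is_write)
{
    uint64_t done = 0;

    assert(buf != NULL);

    while (len) {
        uint64_t paddr, plen;

        if (toy_translate(addr, &paddr, &plen)) {
            break;              /* fault: stop, as pci_memory_rw() does */
        }

        /* The translation might be valid for a larger region. */
        if (plen > len) {
            plen = len;
        }

        if (is_write) {
            memcpy(&toy_ram[paddr], buf, plen);
        } else {
            memcpy(buf, &toy_ram[paddr], plen);
        }

        len -= plen;
        addr += plen;
        buf += plen;
        done += plen;
    }

    return done;
}
```

With this table, a 6000-byte write at bus address 100 is split into a 3996-byte chunk that ends at the first page boundary and a 2004-byte chunk in the next page, which lands in a different physical page. Unlike pci_memory_rw(), the sketch returns the number of bytes transferred so that a fault part-way through is visible to the caller.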



* [PATCH 3/7] AMD IOMMU emulation
  2010-08-28 14:54 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-28 14:54   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-28 14:54 UTC (permalink / raw)
  To: mst
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This introduces emulation for the AMD IOMMU, described in "AMD I/O
Virtualization Technology (IOMMU) Specification".

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
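The MMIO register file in this patch is kept as a plain byte buffer: 1-, 2- and 4-byte guest accesses are stored least significant byte first, and the full 64-bit register is re-read after every write. A minimal standalone sketch of that scheme follows. The names toy_mmio, toy_mmio_write(), toy_mmio_read() and toy_mmio_read64() are hypothetical, and the 64-bit read is assembled from two 32-bit reads here instead of the le64_to_cpu() on a cast pointer that the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MMIO_SIZE 0x4000u   /* same value as MMIO_SIZE in the patch */

/* Hypothetical backing store for the register file. */
static uint8_t toy_mmio[TOY_MMIO_SIZE];

/* Store the low `size` bytes of val at offset, least significant first. */
static void toy_mmio_write(size_t offset, size_t size, uint32_t val)
{
    size_t i;

    assert(size <= 4 && offset + size <= TOY_MMIO_SIZE);

    for (i = 0; i < size; i++) {
        toy_mmio[offset + i] = val & 0xFF;
        val >>= 8;
    }
}

/* Read `size` bytes at offset back into a host integer. */
static uint32_t toy_mmio_read(size_t offset, size_t size)
{
    uint32_t ret = 0;
    size_t i;

    assert(size <= 4 && offset + size <= TOY_MMIO_SIZE);

    for (i = size; i > 0; i--) {
        ret = (ret << 8) | toy_mmio[offset + i - 1];
    }

    return ret;
}

/* Reassemble a whole 64-bit register, independent of host byte order. */
static uint64_t toy_mmio_read64(size_t reg)
{
    return ((uint64_t) toy_mmio_read(reg + 4, 4) << 32) |
           toy_mmio_read(reg, 4);
}
```

Because every access goes through the byte buffer, a guest that programs a 64-bit register with two 32-bit writes, or with narrower ones, still ends up with a consistent value when the register is decoded.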
 Makefile.target |    2 +-
 hw/amd_iommu.c  |  663 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pc.c         |    2 +
 hw/pci_ids.h    |    2 +
 hw/pci_regs.h   |    1 +
 5 files changed, 669 insertions(+), 1 deletions(-)
 create mode 100644 hw/amd_iommu.c
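
The command buffer is a ring of 16-byte entries: the guest driver advances the tail, the device model advances the head, and the ring is empty when the two are equal. A minimal sketch of the head/tail bookkeeping follows, under stated assumptions. ToyRing and toy_ring_drain() are hypothetical names, the ring length is kept in bytes, and the loop consumes every pending entry, whereas amd_iommu_cmdbuf_run() in this patch handles one entry per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENTRY_SIZE 16u   /* same value as CMDBUF_ENTRY_SIZE in the patch */

typedef struct ToyRing {
    size_t len;    /* ring size in bytes, a multiple of TOY_ENTRY_SIZE */
    size_t head;   /* consumer offset, advanced by the device model */
    size_t tail;   /* producer offset, advanced by the guest driver */
} ToyRing;

/*
 * Consume every pending entry, calling handle(offset) for each one if
 * handle is not NULL. Returns the number of entries processed.
 */
static unsigned toy_ring_drain(ToyRing *r, void (*handle)(size_t offset))
{
    unsigned n = 0;

    /* Misaligned or out-of-range offsets would never meet; reject them. */
    assert(r->len && r->len % TOY_ENTRY_SIZE == 0);
    assert(r->head < r->len && r->head % TOY_ENTRY_SIZE == 0);
    assert(r->tail < r->len && r->tail % TOY_ENTRY_SIZE == 0);

    while (r->head != r->tail) {
        if (handle) {
            handle(r->head);
        }

        /* Increment and wrap the head pointer. */
        r->head += TOY_ENTRY_SIZE;
        if (r->head >= r->len) {
            r->head = 0;
        }
        n++;
    }

    return n;
}
```

In a 64-byte ring with head at 48 and tail at 16, the loop handles the entries at offsets 48 and 0 and stops with head equal to tail. The alignment checks matter because offsets that are not multiples of the entry size would make the loop spin forever.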

diff --git a/Makefile.target b/Makefile.target
index 3ef4666..d4eeccd 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -195,7 +195,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
 obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
-obj-i386-y += pc_piix.o
+obj-i386-y += pc_piix.o amd_iommu.o
 
 # shared objects
 obj-ppc-y = ppc.o
diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
new file mode 100644
index 0000000..43e0426
--- /dev/null
+++ b/hw/amd_iommu.c
@@ -0,0 +1,663 @@
+/*
+ * AMD IOMMU emulation
+ *
+ * Copyright (c) 2010 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "pc.h"
+#include "hw.h"
+#include "pci.h"
+#include "qlist.h"
+
+/* Capability registers */
+#define CAPAB_HEADER            0x00
+#define   CAPAB_REV_TYPE        0x02
+#define   CAPAB_FLAGS           0x03
+#define CAPAB_BAR_LOW           0x04
+#define CAPAB_BAR_HIGH          0x08
+#define CAPAB_RANGE             0x0C
+#define CAPAB_MISC              0x10
+
+#define CAPAB_SIZE              0x14
+#define CAPAB_REG_SIZE          0x04
+
+/* Capability header data */
+#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
+#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
+#define CAPAB_FLAG_NPCACHE      (1 << 2)
+#define CAPAB_INIT_REV          (1 << 3)
+#define CAPAB_INIT_TYPE         3
+#define CAPAB_INIT_REV_TYPE     (CAPAB_INIT_REV | CAPAB_INIT_TYPE)
+#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
+#define CAPAB_INIT_MISC         ((64 << 15) | (48 << 8))
+#define CAPAB_BAR_MASK          (~((1UL << 14) - 1))
+
+/* MMIO registers */
+#define MMIO_DEVICE_TABLE       0x0000
+#define MMIO_COMMAND_BASE       0x0008
+#define MMIO_EVENT_BASE         0x0010
+#define MMIO_CONTROL            0x0018
+#define MMIO_EXCL_BASE          0x0020
+#define MMIO_EXCL_LIMIT         0x0028
+#define MMIO_COMMAND_HEAD       0x2000
+#define MMIO_COMMAND_TAIL       0x2008
+#define MMIO_EVENT_HEAD         0x2010
+#define MMIO_EVENT_TAIL         0x2018
+#define MMIO_STATUS             0x2020
+
+#define MMIO_SIZE               0x4000
+
+#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
+#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
+#define MMIO_DEVTAB_ENTRY_SIZE  32
+#define MMIO_DEVTAB_SIZE_UNIT   4096
+
+#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
+#define MMIO_CMDBUF_SIZE_MASK       0x0F
+#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
+#define MMIO_CMDBUF_DEFAULT_SIZE    8
+#define MMIO_CMDBUF_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_CMDBUF_TAIL_MASK       MMIO_CMDBUF_HEAD_MASK
+
+#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
+#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
+#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
+#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
+#define MMIO_EVTLOG_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
+
+#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_ENABLED_MASK      (1ULL << 0)
+#define MMIO_EXCL_ALLOW_MASK        (1ULL << 1)
+#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_LIMIT_LOW         0xFFF
+
+#define MMIO_CONTROL_IOMMUEN        (1ULL << 0)
+#define MMIO_CONTROL_HTTUNEN        (1ULL << 1)
+#define MMIO_CONTROL_EVENTLOGEN     (1ULL << 2)
+#define MMIO_CONTROL_EVENTINTEN     (1ULL << 3)
+#define MMIO_CONTROL_COMWAITINTEN   (1ULL << 4)
+#define MMIO_CONTROL_CMDBUFEN       (1ULL << 12)
+
+#define MMIO_STATUS_EVTLOG_OF       (1ULL << 0)
+#define MMIO_STATUS_EVTLOG_INTR     (1ULL << 1)
+#define MMIO_STATUS_COMWAIT_INTR    (1ULL << 2)
+#define MMIO_STATUS_EVTLOG_RUN      (1ULL << 3)
+#define MMIO_STATUS_CMDBUF_RUN      (1ULL << 4)
+
+#define CMDBUF_ID_BYTE              0x07
+#define CMDBUF_ID_RSHIFT            4
+#define CMDBUF_ENTRY_SIZE           0x10
+
+#define CMD_COMPLETION_WAIT         0x01
+#define CMD_INVAL_DEVTAB_ENTRY      0x02
+#define CMD_INVAL_IOMMU_PAGES       0x03
+#define CMD_INVAL_IOTLB_PAGES       0x04
+#define CMD_INVAL_INTR_TABLE        0x05
+
+#define DEVTAB_ENTRY_SIZE           32
+
+/* Device table entry bits 0:63 */
+#define DEV_VALID                   (1ULL << 0)
+#define DEV_TRANSLATION_VALID       (1ULL << 1)
+#define DEV_MODE_MASK               0x7
+#define DEV_MODE_RSHIFT             9
+#define DEV_PT_ROOT_MASK            0xFFFFFFFFFF000
+#define DEV_PT_ROOT_RSHIFT          12
+#define DEV_PERM_SHIFT              61
+#define DEV_PERM_READ               (1ULL << 61)
+#define DEV_PERM_WRITE              (1ULL << 62)
+
+/* Device table entry bits 64:127 */
+#define DEV_DOMAIN_ID_MASK          ((1ULL << 16) - 1)
+#define DEV_IOTLB_SUPPORT           (1ULL << 17)
+#define DEV_SUPPRESS_PF             (1ULL << 18)
+#define DEV_SUPPRESS_ALL_PF         (1ULL << 19)
+#define DEV_IOCTL_MASK              ~3
+#define DEV_IOCTL_RSHIFT            20
+#define   DEV_IOCTL_DENY            0
+#define   DEV_IOCTL_PASSTHROUGH     1
+#define   DEV_IOCTL_TRANSLATE       2
+#define DEV_CACHE                   (1ULL << 37)
+#define DEV_SNOOP_DISABLE           (1ULL << 38)
+#define DEV_EXCL                    (1ULL << 39)
+
+/* Event codes and flags, as stored in the info field */
+#define EVENT_ILLEGAL_DEVTAB_ENTRY  (0x1U << 24)
+#define EVENT_IOPF                  (0x2U << 24)
+#define   EVENT_IOPF_I              (1U << 3)
+#define   EVENT_IOPF_PR             (1U << 4)
+#define   EVENT_IOPF_RW             (1U << 5)
+#define   EVENT_IOPF_PE             (1U << 6)
+#define   EVENT_IOPF_RZ             (1U << 7)
+#define   EVENT_IOPF_TR             (1U << 8)
+#define EVENT_DEV_TAB_HW_ERROR      (0x3U << 24)
+#define EVENT_PAGE_TAB_HW_ERROR     (0x4U << 24)
+#define EVENT_ILLEGAL_COMMAND_ERROR (0x5U << 24)
+#define EVENT_COMMAND_HW_ERROR      (0x6U << 24)
+#define EVENT_IOTLB_INV_TIMEOUT     (0x7U << 24)
+#define EVENT_INVALID_DEV_REQUEST   (0x8U << 24)
+
+#define EVENT_LEN                   16
+
+typedef struct AMDIOMMUState {
+    PCIDevice                   dev;
+
+    int                         capab_offset;
+    unsigned char               *capab;
+
+    int                         mmio_index;
+    target_phys_addr_t          mmio_addr;
+    unsigned char               *mmio_buf;
+    int                         mmio_enabled;
+
+    int                         enabled;
+    int                         ats_enabled;
+
+    target_phys_addr_t          devtab;
+    size_t                      devtab_len;
+
+    target_phys_addr_t          cmdbuf;
+    int                         cmdbuf_enabled;
+    size_t                      cmdbuf_len;
+    size_t                      cmdbuf_head;
+    size_t                      cmdbuf_tail;
+    int                         completion_wait_intr;
+
+    target_phys_addr_t          evtlog;
+    int                         evtlog_enabled;
+    int                         evtlog_intr;
+    target_phys_addr_t          evtlog_len;
+    target_phys_addr_t          evtlog_head;
+    target_phys_addr_t          evtlog_tail;
+
+    target_phys_addr_t          excl_base;
+    target_phys_addr_t          excl_limit;
+    int                         excl_enabled;
+    int                         excl_allow;
+} AMDIOMMUState;
+
+typedef struct AMDIOMMUEvent {
+    uint16_t    devfn;
+    uint16_t    reserved;
+    uint16_t    domid;
+    uint16_t    info;
+    uint64_t    addr;
+} __attribute__((packed)) AMDIOMMUEvent;
+
+static void amd_iommu_completion_wait(AMDIOMMUState *st,
+                                      uint8_t *cmd)
+{
+    uint64_t addr;
+
+    if (cmd[0] & 1) {
+        addr = le64_to_cpu(*(uint64_t *) cmd) & 0xFFFFFFFFFFFF8;
+        cpu_physical_memory_write(addr, cmd + 8, 8);
+    }
+
+    if (cmd[0] & 2)
+        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
+}
+
+static void amd_iommu_invalidate_iotlb(AMDIOMMUState *st,
+                                       uint8_t *cmd)
+{
+    PCIDevice *dev;
+    PCIBus *bus = st->dev.bus;
+    int bus_num = pci_bus_num(bus);
+    int devfn = le16_to_cpu(*(uint16_t *) cmd);
+
+    dev = pci_find_device(bus, bus_num, PCI_SLOT(devfn), PCI_FUNC(devfn));
+    if (dev) {
+        pci_memory_invalidate_range(dev, 0, -1);
+    }
+}
+
+static void amd_iommu_cmdbuf_run(AMDIOMMUState *st)
+{
+    uint8_t cmd[16];
+    int type;
+
+    if (!st->cmdbuf_enabled) {
+        return;
+    }
+
+    /* Check if there's work to do. */
+    if (st->cmdbuf_head == st->cmdbuf_tail) {
+        return;
+    }
+
+    cpu_physical_memory_read(st->cmdbuf + st->cmdbuf_head, cmd, 16);
+    type = cmd[CMDBUF_ID_BYTE] >> CMDBUF_ID_RSHIFT;
+    switch (type) {
+        case CMD_COMPLETION_WAIT:
+            amd_iommu_completion_wait(st, cmd);
+            break;
+        case CMD_INVAL_DEVTAB_ENTRY:
+            break;
+        case CMD_INVAL_IOMMU_PAGES:
+            break;
+        case CMD_INVAL_IOTLB_PAGES:
+            amd_iommu_invalidate_iotlb(st, cmd);
+            break;
+        case CMD_INVAL_INTR_TABLE:
+            break;
+        default:
+            break;
+    }
+
+    /* Increment and wrap head pointer. */
+    st->cmdbuf_head += CMDBUF_ENTRY_SIZE;
+    if (st->cmdbuf_head >= st->cmdbuf_len * CMDBUF_ENTRY_SIZE) {
+        st->cmdbuf_head = 0;
+    }
+}
+
+static uint32_t amd_iommu_mmio_buf_read(AMDIOMMUState *st,
+                                        size_t offset,
+                                        size_t size)
+{
+    ssize_t i;
+    uint32_t ret;
+
+    if (!size) {
+        return 0;
+    }
+
+    ret = st->mmio_buf[offset + size - 1];
+    for (i = size - 2; i >= 0; i--) {
+        ret <<= 8;
+        ret |= st->mmio_buf[offset + i];
+    }
+
+    return ret;
+}
+
+static void amd_iommu_mmio_buf_write(AMDIOMMUState *st,
+                                     size_t offset,
+                                     size_t size,
+                                     uint32_t val)
+{
+    size_t i;
+
+    for (i = 0; i < size; i++) {
+        st->mmio_buf[offset + i] = val & 0xFF;
+        val >>= 8;
+    }
+}
+
+static void amd_iommu_update_mmio(AMDIOMMUState *st,
+                                  target_phys_addr_t addr)
+{
+    size_t reg = addr & ~0x07;
+    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];
+    uint64_t val = le64_to_cpu(*base);
+
+    switch (reg) {
+        case MMIO_CONTROL:
+            st->enabled              = !!(val & MMIO_CONTROL_IOMMUEN);
+            st->ats_enabled          = !!(val & MMIO_CONTROL_HTTUNEN);
+            st->evtlog_enabled       = st->enabled &&
+                                       !!(val & MMIO_CONTROL_EVENTLOGEN);
+            st->evtlog_intr          = !!(val & MMIO_CONTROL_EVENTINTEN);
+            st->completion_wait_intr = !!(val & MMIO_CONTROL_COMWAITINTEN);
+            st->cmdbuf_enabled       = st->enabled &&
+                                       !!(val & MMIO_CONTROL_CMDBUFEN);
+
+            /* Update status flags depending on the control register. */
+            if (st->cmdbuf_enabled) {
+                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_CMDBUF_RUN;
+            } else {
+                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_CMDBUF_RUN;
+            }
+            if (st->evtlog_enabled) {
+                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_RUN;
+            } else {
+                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_EVTLOG_RUN;
+            }
+
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_DEVICE_TABLE:
+            st->devtab = (target_phys_addr_t) (val & MMIO_DEVTAB_BASE_MASK);
+            st->devtab_len = ((val & MMIO_DEVTAB_SIZE_MASK) + 1) *
+                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
+            break;
+        case MMIO_COMMAND_BASE:
+            st->cmdbuf = (target_phys_addr_t) (val & MMIO_CMDBUF_BASE_MASK);
+            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
+                                     MMIO_CMDBUF_SIZE_MASK);
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_HEAD:
+            st->cmdbuf_head = val & MMIO_CMDBUF_HEAD_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_TAIL:
+            st->cmdbuf_tail = val & MMIO_CMDBUF_TAIL_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_EVENT_BASE:
+            st->evtlog = (target_phys_addr_t) (val & MMIO_EVTLOG_BASE_MASK);
+            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
+                                     MMIO_EVTLOG_SIZE_MASK);
+            break;
+        case MMIO_EVENT_HEAD:
+            st->evtlog_head = val & MMIO_EVTLOG_HEAD_MASK;
+            break;
+        case MMIO_EVENT_TAIL:
+            st->evtlog_tail = val & MMIO_EVTLOG_TAIL_MASK;
+            break;
+        case MMIO_EXCL_BASE:
+            st->excl_base = (target_phys_addr_t) (val & MMIO_EXCL_BASE_MASK);
+            st->excl_enabled = val & MMIO_EXCL_ENABLED_MASK;
+            st->excl_allow = val & MMIO_EXCL_ALLOW_MASK;
+            break;
+        case MMIO_EXCL_LIMIT:
+            st->excl_limit = (target_phys_addr_t) ((val & MMIO_EXCL_LIMIT_MASK) |
+                                                   MMIO_EXCL_LIMIT_LOW);
+            break;
+        default:
+            break;
+    }
+}
+
+static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 1);
+}
+
+static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 2);
+}
+
+static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 4);
+}
+
+static void amd_iommu_mmio_writeb(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 1, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writew(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 2, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writel(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 4, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
+    amd_iommu_mmio_readb,
+    amd_iommu_mmio_readw,
+    amd_iommu_mmio_readl,
+};
+
+static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
+    amd_iommu_mmio_writeb,
+    amd_iommu_mmio_writew,
+    amd_iommu_mmio_writel,
+};
+
+static void amd_iommu_enable_mmio(AMDIOMMUState *st)
+{
+    target_phys_addr_t addr;
+    uint8_t *capab_wmask = st->dev.wmask + st->capab_offset;
+
+    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
+                                            amd_iommu_mmio_write, st);
+    if (st->mmio_index < 0) {
+        return;
+    }
+
+    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;
+    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
+
+    st->mmio_addr = addr;
+    st->mmio_enabled = 1;
+
+    /* Further changes to the capability are prohibited. */
+    memset(capab_wmask + CAPAB_BAR_LOW, 0x00, CAPAB_REG_SIZE);
+    memset(capab_wmask + CAPAB_BAR_HIGH, 0x00, CAPAB_REG_SIZE);
+}
+
+static void amd_iommu_write_capab(PCIDevice *dev,
+                                  uint32_t addr, uint32_t val, int len)
+{
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
+
+    pci_default_write_config(dev, addr, val, len);
+
+    if (!st->mmio_enabled && st->capab[CAPAB_BAR_LOW] & 0x1) {
+        amd_iommu_enable_mmio(st);
+    }
+}
+
+static void amd_iommu_reset(DeviceState *dev)
+{
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev.qdev, dev);
+    unsigned char *capab = st->capab;
+    uint8_t *capab_wmask = st->dev.wmask + st->capab_offset;
+
+    st->enabled      = 0;
+    st->ats_enabled  = 0;
+    st->mmio_enabled = 0;
+
+    capab[CAPAB_REV_TYPE]  = CAPAB_INIT_REV | CAPAB_INIT_TYPE;
+    capab[CAPAB_FLAGS]     = CAPAB_INIT_FLAGS;
+    capab[CAPAB_BAR_LOW]   = 0;
+    capab[CAPAB_BAR_HIGH]  = 0;
+    capab[CAPAB_RANGE]     = 0;
+    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
+
+    /* Changes to the capability are allowed after system reset. */
+    memset(capab_wmask + CAPAB_BAR_LOW, 0xFF, CAPAB_REG_SIZE);
+    memset(capab_wmask + CAPAB_BAR_HIGH, 0xFF, CAPAB_REG_SIZE);
+
+    memset(st->mmio_buf, 0, MMIO_SIZE);
+    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
+    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
+}
+
+static void amd_iommu_log_event(AMDIOMMUState *st, AMDIOMMUEvent *evt)
+{
+    if (!st->evtlog_enabled ||
+        (st->mmio_buf[MMIO_STATUS] & MMIO_STATUS_EVTLOG_OF)) {
+        return;
+    }
+
+    if (st->evtlog_tail >= st->evtlog_len) {
+        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_OF;
+    }
+
+    cpu_physical_memory_write(st->evtlog + st->evtlog_tail,
+                              (uint8_t *) evt, EVENT_LEN);
+
+    st->evtlog_tail += EVENT_LEN;
+    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_INTR;
+}
+
+static void amd_iommu_page_fault(AMDIOMMUState *st,
+                                 int devfn,
+                                 unsigned domid,
+                                 target_phys_addr_t addr,
+                                 int present,
+                                 int is_write)
+{
+    AMDIOMMUEvent evt;
+    unsigned info;
+
+    evt.devfn = cpu_to_le16(devfn);
+    evt.reserved = 0;
+    evt.domid = cpu_to_le16(domid);
+    evt.addr = cpu_to_le64(addr);
+
+    info = EVENT_IOPF;
+    if (present) {
+        info |= EVENT_IOPF_PR;
+    }
+    if (is_write) {
+        info |= EVENT_IOPF_RW;
+    }
+    evt.info = cpu_to_le16(info);
+
+    amd_iommu_log_event(st, &evt);
+}
+
+static inline uint64_t amd_iommu_get_perms(uint64_t entry)
+{
+    return (entry & (DEV_PERM_READ | DEV_PERM_WRITE)) >> DEV_PERM_SHIFT;
+}
+
+static int amd_iommu_translate(PCIDevice *iommu,
+                               PCIDevice *dev,
+                               pcibus_t addr,
+                               target_phys_addr_t *paddr,
+                               target_phys_addr_t *len,
+                               unsigned perms)
+{
+    int devfn, present;
+    target_phys_addr_t entry_addr, pte_addr;
+    uint64_t entry[4], pte, page_offset, pte_perms;
+    unsigned level, domid;
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, iommu);
+
+    if (!st->enabled) {
+        goto no_translation;
+    }
+
+    /* Get device table entry. */
+    devfn = dev->devfn;
+    entry_addr = st->devtab + devfn * DEVTAB_ENTRY_SIZE;
+    cpu_physical_memory_read(entry_addr, (uint8_t *) entry, 32);
+
+    pte = le64_to_cpu(entry[0]);
+    if (!(pte & DEV_VALID) || !(pte & DEV_TRANSLATION_VALID)) {
+        goto no_translation;
+    }
+    domid = le64_to_cpu(entry[1]) & DEV_DOMAIN_ID_MASK;
+    level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    while (level > 0) {
+        /*
+         * Check permissions: every bit requested in perms must
+         * also be set in pte_perms.
+         */
+        pte_perms = amd_iommu_get_perms(pte);
+        present = pte & 1;
+        if (!present || perms != (perms & pte_perms)) {
+            amd_iommu_page_fault(st, devfn, domid, addr,
+                                 present, !!(perms & IOMMU_PERM_WRITE));
+            return -EPERM;
+        }
+
+        /* Go to the next lower level. */
+        pte_addr = pte & DEV_PT_ROOT_MASK;
+        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
+        pte = ldq_phys(pte_addr);
+        level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    }
+    page_offset = addr & 4095;
+    *paddr = (pte & DEV_PT_ROOT_MASK) + page_offset;
+    *len = 4096 - page_offset;
+
+    return 0;
+
+no_translation:
+    *paddr = addr;
+    *len = -1;
+    return 0;
+}
+
+static int amd_iommu_pci_initfn(PCIDevice *dev)
+{
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
+
+    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
+    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
+    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
+
+    /* Secure Device capability */
+    st->capab_offset = pci_add_capability(&st->dev,
+                                          PCI_CAP_ID_SEC,
+                                          CAPAB_SIZE);
+    st->capab = st->dev.config + st->capab_offset;
+    dev->config_write = amd_iommu_write_capab;
+
+    /* Allocate backing space for the MMIO registers. */
+    st->mmio_buf = qemu_malloc(MMIO_SIZE);
+
+    pci_register_iommu(dev, amd_iommu_translate);
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_amd_iommu = {
+    .name                       = "amd-iommu",
+    .version_id                 = 1,
+    .minimum_version_id         = 1,
+    .minimum_version_id_old     = 1,
+    .fields                     = (VMStateField []) {
+        VMSTATE_PCI_DEVICE(dev, AMDIOMMUState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static PCIDeviceInfo amd_iommu_pci_info = {
+    .qdev.name    = "amd-iommu",
+    .qdev.desc    = "AMD IOMMU",
+    .qdev.size    = sizeof(AMDIOMMUState),
+    .qdev.reset   = amd_iommu_reset,
+    .qdev.vmsd    = &vmstate_amd_iommu,
+    .init         = amd_iommu_pci_initfn,
+};
+
+static void amd_iommu_register(void)
+{
+    pci_qdev_register(&amd_iommu_pci_info);
+}
+
+device_init(amd_iommu_register);
diff --git a/hw/pc.c b/hw/pc.c
index a96187f..e2456b0 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1068,6 +1068,8 @@ void pc_pci_device_init(PCIBus *pci_bus)
     int max_bus;
     int bus;
 
+    pci_create_simple(pci_bus, -1, "amd-iommu");
+
     max_bus = drive_get_max_bus(IF_SCSI);
     for (bus = 0; bus <= max_bus; bus++) {
         pci_create_simple(pci_bus, -1, "lsi53c895a");
diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index 39e9f1d..d790312 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -26,6 +26,7 @@
 
 #define PCI_CLASS_MEMORY_RAM             0x0500
 
+#define PCI_CLASS_SYSTEM_IOMMU           0x0806
 #define PCI_CLASS_SYSTEM_OTHER           0x0880
 
 #define PCI_CLASS_SERIAL_USB             0x0c03
@@ -56,6 +57,7 @@
 
 #define PCI_VENDOR_ID_AMD                0x1022
 #define PCI_DEVICE_ID_AMD_LANCE          0x2000
+#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */
 
 #define PCI_VENDOR_ID_MOTOROLA           0x1057
 #define PCI_DEVICE_ID_MOTOROLA_MPC106    0x0002
diff --git a/hw/pci_regs.h b/hw/pci_regs.h
index 0f9f84c..6695e41 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -209,6 +209,7 @@
 #define  PCI_CAP_ID_SHPC        0x0C    /* PCI Standard Hot-Plug Controller */
 #define  PCI_CAP_ID_SSVID       0x0D    /* Bridge subsystem vendor/device ID */
 #define  PCI_CAP_ID_AGP3        0x0E    /* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_SEC         0x0F    /* Secure Device (AMD IOMMU) */
 #define  PCI_CAP_ID_EXP         0x10    /* PCI Express */
 #define  PCI_CAP_ID_MSIX        0x11    /* MSI-X */
 #define  PCI_CAP_ID_AF          0x13    /* PCI Advanced Features */
-- 
1.7.1
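
For reference, the guard at the top of amd_iommu_log_event() is meant to skip
logging when the event log is disabled or has already overflowed. A minimal
standalone sketch of that test follows. STATUS_EVTLOG_OF mirrors
MMIO_STATUS_EVTLOG_OF from the patch, and evtlog_can_log() is a hypothetical
helper used only for illustration; it is not part of the patch.

```c
#include <stdint.h>

/* Mirrors MMIO_STATUS_EVTLOG_OF (bit 0 of the status register). */
#define STATUS_EVTLOG_OF (1u << 0)

/*
 * Hypothetical helper: returns 1 when a new event may be written.
 * The overflow bit is tested with a bitwise AND, so unrelated status
 * bits (for example the "running" bits) do not block logging.
 */
static int evtlog_can_log(int enabled, uint8_t status)
{
    return enabled && !(status & STATUS_EVTLOG_OF);
}
```

Testing the bit with a bitwise OR instead would make the condition true for
any status value, so no event would ever be logged.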



* [Qemu-devel] [PATCH 3/7] AMD IOMMU emulation
@ 2010-08-28 14:54   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-28 14:54 UTC (permalink / raw)
  To: mst
  Cc: kvm, joro, qemu-devel, blauwirbel, yamahata, paul,
	Eduard - Gabriel Munteanu, avi

This introduces emulation for the AMD IOMMU, described in "AMD I/O
Virtualization Technology (IOMMU) Specification".

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
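Note (not part of the commit message): the page-table walk in
amd_iommu_translate() selects a 9-bit index from the device address at each
level, shifting by 3 + 9 * level, and scales it by the 8-byte entry size.
Below is a minimal standalone sketch of that arithmetic, assuming 4 KiB pages
and 8-byte entries as in the patch. pt_index() and pte_offset() are
hypothetical names used only for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: index into the page table at the given level.
 * Level 1 uses address bits 12..20, level 2 bits 21..29, and so on.
 */
static inline unsigned pt_index(uint64_t addr, unsigned level)
{
    return (unsigned) ((addr >> (3 + 9 * level)) & 0x1FF);
}

/* Hypothetical helper: byte offset of that entry within its table. */
static inline uint64_t pte_offset(uint64_t addr, unsigned level)
{
    return (uint64_t) pt_index(addr, level) << 3;
}
```

For example, address 0x40201000 has bits 12, 21 and 30 set, so pt_index()
gives 1 at levels 1, 2 and 3.
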
 Makefile.target |    2 +-
 hw/amd_iommu.c  |  663 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pc.c         |    2 +
 hw/pci_ids.h    |    2 +
 hw/pci_regs.h   |    1 +
 5 files changed, 669 insertions(+), 1 deletions(-)
 create mode 100644 hw/amd_iommu.c

diff --git a/Makefile.target b/Makefile.target
index 3ef4666..d4eeccd 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -195,7 +195,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
 obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
-obj-i386-y += pc_piix.o
+obj-i386-y += pc_piix.o amd_iommu.o
 
 # shared objects
 obj-ppc-y = ppc.o
diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
new file mode 100644
index 0000000..43e0426
--- /dev/null
+++ b/hw/amd_iommu.c
@@ -0,0 +1,663 @@
+/*
+ * AMD IOMMU emulation
+ *
+ * Copyright (c) 2010 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "pc.h"
+#include "hw.h"
+#include "pci.h"
+#include "qlist.h"
+
+/* Capability registers */
+#define CAPAB_HEADER            0x00
+#define   CAPAB_REV_TYPE        0x02
+#define   CAPAB_FLAGS           0x03
+#define CAPAB_BAR_LOW           0x04
+#define CAPAB_BAR_HIGH          0x08
+#define CAPAB_RANGE             0x0C
+#define CAPAB_MISC              0x10
+
+#define CAPAB_SIZE              0x14
+#define CAPAB_REG_SIZE          0x04
+
+/* Capability header data */
+#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
+#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
+#define CAPAB_FLAG_NPCACHE      (1 << 2)
+#define CAPAB_INIT_REV          (1 << 3)
+#define CAPAB_INIT_TYPE         3
+#define CAPAB_INIT_REV_TYPE     (CAPAB_INIT_REV | CAPAB_INIT_TYPE)
+#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
+#define CAPAB_INIT_MISC         ((64 << 15) | (48 << 8))
+#define CAPAB_BAR_MASK          (~((1ULL << 14) - 1))
+
+/* MMIO registers */
+#define MMIO_DEVICE_TABLE       0x0000
+#define MMIO_COMMAND_BASE       0x0008
+#define MMIO_EVENT_BASE         0x0010
+#define MMIO_CONTROL            0x0018
+#define MMIO_EXCL_BASE          0x0020
+#define MMIO_EXCL_LIMIT         0x0028
+#define MMIO_COMMAND_HEAD       0x2000
+#define MMIO_COMMAND_TAIL       0x2008
+#define MMIO_EVENT_HEAD         0x2010
+#define MMIO_EVENT_TAIL         0x2018
+#define MMIO_STATUS             0x2020
+
+#define MMIO_SIZE               0x4000
+
+#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
+#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
+#define MMIO_DEVTAB_ENTRY_SIZE  32
+#define MMIO_DEVTAB_SIZE_UNIT   4096
+
+#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
+#define MMIO_CMDBUF_SIZE_MASK       0x0F
+#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
+#define MMIO_CMDBUF_DEFAULT_SIZE    8
+#define MMIO_CMDBUF_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_CMDBUF_TAIL_MASK       MMIO_CMDBUF_HEAD_MASK
+
+#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
+#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
+#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
+#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
+#define MMIO_EVTLOG_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
+
+#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_ENABLED_MASK      (1ULL << 0)
+#define MMIO_EXCL_ALLOW_MASK        (1ULL << 1)
+#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_LIMIT_LOW         0xFFF
+
+#define MMIO_CONTROL_IOMMUEN        (1ULL << 0)
+#define MMIO_CONTROL_HTTUNEN        (1ULL << 1)
+#define MMIO_CONTROL_EVENTLOGEN     (1ULL << 2)
+#define MMIO_CONTROL_EVENTINTEN     (1ULL << 3)
+#define MMIO_CONTROL_COMWAITINTEN   (1ULL << 4)
+#define MMIO_CONTROL_CMDBUFEN       (1ULL << 12)
+
+#define MMIO_STATUS_EVTLOG_OF       (1ULL << 0)
+#define MMIO_STATUS_EVTLOG_INTR     (1ULL << 1)
+#define MMIO_STATUS_COMWAIT_INTR    (1ULL << 2)
+#define MMIO_STATUS_EVTLOG_RUN      (1ULL << 3)
+#define MMIO_STATUS_CMDBUF_RUN      (1ULL << 4)
+
+#define CMDBUF_ID_BYTE              0x07
+#define CMDBUF_ID_RSHIFT            4
+#define CMDBUF_ENTRY_SIZE           0x10
+
+#define CMD_COMPLETION_WAIT         0x01
+#define CMD_INVAL_DEVTAB_ENTRY      0x02
+#define CMD_INVAL_IOMMU_PAGES       0x03
+#define CMD_INVAL_IOTLB_PAGES       0x04
+#define CMD_INVAL_INTR_TABLE        0x05
+
+#define DEVTAB_ENTRY_SIZE           32
+
+/* Device table entry bits 0:63 */
+#define DEV_VALID                   (1ULL << 0)
+#define DEV_TRANSLATION_VALID       (1ULL << 1)
+#define DEV_MODE_MASK               0x7
+#define DEV_MODE_RSHIFT             9
+#define DEV_PT_ROOT_MASK            0xFFFFFFFFFF000
+#define DEV_PT_ROOT_RSHIFT          12
+#define DEV_PERM_SHIFT              61
+#define DEV_PERM_READ               (1ULL << 61)
+#define DEV_PERM_WRITE              (1ULL << 62)
+
+/* Device table entry bits 64:127 */
+#define DEV_DOMAIN_ID_MASK          ((1ULL << 16) - 1)
+#define DEV_IOTLB_SUPPORT           (1ULL << 17)
+#define DEV_SUPPRESS_PF             (1ULL << 18)
+#define DEV_SUPPRESS_ALL_PF         (1ULL << 19)
+#define DEV_IOCTL_MASK              ~3
+#define DEV_IOCTL_RSHIFT            20
+#define   DEV_IOCTL_DENY            0
+#define   DEV_IOCTL_PASSTHROUGH     1
+#define   DEV_IOCTL_TRANSLATE       2
+#define DEV_CACHE                   (1ULL << 37)
+#define DEV_SNOOP_DISABLE           (1ULL << 38)
+#define DEV_EXCL                    (1ULL << 39)
+
+/* Event codes and flags, as stored in the info field */
+#define EVENT_ILLEGAL_DEVTAB_ENTRY  (0x1U << 24)
+#define EVENT_IOPF                  (0x2U << 24)
+#define   EVENT_IOPF_I              (1U << 3)
+#define   EVENT_IOPF_PR             (1U << 4)
+#define   EVENT_IOPF_RW             (1U << 5)
+#define   EVENT_IOPF_PE             (1U << 6)
+#define   EVENT_IOPF_RZ             (1U << 7)
+#define   EVENT_IOPF_TR             (1U << 8)
+#define EVENT_DEV_TAB_HW_ERROR      (0x3U << 24)
+#define EVENT_PAGE_TAB_HW_ERROR     (0x4U << 24)
+#define EVENT_ILLEGAL_COMMAND_ERROR (0x5U << 24)
+#define EVENT_COMMAND_HW_ERROR      (0x6U << 24)
+#define EVENT_IOTLB_INV_TIMEOUT     (0x7U << 24)
+#define EVENT_INVALID_DEV_REQUEST   (0x8U << 24)
+
+#define EVENT_LEN                   16
+
+typedef struct AMDIOMMUState {
+    PCIDevice                   dev;
+
+    int                         capab_offset;
+    unsigned char               *capab;
+
+    int                         mmio_index;
+    target_phys_addr_t          mmio_addr;
+    unsigned char               *mmio_buf;
+    int                         mmio_enabled;
+
+    int                         enabled;
+    int                         ats_enabled;
+
+    target_phys_addr_t          devtab;
+    size_t                      devtab_len;
+
+    target_phys_addr_t          cmdbuf;
+    int                         cmdbuf_enabled;
+    size_t                      cmdbuf_len;
+    size_t                      cmdbuf_head;
+    size_t                      cmdbuf_tail;
+    int                         completion_wait_intr;
+
+    target_phys_addr_t          evtlog;
+    int                         evtlog_enabled;
+    int                         evtlog_intr;
+    target_phys_addr_t          evtlog_len;
+    target_phys_addr_t          evtlog_head;
+    target_phys_addr_t          evtlog_tail;
+
+    target_phys_addr_t          excl_base;
+    target_phys_addr_t          excl_limit;
+    int                         excl_enabled;
+    int                         excl_allow;
+} AMDIOMMUState;
+
+typedef struct AMDIOMMUEvent {
+    uint16_t    devfn;
+    uint16_t    reserved;
+    uint16_t    domid;
+    uint16_t    info;
+    uint64_t    addr;
+} __attribute__((packed)) AMDIOMMUEvent;
+
+static void amd_iommu_completion_wait(AMDIOMMUState *st,
+                                      uint8_t *cmd)
+{
+    uint64_t addr;
+
+    if (cmd[0] & 1) {
+        addr = le64_to_cpu(*(uint64_t *) cmd) & 0xFFFFFFFFFFFF8;
+        cpu_physical_memory_write(addr, cmd + 8, 8);
+    }
+
+    if (cmd[0] & 2)
+        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
+}
+
+static void amd_iommu_invalidate_iotlb(AMDIOMMUState *st,
+                                       uint8_t *cmd)
+{
+    PCIDevice *dev;
+    PCIBus *bus = st->dev.bus;
+    int bus_num = pci_bus_num(bus);
+    int devfn = le16_to_cpu(*(uint16_t *) cmd);
+
+    dev = pci_find_device(bus, bus_num, PCI_SLOT(devfn), PCI_FUNC(devfn));
+    if (dev) {
+        pci_memory_invalidate_range(dev, 0, -1);
+    }
+}
+
+static void amd_iommu_cmdbuf_run(AMDIOMMUState *st)
+{
+    uint8_t cmd[16];
+    int type;
+
+    if (!st->cmdbuf_enabled) {
+        return;
+    }
+
+    /* Check if there's work to do. */
+    if (st->cmdbuf_head == st->cmdbuf_tail) {
+        return;
+    }
+
+    cpu_physical_memory_read(st->cmdbuf + st->cmdbuf_head, cmd, 16);
+    type = cmd[CMDBUF_ID_BYTE] >> CMDBUF_ID_RSHIFT;
+    switch (type) {
+        case CMD_COMPLETION_WAIT:
+            amd_iommu_completion_wait(st, cmd);
+            break;
+        case CMD_INVAL_DEVTAB_ENTRY:
+            break;
+        case CMD_INVAL_IOMMU_PAGES:
+            break;
+        case CMD_INVAL_IOTLB_PAGES:
+            amd_iommu_invalidate_iotlb(st, cmd);
+            break;
+        case CMD_INVAL_INTR_TABLE:
+            break;
+        default:
+            break;
+    }
+
+    /* Increment and wrap head pointer. */
+    st->cmdbuf_head += CMDBUF_ENTRY_SIZE;
+    if (st->cmdbuf_head >= st->cmdbuf_len) {
+        st->cmdbuf_head = 0;
+    }
+}
+
+static uint32_t amd_iommu_mmio_buf_read(AMDIOMMUState *st,
+                                        size_t offset,
+                                        size_t size)
+{
+    ssize_t i;
+    uint32_t ret;
+
+    if (!size) {
+        return 0;
+    }
+
+    ret = st->mmio_buf[offset + size - 1];
+    for (i = size - 2; i >= 0; i--) {
+        ret <<= 8;
+        ret |= st->mmio_buf[offset + i];
+    }
+
+    return ret;
+}
+
+static void amd_iommu_mmio_buf_write(AMDIOMMUState *st,
+                                     size_t offset,
+                                     size_t size,
+                                     uint32_t val)
+{
+    size_t i;
+
+    for (i = 0; i < size; i++) {
+        st->mmio_buf[offset + i] = val & 0xFF;
+        val >>= 8;
+    }
+}
+
+static void amd_iommu_update_mmio(AMDIOMMUState *st,
+                                  target_phys_addr_t addr)
+{
+    size_t reg = addr & ~0x07;
+    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];
+    uint64_t val = le64_to_cpu(*base);
+
+    switch (reg) {
+        case MMIO_CONTROL:
+            st->enabled              = !!(val & MMIO_CONTROL_IOMMUEN);
+            st->ats_enabled          = !!(val & MMIO_CONTROL_HTTUNEN);
+            st->evtlog_enabled       = st->enabled &&
+                                       !!(val & MMIO_CONTROL_EVENTLOGEN);
+            st->evtlog_intr          = !!(val & MMIO_CONTROL_EVENTINTEN);
+            st->completion_wait_intr = !!(val & MMIO_CONTROL_COMWAITINTEN);
+            st->cmdbuf_enabled       = st->enabled &&
+                                       !!(val & MMIO_CONTROL_CMDBUFEN);
+
+            /* Update status flags depending on the control register. */
+            if (st->cmdbuf_enabled) {
+                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_CMDBUF_RUN;
+            } else {
+                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_CMDBUF_RUN;
+            }
+            if (st->evtlog_enabled) {
+                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_RUN;
+            } else {
+                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_EVTLOG_RUN;
+            }
+
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_DEVICE_TABLE:
+            st->devtab = (target_phys_addr_t) (val & MMIO_DEVTAB_BASE_MASK);
+            st->devtab_len = ((val & MMIO_DEVTAB_SIZE_MASK) + 1) *
+                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
+            break;
+        case MMIO_COMMAND_BASE:
+            st->cmdbuf = (target_phys_addr_t) (val & MMIO_CMDBUF_BASE_MASK);
+            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
+                                     MMIO_CMDBUF_SIZE_MASK);
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_HEAD:
+            st->cmdbuf_head = val & MMIO_CMDBUF_HEAD_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_TAIL:
+            st->cmdbuf_tail = val & MMIO_CMDBUF_TAIL_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_EVENT_BASE:
+            st->evtlog = (target_phys_addr_t) (val & MMIO_EVTLOG_BASE_MASK);
+            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
+                                     MMIO_EVTLOG_SIZE_MASK);
+            break;
+        case MMIO_EVENT_HEAD:
+            st->evtlog_head = val & MMIO_EVTLOG_HEAD_MASK;
+            break;
+        case MMIO_EVENT_TAIL:
+            st->evtlog_tail = val & MMIO_EVTLOG_TAIL_MASK;
+            break;
+        case MMIO_EXCL_BASE:
+            st->excl_base = (target_phys_addr_t) (val & MMIO_EXCL_BASE_MASK);
+            st->excl_enabled = val & MMIO_EXCL_ENABLED_MASK;
+            st->excl_allow = val & MMIO_EXCL_ALLOW_MASK;
+            break;
+        case MMIO_EXCL_LIMIT:
+            st->excl_limit = (target_phys_addr_t) ((val & MMIO_EXCL_LIMIT_MASK) |
+                                                   MMIO_EXCL_LIMIT_LOW);
+            break;
+        default:
+            break;
+    }
+}
+
+static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 1);
+}
+
+static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 2);
+}
+
+static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 4);
+}
+
+static void amd_iommu_mmio_writeb(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 1, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writew(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 2, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writel(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 4, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
+    amd_iommu_mmio_readb,
+    amd_iommu_mmio_readw,
+    amd_iommu_mmio_readl,
+};
+
+static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
+    amd_iommu_mmio_writeb,
+    amd_iommu_mmio_writew,
+    amd_iommu_mmio_writel,
+};
+
+static void amd_iommu_enable_mmio(AMDIOMMUState *st)
+{
+    target_phys_addr_t addr;
+    uint8_t *capab_wmask = st->dev.wmask + st->capab_offset;
+
+    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
+                                            amd_iommu_mmio_write, st);
+    if (st->mmio_index < 0) {
+        return;
+    }
+
+    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;
+    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
+
+    st->mmio_addr = addr;
+    st->mmio_enabled = 1;
+
+    /* Further changes to the capability are prohibited. */
+    memset(capab_wmask + CAPAB_BAR_LOW, 0x00, CAPAB_REG_SIZE);
+    memset(capab_wmask + CAPAB_BAR_HIGH, 0x00, CAPAB_REG_SIZE);
+}
+
+static void amd_iommu_write_capab(PCIDevice *dev,
+                                  uint32_t addr, uint32_t val, int len)
+{
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
+
+    pci_default_write_config(dev, addr, val, len);
+
+    if (!st->mmio_enabled && st->capab[CAPAB_BAR_LOW] & 0x1) {
+        amd_iommu_enable_mmio(st);
+    }
+}
+
+static void amd_iommu_reset(DeviceState *dev)
+{
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev.qdev, dev);
+    unsigned char *capab = st->capab;
+    uint8_t *capab_wmask = st->dev.wmask + st->capab_offset;
+
+    st->enabled      = 0;
+    st->ats_enabled  = 0;
+    st->mmio_enabled = 0;
+
+    capab[CAPAB_REV_TYPE]  = CAPAB_INIT_REV | CAPAB_INIT_TYPE;
+    capab[CAPAB_FLAGS]     = CAPAB_INIT_FLAGS;
+    capab[CAPAB_BAR_LOW]   = 0;
+    capab[CAPAB_BAR_HIGH]  = 0;
+    capab[CAPAB_RANGE]     = 0;
+    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
+
+    /* Changes to the capability are allowed after system reset. */
+    memset(capab_wmask + CAPAB_BAR_LOW, 0xFF, CAPAB_REG_SIZE);
+    memset(capab_wmask + CAPAB_BAR_HIGH, 0xFF, CAPAB_REG_SIZE);
+
+    memset(st->mmio_buf, 0, MMIO_SIZE);
+    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
+    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
+}
+
+static void amd_iommu_log_event(AMDIOMMUState *st, AMDIOMMUEvent *evt)
+{
+    if (!st->evtlog_enabled ||
+        (st->mmio_buf[MMIO_STATUS] & MMIO_STATUS_EVTLOG_OF)) {
+        return;
+    }
+
+    if (st->evtlog_tail >= st->evtlog_len) {
+        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_OF;
+    }
+
+    cpu_physical_memory_write(st->evtlog + st->evtlog_tail,
+                              (uint8_t *) evt, EVENT_LEN);
+
+    st->evtlog_tail += EVENT_LEN;
+    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_INTR;
+}
+
+static void amd_iommu_page_fault(AMDIOMMUState *st,
+                                 int devfn,
+                                 unsigned domid,
+                                 target_phys_addr_t addr,
+                                 int present,
+                                 int is_write)
+{
+    AMDIOMMUEvent evt;
+    unsigned info;
+
+    evt.devfn = cpu_to_le16(devfn);
+    evt.reserved = 0;
+    evt.domid = cpu_to_le16(domid);
+    evt.addr = cpu_to_le64(addr);
+
+    info = EVENT_IOPF;
+    if (present) {
+        info |= EVENT_IOPF_PR;
+    }
+    if (is_write) {
+        info |= EVENT_IOPF_RW;
+    }
+    evt.info = cpu_to_le16(info);
+
+    amd_iommu_log_event(st, &evt);
+}
+
+static inline uint64_t amd_iommu_get_perms(uint64_t entry)
+{
+    return (entry & (DEV_PERM_READ | DEV_PERM_WRITE)) >> DEV_PERM_SHIFT;
+}
+
+static int amd_iommu_translate(PCIDevice *iommu,
+                               PCIDevice *dev,
+                               pcibus_t addr,
+                               target_phys_addr_t *paddr,
+                               target_phys_addr_t *len,
+                               unsigned perms)
+{
+    int devfn, present;
+    target_phys_addr_t entry_addr, pte_addr;
+    uint64_t entry[4], pte, page_offset, pte_perms;
+    unsigned level, domid;
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, iommu);
+
+    if (!st->enabled) {
+        goto no_translation;
+    }
+
+    /* Get device table entry. */
+    devfn = dev->devfn;
+    entry_addr = st->devtab + devfn * DEVTAB_ENTRY_SIZE;
+    cpu_physical_memory_read(entry_addr, (uint8_t *) entry, 32);
+
+    pte = entry[0];
+    if (!(pte & DEV_VALID) || !(pte & DEV_TRANSLATION_VALID)) {
+        goto no_translation;
+    }
+    domid = entry[1] & DEV_DOMAIN_ID_MASK;
+    level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    while (level > 0) {
+        /*
+         * Check permissions: the bitwise
+         * implication perms -> pte_perms must be true.
+         */
+        pte_perms = amd_iommu_get_perms(pte);
+        present = pte & 1;
+        if (!present || perms != (perms & pte_perms)) {
+            amd_iommu_page_fault(st, devfn, domid, addr,
+                                 present, !!(perms & IOMMU_PERM_WRITE));
+            return -EPERM;
+        }
+
+        /* Go to the next lower level. */
+        pte_addr = pte & DEV_PT_ROOT_MASK;
+        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
+        pte = ldq_phys(pte_addr);
+        level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    }
+    page_offset = addr & 4095;
+    *paddr = (pte & DEV_PT_ROOT_MASK) + page_offset;
+    *len = 4096 - page_offset;
+
+    return 0;
+
+no_translation:
+    *paddr = addr;
+    *len = -1;
+    return 0;
+}
+
+static int amd_iommu_pci_initfn(PCIDevice *dev)
+{
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
+
+    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
+    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
+    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
+
+    /* Secure Device capability */
+    st->capab_offset = pci_add_capability(&st->dev,
+                                          PCI_CAP_ID_SEC,
+                                          CAPAB_SIZE);
+    st->capab = st->dev.config + st->capab_offset;
+    dev->config_write = amd_iommu_write_capab;
+
+    /* Allocate backing space for the MMIO registers. */
+    st->mmio_buf = qemu_malloc(MMIO_SIZE);
+
+    pci_register_iommu(dev, amd_iommu_translate);
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_amd_iommu = {
+    .name                       = "amd-iommu",
+    .version_id                 = 1,
+    .minimum_version_id         = 1,
+    .minimum_version_id_old     = 1,
+    .fields                     = (VMStateField []) {
+        VMSTATE_PCI_DEVICE(dev, AMDIOMMUState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static PCIDeviceInfo amd_iommu_pci_info = {
+    .qdev.name    = "amd-iommu",
+    .qdev.desc    = "AMD IOMMU",
+    .qdev.size    = sizeof(AMDIOMMUState),
+    .qdev.reset   = amd_iommu_reset,
+    .qdev.vmsd    = &vmstate_amd_iommu,
+    .init         = amd_iommu_pci_initfn,
+};
+
+static void amd_iommu_register(void)
+{
+    pci_qdev_register(&amd_iommu_pci_info);
+}
+
+device_init(amd_iommu_register);
diff --git a/hw/pc.c b/hw/pc.c
index a96187f..e2456b0 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1068,6 +1068,8 @@ void pc_pci_device_init(PCIBus *pci_bus)
     int max_bus;
     int bus;
 
+    pci_create_simple(pci_bus, -1, "amd-iommu");
+
     max_bus = drive_get_max_bus(IF_SCSI);
     for (bus = 0; bus <= max_bus; bus++) {
         pci_create_simple(pci_bus, -1, "lsi53c895a");
diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index 39e9f1d..d790312 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -26,6 +26,7 @@
 
 #define PCI_CLASS_MEMORY_RAM             0x0500
 
+#define PCI_CLASS_SYSTEM_IOMMU           0x0806
 #define PCI_CLASS_SYSTEM_OTHER           0x0880
 
 #define PCI_CLASS_SERIAL_USB             0x0c03
@@ -56,6 +57,7 @@
 
 #define PCI_VENDOR_ID_AMD                0x1022
 #define PCI_DEVICE_ID_AMD_LANCE          0x2000
+#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */
 
 #define PCI_VENDOR_ID_MOTOROLA           0x1057
 #define PCI_DEVICE_ID_MOTOROLA_MPC106    0x0002
diff --git a/hw/pci_regs.h b/hw/pci_regs.h
index 0f9f84c..6695e41 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -209,6 +209,7 @@
 #define  PCI_CAP_ID_SHPC        0x0C    /* PCI Standard Hot-Plug Controller */
 #define  PCI_CAP_ID_SSVID       0x0D    /* Bridge subsystem vendor/device ID */
 #define  PCI_CAP_ID_AGP3        0x0E    /* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_SEC         0x0F    /* Secure Device (AMD IOMMU) */
 #define  PCI_CAP_ID_EXP         0x10    /* PCI Express */
 #define  PCI_CAP_ID_MSIX        0x11    /* MSI-X */
 #define  PCI_CAP_ID_AF          0x13    /* PCI Advanced Features */
-- 
1.7.1

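The table walk in amd_iommu_translate() above can be tried in isolation
with a small model. What follows is only a sketch: the TOY_* bit
positions, the toy_mem array and the toy_* helpers are assumptions made
for illustration, since the patch's DEV_* macros are defined outside the
part of the file shown here. As in the patch, each level contributes 9
index bits, the low 12 bits are the page offset, and an entry is checked
just before descending through it, so the final entry is used without a
check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout, for this sketch only. */
#define TOY_PRESENT     (1ULL << 0)
#define TOY_MODE_SHIFT  9
#define TOY_MODE_MASK   0x7ULL
#define TOY_ADDR_MASK   0x000FFFFFFFFFF000ULL
#define TOY_PERM_SHIFT  61
#define TOY_PERM_READ   1u
#define TOY_PERM_WRITE  2u

/* Toy "guest memory": two 4 KiB pages, addressed by byte offset. */
static uint64_t toy_mem[1024];

static uint64_t toy_ldq(uint64_t pa)
{
    assert(pa / 8 < sizeof(toy_mem) / sizeof(toy_mem[0]));
    return toy_mem[pa / 8];
}

static unsigned toy_perms(uint64_t entry)
{
    return (unsigned) ((entry >> TOY_PERM_SHIFT) & 3);
}

/* Returns 0 and fills *pa on success, -1 on a permission/presence fault. */
static int toy_translate(uint64_t root, uint64_t iova,
                         unsigned perms, uint64_t *pa)
{
    uint64_t pte = root;
    unsigned level = (unsigned) ((pte >> TOY_MODE_SHIFT) & TOY_MODE_MASK);

    assert(pa != NULL);

    while (level > 0) {
        /* Fault if any requested permission bit is missing in the entry. */
        if (!(pte & TOY_PRESENT) || (perms & ~toy_perms(pte)) != 0) {
            return -1;
        }

        /* Level N is indexed by address bits [12 + 9*(N-1), 12 + 9*N). */
        uint64_t table = pte & TOY_ADDR_MASK;
        uint64_t index = (iova >> (3 + 9 * level)) & 0x1FF;

        pte = toy_ldq(table + index * 8);
        level = (unsigned) ((pte >> TOY_MODE_SHIFT) & TOY_MODE_MASK);
    }

    *pa = (pte & TOY_ADDR_MASK) + (iova & 0xFFF);
    return 0;
}
```

With a one-level table at 0x1000, an I/O address of 0x3abc selects index 3
and keeps the page offset 0xabc. The test `perms & ~pte_perms` is the same
condition as the patch's `perms != (perms & pte_perms)`, written the other
way around.
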
* [PATCH 4/7] ide: use the PCI memory access interface
  2010-08-28 14:54 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-28 14:54   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-28 14:54 UTC (permalink / raw)
  To: mst
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

Emulated PCI IDE controllers now use the PCI memory access interface,
which also lets an emulated IOMMU translate and check their accesses.

Invalidating a mapping cancels any DMA transfer that is using it. Since
the guest OS cannot reliably recover the results of a DMA transfer whose
mapping changes while it is in flight, this is a fairly good approximation.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 dma-helpers.c     |   46 +++++++++++++++++++++++++++++++++++++++++-----
 dma.h             |   21 ++++++++++++++++++++-
 hw/ide/core.c     |   15 ++++++++-------
 hw/ide/internal.h |   39 +++++++++++++++++++++++++++++++++++++++
 hw/ide/macio.c    |    4 ++--
 hw/ide/pci.c      |    7 +++++++
 6 files changed, 117 insertions(+), 15 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index 712ed89..a0dcdb8 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -10,12 +10,36 @@
 #include "dma.h"
 #include "block_int.h"
 
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint)
+static void *qemu_sglist_default_map(void *opaque,
+                                     QEMUSGInvalMapFunc *inval_cb,
+                                     void *inval_opaque,
+                                     target_phys_addr_t addr,
+                                     target_phys_addr_t *len,
+                                     int is_write)
+{
+    return cpu_physical_memory_map(addr, len, is_write);
+}
+
+static void qemu_sglist_default_unmap(void *opaque,
+                                      void *buffer,
+                                      target_phys_addr_t len,
+                                      int is_write,
+                                      target_phys_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+}
+
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint,
+                      QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap, void *opaque)
 {
     qsg->sg = qemu_malloc(alloc_hint * sizeof(ScatterGatherEntry));
     qsg->nsg = 0;
     qsg->nalloc = alloc_hint;
     qsg->size = 0;
+
+    qsg->map = map ? map : qemu_sglist_default_map;
+    qsg->unmap = unmap ? unmap : qemu_sglist_default_unmap;
+    qsg->opaque = opaque;
 }
 
 void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
@@ -73,12 +97,23 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
     int i;
 
     for (i = 0; i < dbs->iov.niov; ++i) {
-        cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
-                                  dbs->iov.iov[i].iov_len, !dbs->is_write,
-                                  dbs->iov.iov[i].iov_len);
+        dbs->sg->unmap(dbs->sg->opaque,
+                       dbs->iov.iov[i].iov_base,
+                       dbs->iov.iov[i].iov_len, !dbs->is_write,
+                       dbs->iov.iov[i].iov_len);
     }
 }
 
+static void dma_bdrv_cancel(void *opaque)
+{
+    DMAAIOCB *dbs = opaque;
+
+    bdrv_aio_cancel(dbs->acb);
+    dma_bdrv_unmap(dbs);
+    qemu_iovec_destroy(&dbs->iov);
+    qemu_aio_release(dbs);
+}
+
 static void dma_bdrv_cb(void *opaque, int ret)
 {
     DMAAIOCB *dbs = (DMAAIOCB *)opaque;
@@ -100,7 +135,8 @@ static void dma_bdrv_cb(void *opaque, int ret)
     while (dbs->sg_cur_index < dbs->sg->nsg) {
         cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
         cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
-        mem = cpu_physical_memory_map(cur_addr, &cur_len, !dbs->is_write);
+        mem = dbs->sg->map(dbs->sg->opaque, dma_bdrv_cancel, dbs,
+                           cur_addr, &cur_len, !dbs->is_write);
         if (!mem)
             break;
         qemu_iovec_add(&dbs->iov, mem, cur_len);
diff --git a/dma.h b/dma.h
index f3bb275..d48f35c 100644
--- a/dma.h
+++ b/dma.h
@@ -15,6 +15,19 @@
 #include "hw/hw.h"
 #include "block.h"
 
+typedef void QEMUSGInvalMapFunc(void *opaque);
+typedef void *QEMUSGMapFunc(void *opaque,
+                            QEMUSGInvalMapFunc *inval_cb,
+                            void *inval_opaque,
+                            target_phys_addr_t addr,
+                            target_phys_addr_t *len,
+                            int is_write);
+typedef void QEMUSGUnmapFunc(void *opaque,
+                             void *buffer,
+                             target_phys_addr_t len,
+                             int is_write,
+                             target_phys_addr_t access_len);
+
 typedef struct {
     target_phys_addr_t base;
     target_phys_addr_t len;
@@ -25,9 +38,15 @@ typedef struct {
     int nsg;
     int nalloc;
     target_phys_addr_t size;
+
+    QEMUSGMapFunc *map;
+    QEMUSGUnmapFunc *unmap;
+    void *opaque;
 } QEMUSGList;
 
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint,
+                      QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap,
+                      void *opaque);
 void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
                      target_phys_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
diff --git a/hw/ide/core.c b/hw/ide/core.c
index af52c2c..024a125 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -436,7 +436,8 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
     } prd;
     int l, len;
 
-    qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1);
+    qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1,
+                     bm->map, bm->unmap, bm->opaque);
     s->io_buffer_size = 0;
     for(;;) {
         if (bm->cur_prd_len == 0) {
@@ -444,7 +445,7 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
             if (bm->cur_prd_last ||
                 (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
                 return s->io_buffer_size != 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            bmdma_memory_read(bm, bm->cur_addr, (uint8_t *)&prd, 8);
             bm->cur_addr += 8;
             prd.addr = le32_to_cpu(prd.addr);
             prd.size = le32_to_cpu(prd.size);
@@ -527,7 +528,7 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
             if (bm->cur_prd_last ||
                 (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
                 return 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            bmdma_memory_read(bm, bm->cur_addr, (uint8_t *)&prd, 8);
             bm->cur_addr += 8;
             prd.addr = le32_to_cpu(prd.addr);
             prd.size = le32_to_cpu(prd.size);
@@ -542,11 +543,11 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
             l = bm->cur_prd_len;
         if (l > 0) {
             if (is_write) {
-                cpu_physical_memory_write(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
+                bmdma_memory_write(bm, bm->cur_prd_addr,
+                                   s->io_buffer + s->io_buffer_index, l);
             } else {
-                cpu_physical_memory_read(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
+                bmdma_memory_read(bm, bm->cur_prd_addr,
+                                  s->io_buffer + s->io_buffer_index, l);
             }
             bm->cur_prd_addr += l;
             bm->cur_prd_len -= l;
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 4165543..f686d38 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -477,6 +477,24 @@ struct IDEDeviceInfo {
 #define BM_CMD_START     0x01
 #define BM_CMD_READ      0x08
 
+typedef void BMDMAInvalMapFunc(void *opaque);
+typedef void BMDMARWFunc(void *opaque,
+                         target_phys_addr_t addr,
+                         uint8_t *buf,
+                         target_phys_addr_t len,
+                         int is_write);
+typedef void *BMDMAMapFunc(void *opaque,
+                           BMDMAInvalMapFunc *inval_cb,
+                           void *inval_opaque,
+                           target_phys_addr_t addr,
+                           target_phys_addr_t *len,
+                           int is_write);
+typedef void BMDMAUnmapFunc(void *opaque,
+                            void *buffer,
+                            target_phys_addr_t len,
+                            int is_write,
+                            target_phys_addr_t access_len);
+
 struct BMDMAState {
     uint8_t cmd;
     uint8_t status;
@@ -496,8 +514,29 @@ struct BMDMAState {
     int64_t sector_num;
     uint32_t nsector;
     QEMUBH *bh;
+
+    BMDMARWFunc *rw;
+    BMDMAMapFunc *map;
+    BMDMAUnmapFunc *unmap;
+    void *opaque;
 };
 
+static inline void bmdma_memory_read(BMDMAState *bm,
+                                     target_phys_addr_t addr,
+                                     uint8_t *buf,
+                                     target_phys_addr_t len)
+{
+    bm->rw(bm->opaque, addr, buf, len, 0);
+}
+
+static inline void bmdma_memory_write(BMDMAState *bm,
+                                      target_phys_addr_t addr,
+                                      uint8_t *buf,
+                                      target_phys_addr_t len)
+{
+    bm->rw(bm->opaque, addr, buf, len, 1);
+}
+
 static inline IDEState *idebus_active_if(IDEBus *bus)
 {
     return bus->ifs + bus->unit;
diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index bd1c73e..962ae13 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -79,7 +79,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
 
     s->io_buffer_size = io->len;
 
-    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
+    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL, NULL, NULL);
     qemu_sglist_add(&s->sg, io->addr, io->len);
     io->addr += io->len;
     io->len = 0;
@@ -141,7 +141,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
     s->io_buffer_index = 0;
     s->io_buffer_size = io->len;
 
-    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
+    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL, NULL, NULL);
     qemu_sglist_add(&s->sg, io->addr, io->len);
     io->addr += io->len;
     io->len = 0;
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 4d95cc5..5879044 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -183,4 +183,11 @@ void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
             continue;
         ide_create_drive(d->bus+bus[i], unit[i], hd_table[i]);
     }
+
+    for (i = 0; i < 2; i++) {
+        d->bmdma[i].rw = (void *) pci_memory_rw;
+        d->bmdma[i].map = (void *) pci_memory_map;
+        d->bmdma[i].unmap = (void *) pci_memory_unmap;
+        d->bmdma[i].opaque = dev;
+    }
 }
-- 
1.7.1

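The callback scheme that this patch adds to qemu_sglist_init() and
dma_bdrv_cb() can be modelled in a few lines. This is a hedged sketch,
not the QEMU code: every Toy*/toy_* name is hypothetical, guest memory
is a static array, and only the map side is shown (unmap follows the
same fallback pattern). It shows two things: a NULL callback falls back
to a direct mapping, and a translating mapper can keep the invalidation
callback it was handed and later use it to cancel the transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void ToyInvalFunc(void *opaque);
typedef void *ToyMapFunc(void *opaque, ToyInvalFunc *inval_cb,
                         void *inval_opaque, uint64_t addr,
                         uint64_t *len, int is_write);

typedef struct ToySGList {
    ToyMapFunc *map;
    void *opaque;
} ToySGList;

static uint8_t toy_ram[4096];

/* Default mapper: direct access, invalidation callback ignored. */
static void *toy_default_map(void *opaque, ToyInvalFunc *inval_cb,
                             void *inval_opaque, uint64_t addr,
                             uint64_t *len, int is_write)
{
    (void) opaque; (void) inval_cb; (void) inval_opaque; (void) is_write;

    if (addr >= sizeof(toy_ram)) {
        return NULL;
    }
    if (*len > sizeof(toy_ram) - addr) {
        *len = sizeof(toy_ram) - addr;      /* clamp to what is mappable */
    }
    return toy_ram + addr;
}

/* A translating mapper remembers who to notify on invalidation. */
typedef struct ToyIOMMU {
    ToyInvalFunc *inval_cb;
    void *inval_opaque;
} ToyIOMMU;

static void *toy_iommu_map(void *opaque, ToyInvalFunc *inval_cb,
                           void *inval_opaque, uint64_t addr,
                           uint64_t *len, int is_write)
{
    ToyIOMMU *mmu = opaque;

    mmu->inval_cb = inval_cb;
    mmu->inval_opaque = inval_opaque;
    return toy_default_map(NULL, NULL, NULL, addr, len, is_write);
}

static void toy_iommu_invalidate(ToyIOMMU *mmu)
{
    ToyInvalFunc *cb = mmu->inval_cb;

    mmu->inval_cb = NULL;                   /* fire at most once */
    if (cb) {
        cb(mmu->inval_opaque);
    }
}

static void toy_sglist_init(ToySGList *sg, ToyMapFunc *map, void *opaque)
{
    assert(sg != NULL);
    sg->map = map ? map : toy_default_map;  /* NULL selects the default */
    sg->opaque = opaque;
}

/* Stand-in for an in-flight transfer and its cancel hook. */
typedef struct ToyTransfer {
    int cancelled;
} ToyTransfer;

static void toy_transfer_cancel(void *opaque)
{
    ToyTransfer *xfer = opaque;

    xfer->cancelled = 1;
}
```

Callers that do not sit behind an IOMMU pass NULL and keep the old
behaviour, which is what the macio hunks in this patch do.
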

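bmdma_memory_read() and bmdma_memory_write() above are thin wrappers
around a single rw callback that takes an is_write flag. The sketch
below models that dispatch with hypothetical Toy*/toy_* names. It makes
one assumption about code that is not shown in this message: that the
PCI helper takes a typed device pointer as its first argument. Under
that assumption, a small trampoline with the exact callback signature
is one alternative to storing the helper through a (void *) cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a device with typed memory access helpers. */
typedef struct ToyDevice {
    uint8_t ram[64];
    unsigned accesses;
} ToyDevice;

static void toy_device_rw(ToyDevice *dev, uint64_t addr,
                          uint8_t *buf, uint64_t len, int is_write)
{
    if (addr > sizeof(dev->ram) || len > sizeof(dev->ram) - addr) {
        return;                             /* out of range: ignore */
    }
    if (is_write) {
        memcpy(dev->ram + addr, buf, (size_t) len);
    } else {
        memcpy(buf, dev->ram + addr, (size_t) len);
    }
    dev->accesses++;
}

/* Generic callback type, shaped like BMDMARWFunc in the patch. */
typedef void ToyRWFunc(void *opaque, uint64_t addr,
                       uint8_t *buf, uint64_t len, int is_write);

/* Trampoline: matches ToyRWFunc exactly, then forwards to the helper. */
static void toy_rw_trampoline(void *opaque, uint64_t addr,
                              uint8_t *buf, uint64_t len, int is_write)
{
    ToyDevice *dev = opaque;

    toy_device_rw(dev, addr, buf, len, is_write);
}

typedef struct ToyBMDMA {
    ToyRWFunc *rw;
    void *opaque;
} ToyBMDMA;

static inline void toy_bmdma_read(ToyBMDMA *bm, uint64_t addr,
                                  uint8_t *buf, uint64_t len)
{
    assert(bm->rw != NULL);
    bm->rw(bm->opaque, addr, buf, len, 0);
}

static inline void toy_bmdma_write(ToyBMDMA *bm, uint64_t addr,
                                   uint8_t *buf, uint64_t len)
{
    assert(bm->rw != NULL);
    bm->rw(bm->opaque, addr, buf, len, 1);
}
```

The trampoline costs one extra call but keeps the function pointer type
and the function it points to in agreement, so the compiler can check
the arguments.
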
* [PATCH 5/7] rtl8139: use the PCI memory access interface
  2010-08-28 14:54 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-28 14:54   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-28 14:54 UTC (permalink / raw)
  To: mst
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/rtl8139.c |   99 ++++++++++++++++++++++++++++++++-------------------------
 1 files changed, 56 insertions(+), 43 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index d92981d..32dbff3 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -412,12 +412,6 @@ typedef struct RTL8139TallyCounters
     uint16_t   TxUndrn;
 } RTL8139TallyCounters;
 
-/* Clears all tally counters */
-static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
-
-/* Writes tally counters to specified physical memory address */
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* counters);
-
 typedef struct RTL8139State {
     PCIDevice dev;
     uint8_t phys[8]; /* mac address */
@@ -496,6 +490,14 @@ typedef struct RTL8139State {
 
 } RTL8139State;
 
+/* Clears all tally counters */
+static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
+
+/* Writes tally counters to specified physical memory address */
+static void
+RTL8139TallyCounters_physical_memory_write(RTL8139State *s,
+                                           target_phys_addr_t tc_addr);
+
 static void rtl8139_set_next_tctr_time(RTL8139State *s, int64_t current_time);
 
 static void prom9346_decode_command(EEprom9346 *eeprom, uint8_t command)
@@ -746,6 +748,8 @@ static int rtl8139_cp_transmitter_enabled(RTL8139State *s)
 
 static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 {
+    PCIDevice *dev = &s->dev;
+
     if (s->RxBufAddr + size > s->RxBufferSize)
     {
         int wrapped = MOD2(s->RxBufAddr + size, s->RxBufferSize);
@@ -757,15 +761,15 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 
             if (size > wrapped)
             {
-                cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                           buf, size-wrapped );
+                pci_memory_write(dev, s->RxBuf + s->RxBufAddr,
+                                 buf, size-wrapped);
             }
 
             /* reset buffer pointer */
             s->RxBufAddr = 0;
 
-            cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                       buf + (size-wrapped), wrapped );
+            pci_memory_write(dev, s->RxBuf + s->RxBufAddr,
+                             buf + (size-wrapped), wrapped);
 
             s->RxBufAddr = wrapped;
 
@@ -774,7 +778,7 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
     }
 
     /* non-wrapping path or overwrapping enabled */
-    cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, buf, size );
+    pci_memory_write(dev, s->RxBuf + s->RxBufAddr, buf, size);
 
     s->RxBufAddr += size;
 }
@@ -814,6 +818,7 @@ static int rtl8139_can_receive(VLANClientState *nc)
 static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_t size_, int do_interrupt)
 {
     RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    PCIDevice *dev = &s->dev;
     int size = size_;
 
     uint32_t packet_header = 0;
@@ -968,13 +973,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         uint32_t val, rxdw0,rxdw1,rxbufLO,rxbufHI;
 
-        cpu_physical_memory_read(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        pci_memory_read(dev, cplus_rx_ring_desc,    (uint8_t *)&val, 4);
         rxdw0 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        pci_memory_read(dev, cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
         rxdw1 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
+        pci_memory_read(dev, cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
         rxbufLO = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
+        pci_memory_read(dev, cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
         rxbufHI = le32_to_cpu(val);
 
         DEBUG_PRINT(("RTL8139: +++ C+ mode RX descriptor %d %08x %08x %08x %08x\n",
@@ -1019,7 +1024,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
         target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
 
         /* receive/copy to target memory */
-        cpu_physical_memory_write( rx_addr, buf, size );
+        pci_memory_write(dev, rx_addr, buf, size);
 
         if (s->CpCmd & CPlusRxChkSum)
         {
@@ -1032,7 +1037,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 #else
         val = 0;
 #endif
-        cpu_physical_memory_write( rx_addr+size, (uint8_t *)&val, 4);
+        pci_memory_write(dev, rx_addr + size, (uint8_t *)&val, 4);
 
 /* first segment of received packet flag */
 #define CP_RX_STATUS_FS (1<<29)
@@ -1081,9 +1086,9 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         /* update ring data */
         val = cpu_to_le32(rxdw0);
-        cpu_physical_memory_write(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        pci_memory_write(dev, cplus_rx_ring_desc,    (uint8_t *)&val, 4);
         val = cpu_to_le32(rxdw1);
-        cpu_physical_memory_write(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        pci_memory_write(dev, cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
 
         /* update tally counter */
         ++s->tally_counters.RxOk;
@@ -1279,50 +1284,54 @@ static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters)
     counters->TxUndrn = 0;
 }
 
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* tally_counters)
+static void
+RTL8139TallyCounters_physical_memory_write(RTL8139State *s,
+                                           target_phys_addr_t tc_addr)
 {
+    PCIDevice *dev = &s->dev;
+    RTL8139TallyCounters *tally_counters = &s->tally_counters;
     uint16_t val16;
     uint32_t val32;
     uint64_t val64;
 
     val64 = cpu_to_le64(tally_counters->TxOk);
-    cpu_physical_memory_write(tc_addr + 0,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 0,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOk);
-    cpu_physical_memory_write(tc_addr + 8,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 8,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->TxERR);
-    cpu_physical_memory_write(tc_addr + 16,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 16,    (uint8_t *)&val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxERR);
-    cpu_physical_memory_write(tc_addr + 24,    (uint8_t *)&val32, 4);
+    pci_memory_write(dev, tc_addr + 24,    (uint8_t *)&val32, 4);
 
     val16 = cpu_to_le16(tally_counters->MissPkt);
-    cpu_physical_memory_write(tc_addr + 28,    (uint8_t *)&val16, 2);
+    pci_memory_write(dev, tc_addr + 28,    (uint8_t *)&val16, 2);
 
     val16 = cpu_to_le16(tally_counters->FAE);
-    cpu_physical_memory_write(tc_addr + 30,    (uint8_t *)&val16, 2);
+    pci_memory_write(dev, tc_addr + 30,    (uint8_t *)&val16, 2);
 
     val32 = cpu_to_le32(tally_counters->Tx1Col);
-    cpu_physical_memory_write(tc_addr + 32,    (uint8_t *)&val32, 4);
+    pci_memory_write(dev, tc_addr + 32,    (uint8_t *)&val32, 4);
 
     val32 = cpu_to_le32(tally_counters->TxMCol);
-    cpu_physical_memory_write(tc_addr + 36,    (uint8_t *)&val32, 4);
+    pci_memory_write(dev, tc_addr + 36,    (uint8_t *)&val32, 4);
 
     val64 = cpu_to_le64(tally_counters->RxOkPhy);
-    cpu_physical_memory_write(tc_addr + 40,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 40,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOkBrd);
-    cpu_physical_memory_write(tc_addr + 48,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 48,    (uint8_t *)&val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxOkMul);
-    cpu_physical_memory_write(tc_addr + 56,    (uint8_t *)&val32, 4);
+    pci_memory_write(dev, tc_addr + 56,    (uint8_t *)&val32, 4);
 
     val16 = cpu_to_le16(tally_counters->TxAbt);
-    cpu_physical_memory_write(tc_addr + 60,    (uint8_t *)&val16, 2);
+    pci_memory_write(dev, tc_addr + 60,    (uint8_t *)&val16, 2);
 
     val16 = cpu_to_le16(tally_counters->TxUndrn);
-    cpu_physical_memory_write(tc_addr + 62,    (uint8_t *)&val16, 2);
+    pci_memory_write(dev, tc_addr + 62,    (uint8_t *)&val16, 2);
 }
 
 /* Loads values of tally counters from VM state file */
@@ -1758,6 +1767,8 @@ static void rtl8139_transfer_frame(RTL8139State *s, const uint8_t *buf, int size
 
 static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
 {
+    PCIDevice *dev = &s->dev;
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ cannot transmit from descriptor %d: transmitter disabled\n",
@@ -1780,7 +1791,7 @@ static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
     DEBUG_PRINT(("RTL8139: +++ transmit reading %d bytes from host memory at 0x%08x\n",
                  txsize, s->TxAddr[descriptor]));
 
-    cpu_physical_memory_read(s->TxAddr[descriptor], txbuffer, txsize);
+    pci_memory_read(dev, s->TxAddr[descriptor], txbuffer, txsize);
 
     /* Mark descriptor as transferred */
     s->TxStatus[descriptor] |= TxHostOwns;
@@ -1886,6 +1897,8 @@ static uint16_t ip_checksum(void *data, size_t len)
 
 static int rtl8139_cplus_transmit_one(RTL8139State *s)
 {
+    PCIDevice *dev = &s->dev;
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ C+ mode: transmitter disabled\n"));
@@ -1911,14 +1924,14 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     uint32_t val, txdw0,txdw1,txbufLO,txbufHI;
 
-    cpu_physical_memory_read(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    pci_memory_read(dev, cplus_tx_ring_desc,    (uint8_t *)&val, 4);
     txdw0 = le32_to_cpu(val);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
-    cpu_physical_memory_read(cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
+    pci_memory_read(dev, cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
     txdw1 = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
+    pci_memory_read(dev, cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
     txbufLO = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
+    pci_memory_read(dev, cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
     txbufHI = le32_to_cpu(val);
 
     DEBUG_PRINT(("RTL8139: +++ C+ mode TX descriptor %d %08x %08x %08x %08x\n",
@@ -2025,7 +2038,8 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
     DEBUG_PRINT(("RTL8139: +++ C+ mode transmit reading %d bytes from host memory at %016" PRIx64 " to offset %d\n",
                  txsize, (uint64_t)tx_addr, s->cplus_txbuffer_offset));
 
-    cpu_physical_memory_read(tx_addr, s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
+    pci_memory_read(dev, tx_addr,
+                    s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
     s->cplus_txbuffer_offset += txsize;
 
     /* seek to next Rx descriptor */
@@ -2052,10 +2066,10 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     /* update ring data */
     val = cpu_to_le32(txdw0);
-    cpu_physical_memory_write(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    pci_memory_write(dev, cplus_tx_ring_desc,    (uint8_t *)&val, 4);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
 //    val = cpu_to_le32(txdw1);
-//    cpu_physical_memory_write(cplus_tx_ring_desc+4,  &val, 4);
+//    pci_memory_write(dev, cplus_tx_ring_desc+4,  &val, 4);
 
     /* Now decide if descriptor being processed is holding the last segment of packet */
     if (txdw0 & CP_TX_LS)
@@ -2364,7 +2378,6 @@ static void rtl8139_transmit(RTL8139State *s)
 
 static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32_t val)
 {
-
     int descriptor = txRegOffset/4;
 
     /* handle C+ transmit mode register configuration */
@@ -2381,7 +2394,7 @@ static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32
             target_phys_addr_t tc_addr = rtl8139_addr64(s->TxStatus[0] & ~0x3f, s->TxStatus[1]);
 
             /* dump tally counters to specified memory location */
-            RTL8139TallyCounters_physical_memory_write( tc_addr, &s->tally_counters);
+            RTL8139TallyCounters_physical_memory_write(s, tc_addr);
 
             /* mark dump completed */
             s->TxStatus[0] &= ~0x8;
-- 
1.7.1
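
For readers following the tally counter dump in the hunk above: the thirteen
writes lay the counters out as one 64-byte little-endian block starting at
tc_addr. Below is a minimal standalone sketch of that layout. It is not QEMU
code: TallySketch, put_le() and tally_dump() are hypothetical names, a plain
byte array stands in for guest memory, and the field widths are inferred from
the val16/val32/val64 temporaries used in the dump function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the counters; widths inferred from the dump code. */
typedef struct TallySketch {
    uint64_t TxOk, RxOk, TxERR;
    uint32_t RxERR;
    uint16_t MissPkt, FAE;
    uint32_t Tx1Col, TxMCol;
    uint64_t RxOkPhy, RxOkBrd;
    uint32_t RxOkMul;
    uint16_t TxAbt, TxUndrn;
} TallySketch;

enum { TALLY_DUMP_SIZE = 64 };

/* Hypothetical helper: store the low `len` bytes of `v`, least significant first. */
static void put_le(uint8_t *dst, uint64_t v, size_t len)
{
    assert(len <= 8);
    for (size_t i = 0; i < len; i++) {
        dst[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Serialize the counters at the same offsets the patch writes to tc_addr. */
static void tally_dump(const TallySketch *c, uint8_t out[TALLY_DUMP_SIZE])
{
    put_le(out + 0,  c->TxOk,    8);
    put_le(out + 8,  c->RxOk,    8);
    put_le(out + 16, c->TxERR,   8);
    put_le(out + 24, c->RxERR,   4);
    put_le(out + 28, c->MissPkt, 2);
    put_le(out + 30, c->FAE,     2);
    put_le(out + 32, c->Tx1Col,  4);
    put_le(out + 36, c->TxMCol,  4);
    put_le(out + 40, c->RxOkPhy, 8);
    put_le(out + 48, c->RxOkBrd, 8);
    put_le(out + 56, c->RxOkMul, 4);
    put_le(out + 60, c->TxAbt,   2);
    put_le(out + 62, c->TxUndrn, 2);   /* last field ends at byte 64 */
}
```

Building the block in a local buffer like this would also allow a single
write to guest memory instead of thirteen, though the patch deliberately
keeps the existing per-field structure.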


* [PATCH 6/7] eepro100: use the PCI memory access interface
  2010-08-28 14:54 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-28 14:54   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-28 14:54 UTC (permalink / raw)
  To: mst
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

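Convert the eepro100 DMA paths from cpu_physical_memory_*(), ld*_phys() and
stw_phys() to the pci_memory_*(), pci_ld*() and pci_stw() helpers, so that the
device's memory accesses can be translated by an emulated IOMMU.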

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
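
Conceptually, the conversion replaces direct physical-memory access with
access that first passes the device's bus address through a per-device
translation step. The following toy model sketches that idea only. It is not
QEMU's implementation: ToyDevice, toy_translate_fn, toy_offset_translate(),
toy_dma_read() and the flat guest_ram array are all hypothetical names
invented for this note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { GUEST_RAM_SIZE = 4096 };
static uint8_t guest_ram[GUEST_RAM_SIZE];   /* stand-in for guest memory */

typedef struct ToyDevice ToyDevice;

/* Map a device (bus) address to a physical one; return 0 on success. */
typedef int (*toy_translate_fn)(ToyDevice *dev, uint64_t bus_addr,
                                uint64_t *phys_addr);

struct ToyDevice {
    toy_translate_fn translate;   /* NULL: no IOMMU, identity mapping */
    uint64_t window_base;         /* used by toy_offset_translate() only */
};

/* Hypothetical "IOMMU": shifts every bus address by a fixed base. */
static int toy_offset_translate(ToyDevice *dev, uint64_t bus_addr,
                                uint64_t *phys_addr)
{
    *phys_addr = bus_addr + dev->window_base;
    return 0;
}

/*
 * Device-side read. A real IOMMU translates page by page; this toy
 * translates once for the whole range to keep the sketch short.
 */
static int toy_dma_read(ToyDevice *dev, uint64_t bus_addr,
                        void *buf, size_t len)
{
    uint64_t phys = bus_addr;

    assert(dev != NULL && buf != NULL);
    if (dev->translate && dev->translate(dev, bus_addr, &phys) != 0) {
        return -1;   /* translation fault */
    }
    if (phys > GUEST_RAM_SIZE || len > GUEST_RAM_SIZE - phys) {
        return -1;   /* outside the toy RAM */
    }
    memcpy(buf, guest_ram + phys, len);
    return 0;
}
```

With translate left NULL the helper behaves like a direct physical read,
which is why a device converted this way should keep working on machines
without an IOMMU.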
 hw/eepro100.c |   86 ++++++++++++++++++++++++++++++--------------------------
 1 files changed, 46 insertions(+), 40 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 2b75c8f..5b7d82a 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -306,10 +306,10 @@ static const uint16_t eepro100_mdi_mask[] = {
 };
 
 /* XXX: optimize */
-static void stl_le_phys(target_phys_addr_t addr, uint32_t val)
+static void stl_le_phys(EEPRO100State * s, pcibus_t addr, uint32_t val)
 {
     val = cpu_to_le32(val);
-    cpu_physical_memory_write(addr, (const uint8_t *)&val, sizeof(val));
+    pci_memory_write(&s->dev, addr, (const uint8_t *)&val, sizeof(val));
 }
 
 #define POLYNOMIAL 0x04c11db6
@@ -692,12 +692,12 @@ static void dump_statistics(EEPRO100State * s)
      * values which really matter.
      * Number of data should check configuration!!!
      */
-    cpu_physical_memory_write(s->statsaddr,
-                              (uint8_t *) & s->statistics, s->stats_size);
-    stl_le_phys(s->statsaddr + 0, s->statistics.tx_good_frames);
-    stl_le_phys(s->statsaddr + 36, s->statistics.rx_good_frames);
-    stl_le_phys(s->statsaddr + 48, s->statistics.rx_resource_errors);
-    stl_le_phys(s->statsaddr + 60, s->statistics.rx_short_frame_errors);
+    pci_memory_write(&s->dev, s->statsaddr,
+                     (uint8_t *) & s->statistics, s->stats_size);
+    stl_le_phys(s, s->statsaddr + 0, s->statistics.tx_good_frames);
+    stl_le_phys(s, s->statsaddr + 36, s->statistics.rx_good_frames);
+    stl_le_phys(s, s->statsaddr + 48, s->statistics.rx_resource_errors);
+    stl_le_phys(s, s->statsaddr + 60, s->statistics.rx_short_frame_errors);
 #if 0
     stw_le_phys(s->statsaddr + 76, s->statistics.xmt_tco_frames);
     stw_le_phys(s->statsaddr + 78, s->statistics.rcv_tco_frames);
@@ -707,7 +707,8 @@ static void dump_statistics(EEPRO100State * s)
 
 static void read_cb(EEPRO100State *s)
 {
-    cpu_physical_memory_read(s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
+    pci_memory_read(&s->dev,
+                    s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
     s->tx.status = le16_to_cpu(s->tx.status);
     s->tx.command = le16_to_cpu(s->tx.command);
     s->tx.link = le32_to_cpu(s->tx.link);
@@ -737,18 +738,18 @@ static void tx_command(EEPRO100State *s)
     }
     assert(tcb_bytes <= sizeof(buf));
     while (size < tcb_bytes) {
-        uint32_t tx_buffer_address = ldl_phys(tbd_address);
-        uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
+        uint32_t tx_buffer_address = pci_ldl(&s->dev, tbd_address);
+        uint16_t tx_buffer_size = pci_lduw(&s->dev, tbd_address + 4);
 #if 0
-        uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+        uint16_t tx_buffer_el = pci_lduw(&s->dev, tbd_address + 6);
 #endif
         tbd_address += 8;
         TRACE(RXTX, logout
             ("TBD (simplified mode): buffer address 0x%08x, size 0x%04x\n",
              tx_buffer_address, tx_buffer_size));
         tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-        cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                 tx_buffer_size);
+        pci_memory_read(&s->dev,
+                        tx_buffer_address, &buf[size], tx_buffer_size);
         size += tx_buffer_size;
     }
     if (tbd_array == 0xffffffff) {
@@ -759,16 +760,16 @@ static void tx_command(EEPRO100State *s)
         if (s->has_extended_tcb_support && !(s->configuration[6] & BIT(4))) {
             /* Extended Flexible TCB. */
             for (; tbd_count < 2; tbd_count++) {
-                uint32_t tx_buffer_address = ldl_phys(tbd_address);
-                uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
-                uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+                uint32_t tx_buffer_address = pci_ldl(&s->dev, tbd_address);
+                uint16_t tx_buffer_size = pci_lduw(&s->dev, tbd_address + 4);
+                uint16_t tx_buffer_el = pci_lduw(&s->dev, tbd_address + 6);
                 tbd_address += 8;
                 TRACE(RXTX, logout
                     ("TBD (extended flexible mode): buffer address 0x%08x, size 0x%04x\n",
                      tx_buffer_address, tx_buffer_size));
                 tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-                cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                         tx_buffer_size);
+                pci_memory_read(&s->dev,
+                                tx_buffer_address, &buf[size], tx_buffer_size);
                 size += tx_buffer_size;
                 if (tx_buffer_el & 1) {
                     break;
@@ -777,16 +778,16 @@ static void tx_command(EEPRO100State *s)
         }
         tbd_address = tbd_array;
         for (; tbd_count < s->tx.tbd_count; tbd_count++) {
-            uint32_t tx_buffer_address = ldl_phys(tbd_address);
-            uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
-            uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+            uint32_t tx_buffer_address = pci_ldl(&s->dev, tbd_address);
+            uint16_t tx_buffer_size = pci_lduw(&s->dev, tbd_address + 4);
+            uint16_t tx_buffer_el = pci_lduw(&s->dev, tbd_address + 6);
             tbd_address += 8;
             TRACE(RXTX, logout
                 ("TBD (flexible mode): buffer address 0x%08x, size 0x%04x\n",
                  tx_buffer_address, tx_buffer_size));
             tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-            cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                     tx_buffer_size);
+            pci_memory_read(&s->dev,
+                            tx_buffer_address, &buf[size], tx_buffer_size);
             size += tx_buffer_size;
             if (tx_buffer_el & 1) {
                 break;
@@ -811,7 +812,7 @@ static void set_multicast_list(EEPRO100State *s)
     TRACE(OTHER, logout("multicast list, multicast count = %u\n", multicast_count));
     for (i = 0; i < multicast_count; i += 6) {
         uint8_t multicast_addr[6];
-        cpu_physical_memory_read(s->cb_address + 10 + i, multicast_addr, 6);
+        pci_memory_read(&s->dev, s->cb_address + 10 + i, multicast_addr, 6);
         TRACE(OTHER, logout("multicast entry %s\n", nic_dump(multicast_addr, 6)));
         unsigned mcast_idx = compute_mcast_idx(multicast_addr);
         assert(mcast_idx < 64);
@@ -845,12 +846,14 @@ static void action_command(EEPRO100State *s)
             /* Do nothing. */
             break;
         case CmdIASetup:
-            cpu_physical_memory_read(s->cb_address + 8, &s->conf.macaddr.a[0], 6);
+            pci_memory_read(&s->dev,
+                            s->cb_address + 8, &s->conf.macaddr.a[0], 6);
             TRACE(OTHER, logout("macaddr: %s\n", nic_dump(&s->conf.macaddr.a[0], 6)));
             break;
         case CmdConfigure:
-            cpu_physical_memory_read(s->cb_address + 8, &s->configuration[0],
-                                     sizeof(s->configuration));
+            pci_memory_read(&s->dev,
+                            s->cb_address + 8,
+                            &s->configuration[0], sizeof(s->configuration));
             TRACE(OTHER, logout("configuration: %s\n", nic_dump(&s->configuration[0], 16)));
             break;
         case CmdMulticastList:
@@ -880,7 +883,7 @@ static void action_command(EEPRO100State *s)
             break;
         }
         /* Write new status. */
-        stw_phys(s->cb_address, s->tx.status | ok_status | STATUS_C);
+        pci_stw(&s->dev, s->cb_address, s->tx.status | ok_status | STATUS_C);
         if (bit_i) {
             /* CU completed action. */
             eepro100_cx_interrupt(s);
@@ -947,7 +950,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats)\n", val));
         dump_statistics(s);
-        stl_le_phys(s->statsaddr + s->stats_size, 0xa005);
+        stl_le_phys(s, s->statsaddr + s->stats_size, 0xa005);
         break;
     case CU_CMD_BASE:
         /* Load CU base. */
@@ -958,7 +961,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump and reset statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats and reset)\n", val));
         dump_statistics(s);
-        stl_le_phys(s->statsaddr + s->stats_size, 0xa007);
+        stl_le_phys(s, s->statsaddr + s->stats_size, 0xa007);
         memset(&s->statistics, 0, sizeof(s->statistics));
         break;
     case CU_SRESUME:
@@ -1259,10 +1262,10 @@ static void eepro100_write_port(EEPRO100State * s, uint32_t val)
     case PORT_SELFTEST:
         TRACE(OTHER, logout("selftest address=0x%08x\n", address));
         eepro100_selftest_t data;
-        cpu_physical_memory_read(address, (uint8_t *) & data, sizeof(data));
+        pci_memory_read(&s->dev, address, (uint8_t *) & data, sizeof(data));
         data.st_sign = 0xffffffff;
         data.st_result = 0;
-        cpu_physical_memory_write(address, (uint8_t *) & data, sizeof(data));
+        pci_memory_write(&s->dev, address, (uint8_t *) & data, sizeof(data));
         break;
     case PORT_SELECTIVE_RESET:
         TRACE(OTHER, logout("selective reset, selftest address=0x%08x\n", address));
@@ -1721,8 +1724,9 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     }
     /* !!! */
     eepro100_rx_t rx;
-    cpu_physical_memory_read(s->ru_base + s->ru_offset, (uint8_t *) & rx,
-                             offsetof(eepro100_rx_t, packet));
+    pci_memory_read(&s->dev,
+                    s->ru_base + s->ru_offset,
+                    (uint8_t *) & rx, offsetof(eepro100_rx_t, packet));
     uint16_t rfd_command = le16_to_cpu(rx.command);
     uint16_t rfd_size = le16_to_cpu(rx.size);
 
@@ -1736,9 +1740,11 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     }
     TRACE(OTHER, logout("command 0x%04x, link 0x%08x, addr 0x%08x, size %u\n",
           rfd_command, rx.link, rx.rx_buf_addr, rfd_size));
-    stw_phys(s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, status),
-             rfd_status);
-    stw_phys(s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, count), size);
+    pci_stw(&s->dev,
+            s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, status),
+            rfd_status);
+    pci_stw(&s->dev,
+            s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, count), size);
     /* Early receive interrupt not supported. */
 #if 0
     eepro100_er_interrupt(s);
@@ -1752,8 +1758,8 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
 #if 0
     assert(!(s->configuration[17] & BIT(0)));
 #endif
-    cpu_physical_memory_write(s->ru_base + s->ru_offset +
-                              offsetof(eepro100_rx_t, packet), buf, size);
+    pci_memory_write(&s->dev, s->ru_base + s->ru_offset +
+                     offsetof(eepro100_rx_t, packet), buf, size);
     s->statistics.rx_good_frames++;
     eepro100_fr_interrupt(s);
     s->ru_offset = le32_to_cpu(rx.link);
-- 
1.7.1
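
For readers following the conversion, here is a minimal sketch of why the
accessors now take the device: with a device handle, the access path can
apply a per-device address translation before touching memory. Every name
below (ToyDevice, toy_dev_read, toy_offset_translate, toy_ram) is
hypothetical and chosen for illustration; this is not the pci_memory_read()
implementation from this series, only a toy model of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
#define TOY_RAM_SIZE 256

typedef struct ToyDevice ToyDevice;

/* Returns 0 and fills *out on success, -1 if the access is refused. */
typedef int (*toy_translate_fn)(ToyDevice *dev, uint64_t addr, uint64_t *out);

struct ToyDevice {
    toy_translate_fn translate;  /* NULL means no IOMMU: identity mapping */
    uint64_t window_base;        /* used by the toy translator below */
};

static uint8_t toy_ram[TOY_RAM_SIZE];

/* Toy translator: device address 0 maps to window_base in toy RAM. */
static int toy_offset_translate(ToyDevice *dev, uint64_t addr, uint64_t *out)
{
    *out = dev->window_base + addr;
    return 0;
}

/* Device-scoped read: the device handle selects the mapping to apply. */
static int toy_dev_read(ToyDevice *dev, uint64_t addr, uint8_t *buf, size_t len)
{
    uint64_t phys = addr;

    if (dev->translate && dev->translate(dev, addr, &phys) != 0) {
        return -1;
    }
    if (phys > TOY_RAM_SIZE || len > TOY_RAM_SIZE - phys) {
        return -1;  /* out of bounds in the toy RAM */
    }
    memcpy(buf, &toy_ram[phys], len);
    return 0;
}

/* Usage: the same device address yields different bytes per device. */
static void toy_demo(void)
{
    ToyDevice plain = { NULL, 0 };
    ToyDevice mapped = { toy_offset_translate, 64 };
    uint8_t b = 0;

    toy_ram[0] = 0x11;
    toy_ram[64] = 0x22;
    assert(toy_dev_read(&plain, 0, &b, 1) == 0 && b == 0x11);
    assert(toy_dev_read(&mapped, 0, &b, 1) == 0 && b == 0x22);
}
```

A global accessor that takes only an address has no way to tell which
device issued the access, which is why a per-device mapping needs the
extra argument.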


* [PATCH 7/7] ac97: use the PCI memory access interface
  2010-08-28 14:54 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-28 14:54   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-28 14:54 UTC (permalink / raw)
  To: mst
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Acked-by: malc <av1474@comtv.ru>
---
 hw/ac97.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/ac97.c b/hw/ac97.c
index d71072d..bad38fb 100644
--- a/hw/ac97.c
+++ b/hw/ac97.c
@@ -223,7 +223,7 @@ static void fetch_bd (AC97LinkState *s, AC97BusMasterRegs *r)
 {
     uint8_t b[8];
 
-    cpu_physical_memory_read (r->bdbar + r->civ * 8, b, 8);
+    pci_memory_read (&s->dev, r->bdbar + r->civ * 8, b, 8);
     r->bd_valid = 1;
     r->bd.addr = le32_to_cpu (*(uint32_t *) &b[0]) & ~3;
     r->bd.ctl_len = le32_to_cpu (*(uint32_t *) &b[4]);
@@ -972,7 +972,7 @@ static int write_audio (AC97LinkState *s, AC97BusMasterRegs *r,
     while (temp) {
         int copied;
         to_copy = audio_MIN (temp, sizeof (tmpbuf));
-        cpu_physical_memory_read (addr, tmpbuf, to_copy);
+        pci_memory_read (&s->dev, addr, tmpbuf, to_copy);
         copied = AUD_write (s->voice_po, tmpbuf, to_copy);
         dolog ("write_audio max=%x to_copy=%x copied=%x\n",
                max, to_copy, copied);
@@ -1056,7 +1056,7 @@ static int read_audio (AC97LinkState *s, AC97BusMasterRegs *r,
             *stop = 1;
             break;
         }
-        cpu_physical_memory_write (addr, tmpbuf, acquired);
+        pci_memory_write (&s->dev, addr, tmpbuf, acquired);
         temp -= acquired;
         addr += acquired;
         nread += acquired;
-- 
1.7.1
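
As an illustration of the descriptor fetch touched by the first hunk: once
the 8 bytes have been read through the device, they are decoded as two
little-endian 32-bit words, with the low two bits of the address masked
off. The sketch below does that decode byte by byte so it does not depend
on host endianness or alignment. ToyBD, toy_le32 and toy_decode_bd are
hypothetical names, not part of hw/ac97.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor type; field names mirror the quoted fetch_bd(). */
typedef struct ToyBD {
    uint32_t addr;
    uint32_t ctl_len;
} ToyBD;

/* Assemble a 32-bit value from 4 little-endian bytes, host-independent. */
static uint32_t toy_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode an 8-byte buffer descriptor as read from guest memory. */
static ToyBD toy_decode_bd(const uint8_t b[8])
{
    ToyBD bd;

    bd.addr = toy_le32(&b[0]) & ~(uint32_t)3;  /* low two bits masked off */
    bd.ctl_len = toy_le32(&b[4]);
    return bd;
}

/* Usage: a raw descriptor with a misaligned address field. */
static void toy_bd_demo(void)
{
    const uint8_t raw[8] = { 0x07, 0x10, 0x00, 0x00, 0x34, 0x12, 0x00, 0x80 };
    ToyBD bd = toy_decode_bd(raw);

    assert(bd.addr == 0x1004);
    assert(bd.ctl_len == 0x80001234);
}
```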



* Re: [PATCH 3/7] AMD IOMMU emulation
  2010-08-28 14:54   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-28 15:58     ` Blue Swirl
  -1 siblings, 0 replies; 96+ messages in thread
From: Blue Swirl @ 2010-08-28 15:58 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: mst, joro, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Sat, Aug 28, 2010 at 2:54 PM, Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro> wrote:
> This introduces emulation for the AMD IOMMU, described in "AMD I/O
> Virtualization Technology (IOMMU) Specification".
>
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  Makefile.target |    2 +-
>  hw/amd_iommu.c  |  663 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/pc.c         |    2 +
>  hw/pci_ids.h    |    2 +
>  hw/pci_regs.h   |    1 +
>  5 files changed, 669 insertions(+), 1 deletions(-)
>  create mode 100644 hw/amd_iommu.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 3ef4666..d4eeccd 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -195,7 +195,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
>  obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> -obj-i386-y += pc_piix.o
> +obj-i386-y += pc_piix.o amd_iommu.o
>
>  # shared objects
>  obj-ppc-y = ppc.o
> diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
> new file mode 100644
> index 0000000..43e0426
> --- /dev/null
> +++ b/hw/amd_iommu.c
> @@ -0,0 +1,663 @@
> +/*
> + * AMD IOMMU emulation
> + *
> + * Copyright (c) 2010 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "pc.h"
> +#include "hw.h"
> +#include "pci.h"
> +#include "qlist.h"
> +
> +/* Capability registers */
> +#define CAPAB_HEADER            0x00
> +#define   CAPAB_REV_TYPE        0x02
> +#define   CAPAB_FLAGS           0x03
> +#define CAPAB_BAR_LOW           0x04
> +#define CAPAB_BAR_HIGH          0x08
> +#define CAPAB_RANGE             0x0C
> +#define CAPAB_MISC              0x10
> +
> +#define CAPAB_SIZE              0x14
> +#define CAPAB_REG_SIZE          0x04
> +
> +/* Capability header data */
> +#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
> +#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
> +#define CAPAB_FLAG_NPCACHE      (1 << 2)
> +#define CAPAB_INIT_REV          (1 << 3)
> +#define CAPAB_INIT_TYPE         3
> +#define CAPAB_INIT_REV_TYPE     (CAPAB_INIT_REV | CAPAB_INIT_TYPE)
> +#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
> +#define CAPAB_INIT_MISC         ((64 << 15) | (48 << 8))
> +#define CAPAB_BAR_MASK          (~((1UL << 14) - 1))
> +
> +/* MMIO registers */
> +#define MMIO_DEVICE_TABLE       0x0000
> +#define MMIO_COMMAND_BASE       0x0008
> +#define MMIO_EVENT_BASE         0x0010
> +#define MMIO_CONTROL            0x0018
> +#define MMIO_EXCL_BASE          0x0020
> +#define MMIO_EXCL_LIMIT         0x0028
> +#define MMIO_COMMAND_HEAD       0x2000
> +#define MMIO_COMMAND_TAIL       0x2008
> +#define MMIO_EVENT_HEAD         0x2010
> +#define MMIO_EVENT_TAIL         0x2018
> +#define MMIO_STATUS             0x2020
> +
> +#define MMIO_SIZE               0x4000
> +
> +#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
> +#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
> +#define MMIO_DEVTAB_ENTRY_SIZE  32
> +#define MMIO_DEVTAB_SIZE_UNIT   4096
> +
> +#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
> +#define MMIO_CMDBUF_SIZE_MASK       0x0F
> +#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
> +#define MMIO_CMDBUF_DEFAULT_SIZE    8
> +#define MMIO_CMDBUF_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
> +#define MMIO_CMDBUF_TAIL_MASK       MMIO_CMDBUF_HEAD_MASK
> +
> +#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
> +#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
> +#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
> +#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
> +#define MMIO_EVTLOG_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
> +#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
> +
> +#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
> +#define MMIO_EXCL_ENABLED_MASK      (1ULL << 0)
> +#define MMIO_EXCL_ALLOW_MASK        (1ULL << 1)
> +#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
> +#define MMIO_EXCL_LIMIT_LOW         0xFFF
> +
> +#define MMIO_CONTROL_IOMMUEN        (1ULL << 0)
> +#define MMIO_CONTROL_HTTUNEN        (1ULL << 1)
> +#define MMIO_CONTROL_EVENTLOGEN     (1ULL << 2)
> +#define MMIO_CONTROL_EVENTINTEN     (1ULL << 3)
> +#define MMIO_CONTROL_COMWAITINTEN   (1ULL << 4)
> +#define MMIO_CONTROL_CMDBUFEN       (1ULL << 12)
> +
> +#define MMIO_STATUS_EVTLOG_OF       (1ULL << 0)
> +#define MMIO_STATUS_EVTLOG_INTR     (1ULL << 1)
> +#define MMIO_STATUS_COMWAIT_INTR    (1ULL << 2)
> +#define MMIO_STATUS_EVTLOG_RUN      (1ULL << 3)
> +#define MMIO_STATUS_CMDBUF_RUN      (1ULL << 4)
> +
> +#define CMDBUF_ID_BYTE              0x07
> +#define CMDBUF_ID_RSHIFT            4
> +#define CMDBUF_ENTRY_SIZE           0x10
> +
> +#define CMD_COMPLETION_WAIT         0x01
> +#define CMD_INVAL_DEVTAB_ENTRY      0x02
> +#define CMD_INVAL_IOMMU_PAGES       0x03
> +#define CMD_INVAL_IOTLB_PAGES       0x04
> +#define CMD_INVAL_INTR_TABLE        0x05
> +
> +#define DEVTAB_ENTRY_SIZE           32
> +
> +/* Device table entry bits 0:63 */
> +#define DEV_VALID                   (1ULL << 0)
> +#define DEV_TRANSLATION_VALID       (1ULL << 1)
> +#define DEV_MODE_MASK               0x7
> +#define DEV_MODE_RSHIFT             9
> +#define DEV_PT_ROOT_MASK            0xFFFFFFFFFF000
> +#define DEV_PT_ROOT_RSHIFT          12
> +#define DEV_PERM_SHIFT              61
> +#define DEV_PERM_READ               (1ULL << 61)
> +#define DEV_PERM_WRITE              (1ULL << 62)
> +
> +/* Device table entry bits 64:127 */
> +#define DEV_DOMAIN_ID_MASK          ((1ULL << 16) - 1)
> +#define DEV_IOTLB_SUPPORT           (1ULL << 17)
> +#define DEV_SUPPRESS_PF             (1ULL << 18)
> +#define DEV_SUPPRESS_ALL_PF         (1ULL << 19)
> +#define DEV_IOCTL_MASK              ~3
> +#define DEV_IOCTL_RSHIFT            20
> +#define   DEV_IOCTL_DENY            0
> +#define   DEV_IOCTL_PASSTHROUGH     1
> +#define   DEV_IOCTL_TRANSLATE       2
> +#define DEV_CACHE                   (1ULL << 37)
> +#define DEV_SNOOP_DISABLE           (1ULL << 38)
> +#define DEV_EXCL                    (1ULL << 39)
> +
> +/* Event codes and flags, as stored in the info field */
> +#define EVENT_ILLEGAL_DEVTAB_ENTRY  (0x1U << 24)
> +#define EVENT_IOPF                  (0x2U << 24)
> +#define   EVENT_IOPF_I              (1U << 3)
> +#define   EVENT_IOPF_PR             (1U << 4)
> +#define   EVENT_IOPF_RW             (1U << 5)
> +#define   EVENT_IOPF_PE             (1U << 6)
> +#define   EVENT_IOPF_RZ             (1U << 7)
> +#define   EVENT_IOPF_TR             (1U << 8)
> +#define EVENT_DEV_TAB_HW_ERROR      (0x3U << 24)
> +#define EVENT_PAGE_TAB_HW_ERROR     (0x4U << 24)
> +#define EVENT_ILLEGAL_COMMAND_ERROR (0x5U << 24)
> +#define EVENT_COMMAND_HW_ERROR      (0x6U << 24)
> +#define EVENT_IOTLB_INV_TIMEOUT     (0x7U << 24)
> +#define EVENT_INVALID_DEV_REQUEST   (0x8U << 24)
> +
> +#define EVENT_LEN                   16
> +
> +typedef struct AMDIOMMUState {
> +    PCIDevice                   dev;
> +
> +    int                         capab_offset;
> +    unsigned char               *capab;
> +
> +    int                         mmio_index;
> +    target_phys_addr_t          mmio_addr;
> +    unsigned char               *mmio_buf;
> +    int                         mmio_enabled;
> +
> +    int                         enabled;
> +    int                         ats_enabled;
> +
> +    target_phys_addr_t          devtab;
> +    size_t                      devtab_len;
> +
> +    target_phys_addr_t          cmdbuf;
> +    int                         cmdbuf_enabled;
> +    size_t                      cmdbuf_len;
> +    size_t                      cmdbuf_head;
> +    size_t                      cmdbuf_tail;
> +    int                         completion_wait_intr;
> +
> +    target_phys_addr_t          evtlog;
> +    int                         evtlog_enabled;
> +    int                         evtlog_intr;
> +    target_phys_addr_t          evtlog_len;
> +    target_phys_addr_t          evtlog_head;
> +    target_phys_addr_t          evtlog_tail;
> +
> +    target_phys_addr_t          excl_base;
> +    target_phys_addr_t          excl_limit;
> +    int                         excl_enabled;
> +    int                         excl_allow;
> +} AMDIOMMUState;
> +
> +typedef struct AMDIOMMUEvent {
> +    uint16_t    devfn;
> +    uint16_t    reserved;
> +    uint16_t    domid;
> +    uint16_t    info;
> +    uint64_t    addr;
> +} __attribute__((packed)) AMDIOMMUEvent;
> +
> +static void amd_iommu_completion_wait(AMDIOMMUState *st,
> +                                      uint8_t *cmd)
> +{
> +    uint64_t addr;
> +
> +    if (cmd[0] & 1) {
> +        addr = le64_to_cpu(*(uint64_t *) cmd) & 0xFFFFFFFFFFFF8;
> +        cpu_physical_memory_write(addr, cmd + 8, 8);
> +    }
> +
> +    if (cmd[0] & 2)
> +        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
> +}
> +
> +static void amd_iommu_invalidate_iotlb(AMDIOMMUState *st,
> +                                       uint8_t *cmd)
> +{
> +    PCIDevice *dev;
> +    PCIBus *bus = st->dev.bus;
> +    int bus_num = pci_bus_num(bus);
> +    int devfn = *(uint16_t *) cmd;
> +
> +    dev = pci_find_device(bus, bus_num, PCI_SLOT(devfn), PCI_FUNC(devfn));
> +    if (dev) {
> +        pci_memory_invalidate_range(dev, 0, -1);
> +    }
> +}
> +
> +static void amd_iommu_cmdbuf_run(AMDIOMMUState *st)
> +{
> +    uint8_t cmd[16];
> +    int type;
> +
> +    if (!st->cmdbuf_enabled) {
> +        return;
> +    }
> +
> +    /* Check if there's work to do. */
> +    if (st->cmdbuf_head == st->cmdbuf_tail) {
> +        return;
> +    }
> +
> +    cpu_physical_memory_read(st->cmdbuf + st->cmdbuf_head, cmd, 16);
> +    type = cmd[CMDBUF_ID_BYTE] >> CMDBUF_ID_RSHIFT;
> +    switch (type) {
> +        case CMD_COMPLETION_WAIT:
> +            amd_iommu_completion_wait(st, cmd);
> +            break;
> +        case CMD_INVAL_DEVTAB_ENTRY:
> +            break;
> +        case CMD_INVAL_IOMMU_PAGES:
> +            break;
> +        case CMD_INVAL_IOTLB_PAGES:
> +            amd_iommu_invalidate_iotlb(st, cmd);
> +            break;
> +        case CMD_INVAL_INTR_TABLE:
> +            break;
> +        default:
> +            break;
> +    }
> +
> +    /* Increment and wrap head pointer. */
> +    st->cmdbuf_head += CMDBUF_ENTRY_SIZE;
> +    if (st->cmdbuf_head >= st->cmdbuf_len) {
> +        st->cmdbuf_head = 0;
> +    }
> +}
> +
> +static uint32_t amd_iommu_mmio_buf_read(AMDIOMMUState *st,
> +                                        size_t offset,
> +                                        size_t size)
> +{
> +    ssize_t i;
> +    uint32_t ret;
> +
> +    if (!size) {
> +        return 0;
> +    }
> +
> +    ret = st->mmio_buf[offset + size - 1];
> +    for (i = size - 2; i >= 0; i--) {
> +        ret <<= 8;
> +        ret |= st->mmio_buf[offset + i];
> +    }
> +
> +    return ret;
> +}
> +
> +static void amd_iommu_mmio_buf_write(AMDIOMMUState *st,
> +                                     size_t offset,
> +                                     size_t size,
> +                                     uint32_t val)
> +{
> +    size_t i;
> +
> +    for (i = 0; i < size; i++) {
> +        st->mmio_buf[offset + i] = val & 0xFF;
> +        val >>= 8;
> +    }
> +}
> +
> +static void amd_iommu_update_mmio(AMDIOMMUState *st,
> +                                  target_phys_addr_t addr)
> +{
> +    size_t reg = addr & ~0x07;
> +    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];

This is still buggy.
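
The line quoted above reads eight bytes of an unsigned char buffer through
a uint64_t pointer. Assuming that cast is the concern (the comment does not
say so explicitly), one way to avoid both the alignment question and the
host byte order dependency is to assemble the value byte by byte. This is
only a sketch: toy_reg_base and toy_ldq_le are hypothetical names, not
existing helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round an MMIO offset down to the start of its 64-bit register. */
static size_t toy_reg_base(size_t addr)
{
    return addr & ~(size_t)0x07;
}

/*
 * Load a little-endian 64-bit value from a byte buffer without casting
 * the buffer pointer, so neither alignment nor host endianness matters.
 */
static uint64_t toy_ldq_le(const uint8_t *buf, size_t offset)
{
    uint64_t val = 0;
    int i;

    for (i = 7; i >= 0; i--) {
        val = (val << 8) | buf[offset + i];
    }
    return val;
}

/* Usage: read the register that contains byte offset 0x0C. */
static void toy_ldq_demo(void)
{
    uint8_t mmio[16] = { 0 };
    size_t reg = toy_reg_base(0x0C);
    int i;

    for (i = 0; i < 8; i++) {
        mmio[8 + i] = (uint8_t)(i + 1);  /* bytes 01 02 ... 08 */
    }
    assert(reg == 0x08);
    assert(toy_ldq_le(mmio, reg) == 0x0807060504030201ULL);
}
```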

> +    uint64_t val = le64_to_cpu(*base);
> +
> +    switch (reg) {
> +        case MMIO_CONTROL:
> +            st->enabled              = !!(val & MMIO_CONTROL_IOMMUEN);
> +            st->ats_enabled          = !!(val & MMIO_CONTROL_HTTUNEN);
> +            st->evtlog_enabled       = st->enabled &&
> +                                       !!(val & MMIO_CONTROL_EVENTLOGEN);
> +            st->evtlog_intr          = !!(val & MMIO_CONTROL_EVENTINTEN);
> +            st->completion_wait_intr = !!(val & MMIO_CONTROL_COMWAITINTEN);
> +            st->cmdbuf_enabled       = st->enabled &&
> +                                       !!(val & MMIO_CONTROL_CMDBUFEN);
> +
> +            /* Update status flags depending on the control register. */
> +            if (st->cmdbuf_enabled) {
> +                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_CMDBUF_RUN;
> +            } else {
> +                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_CMDBUF_RUN;
> +            }
> +            if (st->evtlog_enabled) {
> +                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_RUN;
> +            } else {
> +                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_EVTLOG_RUN;
> +            }
> +
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_DEVICE_TABLE:
> +            st->devtab = (target_phys_addr_t) (val & MMIO_DEVTAB_BASE_MASK);
> +            st->devtab_len = ((val & MMIO_DEVTAB_SIZE_MASK) + 1) *
> +                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
> +            break;
> +        case MMIO_COMMAND_BASE:
> +            st->cmdbuf = (target_phys_addr_t) (val & MMIO_CMDBUF_BASE_MASK);
> +            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
> +                                     MMIO_CMDBUF_SIZE_MASK);
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_COMMAND_HEAD:
> +            st->cmdbuf_head = val & MMIO_CMDBUF_HEAD_MASK;
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_COMMAND_TAIL:
> +            st->cmdbuf_tail = val & MMIO_CMDBUF_TAIL_MASK;
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_EVENT_BASE:
> +            st->evtlog = (target_phys_addr_t) (val & MMIO_EVTLOG_BASE_MASK);
> +            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
> +                                     MMIO_EVTLOG_SIZE_MASK);
> +            break;
> +        case MMIO_EVENT_HEAD:
> +            st->evtlog_head = val & MMIO_EVTLOG_HEAD_MASK;
> +            break;
> +        case MMIO_EVENT_TAIL:
> +            st->evtlog_tail = val & MMIO_EVTLOG_TAIL_MASK;
> +            break;
> +        case MMIO_EXCL_BASE:
> +            st->excl_base = (target_phys_addr_t) (val & MMIO_EXCL_BASE_MASK);
> +            st->excl_enabled = val & MMIO_EXCL_ENABLED_MASK;
> +            st->excl_allow = val & MMIO_EXCL_ALLOW_MASK;
> +            break;
> +        case MMIO_EXCL_LIMIT:
> +            st->excl_limit = (target_phys_addr_t) ((val & MMIO_EXCL_LIMIT_MASK) |
> +                                                   MMIO_EXCL_LIMIT_LOW);
> +            break;
> +        default:
> +            break;
> +    }
> +}
> +
> +static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 1);
> +}
> +
> +static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 2);
> +}
> +
> +static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 4);
> +}
> +
> +static void amd_iommu_mmio_writeb(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 1, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static void amd_iommu_mmio_writew(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 2, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static void amd_iommu_mmio_writel(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 4, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
> +    amd_iommu_mmio_readb,
> +    amd_iommu_mmio_readw,
> +    amd_iommu_mmio_readl,
> +};
> +
> +static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
> +    amd_iommu_mmio_writeb,
> +    amd_iommu_mmio_writew,
> +    amd_iommu_mmio_writel,
> +};
> +
> +static void amd_iommu_enable_mmio(AMDIOMMUState *st)
> +{
> +    target_phys_addr_t addr;
> +    uint8_t *capab_wmask = st->dev.wmask + st->capab_offset;
> +
> +    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
> +                                            amd_iommu_mmio_write, st);
> +    if (st->mmio_index < 0) {
> +        return;
> +    }
> +
> +    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;
> +    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
> +
> +    st->mmio_addr = addr;
> +    st->mmio_enabled = 1;
> +
> +    /* Further changes to the capability are prohibited. */
> +    memset(capab_wmask + CAPAB_BAR_LOW, 0x00, CAPAB_REG_SIZE);
> +    memset(capab_wmask + CAPAB_BAR_HIGH, 0x00, CAPAB_REG_SIZE);
> +}
> +
> +static void amd_iommu_write_capab(PCIDevice *dev,
> +                                  uint32_t addr, uint32_t val, int len)
> +{
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
> +
> +    pci_default_write_config(dev, addr, val, len);
> +
> +    if (!st->mmio_enabled && st->capab[CAPAB_BAR_LOW] & 0x1) {
> +        amd_iommu_enable_mmio(st);
> +    }
> +}
> +
> +static void amd_iommu_reset(DeviceState *dev)
> +{
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev.qdev, dev);
> +    unsigned char *capab = st->capab;
> +    uint8_t *capab_wmask = st->dev.wmask + st->capab_offset;
> +
> +    st->enabled      = 0;
> +    st->ats_enabled  = 0;
> +    st->mmio_enabled = 0;
> +
> +    capab[CAPAB_REV_TYPE]  = CAPAB_INIT_REV | CAPAB_INIT_TYPE;
> +    capab[CAPAB_FLAGS]     = CAPAB_INIT_FLAGS;
> +    capab[CAPAB_BAR_LOW]   = 0;
> +    capab[CAPAB_BAR_HIGH]  = 0;
> +    capab[CAPAB_RANGE]     = 0;
> +    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
> +
> +    /* Changes to the capability are allowed after system reset. */
> +    memset(capab_wmask + CAPAB_BAR_LOW, 0xFF, CAPAB_REG_SIZE);
> +    memset(capab_wmask + CAPAB_BAR_HIGH, 0xFF, CAPAB_REG_SIZE);
> +
> +    memset(st->mmio_buf, 0, MMIO_SIZE);
> +    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
> +    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
> +}
> +
> +static void amd_iommu_log_event(AMDIOMMUState *st, AMDIOMMUEvent *evt)
> +{
> +    if (!st->evtlog_enabled ||
> +        (st->mmio_buf[MMIO_STATUS] & MMIO_STATUS_EVTLOG_OF)) {
> +        return;
> +    }
> +
> +    if (st->evtlog_tail >= st->evtlog_len) {
> +        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_OF;
> +    }
> +
> +    cpu_physical_memory_write(st->evtlog + st->evtlog_tail,
> +                              (uint8_t *) evt, EVENT_LEN);
> +
> +    st->evtlog_tail += EVENT_LEN;
> +    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_INTR;
> +}
> +
> +static void amd_iommu_page_fault(AMDIOMMUState *st,
> +                                 int devfn,
> +                                 unsigned domid,
> +                                 target_phys_addr_t addr,
> +                                 int present,
> +                                 int is_write)
> +{
> +    AMDIOMMUEvent evt;
> +    unsigned info;
> +
> +    evt.devfn = cpu_to_le16(devfn);
> +    evt.reserved = 0;
> +    evt.domid = cpu_to_le16(domid);
> +    evt.addr = cpu_to_le64(addr);
> +
> +    info = EVENT_IOPF;
> +    if (present) {
> +        info |= EVENT_IOPF_PR;
> +    }
> +    if (is_write) {
> +        info |= EVENT_IOPF_RW;
> +    }
> +    evt.info = cpu_to_le16(info);
> +
> +    amd_iommu_log_event(st, &evt);
> +}
> +
> +static inline uint64_t amd_iommu_get_perms(uint64_t entry)
> +{
> +    return (entry & (DEV_PERM_READ | DEV_PERM_WRITE)) >> DEV_PERM_SHIFT;
> +}
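
The helper above extracts the read/write bits that the page-table walk later tests with `perms != (perms & pte_perms)`. A minimal sketch of that bitwise implication follows, assuming read ends up in bit 0 and write in bit 1 after the shift. The names PERM_READ, PERM_WRITE, entry_perms and perms_allowed are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; they mirror DEV_PERM_READ/WRITE from the patch. */
#define PERM_SHIFT  61
#define PERM_READ   (1ULL << 61)
#define PERM_WRITE  (1ULL << 62)

/* Same extraction as amd_iommu_get_perms(): bit 0 = read, bit 1 = write. */
static unsigned entry_perms(uint64_t entry)
{
    return (unsigned) ((entry & (PERM_READ | PERM_WRITE)) >> PERM_SHIFT);
}

/* Every requested permission bit must also be granted by the entry. */
static int perms_allowed(unsigned requested, uint64_t entry)
{
    unsigned granted = entry_perms(entry);

    assert(requested <= 3);
    return (requested & granted) == requested;
}
```

With these definitions a write request (2) against a read-only entry is rejected, while an empty request is always allowed.
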
> +
> +static int amd_iommu_translate(PCIDevice *iommu,
> +                               PCIDevice *dev,
> +                               pcibus_t addr,
> +                               target_phys_addr_t *paddr,
> +                               target_phys_addr_t *len,
> +                               unsigned perms)
> +{
> +    int devfn, present;
> +    target_phys_addr_t entry_addr, pte_addr;
> +    uint64_t entry[4], pte, page_offset, pte_perms;
> +    unsigned level, domid;
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, iommu);
> +
> +    if (!st->enabled) {
> +        goto no_translation;
> +    }
> +
> +    /* Get device table entry. */
> +    devfn = dev->devfn;
> +    entry_addr = st->devtab + devfn * DEVTAB_ENTRY_SIZE;
> +    cpu_physical_memory_read(entry_addr, (uint8_t *) entry, 32);
> +
> +    pte = entry[0];
> +    if (!(pte & DEV_VALID) || !(pte & DEV_TRANSLATION_VALID)) {
> +        goto no_translation;
> +    }
> +    domid = entry[1] & DEV_DOMAIN_ID_MASK;
> +    level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
> +    while (level > 0) {
> +        /*
> +         * Check permissions: the bitwise
> +         * implication perms -> pte_perms must be true.
> +         */
> +        pte_perms = amd_iommu_get_perms(pte);
> +        present = pte & 1;
> +        if (!present || perms != (perms & pte_perms)) {
> +            amd_iommu_page_fault(st, devfn, domid, addr,
> +                                 present, !!(perms & IOMMU_PERM_WRITE));
> +            return -EPERM;
> +        }
> +
> +        /* Go to the next lower level. */
> +        pte_addr = pte & DEV_PT_ROOT_MASK;
> +        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> +        pte = ldq_phys(pte_addr);
> +        level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
> +    }
> +    page_offset = addr & 4095;
> +    *paddr = (pte & DEV_PT_ROOT_MASK) + page_offset;
> +    *len = 4096 - page_offset;
> +
> +    return 0;
> +
> +no_translation:
> +    *paddr = addr;
> +    *len = -1;
> +    return 0;
> +}
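
The walk above selects a 9-bit table index per level and keeps the low 12 bits as the page offset. Below is a small sketch of just that arithmetic, assuming 4 KiB pages and 512-entry tables as the constants in the loop suggest. pt_index, pte_offset and page_offset are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Index into the table at the given level (1 = lowest level). */
static unsigned pt_index(uint64_t addr, unsigned level)
{
    assert(level >= 1 && level <= 6);
    return (unsigned) ((addr >> (3 + 9 * level)) & 0x1FF);
}

/* Byte offset of the entry inside its table: 8 bytes per entry. */
static uint64_t pte_offset(uint64_t addr, unsigned level)
{
    return (uint64_t) pt_index(addr, level) << 3;
}

/* Offset of the access inside the final 4 KiB page. */
static uint64_t page_offset(uint64_t addr)
{
    return addr & 4095;
}
```

For the address 0x201abc this gives index 1 at both level 1 and level 2, an entry offset of 8 bytes, and a page offset of 0xabc.
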
> +
> +static int amd_iommu_pci_initfn(PCIDevice *dev)
> +{
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
> +
> +    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
> +    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
> +    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
> +
> +    /* Secure Device capability */
> +    st->capab_offset = pci_add_capability(&st->dev,
> +                                          PCI_CAP_ID_SEC,
> +                                          CAPAB_SIZE);
> +    st->capab = st->dev.config + st->capab_offset;
> +    dev->config_write = amd_iommu_write_capab;
> +
> +    /* Allocate backing space for the MMIO registers. */
> +    st->mmio_buf = qemu_malloc(MMIO_SIZE);
> +
> +    pci_register_iommu(dev, amd_iommu_translate);
> +
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_amd_iommu = {
> +    .name                       = "amd-iommu",
> +    .version_id                 = 1,
> +    .minimum_version_id         = 1,
> +    .minimum_version_id_old     = 1,
> +    .fields                     = (VMStateField []) {
> +        VMSTATE_PCI_DEVICE(dev, AMDIOMMUState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static PCIDeviceInfo amd_iommu_pci_info = {
> +    .qdev.name    = "amd-iommu",
> +    .qdev.desc    = "AMD IOMMU",
> +    .qdev.size    = sizeof(AMDIOMMUState),
> +    .qdev.reset   = amd_iommu_reset,
> +    .qdev.vmsd    = &vmstate_amd_iommu,
> +    .init         = amd_iommu_pci_initfn,
> +};
> +
> +static void amd_iommu_register(void)
> +{
> +    pci_qdev_register(&amd_iommu_pci_info);
> +}
> +
> +device_init(amd_iommu_register);
> diff --git a/hw/pc.c b/hw/pc.c
> index a96187f..e2456b0 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -1068,6 +1068,8 @@ void pc_pci_device_init(PCIBus *pci_bus)
>     int max_bus;
>     int bus;
>
> +    pci_create_simple(pci_bus, -1, "amd-iommu");
> +
>     max_bus = drive_get_max_bus(IF_SCSI);
>     for (bus = 0; bus <= max_bus; bus++) {
>         pci_create_simple(pci_bus, -1, "lsi53c895a");
> diff --git a/hw/pci_ids.h b/hw/pci_ids.h
> index 39e9f1d..d790312 100644
> --- a/hw/pci_ids.h
> +++ b/hw/pci_ids.h
> @@ -26,6 +26,7 @@
>
>  #define PCI_CLASS_MEMORY_RAM             0x0500
>
> +#define PCI_CLASS_SYSTEM_IOMMU           0x0806
>  #define PCI_CLASS_SYSTEM_OTHER           0x0880
>
>  #define PCI_CLASS_SERIAL_USB             0x0c03
> @@ -56,6 +57,7 @@
>
>  #define PCI_VENDOR_ID_AMD                0x1022
>  #define PCI_DEVICE_ID_AMD_LANCE          0x2000
> +#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */
>
>  #define PCI_VENDOR_ID_MOTOROLA           0x1057
>  #define PCI_DEVICE_ID_MOTOROLA_MPC106    0x0002
> diff --git a/hw/pci_regs.h b/hw/pci_regs.h
> index 0f9f84c..6695e41 100644
> --- a/hw/pci_regs.h
> +++ b/hw/pci_regs.h
> @@ -209,6 +209,7 @@
>  #define  PCI_CAP_ID_SHPC        0x0C    /* PCI Standard Hot-Plug Controller */
>  #define  PCI_CAP_ID_SSVID       0x0D    /* Bridge subsystem vendor/device ID */
>  #define  PCI_CAP_ID_AGP3        0x0E    /* AGP Target PCI-PCI bridge */
> +#define  PCI_CAP_ID_SEC         0x0F    /* Secure Device (AMD IOMMU) */
>  #define  PCI_CAP_ID_EXP         0x10    /* PCI Express */
>  #define  PCI_CAP_ID_MSIX        0x11    /* MSI-X */
>  #define  PCI_CAP_ID_AF          0x13    /* PCI Advanced Features */
> --
> 1.7.1
>
>


* [Qemu-devel] Re: [PATCH 3/7] AMD IOMMU emulation
@ 2010-08-28 15:58     ` Blue Swirl
  0 siblings, 0 replies; 96+ messages in thread
From: Blue Swirl @ 2010-08-28 15:58 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: kvm, mst, joro, qemu-devel, yamahata, avi, paul

On Sat, Aug 28, 2010 at 2:54 PM, Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro> wrote:
> This introduces emulation for the AMD IOMMU, described in "AMD I/O
> Virtualization Technology (IOMMU) Specification".
>
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  Makefile.target |    2 +-
>  hw/amd_iommu.c  |  663 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/pc.c         |    2 +
>  hw/pci_ids.h    |    2 +
>  hw/pci_regs.h   |    1 +
>  5 files changed, 669 insertions(+), 1 deletions(-)
>  create mode 100644 hw/amd_iommu.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 3ef4666..d4eeccd 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -195,7 +195,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
>  obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> -obj-i386-y += pc_piix.o
> +obj-i386-y += pc_piix.o amd_iommu.o
>
>  # shared objects
>  obj-ppc-y = ppc.o
> diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
> new file mode 100644
> index 0000000..43e0426
> --- /dev/null
> +++ b/hw/amd_iommu.c
> @@ -0,0 +1,663 @@
> +/*
> + * AMD IOMMU emulation
> + *
> + * Copyright (c) 2010 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "pc.h"
> +#include "hw.h"
> +#include "pci.h"
> +#include "qlist.h"
> +
> +/* Capability registers */
> +#define CAPAB_HEADER            0x00
> +#define   CAPAB_REV_TYPE        0x02
> +#define   CAPAB_FLAGS           0x03
> +#define CAPAB_BAR_LOW           0x04
> +#define CAPAB_BAR_HIGH          0x08
> +#define CAPAB_RANGE             0x0C
> +#define CAPAB_MISC              0x10
> +
> +#define CAPAB_SIZE              0x14
> +#define CAPAB_REG_SIZE          0x04
> +
> +/* Capability header data */
> +#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
> +#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
> +#define CAPAB_FLAG_NPCACHE      (1 << 2)
> +#define CAPAB_INIT_REV          (1 << 3)
> +#define CAPAB_INIT_TYPE         3
> +#define CAPAB_INIT_REV_TYPE     (CAPAB_INIT_REV | CAPAB_INIT_TYPE)
> +#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
> +#define CAPAB_INIT_MISC         ((64 << 15) | (48 << 8))
> +#define CAPAB_BAR_MASK          (~((1ULL << 14) - 1))
> +
> +/* MMIO registers */
> +#define MMIO_DEVICE_TABLE       0x0000
> +#define MMIO_COMMAND_BASE       0x0008
> +#define MMIO_EVENT_BASE         0x0010
> +#define MMIO_CONTROL            0x0018
> +#define MMIO_EXCL_BASE          0x0020
> +#define MMIO_EXCL_LIMIT         0x0028
> +#define MMIO_COMMAND_HEAD       0x2000
> +#define MMIO_COMMAND_TAIL       0x2008
> +#define MMIO_EVENT_HEAD         0x2010
> +#define MMIO_EVENT_TAIL         0x2018
> +#define MMIO_STATUS             0x2020
> +
> +#define MMIO_SIZE               0x4000
> +
> +#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
> +#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
> +#define MMIO_DEVTAB_ENTRY_SIZE  32
> +#define MMIO_DEVTAB_SIZE_UNIT   4096
> +
> +#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
> +#define MMIO_CMDBUF_SIZE_MASK       0x0F
> +#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
> +#define MMIO_CMDBUF_DEFAULT_SIZE    8
> +#define MMIO_CMDBUF_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
> +#define MMIO_CMDBUF_TAIL_MASK       MMIO_CMDBUF_HEAD_MASK
> +
> +#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
> +#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
> +#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
> +#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
> +#define MMIO_EVTLOG_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
> +#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
> +
> +#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
> +#define MMIO_EXCL_ENABLED_MASK      (1ULL << 0)
> +#define MMIO_EXCL_ALLOW_MASK        (1ULL << 1)
> +#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
> +#define MMIO_EXCL_LIMIT_LOW         0xFFF
> +
> +#define MMIO_CONTROL_IOMMUEN        (1ULL << 0)
> +#define MMIO_CONTROL_HTTUNEN        (1ULL << 1)
> +#define MMIO_CONTROL_EVENTLOGEN     (1ULL << 2)
> +#define MMIO_CONTROL_EVENTINTEN     (1ULL << 3)
> +#define MMIO_CONTROL_COMWAITINTEN   (1ULL << 4)
> +#define MMIO_CONTROL_CMDBUFEN       (1ULL << 12)
> +
> +#define MMIO_STATUS_EVTLOG_OF       (1ULL << 0)
> +#define MMIO_STATUS_EVTLOG_INTR     (1ULL << 1)
> +#define MMIO_STATUS_COMWAIT_INTR    (1ULL << 2)
> +#define MMIO_STATUS_EVTLOG_RUN      (1ULL << 3)
> +#define MMIO_STATUS_CMDBUF_RUN      (1ULL << 4)
> +
> +#define CMDBUF_ID_BYTE              0x07
> +#define CMDBUF_ID_RSHIFT            4
> +#define CMDBUF_ENTRY_SIZE           0x10
> +
> +#define CMD_COMPLETION_WAIT         0x01
> +#define CMD_INVAL_DEVTAB_ENTRY      0x02
> +#define CMD_INVAL_IOMMU_PAGES       0x03
> +#define CMD_INVAL_IOTLB_PAGES       0x04
> +#define CMD_INVAL_INTR_TABLE        0x05
> +
> +#define DEVTAB_ENTRY_SIZE           32
> +
> +/* Device table entry bits 0:63 */
> +#define DEV_VALID                   (1ULL << 0)
> +#define DEV_TRANSLATION_VALID       (1ULL << 1)
> +#define DEV_MODE_MASK               0x7
> +#define DEV_MODE_RSHIFT             9
> +#define DEV_PT_ROOT_MASK            0xFFFFFFFFFF000
> +#define DEV_PT_ROOT_RSHIFT          12
> +#define DEV_PERM_SHIFT              61
> +#define DEV_PERM_READ               (1ULL << 61)
> +#define DEV_PERM_WRITE              (1ULL << 62)
> +
> +/* Device table entry bits 64:127 */
> +#define DEV_DOMAIN_ID_MASK          ((1ULL << 16) - 1)
> +#define DEV_IOTLB_SUPPORT           (1ULL << 17)
> +#define DEV_SUPPRESS_PF             (1ULL << 18)
> +#define DEV_SUPPRESS_ALL_PF         (1ULL << 19)
> +#define DEV_IOCTL_MASK              ~3
> +#define DEV_IOCTL_RSHIFT            20
> +#define   DEV_IOCTL_DENY            0
> +#define   DEV_IOCTL_PASSTHROUGH     1
> +#define   DEV_IOCTL_TRANSLATE       2
> +#define DEV_CACHE                   (1ULL << 37)
> +#define DEV_SNOOP_DISABLE           (1ULL << 38)
> +#define DEV_EXCL                    (1ULL << 39)
> +
> +/* Event codes and flags, as stored in the info field */
> +#define EVENT_ILLEGAL_DEVTAB_ENTRY  (0x1U << 24)
> +#define EVENT_IOPF                  (0x2U << 24)
> +#define   EVENT_IOPF_I              (1U << 3)
> +#define   EVENT_IOPF_PR             (1U << 4)
> +#define   EVENT_IOPF_RW             (1U << 5)
> +#define   EVENT_IOPF_PE             (1U << 6)
> +#define   EVENT_IOPF_RZ             (1U << 7)
> +#define   EVENT_IOPF_TR             (1U << 8)
> +#define EVENT_DEV_TAB_HW_ERROR      (0x3U << 24)
> +#define EVENT_PAGE_TAB_HW_ERROR     (0x4U << 24)
> +#define EVENT_ILLEGAL_COMMAND_ERROR (0x5U << 24)
> +#define EVENT_COMMAND_HW_ERROR      (0x6U << 24)
> +#define EVENT_IOTLB_INV_TIMEOUT     (0x7U << 24)
> +#define EVENT_INVALID_DEV_REQUEST   (0x8U << 24)
> +
> +#define EVENT_LEN                   16
> +
> +typedef struct AMDIOMMUState {
> +    PCIDevice                   dev;
> +
> +    int                         capab_offset;
> +    unsigned char               *capab;
> +
> +    int                         mmio_index;
> +    target_phys_addr_t          mmio_addr;
> +    unsigned char               *mmio_buf;
> +    int                         mmio_enabled;
> +
> +    int                         enabled;
> +    int                         ats_enabled;
> +
> +    target_phys_addr_t          devtab;
> +    size_t                      devtab_len;
> +
> +    target_phys_addr_t          cmdbuf;
> +    int                         cmdbuf_enabled;
> +    size_t                      cmdbuf_len;
> +    size_t                      cmdbuf_head;
> +    size_t                      cmdbuf_tail;
> +    int                         completion_wait_intr;
> +
> +    target_phys_addr_t          evtlog;
> +    int                         evtlog_enabled;
> +    int                         evtlog_intr;
> +    target_phys_addr_t          evtlog_len;
> +    target_phys_addr_t          evtlog_head;
> +    target_phys_addr_t          evtlog_tail;
> +
> +    target_phys_addr_t          excl_base;
> +    target_phys_addr_t          excl_limit;
> +    int                         excl_enabled;
> +    int                         excl_allow;
> +} AMDIOMMUState;
> +
> +typedef struct AMDIOMMUEvent {
> +    uint16_t    devfn;
> +    uint16_t    reserved;
> +    uint16_t    domid;
> +    uint16_t    info;
> +    uint64_t    addr;
> +} __attribute__((packed)) AMDIOMMUEvent;
> +
> +static void amd_iommu_completion_wait(AMDIOMMUState *st,
> +                                      uint8_t *cmd)
> +{
> +    uint64_t addr;
> +
> +    if (cmd[0] & 1) {
> +        addr = le64_to_cpu(*(uint64_t *) cmd) & 0xFFFFFFFFFFFF8;
> +        cpu_physical_memory_write(addr, cmd + 8, 8);
> +    }
> +
> +    if (cmd[0] & 2) {
> +        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
> +    }
> +}
> +
> +static void amd_iommu_invalidate_iotlb(AMDIOMMUState *st,
> +                                       uint8_t *cmd)
> +{
> +    PCIDevice *dev;
> +    PCIBus *bus = st->dev.bus;
> +    int bus_num = pci_bus_num(bus);
> +    int devfn = le16_to_cpu(*(uint16_t *) cmd);
> +
> +    dev = pci_find_device(bus, bus_num, PCI_SLOT(devfn), PCI_FUNC(devfn));
> +    if (dev) {
> +        pci_memory_invalidate_range(dev, 0, -1);
> +    }
> +}
> +
> +static void amd_iommu_cmdbuf_run(AMDIOMMUState *st)
> +{
> +    uint8_t cmd[16];
> +    int type;
> +
> +    if (!st->cmdbuf_enabled) {
> +        return;
> +    }
> +
> +    /* Check if there's work to do. */
> +    if (st->cmdbuf_head == st->cmdbuf_tail) {
> +        return;
> +    }
> +
> +    cpu_physical_memory_read(st->cmdbuf + st->cmdbuf_head, cmd, 16);
> +    type = cmd[CMDBUF_ID_BYTE] >> CMDBUF_ID_RSHIFT;
> +    switch (type) {
> +        case CMD_COMPLETION_WAIT:
> +            amd_iommu_completion_wait(st, cmd);
> +            break;
> +        case CMD_INVAL_DEVTAB_ENTRY:
> +            break;
> +        case CMD_INVAL_IOMMU_PAGES:
> +            break;
> +        case CMD_INVAL_IOTLB_PAGES:
> +            amd_iommu_invalidate_iotlb(st, cmd);
> +            break;
> +        case CMD_INVAL_INTR_TABLE:
> +            break;
> +        default:
> +            break;
> +    }
> +
> +    /* Increment and wrap head pointer. */
> +    st->cmdbuf_head += CMDBUF_ENTRY_SIZE;
> +    if (st->cmdbuf_head >= st->cmdbuf_len) {
> +        st->cmdbuf_head = 0;
> +    }
> +}
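
As posted, the function above handles at most one command per call. Whether that is intended is not stated in the thread. For comparison, here is a minimal sketch of a loop that drains the ring until head meets tail, assuming offsets and length are in bytes and the length is a multiple of the entry size. struct cmd_ring and ring_drain are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_SIZE 16   /* same value as CMDBUF_ENTRY_SIZE in the patch */

struct cmd_ring {
    size_t head;    /* byte offset of the next entry to consume */
    size_t tail;    /* byte offset one past the last queued entry */
    size_t len;     /* ring size in bytes */
};

/* Consume entries until head == tail; returns how many were handled. */
static size_t ring_drain(struct cmd_ring *r)
{
    size_t handled = 0;

    assert(r->len >= ENTRY_SIZE && r->len % ENTRY_SIZE == 0);

    /* Refuse pointers that could never meet, to avoid spinning forever. */
    if (r->head >= r->len || r->tail >= r->len ||
        r->head % ENTRY_SIZE || r->tail % ENTRY_SIZE) {
        return 0;
    }

    while (r->head != r->tail) {
        /* A device model would fetch and dispatch the entry at head here. */
        handled++;
        r->head += ENTRY_SIZE;
        if (r->head >= r->len) {
            r->head = 0;    /* wrap around */
        }
    }
    return handled;
}
```

The early return is a design choice of this sketch: a guest-controlled tail that is misaligned or out of range would otherwise keep the loop from terminating.
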
> +
> +static uint32_t amd_iommu_mmio_buf_read(AMDIOMMUState *st,
> +                                        size_t offset,
> +                                        size_t size)
> +{
> +    ssize_t i;
> +    uint32_t ret;
> +
> +    if (!size) {
> +        return 0;
> +    }
> +
> +    ret = st->mmio_buf[offset + size - 1];
> +    for (i = size - 2; i >= 0; i--) {
> +        ret <<= 8;
> +        ret |= st->mmio_buf[offset + i];
> +    }
> +
> +    return ret;
> +}
> +
> +static void amd_iommu_mmio_buf_write(AMDIOMMUState *st,
> +                                     size_t offset,
> +                                     size_t size,
> +                                     uint32_t val)
> +{
> +    size_t i;
> +
> +    for (i = 0; i < size; i++) {
> +        st->mmio_buf[offset + i] = val & 0xFF;
> +        val >>= 8;
> +    }
> +}
> +
> +static void amd_iommu_update_mmio(AMDIOMMUState *st,
> +                                  target_phys_addr_t addr)
> +{
> +    size_t reg = addr & ~0x07;
> +    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];

This is still buggy.

> +    uint64_t val = le64_to_cpu(*base);
> +
> +    switch (reg) {
> +        case MMIO_CONTROL:
> +            st->enabled              = !!(val & MMIO_CONTROL_IOMMUEN);
> +            st->ats_enabled          = !!(val & MMIO_CONTROL_HTTUNEN);
> +            st->evtlog_enabled       = st->enabled &&
> +                                       !!(val & MMIO_CONTROL_EVENTLOGEN);
> +            st->evtlog_intr          = !!(val & MMIO_CONTROL_EVENTINTEN);
> +            st->completion_wait_intr = !!(val & MMIO_CONTROL_COMWAITINTEN);
> +            st->cmdbuf_enabled       = st->enabled &&
> +                                       !!(val & MMIO_CONTROL_CMDBUFEN);
> +
> +            /* Update status flags depending on the control register. */
> +            if (st->cmdbuf_enabled) {
> +                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_CMDBUF_RUN;
> +            } else {
> +                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_CMDBUF_RUN;
> +            }
> +            if (st->evtlog_enabled) {
> +                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_RUN;
> +            } else {
> +                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_EVTLOG_RUN;
> +            }
> +
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_DEVICE_TABLE:
> +            st->devtab = (target_phys_addr_t) (val & MMIO_DEVTAB_BASE_MASK);
> +            st->devtab_len = ((val & MMIO_DEVTAB_SIZE_MASK) + 1) *
> +                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
> +            break;
> +        case MMIO_COMMAND_BASE:
> +            st->cmdbuf = (target_phys_addr_t) (val & MMIO_CMDBUF_BASE_MASK);
> +            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
> +                                     MMIO_CMDBUF_SIZE_MASK);
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_COMMAND_HEAD:
> +            st->cmdbuf_head = val & MMIO_CMDBUF_HEAD_MASK;
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_COMMAND_TAIL:
> +            st->cmdbuf_tail = val & MMIO_CMDBUF_TAIL_MASK;
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_EVENT_BASE:
> +            st->evtlog = (target_phys_addr_t) (val & MMIO_EVTLOG_BASE_MASK);
> +            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
> +                                     MMIO_EVTLOG_SIZE_MASK);
> +            break;
> +        case MMIO_EVENT_HEAD:
> +            st->evtlog_head = val & MMIO_EVTLOG_HEAD_MASK;
> +            break;
> +        case MMIO_EVENT_TAIL:
> +            st->evtlog_tail = val & MMIO_EVTLOG_TAIL_MASK;
> +            break;
> +        case MMIO_EXCL_BASE:
> +            st->excl_base = (target_phys_addr_t) (val & MMIO_EXCL_BASE_MASK);
> +            st->excl_enabled = val & MMIO_EXCL_ENABLED_MASK;
> +            st->excl_allow = val & MMIO_EXCL_ALLOW_MASK;
> +            break;
> +        case MMIO_EXCL_LIMIT:
> +            st->excl_limit = (target_phys_addr_t) ((val & MMIO_EXCL_LIMIT_MASK) |
> +                                                   MMIO_EXCL_LIMIT_LOW);
> +            break;
> +        default:
> +            break;
> +    }
> +}
> +
> +static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 1);
> +}
> +
> +static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 2);
> +}
> +
> +static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 4);
> +}
> +
> +static void amd_iommu_mmio_writeb(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 1, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static void amd_iommu_mmio_writew(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 2, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static void amd_iommu_mmio_writel(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 4, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
> +    amd_iommu_mmio_readb,
> +    amd_iommu_mmio_readw,
> +    amd_iommu_mmio_readl,
> +};
> +
> +static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
> +    amd_iommu_mmio_writeb,
> +    amd_iommu_mmio_writew,
> +    amd_iommu_mmio_writel,
> +};
> +
> +static void amd_iommu_enable_mmio(AMDIOMMUState *st)
> +{
> +    target_phys_addr_t addr;
> +    uint8_t *capab_wmask = st->dev.wmask + st->capab_offset;
> +
> +    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
> +                                            amd_iommu_mmio_write, st);
> +    if (st->mmio_index < 0) {
> +        return;
> +    }
> +
> +    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;
> +    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
> +
> +    st->mmio_addr = addr;
> +    st->mmio_enabled = 1;
> +
> +    /* Further changes to the capability are prohibited. */
> +    memset(capab_wmask + CAPAB_BAR_LOW, 0x00, CAPAB_REG_SIZE);
> +    memset(capab_wmask + CAPAB_BAR_HIGH, 0x00, CAPAB_REG_SIZE);
> +}
> +
> +static void amd_iommu_write_capab(PCIDevice *dev,
> +                                  uint32_t addr, uint32_t val, int len)
> +{
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
> +
> +    pci_default_write_config(dev, addr, val, len);
> +
> +    if (!st->mmio_enabled && st->capab[CAPAB_BAR_LOW] & 0x1) {
> +        amd_iommu_enable_mmio(st);
> +    }
> +}
> +
> +static void amd_iommu_reset(DeviceState *dev)
> +{
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev.qdev, dev);
> +    unsigned char *capab = st->capab;
> +    uint8_t *capab_wmask = st->dev.wmask + st->capab_offset;
> +
> +    st->enabled      = 0;
> +    st->ats_enabled  = 0;
> +    st->mmio_enabled = 0;
> +
> +    capab[CAPAB_REV_TYPE]  = CAPAB_INIT_REV | CAPAB_INIT_TYPE;
> +    capab[CAPAB_FLAGS]     = CAPAB_INIT_FLAGS;
> +    capab[CAPAB_BAR_LOW]   = 0;
> +    capab[CAPAB_BAR_HIGH]  = 0;
> +    capab[CAPAB_RANGE]     = 0;
> +    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
> +
> +    /* Changes to the capability are allowed after system reset. */
> +    memset(capab_wmask + CAPAB_BAR_LOW, 0xFF, CAPAB_REG_SIZE);
> +    memset(capab_wmask + CAPAB_BAR_HIGH, 0xFF, CAPAB_REG_SIZE);
> +
> +    memset(st->mmio_buf, 0, MMIO_SIZE);
> +    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
> +    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
> +}
> +
> +static void amd_iommu_log_event(AMDIOMMUState *st, AMDIOMMUEvent *evt)
> +{
> +    if (!st->evtlog_enabled ||
> +        (st->mmio_buf[MMIO_STATUS] & MMIO_STATUS_EVTLOG_OF)) {
> +        return;
> +    }
> +
> +    if (st->evtlog_tail >= st->evtlog_len) {
> +        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_OF;
> +    }
> +
> +    cpu_physical_memory_write(st->evtlog + st->evtlog_tail,
> +                              (uint8_t *) evt, EVENT_LEN);
> +
> +    st->evtlog_tail += EVENT_LEN;
> +    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_INTR;
> +}
> +
> +static void amd_iommu_page_fault(AMDIOMMUState *st,
> +                                 int devfn,
> +                                 unsigned domid,
> +                                 target_phys_addr_t addr,
> +                                 int present,
> +                                 int is_write)
> +{
> +    AMDIOMMUEvent evt;
> +    unsigned info;
> +
> +    evt.devfn = cpu_to_le16(devfn);
> +    evt.reserved = 0;
> +    evt.domid = cpu_to_le16(domid);
> +    evt.addr = cpu_to_le64(addr);
> +
> +    info = EVENT_IOPF;
> +    if (present) {
> +        info |= EVENT_IOPF_PR;
> +    }
> +    if (is_write) {
> +        info |= EVENT_IOPF_RW;
> +    }
> +    evt.info = cpu_to_le16(info);
> +
> +    amd_iommu_log_event(st, &evt);
> +}
> +
> +static inline uint64_t amd_iommu_get_perms(uint64_t entry)
> +{
> +    return (entry & (DEV_PERM_READ | DEV_PERM_WRITE)) >> DEV_PERM_SHIFT;
> +}
> +
> +static int amd_iommu_translate(PCIDevice *iommu,
> +                               PCIDevice *dev,
> +                               pcibus_t addr,
> +                               target_phys_addr_t *paddr,
> +                               target_phys_addr_t *len,
> +                               unsigned perms)
> +{
> +    int devfn, present;
> +    target_phys_addr_t entry_addr, pte_addr;
> +    uint64_t entry[4], pte, page_offset, pte_perms;
> +    unsigned level, domid;
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, iommu);
> +
> +    if (!st->enabled) {
> +        goto no_translation;
> +    }
> +
> +    /* Get device table entry. */
> +    devfn = dev->devfn;
> +    entry_addr = st->devtab + devfn * DEVTAB_ENTRY_SIZE;
> +    cpu_physical_memory_read(entry_addr, (uint8_t *) entry, 32);
> +
> +    pte = entry[0];
> +    if (!(pte & DEV_VALID) || !(pte & DEV_TRANSLATION_VALID)) {
> +        goto no_translation;
> +    }
> +    domid = entry[1] & DEV_DOMAIN_ID_MASK;
> +    level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
> +    while (level > 0) {
> +        /*
> +         * Check permissions: the bitwise
> +         * implication perms -> entry_perms must be true.
> +         */
> +        pte_perms = amd_iommu_get_perms(pte);
> +        present = pte & 1;
> +        if (!present || perms != (perms & pte_perms)) {
> +            amd_iommu_page_fault(st, devfn, domid, addr,
> +                                 present, !!(perms & IOMMU_PERM_WRITE));
> +            return -EPERM;
> +        }
> +
> +        /* Go to the next lower level. */
> +        pte_addr = pte & DEV_PT_ROOT_MASK;
> +        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> +        pte = ldq_phys(pte_addr);
> +        level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
> +    }
> +    page_offset = addr & 4095;
> +    *paddr = (pte & DEV_PT_ROOT_MASK) + page_offset;
> +    *len = 4096 - page_offset;
> +
> +    return 0;
> +
> +no_translation:
> +    *paddr = addr;
> +    *len = -1;
> +    return 0;
> +}
> +
> +static int amd_iommu_pci_initfn(PCIDevice *dev)
> +{
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
> +
> +    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
> +    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
> +    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
> +
> +    /* Secure Device capability */
> +    st->capab_offset = pci_add_capability(&st->dev,
> +                                          PCI_CAP_ID_SEC,
> +                                          CAPAB_SIZE);
> +    st->capab = st->dev.config + st->capab_offset;
> +    dev->config_write = amd_iommu_write_capab;
> +
> +    /* Allocate backing space for the MMIO registers. */
> +    st->mmio_buf = qemu_malloc(MMIO_SIZE);
> +
> +    pci_register_iommu(dev, amd_iommu_translate);
> +
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_amd_iommu = {
> +    .name                       = "amd-iommu",
> +    .version_id                 = 1,
> +    .minimum_version_id         = 1,
> +    .minimum_version_id_old     = 1,
> +    .fields                     = (VMStateField []) {
> +        VMSTATE_PCI_DEVICE(dev, AMDIOMMUState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static PCIDeviceInfo amd_iommu_pci_info = {
> +    .qdev.name    = "amd-iommu",
> +    .qdev.desc    = "AMD IOMMU",
> +    .qdev.size    = sizeof(AMDIOMMUState),
> +    .qdev.reset   = amd_iommu_reset,
> +    .qdev.vmsd    = &vmstate_amd_iommu,
> +    .init         = amd_iommu_pci_initfn,
> +};
> +
> +static void amd_iommu_register(void)
> +{
> +    pci_qdev_register(&amd_iommu_pci_info);
> +}
> +
> +device_init(amd_iommu_register);
> diff --git a/hw/pc.c b/hw/pc.c
> index a96187f..e2456b0 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -1068,6 +1068,8 @@ void pc_pci_device_init(PCIBus *pci_bus)
>     int max_bus;
>     int bus;
>
> +    pci_create_simple(pci_bus, -1, "amd-iommu");
> +
>     max_bus = drive_get_max_bus(IF_SCSI);
>     for (bus = 0; bus <= max_bus; bus++) {
>         pci_create_simple(pci_bus, -1, "lsi53c895a");
> diff --git a/hw/pci_ids.h b/hw/pci_ids.h
> index 39e9f1d..d790312 100644
> --- a/hw/pci_ids.h
> +++ b/hw/pci_ids.h
> @@ -26,6 +26,7 @@
>
>  #define PCI_CLASS_MEMORY_RAM             0x0500
>
> +#define PCI_CLASS_SYSTEM_IOMMU           0x0806
>  #define PCI_CLASS_SYSTEM_OTHER           0x0880
>
>  #define PCI_CLASS_SERIAL_USB             0x0c03
> @@ -56,6 +57,7 @@
>
>  #define PCI_VENDOR_ID_AMD                0x1022
>  #define PCI_DEVICE_ID_AMD_LANCE          0x2000
> +#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */
>
>  #define PCI_VENDOR_ID_MOTOROLA           0x1057
>  #define PCI_DEVICE_ID_MOTOROLA_MPC106    0x0002
> diff --git a/hw/pci_regs.h b/hw/pci_regs.h
> index 0f9f84c..6695e41 100644
> --- a/hw/pci_regs.h
> +++ b/hw/pci_regs.h
> @@ -209,6 +209,7 @@
>  #define  PCI_CAP_ID_SHPC        0x0C    /* PCI Standard Hot-Plug Controller */
>  #define  PCI_CAP_ID_SSVID       0x0D    /* Bridge subsystem vendor/device ID */
>  #define  PCI_CAP_ID_AGP3        0x0E    /* AGP Target PCI-PCI bridge */
> +#define  PCI_CAP_ID_SEC         0x0F    /* Secure Device (AMD IOMMU) */
>  #define  PCI_CAP_ID_EXP         0x10    /* PCI Express */
>  #define  PCI_CAP_ID_MSIX        0x11    /* MSI-X */
>  #define  PCI_CAP_ID_AF          0x13    /* PCI Advanced Features */
> --
> 1.7.1
>
>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 0/7] AMD IOMMU emulation patchset v4
  2010-08-28 14:54 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-28 16:00   ` Blue Swirl
  -1 siblings, 0 replies; 96+ messages in thread
From: Blue Swirl @ 2010-08-28 16:00 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: mst, joro, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Sat, Aug 28, 2010 at 2:54 PM, Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro> wrote:
> Hi,
>
> I rebased my work on mst's PCI tree and, hopefully, fixed issues raised by
> others. Here's a summary of the changes:
> - made it apply to mst/pci
> - moved some AMD IOMMU stuff in a reset handler
> - dropped range_covers_range() (wasn't the same as ranges_overlap(), but the
>  latter was better anyway)
> - used 'expand' to remove tabs in pci_regs.h before applying the useful changes
> - fixed the endianness mistake spotted by Blue (though ldq_phys wasn't needed)
>
> As for Anthony's suggestion to simply sed-convert all devices, I'd rather go
> through them one at a time and do it manually. 'sed' would not only mess
> indentation, but also it isn't straightforward to get the 'PCIDevice *' you
> need to pass to the pci_* helpers. (I'll try to focus on conversion next so we
> can poison the old stuff.)
>
> I also added (read "spelled it out myself") malc's ACK to the ac97 patch.
> Nothing changed since his last review.
>
> Please have a look and merge if you like it.

The endianness bug still exists. I also had other comments on patch 2.

>
>
>    Thanks,
>    Eduard
>
>
> Eduard - Gabriel Munteanu (7):
>  pci: expand tabs to spaces in pci_regs.h
>  pci: memory access API and IOMMU support
>  AMD IOMMU emulation
>  ide: use the PCI memory access interface
>  rtl8139: use the PCI memory access interface
>  eepro100: use the PCI memory access interface
>  ac97: use the PCI memory access interface
>
>  Makefile.target    |    2 +-
>  dma-helpers.c      |   46 ++-
>  dma.h              |   21 +-
>  hw/ac97.c          |    6 +-
>  hw/amd_iommu.c     |  663 ++++++++++++++++++++++++++
>  hw/eepro100.c      |   86 ++--
>  hw/ide/core.c      |   15 +-
>  hw/ide/internal.h  |   39 ++
>  hw/ide/macio.c     |    4 +-
>  hw/ide/pci.c       |    7 +
>  hw/pc.c            |    2 +
>  hw/pci.c           |  185 ++++++++-
>  hw/pci.h           |   74 +++
>  hw/pci_ids.h       |    2 +
>  hw/pci_internals.h |   12 +
>  hw/pci_regs.h      | 1331 ++++++++++++++++++++++++++--------------------------
>  hw/rtl8139.c       |   99 +++--
>  qemu-common.h      |    1 +
>  18 files changed, 1827 insertions(+), 768 deletions(-)
>  create mode 100644 hw/amd_iommu.c
>  rewrite hw/pci_regs.h (90%)
>
>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 3/7] AMD IOMMU emulation
  2010-08-28 15:58     ` [Qemu-devel] " Blue Swirl
@ 2010-08-28 21:53       ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-28 21:53 UTC (permalink / raw)
  To: Blue Swirl
  Cc: mst, joro, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Sat, Aug 28, 2010 at 03:58:23PM +0000, Blue Swirl wrote:
> On Sat, Aug 28, 2010 at 2:54 PM, Eduard - Gabriel Munteanu
> <eduard.munteanu@linux360.ro> wrote:
> > This introduces emulation for the AMD IOMMU, described in "AMD I/O
> > Virtualization Technology (IOMMU) Specification".
> >
> > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > ---

[snip]

> > diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
> > new file mode 100644
> > index 0000000..43e0426
> > --- /dev/null
> > +++ b/hw/amd_iommu.c

[snip]

> > +static void amd_iommu_update_mmio(AMDIOMMUState *st,
> > +                                  target_phys_addr_t addr)
> > +{
> > +    size_t reg = addr & ~0x07;
> > +    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];
> 
> This is still buggy.
> 
> > +    uint64_t val = le64_to_cpu(*base);

mmio_buf is always LE, so on a BE host *base holds the value in reversed
byte order. But look at the next line, where I call le64_to_cpu(): that
swaps the bytes on a BE host, yielding the correct byte order.

On a LE host, *base is already correct and le64_to_cpu() is a no-op.

In any case, I only use 'val', not '*base' directly. I suppose it could
be rewritten for clarity (i.e. ditch 'base').
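
A minimal sketch of what ditching 'base' could look like, assuming a
byte-wise helper instead of a uint64_t pointer into the buffer. The names
load_le64() and read_mmio_reg() are hypothetical and not part of the patch;
the sketch only illustrates that assembling the value byte by byte gives
the same result on LE and BE hosts:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): assemble a 64-bit value
 * from eight little-endian bytes. The result is independent of host
 * byte order, and no uint64_t pointer into the byte buffer is formed.
 */
uint64_t load_le64(const uint8_t *p)
{
    uint64_t val = 0;
    int i;

    assert(p != NULL);
    for (i = 7; i >= 0; i--) {
        val = (val << 8) | p[i];    /* most significant byte first */
    }
    return val;
}

/* Read the 8-byte register that contains 'addr', as the patch does. */
uint64_t read_mmio_reg(const uint8_t *mmio_buf, size_t addr)
{
    size_t reg = addr & ~(size_t) 0x07;    /* round down to 8 bytes */

    return load_le64(&mmio_buf[reg]);
}
```

With this shape there is no 'base' left for later code to dereference
without the byte swap.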

Do you still think it's wrong? Or is it for another reason?


	Thanks,
	Eduard


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 0/7] AMD IOMMU emulation patchset v4
  2010-08-28 16:00   ` [Qemu-devel] " Blue Swirl
@ 2010-08-29  9:55     ` Joerg Roedel
  -1 siblings, 0 replies; 96+ messages in thread
From: Joerg Roedel @ 2010-08-29  9:55 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Eduard - Gabriel Munteanu, mst, paul, avi, anthony, av1474,
	yamahata, kvm, qemu-devel

On Sat, Aug 28, 2010 at 04:00:31PM +0000, Blue Swirl wrote:
> On Sat, Aug 28, 2010 at 2:54 PM, Eduard - Gabriel Munteanu

> > Please have a look and merge if you like it.
> 
> The endianness bug still exists. I also had other comments on patch 2.

I am very happy with this patch set. Besides your comments, is there
anything else that prevents merging of this patch set? Paul, what is
your opinion on this?

	Joerg


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 3/7] AMD IOMMU emulation
  2010-08-28 21:53       ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-29 20:37         ` Blue Swirl
  -1 siblings, 0 replies; 96+ messages in thread
From: Blue Swirl @ 2010-08-29 20:37 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: mst, joro, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Sat, Aug 28, 2010 at 9:53 PM, Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro> wrote:
> On Sat, Aug 28, 2010 at 03:58:23PM +0000, Blue Swirl wrote:
>> On Sat, Aug 28, 2010 at 2:54 PM, Eduard - Gabriel Munteanu
>> <eduard.munteanu@linux360.ro> wrote:
>> > This introduces emulation for the AMD IOMMU, described in "AMD I/O
>> > Virtualization Technology (IOMMU) Specification".
>> >
>> > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
>> > ---
>
> [snip]
>
>> > diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
>> > new file mode 100644
>> > index 0000000..43e0426
>> > --- /dev/null
>> > +++ b/hw/amd_iommu.c
>
> [snip]
>
>> > +static void amd_iommu_update_mmio(AMDIOMMUState *st,
>> > +                                  target_phys_addr_t addr)
>> > +{
>> > +    size_t reg = addr & ~0x07;
>> > +    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];
>>
>> This is still buggy.
>>
>> > +    uint64_t val = le64_to_cpu(*base);
>
> mmio_buf is always LE, so a BE host will have *base in reversed
> byteorder. But look at the next line, where I did the le64_to_cpu().
> That should swap the bytes on a BE host, yielding the correct byteorder.

Sorry, I missed that one when comparing the patch to the previous version.

> On a LE host, *base is right the first time and le64_to_cpu() is a nop.
>
> In any case, I only use 'val', not '*base' directly. I suppose it could
> be rewritten for clarity (i.e. ditch 'base').

Yes, someone could add more code later which accidentally uses 'base' directly.

> Do you still think it's wrong? Or is it for another reason?

I think it's OK for now. The rewrite can happen with a small patch later.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 0/7] AMD IOMMU emulation patchset v4
  2010-08-29  9:55     ` [Qemu-devel] " Joerg Roedel
@ 2010-08-29 20:44       ` Blue Swirl
  -1 siblings, 0 replies; 96+ messages in thread
From: Blue Swirl @ 2010-08-29 20:44 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Eduard - Gabriel Munteanu, mst, paul, avi, anthony, av1474,
	yamahata, kvm, qemu-devel

On Sun, Aug 29, 2010 at 9:55 AM, Joerg Roedel <joro@8bytes.org> wrote:
> On Sat, Aug 28, 2010 at 04:00:31PM +0000, Blue Swirl wrote:
>> On Sat, Aug 28, 2010 at 2:54 PM, Eduard - Gabriel Munteanu
>
>> > Please have a look and merge if you like it.
>>
>> The endianness bug still exists. I also had other comments on patch 2.
>
> I am very happy with this patch set. Besides your comments, is there
> anything else that prevents merging of this patch set? Paul, what is
> your opinion on this?

I also think it's a nice piece of work. It would be good to fix the
CODING_STYLE (missing braces) problem in patch 2 before merging. The
endianness issue is not a problem after all; that was my mistake.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH 2/7] pci: memory access API and IOMMU support
  2010-08-29 20:44       ` [Qemu-devel] " Blue Swirl
@ 2010-08-29 22:08         ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-29 22:08 UTC (permalink / raw)
  To: mst
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

PCI devices should access memory through pci_memory_*() instead of
cpu_physical_memory_*(). This also provides support for translation and
access checking in case an IOMMU is emulated.

Memory maps are treated as remote IOTLBs (that is, translation caches
belonging to the IOMMU-aware device itself). Clients (devices) must
provide callbacks for map invalidation in case these maps are
persistent beyond the current I/O context, e.g. AIO DMA transfers.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
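Illustration only, not part of the patch: a self-contained sketch of the
chunked translate-then-copy loop that pci_memory_rw() is built around. All
names (toy_ram, toy_translate, toy_memory_rw) are hypothetical stand-ins,
and the toy "IOMMU" simply swaps two 4 KiB pages. Unlike pci_memory_rw(),
the sketch returns the number of bytes transferred so the behaviour on a
fault is easy to observe.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/*
 * Hypothetical stand-in for PCITranslateFunc: returns 0 on success and
 * reports how many bytes starting at 'addr' the translation covers.
 */
typedef int translate_fn(uint64_t addr, uint64_t *paddr, uint64_t *len);

/* Toy "guest physical memory": two pages. */
uint8_t toy_ram[2 * PAGE_SIZE];

/* Toy IOMMU: bus page 0 maps to physical page 1 and vice versa. */
int toy_translate(uint64_t addr, uint64_t *paddr, uint64_t *len)
{
    uint64_t page = addr / PAGE_SIZE;
    uint64_t offset = addr % PAGE_SIZE;

    if (page > 1) {
        return -1;                      /* unmapped: report a fault */
    }
    *paddr = (page ^ 1) * PAGE_SIZE + offset;
    *len = PAGE_SIZE - offset;          /* valid up to the end of the page */
    return 0;
}

/*
 * Copy 'len' bytes between 'buf' and translated memory, one translated
 * chunk at a time. Returns the number of bytes actually transferred.
 */
uint64_t toy_memory_rw(translate_fn *translate, uint64_t addr,
                       uint8_t *buf, uint64_t len, int is_write)
{
    uint64_t done = 0;

    while (len) {
        uint64_t paddr, plen;

        if (translate(addr, &paddr, &plen)) {
            break;                      /* stop at the first fault */
        }
        if (plen > len) {
            plen = len;                 /* translation may cover more */
        }
        if (is_write) {
            memcpy(&toy_ram[paddr], buf, plen);
        } else {
            memcpy(buf, &toy_ram[paddr], plen);
        }
        len -= plen;
        addr += plen;
        buf += plen;
        done += plen;
    }
    return done;
}
```

An access that straddles a page boundary is split into two chunks, each
with its own translation, which is why the loop re-translates on every
iteration.
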
 hw/pci.c           |  191 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/pci.h           |   69 +++++++++++++++++++
 hw/pci_internals.h |   12 +++
 qemu-common.h      |    1 +
 4 files changed, 272 insertions(+), 1 deletions(-)
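
Illustration only, not part of the patch: a sketch of the "maps as remote
IOTLBs" idea from the commit message, where each registered map carries an
invalidation callback that fires when an overlapping range is invalidated.
The toy_* names and the hand-rolled singly linked list are hypothetical;
the patch itself uses QLIST and PCIMemoryMap. The overlap test is modeled
on ranges_overlap() and assumes non-zero lengths.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void invalidate_fn(void *opaque);

/* Hypothetical stand-in for PCIMemoryMap. */
struct toy_map {
    uint64_t addr;
    uint64_t len;
    invalidate_fn *invalidate;
    void *opaque;
    struct toy_map *next;
};

/* Inclusive-range overlap test; assumes len1 > 0 and len2 > 0. */
int toy_ranges_overlap(uint64_t first1, uint64_t len1,
                       uint64_t first2, uint64_t len2)
{
    uint64_t last1 = first1 + len1 - 1;
    uint64_t last2 = first2 + len2 - 1;

    return !(last2 < first1 || last1 < first2);
}

/* Record a live mapping together with its invalidation callback. */
void toy_register_map(struct toy_map **head, uint64_t addr, uint64_t len,
                      invalidate_fn *invalidate, void *opaque)
{
    struct toy_map *map = malloc(sizeof(*map));

    assert(map != NULL);
    map->addr = addr;
    map->len = len;
    map->invalidate = invalidate;
    map->opaque = opaque;
    map->next = *head;
    *head = map;
}

/*
 * Invalidate and drop every map overlapping [addr, addr + len).
 * Returns the number of maps dropped.
 */
unsigned toy_invalidate_range(struct toy_map **head,
                              uint64_t addr, uint64_t len)
{
    struct toy_map **link = head;
    unsigned dropped = 0;

    while (*link) {
        struct toy_map *map = *link;

        if (toy_ranges_overlap(addr, len, map->addr, map->len)) {
            map->invalidate(map->opaque);
            *link = map->next;          /* unlink before freeing */
            free(map);
            dropped++;
        } else {
            link = &map->next;
        }
    }
    return dropped;
}

/* Example callback: count how often a client was told to drop a map. */
void toy_count_invalidation(void *opaque)
{
    ++*(int *) opaque;
}
```

The sketch unlinks each map before freeing it. With QLIST_FOREACH,
removing and freeing the current element inside the loop would leave the
iterator reading freed memory, so QLIST_FOREACH_SAFE may be the safer
choice for loops of this kind.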

diff --git a/hw/pci.c b/hw/pci.c
index 2dc1577..afcb33c 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -158,6 +158,19 @@ static void pci_device_reset(PCIDevice *dev)
     pci_update_mappings(dev);
 }
 
+static int pci_no_translate(PCIDevice *iommu,
+                            PCIDevice *dev,
+                            pcibus_t addr,
+                            target_phys_addr_t *paddr,
+                            target_phys_addr_t *len,
+                            unsigned perms)
+{
+    *paddr = addr;
+    *len = -1;
+
+    return 0;
+}
+
 static void pci_bus_reset(void *opaque)
 {
     PCIBus *bus = opaque;
@@ -220,7 +233,10 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
 {
     qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
     assert(PCI_FUNC(devfn_min) == 0);
-    bus->devfn_min = devfn_min;
+
+    bus->devfn_min  = devfn_min;
+    bus->iommu      = NULL;
+    bus->translate  = pci_no_translate;
 
     /* host bridge */
     QLIST_INIT(&bus->child);
@@ -1789,3 +1805,176 @@ static char *pcibus_get_dev_path(DeviceState *dev)
     return strdup(path);
 }
 
+void pci_register_iommu(PCIDevice *iommu,
+                        PCITranslateFunc *translate)
+{
+    iommu->bus->iommu = iommu;
+    iommu->bus->translate = translate;
+}
+
+void pci_memory_rw(PCIDevice *dev,
+                   pcibus_t addr,
+                   uint8_t *buf,
+                   pcibus_t len,
+                   int is_write)
+{
+    int err;
+    unsigned perms;
+    PCIDevice *iommu = dev->bus->iommu;
+    target_phys_addr_t paddr, plen;
+
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    while (len) {
+        err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
+        if (err) {
+            return;
+        }
+
+        /* The translation might be valid for larger regions. */
+        if (plen > len) {
+            plen = len;
+        }
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    }
+}
+
+static void pci_memory_register_map(PCIDevice *dev,
+                                    pcibus_t addr,
+                                    pcibus_t len,
+                                    target_phys_addr_t paddr,
+                                    PCIInvalidateMapFunc *invalidate,
+                                    void *invalidate_opaque)
+{
+    PCIMemoryMap *map;
+
+    map = qemu_malloc(sizeof(PCIMemoryMap));
+    map->addr               = addr;
+    map->len                = len;
+    map->paddr              = paddr;
+    map->invalidate         = invalidate;
+    map->invalidate_opaque  = invalidate_opaque;
+
+    QLIST_INSERT_HEAD(&dev->memory_maps, map, list);
+}
+
+static void pci_memory_unregister_map(PCIDevice *dev,
+                                      target_phys_addr_t paddr,
+                                      target_phys_addr_t len)
+{
+    PCIMemoryMap *map;
+
+    QLIST_FOREACH(map, &dev->memory_maps, list) {
+        if (map->paddr == paddr && map->len == len) {
+            QLIST_REMOVE(map, list);
+            free(map);
+        }
+    }
+}
+
+void pci_memory_invalidate_range(PCIDevice *dev,
+                                 pcibus_t addr,
+                                 pcibus_t len)
+{
+    PCIMemoryMap *map;
+
+    QLIST_FOREACH(map, &dev->memory_maps, list) {
+        if (ranges_overlap(addr, len, map->addr, map->len)) {
+            map->invalidate(map->invalidate_opaque);
+            QLIST_REMOVE(map, list);
+            free(map);
+        }
+    }
+}
+
+void *pci_memory_map(PCIDevice *dev,
+                     PCIInvalidateMapFunc *cb,
+                     void *opaque,
+                     pcibus_t addr,
+                     target_phys_addr_t *len,
+                     int is_write)
+{
+    int err;
+    unsigned perms;
+    PCIDevice *iommu = dev->bus->iommu;
+    target_phys_addr_t paddr, plen;
+
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    plen = *len;
+    err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
+    if (err) {
+        return NULL;
+    }
+
+    /*
+     * If this is true, the virtual region is contiguous,
+     * but the translated physical region isn't. We just
+     * clamp *len, much like cpu_physical_memory_map() does.
+     */
+    if (plen < *len) {
+        *len = plen;
+    }
+
+    /* We treat maps as remote TLBs to cope with stuff like AIO. */
+    if (cb) {
+        pci_memory_register_map(dev, addr, *len, paddr, cb, opaque);
+    }
+
+    return cpu_physical_memory_map(paddr, len, is_write);
+}
+
+void pci_memory_unmap(PCIDevice *dev,
+                      void *buffer,
+                      target_phys_addr_t len,
+                      int is_write,
+                      target_phys_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+    pci_memory_unregister_map(dev, (target_phys_addr_t) buffer, len);
+}
+
+#define DEFINE_PCI_LD(suffix, size)                                       \
+uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr)              \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    err = dev->bus->translate(dev->bus->iommu, dev,                       \
+                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
+    if (err || (plen < size / 8)) {                                       \
+        return 0;                                                         \
+    }                                                                     \
+                                                                          \
+    return ld##suffix##_phys(paddr);                                      \
+}
+
+#define DEFINE_PCI_ST(suffix, size)                                       \
+void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val)    \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    err = dev->bus->translate(dev->bus->iommu, dev,                       \
+                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
+    if (err || (plen < size / 8)) {                                       \
+        return;                                                           \
+    }                                                                     \
+                                                                          \
+    st##suffix##_phys(paddr, val);                                        \
+}
+
+DEFINE_PCI_LD(ub, 8)
+DEFINE_PCI_LD(uw, 16)
+DEFINE_PCI_LD(l, 32)
+DEFINE_PCI_LD(q, 64)
+
+DEFINE_PCI_ST(b, 8)
+DEFINE_PCI_ST(w, 16)
+DEFINE_PCI_ST(l, 32)
+DEFINE_PCI_ST(q, 64)
diff --git a/hw/pci.h b/hw/pci.h
index c551f96..c95863a 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -172,6 +172,8 @@ struct PCIDevice {
     char *romfile;
     ram_addr_t rom_offset;
     uint32_t rom_bar;
+
+    QLIST_HEAD(, PCIMemoryMap) memory_maps;
 };
 
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
@@ -391,4 +393,71 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
     return !(last2 < first1 || last1 < first2);
 }
 
+/*
+ * Memory I/O and PCI IOMMU definitions.
+ */
+
+#define IOMMU_PERM_READ     (1 << 0)
+#define IOMMU_PERM_WRITE    (1 << 1)
+#define IOMMU_PERM_RW       (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
+
+typedef int PCIInvalidateMapFunc(void *opaque);
+typedef int PCITranslateFunc(PCIDevice *iommu,
+                             PCIDevice *dev,
+                             pcibus_t addr,
+                             target_phys_addr_t *paddr,
+                             target_phys_addr_t *len,
+                             unsigned perms);
+
+void pci_memory_rw(PCIDevice *dev,
+                   pcibus_t addr,
+                   uint8_t *buf,
+                   pcibus_t len,
+                   int is_write);
+void *pci_memory_map(PCIDevice *dev,
+                     PCIInvalidateMapFunc *cb,
+                     void *opaque,
+                     pcibus_t addr,
+                     target_phys_addr_t *len,
+                     int is_write);
+void pci_memory_unmap(PCIDevice *dev,
+                      void *buffer,
+                      target_phys_addr_t len,
+                      int is_write,
+                      target_phys_addr_t access_len);
+void pci_register_iommu(PCIDevice *dev, PCITranslateFunc *translate);
+void pci_memory_invalidate_range(PCIDevice *dev, pcibus_t addr, pcibus_t len);
+
+#define DECLARE_PCI_LD(suffix, size)                                    \
+uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
+
+#define DECLARE_PCI_ST(suffix, size)                                    \
+void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val);
+
+DECLARE_PCI_LD(ub, 8)
+DECLARE_PCI_LD(uw, 16)
+DECLARE_PCI_LD(l, 32)
+DECLARE_PCI_LD(q, 64)
+
+DECLARE_PCI_ST(b, 8)
+DECLARE_PCI_ST(w, 16)
+DECLARE_PCI_ST(l, 32)
+DECLARE_PCI_ST(q, 64)
+
+static inline void pci_memory_read(PCIDevice *dev,
+                                   pcibus_t addr,
+                                   uint8_t *buf,
+                                   pcibus_t len)
+{
+    pci_memory_rw(dev, addr, buf, len, 0);
+}
+
+static inline void pci_memory_write(PCIDevice *dev,
+                                    pcibus_t addr,
+                                    const uint8_t *buf,
+                                    pcibus_t len)
+{
+    pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
+}
+
 #endif
diff --git a/hw/pci_internals.h b/hw/pci_internals.h
index e3c93a3..fb134b9 100644
--- a/hw/pci_internals.h
+++ b/hw/pci_internals.h
@@ -33,6 +33,9 @@ struct PCIBus {
        Keep a count of the number of devices with raised IRQs.  */
     int nirq;
     int *irq_count;
+
+    PCIDevice                       *iommu;
+    PCITranslateFunc                *translate;
 };
 
 struct PCIBridge {
@@ -44,4 +47,13 @@ struct PCIBridge {
     const char *bus_name;
 };
 
+struct PCIMemoryMap {
+    pcibus_t                        addr;
+    pcibus_t                        len;
+    target_phys_addr_t              paddr;
+    PCIInvalidateMapFunc            *invalidate;
+    void                            *invalidate_opaque;
+    QLIST_ENTRY(PCIMemoryMap)       list;
+};
+
 #endif /* QEMU_PCI_INTERNALS_H */
diff --git a/qemu-common.h b/qemu-common.h
index d735235..8b060e8 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -218,6 +218,7 @@ typedef struct SMBusDevice SMBusDevice;
 typedef struct PCIHostState PCIHostState;
 typedef struct PCIExpressHost PCIExpressHost;
 typedef struct PCIBus PCIBus;
+typedef struct PCIMemoryMap PCIMemoryMap;
 typedef struct PCIDevice PCIDevice;
 typedef struct PCIBridge PCIBridge;
 typedef struct SerialState SerialState;
-- 
1.7.1
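
The DEFINE_PCI_LD/DEFINE_PCI_ST macros above stamp out one accessor per
access width and refuse the access when the translation fails or covers
fewer bytes than the access needs. A minimal standalone sketch of that
pattern follows. Every toy_ name, the identity translation, and the
64-byte array standing in for guest RAM are assumptions made for
illustration; none of it is QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for pcibus_t / target_phys_addr_t (assumed 64-bit here). */
typedef uint64_t bus_addr_t;
typedef uint64_t phys_addr_t;

uint8_t guest_ram[64];                 /* toy "physical memory" */

/* Hypothetical translator: identity map, valid only inside guest_ram. */
int toy_translate(bus_addr_t addr, phys_addr_t *paddr, phys_addr_t *plen)
{
    assert(paddr != NULL && plen != NULL);
    if (addr >= sizeof(guest_ram)) {
        return -1;
    }
    *paddr = addr;
    *plen = sizeof(guest_ram) - addr;  /* bytes valid from paddr onward */
    return 0;
}

/* One expansion per access width, in the style of DEFINE_PCI_LD. */
#define DEFINE_TOY_LD(suffix, size)                                     \
uint##size##_t toy_ld##suffix(bus_addr_t addr)                          \
{                                                                       \
    phys_addr_t paddr, plen;                                            \
    uint##size##_t val;                                                 \
                                                                        \
    if (toy_translate(addr, &paddr, &plen) || plen < size / 8) {        \
        return 0;   /* fault, or translation too short for the load */  \
    }                                                                   \
    memcpy(&val, &guest_ram[paddr], sizeof(val));                       \
    return val;                                                         \
}

DEFINE_TOY_LD(ub, 8)    /* defines toy_ldub() */
DEFINE_TOY_LD(l, 32)    /* defines toy_ldl()  */
```

As in the patch, a failed load is indistinguishable from a load of zero;
a caller that needs to tell them apart would need a different signature.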



* [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
@ 2010-08-29 22:08         ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-29 22:08 UTC (permalink / raw)
  To: mst
  Cc: kvm, joro, qemu-devel, blauwirbel, yamahata, paul,
	Eduard - Gabriel Munteanu, avi

PCI devices should access memory through pci_memory_*() instead of
cpu_physical_memory_*(). This also provides support for translation and
access checking in case an IOMMU is emulated.

Memory maps are treated as remote IOTLBs (that is, translation caches
belonging to the IOMMU-aware device itself). Clients (devices) must
provide callbacks for map invalidation in case these maps are
persistent beyond the current I/O context, e.g. AIO DMA transfers.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
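For readers following along, here is a minimal standalone sketch of the
translate-then-copy loop that pci_memory_rw() performs. Everything
prefixed toy_ is an assumption made for illustration (a four-entry
reversed page table standing in for an IOMMU, a byte array standing in
for guest RAM); it is not QEMU code. Unlike the patch, which returns
void and stops silently on a translation fault, the sketch returns the
number of bytes transferred so the behaviour can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 16u
#define TOY_NPAGES    4u

typedef uint64_t bus_addr_t;
typedef uint64_t phys_addr_t;

/* Hypothetical callback type, loosely modelled on PCITranslateFunc. */
typedef int toy_translate_fn(bus_addr_t addr, phys_addr_t *paddr,
                             phys_addr_t *len);

uint8_t toy_ram[TOY_PAGE_SIZE * TOY_NPAGES];

/* Toy page table: bus page i maps to physical page toy_page_table[i]. */
static const unsigned toy_page_table[TOY_NPAGES] = { 3, 2, 1, 0 };

int toy_iommu_translate(bus_addr_t addr, phys_addr_t *paddr,
                        phys_addr_t *len)
{
    bus_addr_t page = addr / TOY_PAGE_SIZE;
    bus_addr_t off  = addr % TOY_PAGE_SIZE;

    if (page >= TOY_NPAGES) {
        return -1;                      /* no mapping: fault */
    }
    *paddr = (phys_addr_t)toy_page_table[page] * TOY_PAGE_SIZE + off;
    *len   = TOY_PAGE_SIZE - off;       /* valid up to the end of the page */
    return 0;
}

/* Copy len bytes, one translated chunk at a time. Returns bytes done. */
size_t toy_memory_rw(toy_translate_fn *translate, bus_addr_t addr,
                     uint8_t *buf, size_t len, int is_write)
{
    size_t done = 0;

    assert(translate != NULL && buf != NULL);
    while (len) {
        phys_addr_t paddr, plen;

        if (translate(addr, &paddr, &plen)) {
            break;                      /* stop at the first fault */
        }
        if (plen > len) {
            plen = len;                 /* translation may cover more */
        }
        if (is_write) {
            memcpy(&toy_ram[paddr], buf, plen);
        } else {
            memcpy(buf, &toy_ram[paddr], plen);
        }
        len  -= plen;
        addr += plen;
        buf  += plen;
        done += plen;
    }
    return done;
}
```

A 24-byte write starting at bus address 8 is split at the page boundary:
the first 8 bytes land at physical 56..63 and the remaining 16 at
physical 32..47.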
 hw/pci.c           |  191 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/pci.h           |   69 +++++++++++++++++++
 hw/pci_internals.h |   12 +++
 qemu-common.h      |    1 +
 4 files changed, 272 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 2dc1577..afcb33c 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -158,6 +158,19 @@ static void pci_device_reset(PCIDevice *dev)
     pci_update_mappings(dev);
 }
 
+static int pci_no_translate(PCIDevice *iommu,
+                            PCIDevice *dev,
+                            pcibus_t addr,
+                            target_phys_addr_t *paddr,
+                            target_phys_addr_t *len,
+                            unsigned perms)
+{
+    *paddr = addr;
+    *len = -1;
+
+    return 0;
+}
+
 static void pci_bus_reset(void *opaque)
 {
     PCIBus *bus = opaque;
@@ -220,7 +233,10 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
 {
     qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
     assert(PCI_FUNC(devfn_min) == 0);
-    bus->devfn_min = devfn_min;
+
+    bus->devfn_min  = devfn_min;
+    bus->iommu      = NULL;
+    bus->translate  = pci_no_translate;
 
     /* host bridge */
     QLIST_INIT(&bus->child);
@@ -1789,3 +1805,176 @@ static char *pcibus_get_dev_path(DeviceState *dev)
     return strdup(path);
 }
 
+void pci_register_iommu(PCIDevice *iommu,
+                        PCITranslateFunc *translate)
+{
+    iommu->bus->iommu = iommu;
+    iommu->bus->translate = translate;
+}
+
+void pci_memory_rw(PCIDevice *dev,
+                   pcibus_t addr,
+                   uint8_t *buf,
+                   pcibus_t len,
+                   int is_write)
+{
+    int err;
+    unsigned perms;
+    PCIDevice *iommu = dev->bus->iommu;
+    target_phys_addr_t paddr, plen;
+
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    while (len) {
+        err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
+        if (err) {
+            return;
+        }
+
+        /* The translation might be valid for larger regions. */
+        if (plen > len) {
+            plen = len;
+        }
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    }
+}
+
+static void pci_memory_register_map(PCIDevice *dev,
+                                    pcibus_t addr,
+                                    pcibus_t len,
+                                    target_phys_addr_t paddr,
+                                    PCIInvalidateMapFunc *invalidate,
+                                    void *invalidate_opaque)
+{
+    PCIMemoryMap *map;
+
+    map = qemu_malloc(sizeof(PCIMemoryMap));
+    map->addr               = addr;
+    map->len                = len;
+    map->paddr              = paddr;
+    map->invalidate         = invalidate;
+    map->invalidate_opaque  = invalidate_opaque;
+
+    QLIST_INSERT_HEAD(&dev->memory_maps, map, list);
+}
+
+static void pci_memory_unregister_map(PCIDevice *dev,
+                                      target_phys_addr_t paddr,
+                                      target_phys_addr_t len)
+{
+    PCIMemoryMap *map, *next;
+
+    QLIST_FOREACH_SAFE(map, &dev->memory_maps, list, next) {
+        if (map->paddr == paddr && map->len == len) {
+            QLIST_REMOVE(map, list);
+            qemu_free(map);
+        }
+    }
+}
+
+void pci_memory_invalidate_range(PCIDevice *dev,
+                                 pcibus_t addr,
+                                 pcibus_t len)
+{
+    PCIMemoryMap *map, *next;
+
+    QLIST_FOREACH_SAFE(map, &dev->memory_maps, list, next) {
+        if (ranges_overlap(addr, len, map->addr, map->len)) {
+            map->invalidate(map->invalidate_opaque);
+            QLIST_REMOVE(map, list);
+            qemu_free(map);
+        }
+    }
+}
+
+void *pci_memory_map(PCIDevice *dev,
+                     PCIInvalidateMapFunc *cb,
+                     void *opaque,
+                     pcibus_t addr,
+                     target_phys_addr_t *len,
+                     int is_write)
+{
+    int err;
+    unsigned perms;
+    PCIDevice *iommu = dev->bus->iommu;
+    target_phys_addr_t paddr, plen;
+
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    plen = *len;
+    err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
+    if (err) {
+        return NULL;
+    }
+
+    /*
+     * If this is true, the virtual region is contiguous,
+     * but the translated physical region isn't. We just
+     * clamp *len, much like cpu_physical_memory_map() does.
+     */
+    if (plen < *len) {
+        *len = plen;
+    }
+
+    /* We treat maps as remote TLBs to cope with stuff like AIO. */
+    if (cb) {
+        pci_memory_register_map(dev, addr, *len, paddr, cb, opaque);
+    }
+
+    return cpu_physical_memory_map(paddr, len, is_write);
+}
+
+void pci_memory_unmap(PCIDevice *dev,
+                      void *buffer,
+                      target_phys_addr_t len,
+                      int is_write,
+                      target_phys_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+    pci_memory_unregister_map(dev, (target_phys_addr_t) buffer, len);
+}
+
+#define DEFINE_PCI_LD(suffix, size)                                       \
+uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr)              \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    err = dev->bus->translate(dev->bus->iommu, dev,                       \
+                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
+    if (err || (plen < size / 8)) {                                       \
+        return 0;                                                         \
+    }                                                                     \
+                                                                          \
+    return ld##suffix##_phys(paddr);                                      \
+}
+
+#define DEFINE_PCI_ST(suffix, size)                                       \
+void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val)    \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    err = dev->bus->translate(dev->bus->iommu, dev,                       \
+                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
+    if (err || (plen < size / 8)) {                                       \
+        return;                                                           \
+    }                                                                     \
+                                                                          \
+    st##suffix##_phys(paddr, val);                                        \
+}
+
+DEFINE_PCI_LD(ub, 8)
+DEFINE_PCI_LD(uw, 16)
+DEFINE_PCI_LD(l, 32)
+DEFINE_PCI_LD(q, 64)
+
+DEFINE_PCI_ST(b, 8)
+DEFINE_PCI_ST(w, 16)
+DEFINE_PCI_ST(l, 32)
+DEFINE_PCI_ST(q, 64)
diff --git a/hw/pci.h b/hw/pci.h
index c551f96..c95863a 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -172,6 +172,8 @@ struct PCIDevice {
     char *romfile;
     ram_addr_t rom_offset;
     uint32_t rom_bar;
+
+    QLIST_HEAD(, PCIMemoryMap) memory_maps;
 };
 
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
@@ -391,4 +393,71 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
     return !(last2 < first1 || last1 < first2);
 }
 
+/*
+ * Memory I/O and PCI IOMMU definitions.
+ */
+
+#define IOMMU_PERM_READ     (1 << 0)
+#define IOMMU_PERM_WRITE    (1 << 1)
+#define IOMMU_PERM_RW       (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
+
+typedef int PCIInvalidateMapFunc(void *opaque);
+typedef int PCITranslateFunc(PCIDevice *iommu,
+                             PCIDevice *dev,
+                             pcibus_t addr,
+                             target_phys_addr_t *paddr,
+                             target_phys_addr_t *len,
+                             unsigned perms);
+
+void pci_memory_rw(PCIDevice *dev,
+                   pcibus_t addr,
+                   uint8_t *buf,
+                   pcibus_t len,
+                   int is_write);
+void *pci_memory_map(PCIDevice *dev,
+                     PCIInvalidateMapFunc *cb,
+                     void *opaque,
+                     pcibus_t addr,
+                     target_phys_addr_t *len,
+                     int is_write);
+void pci_memory_unmap(PCIDevice *dev,
+                      void *buffer,
+                      target_phys_addr_t len,
+                      int is_write,
+                      target_phys_addr_t access_len);
+void pci_register_iommu(PCIDevice *dev, PCITranslateFunc *translate);
+void pci_memory_invalidate_range(PCIDevice *dev, pcibus_t addr, pcibus_t len);
+
+#define DECLARE_PCI_LD(suffix, size)                                    \
+uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
+
+#define DECLARE_PCI_ST(suffix, size)                                    \
+void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val);
+
+DECLARE_PCI_LD(ub, 8)
+DECLARE_PCI_LD(uw, 16)
+DECLARE_PCI_LD(l, 32)
+DECLARE_PCI_LD(q, 64)
+
+DECLARE_PCI_ST(b, 8)
+DECLARE_PCI_ST(w, 16)
+DECLARE_PCI_ST(l, 32)
+DECLARE_PCI_ST(q, 64)
+
+static inline void pci_memory_read(PCIDevice *dev,
+                                   pcibus_t addr,
+                                   uint8_t *buf,
+                                   pcibus_t len)
+{
+    pci_memory_rw(dev, addr, buf, len, 0);
+}
+
+static inline void pci_memory_write(PCIDevice *dev,
+                                    pcibus_t addr,
+                                    const uint8_t *buf,
+                                    pcibus_t len)
+{
+    pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
+}
+
 #endif
diff --git a/hw/pci_internals.h b/hw/pci_internals.h
index e3c93a3..fb134b9 100644
--- a/hw/pci_internals.h
+++ b/hw/pci_internals.h
@@ -33,6 +33,9 @@ struct PCIBus {
        Keep a count of the number of devices with raised IRQs.  */
     int nirq;
     int *irq_count;
+
+    PCIDevice                       *iommu;
+    PCITranslateFunc                *translate;
 };
 
 struct PCIBridge {
@@ -44,4 +47,13 @@ struct PCIBridge {
     const char *bus_name;
 };
 
+struct PCIMemoryMap {
+    pcibus_t                        addr;
+    pcibus_t                        len;
+    target_phys_addr_t              paddr;
+    PCIInvalidateMapFunc            *invalidate;
+    void                            *invalidate_opaque;
+    QLIST_ENTRY(PCIMemoryMap)       list;
+};
+
 #endif /* QEMU_PCI_INTERNALS_H */
diff --git a/qemu-common.h b/qemu-common.h
index d735235..8b060e8 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -218,6 +218,7 @@ typedef struct SMBusDevice SMBusDevice;
 typedef struct PCIHostState PCIHostState;
 typedef struct PCIExpressHost PCIExpressHost;
 typedef struct PCIBus PCIBus;
+typedef struct PCIMemoryMap PCIMemoryMap;
 typedef struct PCIDevice PCIDevice;
 typedef struct PCIBridge PCIBridge;
 typedef struct SerialState SerialState;
-- 
1.7.1
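
The "maps as remote IOTLBs" idea from the commit message can be sketched
outside QEMU as a per-device list of live mappings, each carrying an
invalidation callback. The version below is a hedged illustration only:
it uses a plain singly linked list instead of QLIST, malloc/free instead
of the qemu_ allocators, and every toy_ name is hypothetical. Each entry
is unlinked before it is freed, so the walk never touches freed memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t bus_addr_t;
typedef void toy_invalidate_fn(void *opaque);

struct toy_map {
    bus_addr_t         addr;        /* bus address of the mapped range */
    bus_addr_t         len;         /* length in bytes, assumed non-zero */
    toy_invalidate_fn *invalidate;  /* called when the range goes stale */
    void              *opaque;
    struct toy_map    *next;
};

/* Same test as ranges_overlap(); assumes non-zero lengths, no wraparound. */
static int toy_ranges_overlap(bus_addr_t a, bus_addr_t alen,
                              bus_addr_t b, bus_addr_t blen)
{
    return !(b + blen - 1 < a || a + alen - 1 < b);
}

int toy_register_map(struct toy_map **head, bus_addr_t addr, bus_addr_t len,
                     toy_invalidate_fn *cb, void *opaque)
{
    struct toy_map *map;

    assert(head != NULL && cb != NULL);
    map = malloc(sizeof(*map));
    if (!map) {
        return -1;
    }
    map->addr = addr;
    map->len = len;
    map->invalidate = cb;
    map->opaque = opaque;
    map->next = *head;
    *head = map;
    return 0;
}

/* Drop every map overlapping [addr, addr + len). Returns how many. */
int toy_invalidate_range(struct toy_map **head, bus_addr_t addr,
                         bus_addr_t len)
{
    struct toy_map **link = head;
    int dropped = 0;

    while (*link) {
        struct toy_map *map = *link;

        if (toy_ranges_overlap(addr, len, map->addr, map->len)) {
            *link = map->next;              /* unlink before freeing */
            map->invalidate(map->opaque);
            free(map);
            dropped++;
        } else {
            link = &map->next;
        }
    }
    return dropped;
}

/* Example callback: flag an in-flight transfer as cancelled. */
void toy_mark_cancelled(void *opaque)
{
    *(int *)opaque = 1;
}
```

A device doing AIO DMA would register one such entry per mapping and
cancel the transfer from its callback when the IOMMU invalidates the
range.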


* Re: [PATCH 2/7] pci: memory access API and IOMMU support
  2010-08-29 22:08         ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-29 22:11           ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-29 22:11 UTC (permalink / raw)
  To: mst
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Mon, Aug 30, 2010 at 01:08:23AM +0300, Eduard - Gabriel Munteanu wrote:
> PCI devices should access memory through pci_memory_*() instead of
> cpu_physical_memory_*(). This also provides support for translation and
> access checking in case an IOMMU is emulated.
> 
> Memory maps are treated as remote IOTLBs (that is, translation caches
> belonging to the IOMMU-aware device itself). Clients (devices) must
> provide callbacks for map invalidation in case these maps are
> persistent beyond the current I/O context, e.g. AIO DMA transfers.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---

Please merge this instead of the patch I sent with the series. I wanted
to avoid resubmitting the whole series.


	Eduard



* Re: [Qemu-devel] [PATCH 3/7] AMD IOMMU emulation
  2010-08-28 14:54   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-30  3:07     ` Isaku Yamahata
  -1 siblings, 0 replies; 96+ messages in thread
From: Isaku Yamahata @ 2010-08-30  3:07 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: mst, kvm, joro, qemu-devel, blauwirbel, paul, avi

On Sat, Aug 28, 2010 at 05:54:54PM +0300, Eduard - Gabriel Munteanu wrote:
> diff --git a/hw/pc.c b/hw/pc.c
> index a96187f..e2456b0 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -1068,6 +1068,8 @@ void pc_pci_device_init(PCIBus *pci_bus)
>      int max_bus;
>      int bus;
>  
> +    pci_create_simple(pci_bus, -1, "amd-iommu");
> +
>      max_bus = drive_get_max_bus(IF_SCSI);
>      for (bus = 0; bus <= max_bus; bus++) {
>          pci_create_simple(pci_bus, -1, "lsi53c895a");

This always instantiates the IOMMU.
How would it coexist with other IOMMU (Intel VT-d) emulation?
-- 
yamahata


* Re: [Qemu-devel] [PATCH 3/7] AMD IOMMU emulation
  2010-08-30  3:07     ` Isaku Yamahata
@ 2010-08-30  5:54       ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-30  5:54 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: mst, kvm, joro, qemu-devel, blauwirbel, paul, avi

On Mon, Aug 30, 2010 at 12:07:30PM +0900, Isaku Yamahata wrote:
> On Sat, Aug 28, 2010 at 05:54:54PM +0300, Eduard - Gabriel Munteanu wrote:
> > diff --git a/hw/pc.c b/hw/pc.c
> > index a96187f..e2456b0 100644
> > --- a/hw/pc.c
> > +++ b/hw/pc.c
> > @@ -1068,6 +1068,8 @@ void pc_pci_device_init(PCIBus *pci_bus)
> >      int max_bus;
> >      int bus;
> >  
> > +    pci_create_simple(pci_bus, -1, "amd-iommu");
> > +
> >      max_bus = drive_get_max_bus(IF_SCSI);
> >      for (bus = 0; bus <= max_bus; bus++) {
> >          pci_create_simple(pci_bus, -1, "lsi53c895a");
> 
> This always instantiates the IOMMU.
> How would it coexist with other IOMMU (Intel VT-d) emulation?
> -- 
> yamahata

I suppose it could be turned into a compile-time/runtime configurable
option when VT-d emulation arrives. Unless you mean having both IOMMUs
run at the same time, which is impossible unless some meaningful
topology is specified (presumably hardcoded as well).

Considering this is only a machine model I'm modifying, it's just like
other emulated pieces of PC hardware that are (at the moment) hardcoded.


	Eduard
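
To make the "runtime configurable option" idea concrete, here is a
purely hypothetical sketch of how a machine model could pick which IOMMU
device to instantiate. The option values, the toy_ names and the
"vtd-iommu" device name are all assumptions made for illustration; only
"amd-iommu" comes from the patch under discussion, and no such option
exists in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_iommu_kind {
    TOY_IOMMU_NONE,
    TOY_IOMMU_AMD,
    TOY_IOMMU_VTD,
};

/* Parse a hypothetical option value. Returns 0 on success, -1 if unknown. */
int toy_parse_iommu(const char *value, enum toy_iommu_kind *kind)
{
    assert(value != NULL && kind != NULL);
    if (strcmp(value, "none") == 0) {
        *kind = TOY_IOMMU_NONE;
    } else if (strcmp(value, "amd") == 0) {
        *kind = TOY_IOMMU_AMD;
    } else if (strcmp(value, "vtd") == 0) {
        *kind = TOY_IOMMU_VTD;
    } else {
        return -1;
    }
    return 0;
}

/* Device name the machine init code would create, or NULL for none. */
const char *toy_iommu_device_name(enum toy_iommu_kind kind)
{
    switch (kind) {
    case TOY_IOMMU_AMD:
        return "amd-iommu";     /* name used by the patch */
    case TOY_IOMMU_VTD:
        return "vtd-iommu";     /* hypothetical, no such device yet */
    case TOY_IOMMU_NONE:
        break;
    }
    return NULL;
}
```

With something along these lines, pc_pci_device_init() would call
pci_create_simple(pci_bus, -1, name) only when the returned name is not
NULL, so at most one IOMMU is ever instantiated.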



* Re: [PATCH 1/7] pci: expand tabs to spaces in pci_regs.h
  2010-08-28 14:54   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-31 20:29     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-08-31 20:29 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Sat, Aug 28, 2010 at 05:54:52PM +0300, Eduard - Gabriel Munteanu wrote:
> The conversion was done using the GNU 'expand' tool (default settings)
> to make it obey the QEMU coding style.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

I'm not really interested in this: we copied pci_regs.h from linux
to help non-linux hosts, and keeping the code consistent
with the original makes detecting bugs and adding new stuff
from linux/pci_regs.h easier.

> ---
>  hw/pci_regs.h | 1330 ++++++++++++++++++++++++++++----------------------------
>  1 files changed, 665 insertions(+), 665 deletions(-)
>  rewrite hw/pci_regs.h (90%)
> 
> diff --git a/hw/pci_regs.h b/hw/pci_regs.h
> dissimilarity index 90%
> index dd0bed4..0f9f84c 100644
> --- a/hw/pci_regs.h
> +++ b/hw/pci_regs.h
> @@ -1,665 +1,665 @@
> -/*
> - *	pci_regs.h
> - *
> - *	PCI standard defines
> - *	Copyright 1994, Drew Eckhardt
> - *	Copyright 1997--1999 Martin Mares <mj@ucw.cz>
> - *
> - *	For more information, please consult the following manuals (look at
> - *	http://www.pcisig.com/ for how to get them):
> - *
> - *	PCI BIOS Specification
> - *	PCI Local Bus Specification
> - *	PCI to PCI Bridge Specification
> - *	PCI System Design Guide
> - *
> - * 	For hypertransport information, please consult the following manuals
> - * 	from http://www.hypertransport.org
> - *
> - *	The Hypertransport I/O Link Specification
> - */
> -
> -#ifndef LINUX_PCI_REGS_H
> -#define LINUX_PCI_REGS_H
> -
> -/*
> - * Under PCI, each device has 256 bytes of configuration address space,
> - * of which the first 64 bytes are standardized as follows:
> - */
> -#define PCI_VENDOR_ID		0x00	/* 16 bits */
> -#define PCI_DEVICE_ID		0x02	/* 16 bits */
> -#define PCI_COMMAND		0x04	/* 16 bits */
> -#define  PCI_COMMAND_IO		0x1	/* Enable response in I/O space */
> -#define  PCI_COMMAND_MEMORY	0x2	/* Enable response in Memory space */
> -#define  PCI_COMMAND_MASTER	0x4	/* Enable bus mastering */
> -#define  PCI_COMMAND_SPECIAL	0x8	/* Enable response to special cycles */
> -#define  PCI_COMMAND_INVALIDATE	0x10	/* Use memory write and invalidate */
> -#define  PCI_COMMAND_VGA_PALETTE 0x20	/* Enable palette snooping */
> -#define  PCI_COMMAND_PARITY	0x40	/* Enable parity checking */
> -#define  PCI_COMMAND_WAIT 	0x80	/* Enable address/data stepping */
> -#define  PCI_COMMAND_SERR	0x100	/* Enable SERR */
> -#define  PCI_COMMAND_FAST_BACK	0x200	/* Enable back-to-back writes */
> -#define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
> -
> -#define PCI_STATUS		0x06	/* 16 bits */
> -#define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
> -#define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
> -#define  PCI_STATUS_66MHZ	0x20	/* Support 66 Mhz PCI 2.1 bus */
> -#define  PCI_STATUS_UDF		0x40	/* Support User Definable Features [obsolete] */
> -#define  PCI_STATUS_FAST_BACK	0x80	/* Accept fast-back to back */
> -#define  PCI_STATUS_PARITY	0x100	/* Detected parity error */
> -#define  PCI_STATUS_DEVSEL_MASK	0x600	/* DEVSEL timing */
> -#define  PCI_STATUS_DEVSEL_FAST		0x000
> -#define  PCI_STATUS_DEVSEL_MEDIUM	0x200
> -#define  PCI_STATUS_DEVSEL_SLOW		0x400
> -#define  PCI_STATUS_SIG_TARGET_ABORT	0x800 /* Set on target abort */
> -#define  PCI_STATUS_REC_TARGET_ABORT	0x1000 /* Master ack of " */
> -#define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
> -#define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
> -#define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
> -
> -#define PCI_CLASS_REVISION	0x08	/* High 24 bits are class, low 8 revision */
> -#define PCI_REVISION_ID		0x08	/* Revision ID */
> -#define PCI_CLASS_PROG		0x09	/* Reg. Level Programming Interface */
> -#define PCI_CLASS_DEVICE	0x0a	/* Device class */
> -
> -#define PCI_CACHE_LINE_SIZE	0x0c	/* 8 bits */
> -#define PCI_LATENCY_TIMER	0x0d	/* 8 bits */
> -#define PCI_HEADER_TYPE		0x0e	/* 8 bits */
> -#define  PCI_HEADER_TYPE_NORMAL		0
> -#define  PCI_HEADER_TYPE_BRIDGE		1
> -#define  PCI_HEADER_TYPE_CARDBUS	2
> -
> -#define PCI_BIST		0x0f	/* 8 bits */
> -#define  PCI_BIST_CODE_MASK	0x0f	/* Return result */
> -#define  PCI_BIST_START		0x40	/* 1 to start BIST, 2 secs or less */
> -#define  PCI_BIST_CAPABLE	0x80	/* 1 if BIST capable */
> -
> -/*
> - * Base addresses specify locations in memory or I/O space.
> - * Decoded size can be determined by writing a value of
> - * 0xffffffff to the register, and reading it back.  Only
> - * 1 bits are decoded.
> - */
> -#define PCI_BASE_ADDRESS_0	0x10	/* 32 bits */
> -#define PCI_BASE_ADDRESS_1	0x14	/* 32 bits [htype 0,1 only] */
> -#define PCI_BASE_ADDRESS_2	0x18	/* 32 bits [htype 0 only] */
> -#define PCI_BASE_ADDRESS_3	0x1c	/* 32 bits */
> -#define PCI_BASE_ADDRESS_4	0x20	/* 32 bits */
> -#define PCI_BASE_ADDRESS_5	0x24	/* 32 bits */
> -#define  PCI_BASE_ADDRESS_SPACE		0x01	/* 0 = memory, 1 = I/O */
> -#define  PCI_BASE_ADDRESS_SPACE_IO	0x01
> -#define  PCI_BASE_ADDRESS_SPACE_MEMORY	0x00
> -#define  PCI_BASE_ADDRESS_MEM_TYPE_MASK	0x06
> -#define  PCI_BASE_ADDRESS_MEM_TYPE_32	0x00	/* 32 bit address */
> -#define  PCI_BASE_ADDRESS_MEM_TYPE_1M	0x02	/* Below 1M [obsolete] */
> -#define  PCI_BASE_ADDRESS_MEM_TYPE_64	0x04	/* 64 bit address */
> -#define  PCI_BASE_ADDRESS_MEM_PREFETCH	0x08	/* prefetchable? */
> -#define  PCI_BASE_ADDRESS_MEM_MASK	(~0x0fUL)
> -#define  PCI_BASE_ADDRESS_IO_MASK	(~0x03UL)
> -/* bit 1 is reserved if address_space = 1 */
> -
> -/* Header type 0 (normal devices) */
> -#define PCI_CARDBUS_CIS		0x28
> -#define PCI_SUBSYSTEM_VENDOR_ID	0x2c
> -#define PCI_SUBSYSTEM_ID	0x2e
> -#define PCI_ROM_ADDRESS		0x30	/* Bits 31..11 are address, 10..1 reserved */
> -#define  PCI_ROM_ADDRESS_ENABLE	0x01
> -#define PCI_ROM_ADDRESS_MASK	(~0x7ffUL)
> -
> -#define PCI_CAPABILITY_LIST	0x34	/* Offset of first capability list entry */
> -
> -/* 0x35-0x3b are reserved */
> -#define PCI_INTERRUPT_LINE	0x3c	/* 8 bits */
> -#define PCI_INTERRUPT_PIN	0x3d	/* 8 bits */
> -#define PCI_MIN_GNT		0x3e	/* 8 bits */
> -#define PCI_MAX_LAT		0x3f	/* 8 bits */
> -
> -/* Header type 1 (PCI-to-PCI bridges) */
> -#define PCI_PRIMARY_BUS		0x18	/* Primary bus number */
> -#define PCI_SECONDARY_BUS	0x19	/* Secondary bus number */
> -#define PCI_SUBORDINATE_BUS	0x1a	/* Highest bus number behind the bridge */
> -#define PCI_SEC_LATENCY_TIMER	0x1b	/* Latency timer for secondary interface */
> -#define PCI_IO_BASE		0x1c	/* I/O range behind the bridge */
> -#define PCI_IO_LIMIT		0x1d
> -#define  PCI_IO_RANGE_TYPE_MASK	0x0fUL	/* I/O bridging type */
> -#define  PCI_IO_RANGE_TYPE_16	0x00
> -#define  PCI_IO_RANGE_TYPE_32	0x01
> -#define  PCI_IO_RANGE_MASK	(~0x0fUL)
> -#define PCI_SEC_STATUS		0x1e	/* Secondary status register, only bit 14 used */
> -#define PCI_MEMORY_BASE		0x20	/* Memory range behind */
> -#define PCI_MEMORY_LIMIT	0x22
> -#define  PCI_MEMORY_RANGE_TYPE_MASK 0x0fUL
> -#define  PCI_MEMORY_RANGE_MASK	(~0x0fUL)
> -#define PCI_PREF_MEMORY_BASE	0x24	/* Prefetchable memory range behind */
> -#define PCI_PREF_MEMORY_LIMIT	0x26
> -#define  PCI_PREF_RANGE_TYPE_MASK 0x0fUL
> -#define  PCI_PREF_RANGE_TYPE_32	0x00
> -#define  PCI_PREF_RANGE_TYPE_64	0x01
> -#define  PCI_PREF_RANGE_MASK	(~0x0fUL)
> -#define PCI_PREF_BASE_UPPER32	0x28	/* Upper half of prefetchable memory range */
> -#define PCI_PREF_LIMIT_UPPER32	0x2c
> -#define PCI_IO_BASE_UPPER16	0x30	/* Upper half of I/O addresses */
> -#define PCI_IO_LIMIT_UPPER16	0x32
> -/* 0x34 same as for htype 0 */
> -/* 0x35-0x3b is reserved */
> -#define PCI_ROM_ADDRESS1	0x38	/* Same as PCI_ROM_ADDRESS, but for htype 1 */
> -/* 0x3c-0x3d are same as for htype 0 */
> -#define PCI_BRIDGE_CONTROL	0x3e
> -#define  PCI_BRIDGE_CTL_PARITY	0x01	/* Enable parity detection on secondary interface */
> -#define  PCI_BRIDGE_CTL_SERR	0x02	/* The same for SERR forwarding */
> -#define  PCI_BRIDGE_CTL_ISA	0x04	/* Enable ISA mode */
> -#define  PCI_BRIDGE_CTL_VGA	0x08	/* Forward VGA addresses */
> -#define  PCI_BRIDGE_CTL_MASTER_ABORT	0x20  /* Report master aborts */
> -#define  PCI_BRIDGE_CTL_BUS_RESET	0x40	/* Secondary bus reset */
> -#define  PCI_BRIDGE_CTL_FAST_BACK	0x80	/* Fast Back2Back enabled on secondary interface */
> -
> -/* Header type 2 (CardBus bridges) */
> -#define PCI_CB_CAPABILITY_LIST	0x14
> -/* 0x15 reserved */
> -#define PCI_CB_SEC_STATUS	0x16	/* Secondary status */
> -#define PCI_CB_PRIMARY_BUS	0x18	/* PCI bus number */
> -#define PCI_CB_CARD_BUS		0x19	/* CardBus bus number */
> -#define PCI_CB_SUBORDINATE_BUS	0x1a	/* Subordinate bus number */
> -#define PCI_CB_LATENCY_TIMER	0x1b	/* CardBus latency timer */
> -#define PCI_CB_MEMORY_BASE_0	0x1c
> -#define PCI_CB_MEMORY_LIMIT_0	0x20
> -#define PCI_CB_MEMORY_BASE_1	0x24
> -#define PCI_CB_MEMORY_LIMIT_1	0x28
> -#define PCI_CB_IO_BASE_0	0x2c
> -#define PCI_CB_IO_BASE_0_HI	0x2e
> -#define PCI_CB_IO_LIMIT_0	0x30
> -#define PCI_CB_IO_LIMIT_0_HI	0x32
> -#define PCI_CB_IO_BASE_1	0x34
> -#define PCI_CB_IO_BASE_1_HI	0x36
> -#define PCI_CB_IO_LIMIT_1	0x38
> -#define PCI_CB_IO_LIMIT_1_HI	0x3a
> -#define  PCI_CB_IO_RANGE_MASK	(~0x03UL)
> -/* 0x3c-0x3d are same as for htype 0 */
> -#define PCI_CB_BRIDGE_CONTROL	0x3e
> -#define  PCI_CB_BRIDGE_CTL_PARITY	0x01	/* Similar to standard bridge control register */
> -#define  PCI_CB_BRIDGE_CTL_SERR		0x02
> -#define  PCI_CB_BRIDGE_CTL_ISA		0x04
> -#define  PCI_CB_BRIDGE_CTL_VGA		0x08
> -#define  PCI_CB_BRIDGE_CTL_MASTER_ABORT	0x20
> -#define  PCI_CB_BRIDGE_CTL_CB_RESET	0x40	/* CardBus reset */
> -#define  PCI_CB_BRIDGE_CTL_16BIT_INT	0x80	/* Enable interrupt for 16-bit cards */
> -#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM0 0x100	/* Prefetch enable for both memory regions */
> -#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM1 0x200
> -#define  PCI_CB_BRIDGE_CTL_POST_WRITES	0x400
> -#define PCI_CB_SUBSYSTEM_VENDOR_ID	0x40
> -#define PCI_CB_SUBSYSTEM_ID		0x42
> -#define PCI_CB_LEGACY_MODE_BASE		0x44	/* 16-bit PC Card legacy mode base address (ExCa) */
> -/* 0x48-0x7f reserved */
> -
> -/* Capability lists */
> -
> -#define PCI_CAP_LIST_ID		0	/* Capability ID */
> -#define  PCI_CAP_ID_PM		0x01	/* Power Management */
> -#define  PCI_CAP_ID_AGP		0x02	/* Accelerated Graphics Port */
> -#define  PCI_CAP_ID_VPD		0x03	/* Vital Product Data */
> -#define  PCI_CAP_ID_SLOTID	0x04	/* Slot Identification */
> -#define  PCI_CAP_ID_MSI		0x05	/* Message Signalled Interrupts */
> -#define  PCI_CAP_ID_CHSWP	0x06	/* CompactPCI HotSwap */
> -#define  PCI_CAP_ID_PCIX	0x07	/* PCI-X */
> -#define  PCI_CAP_ID_HT		0x08	/* HyperTransport */
> -#define  PCI_CAP_ID_VNDR	0x09	/* Vendor specific */
> -#define  PCI_CAP_ID_DBG		0x0A	/* Debug port */
> -#define  PCI_CAP_ID_CCRC	0x0B	/* CompactPCI Central Resource Control */
> -#define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
> -#define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
> -#define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
> -#define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
> -#define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
> -#define  PCI_CAP_ID_AF		0x13	/* PCI Advanced Features */
> -#define PCI_CAP_LIST_NEXT	1	/* Next capability in the list */
> -#define PCI_CAP_FLAGS		2	/* Capability defined flags (16 bits) */
> -#define PCI_CAP_SIZEOF		4
> -
> -/* Power Management Registers */
> -
> -#define PCI_PM_PMC		2	/* PM Capabilities Register */
> -#define  PCI_PM_CAP_VER_MASK	0x0007	/* Version */
> -#define  PCI_PM_CAP_PME_CLOCK	0x0008	/* PME clock required */
> -#define  PCI_PM_CAP_RESERVED    0x0010  /* Reserved field */
> -#define  PCI_PM_CAP_DSI		0x0020	/* Device specific initialization */
> -#define  PCI_PM_CAP_AUX_POWER	0x01C0	/* Auxilliary power support mask */
> -#define  PCI_PM_CAP_D1		0x0200	/* D1 power state support */
> -#define  PCI_PM_CAP_D2		0x0400	/* D2 power state support */
> -#define  PCI_PM_CAP_PME		0x0800	/* PME pin supported */
> -#define  PCI_PM_CAP_PME_MASK	0xF800	/* PME Mask of all supported states */
> -#define  PCI_PM_CAP_PME_D0	0x0800	/* PME# from D0 */
> -#define  PCI_PM_CAP_PME_D1	0x1000	/* PME# from D1 */
> -#define  PCI_PM_CAP_PME_D2	0x2000	/* PME# from D2 */
> -#define  PCI_PM_CAP_PME_D3	0x4000	/* PME# from D3 (hot) */
> -#define  PCI_PM_CAP_PME_D3cold	0x8000	/* PME# from D3 (cold) */
> -#define  PCI_PM_CAP_PME_SHIFT	11	/* Start of the PME Mask in PMC */
> -#define PCI_PM_CTRL		4	/* PM control and status register */
> -#define  PCI_PM_CTRL_STATE_MASK	0x0003	/* Current power state (D0 to D3) */
> -#define  PCI_PM_CTRL_NO_SOFT_RESET	0x0008	/* No reset for D3hot->D0 */
> -#define  PCI_PM_CTRL_PME_ENABLE	0x0100	/* PME pin enable */
> -#define  PCI_PM_CTRL_DATA_SEL_MASK	0x1e00	/* Data select (??) */
> -#define  PCI_PM_CTRL_DATA_SCALE_MASK	0x6000	/* Data scale (??) */
> -#define  PCI_PM_CTRL_PME_STATUS	0x8000	/* PME pin status */
> -#define PCI_PM_PPB_EXTENSIONS	6	/* PPB support extensions (??) */
> -#define  PCI_PM_PPB_B2_B3	0x40	/* Stop clock when in D3hot (??) */
> -#define  PCI_PM_BPCC_ENABLE	0x80	/* Bus power/clock control enable (??) */
> -#define PCI_PM_DATA_REGISTER	7	/* (??) */
> -#define PCI_PM_SIZEOF		8
> -
> -/* AGP registers */
> -
> -#define PCI_AGP_VERSION		2	/* BCD version number */
> -#define PCI_AGP_RFU		3	/* Rest of capability flags */
> -#define PCI_AGP_STATUS		4	/* Status register */
> -#define  PCI_AGP_STATUS_RQ_MASK	0xff000000	/* Maximum number of requests - 1 */
> -#define  PCI_AGP_STATUS_SBA	0x0200	/* Sideband addressing supported */
> -#define  PCI_AGP_STATUS_64BIT	0x0020	/* 64-bit addressing supported */
> -#define  PCI_AGP_STATUS_FW	0x0010	/* FW transfers supported */
> -#define  PCI_AGP_STATUS_RATE4	0x0004	/* 4x transfer rate supported */
> -#define  PCI_AGP_STATUS_RATE2	0x0002	/* 2x transfer rate supported */
> -#define  PCI_AGP_STATUS_RATE1	0x0001	/* 1x transfer rate supported */
> -#define PCI_AGP_COMMAND		8	/* Control register */
> -#define  PCI_AGP_COMMAND_RQ_MASK 0xff000000  /* Master: Maximum number of requests */
> -#define  PCI_AGP_COMMAND_SBA	0x0200	/* Sideband addressing enabled */
> -#define  PCI_AGP_COMMAND_AGP	0x0100	/* Allow processing of AGP transactions */
> -#define  PCI_AGP_COMMAND_64BIT	0x0020 	/* Allow processing of 64-bit addresses */
> -#define  PCI_AGP_COMMAND_FW	0x0010 	/* Force FW transfers */
> -#define  PCI_AGP_COMMAND_RATE4	0x0004	/* Use 4x rate */
> -#define  PCI_AGP_COMMAND_RATE2	0x0002	/* Use 2x rate */
> -#define  PCI_AGP_COMMAND_RATE1	0x0001	/* Use 1x rate */
> -#define PCI_AGP_SIZEOF		12
> -
> -/* Vital Product Data */
> -
> -#define PCI_VPD_ADDR		2	/* Address to access (15 bits!) */
> -#define  PCI_VPD_ADDR_MASK	0x7fff	/* Address mask */
> -#define  PCI_VPD_ADDR_F		0x8000	/* Write 0, 1 indicates completion */
> -#define PCI_VPD_DATA		4	/* 32-bits of data returned here */
> -
> -/* Slot Identification */
> -
> -#define PCI_SID_ESR		2	/* Expansion Slot Register */
> -#define  PCI_SID_ESR_NSLOTS	0x1f	/* Number of expansion slots available */
> -#define  PCI_SID_ESR_FIC	0x20	/* First In Chassis Flag */
> -#define PCI_SID_CHASSIS_NR	3	/* Chassis Number */
> -
> -/* Message Signalled Interrupts registers */
> -
> -#define PCI_MSI_FLAGS		2	/* Various flags */
> -#define  PCI_MSI_FLAGS_64BIT	0x80	/* 64-bit addresses allowed */
> -#define  PCI_MSI_FLAGS_QSIZE	0x70	/* Message queue size configured */
> -#define  PCI_MSI_FLAGS_QMASK	0x0e	/* Maximum queue size available */
> -#define  PCI_MSI_FLAGS_ENABLE	0x01	/* MSI feature enabled */
> -#define  PCI_MSI_FLAGS_MASKBIT	0x100	/* 64-bit mask bits allowed */
> -#define PCI_MSI_RFU		3	/* Rest of capability flags */
> -#define PCI_MSI_ADDRESS_LO	4	/* Lower 32 bits */
> -#define PCI_MSI_ADDRESS_HI	8	/* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
> -#define PCI_MSI_DATA_32		8	/* 16 bits of data for 32-bit devices */
> -#define PCI_MSI_MASK_32		12	/* Mask bits register for 32-bit devices */
> -#define PCI_MSI_DATA_64		12	/* 16 bits of data for 64-bit devices */
> -#define PCI_MSI_MASK_64		16	/* Mask bits register for 64-bit devices */
> -
> -/* MSI-X registers (these are at offset PCI_MSIX_FLAGS) */
> -#define PCI_MSIX_FLAGS		2
> -#define  PCI_MSIX_FLAGS_QSIZE	0x7FF
> -#define  PCI_MSIX_FLAGS_ENABLE	(1 << 15)
> -#define  PCI_MSIX_FLAGS_MASKALL	(1 << 14)
> -#define PCI_MSIX_FLAGS_BIRMASK	(7 << 0)
> -
> -/* CompactPCI Hotswap Register */
> -
> -#define PCI_CHSWP_CSR		2	/* Control and Status Register */
> -#define  PCI_CHSWP_DHA		0x01	/* Device Hiding Arm */
> -#define  PCI_CHSWP_EIM		0x02	/* ENUM# Signal Mask */
> -#define  PCI_CHSWP_PIE		0x04	/* Pending Insert or Extract */
> -#define  PCI_CHSWP_LOO		0x08	/* LED On / Off */
> -#define  PCI_CHSWP_PI		0x30	/* Programming Interface */
> -#define  PCI_CHSWP_EXT		0x40	/* ENUM# status - extraction */
> -#define  PCI_CHSWP_INS		0x80	/* ENUM# status - insertion */
> -
> -/* PCI Advanced Feature registers */
> -
> -#define PCI_AF_LENGTH		2
> -#define PCI_AF_CAP		3
> -#define  PCI_AF_CAP_TP		0x01
> -#define  PCI_AF_CAP_FLR		0x02
> -#define PCI_AF_CTRL		4
> -#define  PCI_AF_CTRL_FLR	0x01
> -#define PCI_AF_STATUS		5
> -#define  PCI_AF_STATUS_TP	0x01
> -
> -/* PCI-X registers */
> -
> -#define PCI_X_CMD		2	/* Modes & Features */
> -#define  PCI_X_CMD_DPERR_E	0x0001	/* Data Parity Error Recovery Enable */
> -#define  PCI_X_CMD_ERO		0x0002	/* Enable Relaxed Ordering */
> -#define  PCI_X_CMD_READ_512	0x0000	/* 512 byte maximum read byte count */
> -#define  PCI_X_CMD_READ_1K	0x0004	/* 1Kbyte maximum read byte count */
> -#define  PCI_X_CMD_READ_2K	0x0008	/* 2Kbyte maximum read byte count */
> -#define  PCI_X_CMD_READ_4K	0x000c	/* 4Kbyte maximum read byte count */
> -#define  PCI_X_CMD_MAX_READ	0x000c	/* Max Memory Read Byte Count */
> -				/* Max # of outstanding split transactions */
> -#define  PCI_X_CMD_SPLIT_1	0x0000	/* Max 1 */
> -#define  PCI_X_CMD_SPLIT_2	0x0010	/* Max 2 */
> -#define  PCI_X_CMD_SPLIT_3	0x0020	/* Max 3 */
> -#define  PCI_X_CMD_SPLIT_4	0x0030	/* Max 4 */
> -#define  PCI_X_CMD_SPLIT_8	0x0040	/* Max 8 */
> -#define  PCI_X_CMD_SPLIT_12	0x0050	/* Max 12 */
> -#define  PCI_X_CMD_SPLIT_16	0x0060	/* Max 16 */
> -#define  PCI_X_CMD_SPLIT_32	0x0070	/* Max 32 */
> -#define  PCI_X_CMD_MAX_SPLIT	0x0070	/* Max Outstanding Split Transactions */
> -#define  PCI_X_CMD_VERSION(x) 	(((x) >> 12) & 3) /* Version */
> -#define PCI_X_STATUS		4	/* PCI-X capabilities */
> -#define  PCI_X_STATUS_DEVFN	0x000000ff	/* A copy of devfn */
> -#define  PCI_X_STATUS_BUS	0x0000ff00	/* A copy of bus nr */
> -#define  PCI_X_STATUS_64BIT	0x00010000	/* 64-bit device */
> -#define  PCI_X_STATUS_133MHZ	0x00020000	/* 133 MHz capable */
> -#define  PCI_X_STATUS_SPL_DISC	0x00040000	/* Split Completion Discarded */
> -#define  PCI_X_STATUS_UNX_SPL	0x00080000	/* Unexpected Split Completion */
> -#define  PCI_X_STATUS_COMPLEX	0x00100000	/* Device Complexity */
> -#define  PCI_X_STATUS_MAX_READ	0x00600000	/* Designed Max Memory Read Count */
> -#define  PCI_X_STATUS_MAX_SPLIT	0x03800000	/* Designed Max Outstanding Split Transactions */
> -#define  PCI_X_STATUS_MAX_CUM	0x1c000000	/* Designed Max Cumulative Read Size */
> -#define  PCI_X_STATUS_SPL_ERR	0x20000000	/* Rcvd Split Completion Error Msg */
> -#define  PCI_X_STATUS_266MHZ	0x40000000	/* 266 MHz capable */
> -#define  PCI_X_STATUS_533MHZ	0x80000000	/* 533 MHz capable */
> -
> -/* PCI Express capability registers */
> -
> -#define PCI_EXP_FLAGS		2	/* Capabilities register */
> -#define PCI_EXP_FLAGS_VERS	0x000f	/* Capability version */
> -#define PCI_EXP_FLAGS_TYPE	0x00f0	/* Device/Port type */
> -#define  PCI_EXP_TYPE_ENDPOINT	0x0	/* Express Endpoint */
> -#define  PCI_EXP_TYPE_LEG_END	0x1	/* Legacy Endpoint */
> -#define  PCI_EXP_TYPE_ROOT_PORT 0x4	/* Root Port */
> -#define  PCI_EXP_TYPE_UPSTREAM	0x5	/* Upstream Port */
> -#define  PCI_EXP_TYPE_DOWNSTREAM 0x6	/* Downstream Port */
> -#define  PCI_EXP_TYPE_PCI_BRIDGE 0x7	/* PCI/PCI-X Bridge */
> -#define  PCI_EXP_TYPE_RC_END	0x9	/* Root Complex Integrated Endpoint */
> -#define  PCI_EXP_TYPE_RC_EC	0x10	/* Root Complex Event Collector */
> -#define PCI_EXP_FLAGS_SLOT	0x0100	/* Slot implemented */
> -#define PCI_EXP_FLAGS_IRQ	0x3e00	/* Interrupt message number */
> -#define PCI_EXP_DEVCAP		4	/* Device capabilities */
> -#define  PCI_EXP_DEVCAP_PAYLOAD	0x07	/* Max_Payload_Size */
> -#define  PCI_EXP_DEVCAP_PHANTOM	0x18	/* Phantom functions */
> -#define  PCI_EXP_DEVCAP_EXT_TAG	0x20	/* Extended tags */
> -#define  PCI_EXP_DEVCAP_L0S	0x1c0	/* L0s Acceptable Latency */
> -#define  PCI_EXP_DEVCAP_L1	0xe00	/* L1 Acceptable Latency */
> -#define  PCI_EXP_DEVCAP_ATN_BUT	0x1000	/* Attention Button Present */
> -#define  PCI_EXP_DEVCAP_ATN_IND	0x2000	/* Attention Indicator Present */
> -#define  PCI_EXP_DEVCAP_PWR_IND	0x4000	/* Power Indicator Present */
> -#define  PCI_EXP_DEVCAP_RBER	0x8000	/* Role-Based Error Reporting */
> -#define  PCI_EXP_DEVCAP_PWR_VAL	0x3fc0000 /* Slot Power Limit Value */
> -#define  PCI_EXP_DEVCAP_PWR_SCL	0xc000000 /* Slot Power Limit Scale */
> -#define  PCI_EXP_DEVCAP_FLR     0x10000000 /* Function Level Reset */
> -#define PCI_EXP_DEVCTL		8	/* Device Control */
> -#define  PCI_EXP_DEVCTL_CERE	0x0001	/* Correctable Error Reporting En. */
> -#define  PCI_EXP_DEVCTL_NFERE	0x0002	/* Non-Fatal Error Reporting Enable */
> -#define  PCI_EXP_DEVCTL_FERE	0x0004	/* Fatal Error Reporting Enable */
> -#define  PCI_EXP_DEVCTL_URRE	0x0008	/* Unsupported Request Reporting En. */
> -#define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
> -#define  PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size */
> -#define  PCI_EXP_DEVCTL_EXT_TAG	0x0100	/* Extended Tag Field Enable */
> -#define  PCI_EXP_DEVCTL_PHANTOM	0x0200	/* Phantom Functions Enable */
> -#define  PCI_EXP_DEVCTL_AUX_PME	0x0400	/* Auxiliary Power PM Enable */
> -#define  PCI_EXP_DEVCTL_NOSNOOP_EN 0x0800  /* Enable No Snoop */
> -#define  PCI_EXP_DEVCTL_READRQ	0x7000	/* Max_Read_Request_Size */
> -#define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR */
> -#define PCI_EXP_DEVSTA		10	/* Device Status */
> -#define  PCI_EXP_DEVSTA_CED	0x01	/* Correctable Error Detected */
> -#define  PCI_EXP_DEVSTA_NFED	0x02	/* Non-Fatal Error Detected */
> -#define  PCI_EXP_DEVSTA_FED	0x04	/* Fatal Error Detected */
> -#define  PCI_EXP_DEVSTA_URD	0x08	/* Unsupported Request Detected */
> -#define  PCI_EXP_DEVSTA_AUXPD	0x10	/* AUX Power Detected */
> -#define  PCI_EXP_DEVSTA_TRPND	0x20	/* Transactions Pending */
> -#define PCI_EXP_LNKCAP		12	/* Link Capabilities */
> -#define  PCI_EXP_LNKCAP_SLS	0x0000000f /* Supported Link Speeds */
> -#define  PCI_EXP_LNKCAP_MLW	0x000003f0 /* Maximum Link Width */
> -#define  PCI_EXP_LNKCAP_ASPMS	0x00000c00 /* ASPM Support */
> -#define  PCI_EXP_LNKCAP_L0SEL	0x00007000 /* L0s Exit Latency */
> -#define  PCI_EXP_LNKCAP_L1EL	0x00038000 /* L1 Exit Latency */
> -#define  PCI_EXP_LNKCAP_CLKPM	0x00040000 /* L1 Clock Power Management */
> -#define  PCI_EXP_LNKCAP_SDERC	0x00080000 /* Suprise Down Error Reporting Capable */
> -#define  PCI_EXP_LNKCAP_DLLLARC	0x00100000 /* Data Link Layer Link Active Reporting Capable */
> -#define  PCI_EXP_LNKCAP_LBNC	0x00200000 /* Link Bandwidth Notification Capability */
> -#define  PCI_EXP_LNKCAP_PN	0xff000000 /* Port Number */
> -#define PCI_EXP_LNKCTL		16	/* Link Control */
> -#define  PCI_EXP_LNKCTL_ASPMC	0x0003	/* ASPM Control */
> -#define  PCI_EXP_LNKCTL_RCB	0x0008	/* Read Completion Boundary */
> -#define  PCI_EXP_LNKCTL_LD	0x0010	/* Link Disable */
> -#define  PCI_EXP_LNKCTL_RL	0x0020	/* Retrain Link */
> -#define  PCI_EXP_LNKCTL_CCC	0x0040	/* Common Clock Configuration */
> -#define  PCI_EXP_LNKCTL_ES	0x0080	/* Extended Synch */
> -#define  PCI_EXP_LNKCTL_CLKREQ_EN 0x100	/* Enable clkreq */
> -#define  PCI_EXP_LNKCTL_HAWD	0x0200	/* Hardware Autonomous Width Disable */
> -#define  PCI_EXP_LNKCTL_LBMIE	0x0400	/* Link Bandwidth Management Interrupt Enable */
> -#define  PCI_EXP_LNKCTL_LABIE	0x0800	/* Lnk Autonomous Bandwidth Interrupt Enable */
> -#define PCI_EXP_LNKSTA		18	/* Link Status */
> -#define  PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
> -#define  PCI_EXP_LNKSTA_NLW	0x03f0	/* Nogotiated Link Width */
> -#define  PCI_EXP_LNKSTA_LT	0x0800	/* Link Training */
> -#define  PCI_EXP_LNKSTA_SLC	0x1000	/* Slot Clock Configuration */
> -#define  PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
> -#define  PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */
> -#define  PCI_EXP_LNKSTA_LABS	0x8000	/* Link Autonomous Bandwidth Status */
> -#define PCI_EXP_SLTCAP		20	/* Slot Capabilities */
> -#define  PCI_EXP_SLTCAP_ABP	0x00000001 /* Attention Button Present */
> -#define  PCI_EXP_SLTCAP_PCP	0x00000002 /* Power Controller Present */
> -#define  PCI_EXP_SLTCAP_MRLSP	0x00000004 /* MRL Sensor Present */
> -#define  PCI_EXP_SLTCAP_AIP	0x00000008 /* Attention Indicator Present */
> -#define  PCI_EXP_SLTCAP_PIP	0x00000010 /* Power Indicator Present */
> -#define  PCI_EXP_SLTCAP_HPS	0x00000020 /* Hot-Plug Surprise */
> -#define  PCI_EXP_SLTCAP_HPC	0x00000040 /* Hot-Plug Capable */
> -#define  PCI_EXP_SLTCAP_SPLV	0x00007f80 /* Slot Power Limit Value */
> -#define  PCI_EXP_SLTCAP_SPLS	0x00018000 /* Slot Power Limit Scale */
> -#define  PCI_EXP_SLTCAP_EIP	0x00020000 /* Electromechanical Interlock Present */
> -#define  PCI_EXP_SLTCAP_NCCS	0x00040000 /* No Command Completed Support */
> -#define  PCI_EXP_SLTCAP_PSN	0xfff80000 /* Physical Slot Number */
> -#define PCI_EXP_SLTCTL		24	/* Slot Control */
> -#define  PCI_EXP_SLTCTL_ABPE	0x0001	/* Attention Button Pressed Enable */
> -#define  PCI_EXP_SLTCTL_PFDE	0x0002	/* Power Fault Detected Enable */
> -#define  PCI_EXP_SLTCTL_MRLSCE	0x0004	/* MRL Sensor Changed Enable */
> -#define  PCI_EXP_SLTCTL_PDCE	0x0008	/* Presence Detect Changed Enable */
> -#define  PCI_EXP_SLTCTL_CCIE	0x0010	/* Command Completed Interrupt Enable */
> -#define  PCI_EXP_SLTCTL_HPIE	0x0020	/* Hot-Plug Interrupt Enable */
> -#define  PCI_EXP_SLTCTL_AIC	0x00c0	/* Attention Indicator Control */
> -#define  PCI_EXP_SLTCTL_PIC	0x0300	/* Power Indicator Control */
> -#define  PCI_EXP_SLTCTL_PCC	0x0400	/* Power Controller Control */
> -#define  PCI_EXP_SLTCTL_EIC	0x0800	/* Electromechanical Interlock Control */
> -#define  PCI_EXP_SLTCTL_DLLSCE	0x1000	/* Data Link Layer State Changed Enable */
> -#define PCI_EXP_SLTSTA		26	/* Slot Status */
> -#define  PCI_EXP_SLTSTA_ABP	0x0001	/* Attention Button Pressed */
> -#define  PCI_EXP_SLTSTA_PFD	0x0002	/* Power Fault Detected */
> -#define  PCI_EXP_SLTSTA_MRLSC	0x0004	/* MRL Sensor Changed */
> -#define  PCI_EXP_SLTSTA_PDC	0x0008	/* Presence Detect Changed */
> -#define  PCI_EXP_SLTSTA_CC	0x0010	/* Command Completed */
> -#define  PCI_EXP_SLTSTA_MRLSS	0x0020	/* MRL Sensor State */
> -#define  PCI_EXP_SLTSTA_PDS	0x0040	/* Presence Detect State */
> -#define  PCI_EXP_SLTSTA_EIS	0x0080	/* Electromechanical Interlock Status */
> -#define  PCI_EXP_SLTSTA_DLLSC	0x0100	/* Data Link Layer State Changed */
> -#define PCI_EXP_RTCTL		28	/* Root Control */
> -#define  PCI_EXP_RTCTL_SECEE	0x01	/* System Error on Correctable Error */
> -#define  PCI_EXP_RTCTL_SENFEE	0x02	/* System Error on Non-Fatal Error */
> -#define  PCI_EXP_RTCTL_SEFEE	0x04	/* System Error on Fatal Error */
> -#define  PCI_EXP_RTCTL_PMEIE	0x08	/* PME Interrupt Enable */
> -#define  PCI_EXP_RTCTL_CRSSVE	0x10	/* CRS Software Visibility Enable */
> -#define PCI_EXP_RTCAP		30	/* Root Capabilities */
> -#define PCI_EXP_RTSTA		32	/* Root Status */
> -#define PCI_EXP_DEVCAP2		36	/* Device Capabilities 2 */
> -#define  PCI_EXP_DEVCAP2_ARI	0x20	/* Alternative Routing-ID */
> -#define PCI_EXP_DEVCTL2		40	/* Device Control 2 */
> -#define  PCI_EXP_DEVCTL2_ARI	0x20	/* Alternative Routing-ID */
> -#define PCI_EXP_LNKCTL2		48	/* Link Control 2 */
> -#define PCI_EXP_SLTCTL2		56	/* Slot Control 2 */
> -
> -/* Extended Capabilities (PCI-X 2.0 and Express) */
> -#define PCI_EXT_CAP_ID(header)		(header & 0x0000ffff)
> -#define PCI_EXT_CAP_VER(header)		((header >> 16) & 0xf)
> -#define PCI_EXT_CAP_NEXT(header)	((header >> 20) & 0xffc)
> -
> -#define PCI_EXT_CAP_ID_ERR	1
> -#define PCI_EXT_CAP_ID_VC	2
> -#define PCI_EXT_CAP_ID_DSN	3
> -#define PCI_EXT_CAP_ID_PWR	4
> -#define PCI_EXT_CAP_ID_ARI	14
> -#define PCI_EXT_CAP_ID_ATS	15
> -#define PCI_EXT_CAP_ID_SRIOV	16
> -
> -/* Advanced Error Reporting */
> -#define PCI_ERR_UNCOR_STATUS	4	/* Uncorrectable Error Status */
> -#define  PCI_ERR_UNC_TRAIN	0x00000001	/* Training */
> -#define  PCI_ERR_UNC_DLP	0x00000010	/* Data Link Protocol */
> -#define  PCI_ERR_UNC_POISON_TLP	0x00001000	/* Poisoned TLP */
> -#define  PCI_ERR_UNC_FCP	0x00002000	/* Flow Control Protocol */
> -#define  PCI_ERR_UNC_COMP_TIME	0x00004000	/* Completion Timeout */
> -#define  PCI_ERR_UNC_COMP_ABORT	0x00008000	/* Completer Abort */
> -#define  PCI_ERR_UNC_UNX_COMP	0x00010000	/* Unexpected Completion */
> -#define  PCI_ERR_UNC_RX_OVER	0x00020000	/* Receiver Overflow */
> -#define  PCI_ERR_UNC_MALF_TLP	0x00040000	/* Malformed TLP */
> -#define  PCI_ERR_UNC_ECRC	0x00080000	/* ECRC Error Status */
> -#define  PCI_ERR_UNC_UNSUP	0x00100000	/* Unsupported Request */
> -#define PCI_ERR_UNCOR_MASK	8	/* Uncorrectable Error Mask */
> -	/* Same bits as above */
> -#define PCI_ERR_UNCOR_SEVER	12	/* Uncorrectable Error Severity */
> -	/* Same bits as above */
> -#define PCI_ERR_COR_STATUS	16	/* Correctable Error Status */
> -#define  PCI_ERR_COR_RCVR	0x00000001	/* Receiver Error Status */
> -#define  PCI_ERR_COR_BAD_TLP	0x00000040	/* Bad TLP Status */
> -#define  PCI_ERR_COR_BAD_DLLP	0x00000080	/* Bad DLLP Status */
> -#define  PCI_ERR_COR_REP_ROLL	0x00000100	/* REPLAY_NUM Rollover */
> -#define  PCI_ERR_COR_REP_TIMER	0x00001000	/* Replay Timer Timeout */
> -#define PCI_ERR_COR_MASK	20	/* Correctable Error Mask */
> -	/* Same bits as above */
> -#define PCI_ERR_CAP		24	/* Advanced Error Capabilities */
> -#define  PCI_ERR_CAP_FEP(x)	((x) & 31)	/* First Error Pointer */
> -#define  PCI_ERR_CAP_ECRC_GENC	0x00000020	/* ECRC Generation Capable */
> -#define  PCI_ERR_CAP_ECRC_GENE	0x00000040	/* ECRC Generation Enable */
> -#define  PCI_ERR_CAP_ECRC_CHKC	0x00000080	/* ECRC Check Capable */
> -#define  PCI_ERR_CAP_ECRC_CHKE	0x00000100	/* ECRC Check Enable */
> -#define PCI_ERR_HEADER_LOG	28	/* Header Log Register (16 bytes) */
> -#define PCI_ERR_ROOT_COMMAND	44	/* Root Error Command */
> -/* Correctable Err Reporting Enable */
> -#define PCI_ERR_ROOT_CMD_COR_EN		0x00000001
> -/* Non-fatal Err Reporting Enable */
> -#define PCI_ERR_ROOT_CMD_NONFATAL_EN	0x00000002
> -/* Fatal Err Reporting Enable */
> -#define PCI_ERR_ROOT_CMD_FATAL_EN	0x00000004
> -#define PCI_ERR_ROOT_STATUS	48
> -#define PCI_ERR_ROOT_COR_RCV		0x00000001	/* ERR_COR Received */
> -/* Multi ERR_COR Received */
> -#define PCI_ERR_ROOT_MULTI_COR_RCV	0x00000002
> -/* ERR_FATAL/NONFATAL Recevied */
> -#define PCI_ERR_ROOT_UNCOR_RCV		0x00000004
> -/* Multi ERR_FATAL/NONFATAL Recevied */
> -#define PCI_ERR_ROOT_MULTI_UNCOR_RCV	0x00000008
> -#define PCI_ERR_ROOT_FIRST_FATAL	0x00000010	/* First Fatal */
> -#define PCI_ERR_ROOT_NONFATAL_RCV	0x00000020	/* Non-Fatal Received */
> -#define PCI_ERR_ROOT_FATAL_RCV		0x00000040	/* Fatal Received */
> -#define PCI_ERR_ROOT_COR_SRC	52
> -#define PCI_ERR_ROOT_SRC	54
> -
> -/* Virtual Channel */
> -#define PCI_VC_PORT_REG1	4
> -#define PCI_VC_PORT_REG2	8
> -#define PCI_VC_PORT_CTRL	12
> -#define PCI_VC_PORT_STATUS	14
> -#define PCI_VC_RES_CAP		16
> -#define PCI_VC_RES_CTRL		20
> -#define PCI_VC_RES_STATUS	26
> -
> -/* Power Budgeting */
> -#define PCI_PWR_DSR		4	/* Data Select Register */
> -#define PCI_PWR_DATA		8	/* Data Register */
> -#define  PCI_PWR_DATA_BASE(x)	((x) & 0xff)	    /* Base Power */
> -#define  PCI_PWR_DATA_SCALE(x)	(((x) >> 8) & 3)    /* Data Scale */
> -#define  PCI_PWR_DATA_PM_SUB(x)	(((x) >> 10) & 7)   /* PM Sub State */
> -#define  PCI_PWR_DATA_PM_STATE(x) (((x) >> 13) & 3) /* PM State */
> -#define  PCI_PWR_DATA_TYPE(x)	(((x) >> 15) & 7)   /* Type */
> -#define  PCI_PWR_DATA_RAIL(x)	(((x) >> 18) & 7)   /* Power Rail */
> -#define PCI_PWR_CAP		12	/* Capability */
> -#define  PCI_PWR_CAP_BUDGET(x)	((x) & 1)	/* Included in system budget */
> -
> -/*
> - * Hypertransport sub capability types
> - *
> - * Unfortunately there are both 3 bit and 5 bit capability types defined
> - * in the HT spec, catering for that is a little messy. You probably don't
> - * want to use these directly, just use pci_find_ht_capability() and it
> - * will do the right thing for you.
> - */
> -#define HT_3BIT_CAP_MASK	0xE0
> -#define HT_CAPTYPE_SLAVE	0x00	/* Slave/Primary link configuration */
> -#define HT_CAPTYPE_HOST		0x20	/* Host/Secondary link configuration */
> -
> -#define HT_5BIT_CAP_MASK	0xF8
> -#define HT_CAPTYPE_IRQ		0x80	/* IRQ Configuration */
> -#define HT_CAPTYPE_REMAPPING_40	0xA0	/* 40 bit address remapping */
> -#define HT_CAPTYPE_REMAPPING_64 0xA2	/* 64 bit address remapping */
> -#define HT_CAPTYPE_UNITID_CLUMP	0x90	/* Unit ID clumping */
> -#define HT_CAPTYPE_EXTCONF	0x98	/* Extended Configuration Space Access */
> -#define HT_CAPTYPE_MSI_MAPPING	0xA8	/* MSI Mapping Capability */
> -#define  HT_MSI_FLAGS		0x02		/* Offset to flags */
> -#define  HT_MSI_FLAGS_ENABLE	0x1		/* Mapping enable */
> -#define  HT_MSI_FLAGS_FIXED	0x2		/* Fixed mapping only */
> -#define  HT_MSI_FIXED_ADDR	0x00000000FEE00000ULL	/* Fixed addr */
> -#define  HT_MSI_ADDR_LO		0x04		/* Offset to low addr bits */
> -#define  HT_MSI_ADDR_LO_MASK	0xFFF00000	/* Low address bit mask */
> -#define  HT_MSI_ADDR_HI		0x08		/* Offset to high addr bits */
> -#define HT_CAPTYPE_DIRECT_ROUTE	0xB0	/* Direct routing configuration */
> -#define HT_CAPTYPE_VCSET	0xB8	/* Virtual Channel configuration */
> -#define HT_CAPTYPE_ERROR_RETRY	0xC0	/* Retry on error configuration */
> -#define HT_CAPTYPE_GEN3		0xD0	/* Generation 3 hypertransport configuration */
> -#define HT_CAPTYPE_PM		0xE0	/* Hypertransport powermanagement configuration */
> -
> -/* Alternative Routing-ID Interpretation */
> -#define PCI_ARI_CAP		0x04	/* ARI Capability Register */
> -#define  PCI_ARI_CAP_MFVC	0x0001	/* MFVC Function Groups Capability */
> -#define  PCI_ARI_CAP_ACS	0x0002	/* ACS Function Groups Capability */
> -#define  PCI_ARI_CAP_NFN(x)	(((x) >> 8) & 0xff) /* Next Function Number */
> -#define PCI_ARI_CTRL		0x06	/* ARI Control Register */
> -#define  PCI_ARI_CTRL_MFVC	0x0001	/* MFVC Function Groups Enable */
> -#define  PCI_ARI_CTRL_ACS	0x0002	/* ACS Function Groups Enable */
> -#define  PCI_ARI_CTRL_FG(x)	(((x) >> 4) & 7) /* Function Group */
> -
> -/* Address Translation Service */
> -#define PCI_ATS_CAP		0x04	/* ATS Capability Register */
> -#define  PCI_ATS_CAP_QDEP(x)	((x) & 0x1f)	/* Invalidate Queue Depth */
> -#define  PCI_ATS_MAX_QDEP	32	/* Max Invalidate Queue Depth */
> -#define PCI_ATS_CTRL		0x06	/* ATS Control Register */
> -#define  PCI_ATS_CTRL_ENABLE	0x8000	/* ATS Enable */
> -#define  PCI_ATS_CTRL_STU(x)	((x) & 0x1f)	/* Smallest Translation Unit */
> -#define  PCI_ATS_MIN_STU	12	/* shift of minimum STU block */
> -
> -/* Single Root I/O Virtualization */
> -#define PCI_SRIOV_CAP		0x04	/* SR-IOV Capabilities */
> -#define  PCI_SRIOV_CAP_VFM	0x01	/* VF Migration Capable */
> -#define  PCI_SRIOV_CAP_INTR(x)	((x) >> 21) /* Interrupt Message Number */
> -#define PCI_SRIOV_CTRL		0x08	/* SR-IOV Control */
> -#define  PCI_SRIOV_CTRL_VFE	0x01	/* VF Enable */
> -#define  PCI_SRIOV_CTRL_VFM	0x02	/* VF Migration Enable */
> -#define  PCI_SRIOV_CTRL_INTR	0x04	/* VF Migration Interrupt Enable */
> -#define  PCI_SRIOV_CTRL_MSE	0x08	/* VF Memory Space Enable */
> -#define  PCI_SRIOV_CTRL_ARI	0x10	/* ARI Capable Hierarchy */
> -#define PCI_SRIOV_STATUS	0x0a	/* SR-IOV Status */
> -#define  PCI_SRIOV_STATUS_VFM	0x01	/* VF Migration Status */
> -#define PCI_SRIOV_INITIAL_VF	0x0c	/* Initial VFs */
> -#define PCI_SRIOV_TOTAL_VF	0x0e	/* Total VFs */
> -#define PCI_SRIOV_NUM_VF	0x10	/* Number of VFs */
> -#define PCI_SRIOV_FUNC_LINK	0x12	/* Function Dependency Link */
> -#define PCI_SRIOV_VF_OFFSET	0x14	/* First VF Offset */
> -#define PCI_SRIOV_VF_STRIDE	0x16	/* Following VF Stride */
> -#define PCI_SRIOV_VF_DID	0x1a	/* VF Device ID */
> -#define PCI_SRIOV_SUP_PGSIZE	0x1c	/* Supported Page Sizes */
> -#define PCI_SRIOV_SYS_PGSIZE	0x20	/* System Page Size */
> -#define PCI_SRIOV_BAR		0x24	/* VF BAR0 */
> -#define  PCI_SRIOV_NUM_BARS	6	/* Number of VF BARs */
> -#define PCI_SRIOV_VFM		0x3c	/* VF Migration State Array Offset*/
> -#define  PCI_SRIOV_VFM_BIR(x)	((x) & 7)	/* State BIR */
> -#define  PCI_SRIOV_VFM_OFFSET(x) ((x) & ~7)	/* State Offset */
> -#define  PCI_SRIOV_VFM_UA	0x0	/* Inactive.Unavailable */
> -#define  PCI_SRIOV_VFM_MI	0x1	/* Dormant.MigrateIn */
> -#define  PCI_SRIOV_VFM_MO	0x2	/* Active.MigrateOut */
> -#define  PCI_SRIOV_VFM_AV	0x3	/* Active.Available */
> -
> -#endif /* LINUX_PCI_REGS_H */
> +/*
> + *      pci_regs.h
> + *
> + *      PCI standard defines
> + *      Copyright 1994, Drew Eckhardt
> + *      Copyright 1997--1999 Martin Mares <mj@ucw.cz>
> + *
> + *      For more information, please consult the following manuals (look at
> + *      http://www.pcisig.com/ for how to get them):
> + *
> + *      PCI BIOS Specification
> + *      PCI Local Bus Specification
> + *      PCI to PCI Bridge Specification
> + *      PCI System Design Guide
> + *
> + *      For hypertransport information, please consult the following manuals
> + *      from http://www.hypertransport.org
> + *
> + *      The Hypertransport I/O Link Specification
> + */
> +
> +#ifndef LINUX_PCI_REGS_H
> +#define LINUX_PCI_REGS_H
> +
> +/*
> + * Under PCI, each device has 256 bytes of configuration address space,
> + * of which the first 64 bytes are standardized as follows:
> + */
> +#define PCI_VENDOR_ID           0x00    /* 16 bits */
> +#define PCI_DEVICE_ID           0x02    /* 16 bits */
> +#define PCI_COMMAND             0x04    /* 16 bits */
> +#define  PCI_COMMAND_IO         0x1     /* Enable response in I/O space */
> +#define  PCI_COMMAND_MEMORY     0x2     /* Enable response in Memory space */
> +#define  PCI_COMMAND_MASTER     0x4     /* Enable bus mastering */
> +#define  PCI_COMMAND_SPECIAL    0x8     /* Enable response to special cycles */
> +#define  PCI_COMMAND_INVALIDATE 0x10    /* Use memory write and invalidate */
> +#define  PCI_COMMAND_VGA_PALETTE 0x20   /* Enable palette snooping */
> +#define  PCI_COMMAND_PARITY     0x40    /* Enable parity checking */
> +#define  PCI_COMMAND_WAIT       0x80    /* Enable address/data stepping */
> +#define  PCI_COMMAND_SERR       0x100   /* Enable SERR */
> +#define  PCI_COMMAND_FAST_BACK  0x200   /* Enable back-to-back writes */
> +#define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
> +
> +#define PCI_STATUS              0x06    /* 16 bits */
> +#define  PCI_STATUS_INTERRUPT   0x08    /* Interrupt status */
> +#define  PCI_STATUS_CAP_LIST    0x10    /* Support Capability List */
> +#define  PCI_STATUS_66MHZ       0x20    /* Support 66 Mhz PCI 2.1 bus */
> +#define  PCI_STATUS_UDF         0x40    /* Support User Definable Features [obsolete] */
> +#define  PCI_STATUS_FAST_BACK   0x80    /* Accept fast-back to back */
> +#define  PCI_STATUS_PARITY      0x100   /* Detected parity error */
> +#define  PCI_STATUS_DEVSEL_MASK 0x600   /* DEVSEL timing */
> +#define  PCI_STATUS_DEVSEL_FAST         0x000
> +#define  PCI_STATUS_DEVSEL_MEDIUM       0x200
> +#define  PCI_STATUS_DEVSEL_SLOW         0x400
> +#define  PCI_STATUS_SIG_TARGET_ABORT    0x800 /* Set on target abort */
> +#define  PCI_STATUS_REC_TARGET_ABORT    0x1000 /* Master ack of " */
> +#define  PCI_STATUS_REC_MASTER_ABORT    0x2000 /* Set on master abort */
> +#define  PCI_STATUS_SIG_SYSTEM_ERROR    0x4000 /* Set when we drive SERR */
> +#define  PCI_STATUS_DETECTED_PARITY     0x8000 /* Set on parity error */
> +
> +#define PCI_CLASS_REVISION      0x08    /* High 24 bits are class, low 8 revision */
> +#define PCI_REVISION_ID         0x08    /* Revision ID */
> +#define PCI_CLASS_PROG          0x09    /* Reg. Level Programming Interface */
> +#define PCI_CLASS_DEVICE        0x0a    /* Device class */
> +
> +#define PCI_CACHE_LINE_SIZE     0x0c    /* 8 bits */
> +#define PCI_LATENCY_TIMER       0x0d    /* 8 bits */
> +#define PCI_HEADER_TYPE         0x0e    /* 8 bits */
> +#define  PCI_HEADER_TYPE_NORMAL         0
> +#define  PCI_HEADER_TYPE_BRIDGE         1
> +#define  PCI_HEADER_TYPE_CARDBUS        2
> +
> +#define PCI_BIST                0x0f    /* 8 bits */
> +#define  PCI_BIST_CODE_MASK     0x0f    /* Return result */
> +#define  PCI_BIST_START         0x40    /* 1 to start BIST, 2 secs or less */
> +#define  PCI_BIST_CAPABLE       0x80    /* 1 if BIST capable */
> +
> +/*
> + * Base addresses specify locations in memory or I/O space.
> + * Decoded size can be determined by writing a value of
> + * 0xffffffff to the register, and reading it back.  Only
> + * 1 bits are decoded.
> + */
> +#define PCI_BASE_ADDRESS_0      0x10    /* 32 bits */
> +#define PCI_BASE_ADDRESS_1      0x14    /* 32 bits [htype 0,1 only] */
> +#define PCI_BASE_ADDRESS_2      0x18    /* 32 bits [htype 0 only] */
> +#define PCI_BASE_ADDRESS_3      0x1c    /* 32 bits */
> +#define PCI_BASE_ADDRESS_4      0x20    /* 32 bits */
> +#define PCI_BASE_ADDRESS_5      0x24    /* 32 bits */
> +#define  PCI_BASE_ADDRESS_SPACE         0x01    /* 0 = memory, 1 = I/O */
> +#define  PCI_BASE_ADDRESS_SPACE_IO      0x01
> +#define  PCI_BASE_ADDRESS_SPACE_MEMORY  0x00
> +#define  PCI_BASE_ADDRESS_MEM_TYPE_MASK 0x06
> +#define  PCI_BASE_ADDRESS_MEM_TYPE_32   0x00    /* 32 bit address */
> +#define  PCI_BASE_ADDRESS_MEM_TYPE_1M   0x02    /* Below 1M [obsolete] */
> +#define  PCI_BASE_ADDRESS_MEM_TYPE_64   0x04    /* 64 bit address */
> +#define  PCI_BASE_ADDRESS_MEM_PREFETCH  0x08    /* prefetchable? */
> +#define  PCI_BASE_ADDRESS_MEM_MASK      (~0x0fUL)
> +#define  PCI_BASE_ADDRESS_IO_MASK       (~0x03UL)
> +/* bit 1 is reserved if address_space = 1 */
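The sizing rule in the comment above (write 0xffffffff, read back, only the 1 bits are decoded) can be sketched as follows. This is a minimal illustration, not code from the patch: bar_size_from_readback() is a hypothetical helper, it handles 32-bit BARs only, and it assumes the caller has already done the write/read-back and restored the original BAR value.

```c
#include <stdint.h>

/* Values copied from the quoted pci_regs.h hunk. */
#define PCI_BASE_ADDRESS_SPACE_IO   0x01
#define PCI_BASE_ADDRESS_MEM_MASK   (~0x0fUL)
#define PCI_BASE_ADDRESS_IO_MASK    (~0x03UL)

/*
 * Hypothetical helper: 'readback' is what a 32-bit BAR returned after
 * 0xffffffff was written to it.  Returns the decoded size in bytes, or 0
 * if the BAR is not implemented.
 */
static uint32_t bar_size_from_readback(uint32_t readback)
{
    uint32_t mask, bits;

    /* Strip the type/flag bits, which differ for I/O and memory BARs. */
    if (readback & PCI_BASE_ADDRESS_SPACE_IO)
        mask = (uint32_t)PCI_BASE_ADDRESS_IO_MASK;
    else
        mask = (uint32_t)PCI_BASE_ADDRESS_MEM_MASK;

    bits = readback & mask;
    if (bits == 0)
        return 0;

    /* The lowest writable address bit gives the size of the region. */
    return bits & (0u - bits);
}
```

Taking the lowest set bit, rather than negating the whole value, also copes with I/O BARs whose upper 16 bits read back as zero.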
> +
> +/* Header type 0 (normal devices) */
> +#define PCI_CARDBUS_CIS         0x28
> +#define PCI_SUBSYSTEM_VENDOR_ID 0x2c
> +#define PCI_SUBSYSTEM_ID        0x2e
> +#define PCI_ROM_ADDRESS         0x30    /* Bits 31..11 are address, 10..1 reserved */
> +#define  PCI_ROM_ADDRESS_ENABLE 0x01
> +#define PCI_ROM_ADDRESS_MASK    (~0x7ffUL)
> +
> +#define PCI_CAPABILITY_LIST     0x34    /* Offset of first capability list entry */
> +
> +/* 0x35-0x3b are reserved */
> +#define PCI_INTERRUPT_LINE      0x3c    /* 8 bits */
> +#define PCI_INTERRUPT_PIN       0x3d    /* 8 bits */
> +#define PCI_MIN_GNT             0x3e    /* 8 bits */
> +#define PCI_MAX_LAT             0x3f    /* 8 bits */
> +
> +/* Header type 1 (PCI-to-PCI bridges) */
> +#define PCI_PRIMARY_BUS         0x18    /* Primary bus number */
> +#define PCI_SECONDARY_BUS       0x19    /* Secondary bus number */
> +#define PCI_SUBORDINATE_BUS     0x1a    /* Highest bus number behind the bridge */
> +#define PCI_SEC_LATENCY_TIMER   0x1b    /* Latency timer for secondary interface */
> +#define PCI_IO_BASE             0x1c    /* I/O range behind the bridge */
> +#define PCI_IO_LIMIT            0x1d
> +#define  PCI_IO_RANGE_TYPE_MASK 0x0fUL  /* I/O bridging type */
> +#define  PCI_IO_RANGE_TYPE_16   0x00
> +#define  PCI_IO_RANGE_TYPE_32   0x01
> +#define  PCI_IO_RANGE_MASK      (~0x0fUL)
> +#define PCI_SEC_STATUS          0x1e    /* Secondary status register, only bit 14 used */
> +#define PCI_MEMORY_BASE         0x20    /* Memory range behind */
> +#define PCI_MEMORY_LIMIT        0x22
> +#define  PCI_MEMORY_RANGE_TYPE_MASK 0x0fUL
> +#define  PCI_MEMORY_RANGE_MASK  (~0x0fUL)
> +#define PCI_PREF_MEMORY_BASE    0x24    /* Prefetchable memory range behind */
> +#define PCI_PREF_MEMORY_LIMIT   0x26
> +#define  PCI_PREF_RANGE_TYPE_MASK 0x0fUL
> +#define  PCI_PREF_RANGE_TYPE_32 0x00
> +#define  PCI_PREF_RANGE_TYPE_64 0x01
> +#define  PCI_PREF_RANGE_MASK    (~0x0fUL)
> +#define PCI_PREF_BASE_UPPER32   0x28    /* Upper half of prefetchable memory range */
> +#define PCI_PREF_LIMIT_UPPER32  0x2c
> +#define PCI_IO_BASE_UPPER16     0x30    /* Upper half of I/O addresses */
> +#define PCI_IO_LIMIT_UPPER16    0x32
> +/* 0x34 same as for htype 0 */
> +/* 0x35-0x3b is reserved */
> +#define PCI_ROM_ADDRESS1        0x38    /* Same as PCI_ROM_ADDRESS, but for htype 1 */
> +/* 0x3c-0x3d are same as for htype 0 */
> +#define PCI_BRIDGE_CONTROL      0x3e
> +#define  PCI_BRIDGE_CTL_PARITY  0x01    /* Enable parity detection on secondary interface */
> +#define  PCI_BRIDGE_CTL_SERR    0x02    /* The same for SERR forwarding */
> +#define  PCI_BRIDGE_CTL_ISA     0x04    /* Enable ISA mode */
> +#define  PCI_BRIDGE_CTL_VGA     0x08    /* Forward VGA addresses */
> +#define  PCI_BRIDGE_CTL_MASTER_ABORT    0x20  /* Report master aborts */
> +#define  PCI_BRIDGE_CTL_BUS_RESET       0x40    /* Secondary bus reset */
> +#define  PCI_BRIDGE_CTL_FAST_BACK       0x80    /* Fast Back2Back enabled on secondary interface */
> +
> +/* Header type 2 (CardBus bridges) */
> +#define PCI_CB_CAPABILITY_LIST  0x14
> +/* 0x15 reserved */
> +#define PCI_CB_SEC_STATUS       0x16    /* Secondary status */
> +#define PCI_CB_PRIMARY_BUS      0x18    /* PCI bus number */
> +#define PCI_CB_CARD_BUS         0x19    /* CardBus bus number */
> +#define PCI_CB_SUBORDINATE_BUS  0x1a    /* Subordinate bus number */
> +#define PCI_CB_LATENCY_TIMER    0x1b    /* CardBus latency timer */
> +#define PCI_CB_MEMORY_BASE_0    0x1c
> +#define PCI_CB_MEMORY_LIMIT_0   0x20
> +#define PCI_CB_MEMORY_BASE_1    0x24
> +#define PCI_CB_MEMORY_LIMIT_1   0x28
> +#define PCI_CB_IO_BASE_0        0x2c
> +#define PCI_CB_IO_BASE_0_HI     0x2e
> +#define PCI_CB_IO_LIMIT_0       0x30
> +#define PCI_CB_IO_LIMIT_0_HI    0x32
> +#define PCI_CB_IO_BASE_1        0x34
> +#define PCI_CB_IO_BASE_1_HI     0x36
> +#define PCI_CB_IO_LIMIT_1       0x38
> +#define PCI_CB_IO_LIMIT_1_HI    0x3a
> +#define  PCI_CB_IO_RANGE_MASK   (~0x03UL)
> +/* 0x3c-0x3d are same as for htype 0 */
> +#define PCI_CB_BRIDGE_CONTROL   0x3e
> +#define  PCI_CB_BRIDGE_CTL_PARITY       0x01    /* Similar to standard bridge control register */
> +#define  PCI_CB_BRIDGE_CTL_SERR         0x02
> +#define  PCI_CB_BRIDGE_CTL_ISA          0x04
> +#define  PCI_CB_BRIDGE_CTL_VGA          0x08
> +#define  PCI_CB_BRIDGE_CTL_MASTER_ABORT 0x20
> +#define  PCI_CB_BRIDGE_CTL_CB_RESET     0x40    /* CardBus reset */
> +#define  PCI_CB_BRIDGE_CTL_16BIT_INT    0x80    /* Enable interrupt for 16-bit cards */
> +#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM0 0x100  /* Prefetch enable for both memory regions */
> +#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM1 0x200
> +#define  PCI_CB_BRIDGE_CTL_POST_WRITES  0x400
> +#define PCI_CB_SUBSYSTEM_VENDOR_ID      0x40
> +#define PCI_CB_SUBSYSTEM_ID             0x42
> +#define PCI_CB_LEGACY_MODE_BASE         0x44    /* 16-bit PC Card legacy mode base address (ExCa) */
> +/* 0x48-0x7f reserved */
> +
> +/* Capability lists */
> +
> +#define PCI_CAP_LIST_ID         0       /* Capability ID */
> +#define  PCI_CAP_ID_PM          0x01    /* Power Management */
> +#define  PCI_CAP_ID_AGP         0x02    /* Accelerated Graphics Port */
> +#define  PCI_CAP_ID_VPD         0x03    /* Vital Product Data */
> +#define  PCI_CAP_ID_SLOTID      0x04    /* Slot Identification */
> +#define  PCI_CAP_ID_MSI         0x05    /* Message Signalled Interrupts */
> +#define  PCI_CAP_ID_CHSWP       0x06    /* CompactPCI HotSwap */
> +#define  PCI_CAP_ID_PCIX        0x07    /* PCI-X */
> +#define  PCI_CAP_ID_HT          0x08    /* HyperTransport */
> +#define  PCI_CAP_ID_VNDR        0x09    /* Vendor specific */
> +#define  PCI_CAP_ID_DBG         0x0A    /* Debug port */
> +#define  PCI_CAP_ID_CCRC        0x0B    /* CompactPCI Central Resource Control */
> +#define  PCI_CAP_ID_SHPC        0x0C    /* PCI Standard Hot-Plug Controller */
> +#define  PCI_CAP_ID_SSVID       0x0D    /* Bridge subsystem vendor/device ID */
> +#define  PCI_CAP_ID_AGP3        0x0E    /* AGP Target PCI-PCI bridge */
> +#define  PCI_CAP_ID_EXP         0x10    /* PCI Express */
> +#define  PCI_CAP_ID_MSIX        0x11    /* MSI-X */
> +#define  PCI_CAP_ID_AF          0x13    /* PCI Advanced Features */
> +#define PCI_CAP_LIST_NEXT       1       /* Next capability in the list */
> +#define PCI_CAP_FLAGS           2       /* Capability defined flags (16 bits) */
> +#define PCI_CAP_SIZEOF          4
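For readers unfamiliar with how these offsets fit together, here is a rough sketch of walking the capability list over a raw 256-byte copy of config space. find_capability() and the cfg[] buffer are made up for illustration; real code would go through the config-space accessors instead of indexing an array.

```c
#include <stdint.h>

/* Values copied from the quoted pci_regs.h hunk. */
#define PCI_STATUS              0x06
#define PCI_STATUS_CAP_LIST     0x10
#define PCI_CAPABILITY_LIST     0x34
#define PCI_CAP_LIST_ID         0
#define PCI_CAP_LIST_NEXT       1

/*
 * Hypothetical helper: return the config-space offset of the first
 * capability with the given ID, or 0 if it is not present.
 */
static int find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
    int ttl = 48;       /* guard against malformed, looping lists */
    uint8_t pos;

    /* The list is only valid if the status register advertises it. */
    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;

    /* Pointers are dword aligned; the low two bits are reserved. */
    pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    while (pos >= 0x40 && ttl-- > 0) {
        if (cfg[pos + PCI_CAP_LIST_ID] == cap_id)
            return pos;
        pos = cfg[pos + PCI_CAP_LIST_NEXT] & 0xfc;
    }
    return 0;
}
```

The ttl bound is an arbitrary choice here; any limit that stops a circular list from hanging the walk would do.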
> +
> +/* Power Management Registers */
> +
> +#define PCI_PM_PMC              2       /* PM Capabilities Register */
> +#define  PCI_PM_CAP_VER_MASK    0x0007  /* Version */
> +#define  PCI_PM_CAP_PME_CLOCK   0x0008  /* PME clock required */
> +#define  PCI_PM_CAP_RESERVED    0x0010  /* Reserved field */
> +#define  PCI_PM_CAP_DSI         0x0020  /* Device specific initialization */
> +#define  PCI_PM_CAP_AUX_POWER   0x01C0  /* Auxiliary power support mask */
> +#define  PCI_PM_CAP_D1          0x0200  /* D1 power state support */
> +#define  PCI_PM_CAP_D2          0x0400  /* D2 power state support */
> +#define  PCI_PM_CAP_PME         0x0800  /* PME pin supported */
> +#define  PCI_PM_CAP_PME_MASK    0xF800  /* PME Mask of all supported states */
> +#define  PCI_PM_CAP_PME_D0      0x0800  /* PME# from D0 */
> +#define  PCI_PM_CAP_PME_D1      0x1000  /* PME# from D1 */
> +#define  PCI_PM_CAP_PME_D2      0x2000  /* PME# from D2 */
> +#define  PCI_PM_CAP_PME_D3      0x4000  /* PME# from D3 (hot) */
> +#define  PCI_PM_CAP_PME_D3cold  0x8000  /* PME# from D3 (cold) */
> +#define  PCI_PM_CAP_PME_SHIFT   11      /* Start of the PME Mask in PMC */
> +#define PCI_PM_CTRL             4       /* PM control and status register */
> +#define  PCI_PM_CTRL_STATE_MASK 0x0003  /* Current power state (D0 to D3) */
> +#define  PCI_PM_CTRL_NO_SOFT_RESET      0x0008  /* No reset for D3hot->D0 */
> +#define  PCI_PM_CTRL_PME_ENABLE 0x0100  /* PME pin enable */
> +#define  PCI_PM_CTRL_DATA_SEL_MASK      0x1e00  /* Data select (??) */
> +#define  PCI_PM_CTRL_DATA_SCALE_MASK    0x6000  /* Data scale (??) */
> +#define  PCI_PM_CTRL_PME_STATUS 0x8000  /* PME pin status */
> +#define PCI_PM_PPB_EXTENSIONS   6       /* PPB support extensions (??) */
> +#define  PCI_PM_PPB_B2_B3       0x40    /* Stop clock when in D3hot (??) */
> +#define  PCI_PM_BPCC_ENABLE     0x80    /* Bus power/clock control enable (??) */
> +#define PCI_PM_DATA_REGISTER    7       /* (??) */
> +#define PCI_PM_SIZEOF           8
> +
> +/* AGP registers */
> +
> +#define PCI_AGP_VERSION         2       /* BCD version number */
> +#define PCI_AGP_RFU             3       /* Rest of capability flags */
> +#define PCI_AGP_STATUS          4       /* Status register */
> +#define  PCI_AGP_STATUS_RQ_MASK 0xff000000      /* Maximum number of requests - 1 */
> +#define  PCI_AGP_STATUS_SBA     0x0200  /* Sideband addressing supported */
> +#define  PCI_AGP_STATUS_64BIT   0x0020  /* 64-bit addressing supported */
> +#define  PCI_AGP_STATUS_FW      0x0010  /* FW transfers supported */
> +#define  PCI_AGP_STATUS_RATE4   0x0004  /* 4x transfer rate supported */
> +#define  PCI_AGP_STATUS_RATE2   0x0002  /* 2x transfer rate supported */
> +#define  PCI_AGP_STATUS_RATE1   0x0001  /* 1x transfer rate supported */
> +#define PCI_AGP_COMMAND         8       /* Control register */
> +#define  PCI_AGP_COMMAND_RQ_MASK 0xff000000  /* Master: Maximum number of requests */
> +#define  PCI_AGP_COMMAND_SBA    0x0200  /* Sideband addressing enabled */
> +#define  PCI_AGP_COMMAND_AGP    0x0100  /* Allow processing of AGP transactions */
> +#define  PCI_AGP_COMMAND_64BIT  0x0020  /* Allow processing of 64-bit addresses */
> +#define  PCI_AGP_COMMAND_FW     0x0010  /* Force FW transfers */
> +#define  PCI_AGP_COMMAND_RATE4  0x0004  /* Use 4x rate */
> +#define  PCI_AGP_COMMAND_RATE2  0x0002  /* Use 2x rate */
> +#define  PCI_AGP_COMMAND_RATE1  0x0001  /* Use 1x rate */
> +#define PCI_AGP_SIZEOF          12
> +
> +/* Vital Product Data */
> +
> +#define PCI_VPD_ADDR            2       /* Address to access (15 bits!) */
> +#define  PCI_VPD_ADDR_MASK      0x7fff  /* Address mask */
> +#define  PCI_VPD_ADDR_F         0x8000  /* Write 0, 1 indicates completion */
> +#define PCI_VPD_DATA            4       /* 32-bits of data returned here */
> +
> +/* Slot Identification */
> +
> +#define PCI_SID_ESR             2       /* Expansion Slot Register */
> +#define  PCI_SID_ESR_NSLOTS     0x1f    /* Number of expansion slots available */
> +#define  PCI_SID_ESR_FIC        0x20    /* First In Chassis Flag */
> +#define PCI_SID_CHASSIS_NR      3       /* Chassis Number */
> +
> +/* Message Signalled Interrupts registers */
> +
> +#define PCI_MSI_FLAGS           2       /* Various flags */
> +#define  PCI_MSI_FLAGS_64BIT    0x80    /* 64-bit addresses allowed */
> +#define  PCI_MSI_FLAGS_QSIZE    0x70    /* Message queue size configured */
> +#define  PCI_MSI_FLAGS_QMASK    0x0e    /* Maximum queue size available */
> +#define  PCI_MSI_FLAGS_ENABLE   0x01    /* MSI feature enabled */
> +#define  PCI_MSI_FLAGS_MASKBIT  0x100   /* 64-bit mask bits allowed */
> +#define PCI_MSI_RFU             3       /* Rest of capability flags */
> +#define PCI_MSI_ADDRESS_LO      4       /* Lower 32 bits */
> +#define PCI_MSI_ADDRESS_HI      8       /* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
> +#define PCI_MSI_DATA_32         8       /* 16 bits of data for 32-bit devices */
> +#define PCI_MSI_MASK_32         12      /* Mask bits register for 32-bit devices */
> +#define PCI_MSI_DATA_64         12      /* 16 bits of data for 64-bit devices */
> +#define PCI_MSI_MASK_64         16      /* Mask bits register for 64-bit devices */
> +
> +/* MSI-X registers (these are at offset PCI_MSIX_FLAGS) */
> +#define PCI_MSIX_FLAGS          2
> +#define  PCI_MSIX_FLAGS_QSIZE   0x7FF
> +#define  PCI_MSIX_FLAGS_ENABLE  (1 << 15)
> +#define  PCI_MSIX_FLAGS_MASKALL (1 << 14)
> +#define PCI_MSIX_FLAGS_BIRMASK  (7 << 0)
> +
> +/* CompactPCI Hotswap Register */
> +
> +#define PCI_CHSWP_CSR           2       /* Control and Status Register */
> +#define  PCI_CHSWP_DHA          0x01    /* Device Hiding Arm */
> +#define  PCI_CHSWP_EIM          0x02    /* ENUM# Signal Mask */
> +#define  PCI_CHSWP_PIE          0x04    /* Pending Insert or Extract */
> +#define  PCI_CHSWP_LOO          0x08    /* LED On / Off */
> +#define  PCI_CHSWP_PI           0x30    /* Programming Interface */
> +#define  PCI_CHSWP_EXT          0x40    /* ENUM# status - extraction */
> +#define  PCI_CHSWP_INS          0x80    /* ENUM# status - insertion */
> +
> +/* PCI Advanced Feature registers */
> +
> +#define PCI_AF_LENGTH           2
> +#define PCI_AF_CAP              3
> +#define  PCI_AF_CAP_TP          0x01
> +#define  PCI_AF_CAP_FLR         0x02
> +#define PCI_AF_CTRL             4
> +#define  PCI_AF_CTRL_FLR        0x01
> +#define PCI_AF_STATUS           5
> +#define  PCI_AF_STATUS_TP       0x01
> +
> +/* PCI-X registers */
> +
> +#define PCI_X_CMD               2       /* Modes & Features */
> +#define  PCI_X_CMD_DPERR_E      0x0001  /* Data Parity Error Recovery Enable */
> +#define  PCI_X_CMD_ERO          0x0002  /* Enable Relaxed Ordering */
> +#define  PCI_X_CMD_READ_512     0x0000  /* 512 byte maximum read byte count */
> +#define  PCI_X_CMD_READ_1K      0x0004  /* 1Kbyte maximum read byte count */
> +#define  PCI_X_CMD_READ_2K      0x0008  /* 2Kbyte maximum read byte count */
> +#define  PCI_X_CMD_READ_4K      0x000c  /* 4Kbyte maximum read byte count */
> +#define  PCI_X_CMD_MAX_READ     0x000c  /* Max Memory Read Byte Count */
> +                                /* Max # of outstanding split transactions */
> +#define  PCI_X_CMD_SPLIT_1      0x0000  /* Max 1 */
> +#define  PCI_X_CMD_SPLIT_2      0x0010  /* Max 2 */
> +#define  PCI_X_CMD_SPLIT_3      0x0020  /* Max 3 */
> +#define  PCI_X_CMD_SPLIT_4      0x0030  /* Max 4 */
> +#define  PCI_X_CMD_SPLIT_8      0x0040  /* Max 8 */
> +#define  PCI_X_CMD_SPLIT_12     0x0050  /* Max 12 */
> +#define  PCI_X_CMD_SPLIT_16     0x0060  /* Max 16 */
> +#define  PCI_X_CMD_SPLIT_32     0x0070  /* Max 32 */
> +#define  PCI_X_CMD_MAX_SPLIT    0x0070  /* Max Outstanding Split Transactions */
> +#define  PCI_X_CMD_VERSION(x)   (((x) >> 12) & 3) /* Version */
> +#define PCI_X_STATUS            4       /* PCI-X capabilities */
> +#define  PCI_X_STATUS_DEVFN     0x000000ff      /* A copy of devfn */
> +#define  PCI_X_STATUS_BUS       0x0000ff00      /* A copy of bus nr */
> +#define  PCI_X_STATUS_64BIT     0x00010000      /* 64-bit device */
> +#define  PCI_X_STATUS_133MHZ    0x00020000      /* 133 MHz capable */
> +#define  PCI_X_STATUS_SPL_DISC  0x00040000      /* Split Completion Discarded */
> +#define  PCI_X_STATUS_UNX_SPL   0x00080000      /* Unexpected Split Completion */
> +#define  PCI_X_STATUS_COMPLEX   0x00100000      /* Device Complexity */
> +#define  PCI_X_STATUS_MAX_READ  0x00600000      /* Designed Max Memory Read Count */
> +#define  PCI_X_STATUS_MAX_SPLIT 0x03800000      /* Designed Max Outstanding Split Transactions */
> +#define  PCI_X_STATUS_MAX_CUM   0x1c000000      /* Designed Max Cumulative Read Size */
> +#define  PCI_X_STATUS_SPL_ERR   0x20000000      /* Rcvd Split Completion Error Msg */
> +#define  PCI_X_STATUS_266MHZ    0x40000000      /* 266 MHz capable */
> +#define  PCI_X_STATUS_533MHZ    0x80000000      /* 533 MHz capable */
> +
> +/* PCI Express capability registers */
> +
> +#define PCI_EXP_FLAGS           2       /* Capabilities register */
> +#define PCI_EXP_FLAGS_VERS      0x000f  /* Capability version */
> +#define PCI_EXP_FLAGS_TYPE      0x00f0  /* Device/Port type */
> +#define  PCI_EXP_TYPE_ENDPOINT  0x0     /* Express Endpoint */
> +#define  PCI_EXP_TYPE_LEG_END   0x1     /* Legacy Endpoint */
> +#define  PCI_EXP_TYPE_ROOT_PORT 0x4     /* Root Port */
> +#define  PCI_EXP_TYPE_UPSTREAM  0x5     /* Upstream Port */
> +#define  PCI_EXP_TYPE_DOWNSTREAM 0x6    /* Downstream Port */
> +#define  PCI_EXP_TYPE_PCI_BRIDGE 0x7    /* PCI/PCI-X Bridge */
> +#define  PCI_EXP_TYPE_RC_END    0x9     /* Root Complex Integrated Endpoint */
> +#define  PCI_EXP_TYPE_RC_EC     0x10    /* Root Complex Event Collector */
> +#define PCI_EXP_FLAGS_SLOT      0x0100  /* Slot implemented */
> +#define PCI_EXP_FLAGS_IRQ       0x3e00  /* Interrupt message number */
> +#define PCI_EXP_DEVCAP          4       /* Device capabilities */
> +#define  PCI_EXP_DEVCAP_PAYLOAD 0x07    /* Max_Payload_Size */
> +#define  PCI_EXP_DEVCAP_PHANTOM 0x18    /* Phantom functions */
> +#define  PCI_EXP_DEVCAP_EXT_TAG 0x20    /* Extended tags */
> +#define  PCI_EXP_DEVCAP_L0S     0x1c0   /* L0s Acceptable Latency */
> +#define  PCI_EXP_DEVCAP_L1      0xe00   /* L1 Acceptable Latency */
> +#define  PCI_EXP_DEVCAP_ATN_BUT 0x1000  /* Attention Button Present */
> +#define  PCI_EXP_DEVCAP_ATN_IND 0x2000  /* Attention Indicator Present */
> +#define  PCI_EXP_DEVCAP_PWR_IND 0x4000  /* Power Indicator Present */
> +#define  PCI_EXP_DEVCAP_RBER    0x8000  /* Role-Based Error Reporting */
> +#define  PCI_EXP_DEVCAP_PWR_VAL 0x3fc0000 /* Slot Power Limit Value */
> +#define  PCI_EXP_DEVCAP_PWR_SCL 0xc000000 /* Slot Power Limit Scale */
> +#define  PCI_EXP_DEVCAP_FLR     0x10000000 /* Function Level Reset */
> +#define PCI_EXP_DEVCTL          8       /* Device Control */
> +#define  PCI_EXP_DEVCTL_CERE    0x0001  /* Correctable Error Reporting En. */
> +#define  PCI_EXP_DEVCTL_NFERE   0x0002  /* Non-Fatal Error Reporting Enable */
> +#define  PCI_EXP_DEVCTL_FERE    0x0004  /* Fatal Error Reporting Enable */
> +#define  PCI_EXP_DEVCTL_URRE    0x0008  /* Unsupported Request Reporting En. */
> +#define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
> +#define  PCI_EXP_DEVCTL_PAYLOAD 0x00e0  /* Max_Payload_Size */
> +#define  PCI_EXP_DEVCTL_EXT_TAG 0x0100  /* Extended Tag Field Enable */
> +#define  PCI_EXP_DEVCTL_PHANTOM 0x0200  /* Phantom Functions Enable */
> +#define  PCI_EXP_DEVCTL_AUX_PME 0x0400  /* Auxiliary Power PM Enable */
> +#define  PCI_EXP_DEVCTL_NOSNOOP_EN 0x0800  /* Enable No Snoop */
> +#define  PCI_EXP_DEVCTL_READRQ  0x7000  /* Max_Read_Request_Size */
> +#define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR */
> +#define PCI_EXP_DEVSTA          10      /* Device Status */
> +#define  PCI_EXP_DEVSTA_CED     0x01    /* Correctable Error Detected */
> +#define  PCI_EXP_DEVSTA_NFED    0x02    /* Non-Fatal Error Detected */
> +#define  PCI_EXP_DEVSTA_FED     0x04    /* Fatal Error Detected */
> +#define  PCI_EXP_DEVSTA_URD     0x08    /* Unsupported Request Detected */
> +#define  PCI_EXP_DEVSTA_AUXPD   0x10    /* AUX Power Detected */
> +#define  PCI_EXP_DEVSTA_TRPND   0x20    /* Transactions Pending */
> +#define PCI_EXP_LNKCAP          12      /* Link Capabilities */
> +#define  PCI_EXP_LNKCAP_SLS     0x0000000f /* Supported Link Speeds */
> +#define  PCI_EXP_LNKCAP_MLW     0x000003f0 /* Maximum Link Width */
> +#define  PCI_EXP_LNKCAP_ASPMS   0x00000c00 /* ASPM Support */
> +#define  PCI_EXP_LNKCAP_L0SEL   0x00007000 /* L0s Exit Latency */
> +#define  PCI_EXP_LNKCAP_L1EL    0x00038000 /* L1 Exit Latency */
> +#define  PCI_EXP_LNKCAP_CLKPM   0x00040000 /* L1 Clock Power Management */
> +#define  PCI_EXP_LNKCAP_SDERC   0x00080000 /* Surprise Down Error Reporting Capable */
> +#define  PCI_EXP_LNKCAP_DLLLARC 0x00100000 /* Data Link Layer Link Active Reporting Capable */
> +#define  PCI_EXP_LNKCAP_LBNC    0x00200000 /* Link Bandwidth Notification Capability */
> +#define  PCI_EXP_LNKCAP_PN      0xff000000 /* Port Number */
> +#define PCI_EXP_LNKCTL          16      /* Link Control */
> +#define  PCI_EXP_LNKCTL_ASPMC   0x0003  /* ASPM Control */
> +#define  PCI_EXP_LNKCTL_RCB     0x0008  /* Read Completion Boundary */
> +#define  PCI_EXP_LNKCTL_LD      0x0010  /* Link Disable */
> +#define  PCI_EXP_LNKCTL_RL      0x0020  /* Retrain Link */
> +#define  PCI_EXP_LNKCTL_CCC     0x0040  /* Common Clock Configuration */
> +#define  PCI_EXP_LNKCTL_ES      0x0080  /* Extended Synch */
> +#define  PCI_EXP_LNKCTL_CLKREQ_EN 0x100 /* Enable clkreq */
> +#define  PCI_EXP_LNKCTL_HAWD    0x0200  /* Hardware Autonomous Width Disable */
> +#define  PCI_EXP_LNKCTL_LBMIE   0x0400  /* Link Bandwidth Management Interrupt Enable */
> +#define  PCI_EXP_LNKCTL_LABIE   0x0800  /* Lnk Autonomous Bandwidth Interrupt Enable */
> +#define PCI_EXP_LNKSTA          18      /* Link Status */
> +#define  PCI_EXP_LNKSTA_CLS     0x000f  /* Current Link Speed */
> +#define  PCI_EXP_LNKSTA_NLW     0x03f0  /* Negotiated Link Width */
> +#define  PCI_EXP_LNKSTA_LT      0x0800  /* Link Training */
> +#define  PCI_EXP_LNKSTA_SLC     0x1000  /* Slot Clock Configuration */
> +#define  PCI_EXP_LNKSTA_DLLLA   0x2000  /* Data Link Layer Link Active */
> +#define  PCI_EXP_LNKSTA_LBMS    0x4000  /* Link Bandwidth Management Status */
> +#define  PCI_EXP_LNKSTA_LABS    0x8000  /* Link Autonomous Bandwidth Status */
> +#define PCI_EXP_SLTCAP          20      /* Slot Capabilities */
> +#define  PCI_EXP_SLTCAP_ABP     0x00000001 /* Attention Button Present */
> +#define  PCI_EXP_SLTCAP_PCP     0x00000002 /* Power Controller Present */
> +#define  PCI_EXP_SLTCAP_MRLSP   0x00000004 /* MRL Sensor Present */
> +#define  PCI_EXP_SLTCAP_AIP     0x00000008 /* Attention Indicator Present */
> +#define  PCI_EXP_SLTCAP_PIP     0x00000010 /* Power Indicator Present */
> +#define  PCI_EXP_SLTCAP_HPS     0x00000020 /* Hot-Plug Surprise */
> +#define  PCI_EXP_SLTCAP_HPC     0x00000040 /* Hot-Plug Capable */
> +#define  PCI_EXP_SLTCAP_SPLV    0x00007f80 /* Slot Power Limit Value */
> +#define  PCI_EXP_SLTCAP_SPLS    0x00018000 /* Slot Power Limit Scale */
> +#define  PCI_EXP_SLTCAP_EIP     0x00020000 /* Electromechanical Interlock Present */
> +#define  PCI_EXP_SLTCAP_NCCS    0x00040000 /* No Command Completed Support */
> +#define  PCI_EXP_SLTCAP_PSN     0xfff80000 /* Physical Slot Number */
> +#define PCI_EXP_SLTCTL          24      /* Slot Control */
> +#define  PCI_EXP_SLTCTL_ABPE    0x0001  /* Attention Button Pressed Enable */
> +#define  PCI_EXP_SLTCTL_PFDE    0x0002  /* Power Fault Detected Enable */
> +#define  PCI_EXP_SLTCTL_MRLSCE  0x0004  /* MRL Sensor Changed Enable */
> +#define  PCI_EXP_SLTCTL_PDCE    0x0008  /* Presence Detect Changed Enable */
> +#define  PCI_EXP_SLTCTL_CCIE    0x0010  /* Command Completed Interrupt Enable */
> +#define  PCI_EXP_SLTCTL_HPIE    0x0020  /* Hot-Plug Interrupt Enable */
> +#define  PCI_EXP_SLTCTL_AIC     0x00c0  /* Attention Indicator Control */
> +#define  PCI_EXP_SLTCTL_PIC     0x0300  /* Power Indicator Control */
> +#define  PCI_EXP_SLTCTL_PCC     0x0400  /* Power Controller Control */
> +#define  PCI_EXP_SLTCTL_EIC     0x0800  /* Electromechanical Interlock Control */
> +#define  PCI_EXP_SLTCTL_DLLSCE  0x1000  /* Data Link Layer State Changed Enable */
> +#define PCI_EXP_SLTSTA          26      /* Slot Status */
> +#define  PCI_EXP_SLTSTA_ABP     0x0001  /* Attention Button Pressed */
> +#define  PCI_EXP_SLTSTA_PFD     0x0002  /* Power Fault Detected */
> +#define  PCI_EXP_SLTSTA_MRLSC   0x0004  /* MRL Sensor Changed */
> +#define  PCI_EXP_SLTSTA_PDC     0x0008  /* Presence Detect Changed */
> +#define  PCI_EXP_SLTSTA_CC      0x0010  /* Command Completed */
> +#define  PCI_EXP_SLTSTA_MRLSS   0x0020  /* MRL Sensor State */
> +#define  PCI_EXP_SLTSTA_PDS     0x0040  /* Presence Detect State */
> +#define  PCI_EXP_SLTSTA_EIS     0x0080  /* Electromechanical Interlock Status */
> +#define  PCI_EXP_SLTSTA_DLLSC   0x0100  /* Data Link Layer State Changed */
> +#define PCI_EXP_RTCTL           28      /* Root Control */
> +#define  PCI_EXP_RTCTL_SECEE    0x01    /* System Error on Correctable Error */
> +#define  PCI_EXP_RTCTL_SENFEE   0x02    /* System Error on Non-Fatal Error */
> +#define  PCI_EXP_RTCTL_SEFEE    0x04    /* System Error on Fatal Error */
> +#define  PCI_EXP_RTCTL_PMEIE    0x08    /* PME Interrupt Enable */
> +#define  PCI_EXP_RTCTL_CRSSVE   0x10    /* CRS Software Visibility Enable */
> +#define PCI_EXP_RTCAP           30      /* Root Capabilities */
> +#define PCI_EXP_RTSTA           32      /* Root Status */
> +#define PCI_EXP_DEVCAP2         36      /* Device Capabilities 2 */
> +#define  PCI_EXP_DEVCAP2_ARI    0x20    /* Alternative Routing-ID */
> +#define PCI_EXP_DEVCTL2         40      /* Device Control 2 */
> +#define  PCI_EXP_DEVCTL2_ARI    0x20    /* Alternative Routing-ID */
> +#define PCI_EXP_LNKCTL2         48      /* Link Control 2 */
> +#define PCI_EXP_SLTCTL2         56      /* Slot Control 2 */
> +
> +/* Extended Capabilities (PCI-X 2.0 and Express) */
> +#define PCI_EXT_CAP_ID(header)          (header & 0x0000ffff)
> +#define PCI_EXT_CAP_VER(header)         ((header >> 16) & 0xf)
> +#define PCI_EXT_CAP_NEXT(header)        ((header >> 20) & 0xffc)
> +
> +#define PCI_EXT_CAP_ID_ERR      1
> +#define PCI_EXT_CAP_ID_VC       2
> +#define PCI_EXT_CAP_ID_DSN      3
> +#define PCI_EXT_CAP_ID_PWR      4
> +#define PCI_EXT_CAP_ID_ARI      14
> +#define PCI_EXT_CAP_ID_ATS      15
> +#define PCI_EXT_CAP_ID_SRIOV    16
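As a quick illustration of the three header macros above, the sketch below decodes one extended capability header dword. The macros are copied as quoted; ext_cap_id(), ext_cap_ver() and ext_cap_next() are hypothetical wrappers. Since the macro argument is not parenthesized in the header, the wrappers pass a plain variable.

```c
#include <stdint.h>

/* Copied verbatim from the quoted pci_regs.h hunk. */
#define PCI_EXT_CAP_ID(header)          (header & 0x0000ffff)
#define PCI_EXT_CAP_VER(header)         ((header >> 16) & 0xf)
#define PCI_EXT_CAP_NEXT(header)        ((header >> 20) & 0xffc)

#define PCI_EXT_CAP_ID_ARI      14

/* Hypothetical wrappers: each takes the 32-bit header read from config space. */
static unsigned ext_cap_id(uint32_t header)
{
    return PCI_EXT_CAP_ID(header);      /* bits 15:0 */
}

static unsigned ext_cap_ver(uint32_t header)
{
    return PCI_EXT_CAP_VER(header);     /* bits 19:16 */
}

static unsigned ext_cap_next(uint32_t header)
{
    /* bits 31:20, with the low two bits masked for dword alignment */
    return PCI_EXT_CAP_NEXT(header);
}
```

For a header value of 0x1401000e this gives ID 14 (ARI), version 1, and a next pointer of 0x140; a next pointer of 0 ends the list.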
> +
> +/* Advanced Error Reporting */
> +#define PCI_ERR_UNCOR_STATUS    4       /* Uncorrectable Error Status */
> +#define  PCI_ERR_UNC_TRAIN      0x00000001      /* Training */
> +#define  PCI_ERR_UNC_DLP        0x00000010      /* Data Link Protocol */
> +#define  PCI_ERR_UNC_POISON_TLP 0x00001000      /* Poisoned TLP */
> +#define  PCI_ERR_UNC_FCP        0x00002000      /* Flow Control Protocol */
> +#define  PCI_ERR_UNC_COMP_TIME  0x00004000      /* Completion Timeout */
> +#define  PCI_ERR_UNC_COMP_ABORT 0x00008000      /* Completer Abort */
> +#define  PCI_ERR_UNC_UNX_COMP   0x00010000      /* Unexpected Completion */
> +#define  PCI_ERR_UNC_RX_OVER    0x00020000      /* Receiver Overflow */
> +#define  PCI_ERR_UNC_MALF_TLP   0x00040000      /* Malformed TLP */
> +#define  PCI_ERR_UNC_ECRC       0x00080000      /* ECRC Error Status */
> +#define  PCI_ERR_UNC_UNSUP      0x00100000      /* Unsupported Request */
> +#define PCI_ERR_UNCOR_MASK      8       /* Uncorrectable Error Mask */
> +        /* Same bits as above */
> +#define PCI_ERR_UNCOR_SEVER     12      /* Uncorrectable Error Severity */
> +        /* Same bits as above */
> +#define PCI_ERR_COR_STATUS      16      /* Correctable Error Status */
> +#define  PCI_ERR_COR_RCVR       0x00000001      /* Receiver Error Status */
> +#define  PCI_ERR_COR_BAD_TLP    0x00000040      /* Bad TLP Status */
> +#define  PCI_ERR_COR_BAD_DLLP   0x00000080      /* Bad DLLP Status */
> +#define  PCI_ERR_COR_REP_ROLL   0x00000100      /* REPLAY_NUM Rollover */
> +#define  PCI_ERR_COR_REP_TIMER  0x00001000      /* Replay Timer Timeout */
> +#define PCI_ERR_COR_MASK        20      /* Correctable Error Mask */
> +        /* Same bits as above */
> +#define PCI_ERR_CAP             24      /* Advanced Error Capabilities */
> +#define  PCI_ERR_CAP_FEP(x)     ((x) & 31)      /* First Error Pointer */
> +#define  PCI_ERR_CAP_ECRC_GENC  0x00000020      /* ECRC Generation Capable */
> +#define  PCI_ERR_CAP_ECRC_GENE  0x00000040      /* ECRC Generation Enable */
> +#define  PCI_ERR_CAP_ECRC_CHKC  0x00000080      /* ECRC Check Capable */
> +#define  PCI_ERR_CAP_ECRC_CHKE  0x00000100      /* ECRC Check Enable */
> +#define PCI_ERR_HEADER_LOG      28      /* Header Log Register (16 bytes) */
> +#define PCI_ERR_ROOT_COMMAND    44      /* Root Error Command */
> +/* Correctable Err Reporting Enable */
> +#define PCI_ERR_ROOT_CMD_COR_EN         0x00000001
> +/* Non-fatal Err Reporting Enable */
> +#define PCI_ERR_ROOT_CMD_NONFATAL_EN    0x00000002
> +/* Fatal Err Reporting Enable */
> +#define PCI_ERR_ROOT_CMD_FATAL_EN       0x00000004
> +#define PCI_ERR_ROOT_STATUS     48
> +#define PCI_ERR_ROOT_COR_RCV            0x00000001      /* ERR_COR Received */
> +/* Multi ERR_COR Received */
> +#define PCI_ERR_ROOT_MULTI_COR_RCV      0x00000002
> +/* ERR_FATAL/NONFATAL Received */
> +#define PCI_ERR_ROOT_UNCOR_RCV          0x00000004
> +/* Multi ERR_FATAL/NONFATAL Received */
> +#define PCI_ERR_ROOT_MULTI_UNCOR_RCV    0x00000008
> +#define PCI_ERR_ROOT_FIRST_FATAL        0x00000010      /* First Fatal */
> +#define PCI_ERR_ROOT_NONFATAL_RCV       0x00000020      /* Non-Fatal Received */
> +#define PCI_ERR_ROOT_FATAL_RCV          0x00000040      /* Fatal Received */
> +#define PCI_ERR_ROOT_COR_SRC    52
> +#define PCI_ERR_ROOT_SRC        54
> +
> +/* Virtual Channel */
> +#define PCI_VC_PORT_REG1        4
> +#define PCI_VC_PORT_REG2        8
> +#define PCI_VC_PORT_CTRL        12
> +#define PCI_VC_PORT_STATUS      14
> +#define PCI_VC_RES_CAP          16
> +#define PCI_VC_RES_CTRL         20
> +#define PCI_VC_RES_STATUS       26
> +
> +/* Power Budgeting */
> +#define PCI_PWR_DSR             4       /* Data Select Register */
> +#define PCI_PWR_DATA            8       /* Data Register */
> +#define  PCI_PWR_DATA_BASE(x)   ((x) & 0xff)        /* Base Power */
> +#define  PCI_PWR_DATA_SCALE(x)  (((x) >> 8) & 3)    /* Data Scale */
> +#define  PCI_PWR_DATA_PM_SUB(x) (((x) >> 10) & 7)   /* PM Sub State */
> +#define  PCI_PWR_DATA_PM_STATE(x) (((x) >> 13) & 3) /* PM State */
> +#define  PCI_PWR_DATA_TYPE(x)   (((x) >> 15) & 7)   /* Type */
> +#define  PCI_PWR_DATA_RAIL(x)   (((x) >> 18) & 7)   /* Power Rail */
> +#define PCI_PWR_CAP             12      /* Capability */
> +#define  PCI_PWR_CAP_BUDGET(x)  ((x) & 1)       /* Included in system budget */
> +
> +/*
> + * Hypertransport sub capability types
> + *
> + * Unfortunately there are both 3 bit and 5 bit capability types defined
> + * in the HT spec, catering for that is a little messy. You probably don't
> + * want to use these directly, just use pci_find_ht_capability() and it
> + * will do the right thing for you.
> + */
> +#define HT_3BIT_CAP_MASK        0xE0
> +#define HT_CAPTYPE_SLAVE        0x00    /* Slave/Primary link configuration */
> +#define HT_CAPTYPE_HOST         0x20    /* Host/Secondary link configuration */
> +
> +#define HT_5BIT_CAP_MASK        0xF8
> +#define HT_CAPTYPE_IRQ          0x80    /* IRQ Configuration */
> +#define HT_CAPTYPE_REMAPPING_40 0xA0    /* 40 bit address remapping */
> +#define HT_CAPTYPE_REMAPPING_64 0xA2    /* 64 bit address remapping */
> +#define HT_CAPTYPE_UNITID_CLUMP 0x90    /* Unit ID clumping */
> +#define HT_CAPTYPE_EXTCONF      0x98    /* Extended Configuration Space Access */
> +#define HT_CAPTYPE_MSI_MAPPING  0xA8    /* MSI Mapping Capability */
> +#define  HT_MSI_FLAGS           0x02            /* Offset to flags */
> +#define  HT_MSI_FLAGS_ENABLE    0x1             /* Mapping enable */
> +#define  HT_MSI_FLAGS_FIXED     0x2             /* Fixed mapping only */
> +#define  HT_MSI_FIXED_ADDR      0x00000000FEE00000ULL   /* Fixed addr */
> +#define  HT_MSI_ADDR_LO         0x04            /* Offset to low addr bits */
> +#define  HT_MSI_ADDR_LO_MASK    0xFFF00000      /* Low address bit mask */
> +#define  HT_MSI_ADDR_HI         0x08            /* Offset to high addr bits */
> +#define HT_CAPTYPE_DIRECT_ROUTE 0xB0    /* Direct routing configuration */
> +#define HT_CAPTYPE_VCSET        0xB8    /* Virtual Channel configuration */
> +#define HT_CAPTYPE_ERROR_RETRY  0xC0    /* Retry on error configuration */
> +#define HT_CAPTYPE_GEN3         0xD0    /* Generation 3 hypertransport configuration */
> +#define HT_CAPTYPE_PM           0xE0    /* Hypertransport powermanagement configuration */
> +
> +/* Alternative Routing-ID Interpretation */
> +#define PCI_ARI_CAP             0x04    /* ARI Capability Register */
> +#define  PCI_ARI_CAP_MFVC       0x0001  /* MFVC Function Groups Capability */
> +#define  PCI_ARI_CAP_ACS        0x0002  /* ACS Function Groups Capability */
> +#define  PCI_ARI_CAP_NFN(x)     (((x) >> 8) & 0xff) /* Next Function Number */
> +#define PCI_ARI_CTRL            0x06    /* ARI Control Register */
> +#define  PCI_ARI_CTRL_MFVC      0x0001  /* MFVC Function Groups Enable */
> +#define  PCI_ARI_CTRL_ACS       0x0002  /* ACS Function Groups Enable */
> +#define  PCI_ARI_CTRL_FG(x)     (((x) >> 4) & 7) /* Function Group */
> +
> +/* Address Translation Service */
> +#define PCI_ATS_CAP             0x04    /* ATS Capability Register */
> +#define  PCI_ATS_CAP_QDEP(x)    ((x) & 0x1f)    /* Invalidate Queue Depth */
> +#define  PCI_ATS_MAX_QDEP       32      /* Max Invalidate Queue Depth */
> +#define PCI_ATS_CTRL            0x06    /* ATS Control Register */
> +#define  PCI_ATS_CTRL_ENABLE    0x8000  /* ATS Enable */
> +#define  PCI_ATS_CTRL_STU(x)    ((x) & 0x1f)    /* Smallest Translation Unit */
> +#define  PCI_ATS_MIN_STU        12      /* shift of minimum STU block */
> +
> +/* Single Root I/O Virtualization */
> +#define PCI_SRIOV_CAP           0x04    /* SR-IOV Capabilities */
> +#define  PCI_SRIOV_CAP_VFM      0x01    /* VF Migration Capable */
> +#define  PCI_SRIOV_CAP_INTR(x)  ((x) >> 21) /* Interrupt Message Number */
> +#define PCI_SRIOV_CTRL          0x08    /* SR-IOV Control */
> +#define  PCI_SRIOV_CTRL_VFE     0x01    /* VF Enable */
> +#define  PCI_SRIOV_CTRL_VFM     0x02    /* VF Migration Enable */
> +#define  PCI_SRIOV_CTRL_INTR    0x04    /* VF Migration Interrupt Enable */
> +#define  PCI_SRIOV_CTRL_MSE     0x08    /* VF Memory Space Enable */
> +#define  PCI_SRIOV_CTRL_ARI     0x10    /* ARI Capable Hierarchy */
> +#define PCI_SRIOV_STATUS        0x0a    /* SR-IOV Status */
> +#define  PCI_SRIOV_STATUS_VFM   0x01    /* VF Migration Status */
> +#define PCI_SRIOV_INITIAL_VF    0x0c    /* Initial VFs */
> +#define PCI_SRIOV_TOTAL_VF      0x0e    /* Total VFs */
> +#define PCI_SRIOV_NUM_VF        0x10    /* Number of VFs */
> +#define PCI_SRIOV_FUNC_LINK     0x12    /* Function Dependency Link */
> +#define PCI_SRIOV_VF_OFFSET     0x14    /* First VF Offset */
> +#define PCI_SRIOV_VF_STRIDE     0x16    /* Following VF Stride */
> +#define PCI_SRIOV_VF_DID        0x1a    /* VF Device ID */
> +#define PCI_SRIOV_SUP_PGSIZE    0x1c    /* Supported Page Sizes */
> +#define PCI_SRIOV_SYS_PGSIZE    0x20    /* System Page Size */
> +#define PCI_SRIOV_BAR           0x24    /* VF BAR0 */
> +#define  PCI_SRIOV_NUM_BARS     6       /* Number of VF BARs */
> +#define PCI_SRIOV_VFM           0x3c    /* VF Migration State Array Offset*/
> +#define  PCI_SRIOV_VFM_BIR(x)   ((x) & 7)       /* State BIR */
> +#define  PCI_SRIOV_VFM_OFFSET(x) ((x) & ~7)     /* State Offset */
> +#define  PCI_SRIOV_VFM_UA       0x0     /* Inactive.Unavailable */
> +#define  PCI_SRIOV_VFM_MI       0x1     /* Dormant.MigrateIn */
> +#define  PCI_SRIOV_VFM_MO       0x2     /* Active.MigrateOut */
> +#define  PCI_SRIOV_VFM_AV       0x3     /* Active.Available */
> +
> +#endif /* LINUX_PCI_REGS_H */
> -- 
> 1.7.1
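
The Hypertransport comment in the quoted header says that both 3-bit and 5-bit capability types exist and that catering for both is a little messy. A minimal sketch of how a lookup might pick the mask, assuming the capability-type byte has already been read from config space. `ht_cap_matches` is a hypothetical helper, not an existing QEMU or Linux function; the constants are copied from the header quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the quoted pci_regs.h */
#define HT_3BIT_CAP_MASK        0xE0
#define HT_CAPTYPE_SLAVE        0x00    /* Slave/Primary link configuration */
#define HT_CAPTYPE_HOST         0x20    /* Host/Secondary link configuration */
#define HT_5BIT_CAP_MASK        0xF8
#define HT_CAPTYPE_REMAPPING_40 0xA0    /* 40 bit address remapping */
#define HT_CAPTYPE_MSI_MAPPING  0xA8    /* MSI Mapping Capability */

/*
 * Hypothetical helper: return true if the capability-type byte
 * `cap_byte` encodes the capability type `wanted`.
 *
 * The slave and host types are listed under the 3-bit mask in the
 * header, so only the top three bits are compared for them; every
 * other type is compared using the 5-bit mask.
 */
bool ht_cap_matches(uint8_t cap_byte, uint8_t wanted)
{
    uint8_t mask;

    if (wanted == HT_CAPTYPE_SLAVE || wanted == HT_CAPTYPE_HOST) {
        mask = HT_3BIT_CAP_MASK;
    } else {
        mask = HT_5BIT_CAP_MASK;
    }
    return (cap_byte & mask) == wanted;
}
```

For example, a byte of 0x25 would match HT_CAPTYPE_HOST because its low five bits are ignored, while 0xA9 would match HT_CAPTYPE_MSI_MAPPING because only its low three bits are ignored.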
> 


* [Qemu-devel] Re: [PATCH 1/7] pci: expand tabs to spaces in pci_regs.h
@ 2010-08-31 20:29     ` Michael S. Tsirkin
From: Michael S. Tsirkin @ 2010-08-31 20:29 UTC
  To: Eduard - Gabriel Munteanu
  Cc: kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

On Sat, Aug 28, 2010 at 05:54:52PM +0300, Eduard - Gabriel Munteanu wrote:
> The conversion was done using the GNU 'expand' tool (default settings)
> to make it obey the QEMU coding style.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

I'm not really interested in this: we copied pci_regs.h from Linux
to help non-Linux hosts, and keeping the file consistent
with the original makes it easier to detect bugs and to pull in
new definitions from linux/pci_regs.h.
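
For reference, the whitespace conversion described in the commit message can be pictured with a small C sketch. This is not how GNU expand is implemented; `expand_tabs` and its parameters are hypothetical names, and the tab stop of 8 is assumed to match the tool's default settings. Every tab-aligned line changes under such a conversion, which is consistent with the 665 insertions and 665 deletions in the diffstat below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `in` to `out`, replacing each tab with
 * enough spaces to reach the next multiple of `tabstop` columns.
 * The column counter restarts after every newline.
 *
 * Returns the number of characters written (excluding the NUL),
 * or (size_t)-1 if `out` is too small or `tabstop` is zero.
 */
size_t expand_tabs(const char *in, char *out, size_t out_size, size_t tabstop)
{
    size_t col = 0;     /* current column on the output line */
    size_t n = 0;       /* characters written so far */

    assert(in != NULL && out != NULL);
    if (out_size == 0 || tabstop == 0) {
        return (size_t)-1;
    }

    for (; *in != '\0'; in++) {
        if (*in == '\t') {
            /* distance to the next tab stop, between 1 and tabstop */
            size_t pad = tabstop - (col % tabstop);

            if (n + pad >= out_size) {
                return (size_t)-1;
            }
            memset(out + n, ' ', pad);
            n += pad;
            col += pad;
        } else {
            if (n + 1 >= out_size) {
                return (size_t)-1;
            }
            out[n++] = *in;
            col = (*in == '\n') ? 0 : col + 1;
        }
    }
    out[n] = '\0';
    return n;
}
```

Calling it on a line such as "#define PCI_VENDOR_ID\t\t0x00\t/* 16 bits */" with a tab stop of 8 would pad the name to column 32 and the value to column 40, which is the alignment visible in the '+' lines of the patch.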

> ---
>  hw/pci_regs.h | 1330 ++++++++++++++++++++++++++++----------------------------
>  1 files changed, 665 insertions(+), 665 deletions(-)
>  rewrite hw/pci_regs.h (90%)
> 
> diff --git a/hw/pci_regs.h b/hw/pci_regs.h
> dissimilarity index 90%
> index dd0bed4..0f9f84c 100644
> --- a/hw/pci_regs.h
> +++ b/hw/pci_regs.h
> @@ -1,665 +1,665 @@
> -/*
> - *	pci_regs.h
> - *
> - *	PCI standard defines
> - *	Copyright 1994, Drew Eckhardt
> - *	Copyright 1997--1999 Martin Mares <mj@ucw.cz>
> - *
> - *	For more information, please consult the following manuals (look at
> - *	http://www.pcisig.com/ for how to get them):
> - *
> - *	PCI BIOS Specification
> - *	PCI Local Bus Specification
> - *	PCI to PCI Bridge Specification
> - *	PCI System Design Guide
> - *
> - * 	For hypertransport information, please consult the following manuals
> - * 	from http://www.hypertransport.org
> - *
> - *	The Hypertransport I/O Link Specification
> - */
> -
> -#ifndef LINUX_PCI_REGS_H
> -#define LINUX_PCI_REGS_H
> -
> -/*
> - * Under PCI, each device has 256 bytes of configuration address space,
> - * of which the first 64 bytes are standardized as follows:
> - */
> -#define PCI_VENDOR_ID		0x00	/* 16 bits */
> -#define PCI_DEVICE_ID		0x02	/* 16 bits */
> -#define PCI_COMMAND		0x04	/* 16 bits */
> -#define  PCI_COMMAND_IO		0x1	/* Enable response in I/O space */
> -#define  PCI_COMMAND_MEMORY	0x2	/* Enable response in Memory space */
> -#define  PCI_COMMAND_MASTER	0x4	/* Enable bus mastering */
> -#define  PCI_COMMAND_SPECIAL	0x8	/* Enable response to special cycles */
> -#define  PCI_COMMAND_INVALIDATE	0x10	/* Use memory write and invalidate */
> -#define  PCI_COMMAND_VGA_PALETTE 0x20	/* Enable palette snooping */
> -#define  PCI_COMMAND_PARITY	0x40	/* Enable parity checking */
> -#define  PCI_COMMAND_WAIT 	0x80	/* Enable address/data stepping */
> -#define  PCI_COMMAND_SERR	0x100	/* Enable SERR */
> -#define  PCI_COMMAND_FAST_BACK	0x200	/* Enable back-to-back writes */
> -#define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
> -
> -#define PCI_STATUS		0x06	/* 16 bits */
> -#define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
> -#define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
> -#define  PCI_STATUS_66MHZ	0x20	/* Support 66 Mhz PCI 2.1 bus */
> -#define  PCI_STATUS_UDF		0x40	/* Support User Definable Features [obsolete] */
> -#define  PCI_STATUS_FAST_BACK	0x80	/* Accept fast-back to back */
> -#define  PCI_STATUS_PARITY	0x100	/* Detected parity error */
> -#define  PCI_STATUS_DEVSEL_MASK	0x600	/* DEVSEL timing */
> -#define  PCI_STATUS_DEVSEL_FAST		0x000
> -#define  PCI_STATUS_DEVSEL_MEDIUM	0x200
> -#define  PCI_STATUS_DEVSEL_SLOW		0x400
> -#define  PCI_STATUS_SIG_TARGET_ABORT	0x800 /* Set on target abort */
> -#define  PCI_STATUS_REC_TARGET_ABORT	0x1000 /* Master ack of " */
> -#define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
> -#define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
> -#define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
> -
> -#define PCI_CLASS_REVISION	0x08	/* High 24 bits are class, low 8 revision */
> -#define PCI_REVISION_ID		0x08	/* Revision ID */
> -#define PCI_CLASS_PROG		0x09	/* Reg. Level Programming Interface */
> -#define PCI_CLASS_DEVICE	0x0a	/* Device class */
> -
> -#define PCI_CACHE_LINE_SIZE	0x0c	/* 8 bits */
> -#define PCI_LATENCY_TIMER	0x0d	/* 8 bits */
> -#define PCI_HEADER_TYPE		0x0e	/* 8 bits */
> -#define  PCI_HEADER_TYPE_NORMAL		0
> -#define  PCI_HEADER_TYPE_BRIDGE		1
> -#define  PCI_HEADER_TYPE_CARDBUS	2
> -
> -#define PCI_BIST		0x0f	/* 8 bits */
> -#define  PCI_BIST_CODE_MASK	0x0f	/* Return result */
> -#define  PCI_BIST_START		0x40	/* 1 to start BIST, 2 secs or less */
> -#define  PCI_BIST_CAPABLE	0x80	/* 1 if BIST capable */
> -
> -/*
> - * Base addresses specify locations in memory or I/O space.
> - * Decoded size can be determined by writing a value of
> - * 0xffffffff to the register, and reading it back.  Only
> - * 1 bits are decoded.
> - */
> -#define PCI_BASE_ADDRESS_0	0x10	/* 32 bits */
> -#define PCI_BASE_ADDRESS_1	0x14	/* 32 bits [htype 0,1 only] */
> -#define PCI_BASE_ADDRESS_2	0x18	/* 32 bits [htype 0 only] */
> -#define PCI_BASE_ADDRESS_3	0x1c	/* 32 bits */
> -#define PCI_BASE_ADDRESS_4	0x20	/* 32 bits */
> -#define PCI_BASE_ADDRESS_5	0x24	/* 32 bits */
> -#define  PCI_BASE_ADDRESS_SPACE		0x01	/* 0 = memory, 1 = I/O */
> -#define  PCI_BASE_ADDRESS_SPACE_IO	0x01
> -#define  PCI_BASE_ADDRESS_SPACE_MEMORY	0x00
> -#define  PCI_BASE_ADDRESS_MEM_TYPE_MASK	0x06
> -#define  PCI_BASE_ADDRESS_MEM_TYPE_32	0x00	/* 32 bit address */
> -#define  PCI_BASE_ADDRESS_MEM_TYPE_1M	0x02	/* Below 1M [obsolete] */
> -#define  PCI_BASE_ADDRESS_MEM_TYPE_64	0x04	/* 64 bit address */
> -#define  PCI_BASE_ADDRESS_MEM_PREFETCH	0x08	/* prefetchable? */
> -#define  PCI_BASE_ADDRESS_MEM_MASK	(~0x0fUL)
> -#define  PCI_BASE_ADDRESS_IO_MASK	(~0x03UL)
> -/* bit 1 is reserved if address_space = 1 */
> -
> -/* Header type 0 (normal devices) */
> -#define PCI_CARDBUS_CIS		0x28
> -#define PCI_SUBSYSTEM_VENDOR_ID	0x2c
> -#define PCI_SUBSYSTEM_ID	0x2e
> -#define PCI_ROM_ADDRESS		0x30	/* Bits 31..11 are address, 10..1 reserved */
> -#define  PCI_ROM_ADDRESS_ENABLE	0x01
> -#define PCI_ROM_ADDRESS_MASK	(~0x7ffUL)
> -
> -#define PCI_CAPABILITY_LIST	0x34	/* Offset of first capability list entry */
> -
> -/* 0x35-0x3b are reserved */
> -#define PCI_INTERRUPT_LINE	0x3c	/* 8 bits */
> -#define PCI_INTERRUPT_PIN	0x3d	/* 8 bits */
> -#define PCI_MIN_GNT		0x3e	/* 8 bits */
> -#define PCI_MAX_LAT		0x3f	/* 8 bits */
> -
> -/* Header type 1 (PCI-to-PCI bridges) */
> -#define PCI_PRIMARY_BUS		0x18	/* Primary bus number */
> -#define PCI_SECONDARY_BUS	0x19	/* Secondary bus number */
> -#define PCI_SUBORDINATE_BUS	0x1a	/* Highest bus number behind the bridge */
> -#define PCI_SEC_LATENCY_TIMER	0x1b	/* Latency timer for secondary interface */
> -#define PCI_IO_BASE		0x1c	/* I/O range behind the bridge */
> -#define PCI_IO_LIMIT		0x1d
> -#define  PCI_IO_RANGE_TYPE_MASK	0x0fUL	/* I/O bridging type */
> -#define  PCI_IO_RANGE_TYPE_16	0x00
> -#define  PCI_IO_RANGE_TYPE_32	0x01
> -#define  PCI_IO_RANGE_MASK	(~0x0fUL)
> -#define PCI_SEC_STATUS		0x1e	/* Secondary status register, only bit 14 used */
> -#define PCI_MEMORY_BASE		0x20	/* Memory range behind */
> -#define PCI_MEMORY_LIMIT	0x22
> -#define  PCI_MEMORY_RANGE_TYPE_MASK 0x0fUL
> -#define  PCI_MEMORY_RANGE_MASK	(~0x0fUL)
> -#define PCI_PREF_MEMORY_BASE	0x24	/* Prefetchable memory range behind */
> -#define PCI_PREF_MEMORY_LIMIT	0x26
> -#define  PCI_PREF_RANGE_TYPE_MASK 0x0fUL
> -#define  PCI_PREF_RANGE_TYPE_32	0x00
> -#define  PCI_PREF_RANGE_TYPE_64	0x01
> -#define  PCI_PREF_RANGE_MASK	(~0x0fUL)
> -#define PCI_PREF_BASE_UPPER32	0x28	/* Upper half of prefetchable memory range */
> -#define PCI_PREF_LIMIT_UPPER32	0x2c
> -#define PCI_IO_BASE_UPPER16	0x30	/* Upper half of I/O addresses */
> -#define PCI_IO_LIMIT_UPPER16	0x32
> -/* 0x34 same as for htype 0 */
> -/* 0x35-0x3b is reserved */
> -#define PCI_ROM_ADDRESS1	0x38	/* Same as PCI_ROM_ADDRESS, but for htype 1 */
> -/* 0x3c-0x3d are same as for htype 0 */
> -#define PCI_BRIDGE_CONTROL	0x3e
> -#define  PCI_BRIDGE_CTL_PARITY	0x01	/* Enable parity detection on secondary interface */
> -#define  PCI_BRIDGE_CTL_SERR	0x02	/* The same for SERR forwarding */
> -#define  PCI_BRIDGE_CTL_ISA	0x04	/* Enable ISA mode */
> -#define  PCI_BRIDGE_CTL_VGA	0x08	/* Forward VGA addresses */
> -#define  PCI_BRIDGE_CTL_MASTER_ABORT	0x20  /* Report master aborts */
> -#define  PCI_BRIDGE_CTL_BUS_RESET	0x40	/* Secondary bus reset */
> -#define  PCI_BRIDGE_CTL_FAST_BACK	0x80	/* Fast Back2Back enabled on secondary interface */
> -
> -/* Header type 2 (CardBus bridges) */
> -#define PCI_CB_CAPABILITY_LIST	0x14
> -/* 0x15 reserved */
> -#define PCI_CB_SEC_STATUS	0x16	/* Secondary status */
> -#define PCI_CB_PRIMARY_BUS	0x18	/* PCI bus number */
> -#define PCI_CB_CARD_BUS		0x19	/* CardBus bus number */
> -#define PCI_CB_SUBORDINATE_BUS	0x1a	/* Subordinate bus number */
> -#define PCI_CB_LATENCY_TIMER	0x1b	/* CardBus latency timer */
> -#define PCI_CB_MEMORY_BASE_0	0x1c
> -#define PCI_CB_MEMORY_LIMIT_0	0x20
> -#define PCI_CB_MEMORY_BASE_1	0x24
> -#define PCI_CB_MEMORY_LIMIT_1	0x28
> -#define PCI_CB_IO_BASE_0	0x2c
> -#define PCI_CB_IO_BASE_0_HI	0x2e
> -#define PCI_CB_IO_LIMIT_0	0x30
> -#define PCI_CB_IO_LIMIT_0_HI	0x32
> -#define PCI_CB_IO_BASE_1	0x34
> -#define PCI_CB_IO_BASE_1_HI	0x36
> -#define PCI_CB_IO_LIMIT_1	0x38
> -#define PCI_CB_IO_LIMIT_1_HI	0x3a
> -#define  PCI_CB_IO_RANGE_MASK	(~0x03UL)
> -/* 0x3c-0x3d are same as for htype 0 */
> -#define PCI_CB_BRIDGE_CONTROL	0x3e
> -#define  PCI_CB_BRIDGE_CTL_PARITY	0x01	/* Similar to standard bridge control register */
> -#define  PCI_CB_BRIDGE_CTL_SERR		0x02
> -#define  PCI_CB_BRIDGE_CTL_ISA		0x04
> -#define  PCI_CB_BRIDGE_CTL_VGA		0x08
> -#define  PCI_CB_BRIDGE_CTL_MASTER_ABORT	0x20
> -#define  PCI_CB_BRIDGE_CTL_CB_RESET	0x40	/* CardBus reset */
> -#define  PCI_CB_BRIDGE_CTL_16BIT_INT	0x80	/* Enable interrupt for 16-bit cards */
> -#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM0 0x100	/* Prefetch enable for both memory regions */
> -#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM1 0x200
> -#define  PCI_CB_BRIDGE_CTL_POST_WRITES	0x400
> -#define PCI_CB_SUBSYSTEM_VENDOR_ID	0x40
> -#define PCI_CB_SUBSYSTEM_ID		0x42
> -#define PCI_CB_LEGACY_MODE_BASE		0x44	/* 16-bit PC Card legacy mode base address (ExCa) */
> -/* 0x48-0x7f reserved */
> -
> -/* Capability lists */
> -
> -#define PCI_CAP_LIST_ID		0	/* Capability ID */
> -#define  PCI_CAP_ID_PM		0x01	/* Power Management */
> -#define  PCI_CAP_ID_AGP		0x02	/* Accelerated Graphics Port */
> -#define  PCI_CAP_ID_VPD		0x03	/* Vital Product Data */
> -#define  PCI_CAP_ID_SLOTID	0x04	/* Slot Identification */
> -#define  PCI_CAP_ID_MSI		0x05	/* Message Signalled Interrupts */
> -#define  PCI_CAP_ID_CHSWP	0x06	/* CompactPCI HotSwap */
> -#define  PCI_CAP_ID_PCIX	0x07	/* PCI-X */
> -#define  PCI_CAP_ID_HT		0x08	/* HyperTransport */
> -#define  PCI_CAP_ID_VNDR	0x09	/* Vendor specific */
> -#define  PCI_CAP_ID_DBG		0x0A	/* Debug port */
> -#define  PCI_CAP_ID_CCRC	0x0B	/* CompactPCI Central Resource Control */
> -#define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
> -#define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
> -#define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
> -#define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
> -#define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
> -#define  PCI_CAP_ID_AF		0x13	/* PCI Advanced Features */
> -#define PCI_CAP_LIST_NEXT	1	/* Next capability in the list */
> -#define PCI_CAP_FLAGS		2	/* Capability defined flags (16 bits) */
> -#define PCI_CAP_SIZEOF		4
> -
> -/* Power Management Registers */
> -
> -#define PCI_PM_PMC		2	/* PM Capabilities Register */
> -#define  PCI_PM_CAP_VER_MASK	0x0007	/* Version */
> -#define  PCI_PM_CAP_PME_CLOCK	0x0008	/* PME clock required */
> -#define  PCI_PM_CAP_RESERVED    0x0010  /* Reserved field */
> -#define  PCI_PM_CAP_DSI		0x0020	/* Device specific initialization */
> -#define  PCI_PM_CAP_AUX_POWER	0x01C0	/* Auxilliary power support mask */
> -#define  PCI_PM_CAP_D1		0x0200	/* D1 power state support */
> -#define  PCI_PM_CAP_D2		0x0400	/* D2 power state support */
> -#define  PCI_PM_CAP_PME		0x0800	/* PME pin supported */
> -#define  PCI_PM_CAP_PME_MASK	0xF800	/* PME Mask of all supported states */
> -#define  PCI_PM_CAP_PME_D0	0x0800	/* PME# from D0 */
> -#define  PCI_PM_CAP_PME_D1	0x1000	/* PME# from D1 */
> -#define  PCI_PM_CAP_PME_D2	0x2000	/* PME# from D2 */
> -#define  PCI_PM_CAP_PME_D3	0x4000	/* PME# from D3 (hot) */
> -#define  PCI_PM_CAP_PME_D3cold	0x8000	/* PME# from D3 (cold) */
> -#define  PCI_PM_CAP_PME_SHIFT	11	/* Start of the PME Mask in PMC */
> -#define PCI_PM_CTRL		4	/* PM control and status register */
> -#define  PCI_PM_CTRL_STATE_MASK	0x0003	/* Current power state (D0 to D3) */
> -#define  PCI_PM_CTRL_NO_SOFT_RESET	0x0008	/* No reset for D3hot->D0 */
> -#define  PCI_PM_CTRL_PME_ENABLE	0x0100	/* PME pin enable */
> -#define  PCI_PM_CTRL_DATA_SEL_MASK	0x1e00	/* Data select (??) */
> -#define  PCI_PM_CTRL_DATA_SCALE_MASK	0x6000	/* Data scale (??) */
> -#define  PCI_PM_CTRL_PME_STATUS	0x8000	/* PME pin status */
> -#define PCI_PM_PPB_EXTENSIONS	6	/* PPB support extensions (??) */
> -#define  PCI_PM_PPB_B2_B3	0x40	/* Stop clock when in D3hot (??) */
> -#define  PCI_PM_BPCC_ENABLE	0x80	/* Bus power/clock control enable (??) */
> -#define PCI_PM_DATA_REGISTER	7	/* (??) */
> -#define PCI_PM_SIZEOF		8
> -
> -/* AGP registers */
> -
> -#define PCI_AGP_VERSION		2	/* BCD version number */
> -#define PCI_AGP_RFU		3	/* Rest of capability flags */
> -#define PCI_AGP_STATUS		4	/* Status register */
> -#define  PCI_AGP_STATUS_RQ_MASK	0xff000000	/* Maximum number of requests - 1 */
> -#define  PCI_AGP_STATUS_SBA	0x0200	/* Sideband addressing supported */
> -#define  PCI_AGP_STATUS_64BIT	0x0020	/* 64-bit addressing supported */
> -#define  PCI_AGP_STATUS_FW	0x0010	/* FW transfers supported */
> -#define  PCI_AGP_STATUS_RATE4	0x0004	/* 4x transfer rate supported */
> -#define  PCI_AGP_STATUS_RATE2	0x0002	/* 2x transfer rate supported */
> -#define  PCI_AGP_STATUS_RATE1	0x0001	/* 1x transfer rate supported */
> -#define PCI_AGP_COMMAND		8	/* Control register */
> -#define  PCI_AGP_COMMAND_RQ_MASK 0xff000000  /* Master: Maximum number of requests */
> -#define  PCI_AGP_COMMAND_SBA	0x0200	/* Sideband addressing enabled */
> -#define  PCI_AGP_COMMAND_AGP	0x0100	/* Allow processing of AGP transactions */
> -#define  PCI_AGP_COMMAND_64BIT	0x0020 	/* Allow processing of 64-bit addresses */
> -#define  PCI_AGP_COMMAND_FW	0x0010 	/* Force FW transfers */
> -#define  PCI_AGP_COMMAND_RATE4	0x0004	/* Use 4x rate */
> -#define  PCI_AGP_COMMAND_RATE2	0x0002	/* Use 2x rate */
> -#define  PCI_AGP_COMMAND_RATE1	0x0001	/* Use 1x rate */
> -#define PCI_AGP_SIZEOF		12
> -
> -/* Vital Product Data */
> -
> -#define PCI_VPD_ADDR		2	/* Address to access (15 bits!) */
> -#define  PCI_VPD_ADDR_MASK	0x7fff	/* Address mask */
> -#define  PCI_VPD_ADDR_F		0x8000	/* Write 0, 1 indicates completion */
> -#define PCI_VPD_DATA		4	/* 32-bits of data returned here */
> -
> -/* Slot Identification */
> -
> -#define PCI_SID_ESR		2	/* Expansion Slot Register */
> -#define  PCI_SID_ESR_NSLOTS	0x1f	/* Number of expansion slots available */
> -#define  PCI_SID_ESR_FIC	0x20	/* First In Chassis Flag */
> -#define PCI_SID_CHASSIS_NR	3	/* Chassis Number */
> -
> -/* Message Signalled Interrupts registers */
> -
> -#define PCI_MSI_FLAGS		2	/* Various flags */
> -#define  PCI_MSI_FLAGS_64BIT	0x80	/* 64-bit addresses allowed */
> -#define  PCI_MSI_FLAGS_QSIZE	0x70	/* Message queue size configured */
> -#define  PCI_MSI_FLAGS_QMASK	0x0e	/* Maximum queue size available */
> -#define  PCI_MSI_FLAGS_ENABLE	0x01	/* MSI feature enabled */
> -#define  PCI_MSI_FLAGS_MASKBIT	0x100	/* 64-bit mask bits allowed */
> -#define PCI_MSI_RFU		3	/* Rest of capability flags */
> -#define PCI_MSI_ADDRESS_LO	4	/* Lower 32 bits */
> -#define PCI_MSI_ADDRESS_HI	8	/* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
> -#define PCI_MSI_DATA_32		8	/* 16 bits of data for 32-bit devices */
> -#define PCI_MSI_MASK_32		12	/* Mask bits register for 32-bit devices */
> -#define PCI_MSI_DATA_64		12	/* 16 bits of data for 64-bit devices */
> -#define PCI_MSI_MASK_64		16	/* Mask bits register for 64-bit devices */
> -
> -/* MSI-X registers (these are at offset PCI_MSIX_FLAGS) */
> -#define PCI_MSIX_FLAGS		2
> -#define  PCI_MSIX_FLAGS_QSIZE	0x7FF
> -#define  PCI_MSIX_FLAGS_ENABLE	(1 << 15)
> -#define  PCI_MSIX_FLAGS_MASKALL	(1 << 14)
> -#define PCI_MSIX_FLAGS_BIRMASK	(7 << 0)
> -
> -/* CompactPCI Hotswap Register */
> -
> -#define PCI_CHSWP_CSR		2	/* Control and Status Register */
> -#define  PCI_CHSWP_DHA		0x01	/* Device Hiding Arm */
> -#define  PCI_CHSWP_EIM		0x02	/* ENUM# Signal Mask */
> -#define  PCI_CHSWP_PIE		0x04	/* Pending Insert or Extract */
> -#define  PCI_CHSWP_LOO		0x08	/* LED On / Off */
> -#define  PCI_CHSWP_PI		0x30	/* Programming Interface */
> -#define  PCI_CHSWP_EXT		0x40	/* ENUM# status - extraction */
> -#define  PCI_CHSWP_INS		0x80	/* ENUM# status - insertion */
> -
> -/* PCI Advanced Feature registers */
> -
> -#define PCI_AF_LENGTH		2
> -#define PCI_AF_CAP		3
> -#define  PCI_AF_CAP_TP		0x01
> -#define  PCI_AF_CAP_FLR		0x02
> -#define PCI_AF_CTRL		4
> -#define  PCI_AF_CTRL_FLR	0x01
> -#define PCI_AF_STATUS		5
> -#define  PCI_AF_STATUS_TP	0x01
> -
> -/* PCI-X registers */
> -
> -#define PCI_X_CMD		2	/* Modes & Features */
> -#define  PCI_X_CMD_DPERR_E	0x0001	/* Data Parity Error Recovery Enable */
> -#define  PCI_X_CMD_ERO		0x0002	/* Enable Relaxed Ordering */
> -#define  PCI_X_CMD_READ_512	0x0000	/* 512 byte maximum read byte count */
> -#define  PCI_X_CMD_READ_1K	0x0004	/* 1Kbyte maximum read byte count */
> -#define  PCI_X_CMD_READ_2K	0x0008	/* 2Kbyte maximum read byte count */
> -#define  PCI_X_CMD_READ_4K	0x000c	/* 4Kbyte maximum read byte count */
> -#define  PCI_X_CMD_MAX_READ	0x000c	/* Max Memory Read Byte Count */
> -				/* Max # of outstanding split transactions */
> -#define  PCI_X_CMD_SPLIT_1	0x0000	/* Max 1 */
> -#define  PCI_X_CMD_SPLIT_2	0x0010	/* Max 2 */
> -#define  PCI_X_CMD_SPLIT_3	0x0020	/* Max 3 */
> -#define  PCI_X_CMD_SPLIT_4	0x0030	/* Max 4 */
> -#define  PCI_X_CMD_SPLIT_8	0x0040	/* Max 8 */
> -#define  PCI_X_CMD_SPLIT_12	0x0050	/* Max 12 */
> -#define  PCI_X_CMD_SPLIT_16	0x0060	/* Max 16 */
> -#define  PCI_X_CMD_SPLIT_32	0x0070	/* Max 32 */
> -#define  PCI_X_CMD_MAX_SPLIT	0x0070	/* Max Outstanding Split Transactions */
> -#define  PCI_X_CMD_VERSION(x) 	(((x) >> 12) & 3) /* Version */
> -#define PCI_X_STATUS		4	/* PCI-X capabilities */
> -#define  PCI_X_STATUS_DEVFN	0x000000ff	/* A copy of devfn */
> -#define  PCI_X_STATUS_BUS	0x0000ff00	/* A copy of bus nr */
> -#define  PCI_X_STATUS_64BIT	0x00010000	/* 64-bit device */
> -#define  PCI_X_STATUS_133MHZ	0x00020000	/* 133 MHz capable */
> -#define  PCI_X_STATUS_SPL_DISC	0x00040000	/* Split Completion Discarded */
> -#define  PCI_X_STATUS_UNX_SPL	0x00080000	/* Unexpected Split Completion */
> -#define  PCI_X_STATUS_COMPLEX	0x00100000	/* Device Complexity */
> -#define  PCI_X_STATUS_MAX_READ	0x00600000	/* Designed Max Memory Read Count */
> -#define  PCI_X_STATUS_MAX_SPLIT	0x03800000	/* Designed Max Outstanding Split Transactions */
> -#define  PCI_X_STATUS_MAX_CUM	0x1c000000	/* Designed Max Cumulative Read Size */
> -#define  PCI_X_STATUS_SPL_ERR	0x20000000	/* Rcvd Split Completion Error Msg */
> -#define  PCI_X_STATUS_266MHZ	0x40000000	/* 266 MHz capable */
> -#define  PCI_X_STATUS_533MHZ	0x80000000	/* 533 MHz capable */
> -
> -/* PCI Express capability registers */
> -
> -#define PCI_EXP_FLAGS		2	/* Capabilities register */
> -#define PCI_EXP_FLAGS_VERS	0x000f	/* Capability version */
> -#define PCI_EXP_FLAGS_TYPE	0x00f0	/* Device/Port type */
> -#define  PCI_EXP_TYPE_ENDPOINT	0x0	/* Express Endpoint */
> -#define  PCI_EXP_TYPE_LEG_END	0x1	/* Legacy Endpoint */
> -#define  PCI_EXP_TYPE_ROOT_PORT 0x4	/* Root Port */
> -#define  PCI_EXP_TYPE_UPSTREAM	0x5	/* Upstream Port */
> -#define  PCI_EXP_TYPE_DOWNSTREAM 0x6	/* Downstream Port */
> -#define  PCI_EXP_TYPE_PCI_BRIDGE 0x7	/* PCI/PCI-X Bridge */
> -#define  PCI_EXP_TYPE_RC_END	0x9	/* Root Complex Integrated Endpoint */
> -#define  PCI_EXP_TYPE_RC_EC	0x10	/* Root Complex Event Collector */
> -#define PCI_EXP_FLAGS_SLOT	0x0100	/* Slot implemented */
> -#define PCI_EXP_FLAGS_IRQ	0x3e00	/* Interrupt message number */
> -#define PCI_EXP_DEVCAP		4	/* Device capabilities */
> -#define  PCI_EXP_DEVCAP_PAYLOAD	0x07	/* Max_Payload_Size */
> -#define  PCI_EXP_DEVCAP_PHANTOM	0x18	/* Phantom functions */
> -#define  PCI_EXP_DEVCAP_EXT_TAG	0x20	/* Extended tags */
> -#define  PCI_EXP_DEVCAP_L0S	0x1c0	/* L0s Acceptable Latency */
> -#define  PCI_EXP_DEVCAP_L1	0xe00	/* L1 Acceptable Latency */
> -#define  PCI_EXP_DEVCAP_ATN_BUT	0x1000	/* Attention Button Present */
> -#define  PCI_EXP_DEVCAP_ATN_IND	0x2000	/* Attention Indicator Present */
> -#define  PCI_EXP_DEVCAP_PWR_IND	0x4000	/* Power Indicator Present */
> -#define  PCI_EXP_DEVCAP_RBER	0x8000	/* Role-Based Error Reporting */
> -#define  PCI_EXP_DEVCAP_PWR_VAL	0x3fc0000 /* Slot Power Limit Value */
> -#define  PCI_EXP_DEVCAP_PWR_SCL	0xc000000 /* Slot Power Limit Scale */
> -#define  PCI_EXP_DEVCAP_FLR     0x10000000 /* Function Level Reset */
> -#define PCI_EXP_DEVCTL		8	/* Device Control */
> -#define  PCI_EXP_DEVCTL_CERE	0x0001	/* Correctable Error Reporting En. */
> -#define  PCI_EXP_DEVCTL_NFERE	0x0002	/* Non-Fatal Error Reporting Enable */
> -#define  PCI_EXP_DEVCTL_FERE	0x0004	/* Fatal Error Reporting Enable */
> -#define  PCI_EXP_DEVCTL_URRE	0x0008	/* Unsupported Request Reporting En. */
> -#define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
> -#define  PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size */
> -#define  PCI_EXP_DEVCTL_EXT_TAG	0x0100	/* Extended Tag Field Enable */
> -#define  PCI_EXP_DEVCTL_PHANTOM	0x0200	/* Phantom Functions Enable */
> -#define  PCI_EXP_DEVCTL_AUX_PME	0x0400	/* Auxiliary Power PM Enable */
> -#define  PCI_EXP_DEVCTL_NOSNOOP_EN 0x0800  /* Enable No Snoop */
> -#define  PCI_EXP_DEVCTL_READRQ	0x7000	/* Max_Read_Request_Size */
> -#define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR */
> -#define PCI_EXP_DEVSTA		10	/* Device Status */
> -#define  PCI_EXP_DEVSTA_CED	0x01	/* Correctable Error Detected */
> -#define  PCI_EXP_DEVSTA_NFED	0x02	/* Non-Fatal Error Detected */
> -#define  PCI_EXP_DEVSTA_FED	0x04	/* Fatal Error Detected */
> -#define  PCI_EXP_DEVSTA_URD	0x08	/* Unsupported Request Detected */
> -#define  PCI_EXP_DEVSTA_AUXPD	0x10	/* AUX Power Detected */
> -#define  PCI_EXP_DEVSTA_TRPND	0x20	/* Transactions Pending */
> -#define PCI_EXP_LNKCAP		12	/* Link Capabilities */
> -#define  PCI_EXP_LNKCAP_SLS	0x0000000f /* Supported Link Speeds */
> -#define  PCI_EXP_LNKCAP_MLW	0x000003f0 /* Maximum Link Width */
> -#define  PCI_EXP_LNKCAP_ASPMS	0x00000c00 /* ASPM Support */
> -#define  PCI_EXP_LNKCAP_L0SEL	0x00007000 /* L0s Exit Latency */
> -#define  PCI_EXP_LNKCAP_L1EL	0x00038000 /* L1 Exit Latency */
> -#define  PCI_EXP_LNKCAP_CLKPM	0x00040000 /* L1 Clock Power Management */
> -#define  PCI_EXP_LNKCAP_SDERC	0x00080000 /* Suprise Down Error Reporting Capable */
> -#define  PCI_EXP_LNKCAP_DLLLARC	0x00100000 /* Data Link Layer Link Active Reporting Capable */
> -#define  PCI_EXP_LNKCAP_LBNC	0x00200000 /* Link Bandwidth Notification Capability */
> -#define  PCI_EXP_LNKCAP_PN	0xff000000 /* Port Number */
> -#define PCI_EXP_LNKCTL		16	/* Link Control */
> -#define  PCI_EXP_LNKCTL_ASPMC	0x0003	/* ASPM Control */
> -#define  PCI_EXP_LNKCTL_RCB	0x0008	/* Read Completion Boundary */
> -#define  PCI_EXP_LNKCTL_LD	0x0010	/* Link Disable */
> -#define  PCI_EXP_LNKCTL_RL	0x0020	/* Retrain Link */
> -#define  PCI_EXP_LNKCTL_CCC	0x0040	/* Common Clock Configuration */
> -#define  PCI_EXP_LNKCTL_ES	0x0080	/* Extended Synch */
> -#define  PCI_EXP_LNKCTL_CLKREQ_EN 0x100	/* Enable clkreq */
> -#define  PCI_EXP_LNKCTL_HAWD	0x0200	/* Hardware Autonomous Width Disable */
> -#define  PCI_EXP_LNKCTL_LBMIE	0x0400	/* Link Bandwidth Management Interrupt Enable */
> -#define  PCI_EXP_LNKCTL_LABIE	0x0800	/* Lnk Autonomous Bandwidth Interrupt Enable */
> -#define PCI_EXP_LNKSTA		18	/* Link Status */
> -#define  PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
> -#define  PCI_EXP_LNKSTA_NLW	0x03f0	/* Nogotiated Link Width */
> -#define  PCI_EXP_LNKSTA_LT	0x0800	/* Link Training */
> -#define  PCI_EXP_LNKSTA_SLC	0x1000	/* Slot Clock Configuration */
> -#define  PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
> -#define  PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */
> -#define  PCI_EXP_LNKSTA_LABS	0x8000	/* Link Autonomous Bandwidth Status */
> -#define PCI_EXP_SLTCAP		20	/* Slot Capabilities */
> -#define  PCI_EXP_SLTCAP_ABP	0x00000001 /* Attention Button Present */
> -#define  PCI_EXP_SLTCAP_PCP	0x00000002 /* Power Controller Present */
> -#define  PCI_EXP_SLTCAP_MRLSP	0x00000004 /* MRL Sensor Present */
> -#define  PCI_EXP_SLTCAP_AIP	0x00000008 /* Attention Indicator Present */
> -#define  PCI_EXP_SLTCAP_PIP	0x00000010 /* Power Indicator Present */
> -#define  PCI_EXP_SLTCAP_HPS	0x00000020 /* Hot-Plug Surprise */
> -#define  PCI_EXP_SLTCAP_HPC	0x00000040 /* Hot-Plug Capable */
> -#define  PCI_EXP_SLTCAP_SPLV	0x00007f80 /* Slot Power Limit Value */
> -#define  PCI_EXP_SLTCAP_SPLS	0x00018000 /* Slot Power Limit Scale */
> -#define  PCI_EXP_SLTCAP_EIP	0x00020000 /* Electromechanical Interlock Present */
> -#define  PCI_EXP_SLTCAP_NCCS	0x00040000 /* No Command Completed Support */
> -#define  PCI_EXP_SLTCAP_PSN	0xfff80000 /* Physical Slot Number */
> -#define PCI_EXP_SLTCTL		24	/* Slot Control */
> -#define  PCI_EXP_SLTCTL_ABPE	0x0001	/* Attention Button Pressed Enable */
> -#define  PCI_EXP_SLTCTL_PFDE	0x0002	/* Power Fault Detected Enable */
> -#define  PCI_EXP_SLTCTL_MRLSCE	0x0004	/* MRL Sensor Changed Enable */
> -#define  PCI_EXP_SLTCTL_PDCE	0x0008	/* Presence Detect Changed Enable */
> -#define  PCI_EXP_SLTCTL_CCIE	0x0010	/* Command Completed Interrupt Enable */
> -#define  PCI_EXP_SLTCTL_HPIE	0x0020	/* Hot-Plug Interrupt Enable */
> -#define  PCI_EXP_SLTCTL_AIC	0x00c0	/* Attention Indicator Control */
> -#define  PCI_EXP_SLTCTL_PIC	0x0300	/* Power Indicator Control */
> -#define  PCI_EXP_SLTCTL_PCC	0x0400	/* Power Controller Control */
> -#define  PCI_EXP_SLTCTL_EIC	0x0800	/* Electromechanical Interlock Control */
> -#define  PCI_EXP_SLTCTL_DLLSCE	0x1000	/* Data Link Layer State Changed Enable */
> -#define PCI_EXP_SLTSTA		26	/* Slot Status */
> -#define  PCI_EXP_SLTSTA_ABP	0x0001	/* Attention Button Pressed */
> -#define  PCI_EXP_SLTSTA_PFD	0x0002	/* Power Fault Detected */
> -#define  PCI_EXP_SLTSTA_MRLSC	0x0004	/* MRL Sensor Changed */
> -#define  PCI_EXP_SLTSTA_PDC	0x0008	/* Presence Detect Changed */
> -#define  PCI_EXP_SLTSTA_CC	0x0010	/* Command Completed */
> -#define  PCI_EXP_SLTSTA_MRLSS	0x0020	/* MRL Sensor State */
> -#define  PCI_EXP_SLTSTA_PDS	0x0040	/* Presence Detect State */
> -#define  PCI_EXP_SLTSTA_EIS	0x0080	/* Electromechanical Interlock Status */
> -#define  PCI_EXP_SLTSTA_DLLSC	0x0100	/* Data Link Layer State Changed */
> -#define PCI_EXP_RTCTL		28	/* Root Control */
> -#define  PCI_EXP_RTCTL_SECEE	0x01	/* System Error on Correctable Error */
> -#define  PCI_EXP_RTCTL_SENFEE	0x02	/* System Error on Non-Fatal Error */
> -#define  PCI_EXP_RTCTL_SEFEE	0x04	/* System Error on Fatal Error */
> -#define  PCI_EXP_RTCTL_PMEIE	0x08	/* PME Interrupt Enable */
> -#define  PCI_EXP_RTCTL_CRSSVE	0x10	/* CRS Software Visibility Enable */
> -#define PCI_EXP_RTCAP		30	/* Root Capabilities */
> -#define PCI_EXP_RTSTA		32	/* Root Status */
> -#define PCI_EXP_DEVCAP2		36	/* Device Capabilities 2 */
> -#define  PCI_EXP_DEVCAP2_ARI	0x20	/* Alternative Routing-ID */
> -#define PCI_EXP_DEVCTL2		40	/* Device Control 2 */
> -#define  PCI_EXP_DEVCTL2_ARI	0x20	/* Alternative Routing-ID */
> -#define PCI_EXP_LNKCTL2		48	/* Link Control 2 */
> -#define PCI_EXP_SLTCTL2		56	/* Slot Control 2 */
> -
> -/* Extended Capabilities (PCI-X 2.0 and Express) */
> -#define PCI_EXT_CAP_ID(header)		(header & 0x0000ffff)
> -#define PCI_EXT_CAP_VER(header)		((header >> 16) & 0xf)
> -#define PCI_EXT_CAP_NEXT(header)	((header >> 20) & 0xffc)
> -
> -#define PCI_EXT_CAP_ID_ERR	1
> -#define PCI_EXT_CAP_ID_VC	2
> -#define PCI_EXT_CAP_ID_DSN	3
> -#define PCI_EXT_CAP_ID_PWR	4
> -#define PCI_EXT_CAP_ID_ARI	14
> -#define PCI_EXT_CAP_ID_ATS	15
> -#define PCI_EXT_CAP_ID_SRIOV	16
> -
> -/* Advanced Error Reporting */
> -#define PCI_ERR_UNCOR_STATUS	4	/* Uncorrectable Error Status */
> -#define  PCI_ERR_UNC_TRAIN	0x00000001	/* Training */
> -#define  PCI_ERR_UNC_DLP	0x00000010	/* Data Link Protocol */
> -#define  PCI_ERR_UNC_POISON_TLP	0x00001000	/* Poisoned TLP */
> -#define  PCI_ERR_UNC_FCP	0x00002000	/* Flow Control Protocol */
> -#define  PCI_ERR_UNC_COMP_TIME	0x00004000	/* Completion Timeout */
> -#define  PCI_ERR_UNC_COMP_ABORT	0x00008000	/* Completer Abort */
> -#define  PCI_ERR_UNC_UNX_COMP	0x00010000	/* Unexpected Completion */
> -#define  PCI_ERR_UNC_RX_OVER	0x00020000	/* Receiver Overflow */
> -#define  PCI_ERR_UNC_MALF_TLP	0x00040000	/* Malformed TLP */
> -#define  PCI_ERR_UNC_ECRC	0x00080000	/* ECRC Error Status */
> -#define  PCI_ERR_UNC_UNSUP	0x00100000	/* Unsupported Request */
> -#define PCI_ERR_UNCOR_MASK	8	/* Uncorrectable Error Mask */
> -	/* Same bits as above */
> -#define PCI_ERR_UNCOR_SEVER	12	/* Uncorrectable Error Severity */
> -	/* Same bits as above */
> -#define PCI_ERR_COR_STATUS	16	/* Correctable Error Status */
> -#define  PCI_ERR_COR_RCVR	0x00000001	/* Receiver Error Status */
> -#define  PCI_ERR_COR_BAD_TLP	0x00000040	/* Bad TLP Status */
> -#define  PCI_ERR_COR_BAD_DLLP	0x00000080	/* Bad DLLP Status */
> -#define  PCI_ERR_COR_REP_ROLL	0x00000100	/* REPLAY_NUM Rollover */
> -#define  PCI_ERR_COR_REP_TIMER	0x00001000	/* Replay Timer Timeout */
> -#define PCI_ERR_COR_MASK	20	/* Correctable Error Mask */
> -	/* Same bits as above */
> -#define PCI_ERR_CAP		24	/* Advanced Error Capabilities */
> -#define  PCI_ERR_CAP_FEP(x)	((x) & 31)	/* First Error Pointer */
> -#define  PCI_ERR_CAP_ECRC_GENC	0x00000020	/* ECRC Generation Capable */
> -#define  PCI_ERR_CAP_ECRC_GENE	0x00000040	/* ECRC Generation Enable */
> -#define  PCI_ERR_CAP_ECRC_CHKC	0x00000080	/* ECRC Check Capable */
> -#define  PCI_ERR_CAP_ECRC_CHKE	0x00000100	/* ECRC Check Enable */
> -#define PCI_ERR_HEADER_LOG	28	/* Header Log Register (16 bytes) */
> -#define PCI_ERR_ROOT_COMMAND	44	/* Root Error Command */
> -/* Correctable Err Reporting Enable */
> -#define PCI_ERR_ROOT_CMD_COR_EN		0x00000001
> -/* Non-fatal Err Reporting Enable */
> -#define PCI_ERR_ROOT_CMD_NONFATAL_EN	0x00000002
> -/* Fatal Err Reporting Enable */
> -#define PCI_ERR_ROOT_CMD_FATAL_EN	0x00000004
> -#define PCI_ERR_ROOT_STATUS	48
> -#define PCI_ERR_ROOT_COR_RCV		0x00000001	/* ERR_COR Received */
> -/* Multi ERR_COR Received */
> -#define PCI_ERR_ROOT_MULTI_COR_RCV	0x00000002
> -/* ERR_FATAL/NONFATAL Recevied */
> -#define PCI_ERR_ROOT_UNCOR_RCV		0x00000004
> -/* Multi ERR_FATAL/NONFATAL Recevied */
> -#define PCI_ERR_ROOT_MULTI_UNCOR_RCV	0x00000008
> -#define PCI_ERR_ROOT_FIRST_FATAL	0x00000010	/* First Fatal */
> -#define PCI_ERR_ROOT_NONFATAL_RCV	0x00000020	/* Non-Fatal Received */
> -#define PCI_ERR_ROOT_FATAL_RCV		0x00000040	/* Fatal Received */
> -#define PCI_ERR_ROOT_COR_SRC	52
> -#define PCI_ERR_ROOT_SRC	54
> -
> -/* Virtual Channel */
> -#define PCI_VC_PORT_REG1	4
> -#define PCI_VC_PORT_REG2	8
> -#define PCI_VC_PORT_CTRL	12
> -#define PCI_VC_PORT_STATUS	14
> -#define PCI_VC_RES_CAP		16
> -#define PCI_VC_RES_CTRL		20
> -#define PCI_VC_RES_STATUS	26
> -
> -/* Power Budgeting */
> -#define PCI_PWR_DSR		4	/* Data Select Register */
> -#define PCI_PWR_DATA		8	/* Data Register */
> -#define  PCI_PWR_DATA_BASE(x)	((x) & 0xff)	    /* Base Power */
> -#define  PCI_PWR_DATA_SCALE(x)	(((x) >> 8) & 3)    /* Data Scale */
> -#define  PCI_PWR_DATA_PM_SUB(x)	(((x) >> 10) & 7)   /* PM Sub State */
> -#define  PCI_PWR_DATA_PM_STATE(x) (((x) >> 13) & 3) /* PM State */
> -#define  PCI_PWR_DATA_TYPE(x)	(((x) >> 15) & 7)   /* Type */
> -#define  PCI_PWR_DATA_RAIL(x)	(((x) >> 18) & 7)   /* Power Rail */
> -#define PCI_PWR_CAP		12	/* Capability */
> -#define  PCI_PWR_CAP_BUDGET(x)	((x) & 1)	/* Included in system budget */
> -
> -/*
> - * Hypertransport sub capability types
> - *
> - * Unfortunately there are both 3 bit and 5 bit capability types defined
> - * in the HT spec, catering for that is a little messy. You probably don't
> - * want to use these directly, just use pci_find_ht_capability() and it
> - * will do the right thing for you.
> - */
> -#define HT_3BIT_CAP_MASK	0xE0
> -#define HT_CAPTYPE_SLAVE	0x00	/* Slave/Primary link configuration */
> -#define HT_CAPTYPE_HOST		0x20	/* Host/Secondary link configuration */
> -
> -#define HT_5BIT_CAP_MASK	0xF8
> -#define HT_CAPTYPE_IRQ		0x80	/* IRQ Configuration */
> -#define HT_CAPTYPE_REMAPPING_40	0xA0	/* 40 bit address remapping */
> -#define HT_CAPTYPE_REMAPPING_64 0xA2	/* 64 bit address remapping */
> -#define HT_CAPTYPE_UNITID_CLUMP	0x90	/* Unit ID clumping */
> -#define HT_CAPTYPE_EXTCONF	0x98	/* Extended Configuration Space Access */
> -#define HT_CAPTYPE_MSI_MAPPING	0xA8	/* MSI Mapping Capability */
> -#define  HT_MSI_FLAGS		0x02		/* Offset to flags */
> -#define  HT_MSI_FLAGS_ENABLE	0x1		/* Mapping enable */
> -#define  HT_MSI_FLAGS_FIXED	0x2		/* Fixed mapping only */
> -#define  HT_MSI_FIXED_ADDR	0x00000000FEE00000ULL	/* Fixed addr */
> -#define  HT_MSI_ADDR_LO		0x04		/* Offset to low addr bits */
> -#define  HT_MSI_ADDR_LO_MASK	0xFFF00000	/* Low address bit mask */
> -#define  HT_MSI_ADDR_HI		0x08		/* Offset to high addr bits */
> -#define HT_CAPTYPE_DIRECT_ROUTE	0xB0	/* Direct routing configuration */
> -#define HT_CAPTYPE_VCSET	0xB8	/* Virtual Channel configuration */
> -#define HT_CAPTYPE_ERROR_RETRY	0xC0	/* Retry on error configuration */
> -#define HT_CAPTYPE_GEN3		0xD0	/* Generation 3 hypertransport configuration */
> -#define HT_CAPTYPE_PM		0xE0	/* Hypertransport powermanagement configuration */
> -
> -/* Alternative Routing-ID Interpretation */
> -#define PCI_ARI_CAP		0x04	/* ARI Capability Register */
> -#define  PCI_ARI_CAP_MFVC	0x0001	/* MFVC Function Groups Capability */
> -#define  PCI_ARI_CAP_ACS	0x0002	/* ACS Function Groups Capability */
> -#define  PCI_ARI_CAP_NFN(x)	(((x) >> 8) & 0xff) /* Next Function Number */
> -#define PCI_ARI_CTRL		0x06	/* ARI Control Register */
> -#define  PCI_ARI_CTRL_MFVC	0x0001	/* MFVC Function Groups Enable */
> -#define  PCI_ARI_CTRL_ACS	0x0002	/* ACS Function Groups Enable */
> -#define  PCI_ARI_CTRL_FG(x)	(((x) >> 4) & 7) /* Function Group */
> -
> -/* Address Translation Service */
> -#define PCI_ATS_CAP		0x04	/* ATS Capability Register */
> -#define  PCI_ATS_CAP_QDEP(x)	((x) & 0x1f)	/* Invalidate Queue Depth */
> -#define  PCI_ATS_MAX_QDEP	32	/* Max Invalidate Queue Depth */
> -#define PCI_ATS_CTRL		0x06	/* ATS Control Register */
> -#define  PCI_ATS_CTRL_ENABLE	0x8000	/* ATS Enable */
> -#define  PCI_ATS_CTRL_STU(x)	((x) & 0x1f)	/* Smallest Translation Unit */
> -#define  PCI_ATS_MIN_STU	12	/* shift of minimum STU block */
> -
> -/* Single Root I/O Virtualization */
> -#define PCI_SRIOV_CAP		0x04	/* SR-IOV Capabilities */
> -#define  PCI_SRIOV_CAP_VFM	0x01	/* VF Migration Capable */
> -#define  PCI_SRIOV_CAP_INTR(x)	((x) >> 21) /* Interrupt Message Number */
> -#define PCI_SRIOV_CTRL		0x08	/* SR-IOV Control */
> -#define  PCI_SRIOV_CTRL_VFE	0x01	/* VF Enable */
> -#define  PCI_SRIOV_CTRL_VFM	0x02	/* VF Migration Enable */
> -#define  PCI_SRIOV_CTRL_INTR	0x04	/* VF Migration Interrupt Enable */
> -#define  PCI_SRIOV_CTRL_MSE	0x08	/* VF Memory Space Enable */
> -#define  PCI_SRIOV_CTRL_ARI	0x10	/* ARI Capable Hierarchy */
> -#define PCI_SRIOV_STATUS	0x0a	/* SR-IOV Status */
> -#define  PCI_SRIOV_STATUS_VFM	0x01	/* VF Migration Status */
> -#define PCI_SRIOV_INITIAL_VF	0x0c	/* Initial VFs */
> -#define PCI_SRIOV_TOTAL_VF	0x0e	/* Total VFs */
> -#define PCI_SRIOV_NUM_VF	0x10	/* Number of VFs */
> -#define PCI_SRIOV_FUNC_LINK	0x12	/* Function Dependency Link */
> -#define PCI_SRIOV_VF_OFFSET	0x14	/* First VF Offset */
> -#define PCI_SRIOV_VF_STRIDE	0x16	/* Following VF Stride */
> -#define PCI_SRIOV_VF_DID	0x1a	/* VF Device ID */
> -#define PCI_SRIOV_SUP_PGSIZE	0x1c	/* Supported Page Sizes */
> -#define PCI_SRIOV_SYS_PGSIZE	0x20	/* System Page Size */
> -#define PCI_SRIOV_BAR		0x24	/* VF BAR0 */
> -#define  PCI_SRIOV_NUM_BARS	6	/* Number of VF BARs */
> -#define PCI_SRIOV_VFM		0x3c	/* VF Migration State Array Offset*/
> -#define  PCI_SRIOV_VFM_BIR(x)	((x) & 7)	/* State BIR */
> -#define  PCI_SRIOV_VFM_OFFSET(x) ((x) & ~7)	/* State Offset */
> -#define  PCI_SRIOV_VFM_UA	0x0	/* Inactive.Unavailable */
> -#define  PCI_SRIOV_VFM_MI	0x1	/* Dormant.MigrateIn */
> -#define  PCI_SRIOV_VFM_MO	0x2	/* Active.MigrateOut */
> -#define  PCI_SRIOV_VFM_AV	0x3	/* Active.Available */
> -
> -#endif /* LINUX_PCI_REGS_H */
> +/*
> + *      pci_regs.h
> + *
> + *      PCI standard defines
> + *      Copyright 1994, Drew Eckhardt
> + *      Copyright 1997--1999 Martin Mares <mj@ucw.cz>
> + *
> + *      For more information, please consult the following manuals (look at
> + *      http://www.pcisig.com/ for how to get them):
> + *
> + *      PCI BIOS Specification
> + *      PCI Local Bus Specification
> + *      PCI to PCI Bridge Specification
> + *      PCI System Design Guide
> + *
> + *      For hypertransport information, please consult the following manuals
> + *      from http://www.hypertransport.org
> + *
> + *      The Hypertransport I/O Link Specification
> + */
> +
> +#ifndef LINUX_PCI_REGS_H
> +#define LINUX_PCI_REGS_H
> +
> +/*
> + * Under PCI, each device has 256 bytes of configuration address space,
> + * of which the first 64 bytes are standardized as follows:
> + */
> +#define PCI_VENDOR_ID           0x00    /* 16 bits */
> +#define PCI_DEVICE_ID           0x02    /* 16 bits */
> +#define PCI_COMMAND             0x04    /* 16 bits */
> +#define  PCI_COMMAND_IO         0x1     /* Enable response in I/O space */
> +#define  PCI_COMMAND_MEMORY     0x2     /* Enable response in Memory space */
> +#define  PCI_COMMAND_MASTER     0x4     /* Enable bus mastering */
> +#define  PCI_COMMAND_SPECIAL    0x8     /* Enable response to special cycles */
> +#define  PCI_COMMAND_INVALIDATE 0x10    /* Use memory write and invalidate */
> +#define  PCI_COMMAND_VGA_PALETTE 0x20   /* Enable palette snooping */
> +#define  PCI_COMMAND_PARITY     0x40    /* Enable parity checking */
> +#define  PCI_COMMAND_WAIT       0x80    /* Enable address/data stepping */
> +#define  PCI_COMMAND_SERR       0x100   /* Enable SERR */
> +#define  PCI_COMMAND_FAST_BACK  0x200   /* Enable back-to-back writes */
> +#define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
> +
> +#define PCI_STATUS              0x06    /* 16 bits */
> +#define  PCI_STATUS_INTERRUPT   0x08    /* Interrupt status */
> +#define  PCI_STATUS_CAP_LIST    0x10    /* Support Capability List */
> +#define  PCI_STATUS_66MHZ       0x20    /* Support 66 Mhz PCI 2.1 bus */
> +#define  PCI_STATUS_UDF         0x40    /* Support User Definable Features [obsolete] */
> +#define  PCI_STATUS_FAST_BACK   0x80    /* Accept fast-back to back */
> +#define  PCI_STATUS_PARITY      0x100   /* Detected parity error */
> +#define  PCI_STATUS_DEVSEL_MASK 0x600   /* DEVSEL timing */
> +#define  PCI_STATUS_DEVSEL_FAST         0x000
> +#define  PCI_STATUS_DEVSEL_MEDIUM       0x200
> +#define  PCI_STATUS_DEVSEL_SLOW         0x400
> +#define  PCI_STATUS_SIG_TARGET_ABORT    0x800 /* Set on target abort */
> +#define  PCI_STATUS_REC_TARGET_ABORT    0x1000 /* Master ack of " */
> +#define  PCI_STATUS_REC_MASTER_ABORT    0x2000 /* Set on master abort */
> +#define  PCI_STATUS_SIG_SYSTEM_ERROR    0x4000 /* Set when we drive SERR */
> +#define  PCI_STATUS_DETECTED_PARITY     0x8000 /* Set on parity error */
> +
> +#define PCI_CLASS_REVISION      0x08    /* High 24 bits are class, low 8 revision */
> +#define PCI_REVISION_ID         0x08    /* Revision ID */
> +#define PCI_CLASS_PROG          0x09    /* Reg. Level Programming Interface */
> +#define PCI_CLASS_DEVICE        0x0a    /* Device class */
> +
> +#define PCI_CACHE_LINE_SIZE     0x0c    /* 8 bits */
> +#define PCI_LATENCY_TIMER       0x0d    /* 8 bits */
> +#define PCI_HEADER_TYPE         0x0e    /* 8 bits */
> +#define  PCI_HEADER_TYPE_NORMAL         0
> +#define  PCI_HEADER_TYPE_BRIDGE         1
> +#define  PCI_HEADER_TYPE_CARDBUS        2
> +
> +#define PCI_BIST                0x0f    /* 8 bits */
> +#define  PCI_BIST_CODE_MASK     0x0f    /* Return result */
> +#define  PCI_BIST_START         0x40    /* 1 to start BIST, 2 secs or less */
> +#define  PCI_BIST_CAPABLE       0x80    /* 1 if BIST capable */
> +
> +/*
> + * Base addresses specify locations in memory or I/O space.
> + * Decoded size can be determined by writing a value of
> + * 0xffffffff to the register, and reading it back.  Only
> + * 1 bits are decoded.
> + */
> +#define PCI_BASE_ADDRESS_0      0x10    /* 32 bits */
> +#define PCI_BASE_ADDRESS_1      0x14    /* 32 bits [htype 0,1 only] */
> +#define PCI_BASE_ADDRESS_2      0x18    /* 32 bits [htype 0 only] */
> +#define PCI_BASE_ADDRESS_3      0x1c    /* 32 bits */
> +#define PCI_BASE_ADDRESS_4      0x20    /* 32 bits */
> +#define PCI_BASE_ADDRESS_5      0x24    /* 32 bits */
> +#define  PCI_BASE_ADDRESS_SPACE         0x01    /* 0 = memory, 1 = I/O */
> +#define  PCI_BASE_ADDRESS_SPACE_IO      0x01
> +#define  PCI_BASE_ADDRESS_SPACE_MEMORY  0x00
> +#define  PCI_BASE_ADDRESS_MEM_TYPE_MASK 0x06
> +#define  PCI_BASE_ADDRESS_MEM_TYPE_32   0x00    /* 32 bit address */
> +#define  PCI_BASE_ADDRESS_MEM_TYPE_1M   0x02    /* Below 1M [obsolete] */
> +#define  PCI_BASE_ADDRESS_MEM_TYPE_64   0x04    /* 64 bit address */
> +#define  PCI_BASE_ADDRESS_MEM_PREFETCH  0x08    /* prefetchable? */
> +#define  PCI_BASE_ADDRESS_MEM_MASK      (~0x0fUL)
> +#define  PCI_BASE_ADDRESS_IO_MASK       (~0x03UL)
> +/* bit 1 is reserved if address_space = 1 */
> +
> +/* Header type 0 (normal devices) */
> +#define PCI_CARDBUS_CIS         0x28
> +#define PCI_SUBSYSTEM_VENDOR_ID 0x2c
> +#define PCI_SUBSYSTEM_ID        0x2e
> +#define PCI_ROM_ADDRESS         0x30    /* Bits 31..11 are address, 10..1 reserved */
> +#define  PCI_ROM_ADDRESS_ENABLE 0x01
> +#define PCI_ROM_ADDRESS_MASK    (~0x7ffUL)
> +
> +#define PCI_CAPABILITY_LIST     0x34    /* Offset of first capability list entry */
> +
> +/* 0x35-0x3b are reserved */
> +#define PCI_INTERRUPT_LINE      0x3c    /* 8 bits */
> +#define PCI_INTERRUPT_PIN       0x3d    /* 8 bits */
> +#define PCI_MIN_GNT             0x3e    /* 8 bits */
> +#define PCI_MAX_LAT             0x3f    /* 8 bits */
> +
> +/* Header type 1 (PCI-to-PCI bridges) */
> +#define PCI_PRIMARY_BUS         0x18    /* Primary bus number */
> +#define PCI_SECONDARY_BUS       0x19    /* Secondary bus number */
> +#define PCI_SUBORDINATE_BUS     0x1a    /* Highest bus number behind the bridge */
> +#define PCI_SEC_LATENCY_TIMER   0x1b    /* Latency timer for secondary interface */
> +#define PCI_IO_BASE             0x1c    /* I/O range behind the bridge */
> +#define PCI_IO_LIMIT            0x1d
> +#define  PCI_IO_RANGE_TYPE_MASK 0x0fUL  /* I/O bridging type */
> +#define  PCI_IO_RANGE_TYPE_16   0x00
> +#define  PCI_IO_RANGE_TYPE_32   0x01
> +#define  PCI_IO_RANGE_MASK      (~0x0fUL)
> +#define PCI_SEC_STATUS          0x1e    /* Secondary status register, only bit 14 used */
> +#define PCI_MEMORY_BASE         0x20    /* Memory range behind */
> +#define PCI_MEMORY_LIMIT        0x22
> +#define  PCI_MEMORY_RANGE_TYPE_MASK 0x0fUL
> +#define  PCI_MEMORY_RANGE_MASK  (~0x0fUL)
> +#define PCI_PREF_MEMORY_BASE    0x24    /* Prefetchable memory range behind */
> +#define PCI_PREF_MEMORY_LIMIT   0x26
> +#define  PCI_PREF_RANGE_TYPE_MASK 0x0fUL
> +#define  PCI_PREF_RANGE_TYPE_32 0x00
> +#define  PCI_PREF_RANGE_TYPE_64 0x01
> +#define  PCI_PREF_RANGE_MASK    (~0x0fUL)
> +#define PCI_PREF_BASE_UPPER32   0x28    /* Upper half of prefetchable memory range */
> +#define PCI_PREF_LIMIT_UPPER32  0x2c
> +#define PCI_IO_BASE_UPPER16     0x30    /* Upper half of I/O addresses */
> +#define PCI_IO_LIMIT_UPPER16    0x32
> +/* 0x34 same as for htype 0 */
> +/* 0x35-0x3b is reserved */
> +#define PCI_ROM_ADDRESS1        0x38    /* Same as PCI_ROM_ADDRESS, but for htype 1 */
> +/* 0x3c-0x3d are same as for htype 0 */
> +#define PCI_BRIDGE_CONTROL      0x3e
> +#define  PCI_BRIDGE_CTL_PARITY  0x01    /* Enable parity detection on secondary interface */
> +#define  PCI_BRIDGE_CTL_SERR    0x02    /* The same for SERR forwarding */
> +#define  PCI_BRIDGE_CTL_ISA     0x04    /* Enable ISA mode */
> +#define  PCI_BRIDGE_CTL_VGA     0x08    /* Forward VGA addresses */
> +#define  PCI_BRIDGE_CTL_MASTER_ABORT    0x20  /* Report master aborts */
> +#define  PCI_BRIDGE_CTL_BUS_RESET       0x40    /* Secondary bus reset */
> +#define  PCI_BRIDGE_CTL_FAST_BACK       0x80    /* Fast Back2Back enabled on secondary interface */
> +
> +/* Header type 2 (CardBus bridges) */
> +#define PCI_CB_CAPABILITY_LIST  0x14
> +/* 0x15 reserved */
> +#define PCI_CB_SEC_STATUS       0x16    /* Secondary status */
> +#define PCI_CB_PRIMARY_BUS      0x18    /* PCI bus number */
> +#define PCI_CB_CARD_BUS         0x19    /* CardBus bus number */
> +#define PCI_CB_SUBORDINATE_BUS  0x1a    /* Subordinate bus number */
> +#define PCI_CB_LATENCY_TIMER    0x1b    /* CardBus latency timer */
> +#define PCI_CB_MEMORY_BASE_0    0x1c
> +#define PCI_CB_MEMORY_LIMIT_0   0x20
> +#define PCI_CB_MEMORY_BASE_1    0x24
> +#define PCI_CB_MEMORY_LIMIT_1   0x28
> +#define PCI_CB_IO_BASE_0        0x2c
> +#define PCI_CB_IO_BASE_0_HI     0x2e
> +#define PCI_CB_IO_LIMIT_0       0x30
> +#define PCI_CB_IO_LIMIT_0_HI    0x32
> +#define PCI_CB_IO_BASE_1        0x34
> +#define PCI_CB_IO_BASE_1_HI     0x36
> +#define PCI_CB_IO_LIMIT_1       0x38
> +#define PCI_CB_IO_LIMIT_1_HI    0x3a
> +#define  PCI_CB_IO_RANGE_MASK   (~0x03UL)
> +/* 0x3c-0x3d are same as for htype 0 */
> +#define PCI_CB_BRIDGE_CONTROL   0x3e
> +#define  PCI_CB_BRIDGE_CTL_PARITY       0x01    /* Similar to standard bridge control register */
> +#define  PCI_CB_BRIDGE_CTL_SERR         0x02
> +#define  PCI_CB_BRIDGE_CTL_ISA          0x04
> +#define  PCI_CB_BRIDGE_CTL_VGA          0x08
> +#define  PCI_CB_BRIDGE_CTL_MASTER_ABORT 0x20
> +#define  PCI_CB_BRIDGE_CTL_CB_RESET     0x40    /* CardBus reset */
> +#define  PCI_CB_BRIDGE_CTL_16BIT_INT    0x80    /* Enable interrupt for 16-bit cards */
> +#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM0 0x100  /* Prefetch enable for both memory regions */
> +#define  PCI_CB_BRIDGE_CTL_PREFETCH_MEM1 0x200
> +#define  PCI_CB_BRIDGE_CTL_POST_WRITES  0x400
> +#define PCI_CB_SUBSYSTEM_VENDOR_ID      0x40
> +#define PCI_CB_SUBSYSTEM_ID             0x42
> +#define PCI_CB_LEGACY_MODE_BASE         0x44    /* 16-bit PC Card legacy mode base address (ExCa) */
> +/* 0x48-0x7f reserved */
> +
> +/* Capability lists */
> +
> +#define PCI_CAP_LIST_ID         0       /* Capability ID */
> +#define  PCI_CAP_ID_PM          0x01    /* Power Management */
> +#define  PCI_CAP_ID_AGP         0x02    /* Accelerated Graphics Port */
> +#define  PCI_CAP_ID_VPD         0x03    /* Vital Product Data */
> +#define  PCI_CAP_ID_SLOTID      0x04    /* Slot Identification */
> +#define  PCI_CAP_ID_MSI         0x05    /* Message Signalled Interrupts */
> +#define  PCI_CAP_ID_CHSWP       0x06    /* CompactPCI HotSwap */
> +#define  PCI_CAP_ID_PCIX        0x07    /* PCI-X */
> +#define  PCI_CAP_ID_HT          0x08    /* HyperTransport */
> +#define  PCI_CAP_ID_VNDR        0x09    /* Vendor specific */
> +#define  PCI_CAP_ID_DBG         0x0A    /* Debug port */
> +#define  PCI_CAP_ID_CCRC        0x0B    /* CompactPCI Central Resource Control */
> +#define  PCI_CAP_ID_SHPC        0x0C    /* PCI Standard Hot-Plug Controller */
> +#define  PCI_CAP_ID_SSVID       0x0D    /* Bridge subsystem vendor/device ID */
> +#define  PCI_CAP_ID_AGP3        0x0E    /* AGP Target PCI-PCI bridge */
> +#define  PCI_CAP_ID_EXP         0x10    /* PCI Express */
> +#define  PCI_CAP_ID_MSIX        0x11    /* MSI-X */
> +#define  PCI_CAP_ID_AF          0x13    /* PCI Advanced Features */
> +#define PCI_CAP_LIST_NEXT       1       /* Next capability in the list */
> +#define PCI_CAP_FLAGS           2       /* Capability defined flags (16 bits) */
> +#define PCI_CAP_SIZEOF          4
> +
> +/* Power Management Registers */
> +
> +#define PCI_PM_PMC              2       /* PM Capabilities Register */
> +#define  PCI_PM_CAP_VER_MASK    0x0007  /* Version */
> +#define  PCI_PM_CAP_PME_CLOCK   0x0008  /* PME clock required */
> +#define  PCI_PM_CAP_RESERVED    0x0010  /* Reserved field */
> +#define  PCI_PM_CAP_DSI         0x0020  /* Device specific initialization */
> +#define  PCI_PM_CAP_AUX_POWER   0x01C0  /* Auxilliary power support mask */
> +#define  PCI_PM_CAP_D1          0x0200  /* D1 power state support */
> +#define  PCI_PM_CAP_D2          0x0400  /* D2 power state support */
> +#define  PCI_PM_CAP_PME         0x0800  /* PME pin supported */
> +#define  PCI_PM_CAP_PME_MASK    0xF800  /* PME Mask of all supported states */
> +#define  PCI_PM_CAP_PME_D0      0x0800  /* PME# from D0 */
> +#define  PCI_PM_CAP_PME_D1      0x1000  /* PME# from D1 */
> +#define  PCI_PM_CAP_PME_D2      0x2000  /* PME# from D2 */
> +#define  PCI_PM_CAP_PME_D3      0x4000  /* PME# from D3 (hot) */
> +#define  PCI_PM_CAP_PME_D3cold  0x8000  /* PME# from D3 (cold) */
> +#define  PCI_PM_CAP_PME_SHIFT   11      /* Start of the PME Mask in PMC */
> +#define PCI_PM_CTRL             4       /* PM control and status register */
> +#define  PCI_PM_CTRL_STATE_MASK 0x0003  /* Current power state (D0 to D3) */
> +#define  PCI_PM_CTRL_NO_SOFT_RESET      0x0008  /* No reset for D3hot->D0 */
> +#define  PCI_PM_CTRL_PME_ENABLE 0x0100  /* PME pin enable */
> +#define  PCI_PM_CTRL_DATA_SEL_MASK      0x1e00  /* Data select (??) */
> +#define  PCI_PM_CTRL_DATA_SCALE_MASK    0x6000  /* Data scale (??) */
> +#define  PCI_PM_CTRL_PME_STATUS 0x8000  /* PME pin status */
> +#define PCI_PM_PPB_EXTENSIONS   6       /* PPB support extensions (??) */
> +#define  PCI_PM_PPB_B2_B3       0x40    /* Stop clock when in D3hot (??) */
> +#define  PCI_PM_BPCC_ENABLE     0x80    /* Bus power/clock control enable (??) */
> +#define PCI_PM_DATA_REGISTER    7       /* (??) */
> +#define PCI_PM_SIZEOF           8
> +
> +/* AGP registers */
> +
> +#define PCI_AGP_VERSION         2       /* BCD version number */
> +#define PCI_AGP_RFU             3       /* Rest of capability flags */
> +#define PCI_AGP_STATUS          4       /* Status register */
> +#define  PCI_AGP_STATUS_RQ_MASK 0xff000000      /* Maximum number of requests - 1 */
> +#define  PCI_AGP_STATUS_SBA     0x0200  /* Sideband addressing supported */
> +#define  PCI_AGP_STATUS_64BIT   0x0020  /* 64-bit addressing supported */
> +#define  PCI_AGP_STATUS_FW      0x0010  /* FW transfers supported */
> +#define  PCI_AGP_STATUS_RATE4   0x0004  /* 4x transfer rate supported */
> +#define  PCI_AGP_STATUS_RATE2   0x0002  /* 2x transfer rate supported */
> +#define  PCI_AGP_STATUS_RATE1   0x0001  /* 1x transfer rate supported */
> +#define PCI_AGP_COMMAND         8       /* Control register */
> +#define  PCI_AGP_COMMAND_RQ_MASK 0xff000000  /* Master: Maximum number of requests */
> +#define  PCI_AGP_COMMAND_SBA    0x0200  /* Sideband addressing enabled */
> +#define  PCI_AGP_COMMAND_AGP    0x0100  /* Allow processing of AGP transactions */
> +#define  PCI_AGP_COMMAND_64BIT  0x0020  /* Allow processing of 64-bit addresses */
> +#define  PCI_AGP_COMMAND_FW     0x0010  /* Force FW transfers */
> +#define  PCI_AGP_COMMAND_RATE4  0x0004  /* Use 4x rate */
> +#define  PCI_AGP_COMMAND_RATE2  0x0002  /* Use 2x rate */
> +#define  PCI_AGP_COMMAND_RATE1  0x0001  /* Use 1x rate */
> +#define PCI_AGP_SIZEOF          12
> +
> +/* Vital Product Data */
> +
> +#define PCI_VPD_ADDR            2       /* Address to access (15 bits!) */
> +#define  PCI_VPD_ADDR_MASK      0x7fff  /* Address mask */
> +#define  PCI_VPD_ADDR_F         0x8000  /* Write 0, 1 indicates completion */
> +#define PCI_VPD_DATA            4       /* 32-bits of data returned here */
> +
> +/* Slot Identification */
> +
> +#define PCI_SID_ESR             2       /* Expansion Slot Register */
> +#define  PCI_SID_ESR_NSLOTS     0x1f    /* Number of expansion slots available */
> +#define  PCI_SID_ESR_FIC        0x20    /* First In Chassis Flag */
> +#define PCI_SID_CHASSIS_NR      3       /* Chassis Number */
> +
> +/* Message Signalled Interrupts registers */
> +
> +#define PCI_MSI_FLAGS           2       /* Various flags */
> +#define  PCI_MSI_FLAGS_64BIT    0x80    /* 64-bit addresses allowed */
> +#define  PCI_MSI_FLAGS_QSIZE    0x70    /* Message queue size configured */
> +#define  PCI_MSI_FLAGS_QMASK    0x0e    /* Maximum queue size available */
> +#define  PCI_MSI_FLAGS_ENABLE   0x01    /* MSI feature enabled */
> +#define  PCI_MSI_FLAGS_MASKBIT  0x100   /* 64-bit mask bits allowed */
> +#define PCI_MSI_RFU             3       /* Rest of capability flags */
> +#define PCI_MSI_ADDRESS_LO      4       /* Lower 32 bits */
> +#define PCI_MSI_ADDRESS_HI      8       /* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
> +#define PCI_MSI_DATA_32         8       /* 16 bits of data for 32-bit devices */
> +#define PCI_MSI_MASK_32         12      /* Mask bits register for 32-bit devices */
> +#define PCI_MSI_DATA_64         12      /* 16 bits of data for 64-bit devices */
> +#define PCI_MSI_MASK_64         16      /* Mask bits register for 64-bit devices */
> +
> +/* MSI-X registers (these are at offset PCI_MSIX_FLAGS) */
> +#define PCI_MSIX_FLAGS          2
> +#define  PCI_MSIX_FLAGS_QSIZE   0x7FF
> +#define  PCI_MSIX_FLAGS_ENABLE  (1 << 15)
> +#define  PCI_MSIX_FLAGS_MASKALL (1 << 14)
> +#define PCI_MSIX_FLAGS_BIRMASK  (7 << 0)
> +
> +/* CompactPCI Hotswap Register */
> +
> +#define PCI_CHSWP_CSR           2       /* Control and Status Register */
> +#define  PCI_CHSWP_DHA          0x01    /* Device Hiding Arm */
> +#define  PCI_CHSWP_EIM          0x02    /* ENUM# Signal Mask */
> +#define  PCI_CHSWP_PIE          0x04    /* Pending Insert or Extract */
> +#define  PCI_CHSWP_LOO          0x08    /* LED On / Off */
> +#define  PCI_CHSWP_PI           0x30    /* Programming Interface */
> +#define  PCI_CHSWP_EXT          0x40    /* ENUM# status - extraction */
> +#define  PCI_CHSWP_INS          0x80    /* ENUM# status - insertion */
> +
> +/* PCI Advanced Feature registers */
> +
> +#define PCI_AF_LENGTH           2
> +#define PCI_AF_CAP              3
> +#define  PCI_AF_CAP_TP          0x01
> +#define  PCI_AF_CAP_FLR         0x02
> +#define PCI_AF_CTRL             4
> +#define  PCI_AF_CTRL_FLR        0x01
> +#define PCI_AF_STATUS           5
> +#define  PCI_AF_STATUS_TP       0x01
> +
> +/* PCI-X registers */
> +
> +#define PCI_X_CMD               2       /* Modes & Features */
> +#define  PCI_X_CMD_DPERR_E      0x0001  /* Data Parity Error Recovery Enable */
> +#define  PCI_X_CMD_ERO          0x0002  /* Enable Relaxed Ordering */
> +#define  PCI_X_CMD_READ_512     0x0000  /* 512 byte maximum read byte count */
> +#define  PCI_X_CMD_READ_1K      0x0004  /* 1Kbyte maximum read byte count */
> +#define  PCI_X_CMD_READ_2K      0x0008  /* 2Kbyte maximum read byte count */
> +#define  PCI_X_CMD_READ_4K      0x000c  /* 4Kbyte maximum read byte count */
> +#define  PCI_X_CMD_MAX_READ     0x000c  /* Max Memory Read Byte Count */
> +                                /* Max # of outstanding split transactions */
> +#define  PCI_X_CMD_SPLIT_1      0x0000  /* Max 1 */
> +#define  PCI_X_CMD_SPLIT_2      0x0010  /* Max 2 */
> +#define  PCI_X_CMD_SPLIT_3      0x0020  /* Max 3 */
> +#define  PCI_X_CMD_SPLIT_4      0x0030  /* Max 4 */
> +#define  PCI_X_CMD_SPLIT_8      0x0040  /* Max 8 */
> +#define  PCI_X_CMD_SPLIT_12     0x0050  /* Max 12 */
> +#define  PCI_X_CMD_SPLIT_16     0x0060  /* Max 16 */
> +#define  PCI_X_CMD_SPLIT_32     0x0070  /* Max 32 */
> +#define  PCI_X_CMD_MAX_SPLIT    0x0070  /* Max Outstanding Split Transactions */
> +#define  PCI_X_CMD_VERSION(x)   (((x) >> 12) & 3) /* Version */
> +#define PCI_X_STATUS            4       /* PCI-X capabilities */
> +#define  PCI_X_STATUS_DEVFN     0x000000ff      /* A copy of devfn */
> +#define  PCI_X_STATUS_BUS       0x0000ff00      /* A copy of bus nr */
> +#define  PCI_X_STATUS_64BIT     0x00010000      /* 64-bit device */
> +#define  PCI_X_STATUS_133MHZ    0x00020000      /* 133 MHz capable */
> +#define  PCI_X_STATUS_SPL_DISC  0x00040000      /* Split Completion Discarded */
> +#define  PCI_X_STATUS_UNX_SPL   0x00080000      /* Unexpected Split Completion */
> +#define  PCI_X_STATUS_COMPLEX   0x00100000      /* Device Complexity */
> +#define  PCI_X_STATUS_MAX_READ  0x00600000      /* Designed Max Memory Read Count */
> +#define  PCI_X_STATUS_MAX_SPLIT 0x03800000      /* Designed Max Outstanding Split Transactions */
> +#define  PCI_X_STATUS_MAX_CUM   0x1c000000      /* Designed Max Cumulative Read Size */
> +#define  PCI_X_STATUS_SPL_ERR   0x20000000      /* Rcvd Split Completion Error Msg */
> +#define  PCI_X_STATUS_266MHZ    0x40000000      /* 266 MHz capable */
> +#define  PCI_X_STATUS_533MHZ    0x80000000      /* 533 MHz capable */
> +
> +/* PCI Express capability registers */
> +
> +#define PCI_EXP_FLAGS           2       /* Capabilities register */
> +#define PCI_EXP_FLAGS_VERS      0x000f  /* Capability version */
> +#define PCI_EXP_FLAGS_TYPE      0x00f0  /* Device/Port type */
> +#define  PCI_EXP_TYPE_ENDPOINT  0x0     /* Express Endpoint */
> +#define  PCI_EXP_TYPE_LEG_END   0x1     /* Legacy Endpoint */
> +#define  PCI_EXP_TYPE_ROOT_PORT 0x4     /* Root Port */
> +#define  PCI_EXP_TYPE_UPSTREAM  0x5     /* Upstream Port */
> +#define  PCI_EXP_TYPE_DOWNSTREAM 0x6    /* Downstream Port */
> +#define  PCI_EXP_TYPE_PCI_BRIDGE 0x7    /* PCI/PCI-X Bridge */
> +#define  PCI_EXP_TYPE_RC_END    0x9     /* Root Complex Integrated Endpoint */
> +#define  PCI_EXP_TYPE_RC_EC     0x10    /* Root Complex Event Collector */
> +#define PCI_EXP_FLAGS_SLOT      0x0100  /* Slot implemented */
> +#define PCI_EXP_FLAGS_IRQ       0x3e00  /* Interrupt message number */
> +#define PCI_EXP_DEVCAP          4       /* Device capabilities */
> +#define  PCI_EXP_DEVCAP_PAYLOAD 0x07    /* Max_Payload_Size */
> +#define  PCI_EXP_DEVCAP_PHANTOM 0x18    /* Phantom functions */
> +#define  PCI_EXP_DEVCAP_EXT_TAG 0x20    /* Extended tags */
> +#define  PCI_EXP_DEVCAP_L0S     0x1c0   /* L0s Acceptable Latency */
> +#define  PCI_EXP_DEVCAP_L1      0xe00   /* L1 Acceptable Latency */
> +#define  PCI_EXP_DEVCAP_ATN_BUT 0x1000  /* Attention Button Present */
> +#define  PCI_EXP_DEVCAP_ATN_IND 0x2000  /* Attention Indicator Present */
> +#define  PCI_EXP_DEVCAP_PWR_IND 0x4000  /* Power Indicator Present */
> +#define  PCI_EXP_DEVCAP_RBER    0x8000  /* Role-Based Error Reporting */
> +#define  PCI_EXP_DEVCAP_PWR_VAL 0x3fc0000 /* Slot Power Limit Value */
> +#define  PCI_EXP_DEVCAP_PWR_SCL 0xc000000 /* Slot Power Limit Scale */
> +#define  PCI_EXP_DEVCAP_FLR     0x10000000 /* Function Level Reset */
> +#define PCI_EXP_DEVCTL          8       /* Device Control */
> +#define  PCI_EXP_DEVCTL_CERE    0x0001  /* Correctable Error Reporting En. */
> +#define  PCI_EXP_DEVCTL_NFERE   0x0002  /* Non-Fatal Error Reporting Enable */
> +#define  PCI_EXP_DEVCTL_FERE    0x0004  /* Fatal Error Reporting Enable */
> +#define  PCI_EXP_DEVCTL_URRE    0x0008  /* Unsupported Request Reporting En. */
> +#define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
> +#define  PCI_EXP_DEVCTL_PAYLOAD 0x00e0  /* Max_Payload_Size */
> +#define  PCI_EXP_DEVCTL_EXT_TAG 0x0100  /* Extended Tag Field Enable */
> +#define  PCI_EXP_DEVCTL_PHANTOM 0x0200  /* Phantom Functions Enable */
> +#define  PCI_EXP_DEVCTL_AUX_PME 0x0400  /* Auxiliary Power PM Enable */
> +#define  PCI_EXP_DEVCTL_NOSNOOP_EN 0x0800  /* Enable No Snoop */
> +#define  PCI_EXP_DEVCTL_READRQ  0x7000  /* Max_Read_Request_Size */
> +#define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR */
> +#define PCI_EXP_DEVSTA          10      /* Device Status */
> +#define  PCI_EXP_DEVSTA_CED     0x01    /* Correctable Error Detected */
> +#define  PCI_EXP_DEVSTA_NFED    0x02    /* Non-Fatal Error Detected */
> +#define  PCI_EXP_DEVSTA_FED     0x04    /* Fatal Error Detected */
> +#define  PCI_EXP_DEVSTA_URD     0x08    /* Unsupported Request Detected */
> +#define  PCI_EXP_DEVSTA_AUXPD   0x10    /* AUX Power Detected */
> +#define  PCI_EXP_DEVSTA_TRPND   0x20    /* Transactions Pending */
> +#define PCI_EXP_LNKCAP          12      /* Link Capabilities */
> +#define  PCI_EXP_LNKCAP_SLS     0x0000000f /* Supported Link Speeds */
> +#define  PCI_EXP_LNKCAP_MLW     0x000003f0 /* Maximum Link Width */
> +#define  PCI_EXP_LNKCAP_ASPMS   0x00000c00 /* ASPM Support */
> +#define  PCI_EXP_LNKCAP_L0SEL   0x00007000 /* L0s Exit Latency */
> +#define  PCI_EXP_LNKCAP_L1EL    0x00038000 /* L1 Exit Latency */
> +#define  PCI_EXP_LNKCAP_CLKPM   0x00040000 /* L1 Clock Power Management */
> +#define  PCI_EXP_LNKCAP_SDERC   0x00080000 /* Surprise Down Error Reporting Capable */
> +#define  PCI_EXP_LNKCAP_DLLLARC 0x00100000 /* Data Link Layer Link Active Reporting Capable */
> +#define  PCI_EXP_LNKCAP_LBNC    0x00200000 /* Link Bandwidth Notification Capability */
> +#define  PCI_EXP_LNKCAP_PN      0xff000000 /* Port Number */
> +#define PCI_EXP_LNKCTL          16      /* Link Control */
> +#define  PCI_EXP_LNKCTL_ASPMC   0x0003  /* ASPM Control */
> +#define  PCI_EXP_LNKCTL_RCB     0x0008  /* Read Completion Boundary */
> +#define  PCI_EXP_LNKCTL_LD      0x0010  /* Link Disable */
> +#define  PCI_EXP_LNKCTL_RL      0x0020  /* Retrain Link */
> +#define  PCI_EXP_LNKCTL_CCC     0x0040  /* Common Clock Configuration */
> +#define  PCI_EXP_LNKCTL_ES      0x0080  /* Extended Synch */
> +#define  PCI_EXP_LNKCTL_CLKREQ_EN 0x100 /* Enable clkreq */
> +#define  PCI_EXP_LNKCTL_HAWD    0x0200  /* Hardware Autonomous Width Disable */
> +#define  PCI_EXP_LNKCTL_LBMIE   0x0400  /* Link Bandwidth Management Interrupt Enable */
> +#define  PCI_EXP_LNKCTL_LABIE   0x0800  /* Link Autonomous Bandwidth Interrupt Enable */
> +#define PCI_EXP_LNKSTA          18      /* Link Status */
> +#define  PCI_EXP_LNKSTA_CLS     0x000f  /* Current Link Speed */
> +#define  PCI_EXP_LNKSTA_NLW     0x03f0  /* Negotiated Link Width */
> +#define  PCI_EXP_LNKSTA_LT      0x0800  /* Link Training */
> +#define  PCI_EXP_LNKSTA_SLC     0x1000  /* Slot Clock Configuration */
> +#define  PCI_EXP_LNKSTA_DLLLA   0x2000  /* Data Link Layer Link Active */
> +#define  PCI_EXP_LNKSTA_LBMS    0x4000  /* Link Bandwidth Management Status */
> +#define  PCI_EXP_LNKSTA_LABS    0x8000  /* Link Autonomous Bandwidth Status */
> +#define PCI_EXP_SLTCAP          20      /* Slot Capabilities */
> +#define  PCI_EXP_SLTCAP_ABP     0x00000001 /* Attention Button Present */
> +#define  PCI_EXP_SLTCAP_PCP     0x00000002 /* Power Controller Present */
> +#define  PCI_EXP_SLTCAP_MRLSP   0x00000004 /* MRL Sensor Present */
> +#define  PCI_EXP_SLTCAP_AIP     0x00000008 /* Attention Indicator Present */
> +#define  PCI_EXP_SLTCAP_PIP     0x00000010 /* Power Indicator Present */
> +#define  PCI_EXP_SLTCAP_HPS     0x00000020 /* Hot-Plug Surprise */
> +#define  PCI_EXP_SLTCAP_HPC     0x00000040 /* Hot-Plug Capable */
> +#define  PCI_EXP_SLTCAP_SPLV    0x00007f80 /* Slot Power Limit Value */
> +#define  PCI_EXP_SLTCAP_SPLS    0x00018000 /* Slot Power Limit Scale */
> +#define  PCI_EXP_SLTCAP_EIP     0x00020000 /* Electromechanical Interlock Present */
> +#define  PCI_EXP_SLTCAP_NCCS    0x00040000 /* No Command Completed Support */
> +#define  PCI_EXP_SLTCAP_PSN     0xfff80000 /* Physical Slot Number */
> +#define PCI_EXP_SLTCTL          24      /* Slot Control */
> +#define  PCI_EXP_SLTCTL_ABPE    0x0001  /* Attention Button Pressed Enable */
> +#define  PCI_EXP_SLTCTL_PFDE    0x0002  /* Power Fault Detected Enable */
> +#define  PCI_EXP_SLTCTL_MRLSCE  0x0004  /* MRL Sensor Changed Enable */
> +#define  PCI_EXP_SLTCTL_PDCE    0x0008  /* Presence Detect Changed Enable */
> +#define  PCI_EXP_SLTCTL_CCIE    0x0010  /* Command Completed Interrupt Enable */
> +#define  PCI_EXP_SLTCTL_HPIE    0x0020  /* Hot-Plug Interrupt Enable */
> +#define  PCI_EXP_SLTCTL_AIC     0x00c0  /* Attention Indicator Control */
> +#define  PCI_EXP_SLTCTL_PIC     0x0300  /* Power Indicator Control */
> +#define  PCI_EXP_SLTCTL_PCC     0x0400  /* Power Controller Control */
> +#define  PCI_EXP_SLTCTL_EIC     0x0800  /* Electromechanical Interlock Control */
> +#define  PCI_EXP_SLTCTL_DLLSCE  0x1000  /* Data Link Layer State Changed Enable */
> +#define PCI_EXP_SLTSTA          26      /* Slot Status */
> +#define  PCI_EXP_SLTSTA_ABP     0x0001  /* Attention Button Pressed */
> +#define  PCI_EXP_SLTSTA_PFD     0x0002  /* Power Fault Detected */
> +#define  PCI_EXP_SLTSTA_MRLSC   0x0004  /* MRL Sensor Changed */
> +#define  PCI_EXP_SLTSTA_PDC     0x0008  /* Presence Detect Changed */
> +#define  PCI_EXP_SLTSTA_CC      0x0010  /* Command Completed */
> +#define  PCI_EXP_SLTSTA_MRLSS   0x0020  /* MRL Sensor State */
> +#define  PCI_EXP_SLTSTA_PDS     0x0040  /* Presence Detect State */
> +#define  PCI_EXP_SLTSTA_EIS     0x0080  /* Electromechanical Interlock Status */
> +#define  PCI_EXP_SLTSTA_DLLSC   0x0100  /* Data Link Layer State Changed */
> +#define PCI_EXP_RTCTL           28      /* Root Control */
> +#define  PCI_EXP_RTCTL_SECEE    0x01    /* System Error on Correctable Error */
> +#define  PCI_EXP_RTCTL_SENFEE   0x02    /* System Error on Non-Fatal Error */
> +#define  PCI_EXP_RTCTL_SEFEE    0x04    /* System Error on Fatal Error */
> +#define  PCI_EXP_RTCTL_PMEIE    0x08    /* PME Interrupt Enable */
> +#define  PCI_EXP_RTCTL_CRSSVE   0x10    /* CRS Software Visibility Enable */
> +#define PCI_EXP_RTCAP           30      /* Root Capabilities */
> +#define PCI_EXP_RTSTA           32      /* Root Status */
> +#define PCI_EXP_DEVCAP2         36      /* Device Capabilities 2 */
> +#define  PCI_EXP_DEVCAP2_ARI    0x20    /* Alternative Routing-ID */
> +#define PCI_EXP_DEVCTL2         40      /* Device Control 2 */
> +#define  PCI_EXP_DEVCTL2_ARI    0x20    /* Alternative Routing-ID */
> +#define PCI_EXP_LNKCTL2         48      /* Link Control 2 */
> +#define PCI_EXP_SLTCTL2         56      /* Slot Control 2 */
> +
> +/* Extended Capabilities (PCI-X 2.0 and Express) */
> +#define PCI_EXT_CAP_ID(header)          (header & 0x0000ffff)
> +#define PCI_EXT_CAP_VER(header)         ((header >> 16) & 0xf)
> +#define PCI_EXT_CAP_NEXT(header)        ((header >> 20) & 0xffc)
> +
> +#define PCI_EXT_CAP_ID_ERR      1
> +#define PCI_EXT_CAP_ID_VC       2
> +#define PCI_EXT_CAP_ID_DSN      3
> +#define PCI_EXT_CAP_ID_PWR      4
> +#define PCI_EXT_CAP_ID_ARI      14
> +#define PCI_EXT_CAP_ID_ATS      15
> +#define PCI_EXT_CAP_ID_SRIOV    16
> +
> +/* Advanced Error Reporting */
> +#define PCI_ERR_UNCOR_STATUS    4       /* Uncorrectable Error Status */
> +#define  PCI_ERR_UNC_TRAIN      0x00000001      /* Training */
> +#define  PCI_ERR_UNC_DLP        0x00000010      /* Data Link Protocol */
> +#define  PCI_ERR_UNC_POISON_TLP 0x00001000      /* Poisoned TLP */
> +#define  PCI_ERR_UNC_FCP        0x00002000      /* Flow Control Protocol */
> +#define  PCI_ERR_UNC_COMP_TIME  0x00004000      /* Completion Timeout */
> +#define  PCI_ERR_UNC_COMP_ABORT 0x00008000      /* Completer Abort */
> +#define  PCI_ERR_UNC_UNX_COMP   0x00010000      /* Unexpected Completion */
> +#define  PCI_ERR_UNC_RX_OVER    0x00020000      /* Receiver Overflow */
> +#define  PCI_ERR_UNC_MALF_TLP   0x00040000      /* Malformed TLP */
> +#define  PCI_ERR_UNC_ECRC       0x00080000      /* ECRC Error Status */
> +#define  PCI_ERR_UNC_UNSUP      0x00100000      /* Unsupported Request */
> +#define PCI_ERR_UNCOR_MASK      8       /* Uncorrectable Error Mask */
> +        /* Same bits as above */
> +#define PCI_ERR_UNCOR_SEVER     12      /* Uncorrectable Error Severity */
> +        /* Same bits as above */
> +#define PCI_ERR_COR_STATUS      16      /* Correctable Error Status */
> +#define  PCI_ERR_COR_RCVR       0x00000001      /* Receiver Error Status */
> +#define  PCI_ERR_COR_BAD_TLP    0x00000040      /* Bad TLP Status */
> +#define  PCI_ERR_COR_BAD_DLLP   0x00000080      /* Bad DLLP Status */
> +#define  PCI_ERR_COR_REP_ROLL   0x00000100      /* REPLAY_NUM Rollover */
> +#define  PCI_ERR_COR_REP_TIMER  0x00001000      /* Replay Timer Timeout */
> +#define PCI_ERR_COR_MASK        20      /* Correctable Error Mask */
> +        /* Same bits as above */
> +#define PCI_ERR_CAP             24      /* Advanced Error Capabilities */
> +#define  PCI_ERR_CAP_FEP(x)     ((x) & 31)      /* First Error Pointer */
> +#define  PCI_ERR_CAP_ECRC_GENC  0x00000020      /* ECRC Generation Capable */
> +#define  PCI_ERR_CAP_ECRC_GENE  0x00000040      /* ECRC Generation Enable */
> +#define  PCI_ERR_CAP_ECRC_CHKC  0x00000080      /* ECRC Check Capable */
> +#define  PCI_ERR_CAP_ECRC_CHKE  0x00000100      /* ECRC Check Enable */
> +#define PCI_ERR_HEADER_LOG      28      /* Header Log Register (16 bytes) */
> +#define PCI_ERR_ROOT_COMMAND    44      /* Root Error Command */
> +/* Correctable Err Reporting Enable */
> +#define PCI_ERR_ROOT_CMD_COR_EN         0x00000001
> +/* Non-fatal Err Reporting Enable */
> +#define PCI_ERR_ROOT_CMD_NONFATAL_EN    0x00000002
> +/* Fatal Err Reporting Enable */
> +#define PCI_ERR_ROOT_CMD_FATAL_EN       0x00000004
> +#define PCI_ERR_ROOT_STATUS     48
> +#define PCI_ERR_ROOT_COR_RCV            0x00000001      /* ERR_COR Received */
> +/* Multi ERR_COR Received */
> +#define PCI_ERR_ROOT_MULTI_COR_RCV      0x00000002
> +/* ERR_FATAL/NONFATAL Received */
> +#define PCI_ERR_ROOT_UNCOR_RCV          0x00000004
> +/* Multi ERR_FATAL/NONFATAL Received */
> +#define PCI_ERR_ROOT_MULTI_UNCOR_RCV    0x00000008
> +#define PCI_ERR_ROOT_FIRST_FATAL        0x00000010      /* First Fatal */
> +#define PCI_ERR_ROOT_NONFATAL_RCV       0x00000020      /* Non-Fatal Received */
> +#define PCI_ERR_ROOT_FATAL_RCV          0x00000040      /* Fatal Received */
> +#define PCI_ERR_ROOT_COR_SRC    52
> +#define PCI_ERR_ROOT_SRC        54
> +
> +/* Virtual Channel */
> +#define PCI_VC_PORT_REG1        4
> +#define PCI_VC_PORT_REG2        8
> +#define PCI_VC_PORT_CTRL        12
> +#define PCI_VC_PORT_STATUS      14
> +#define PCI_VC_RES_CAP          16
> +#define PCI_VC_RES_CTRL         20
> +#define PCI_VC_RES_STATUS       26
> +
> +/* Power Budgeting */
> +#define PCI_PWR_DSR             4       /* Data Select Register */
> +#define PCI_PWR_DATA            8       /* Data Register */
> +#define  PCI_PWR_DATA_BASE(x)   ((x) & 0xff)        /* Base Power */
> +#define  PCI_PWR_DATA_SCALE(x)  (((x) >> 8) & 3)    /* Data Scale */
> +#define  PCI_PWR_DATA_PM_SUB(x) (((x) >> 10) & 7)   /* PM Sub State */
> +#define  PCI_PWR_DATA_PM_STATE(x) (((x) >> 13) & 3) /* PM State */
> +#define  PCI_PWR_DATA_TYPE(x)   (((x) >> 15) & 7)   /* Type */
> +#define  PCI_PWR_DATA_RAIL(x)   (((x) >> 18) & 7)   /* Power Rail */
> +#define PCI_PWR_CAP             12      /* Capability */
> +#define  PCI_PWR_CAP_BUDGET(x)  ((x) & 1)       /* Included in system budget */
> +
> +/*
> + * Hypertransport sub capability types
> + *
> + * Unfortunately there are both 3 bit and 5 bit capability types defined
> + * in the HT spec, catering for that is a little messy. You probably don't
> + * want to use these directly, just use pci_find_ht_capability() and it
> + * will do the right thing for you.
> + */
> +#define HT_3BIT_CAP_MASK        0xE0
> +#define HT_CAPTYPE_SLAVE        0x00    /* Slave/Primary link configuration */
> +#define HT_CAPTYPE_HOST         0x20    /* Host/Secondary link configuration */
> +
> +#define HT_5BIT_CAP_MASK        0xF8
> +#define HT_CAPTYPE_IRQ          0x80    /* IRQ Configuration */
> +#define HT_CAPTYPE_REMAPPING_40 0xA0    /* 40 bit address remapping */
> +#define HT_CAPTYPE_REMAPPING_64 0xA2    /* 64 bit address remapping */
> +#define HT_CAPTYPE_UNITID_CLUMP 0x90    /* Unit ID clumping */
> +#define HT_CAPTYPE_EXTCONF      0x98    /* Extended Configuration Space Access */
> +#define HT_CAPTYPE_MSI_MAPPING  0xA8    /* MSI Mapping Capability */
> +#define  HT_MSI_FLAGS           0x02            /* Offset to flags */
> +#define  HT_MSI_FLAGS_ENABLE    0x1             /* Mapping enable */
> +#define  HT_MSI_FLAGS_FIXED     0x2             /* Fixed mapping only */
> +#define  HT_MSI_FIXED_ADDR      0x00000000FEE00000ULL   /* Fixed addr */
> +#define  HT_MSI_ADDR_LO         0x04            /* Offset to low addr bits */
> +#define  HT_MSI_ADDR_LO_MASK    0xFFF00000      /* Low address bit mask */
> +#define  HT_MSI_ADDR_HI         0x08            /* Offset to high addr bits */
> +#define HT_CAPTYPE_DIRECT_ROUTE 0xB0    /* Direct routing configuration */
> +#define HT_CAPTYPE_VCSET        0xB8    /* Virtual Channel configuration */
> +#define HT_CAPTYPE_ERROR_RETRY  0xC0    /* Retry on error configuration */
> +#define HT_CAPTYPE_GEN3         0xD0    /* Generation 3 hypertransport configuration */
> +#define HT_CAPTYPE_PM           0xE0    /* Hypertransport powermanagement configuration */
> +
> +/* Alternative Routing-ID Interpretation */
> +#define PCI_ARI_CAP             0x04    /* ARI Capability Register */
> +#define  PCI_ARI_CAP_MFVC       0x0001  /* MFVC Function Groups Capability */
> +#define  PCI_ARI_CAP_ACS        0x0002  /* ACS Function Groups Capability */
> +#define  PCI_ARI_CAP_NFN(x)     (((x) >> 8) & 0xff) /* Next Function Number */
> +#define PCI_ARI_CTRL            0x06    /* ARI Control Register */
> +#define  PCI_ARI_CTRL_MFVC      0x0001  /* MFVC Function Groups Enable */
> +#define  PCI_ARI_CTRL_ACS       0x0002  /* ACS Function Groups Enable */
> +#define  PCI_ARI_CTRL_FG(x)     (((x) >> 4) & 7) /* Function Group */
> +
> +/* Address Translation Service */
> +#define PCI_ATS_CAP             0x04    /* ATS Capability Register */
> +#define  PCI_ATS_CAP_QDEP(x)    ((x) & 0x1f)    /* Invalidate Queue Depth */
> +#define  PCI_ATS_MAX_QDEP       32      /* Max Invalidate Queue Depth */
> +#define PCI_ATS_CTRL            0x06    /* ATS Control Register */
> +#define  PCI_ATS_CTRL_ENABLE    0x8000  /* ATS Enable */
> +#define  PCI_ATS_CTRL_STU(x)    ((x) & 0x1f)    /* Smallest Translation Unit */
> +#define  PCI_ATS_MIN_STU        12      /* shift of minimum STU block */
> +
> +/* Single Root I/O Virtualization */
> +#define PCI_SRIOV_CAP           0x04    /* SR-IOV Capabilities */
> +#define  PCI_SRIOV_CAP_VFM      0x01    /* VF Migration Capable */
> +#define  PCI_SRIOV_CAP_INTR(x)  ((x) >> 21) /* Interrupt Message Number */
> +#define PCI_SRIOV_CTRL          0x08    /* SR-IOV Control */
> +#define  PCI_SRIOV_CTRL_VFE     0x01    /* VF Enable */
> +#define  PCI_SRIOV_CTRL_VFM     0x02    /* VF Migration Enable */
> +#define  PCI_SRIOV_CTRL_INTR    0x04    /* VF Migration Interrupt Enable */
> +#define  PCI_SRIOV_CTRL_MSE     0x08    /* VF Memory Space Enable */
> +#define  PCI_SRIOV_CTRL_ARI     0x10    /* ARI Capable Hierarchy */
> +#define PCI_SRIOV_STATUS        0x0a    /* SR-IOV Status */
> +#define  PCI_SRIOV_STATUS_VFM   0x01    /* VF Migration Status */
> +#define PCI_SRIOV_INITIAL_VF    0x0c    /* Initial VFs */
> +#define PCI_SRIOV_TOTAL_VF      0x0e    /* Total VFs */
> +#define PCI_SRIOV_NUM_VF        0x10    /* Number of VFs */
> +#define PCI_SRIOV_FUNC_LINK     0x12    /* Function Dependency Link */
> +#define PCI_SRIOV_VF_OFFSET     0x14    /* First VF Offset */
> +#define PCI_SRIOV_VF_STRIDE     0x16    /* Following VF Stride */
> +#define PCI_SRIOV_VF_DID        0x1a    /* VF Device ID */
> +#define PCI_SRIOV_SUP_PGSIZE    0x1c    /* Supported Page Sizes */
> +#define PCI_SRIOV_SYS_PGSIZE    0x20    /* System Page Size */
> +#define PCI_SRIOV_BAR           0x24    /* VF BAR0 */
> +#define  PCI_SRIOV_NUM_BARS     6       /* Number of VF BARs */
> +#define PCI_SRIOV_VFM           0x3c    /* VF Migration State Array Offset */
> +#define  PCI_SRIOV_VFM_BIR(x)   ((x) & 7)       /* State BIR */
> +#define  PCI_SRIOV_VFM_OFFSET(x) ((x) & ~7)     /* State Offset */
> +#define  PCI_SRIOV_VFM_UA       0x0     /* Inactive.Unavailable */
> +#define  PCI_SRIOV_VFM_MI       0x1     /* Dormant.MigrateIn */
> +#define  PCI_SRIOV_VFM_MO       0x2     /* Active.MigrateOut */
> +#define  PCI_SRIOV_VFM_AV       0x3     /* Active.Available */
> +
> +#endif /* LINUX_PCI_REGS_H */
> -- 
> 1.7.1
> 
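As a small worked example of the extended capability macros quoted above, the sketch below decodes one 32-bit header into its ID, version and next-pointer fields. The DEMO_ macros, struct demo_ext_cap and demo_decode_ext_cap() are hypothetical names introduced only for illustration; they mirror the bit layout of PCI_EXT_CAP_ID/VER/NEXT but are not part of pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the PCI_EXT_CAP_* macros quoted above; the DEMO_ names
 * are hypothetical and parenthesize the argument so expressions are safe. */
#define DEMO_EXT_CAP_ID(header)   ((header) & 0x0000ffff)
#define DEMO_EXT_CAP_VER(header)  (((header) >> 16) & 0xf)
#define DEMO_EXT_CAP_NEXT(header) (((header) >> 20) & 0xffc)

/* Decoded view of one 32-bit extended capability header. */
struct demo_ext_cap {
    uint16_t id;    /* capability ID, e.g. 1 is PCI_EXT_CAP_ID_ERR */
    uint8_t  ver;   /* capability version */
    uint16_t next;  /* config-space offset of the next capability, 0 if last */
};

static struct demo_ext_cap demo_decode_ext_cap(uint32_t header)
{
    struct demo_ext_cap cap;

    cap.id = (uint16_t) DEMO_EXT_CAP_ID(header);
    cap.ver = (uint8_t) DEMO_EXT_CAP_VER(header);
    cap.next = (uint16_t) DEMO_EXT_CAP_NEXT(header);
    /* The mask 0xffc clears the two low bits, so offsets are dword-aligned. */
    assert((cap.next & 3) == 0);
    return cap;
}
```

For instance, the header value 0x14820001 decodes to ID 1 (PCI_EXT_CAP_ID_ERR), version 2 and next offset 0x148.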


* Re: [PATCH 1/7] pci: expand tabs to spaces in pci_regs.h
  2010-08-31 20:29     ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-08-31 22:58       ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-31 22:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Tue, Aug 31, 2010 at 11:29:53PM +0300, Michael S. Tsirkin wrote:
> On Sat, Aug 28, 2010 at 05:54:52PM +0300, Eduard - Gabriel Munteanu wrote:
> > The conversion was done using the GNU 'expand' tool (default settings)
> > to make it obey the QEMU coding style.
> > 
> > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> 
> I'm not really interested in this: we copied pci_regs.h from linux
> to help non-linux hosts, and keeping the code consistent
> with the original makes detecting bugs and adding new stuff
> from linux/pci_regs.h easier.
> 
> > ---
> >  hw/pci_regs.h | 1330 ++++++++++++++++++++++++++++----------------------------
> >  1 files changed, 665 insertions(+), 665 deletions(-)
> >  rewrite hw/pci_regs.h (90%)

Ok, I'll drop it. The only reason I did it was one of my additions to
this file made the patch look indented awkwardly.

I'll use tabs and merge it into Linux as well.


	Thanks,
	Eduard



* Re: [PATCH 1/7] pci: expand tabs to spaces in pci_regs.h
  2010-08-31 22:58       ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-09-01 10:39         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-01 10:39 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Wed, Sep 01, 2010 at 01:58:30AM +0300, Eduard - Gabriel Munteanu wrote:
> On Tue, Aug 31, 2010 at 11:29:53PM +0300, Michael S. Tsirkin wrote:
> > On Sat, Aug 28, 2010 at 05:54:52PM +0300, Eduard - Gabriel Munteanu wrote:
> > > The conversion was done using the GNU 'expand' tool (default settings)
> > > to make it obey the QEMU coding style.
> > > 
> > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > 
> > I'm not really interested in this: we copied pci_regs.h from linux
> > to help non-linux hosts, and keeping the code consistent
> > with the original makes detecting bugs and adding new stuff
> > from linux/pci_regs.h easier.
> > 
> > > ---
> > >  hw/pci_regs.h | 1330 ++++++++++++++++++++++++++++----------------------------
> > >  1 files changed, 665 insertions(+), 665 deletions(-)
> > >  rewrite hw/pci_regs.h (90%)
> 
> Ok, I'll drop it. The only reason I did it was one of my additions to
> this file made the patch look indented awkwardly.
> 
> I'll use tabs and merge it into Linux as well.
> 
> 
> 	Thanks,
> 	Eduard

Good idea, this way more people with pci knowledge check it.

-- 
MST


* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-08-29 22:08         ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-09-01 20:10           ` Stefan Weil
  -1 siblings, 0 replies; 96+ messages in thread
From: Stefan Weil @ 2010-09-01 20:10 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: mst, kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

Please see my comments at the end of this mail.


On 30.08.2010 00:08, Eduard - Gabriel Munteanu wrote:
> PCI devices should access memory through pci_memory_*() instead of
> cpu_physical_memory_*(). This also provides support for translation and
> access checking in case an IOMMU is emulated.
>
> Memory maps are treated as remote IOTLBs (that is, translation caches
> belonging to the IOMMU-aware device itself). Clients (devices) must
> provide callbacks for map invalidation in case these maps are
> persistent beyond the current I/O context, e.g. AIO DMA transfers.
>
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
> hw/pci.c | 191 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> hw/pci.h | 69 +++++++++++++++++++
> hw/pci_internals.h | 12 +++
> qemu-common.h | 1 +
> 4 files changed, 272 insertions(+), 1 deletions(-)
>
> diff --git a/hw/pci.c b/hw/pci.c
> index 2dc1577..afcb33c 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
>
> ...
>
> diff --git a/hw/pci.h b/hw/pci.h
> index c551f96..c95863a 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -172,6 +172,8 @@ struct PCIDevice {
> char *romfile;
> ram_addr_t rom_offset;
> uint32_t rom_bar;
> +
> + QLIST_HEAD(, PCIMemoryMap) memory_maps;
> };
>
> PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> @@ -391,4 +393,71 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
> return !(last2 < first1 || last1 < first2);
> }
>
> +/*
> + * Memory I/O and PCI IOMMU definitions.
> + */
> +
> +#define IOMMU_PERM_READ (1 << 0)
> +#define IOMMU_PERM_WRITE (1 << 1)
> +#define IOMMU_PERM_RW (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
> +
> +typedef int PCIInvalidateMapFunc(void *opaque);
> +typedef int PCITranslateFunc(PCIDevice *iommu,
> + PCIDevice *dev,
> + pcibus_t addr,
> + target_phys_addr_t *paddr,
> + target_phys_addr_t *len,
> + unsigned perms);
> +
> +void pci_memory_rw(PCIDevice *dev,
> + pcibus_t addr,
> + uint8_t *buf,
> + pcibus_t len,
> + int is_write);
> +void *pci_memory_map(PCIDevice *dev,
> + PCIInvalidateMapFunc *cb,
> + void *opaque,
> + pcibus_t addr,
> + target_phys_addr_t *len,
> + int is_write);
> +void pci_memory_unmap(PCIDevice *dev,
> + void *buffer,
> + target_phys_addr_t len,
> + int is_write,
> + target_phys_addr_t access_len);
> +void pci_register_iommu(PCIDevice *dev, PCITranslateFunc *translate);
> +void pci_memory_invalidate_range(PCIDevice *dev, pcibus_t addr, pcibus_t len);
> +
> +#define DECLARE_PCI_LD(suffix, size) \
> +uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
> +
> +#define DECLARE_PCI_ST(suffix, size) \
> +void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val);
> +
> +DECLARE_PCI_LD(ub, 8)
> +DECLARE_PCI_LD(uw, 16)
> +DECLARE_PCI_LD(l, 32)
> +DECLARE_PCI_LD(q, 64)
> +
> +DECLARE_PCI_ST(b, 8)
> +DECLARE_PCI_ST(w, 16)
> +DECLARE_PCI_ST(l, 32)
> +DECLARE_PCI_ST(q, 64)
> +
> +static inline void pci_memory_read(PCIDevice *dev,
> + pcibus_t addr,
> + uint8_t *buf,
> + pcibus_t len)
> +{
> + pci_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void pci_memory_write(PCIDevice *dev,
> + pcibus_t addr,
> + const uint8_t *buf,
> + pcibus_t len)
> +{
> + pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> +}
> +
> #endif

The functions pci_memory_read and pci_memory_write read or write not only
byte data but many different data types, which leads to a lot of type
casts in your other patches.

I'd prefer "void *buf" and "const void *buf" in the argument lists.
Then all those type casts could be removed.
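
A minimal sketch of that suggestion follows. It does not use the real
QEMU types: demo_ram, demo_memory_rw(), demo_memory_read(),
demo_memory_write() and struct demo_desc are hypothetical stand-ins, assumed
only to show how a "void *buf" parameter lets callers pass any object
without a cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest memory; not part of QEMU. */
static uint8_t demo_ram[256];

/* Stand-in for pci_memory_rw() taking void *: copies to or from demo_ram. */
static void demo_memory_rw(uint64_t addr, void *buf, size_t len, int is_write)
{
    assert(addr <= sizeof(demo_ram) && len <= sizeof(demo_ram) - addr);
    if (is_write) {
        memcpy(demo_ram + addr, buf, len);
    } else {
        memcpy(buf, demo_ram + addr, len);
    }
}

static inline void demo_memory_read(uint64_t addr, void *buf, size_t len)
{
    demo_memory_rw(addr, buf, len, 0);
}

static inline void demo_memory_write(uint64_t addr, const void *buf, size_t len)
{
    /* The one remaining cast drops const; it is confined to this wrapper. */
    demo_memory_rw(addr, (void *) buf, len, 1);
}

/* Example descriptor type a device model might transfer. */
struct demo_desc {
    uint32_t addr;
    uint32_t len;
};

/* Round-trips a descriptor through demo_ram without any caller-side casts. */
static struct demo_desc demo_roundtrip(struct demo_desc in)
{
    struct demo_desc out;

    demo_memory_write(16, &in, sizeof(in));
    demo_memory_read(16, &out, sizeof(out));
    return out;
}
```

With uint8_t * parameters, both calls in demo_roundtrip() would need a
(uint8_t *) cast on &in and &out; with void * the conversion is implicit
in C.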

Regards
Stefan Weil



* Re: [PATCH 4/7] ide: use the PCI memory access interface
  2010-08-28 14:54   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-09-02  5:19     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-02  5:19 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Sat, Aug 28, 2010 at 05:54:55PM +0300, Eduard - Gabriel Munteanu wrote:
> Emulated PCI IDE controllers now use the memory access interface. This
> also allows an emulated IOMMU to translate and check accesses.
> 
> Map invalidation results in cancelling DMA transfers. Since the guest OS
> can't properly recover the DMA results in case the mapping is changed,
> this is a fairly good approximation.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  dma-helpers.c     |   46 +++++++++++++++++++++++++++++++++++++++++-----
>  dma.h             |   21 ++++++++++++++++++++-
>  hw/ide/core.c     |   15 ++++++++-------
>  hw/ide/internal.h |   39 +++++++++++++++++++++++++++++++++++++++
>  hw/ide/macio.c    |    4 ++--
>  hw/ide/pci.c      |    7 +++++++
>  6 files changed, 117 insertions(+), 15 deletions(-)
> 
> diff --git a/dma-helpers.c b/dma-helpers.c
> index 712ed89..a0dcdb8 100644
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -10,12 +10,36 @@
>  #include "dma.h"
>  #include "block_int.h"
>  
> -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint)
> +static void *qemu_sglist_default_map(void *opaque,
> +                                     QEMUSGInvalMapFunc *inval_cb,
> +                                     void *inval_opaque,
> +                                     target_phys_addr_t addr,
> +                                     target_phys_addr_t *len,
> +                                     int is_write)
> +{
> +    return cpu_physical_memory_map(addr, len, is_write);
> +}
> +
> +static void qemu_sglist_default_unmap(void *opaque,
> +                                      void *buffer,
> +                                      target_phys_addr_t len,
> +                                      int is_write,
> +                                      target_phys_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +}
> +
> +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint,
> +                      QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap, void *opaque)
>  {
>      qsg->sg = qemu_malloc(alloc_hint * sizeof(ScatterGatherEntry));
>      qsg->nsg = 0;
>      qsg->nalloc = alloc_hint;
>      qsg->size = 0;
> +
> +    qsg->map = map ? map : qemu_sglist_default_map;
> +    qsg->unmap = unmap ? unmap : qemu_sglist_default_unmap;
> +    qsg->opaque = opaque;
>  }
>  
>  void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
> @@ -73,12 +97,23 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
>      int i;
>  
>      for (i = 0; i < dbs->iov.niov; ++i) {
> -        cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
> -                                  dbs->iov.iov[i].iov_len, !dbs->is_write,
> -                                  dbs->iov.iov[i].iov_len);
> +        dbs->sg->unmap(dbs->sg->opaque,
> +                       dbs->iov.iov[i].iov_base,
> +                       dbs->iov.iov[i].iov_len, !dbs->is_write,
> +                       dbs->iov.iov[i].iov_len);
>      }
>  }
>  
> +static void dma_bdrv_cancel(void *opaque)
> +{
> +    DMAAIOCB *dbs = opaque;
> +
> +    bdrv_aio_cancel(dbs->acb);
> +    dma_bdrv_unmap(dbs);
> +    qemu_iovec_destroy(&dbs->iov);
> +    qemu_aio_release(dbs);
> +}
> +
>  static void dma_bdrv_cb(void *opaque, int ret)
>  {
>      DMAAIOCB *dbs = (DMAAIOCB *)opaque;
> @@ -100,7 +135,8 @@ static void dma_bdrv_cb(void *opaque, int ret)
>      while (dbs->sg_cur_index < dbs->sg->nsg) {
>          cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
>          cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
> -        mem = cpu_physical_memory_map(cur_addr, &cur_len, !dbs->is_write);
> +        mem = dbs->sg->map(dbs->sg->opaque, dma_bdrv_cancel, dbs,
> +                           cur_addr, &cur_len, !dbs->is_write);
>          if (!mem)
>              break;
>          qemu_iovec_add(&dbs->iov, mem, cur_len);
> diff --git a/dma.h b/dma.h
> index f3bb275..d48f35c 100644
> --- a/dma.h
> +++ b/dma.h
> @@ -15,6 +15,19 @@
>  #include "hw/hw.h"
>  #include "block.h"
>  
> +typedef void QEMUSGInvalMapFunc(void *opaque);
> +typedef void *QEMUSGMapFunc(void *opaque,
> +                            QEMUSGInvalMapFunc *inval_cb,
> +                            void *inval_opaque,
> +                            target_phys_addr_t addr,
> +                            target_phys_addr_t *len,
> +                            int is_write);
> +typedef void QEMUSGUnmapFunc(void *opaque,
> +                             void *buffer,
> +                             target_phys_addr_t len,
> +                             int is_write,
> +                             target_phys_addr_t access_len);
> +
>  typedef struct {
>      target_phys_addr_t base;
>      target_phys_addr_t len;
> @@ -25,9 +38,15 @@ typedef struct {
>      int nsg;
>      int nalloc;
>      target_phys_addr_t size;
> +
> +    QEMUSGMapFunc *map;
> +    QEMUSGUnmapFunc *unmap;
> +    void *opaque;
>  } QEMUSGList;
>  
> -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
> +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint,
> +                      QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap,
> +                      void *opaque);
>  void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
>                       target_phys_addr_t len);
>  void qemu_sglist_destroy(QEMUSGList *qsg);
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index af52c2c..024a125 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -436,7 +436,8 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
>      } prd;
>      int l, len;
>  
> -    qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1);
> +    qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1,
> +                     bm->map, bm->unmap, bm->opaque);
>      s->io_buffer_size = 0;
>      for(;;) {
>          if (bm->cur_prd_len == 0) {
> @@ -444,7 +445,7 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
>              if (bm->cur_prd_last ||
>                  (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
>                  return s->io_buffer_size != 0;
> -            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
> +            bmdma_memory_read(bm, bm->cur_addr, (uint8_t *)&prd, 8);
>              bm->cur_addr += 8;
>              prd.addr = le32_to_cpu(prd.addr);
>              prd.size = le32_to_cpu(prd.size);
> @@ -527,7 +528,7 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
>              if (bm->cur_prd_last ||
>                  (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
>                  return 0;
> -            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
> +            bmdma_memory_read(bm, bm->cur_addr, (uint8_t *)&prd, 8);
>              bm->cur_addr += 8;
>              prd.addr = le32_to_cpu(prd.addr);
>              prd.size = le32_to_cpu(prd.size);
> @@ -542,11 +543,11 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
>              l = bm->cur_prd_len;
>          if (l > 0) {
>              if (is_write) {
> -                cpu_physical_memory_write(bm->cur_prd_addr,
> -                                          s->io_buffer + s->io_buffer_index, l);
> +                bmdma_memory_write(bm, bm->cur_prd_addr,
> +                                   s->io_buffer + s->io_buffer_index, l);
>              } else {
> -                cpu_physical_memory_read(bm->cur_prd_addr,
> -                                          s->io_buffer + s->io_buffer_index, l);
> +                bmdma_memory_read(bm, bm->cur_prd_addr,
> +                                  s->io_buffer + s->io_buffer_index, l);
>              }
>              bm->cur_prd_addr += l;
>              bm->cur_prd_len -= l;
> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> index 4165543..f686d38 100644
> --- a/hw/ide/internal.h
> +++ b/hw/ide/internal.h
> @@ -477,6 +477,24 @@ struct IDEDeviceInfo {
>  #define BM_CMD_START     0x01
>  #define BM_CMD_READ      0x08
>  
> +typedef void BMDMAInvalMapFunc(void *opaque);
> +typedef void BMDMARWFunc(void *opaque,
> +                         target_phys_addr_t addr,
> +                         uint8_t *buf,
> +                         target_phys_addr_t len,
> +                         int is_write);
> +typedef void *BMDMAMapFunc(void *opaque,
> +                           BMDMAInvalMapFunc *inval_cb,
> +                           void *inval_opaque,
> +                           target_phys_addr_t addr,
> +                           target_phys_addr_t *len,
> +                           int is_write);
> +typedef void BMDMAUnmapFunc(void *opaque,
> +                            void *buffer,
> +                            target_phys_addr_t len,
> +                            int is_write,
> +                            target_phys_addr_t access_len);
> +
>  struct BMDMAState {
>      uint8_t cmd;
>      uint8_t status;
> @@ -496,8 +514,29 @@ struct BMDMAState {
>      int64_t sector_num;
>      uint32_t nsector;
>      QEMUBH *bh;
> +
> +    BMDMARWFunc *rw;
> +    BMDMAMapFunc *map;
> +    BMDMAUnmapFunc *unmap;
> +    void *opaque;
>  };
>  
> +static inline void bmdma_memory_read(BMDMAState *bm,
> +                                     target_phys_addr_t addr,
> +                                     uint8_t *buf,
> +                                     target_phys_addr_t len)
> +{
> +    bm->rw(bm->opaque, addr, buf, len, 0);
> +}
> +
> +static inline void bmdma_memory_write(BMDMAState *bm,
> +                                      target_phys_addr_t addr,
> +                                      uint8_t *buf,
> +                                      target_phys_addr_t len)
> +{
> +    bm->rw(bm->opaque, addr, buf, len, 1);
> +}
> +

Here again, I am concerned about indirection and pointer chasing on the
data path. Can we have an iommu pointer in the device, and do a fast
path in case there is no iommu?

>  static inline IDEState *idebus_active_if(IDEBus *bus)
>  {
>      return bus->ifs + bus->unit;
> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> index bd1c73e..962ae13 100644
> --- a/hw/ide/macio.c
> +++ b/hw/ide/macio.c
> @@ -79,7 +79,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
>  
>      s->io_buffer_size = io->len;
>  
> -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL, NULL, NULL);
>      qemu_sglist_add(&s->sg, io->addr, io->len);
>      io->addr += io->len;
>      io->len = 0;
> @@ -141,7 +141,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
>      s->io_buffer_index = 0;
>      s->io_buffer_size = io->len;
>  
> -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL, NULL, NULL);
>      qemu_sglist_add(&s->sg, io->addr, io->len);
>      io->addr += io->len;
>      io->len = 0;
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index 4d95cc5..5879044 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -183,4 +183,11 @@ void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
>              continue;
>          ide_create_drive(d->bus+bus[i], unit[i], hd_table[i]);
>      }
> +
> +    for (i = 0; i < 2; i++) {
> +        d->bmdma[i].rw = (void *) pci_memory_rw;
> +        d->bmdma[i].map = (void *) pci_memory_map;
> +        d->bmdma[i].unmap = (void *) pci_memory_unmap;
> +        d->bmdma[i].opaque = dev;
> +    }
>  }

These casts show something is wrong with the API, IMO.
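
One possible way to avoid the casts, as a hedged sketch only: give the
callback slot an exact prototype and assign a thin adapter whose
signature matches it. Strictly speaking, calling a function through a
pointer of an incompatible type is undefined behaviour in C, so the
adapter also buys type checking. Device, Channel, RWFunc,
device_memory_rw, device_rw_adapter and channel_init are hypothetical
names, not the QEMU definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a device with some addressable memory. */
typedef struct Device {
    uint8_t mem[64];
} Device;

/* Callback type with an opaque first argument, similar in spirit to
 * BMDMARWFunc. */
typedef void RWFunc(void *opaque, uint64_t addr, void *buf,
                    size_t len, int is_write);

/* Bus-specific accessor with a typed first argument, similar in spirit
 * to pci_memory_rw(). */
static void device_memory_rw(Device *dev, uint64_t addr, void *buf,
                             size_t len, int is_write)
{
    assert(addr <= sizeof(dev->mem) && len <= sizeof(dev->mem) - addr);
    if (is_write) {
        memcpy(dev->mem + addr, buf, len);
    } else {
        memcpy(buf, dev->mem + addr, len);
    }
}

/* Adapter whose prototype matches RWFunc exactly. The conversion from
 * void * to Device * is implicit in C, so no cast is written here. */
static void device_rw_adapter(void *opaque, uint64_t addr, void *buf,
                              size_t len, int is_write)
{
    Device *dev = opaque;

    device_memory_rw(dev, addr, buf, len, is_write);
}

/* Hypothetical DMA client holding the callback and its context. */
typedef struct Channel {
    RWFunc *rw;
    void *opaque;
} Channel;

static void channel_init(Channel *ch, Device *dev)
{
    ch->rw = device_rw_adapter;   /* types match, no cast needed */
    ch->opaque = dev;
}
```

If the accessor's prototype ever changes, the adapter fails to compile
instead of silently misbehaving at run time.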

> -- 
> 1.7.1
> 

* Re: [PATCH 2/7] pci: memory access API and IOMMU support
  2010-08-28 14:54   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-09-02  5:28     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-02  5:28 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Sat, Aug 28, 2010 at 05:54:53PM +0300, Eduard - Gabriel Munteanu wrote:
> PCI devices should access memory through pci_memory_*() instead of
> cpu_physical_memory_*(). This also provides support for translation and
> access checking in case an IOMMU is emulated.
> 
> Memory maps are treated as remote IOTLBs (that is, translation caches
> belonging to the IOMMU-aware device itself). Clients (devices) must
> provide callbacks for map invalidation in case these maps are
> persistent beyond the current I/O context, e.g. AIO DMA transfers.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>


I am concerned about adding more pointer chasing on the data path.
Could we have
1. an iommu pointer in a device, inherited by secondary buses
   when they are created and by devices from buses when they are attached.
2. translation pointer in the iommu instead of the bus
3. pci_memory_XX functions inline, doing fast path for non-iommu case:
   
	if (__builtin_expect(!dev->iommu, 1))
		return cpu_physical_memory_rw(...);
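
A rough, self-contained sketch of the three points, under stated
assumptions: IOMMU, Bus, Device, ram, example_translate and the helper
names are all hypothetical stand-ins rather than QEMU types, and
__builtin_expect is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct IOMMU IOMMU;
typedef int TranslateFunc(IOMMU *iommu, uint64_t addr,
                          uint64_t *paddr, uint64_t *len);

struct IOMMU {
    TranslateFunc *translate;   /* point 2: callback lives in the IOMMU */
};

typedef struct Bus {
    IOMMU *iommu;               /* NULL if the bus is not behind an IOMMU */
} Bus;

typedef struct Device {
    IOMMU *iommu;               /* point 1: cached when the device attaches */
} Device;

static uint8_t ram[256];        /* hypothetical stand-in for guest RAM */

/* Point 1: secondary buses inherit from the parent, devices from the bus. */
static void bus_init_secondary(Bus *child, const Bus *parent)
{
    child->iommu = parent->iommu;
}

static void device_attach(Device *dev, const Bus *bus)
{
    dev->iommu = bus->iommu;
}

/* Made-up translation: shift by 128, valid up to the end of a 16-byte page. */
static int example_translate(IOMMU *iommu, uint64_t addr,
                             uint64_t *paddr, uint64_t *len)
{
    (void) iommu;
    if (addr >= 128) {
        return -1;
    }
    *paddr = addr + 128;
    *len = 16 - (addr % 16);
    return 0;
}

static void ram_rw(uint64_t paddr, uint8_t *buf, uint64_t len, int is_write)
{
    assert(paddr <= sizeof(ram) && len <= sizeof(ram) - paddr);
    if (is_write) {
        memcpy(ram + paddr, buf, len);
    } else {
        memcpy(buf, ram + paddr, len);
    }
}

/* Slow path: translate chunk by chunk, as pci_memory_rw() in the patch does. */
static void device_memory_rw_slow(Device *dev, uint64_t addr,
                                  uint8_t *buf, uint64_t len, int is_write)
{
    while (len) {
        uint64_t paddr, plen;

        if (dev->iommu->translate(dev->iommu, addr, &paddr, &plen)) {
            return;
        }
        if (plen > len) {
            plen = len;
        }
        ram_rw(paddr, buf, plen, is_write);
        len -= plen;
        addr += plen;
        buf += plen;
    }
}

/* Point 3: inline wrapper with a predicted-taken fast path. */
static inline void device_memory_rw(Device *dev, uint64_t addr,
                                    uint8_t *buf, uint64_t len, int is_write)
{
    if (__builtin_expect(!dev->iommu, 1)) {
        ram_rw(addr, buf, len, is_write);
        return;
    }
    device_memory_rw_slow(dev, addr, buf, len, is_write);
}
```

In the common case the only extra cost is one load and one predictable
branch per access.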
	


> ---
>  hw/pci.c           |  185 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/pci.h           |   74 +++++++++++++++++++++
>  hw/pci_internals.h |   12 ++++
>  qemu-common.h      |    1 +
>  4 files changed, 271 insertions(+), 1 deletions(-)

Almost nothing here is PCI specific.
Can this code go into dma.c/dma.h?
We would have struct DMADevice, APIs like device_dma_write etc.
This would help us get rid of the void * stuff as well?
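
Something along these lines, as a sketch only. DMADevice,
DMATranslateFunc, PCIDeviceSketch, DMAClient, ram and the
device_dma_*() helpers are hypothetical; nothing like this exists in
the tree yet. The point is that a client holds a typed DMADevice *
rather than a void * opaque plus cast function pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct DMADevice DMADevice;
typedef int DMATranslateFunc(DMADevice *dma, uint64_t addr,
                             uint64_t *paddr, uint64_t *len);

/* Hypothetical bus-agnostic DMA context. */
struct DMADevice {
    DMATranslateFunc *translate;   /* NULL: addresses are used as-is */
};

/* A bus-specific device embeds the generic context. */
typedef struct PCIDeviceSketch {
    int devfn;
    DMADevice dma;
} PCIDeviceSketch;

/* A DMA client (think of a bus-master IDE channel) keeps a typed pointer. */
typedef struct DMAClient {
    DMADevice *dma;
} DMAClient;

static uint8_t ram[128];           /* hypothetical stand-in for guest RAM */

static void device_dma_rw(DMADevice *dma, uint64_t addr, void *buf,
                          uint64_t len, int is_write)
{
    uint8_t *p = buf;

    while (len) {
        uint64_t paddr = addr, plen = len;

        if (dma->translate && dma->translate(dma, addr, &paddr, &plen)) {
            return;                /* translation fault: drop the access */
        }
        if (plen == 0) {
            return;                /* defensive: avoid looping forever */
        }
        if (plen > len) {
            plen = len;
        }
        assert(paddr <= sizeof(ram) && plen <= sizeof(ram) - paddr);
        if (is_write) {
            memcpy(ram + paddr, p, plen);
        } else {
            memcpy(p, ram + paddr, plen);
        }
        len -= plen;
        addr += plen;
        p += plen;
    }
}

static inline void device_dma_read(DMADevice *dma, uint64_t addr,
                                   void *buf, uint64_t len)
{
    device_dma_rw(dma, addr, buf, len, 0);
}

static inline void device_dma_write(DMADevice *dma, uint64_t addr,
                                    const void *buf, uint64_t len)
{
    device_dma_rw(dma, addr, (void *) buf, len, 1);
}
```

A non-PCI user such as macio would simply point at a DMADevice with a
NULL translate callback.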

> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 2dc1577..b460905 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -158,6 +158,19 @@ static void pci_device_reset(PCIDevice *dev)
>      pci_update_mappings(dev);
>  }
>  
> +static int pci_no_translate(PCIDevice *iommu,
> +                            PCIDevice *dev,
> +                            pcibus_t addr,
> +                            target_phys_addr_t *paddr,
> +                            target_phys_addr_t *len,
> +                            unsigned perms)
> +{
> +    *paddr = addr;
> +    *len = -1;
> +
> +    return 0;
> +}
> +
>  static void pci_bus_reset(void *opaque)
>  {
>      PCIBus *bus = opaque;
> @@ -220,7 +233,10 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
>  {
>      qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
>      assert(PCI_FUNC(devfn_min) == 0);
> -    bus->devfn_min = devfn_min;
> +
> +    bus->devfn_min  = devfn_min;
> +    bus->iommu      = NULL;
> +    bus->translate  = pci_no_translate;
>  
>      /* host bridge */
>      QLIST_INIT(&bus->child);
> @@ -1789,3 +1805,170 @@ static char *pcibus_get_dev_path(DeviceState *dev)
>      return strdup(path);
>  }
>  
> +void pci_register_iommu(PCIDevice *iommu,
> +                        PCITranslateFunc *translate)
> +{
> +    iommu->bus->iommu = iommu;
> +    iommu->bus->translate = translate;
> +}
> +

The above seems broken for secondary buses, right?  Also, can we use
qdev for initialization in some way, instead of adding more APIs?  E.g.
I think it would be nice if we could just use qdev command line flags to
control which bus is behind iommu and which isn't.


> +void pci_memory_rw(PCIDevice *dev,
> +                   pcibus_t addr,
> +                   uint8_t *buf,
> +                   pcibus_t len,
> +                   int is_write)
> +{
> +    int err;
> +    unsigned perms;
> +    PCIDevice *iommu = dev->bus->iommu;
> +    target_phys_addr_t paddr, plen;
> +
> +    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> +
> +    while (len) {
> +        err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
> +        if (err)
> +            return;
> +
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len)
> +            plen = len;
> +
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +static void pci_memory_register_map(PCIDevice *dev,
> +                                    pcibus_t addr,
> +                                    pcibus_t len,
> +                                    target_phys_addr_t paddr,
> +                                    PCIInvalidateMapFunc *invalidate,
> +                                    void *invalidate_opaque)
> +{
> +    PCIMemoryMap *map;
> +
> +    map = qemu_malloc(sizeof(PCIMemoryMap));
> +    map->addr               = addr;
> +    map->len                = len;
> +    map->paddr              = paddr;
> +    map->invalidate         = invalidate;
> +    map->invalidate_opaque  = invalidate_opaque;
> +
> +    QLIST_INSERT_HEAD(&dev->memory_maps, map, list);
> +}
> +
> +static void pci_memory_unregister_map(PCIDevice *dev,
> +                                      target_phys_addr_t paddr,
> +                                      target_phys_addr_t len)
> +{
> +    PCIMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->memory_maps, list) {
> +        if (map->paddr == paddr && map->len == len) {
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void pci_memory_invalidate_range(PCIDevice *dev,
> +                                 pcibus_t addr,
> +                                 pcibus_t len)
> +{
> +    PCIMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->memory_maps, list) {
> +        if (ranges_overlap(addr, len, map->addr, map->len)) {
> +            map->invalidate(map->invalidate_opaque);
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void *pci_memory_map(PCIDevice *dev,
> +                     PCIInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     pcibus_t addr,
> +                     target_phys_addr_t *len,
> +                     int is_write)
> +{
> +    int err;
> +    unsigned perms;
> +    PCIDevice *iommu = dev->bus->iommu;
> +    target_phys_addr_t paddr, plen;
> +
> +    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> +
> +    plen = *len;
> +    err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
> +    if (err)
> +        return NULL;
> +
> +    /*
> +     * If this is true, the virtual region is contiguous,
> +     * but the translated physical region isn't. We just
> +     * clamp *len, much like cpu_physical_memory_map() does.
> +     */
> +    if (plen < *len)
> +        *len = plen;
> +
> +    /* We treat maps as remote TLBs to cope with stuff like AIO. */
> +    if (cb)
> +        pci_memory_register_map(dev, addr, *len, paddr, cb, opaque);
> +
> +    return cpu_physical_memory_map(paddr, len, is_write);
> +}
> +

All the above is really only useful for when there is an iommu,
right? So maybe we should shortcut all this if there's no iommu?

> +void pci_memory_unmap(PCIDevice *dev,
> +                      void *buffer,
> +                      target_phys_addr_t len,
> +                      int is_write,
> +                      target_phys_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +    pci_memory_unregister_map(dev, (target_phys_addr_t) buffer, len);
> +}
> +
> +#define DEFINE_PCI_LD(suffix, size)                                       \
> +uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr)              \
> +{                                                                         \
> +    int err;                                                              \
> +    target_phys_addr_t paddr, plen;                                       \
> +                                                                          \
> +    err = dev->bus->translate(dev->bus->iommu, dev,                       \
> +                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
> +    if (err || (plen < size / 8))                                         \
> +        return 0;                                                         \
> +                                                                          \
> +    return ld##suffix##_phys(paddr);                                      \
> +}
> +
> +#define DEFINE_PCI_ST(suffix, size)                                       \
> +void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val)    \
> +{                                                                         \
> +    int err;                                                              \
> +    target_phys_addr_t paddr, plen;                                       \
> +                                                                          \
> +    err = dev->bus->translate(dev->bus->iommu, dev,                       \
> +                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
> +    if (err || (plen < size / 8))                                         \
> +        return;                                                           \
> +                                                                          \
> +    st##suffix##_phys(paddr, val);                                        \
> +}
> +
> +DEFINE_PCI_LD(ub, 8)
> +DEFINE_PCI_LD(uw, 16)
> +DEFINE_PCI_LD(l, 32)
> +DEFINE_PCI_LD(q, 64)
> +
> +DEFINE_PCI_ST(b, 8)
> +DEFINE_PCI_ST(w, 16)
> +DEFINE_PCI_ST(l, 32)
> +DEFINE_PCI_ST(q, 64)
> +
> diff --git a/hw/pci.h b/hw/pci.h
> index c551f96..3131016 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -172,6 +172,8 @@ struct PCIDevice {
>      char *romfile;
>      ram_addr_t rom_offset;
>      uint32_t rom_bar;
> +
> +    QLIST_HEAD(, PCIMemoryMap) memory_maps;
>  };
>  
>  PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> @@ -391,4 +393,76 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
>      return !(last2 < first1 || last1 < first2);
>  }
>  
> +/*
> + * Memory I/O and PCI IOMMU definitions.
> + */
> +
> +#define IOMMU_PERM_READ     (1 << 0)
> +#define IOMMU_PERM_WRITE    (1 << 1)
> +#define IOMMU_PERM_RW       (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
> +
> +typedef int PCIInvalidateMapFunc(void *opaque);
> +typedef int PCITranslateFunc(PCIDevice *iommu,
> +                             PCIDevice *dev,
> +                             pcibus_t addr,
> +                             target_phys_addr_t *paddr,
> +                             target_phys_addr_t *len,
> +                             unsigned perms);
> +
> +extern void pci_memory_rw(PCIDevice *dev,
> +                          pcibus_t addr,
> +                          uint8_t *buf,
> +                          pcibus_t len,
> +                          int is_write);
> +extern void *pci_memory_map(PCIDevice *dev,
> +                            PCIInvalidateMapFunc *cb,
> +                            void *opaque,
> +                            pcibus_t addr,
> +                            target_phys_addr_t *len,
> +                            int is_write);
> +extern void pci_memory_unmap(PCIDevice *dev,
> +                             void *buffer,
> +                             target_phys_addr_t len,
> +                             int is_write,
> +                             target_phys_addr_t access_len);
> +extern void pci_register_iommu(PCIDevice *dev,
> +                               PCITranslateFunc *translate);
> +extern void pci_memory_invalidate_range(PCIDevice *dev,
> +                                        pcibus_t addr,
> +                                        pcibus_t len);
> +
> +#define DECLARE_PCI_LD(suffix, size)                                    \
> +extern uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
> +
> +#define DECLARE_PCI_ST(suffix, size)                                    \
> +extern void pci_st##suffix(PCIDevice *dev,                              \
> +                           pcibus_t addr,                               \
> +                           uint##size##_t val);
> +
> +DECLARE_PCI_LD(ub, 8)
> +DECLARE_PCI_LD(uw, 16)
> +DECLARE_PCI_LD(l, 32)
> +DECLARE_PCI_LD(q, 64)
> +
> +DECLARE_PCI_ST(b, 8)
> +DECLARE_PCI_ST(w, 16)
> +DECLARE_PCI_ST(l, 32)
> +DECLARE_PCI_ST(q, 64)
> +
> +static inline void pci_memory_read(PCIDevice *dev,
> +                                   pcibus_t addr,
> +                                   uint8_t *buf,
> +                                   pcibus_t len)
> +{
> +    pci_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void pci_memory_write(PCIDevice *dev,
> +                                    pcibus_t addr,
> +                                    const uint8_t *buf,
> +                                    pcibus_t len)
> +{
> +    pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> +}
> +
>  #endif
> diff --git a/hw/pci_internals.h b/hw/pci_internals.h
> index e3c93a3..fb134b9 100644
> --- a/hw/pci_internals.h
> +++ b/hw/pci_internals.h
> @@ -33,6 +33,9 @@ struct PCIBus {
>         Keep a count of the number of devices with raised IRQs.  */
>      int nirq;
>      int *irq_count;
> +
> +    PCIDevice                       *iommu;
> +    PCITranslateFunc                *translate;
>  };

Why is the translate pointer in the bus? Isn't translation the job of the iommu?

>  struct PCIBridge {
> @@ -44,4 +47,13 @@ struct PCIBridge {
>      const char *bus_name;
>  };
>  
> +struct PCIMemoryMap {
> +    pcibus_t                        addr;
> +    pcibus_t                        len;
> +    target_phys_addr_t              paddr;
> +    PCIInvalidateMapFunc            *invalidate;
> +    void                            *invalidate_opaque;

Can we have a structure that encapsulates the mapping
data instead of a void *?


> +    QLIST_ENTRY(PCIMemoryMap)       list;
> +};
> +
>  #endif /* QEMU_PCI_INTERNALS_H */
> diff --git a/qemu-common.h b/qemu-common.h
> index d735235..8b060e8 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -218,6 +218,7 @@ typedef struct SMBusDevice SMBusDevice;
>  typedef struct PCIHostState PCIHostState;
>  typedef struct PCIExpressHost PCIExpressHost;
>  typedef struct PCIBus PCIBus;
> +typedef struct PCIMemoryMap PCIMemoryMap;
>  typedef struct PCIDevice PCIDevice;
>  typedef struct PCIBridge PCIBridge;
>  typedef struct SerialState SerialState;
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [Qemu-devel] Re: [PATCH 2/7] pci: memory access API and IOMMU support
@ 2010-09-02  5:28     ` Michael S. Tsirkin
  0 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-02  5:28 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

On Sat, Aug 28, 2010 at 05:54:53PM +0300, Eduard - Gabriel Munteanu wrote:
> PCI devices should access memory through pci_memory_*() instead of
> cpu_physical_memory_*(). This also provides support for translation and
> access checking in case an IOMMU is emulated.
> 
> Memory maps are treated as remote IOTLBs (that is, translation caches
> belonging to the IOMMU-aware device itself). Clients (devices) must
> provide callbacks for map invalidation in case these maps are
> persistent beyond the current I/O context, e.g. AIO DMA transfers.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>


I am concerned about adding more pointer chasing on the data path.
Could we have
1. an iommu pointer in a device, inherited by secondary buses
   when they are created and by devices from buses when they are attached.
2. the translation pointer in the iommu instead of the bus
3. pci_memory_XX functions inline, with a fast path for the non-iommu case:

	if (__builtin_expect(!dev->iommu, 1)) {
		cpu_physical_memory_rw(addr, buf, len, is_write);
		return;
	}
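
A minimal sketch of the fast path from point 3, under stated assumptions: ToyDevice, ToyIOMMU, toy_ram and every toy_* function are hypothetical stand-ins rather than QEMU types or APIs, and the "translation" is just a fixed offset. __builtin_expect is the GCC/Clang builtin. The only point illustrated is that the inline wrapper costs one well-predicted branch when no IOMMU is attached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t toy_addr_t;             /* hypothetical generic address type */

static uint8_t toy_ram[4096];            /* toy guest physical memory */

typedef struct ToyIOMMU {
    toy_addr_t offset;                   /* toy translation: add a fixed offset */
} ToyIOMMU;

typedef struct ToyDevice {
    ToyIOMMU *iommu;                     /* NULL when the device is not behind an IOMMU */
} ToyDevice;

/* Stand-in for cpu_physical_memory_rw(): copy to or from toy RAM. */
static void toy_physical_memory_rw(toy_addr_t paddr, uint8_t *buf,
                                   size_t len, int is_write)
{
    assert(paddr <= sizeof(toy_ram) && len <= sizeof(toy_ram) - paddr);
    if (is_write) {
        memcpy(&toy_ram[paddr], buf, len);
    } else {
        memcpy(buf, &toy_ram[paddr], len);
    }
}

/* Slow path, kept out of line: translate first, then access memory. */
static void toy_memory_rw_translated(ToyDevice *dev, toy_addr_t addr,
                                     uint8_t *buf, size_t len, int is_write)
{
    toy_physical_memory_rw(addr + dev->iommu->offset, buf, len, is_write);
}

/* Inline wrapper: a single branch, hinted as likely, when there is no IOMMU. */
static inline void toy_memory_rw(ToyDevice *dev, toy_addr_t addr,
                                 uint8_t *buf, size_t len, int is_write)
{
    if (__builtin_expect(!dev->iommu, 1)) {
        toy_physical_memory_rw(addr, buf, len, is_write);
        return;
    }
    toy_memory_rw_translated(dev, addr, buf, len, is_write);
}
```

The hint only affects code layout and static prediction; the translated path still works, it is simply placed off the straight-line path.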
	


> ---
>  hw/pci.c           |  185 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/pci.h           |   74 +++++++++++++++++++++
>  hw/pci_internals.h |   12 ++++
>  qemu-common.h      |    1 +
>  4 files changed, 271 insertions(+), 1 deletions(-)

Almost nothing here is PCI specific.
Can this code go into dma.c/dma.h?
We would have struct DMADevice, APIs like device_dma_write etc.
This would help us get rid of the void * stuff as well?
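
One way the bus-neutral layering could look, as a sketch only: the names DMADevice and device_dma_write come from the paragraph above, while the translate hook, device_dma_rw, device_dma_read, ToyPCIDevice, toy_readonly_translate and the toy RAM are assumptions made for illustration and are not an existing QEMU interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t toy_addr_t;          /* hypothetical bus-neutral address type */

static uint8_t toy_ram[256];          /* toy guest physical memory */

typedef struct DMADevice DMADevice;

/* Returns 0 on success and stores the translated address in *paddr. */
typedef int DMATranslateFunc(DMADevice *dev, toy_addr_t addr,
                             toy_addr_t *paddr, int is_write);

struct DMADevice {
    DMATranslateFunc *translate;      /* NULL means identity mapping */
};

static int device_dma_rw(DMADevice *dev, toy_addr_t addr,
                         void *buf, size_t len, int is_write)
{
    toy_addr_t paddr = addr;

    assert(dev != NULL);
    if (dev->translate && dev->translate(dev, addr, &paddr, is_write) != 0) {
        return -1;                    /* translation fault: access refused */
    }
    if (paddr > sizeof(toy_ram) || len > sizeof(toy_ram) - paddr) {
        return -1;                    /* outside toy RAM */
    }
    if (is_write) {
        memcpy(&toy_ram[paddr], buf, len);
    } else {
        memcpy(buf, &toy_ram[paddr], len);
    }
    return 0;
}

static int device_dma_write(DMADevice *dev, toy_addr_t addr,
                            const void *buf, size_t len)
{
    return device_dma_rw(dev, addr, (void *) buf, len, 1);
}

static int device_dma_read(DMADevice *dev, toy_addr_t addr,
                           void *buf, size_t len)
{
    return device_dma_rw(dev, addr, buf, len, 0);
}

/* A bus-specific device embeds the generic DMA part. */
typedef struct ToyPCIDevice {
    DMADevice dma;
    int devfn;
} ToyPCIDevice;

/* Example hook: identity mapping that refuses all writes. */
static int toy_readonly_translate(DMADevice *dev, toy_addr_t addr,
                                  toy_addr_t *paddr, int is_write)
{
    (void) dev;
    if (is_write) {
        return -1;
    }
    *paddr = addr;
    return 0;
}
```

Device models would then call device_dma_read(&pci_dev->dma, ...) and never see a PCI-specific address type.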

> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 2dc1577..b460905 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -158,6 +158,19 @@ static void pci_device_reset(PCIDevice *dev)
>      pci_update_mappings(dev);
>  }
>  
> +static int pci_no_translate(PCIDevice *iommu,
> +                            PCIDevice *dev,
> +                            pcibus_t addr,
> +                            target_phys_addr_t *paddr,
> +                            target_phys_addr_t *len,
> +                            unsigned perms)
> +{
> +    *paddr = addr;
> +    *len = -1;
> +
> +    return 0;
> +}
> +
>  static void pci_bus_reset(void *opaque)
>  {
>      PCIBus *bus = opaque;
> @@ -220,7 +233,10 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
>  {
>      qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
>      assert(PCI_FUNC(devfn_min) == 0);
> -    bus->devfn_min = devfn_min;
> +
> +    bus->devfn_min  = devfn_min;
> +    bus->iommu      = NULL;
> +    bus->translate  = pci_no_translate;
>  
>      /* host bridge */
>      QLIST_INIT(&bus->child);
> @@ -1789,3 +1805,170 @@ static char *pcibus_get_dev_path(DeviceState *dev)
>      return strdup(path);
>  }
>  
> +void pci_register_iommu(PCIDevice *iommu,
> +                        PCITranslateFunc *translate)
> +{
> +    iommu->bus->iommu = iommu;
> +    iommu->bus->translate = translate;
> +}
> +

The above seems broken for secondary buses, right?  Also, can we use
qdev for initialization in some way, instead of adding more APIs?  E.g.
I think it would be nice if we could just use qdev command line flags to
control which bus is behind iommu and which isn't.


> +void pci_memory_rw(PCIDevice *dev,
> +                   pcibus_t addr,
> +                   uint8_t *buf,
> +                   pcibus_t len,
> +                   int is_write)
> +{
> +    int err;
> +    unsigned perms;
> +    PCIDevice *iommu = dev->bus->iommu;
> +    target_phys_addr_t paddr, plen;
> +
> +    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> +
> +    while (len) {
> +        err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
> +        if (err)
> +            return;
> +
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len)
> +            plen = len;
> +
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +static void pci_memory_register_map(PCIDevice *dev,
> +                                    pcibus_t addr,
> +                                    pcibus_t len,
> +                                    target_phys_addr_t paddr,
> +                                    PCIInvalidateMapFunc *invalidate,
> +                                    void *invalidate_opaque)
> +{
> +    PCIMemoryMap *map;
> +
> +    map = qemu_malloc(sizeof(PCIMemoryMap));
> +    map->addr               = addr;
> +    map->len                = len;
> +    map->paddr              = paddr;
> +    map->invalidate         = invalidate;
> +    map->invalidate_opaque  = invalidate_opaque;
> +
> +    QLIST_INSERT_HEAD(&dev->memory_maps, map, list);
> +}
> +
> +static void pci_memory_unregister_map(PCIDevice *dev,
> +                                      target_phys_addr_t paddr,
> +                                      target_phys_addr_t len)
> +{
> +    PCIMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->memory_maps, list) {
> +        if (map->paddr == paddr && map->len == len) {
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void pci_memory_invalidate_range(PCIDevice *dev,
> +                                 pcibus_t addr,
> +                                 pcibus_t len)
> +{
> +    PCIMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->memory_maps, list) {
> +        if (ranges_overlap(addr, len, map->addr, map->len)) {
> +            map->invalidate(map->invalidate_opaque);
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void *pci_memory_map(PCIDevice *dev,
> +                     PCIInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     pcibus_t addr,
> +                     target_phys_addr_t *len,
> +                     int is_write)
> +{
> +    int err;
> +    unsigned perms;
> +    PCIDevice *iommu = dev->bus->iommu;
> +    target_phys_addr_t paddr, plen;
> +
> +    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> +
> +    plen = *len;
> +    err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
> +    if (err)
> +        return NULL;
> +
> +    /*
> +     * If this is true, the virtual region is contiguous,
> +     * but the translated physical region isn't. We just
> +     * clamp *len, much like cpu_physical_memory_map() does.
> +     */
> +    if (plen < *len)
> +        *len = plen;
> +
> +    /* We treat maps as remote TLBs to cope with stuff like AIO. */
> +    if (cb)
> +        pci_memory_register_map(dev, addr, *len, paddr, cb, opaque);
> +
> +    return cpu_physical_memory_map(paddr, len, is_write);
> +}
> +

All the above is really only useful for when there is an iommu,
right? So maybe we should shortcut all this if there's no iommu?

> +void pci_memory_unmap(PCIDevice *dev,
> +                      void *buffer,
> +                      target_phys_addr_t len,
> +                      int is_write,
> +                      target_phys_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +    pci_memory_unregister_map(dev, (target_phys_addr_t) buffer, len);
> +}
> +
> +#define DEFINE_PCI_LD(suffix, size)                                       \
> +uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr)              \
> +{                                                                         \
> +    int err;                                                              \
> +    target_phys_addr_t paddr, plen;                                       \
> +                                                                          \
> +    err = dev->bus->translate(dev->bus->iommu, dev,                       \
> +                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
> +    if (err || (plen < size / 8))                                         \
> +        return 0;                                                         \
> +                                                                          \
> +    return ld##suffix##_phys(paddr);                                      \
> +}
> +
> +#define DEFINE_PCI_ST(suffix, size)                                       \
> +void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val)    \
> +{                                                                         \
> +    int err;                                                              \
> +    target_phys_addr_t paddr, plen;                                       \
> +                                                                          \
> +    err = dev->bus->translate(dev->bus->iommu, dev,                       \
> +                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
> +    if (err || (plen < size / 8))                                         \
> +        return;                                                           \
> +                                                                          \
> +    st##suffix##_phys(paddr, val);                                        \
> +}
> +
> +DEFINE_PCI_LD(ub, 8)
> +DEFINE_PCI_LD(uw, 16)
> +DEFINE_PCI_LD(l, 32)
> +DEFINE_PCI_LD(q, 64)
> +
> +DEFINE_PCI_ST(b, 8)
> +DEFINE_PCI_ST(w, 16)
> +DEFINE_PCI_ST(l, 32)
> +DEFINE_PCI_ST(q, 64)
> +
> diff --git a/hw/pci.h b/hw/pci.h
> index c551f96..3131016 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -172,6 +172,8 @@ struct PCIDevice {
>      char *romfile;
>      ram_addr_t rom_offset;
>      uint32_t rom_bar;
> +
> +    QLIST_HEAD(, PCIMemoryMap) memory_maps;
>  };
>  
>  PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> @@ -391,4 +393,76 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
>      return !(last2 < first1 || last1 < first2);
>  }
>  
> +/*
> + * Memory I/O and PCI IOMMU definitions.
> + */
> +
> +#define IOMMU_PERM_READ     (1 << 0)
> +#define IOMMU_PERM_WRITE    (1 << 1)
> +#define IOMMU_PERM_RW       (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
> +
> +typedef int PCIInvalidateMapFunc(void *opaque);
> +typedef int PCITranslateFunc(PCIDevice *iommu,
> +                             PCIDevice *dev,
> +                             pcibus_t addr,
> +                             target_phys_addr_t *paddr,
> +                             target_phys_addr_t *len,
> +                             unsigned perms);
> +
> +extern void pci_memory_rw(PCIDevice *dev,
> +                          pcibus_t addr,
> +                          uint8_t *buf,
> +                          pcibus_t len,
> +                          int is_write);
> +extern void *pci_memory_map(PCIDevice *dev,
> +                            PCIInvalidateMapFunc *cb,
> +                            void *opaque,
> +                            pcibus_t addr,
> +                            target_phys_addr_t *len,
> +                            int is_write);
> +extern void pci_memory_unmap(PCIDevice *dev,
> +                             void *buffer,
> +                             target_phys_addr_t len,
> +                             int is_write,
> +                             target_phys_addr_t access_len);
> +extern void pci_register_iommu(PCIDevice *dev,
> +                               PCITranslateFunc *translate);
> +extern void pci_memory_invalidate_range(PCIDevice *dev,
> +                                        pcibus_t addr,
> +                                        pcibus_t len);
> +
> +#define DECLARE_PCI_LD(suffix, size)                                    \
> +extern uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
> +
> +#define DECLARE_PCI_ST(suffix, size)                                    \
> +extern void pci_st##suffix(PCIDevice *dev,                              \
> +                           pcibus_t addr,                               \
> +                           uint##size##_t val);
> +
> +DECLARE_PCI_LD(ub, 8)
> +DECLARE_PCI_LD(uw, 16)
> +DECLARE_PCI_LD(l, 32)
> +DECLARE_PCI_LD(q, 64)
> +
> +DECLARE_PCI_ST(b, 8)
> +DECLARE_PCI_ST(w, 16)
> +DECLARE_PCI_ST(l, 32)
> +DECLARE_PCI_ST(q, 64)
> +
> +static inline void pci_memory_read(PCIDevice *dev,
> +                                   pcibus_t addr,
> +                                   uint8_t *buf,
> +                                   pcibus_t len)
> +{
> +    pci_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void pci_memory_write(PCIDevice *dev,
> +                                    pcibus_t addr,
> +                                    const uint8_t *buf,
> +                                    pcibus_t len)
> +{
> +    pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> +}
> +
>  #endif
> diff --git a/hw/pci_internals.h b/hw/pci_internals.h
> index e3c93a3..fb134b9 100644
> --- a/hw/pci_internals.h
> +++ b/hw/pci_internals.h
> @@ -33,6 +33,9 @@ struct PCIBus {
>         Keep a count of the number of devices with raised IRQs.  */
>      int nirq;
>      int *irq_count;
> +
> +    PCIDevice                       *iommu;
> +    PCITranslateFunc                *translate;
>  };

Why is the translate pointer in the bus? Isn't translation the job of the iommu?

>  struct PCIBridge {
> @@ -44,4 +47,13 @@ struct PCIBridge {
>      const char *bus_name;
>  };
>  
> +struct PCIMemoryMap {
> +    pcibus_t                        addr;
> +    pcibus_t                        len;
> +    target_phys_addr_t              paddr;
> +    PCIInvalidateMapFunc            *invalidate;
> +    void                            *invalidate_opaque;

Can we have a structure that encapsulates the mapping
data instead of a void *?


> +    QLIST_ENTRY(PCIMemoryMap)       list;
> +};
> +
>  #endif /* QEMU_PCI_INTERNALS_H */
> diff --git a/qemu-common.h b/qemu-common.h
> index d735235..8b060e8 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -218,6 +218,7 @@ typedef struct SMBusDevice SMBusDevice;
>  typedef struct PCIHostState PCIHostState;
>  typedef struct PCIExpressHost PCIExpressHost;
>  typedef struct PCIBus PCIBus;
> +typedef struct PCIMemoryMap PCIMemoryMap;
>  typedef struct PCIDevice PCIDevice;
>  typedef struct PCIBridge PCIBridge;
>  typedef struct SerialState SerialState;
> -- 
> 1.7.1
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-01 20:10           ` Stefan Weil
@ 2010-09-02  6:00             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-02  6:00 UTC (permalink / raw)
  To: Stefan Weil
  Cc: Eduard - Gabriel Munteanu, kvm, joro, qemu-devel, blauwirbel,
	yamahata, paul, avi

On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
> >+static inline void pci_memory_read(PCIDevice *dev,
> >+ pcibus_t addr,
> >+ uint8_t *buf,
> >+ pcibus_t len)
> >+{
> >+ pci_memory_rw(dev, addr, buf, len, 0);
> >+}
> >+
> >+static inline void pci_memory_write(PCIDevice *dev,
> >+ pcibus_t addr,
> >+ const uint8_t *buf,
> >+ pcibus_t len)
> >+{
> >+ pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> >+}
> >+
> >#endif
> 
> The functions pci_memory_read and pci_memory_write not only read
> or write byte data but many different data types which leads to
> a lot of type casts in your other patches.
> 
> I'd prefer "void *buf" and "const void *buf" in the argument lists.
> Then all those type casts could be removed.
> 
> Regards
> Stefan Weil

Further, I am not sure pcibus_t is a good type to use here.
This also forces use of pci specific types in e.g. ide, or resorting to
casts as this patch does. We probably should use a more generic type
for this.
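
To illustrate both points (void * buffers and a generic address type), here is a small sketch. Everything in it is hypothetical: toy_ram, the toy_* helpers and ToyDescriptor are not taken from the patch, and uint64_t merely stands in for whatever generic address type gets chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t toy_ram[128];          /* toy guest physical memory */

/* Byte-pointer signature, as in the patch: typed callers must cast. */
static void toy_read_u8(uint64_t addr, uint8_t *buf, size_t len)
{
    assert(addr <= sizeof(toy_ram) && len <= sizeof(toy_ram) - addr);
    memcpy(buf, &toy_ram[addr], len);
}

/* void-pointer signatures: any object pointer converts implicitly. */
static void toy_read(uint64_t addr, void *buf, size_t len)
{
    assert(addr <= sizeof(toy_ram) && len <= sizeof(toy_ram) - addr);
    memcpy(buf, &toy_ram[addr], len);
}

static void toy_write(uint64_t addr, const void *buf, size_t len)
{
    assert(addr <= sizeof(toy_ram) && len <= sizeof(toy_ram) - addr);
    memcpy(&toy_ram[addr], buf, len);
}

/* A DMA descriptor such as a NIC model might fetch from guest memory. */
typedef struct ToyDescriptor {
    uint32_t buf_addr;
    uint32_t buf_len;
} ToyDescriptor;

static ToyDescriptor toy_fetch_descriptor(uint64_t addr)
{
    ToyDescriptor d;

    toy_read(addr, &d, sizeof(d));                 /* no cast needed */
    return d;
}

static ToyDescriptor toy_fetch_descriptor_cast(uint64_t addr)
{
    ToyDescriptor d;

    toy_read_u8(addr, (uint8_t *) &d, sizeof(d));  /* cast forced by uint8_t * */
    return d;
}
```

Both fetch functions behave identically; only the second one needs the cast at every call site.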

-- 
MST

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-02  5:28     ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-09-02  8:40       ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-02  8:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Thu, Sep 02, 2010 at 08:28:26AM +0300, Michael S. Tsirkin wrote:
> On Sat, Aug 28, 2010 at 05:54:53PM +0300, Eduard - Gabriel Munteanu wrote:
> > PCI devices should access memory through pci_memory_*() instead of
> > cpu_physical_memory_*(). This also provides support for translation and
> > access checking in case an IOMMU is emulated.
> > 
> > Memory maps are treated as remote IOTLBs (that is, translation caches
> > belonging to the IOMMU-aware device itself). Clients (devices) must
> > provide callbacks for map invalidation in case these maps are
> > persistent beyond the current I/O context, e.g. AIO DMA transfers.
> > 
> > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> 
> 
> I am concerned about adding more pointer chasing on the data path.
> Could we have
> 1. an iommu pointer in a device, inherited by secondary buses
>    when they are created and by devices from buses when they are attached.
> 2. translation pointer in the iommu instead of the bus

The first solution I proposed was based on qdev, that is, each
DeviceState had an 'iommu' field. Translation would be done by
recursively looking in the parent bus/devs for an IOMMU.
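
Roughly, that lookup could be sketched as follows. This is not the code from that earlier series: ToyNode, its fields and the toy_* functions are hypothetical, one node type stands in for both buses and devices, and the walk is written as a loop rather than a recursion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyNode ToyNode;

/* Returns 0 on success and stores the translated address in *paddr. */
typedef int ToyTranslate(ToyNode *iommu, uint64_t addr, uint64_t *paddr);

struct ToyNode {
    ToyNode *parent;          /* parent bus or device, NULL at the root */
    ToyTranslate *translate;  /* non-NULL only on a node that is an IOMMU */
    uint64_t base;            /* used by the toy translation below */
};

/* Toy translation: add the IOMMU's base to the device address. */
static int toy_offset_translate(ToyNode *iommu, uint64_t addr, uint64_t *paddr)
{
    *paddr = addr + iommu->base;
    return 0;
}

/* Walk up the hierarchy until a node that provides translation is found. */
static ToyNode *toy_find_iommu(ToyNode *node)
{
    for (; node != NULL; node = node->parent) {
        if (node->translate) {
            return node;
        }
    }
    return NULL;
}

/* Translate on behalf of dev; identity mapping if no ancestor is an IOMMU. */
static int toy_translate(ToyNode *dev, uint64_t addr, uint64_t *paddr)
{
    ToyNode *iommu;

    assert(dev != NULL && paddr != NULL);
    iommu = toy_find_iommu(dev->parent);
    if (!iommu) {
        *paddr = addr;
        return 0;
    }
    return iommu->translate(iommu, addr, paddr);
}
```

The walk on every access is exactly the kind of pointer chasing raised above, which is why caching the result in the device was suggested.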

But Anthony said we're better off with bus-specific APIs, mostly because
(IIRC) there may be different types of addresses and it might be
difficult to abstract those properly.

I suppose I could revisit the idea by integrating the IOMMU in a
PCIDevice as opposed to a DeviceState.

Anthony, Paul, any thoughts on this?

> 3. pci_memory_XX functions inline, with a fast path for the non-iommu case:
>    
> 	if (__builtin_expect(!dev->iommu, 1)) {
> 		cpu_physical_memory_rw(addr, buf, len, is_write);
> 		return;
> 	}

But isn't this the same as 'if (likely(!dev->iommu))' from the Linux
kernel? If so, it puts the IOMMU-enabled case at a disadvantage.

I suppose most emulated systems would have at least some theoretical
reasons to enable the IOMMU, e.g. as a GART replacement (say for 32-bit
PCI devices) or for userspace drivers. So there are reasons to enable
the IOMMU even when you don't have a real host IOMMU and you're not
using nested guests.

> > ---
> >  hw/pci.c           |  185 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  hw/pci.h           |   74 +++++++++++++++++++++
> >  hw/pci_internals.h |   12 ++++
> >  qemu-common.h      |    1 +
> >  4 files changed, 271 insertions(+), 1 deletions(-)
> 
> Almost nothing here is PCI specific.
> Can this code go into dma.c/dma.h?
> We would have struct DMADevice, APIs like device_dma_write etc.
> This would help us get rid of the void * stuff as well?
> 

Yeah, I know, that's similar to what I intended to do at first. Though
I'm not sure that rids us of the 'void *' stuff; quite the contrary, from
what I've seen.

Some stuff still needs to stay 'void *' (or an equivalent typedef, but
still opaque) simply because of the level of abstraction that's
required.

[snip]

> > +void pci_register_iommu(PCIDevice *iommu,
> > +                        PCITranslateFunc *translate)
> > +{
> > +    iommu->bus->iommu = iommu;
> > +    iommu->bus->translate = translate;
> > +}
> > +
> 
> The above seems broken for secondary buses, right?  Also, can we use
> qdev for initialization in some way, instead of adding more APIs?  E.g.
> I think it would be nice if we could just use qdev command line flags to
> control which bus is behind iommu and which isn't.
> 
> 

Each bus must have its own IOMMU. The secondary bus should ask the
primary bus instead of going through cpu_physical_memory_*(). If that
isn't the case, it's broken and the secondary bus must be converted to
the new API just like regular devices. I'll have a look at that.
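
As a sketch of what "ask the primary bus" could mean, assuming a per-bus translate hook like the one in this patch: a secondary bus installs a hook that forwards upstream, so whatever the root bus does (identity or IOMMU translation) applies to devices behind bridges too. ToyBus, the TOY_PERM_* flags and all toy_* functions are hypothetical; the toy IOMMU simply offsets reads by 0x100 and refuses writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PERM_READ   (1u << 0)
#define TOY_PERM_WRITE  (1u << 1)

typedef struct ToyBus ToyBus;

/* Returns 0 on success and stores the translated address in *paddr. */
typedef int ToyBusTranslate(ToyBus *bus, uint64_t addr,
                            uint64_t *paddr, unsigned perms);

struct ToyBus {
    ToyBus *parent;               /* NULL for the root bus */
    ToyBusTranslate *translate;   /* never NULL */
};

/* Root bus without an IOMMU: identity mapping, everything allowed. */
static int toy_no_translate(ToyBus *bus, uint64_t addr,
                            uint64_t *paddr, unsigned perms)
{
    (void) bus;
    (void) perms;
    *paddr = addr;
    return 0;
}

/* Default for a secondary bus: hand the request to the parent bus. */
static int toy_forward_translate(ToyBus *bus, uint64_t addr,
                                 uint64_t *paddr, unsigned perms)
{
    assert(bus->parent != NULL);
    return bus->parent->translate(bus->parent, addr, paddr, perms);
}

/* Toy IOMMU on the root bus: read-only window shifted by 0x100. */
static int toy_iommu_translate(ToyBus *bus, uint64_t addr,
                               uint64_t *paddr, unsigned perms)
{
    (void) bus;
    if (perms & TOY_PERM_WRITE) {
        return -1;
    }
    *paddr = addr + 0x100;
    return 0;
}
```

With this arrangement, registering an IOMMU on the root bus is enough; the bridges need no per-IOMMU knowledge.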

> > +void pci_memory_rw(PCIDevice *dev,
> > +                   pcibus_t addr,
> > +                   uint8_t *buf,
> > +                   pcibus_t len,
> > +                   int is_write)
> > +{
> > +    int err;
> > +    unsigned perms;
> > +    PCIDevice *iommu = dev->bus->iommu;
> > +    target_phys_addr_t paddr, plen;
> > +
> > +    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> > +
> > +    while (len) {
> > +        err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
> > +        if (err)
> > +            return;
> > +
> > +        /* The translation might be valid for larger regions. */
> > +        if (plen > len)
> > +            plen = len;
> > +
> > +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> > +
> > +        len -= plen;
> > +        addr += plen;
> > +        buf += plen;
> > +    }
> > +}
> > +
> > +static void pci_memory_register_map(PCIDevice *dev,
> > +                                    pcibus_t addr,
> > +                                    pcibus_t len,
> > +                                    target_phys_addr_t paddr,
> > +                                    PCIInvalidateMapFunc *invalidate,
> > +                                    void *invalidate_opaque)
> > +{
> > +    PCIMemoryMap *map;
> > +
> > +    map = qemu_malloc(sizeof(PCIMemoryMap));
> > +    map->addr               = addr;
> > +    map->len                = len;
> > +    map->paddr              = paddr;
> > +    map->invalidate         = invalidate;
> > +    map->invalidate_opaque  = invalidate_opaque;
> > +
> > +    QLIST_INSERT_HEAD(&dev->memory_maps, map, list);
> > +}
> > +
> > +static void pci_memory_unregister_map(PCIDevice *dev,
> > +                                      target_phys_addr_t paddr,
> > +                                      target_phys_addr_t len)
> > +{
> > +    PCIMemoryMap *map;
> > +
> > +    QLIST_FOREACH(map, &dev->memory_maps, list) {
> > +        if (map->paddr == paddr && map->len == len) {
> > +            QLIST_REMOVE(map, list);
> > +            free(map);
> > +        }
> > +    }
> > +}
> > +
> > +void pci_memory_invalidate_range(PCIDevice *dev,
> > +                                 pcibus_t addr,
> > +                                 pcibus_t len)
> > +{
> > +    PCIMemoryMap *map;
> > +
> > +    QLIST_FOREACH(map, &dev->memory_maps, list) {
> > +        if (ranges_overlap(addr, len, map->addr, map->len)) {
> > +            map->invalidate(map->invalidate_opaque);
> > +            QLIST_REMOVE(map, list);
> > +            free(map);
> > +        }
> > +    }
> > +}
> > +
> > +void *pci_memory_map(PCIDevice *dev,
> > +                     PCIInvalidateMapFunc *cb,
> > +                     void *opaque,
> > +                     pcibus_t addr,
> > +                     target_phys_addr_t *len,
> > +                     int is_write)
> > +{
> > +    int err;
> > +    unsigned perms;
> > +    PCIDevice *iommu = dev->bus->iommu;
> > +    target_phys_addr_t paddr, plen;
> > +
> > +    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> > +
> > +    plen = *len;
> > +    err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
> > +    if (err)
> > +        return NULL;
> > +
> > +    /*
> > +     * If this is true, the virtual region is contiguous,
> > +     * but the translated physical region isn't. We just
> > +     * clamp *len, much like cpu_physical_memory_map() does.
> > +     */
> > +    if (plen < *len)
> > +        *len = plen;
> > +
> > +    /* We treat maps as remote TLBs to cope with stuff like AIO. */
> > +    if (cb)
> > +        pci_memory_register_map(dev, addr, *len, paddr, cb, opaque);
> > +
> > +    return cpu_physical_memory_map(paddr, len, is_write);
> > +}
> > +
> 
> All the above is really only useful for when there is an iommu,
> right? So maybe we should shortcut all this if there's no iommu?
> 

Some people (e.g. Blue) suggested I shouldn't make the IOMMU emulation a
compile-time option, like I originally did. And I'm not sure any runtime
"optimization" (as in likely()/unlikely()) is justified.

[snip]

> > diff --git a/hw/pci_internals.h b/hw/pci_internals.h
> > index e3c93a3..fb134b9 100644
> > --- a/hw/pci_internals.h
> > +++ b/hw/pci_internals.h
> > @@ -33,6 +33,9 @@ struct PCIBus {
> >         Keep a count of the number of devices with raised IRQs.  */
> >      int nirq;
> >      int *irq_count;
> > +
> > +    PCIDevice                       *iommu;
> > +    PCITranslateFunc                *translate;
> >  };
> 
> Why is translate pointer in a bus? I think it's a work of an iommu?
> 

Anthony and Paul thought it's best to simply ask the parent bus for
translation. I somewhat agree with that: devices that aren't IOMMU-aware
simply attempt to do PCI requests to memory and the IOMMU translates
and checks them transparently.
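
To illustrate the arrangement, here is a minimal standalone sketch. It is
not the patch's code: Bus, Device, dev_translate() and offset_translate()
are hypothetical stand-ins for PCIBus, PCIDevice and a PCITranslateFunc,
and the identity fallback when no IOMMU is registered is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Device Device;

/* Hypothetical stand-in for PCITranslateFunc: returns 0 on success. */
typedef int TranslateFunc(Device *iommu, Device *dev,
                          uint64_t addr, uint64_t *paddr);

typedef struct Bus {
    Device *iommu;            /* NULL when no IOMMU sits on this bus */
    TranslateFunc *translate; /* registered together with the IOMMU */
} Bus;

struct Device {
    Bus *bus;
};

/* The device only knows its bus; the bus decides how to translate. */
static int dev_translate(Device *dev, uint64_t addr, uint64_t *paddr)
{
    Bus *bus = dev->bus;

    if (!bus->translate) {
        *paddr = addr;        /* assumption: identity map without an IOMMU */
        return 0;
    }
    return bus->translate(bus->iommu, dev, addr, paddr);
}

/* Toy IOMMU: relocates every request by a fixed offset. */
static int offset_translate(Device *iommu, Device *dev,
                            uint64_t addr, uint64_t *paddr)
{
    (void) iommu;
    (void) dev;
    *paddr = addr + 0x1000;
    return 0;
}
```

The device code never changes when an IOMMU appears; only the bus fields
do, which is the "transparent" part.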

> >  struct PCIBridge {
> > @@ -44,4 +47,13 @@ struct PCIBridge {
> >      const char *bus_name;
> >  };
> >  
> > +struct PCIMemoryMap {
> > +    pcibus_t                        addr;
> > +    pcibus_t                        len;
> > +    target_phys_addr_t              paddr;
> > +    PCIInvalidateMapFunc            *invalidate;
> > +    void                            *invalidate_opaque;
> 
> Can we have a structure that encapsulates the mapping
> data instead of a void *?
> 
> 

Not really. 'invalidate_opaque' belongs to device code. It's meant to be
a handle to easily identify the mapping. For example, DMA code wants to
cancel AIO transfers when the bus requests the map to be invalidated.
It's difficult to look that AIO transfer up using non-opaque data.
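
A rough standalone sketch of that idea, under stated assumptions: Map,
Transfer, map_register() and map_invalidate_range() are hypothetical names
for illustration, and a plain singly linked list stands in for QLIST.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void InvalidateFunc(void *opaque);

typedef struct Map {
    uint64_t addr, len;
    InvalidateFunc *invalidate;
    void *opaque;             /* owned by device code, never inspected here */
    struct Map *next;
} Map;

/* Hypothetical device-side state: one in-flight transfer. */
typedef struct Transfer {
    int cancelled;
} Transfer;

static void transfer_cancel(void *opaque)
{
    Transfer *t = opaque;     /* the handle leads straight to the transfer */
    t->cancelled = 1;
}

static int map_register(Map **head, uint64_t addr, uint64_t len,
                        InvalidateFunc *cb, void *opaque)
{
    Map *m = malloc(sizeof(*m));

    if (!m) {
        return -1;
    }
    m->addr = addr;
    m->len = len;
    m->invalidate = cb;
    m->opaque = opaque;
    m->next = *head;
    *head = m;
    return 0;
}

/* Invalidate every map overlapping [addr, addr + len); returns the count. */
static int map_invalidate_range(Map **head, uint64_t addr, uint64_t len)
{
    int n = 0;
    Map **link = head;

    while (*link) {
        Map *m = *link;

        if (addr < m->addr + m->len && m->addr < addr + len) {
            m->invalidate(m->opaque);
            *link = m->next;  /* unlink before freeing, then keep walking */
            free(m);
            n++;
        } else {
            link = &m->next;
        }
    }
    return n;
}
```

The bus side only stores and hands back the pointer; the walk unlinks each
entry before freeing it, so the iteration stays valid while entries go away.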

> > +    QLIST_ENTRY(PCIMemoryMap)       list;
> > +};
> > +
> >  #endif /* QEMU_PCI_INTERNALS_H */
> > diff --git a/qemu-common.h b/qemu-common.h
> > index d735235..8b060e8 100644
> > --- a/qemu-common.h
> > +++ b/qemu-common.h
> > @@ -218,6 +218,7 @@ typedef struct SMBusDevice SMBusDevice;
> >  typedef struct PCIHostState PCIHostState;
> >  typedef struct PCIExpressHost PCIExpressHost;
> >  typedef struct PCIBus PCIBus;
> > +typedef struct PCIMemoryMap PCIMemoryMap;
> >  typedef struct PCIDevice PCIDevice;
> >  typedef struct PCIBridge PCIBridge;
> >  typedef struct SerialState SerialState;
> > -- 
> > 1.7.1
> > 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-01 20:10           ` Stefan Weil
@ 2010-09-02  8:51             ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-02  8:51 UTC (permalink / raw)
  To: Stefan Weil; +Cc: mst, kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
> Please see my comments at the end of this mail.
> 
> 
> Am 30.08.2010 00:08, schrieb Eduard - Gabriel Munteanu:
> > PCI devices should access memory through pci_memory_*() instead of
> > cpu_physical_memory_*(). This also provides support for translation and
> > access checking in case an IOMMU is emulated.
> >
> > Memory maps are treated as remote IOTLBs (that is, translation caches
> > belonging to the IOMMU-aware device itself). Clients (devices) must
> > provide callbacks for map invalidation in case these maps are
> > persistent beyond the current I/O context, e.g. AIO DMA transfers.
> >
> > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > ---

[snip]

> > +static inline void pci_memory_read(PCIDevice *dev,
> > + pcibus_t addr,
> > + uint8_t *buf,
> > + pcibus_t len)
> > +{
> > + pci_memory_rw(dev, addr, buf, len, 0);
> > +}
> > +
> > +static inline void pci_memory_write(PCIDevice *dev,
> > + pcibus_t addr,
> > + const uint8_t *buf,
> > + pcibus_t len)
> > +{
> > + pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> > +}
> > +
> > #endif
> 
> The functions pci_memory_read and pci_memory_write not only read
> or write byte data but many different data types which leads to
> a lot of type casts in your other patches.
> 
> I'd prefer "void *buf" and "const void *buf" in the argument lists.
> Then all those type casts could be removed.
> 
> Regards
> Stefan Weil

I only followed an approach similar to how cpu_physical_memory_{read,write}()
are defined. I think I should change both the cpu_physical_memory_* stuff and
the pci_memory_* stuff, not only the latter, if I decide to go with that
approach.
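
For reference, a small standalone sketch of the difference. The names
guest_ram, mem_write_u8(), mem_write() and mem_read() are hypothetical and
only model the two signature styles, not the real QEMU helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy guest memory standing in for the real physical address space. */
static uint8_t guest_ram[64];

/* Byte-typed variant: callers holding non-byte data need a cast. */
static void mem_write_u8(size_t addr, const uint8_t *buf, size_t len)
{
    assert(addr <= sizeof(guest_ram) && len <= sizeof(guest_ram) - addr);
    memcpy(guest_ram + addr, buf, len);
}

/* void-typed variants: any object pointer converts implicitly. */
static void mem_write(size_t addr, const void *buf, size_t len)
{
    mem_write_u8(addr, buf, len);
}

static void mem_read(size_t addr, void *buf, size_t len)
{
    assert(addr <= sizeof(guest_ram) && len <= sizeof(guest_ram) - addr);
    memcpy(buf, guest_ram + addr, len);
}
```

With a uint32_t descriptor 'desc', the first style reads
mem_write_u8(0, (const uint8_t *) &desc, sizeof(desc)) while the second is
just mem_write(0, &desc, sizeof(desc)).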


	Eduard


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-02  6:00             ` Michael S. Tsirkin
@ 2010-09-02  9:08               ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-02  9:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefan Weil, kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

On Thu, Sep 02, 2010 at 09:00:46AM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
> > >+static inline void pci_memory_read(PCIDevice *dev,
> > >+ pcibus_t addr,
> > >+ uint8_t *buf,
> > >+ pcibus_t len)
> > >+{
> > >+ pci_memory_rw(dev, addr, buf, len, 0);
> > >+}
> > >+
> > >+static inline void pci_memory_write(PCIDevice *dev,
> > >+ pcibus_t addr,
> > >+ const uint8_t *buf,
> > >+ pcibus_t len)
> > >+{
> > >+ pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> > >+}
> > >+
> > >#endif
> > 
> > The functions pci_memory_read and pci_memory_write not only read
> > or write byte data but many different data types which leads to
> > a lot of type casts in your other patches.
> > 
> > I'd prefer "void *buf" and "const void *buf" in the argument lists.
> > Then all those type casts could be removed.
> > 
> > Regards
> > Stefan Weil
> 
> Further, I am not sure pcibus_t is a good type to use here.
> This also forces use of pci specific types in e.g. ide, or resorting to
> casts as this patch does. We probably should use a more generic type
> for this.

It only forces use of PCI-specific types in the IDE controller, which is
already a PCI device.


	Eduard

> -- 
> MST

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 4/7] ide: use the PCI memory access interface
  2010-09-02  5:19     ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-09-02  9:12       ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-02  9:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Thu, Sep 02, 2010 at 08:19:11AM +0300, Michael S. Tsirkin wrote:
> On Sat, Aug 28, 2010 at 05:54:55PM +0300, Eduard - Gabriel Munteanu wrote:
> > Emulated PCI IDE controllers now use the memory access interface. This
> > also allows an emulated IOMMU to translate and check accesses.
> > 
> > Map invalidation results in cancelling DMA transfers. Since the guest OS
> > can't properly recover the DMA results in case the mapping is changed,
> > this is a fairly good approximation.
> > 
> > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > ---

[snip]

> > +static inline void bmdma_memory_read(BMDMAState *bm,
> > +                                     target_phys_addr_t addr,
> > +                                     uint8_t *buf,
> > +                                     target_phys_addr_t len)
> > +{
> > +    bm->rw(bm->opaque, addr, buf, len, 0);
> > +}
> > +
> > +static inline void bmdma_memory_write(BMDMAState *bm,
> > +                                      target_phys_addr_t addr,
> > +                                      uint8_t *buf,
> > +                                      target_phys_addr_t len)
> > +{
> > +    bm->rw(bm->opaque, addr, buf, len, 1);
> > +}
> > +
> 
> Here again, I am concerned about indirection and pointer chasing on the data path.
> Can we have an iommu pointer in the device, and do a fast path in case
> there is no iommu?
> 

See my other reply.

> >  static inline IDEState *idebus_active_if(IDEBus *bus)
> >  {
> >      return bus->ifs + bus->unit;
> > diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> > index bd1c73e..962ae13 100644
> > --- a/hw/ide/macio.c
> > +++ b/hw/ide/macio.c
> > @@ -79,7 +79,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
> >  
> >      s->io_buffer_size = io->len;
> >  
> > -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> > +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL, NULL, NULL);
> >      qemu_sglist_add(&s->sg, io->addr, io->len);
> >      io->addr += io->len;
> >      io->len = 0;
> > @@ -141,7 +141,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
> >      s->io_buffer_index = 0;
> >      s->io_buffer_size = io->len;
> >  
> > -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> > +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL, NULL, NULL);
> >      qemu_sglist_add(&s->sg, io->addr, io->len);
> >      io->addr += io->len;
> >      io->len = 0;
> > diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> > index 4d95cc5..5879044 100644
> > --- a/hw/ide/pci.c
> > +++ b/hw/ide/pci.c
> > @@ -183,4 +183,11 @@ void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
> >              continue;
> >          ide_create_drive(d->bus+bus[i], unit[i], hd_table[i]);
> >      }
> > +
> > +    for (i = 0; i < 2; i++) {
> > +        d->bmdma[i].rw = (void *) pci_memory_rw;
> > +        d->bmdma[i].map = (void *) pci_memory_map;
> > +        d->bmdma[i].unmap = (void *) pci_memory_unmap;
> > +        d->bmdma[i].opaque = dev;
> > +    }
> >  }
> 
> These casts show something is wrong with the API, IMO.
> 

Hm, here's an oversight on my part: I think I should provide explicit
bmdma hooks, since pcibus_t is a uint64_t and target_phys_addr_t is a
uint{32,64}_t depending on the guest machine, so it might be buggy on
32-bit wrt calling conventions. But that introduces yet another
non-inlined function call :-(. That would drop the (void *) cast,
though.
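
Roughly what I have in mind, as a standalone sketch with hypothetical
names: narrow_addr_t models a 32-bit target_phys_addr_t, wide_addr_t models
pcibus_t, and the toy_* functions stand in for the real accessors. Calling
a function through a pointer of an incompatible type is undefined behaviour
in C, so the hook matches the pointer type exactly and widens the arguments
itself.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t narrow_addr_t;   /* stand-in for a 32-bit target_phys_addr_t */
typedef uint64_t wide_addr_t;     /* stand-in for pcibus_t */

/* Hook type the DMA engine calls: narrow addresses, opaque handle. */
typedef void DMARWFunc(void *opaque, narrow_addr_t addr,
                       uint8_t *buf, narrow_addr_t len, int is_write);

typedef struct ToyPCIDevice {
    wide_addr_t last_addr;
    wide_addr_t last_len;
    int last_is_write;
} ToyPCIDevice;

/* Bus-side accessor with wide address types; it only records the request. */
static void toy_pci_memory_rw(ToyPCIDevice *dev, wide_addr_t addr,
                              uint8_t *buf, wide_addr_t len, int is_write)
{
    (void) buf;
    dev->last_addr = addr;
    dev->last_len = len;
    dev->last_is_write = is_write;
}

/* Explicit hook: has exactly the DMARWFunc signature, no cast needed. */
static void toy_bmdma_rw(void *opaque, narrow_addr_t addr,
                         uint8_t *buf, narrow_addr_t len, int is_write)
{
    toy_pci_memory_rw(opaque, addr, buf, len, is_write);
}
```

Assigning the hook is then a plain 'DMARWFunc *rw = toy_bmdma_rw;', at the
cost of the extra call mentioned above.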


	Eduard


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-02  8:40       ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-09-02  9:49         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-02  9:49 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Thu, Sep 02, 2010 at 11:40:58AM +0300, Eduard - Gabriel Munteanu wrote:
> On Thu, Sep 02, 2010 at 08:28:26AM +0300, Michael S. Tsirkin wrote:
> > On Sat, Aug 28, 2010 at 05:54:53PM +0300, Eduard - Gabriel Munteanu wrote:
> > > PCI devices should access memory through pci_memory_*() instead of
> > > cpu_physical_memory_*(). This also provides support for translation and
> > > access checking in case an IOMMU is emulated.
> > > 
> > > Memory maps are treated as remote IOTLBs (that is, translation caches
> > > belonging to the IOMMU-aware device itself). Clients (devices) must
> > > provide callbacks for map invalidation in case these maps are
> > > persistent beyond the current I/O context, e.g. AIO DMA transfers.
> > > 
> > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > 
> > 
> > > I am concerned about adding more pointer chasing on the data path.
> > Could we have
> > 1. an iommu pointer in a device, inherited by secondary buses
> >    when they are created and by devices from buses when they are attached.
> > 2. translation pointer in the iommu instead of the bus
> 
> The first solution I proposed was based on qdev, that is, each
> DeviceState had an 'iommu' field. Translation would be done by
> recursively looking in the parent bus/devs for an IOMMU.
> 
> But Anthony said we're better off with bus-specific APIs, mostly because
> (IIRC) there may be different types of addresses and it might be
> difficult to abstract those properly.

Well, we ended up casting away types to make the PCI callbacks fit in the
IDE structure, and silently assuming that all addresses are in fact 64-bit.
So maybe it's hard to abstract addresses properly, but it appears we'll
have to, to avoid even worse problems.

> I suppose I could revisit the idea by integrating the IOMMU in a
> PCIDevice as opposed to a DeviceState.
> 
> Anthony, Paul, any thoughts on this?

Just to clarify, this is an optimization idea: instead of a bus walk on
each access, do the walk when the device is attached to the bus, and copy
the iommu from the root to the device itself.

This will also make it possible to create a DMADeviceState structure which
would have this iommu field, and we'd use this structure instead of the
void pointers all over.
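
A minimal standalone sketch of the attach-time walk, with assumptions
spelled out: the field layout of DMADeviceState, the Bus and IOMMU types,
and the helper names are all hypothetical, and the IOMMU is assumed to be
registered before devices are attached.

```c
#include <assert.h>
#include <stddef.h>

typedef struct IOMMU {
    int id;
} IOMMU;

typedef struct Bus {
    struct Bus *parent;   /* NULL for the root bus */
    IOMMU *iommu;         /* set on whichever bus hosts the IOMMU */
} Bus;

typedef struct DMADeviceState {
    Bus *bus;
    IOMMU *iommu;         /* cached once, at attach time */
} DMADeviceState;

/* Walk towards the root until a bus with an IOMMU is found. */
static IOMMU *bus_find_iommu(Bus *bus)
{
    for (; bus; bus = bus->parent) {
        if (bus->iommu) {
            return bus->iommu;
        }
    }
    return NULL;
}

/* The walk happens here, not on every memory access. */
static void dma_device_attach(DMADeviceState *dev, Bus *bus)
{
    dev->bus = bus;
    dev->iommu = bus_find_iommu(bus);
}
```

After attach, the data path only needs dev->iommu; if an IOMMU could be
registered later, the cached pointer would have to be refreshed.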


> > 3. pci_memory_XX functions inline, doing fast path for non-iommu case:
> >    
> > 	if (__builtin_expect(!dev->iommu, 1))
> > 		return cpu_memory_rw
> 
> But isn't this some sort of 'if (likely(!dev->iommu))' from the Linux
> kernel? If so, it puts the IOMMU-enabled case at disadvantage.

IOMMU has a ton of indirections anyway.

> I suppose most emulated systems would have at least some theoretical
> reasons to enable the IOMMU, e.g. as a GART replacement (say for 32-bit
> PCI devices) or for userspace drivers.
> So there are reasons to enable
> the IOMMU even when you don't have a real host IOMMU and you're not
> using nested guests.

The time when most people enable the iommu for all devices, in both real and
virtualized systems, appears distant; one reason is that it has a lot of
overhead. Let's start by not adding overhead for existing users, makes sense?
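
Something along these lines, as a standalone sketch rather than real QEMU
code: the toy_* names, guest_ram and the fixed-offset "translation" are
hypothetical, and __builtin_expect() is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t guest_ram[256];
static int slow_path_calls;

typedef struct ToyDevice {
    void *iommu;          /* NULL when the device is not behind an IOMMU */
} ToyDevice;

/* Stand-in for the direct physical memory accessor. */
static void toy_phys_rw(uint64_t addr, void *buf, size_t len, int is_write)
{
    assert(addr <= sizeof(guest_ram) && len <= sizeof(guest_ram) - addr);
    if (is_write) {
        memcpy(guest_ram + addr, buf, len);
    } else {
        memcpy(buf, guest_ram + addr, len);
    }
}

/* Slow path: a toy translation that relocates accesses by 0x80. */
static void toy_iommu_rw(ToyDevice *dev, uint64_t addr,
                         void *buf, size_t len, int is_write)
{
    (void) dev;
    slow_path_calls++;
    toy_phys_rw(addr + 0x80, buf, len, is_write);
}

/* Inline wrapper: the common no-IOMMU case skips translation entirely. */
static inline void toy_memory_rw(ToyDevice *dev, uint64_t addr,
                                 void *buf, size_t len, int is_write)
{
    if (__builtin_expect(!dev->iommu, 1)) {
        toy_phys_rw(addr, buf, len, is_write);
        return;
    }
    toy_iommu_rw(dev, addr, buf, len, is_write);
}
```

The hint only affects code layout and branch prediction; the IOMMU case
still works, it just takes the out-of-line call.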


> > > ---
> > >  hw/pci.c           |  185 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> > >  hw/pci.h           |   74 +++++++++++++++++++++
> > >  hw/pci_internals.h |   12 ++++
> > >  qemu-common.h      |    1 +
> > >  4 files changed, 271 insertions(+), 1 deletions(-)
> > 
> > Almost nothing here is PCI specific.
> > Can this code go into dma.c/dma.h?
> > We would have struct DMADevice, APIs like device_dma_write etc.
> > This would help us get rid of the void * stuff as well?
> > 
> 
> Yeah, I know, that's similar to what I intended to do at first. Though
> I'm not sure that rids us of 'void *' stuff, quite on the contrary from
> what I've seen.
> 
> Some stuff still needs to stay 'void *' (or an equivalent typedef, but
> still an opaque) simply because of the required level of abstraction
> that's needed.
> 
> [snip]
> 
> > > +void pci_register_iommu(PCIDevice *iommu,
> > > +                        PCITranslateFunc *translate)
> > > +{
> > > +    iommu->bus->iommu = iommu;
> > > +    iommu->bus->translate = translate;
> > > +}
> > > +
> > 
> > The above seems broken for secondary buses, right?  Also, can we use
> > qdev for initialization in some way, instead of adding more APIs?  E.g.
> > I think it would be nice if we could just use qdev command line flags to
> > control which bus is behind iommu and which isn't.
> > 
> > 
> 
> Each bus must have its own IOMMU. The secondary bus should ask the
> primary bus instead of going through cpu_physical_memory_*(). If that
> isn't the case, it's broken and the secondary bus must be converted to
> the new API just like regular devices. I'll have a look at that.
> 
> > > +void pci_memory_rw(PCIDevice *dev,
> > > +                   pcibus_t addr,
> > > +                   uint8_t *buf,
> > > +                   pcibus_t len,
> > > +                   int is_write)
> > > +{
> > > +    int err;
> > > +    unsigned perms;
> > > +    PCIDevice *iommu = dev->bus->iommu;
> > > +    target_phys_addr_t paddr, plen;
> > > +
> > > +    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> > > +
> > > +    while (len) {
> > > +        err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
> > > +        if (err)
> > > +            return;
> > > +
> > > +        /* The translation might be valid for larger regions. */
> > > +        if (plen > len)
> > > +            plen = len;
> > > +
> > > +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> > > +
> > > +        len -= plen;
> > > +        addr += plen;
> > > +        buf += plen;
> > > +    }
> > > +}
> > > +
> > > +static void pci_memory_register_map(PCIDevice *dev,
> > > +                                    pcibus_t addr,
> > > +                                    pcibus_t len,
> > > +                                    target_phys_addr_t paddr,
> > > +                                    PCIInvalidateMapFunc *invalidate,
> > > +                                    void *invalidate_opaque)
> > > +{
> > > +    PCIMemoryMap *map;
> > > +
> > > +    map = qemu_malloc(sizeof(PCIMemoryMap));
> > > +    map->addr               = addr;
> > > +    map->len                = len;
> > > +    map->paddr              = paddr;
> > > +    map->invalidate         = invalidate;
> > > +    map->invalidate_opaque  = invalidate_opaque;
> > > +
> > > +    QLIST_INSERT_HEAD(&dev->memory_maps, map, list);
> > > +}
> > > +
> > > +static void pci_memory_unregister_map(PCIDevice *dev,
> > > +                                      target_phys_addr_t paddr,
> > > +                                      target_phys_addr_t len)
> > > +{
> > > +    PCIMemoryMap *map;
> > > +
> > > +    QLIST_FOREACH(map, &dev->memory_maps, list) {
> > > +        if (map->paddr == paddr && map->len == len) {
> > > +            QLIST_REMOVE(map, list);
> > > +            free(map);
> > > +        }
> > > +    }
> > > +}
> > > +
> > > +void pci_memory_invalidate_range(PCIDevice *dev,
> > > +                                 pcibus_t addr,
> > > +                                 pcibus_t len)
> > > +{
> > > +    PCIMemoryMap *map;
> > > +
> > > +    QLIST_FOREACH(map, &dev->memory_maps, list) {
> > > +        if (ranges_overlap(addr, len, map->addr, map->len)) {
> > > +            map->invalidate(map->invalidate_opaque);
> > > +            QLIST_REMOVE(map, list);
> > > +            free(map);
> > > +        }
> > > +    }
> > > +}
> > > +
> > > +void *pci_memory_map(PCIDevice *dev,
> > > +                     PCIInvalidateMapFunc *cb,
> > > +                     void *opaque,
> > > +                     pcibus_t addr,
> > > +                     target_phys_addr_t *len,
> > > +                     int is_write)
> > > +{
> > > +    int err;
> > > +    unsigned perms;
> > > +    PCIDevice *iommu = dev->bus->iommu;
> > > +    target_phys_addr_t paddr, plen;
> > > +
> > > +    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> > > +
> > > +    plen = *len;
> > > +    err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
> > > +    if (err)
> > > +        return NULL;
> > > +
> > > +    /*
> > > +     * If this is true, the virtual region is contiguous,
> > > +     * but the translated physical region isn't. We just
> > > +     * clamp *len, much like cpu_physical_memory_map() does.
> > > +     */
> > > +    if (plen < *len)
> > > +        *len = plen;
> > > +
> > > +    /* We treat maps as remote TLBs to cope with stuff like AIO. */
> > > +    if (cb)
> > > +        pci_memory_register_map(dev, addr, *len, paddr, cb, opaque);
> > > +
> > > +    return cpu_physical_memory_map(paddr, len, is_write);
> > > +}
> > > +
> > 
> > All the above is really only useful for when there is an iommu,
> > right? So maybe we should shortcut all this if there's no iommu?
> > 
> 
> Some people (e.g. Blue) suggested I shouldn't make the IOMMU emulation a
> compile-time option, like I originally did. And I'm not sure any runtime
> "optimization" (as in likely()/unlikely()) is justified.
> 
> [snip]
> 
> > > diff --git a/hw/pci_internals.h b/hw/pci_internals.h
> > > index e3c93a3..fb134b9 100644
> > > --- a/hw/pci_internals.h
> > > +++ b/hw/pci_internals.h
> > > @@ -33,6 +33,9 @@ struct PCIBus {
> > >         Keep a count of the number of devices with raised IRQs.  */
> > >      int nirq;
> > >      int *irq_count;
> > > +
> > > +    PCIDevice                       *iommu;
> > > +    PCITranslateFunc                *translate;
> > >  };
> > 
> > Why is translate pointer in a bus? I think it's a work of an iommu?
> > 
> 
> Anthony and Paul thought it's best to simply ask the parent bus for
> translation. I somewhat agree to that: devices that aren't IOMMU-aware
> simply attempt to do PCI requests to memory and the IOMMU translates
> and checks them transparently.
> 
> > >  struct PCIBridge {
> > > @@ -44,4 +47,13 @@ struct PCIBridge {
> > >      const char *bus_name;
> > >  };
> > >  
> > > +struct PCIMemoryMap {
> > > +    pcibus_t                        addr;
> > > +    pcibus_t                        len;
> > > +    target_phys_addr_t              paddr;
> > > +    PCIInvalidateMapFunc            *invalidate;
> > > +    void                            *invalidate_opaque;
> > 
> > Can we have a structure that encapsulates the mapping
> > data instead of a void *?
> > 
> > 
> 
> Not really. 'invalidate_opaque' belongs to device code. It's meant to be
> a handle to easily identify the mapping. For example, DMA code wants to
> cancel AIO transfers when the bus requests the map to be invalidated.
> It's difficult to look that AIO transfer up using non-opaque data.
> 
> > > +    QLIST_ENTRY(PCIMemoryMap)       list;
> > > +};
> > > +
> > >  #endif /* QEMU_PCI_INTERNALS_H */
> > > diff --git a/qemu-common.h b/qemu-common.h
> > > index d735235..8b060e8 100644
> > > --- a/qemu-common.h
> > > +++ b/qemu-common.h
> > > @@ -218,6 +218,7 @@ typedef struct SMBusDevice SMBusDevice;
> > >  typedef struct PCIHostState PCIHostState;
> > >  typedef struct PCIExpressHost PCIExpressHost;
> > >  typedef struct PCIBus PCIBus;
> > > +typedef struct PCIMemoryMap PCIMemoryMap;
> > >  typedef struct PCIDevice PCIDevice;
> > >  typedef struct PCIBridge PCIBridge;
> > >  typedef struct SerialState SerialState;
> > > -- 
> > > 1.7.1
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 4/7] ide: use the PCI memory access interface
  2010-09-02  9:12       ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-09-02  9:58         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-02  9:58 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Thu, Sep 02, 2010 at 12:12:00PM +0300, Eduard - Gabriel Munteanu wrote:
> On Thu, Sep 02, 2010 at 08:19:11AM +0300, Michael S. Tsirkin wrote:
> > On Sat, Aug 28, 2010 at 05:54:55PM +0300, Eduard - Gabriel Munteanu wrote:
> > > Emulated PCI IDE controllers now use the memory access interface. This
> > > also allows an emulated IOMMU to translate and check accesses.
> > > 
> > > Map invalidation results in cancelling DMA transfers. Since the guest OS
> > > can't properly recover the DMA results in case the mapping is changed,
> > > this is a fairly good approximation.
> > > 
> > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > > ---
> 
> [snip]
> 
> > > +static inline void bmdma_memory_read(BMDMAState *bm,
> > > +                                     target_phys_addr_t addr,
> > > +                                     uint8_t *buf,
> > > +                                     target_phys_addr_t len)
> > > +{
> > > +    bm->rw(bm->opaque, addr, buf, len, 0);
> > > +}
> > > +
> > > +static inline void bmdma_memory_write(BMDMAState *bm,
> > > +                                      target_phys_addr_t addr,
> > > +                                      uint8_t *buf,
> > > +                                      target_phys_addr_t len)
> > > +{
> > > +    bm->rw(bm->opaque, addr, buf, len, 1);
> > > +}
> > > +
> > 
> > Here again, I am concerned about indirection and pointer chasing on data path.
> > Can we have an iommu pointer in the device, and do a fast path in case
> > there is no iommu?
> > 
> 
> See my other reply.

I don't insist on this solution, but what other way do you propose to
avoid the overhead for everyone not using an iommu?
I'm all for a solution that would help iommu as well,
but none has been proposed yet.

> > >  static inline IDEState *idebus_active_if(IDEBus *bus)
> > >  {
> > >      return bus->ifs + bus->unit;
> > > diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> > > index bd1c73e..962ae13 100644
> > > --- a/hw/ide/macio.c
> > > +++ b/hw/ide/macio.c
> > > @@ -79,7 +79,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
> > >  
> > >      s->io_buffer_size = io->len;
> > >  
> > > -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> > > +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL, NULL, NULL);
> > >      qemu_sglist_add(&s->sg, io->addr, io->len);
> > >      io->addr += io->len;
> > >      io->len = 0;
> > > @@ -141,7 +141,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
> > >      s->io_buffer_index = 0;
> > >      s->io_buffer_size = io->len;
> > >  
> > > -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> > > +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL, NULL, NULL);
> > >      qemu_sglist_add(&s->sg, io->addr, io->len);
> > >      io->addr += io->len;
> > >      io->len = 0;
> > > diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> > > index 4d95cc5..5879044 100644
> > > --- a/hw/ide/pci.c
> > > +++ b/hw/ide/pci.c
> > > @@ -183,4 +183,11 @@ void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
> > >              continue;
> > >          ide_create_drive(d->bus+bus[i], unit[i], hd_table[i]);
> > >      }
> > > +
> > > +    for (i = 0; i < 2; i++) {
> > > +        d->bmdma[i].rw = (void *) pci_memory_rw;
> > > +        d->bmdma[i].map = (void *) pci_memory_map;
> > > +        d->bmdma[i].unmap = (void *) pci_memory_unmap;
> > > +        d->bmdma[i].opaque = dev;
> > > +    }
> > >  }
> > 
> > These casts show something is wrong with the API, IMO.
> > 
> 
> Hm, here's an oversight on my part: I think I should provide explicit
> bmdma hooks, since pcibus_t is a uint64_t and target_phys_addr_t is a
> uint{32,64}_t depending on the guest machine, so it might be buggy on
> 32-bit wrt calling conventions. But that introduces yet another
> non-inlined function call :-(. That would drop the (void *) cast,
> though.
> 
> 
> 	Eduard

So we get away without the casts, but only because the C compiler
will let us silently convert the types, possibly discarding
data in the process. Or we'll add a check that will try to detect
this, but there's no good way to report a DMA error to the user.
IOW, if our code only works because target fits in pcibus, what good
is the abstraction and using distinct types?

This is why I think we need generic DMA APIs using DMA addresses.
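
To illustrate the truncation concern, here is a small sketch of an
explicit, checked narrowing from a bus-neutral DMA address to a 32-bit
physical address. The names dma_addr_t, target_phys_addr32 and
dma_to_phys32() are assumptions made up for this example, not an
existing QEMU interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;          /* hypothetical bus-neutral DMA address */
typedef uint32_t target_phys_addr32;  /* stand-in for a 32-bit target_phys_addr_t */

/*
 * Narrow explicitly. Return false when [addr, addr + len) does not fit
 * in 32 bits, instead of silently dropping the high bits.
 */
static bool dma_to_phys32(dma_addr_t addr, dma_addr_t len,
                          target_phys_addr32 *out)
{
    assert(out != NULL);
    if (addr > UINT32_MAX || len > (dma_addr_t)UINT32_MAX - addr + 1) {
        return false;
    }
    *out = (target_phys_addr32)addr;
    return true;
}
```

The check makes the failure visible to the caller; how to report it
to the guest is still the open question raised above.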

-- 
MST

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-02  9:08               ` Eduard - Gabriel Munteanu
@ 2010-09-02 13:24                 ` Anthony Liguori
  -1 siblings, 0 replies; 96+ messages in thread
From: Anthony Liguori @ 2010-09-02 13:24 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: Michael S. Tsirkin, Stefan Weil, kvm, joro, qemu-devel,
	blauwirbel, yamahata, paul, avi

On 09/02/2010 04:08 AM, Eduard - Gabriel Munteanu wrote:
> On Thu, Sep 02, 2010 at 09:00:46AM +0300, Michael S. Tsirkin wrote:
>    
>> On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
>>      
>>>> +static inline void pci_memory_read(PCIDevice *dev,
>>>> + pcibus_t addr,
>>>> + uint8_t *buf,
>>>> + pcibus_t len)
>>>> +{
>>>> + pci_memory_rw(dev, addr, buf, len, 0);
>>>> +}
>>>> +
>>>> +static inline void pci_memory_write(PCIDevice *dev,
>>>> + pcibus_t addr,
>>>> + const uint8_t *buf,
>>>> + pcibus_t len)
>>>> +{
>>>> + pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
>>>> +}
>>>> +
>>>> #endif
>>>>          
>>> The functions pci_memory_read and pci_memory_write not only read
>>> or write byte data but many different data types which leads to
>>> a lot of type casts in your other patches.
>>>
>>> I'd prefer "void *buf" and "const void *buf" in the argument lists.
>>> Then all those type casts could be removed.
>>>
>>> Regards
>>> Stefan Weil
>>>        
>> Further, I am not sure pcibus_t is a good type to use here.
>> This also forces use of pci specific types in e.g. ide, or resorting to
>> casts as this patch does. We probably should use a more generic type
>> for this.
>>      
> It only forces use of PCI-specific types in the IDE controller, which is
> already a PCI device.
>    

But IDE controllers are not always PCI devices...  This isn't an issue 
with your patch, per se, but with how we're modelling the IDE 
controller today.  There's no great solution but I think your patch is 
an improvement over what we have today.

I do agree with Stefan though that void * would make a lot more sense.
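
A small sketch of why void * helps here: callers can pass structures
directly and the one remaining cast lives inside the wrapper. The
names mem_rw(), mem_read(), mem_write() and struct desc are
hypothetical and only mirror the shape of the pci_memory_*() helpers
quoted below:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t guest_ram[64];   /* stand-in for guest physical memory */

/* Common worker, analogous in shape to pci_memory_rw(). */
static void mem_rw(uint64_t addr, void *buf, size_t len, int is_write)
{
    assert(addr <= sizeof(guest_ram) && len <= sizeof(guest_ram) - addr);
    if (is_write) {
        memcpy(&guest_ram[addr], buf, len);
    } else {
        memcpy(buf, &guest_ram[addr], len);
    }
}

static inline void mem_read(uint64_t addr, void *buf, size_t len)
{
    mem_rw(addr, buf, len, 0);
}

static inline void mem_write(uint64_t addr, const void *buf, size_t len)
{
    /* The only cast: drop const because the worker is shared with reads. */
    mem_rw(addr, (void *)buf, len, 1);
}

/* Example payload a device might DMA, e.g. a ring descriptor. */
struct desc {
    uint32_t addr;
    uint16_t len;
    uint16_t flags;
};
```

A caller then writes mem_write(addr, &d, sizeof(d)) with no
(uint8_t *) cast at the call site.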

Regards,

Anthony Liguori

> 	Eduard
>
>    
>> -- 
>> MST
>>      


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 4/7] ide: use the PCI memory access interface
  2010-09-02  9:58         ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-09-02 15:01           ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-02 15:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Thu, Sep 02, 2010 at 12:58:13PM +0300, Michael S. Tsirkin wrote:
> On Thu, Sep 02, 2010 at 12:12:00PM +0300, Eduard - Gabriel Munteanu wrote:
> > On Thu, Sep 02, 2010 at 08:19:11AM +0300, Michael S. Tsirkin wrote:
> > > On Sat, Aug 28, 2010 at 05:54:55PM +0300, Eduard - Gabriel Munteanu wrote:
> 
> I don't insist on this solution, but what other way do you propose to
> avoid the overhead for everyone not using an iommu?
> I'm all for a solution that would help iommu as well,
> but one wasn't yet proposed.
> 

Hm, we could get even better performance by simply making the IOMMU a
compile-time option. It also avoids problems in case some device hasn't
been converted yet, and involves little to no tradeoffs. What do you
think?

AFAICT, there are few uses for the IOMMU besides development and
avant-garde stuff, as you note. So distributions can continue supplying
prebuilt QEMU/KVM packages compiled with the IOMMU turned off for the
time being. The only practical (commercial) use right now would be in
the case of private virtual servers, which could be divided further into
nested guests (though real IOMMU hardware isn't widespread yet).

Blue Swirl, in the light of this, do you agree on making it a
compile-time option?
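
For illustration only: a minimal sketch of what a compile-time switch could
look like. CONFIG_PCI_IOMMU is a hypothetical macro name, not an existing
QEMU option, and the "translation" is a toy fixed offset; both are
assumptions chosen to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t pcibus_t;

typedef struct PCIDevice {
    uint8_t ram[64];        /* toy "guest memory" */
    pcibus_t iommu_offset;  /* toy translation: a fixed remap offset */
} PCIDevice;

/* CONFIG_PCI_IOMMU is a hypothetical build switch (assumption). */
static pcibus_t pci_translate(PCIDevice *dev, pcibus_t addr)
{
#ifdef CONFIG_PCI_IOMMU
    return addr + dev->iommu_offset;
#else
    (void) dev;             /* IOMMU compiled out: identity mapping */
    return addr;
#endif
}

static void pci_memory_read(PCIDevice *dev, pcibus_t addr,
                            void *buf, pcibus_t len)
{
    pcibus_t paddr = pci_translate(dev, addr);

    assert(paddr <= sizeof(dev->ram) && len <= sizeof(dev->ram) - paddr);
    memcpy(buf, dev->ram + paddr, len);
}
```

Built without the macro, pci_translate() reduces to returning addr, so the
compiler can drop the translation step entirely.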

> > > >  static inline IDEState *idebus_active_if(IDEBus *bus)
> > > >  {
> > > >      return bus->ifs + bus->unit;
> > > > diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> > > > index bd1c73e..962ae13 100644
> > > > --- a/hw/ide/macio.c
> > > > +++ b/hw/ide/macio.c
> > > > @@ -79,7 +79,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
> > > >  
> > > >      s->io_buffer_size = io->len;
> > > >  
> > > > -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> > > > +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL, NULL, NULL);
> > > >      qemu_sglist_add(&s->sg, io->addr, io->len);
> > > >      io->addr += io->len;
> > > >      io->len = 0;
> > > > @@ -141,7 +141,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
> > > >      s->io_buffer_index = 0;
> > > >      s->io_buffer_size = io->len;
> > > >  
> > > > -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> > > > +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL, NULL, NULL);
> > > >      qemu_sglist_add(&s->sg, io->addr, io->len);
> > > >      io->addr += io->len;
> > > >      io->len = 0;
> > > > diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> > > > index 4d95cc5..5879044 100644
> > > > --- a/hw/ide/pci.c
> > > > +++ b/hw/ide/pci.c
> > > > @@ -183,4 +183,11 @@ void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
> > > >              continue;
> > > >          ide_create_drive(d->bus+bus[i], unit[i], hd_table[i]);
> > > >      }
> > > > +
> > > > +    for (i = 0; i < 2; i++) {
> > > > +        d->bmdma[i].rw = (void *) pci_memory_rw;
> > > > +        d->bmdma[i].map = (void *) pci_memory_map;
> > > > +        d->bmdma[i].unmap = (void *) pci_memory_unmap;
> > > > +        d->bmdma[i].opaque = dev;
> > > > +    }
> > > >  }
> > > 
> > > These casts show something is wrong with the API, IMO.
> > > 
> > 
> > Hm, here's an oversight on my part: I think I should provide explicit
> > bmdma hooks, since pcibus_t is a uint64_t and target_phys_addr_t is a
> > uint{32,64}_t depending on the guest machine, so it might be buggy on
> > 32-bit wrt calling conventions. But that introduces yet another
> > non-inlined function call :-(. That would drop the (void *) cast,
> > though.
> > 
> > 
> > 	Eduard
> 
> So we get away with it without casts but only because C compiler
> will let us silently convert the types, possibly discarding
> data in the process. Or we'll add a check that will try and detect
> this, but there's no good way to report a DMA error to user.
> IOW, if our code only works because target fits in pcibus, what good
> is the abstraction and using distinct types?
> 
> This is why I think we need a generic DMA APIs using dma addresses.

The API was made so that it doesn't report errors. That's because the
PCI bus doesn't provide any possibility of doing so (real devices can't
retry transfers in case an I/O page fault occurs).

In my previous generic IOMMU layer implementation pci_memory_*()
returned non-zero on failure, but I decided to drop it when switching to
a PCI-only (rather a PCI-specific) approach.

In case target_phys_addr_t no longer fits in pcibus_t by a simple
implicit conversion, those explicit bmdma hooks I was going to add will
do the necessary conversions.

The idea of using distinct types is two-fold: let the programmer know
not to rely on them being the same thing, and let the compiler prevent
him from shooting himself in the foot (like I did). Even if there is a
dma_addr_t, some piece of code still needs to provide glue and
conversion between the DMA code and bus-specific code.
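
For illustration only: a minimal sketch of such an explicit bmdma hook. The
rw and opaque fields mirror the patch; DMARWFunc, bmdma_rw() and
bmdma_init_hooks() are hypothetical names, target_phys_addr_t is assumed to
be 32-bit here, and pci_memory_rw() is a toy that copies into a per-device
buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t pcibus_t;
typedef uint32_t target_phys_addr_t;    /* a 32-bit guest is assumed */

typedef struct PCIDevice { uint8_t ram[64]; } PCIDevice;

/* Hook type as the generic IDE DMA code sees it: no PCI types. */
typedef void DMARWFunc(void *opaque, target_phys_addr_t addr,
                       uint8_t *buf, target_phys_addr_t len, int is_write);

typedef struct BMDMAState {
    DMARWFunc *rw;
    void *opaque;
} BMDMAState;

/* Toy backend standing in for the real PCI access path. */
static void pci_memory_rw(PCIDevice *dev, pcibus_t addr, uint8_t *buf,
                          pcibus_t len, int is_write)
{
    assert(addr <= sizeof(dev->ram) && len <= sizeof(dev->ram) - addr);
    if (is_write) {
        memcpy(dev->ram + addr, buf, len);
    } else {
        memcpy(buf, dev->ram + addr, len);
    }
}

/* Explicit glue: the widening to pcibus_t happens in one visible place. */
static void bmdma_rw(void *opaque, target_phys_addr_t addr,
                     uint8_t *buf, target_phys_addr_t len, int is_write)
{
    PCIDevice *dev = opaque;

    pci_memory_rw(dev, (pcibus_t) addr, buf, (pcibus_t) len, is_write);
}

static void bmdma_init_hooks(BMDMAState *bm, PCIDevice *dev)
{
    bm->rw = bmdma_rw;      /* types match, so no (void *) cast is needed */
    bm->opaque = dev;
}
```

The price is the extra non-inlined call mentioned earlier; in exchange the
compiler checks the hook's signature instead of trusting a cast.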


	Eduard


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 4/7] ide: use the PCI memory access interface
  2010-09-02 15:01           ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-09-02 15:24             ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-09-02 15:24 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: Michael S. Tsirkin, joro, blauwirbel, paul, anthony, av1474,
	yamahata, kvm, qemu-devel

  On 09/02/2010 06:01 PM, Eduard - Gabriel Munteanu wrote:
> On Thu, Sep 02, 2010 at 12:58:13PM +0300, Michael S. Tsirkin wrote:
>> On Thu, Sep 02, 2010 at 12:12:00PM +0300, Eduard - Gabriel Munteanu wrote:
>>> On Thu, Sep 02, 2010 at 08:19:11AM +0300, Michael S. Tsirkin wrote:
>>>> On Sat, Aug 28, 2010 at 05:54:55PM +0300, Eduard - Gabriel Munteanu wrote:
>> I don't insist on this solution, but what other way do you propose to
>> avoid the overhead for everyone not using an iommu?
>> I'm all for a solution that would help iommu as well,
>> but one wasn't yet proposed.
>>
> Hm, we could get even better performance by simply making the IOMMU a
> compile-time option. It also avoids problems in case some device hasn't
> been converted yet, and involves little to no tradeoffs. What do you
> think?
>
> AFAICT, there are few uses for the IOMMU besides development and
> avantgarde stuff, as you note. So distributions can continue supplying
> prebuilt QEMU/KVM packages compiled with the IOMMU turned off for the
> time being. The only practical (commercial) use right now would be in
> the case of private virtual servers, which could be divided further into
> nested guests (though real IOMMU hardware isn't widespread yet).
>
> Blue Swirl, in the light of this, do you agree on making it a
> compile-time option?

That's not a practical long term solution.  Eventually everything gets 
turned on.

I don't really see a problem with the additional indirection.  By the 
time we reach actual hardware to satisfy the request, we'll have gone 
through many such indirections; modern processors deal very well with them.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 4/7] ide: use the PCI memory access interface
  2010-09-02 15:01           ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-09-02 15:31             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-02 15:31 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Thu, Sep 02, 2010 at 06:01:35PM +0300, Eduard - Gabriel Munteanu wrote:
> On Thu, Sep 02, 2010 at 12:58:13PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Sep 02, 2010 at 12:12:00PM +0300, Eduard - Gabriel Munteanu wrote:
> > > On Thu, Sep 02, 2010 at 08:19:11AM +0300, Michael S. Tsirkin wrote:
> > > > On Sat, Aug 28, 2010 at 05:54:55PM +0300, Eduard - Gabriel Munteanu wrote:
> > 
> > I don't insist on this solution, but what other way do you propose to
> > avoid the overhead for everyone not using an iommu?
> > I'm all for a solution that would help iommu as well,
> > but one wasn't yet proposed.
> > 
> 
> Hm, we could get even better performance by simply making the IOMMU a
> compile-time option. It also avoids problems in case some device hasn't
> been converted yet, and involves little to no tradeoffs. What do you
> think?
> 
> AFAICT, there are few uses for the IOMMU besides development and
> avantgarde stuff, as you note. So distributions can continue supplying
> prebuilt QEMU/KVM packages compiled with the IOMMU turned off for the
> time being. The only practical (commercial) use right now would be in
> the case of private virtual servers, which could be divided further into
> nested guests (though real IOMMU hardware isn't widespread yet).
> 
> Blue Swirl, in the light of this, do you agree on making it a
> compile-time option?
> 
> > > > >  static inline IDEState *idebus_active_if(IDEBus *bus)
> > > > >  {
> > > > >      return bus->ifs + bus->unit;
> > > > > diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> > > > > index bd1c73e..962ae13 100644
> > > > > --- a/hw/ide/macio.c
> > > > > +++ b/hw/ide/macio.c
> > > > > @@ -79,7 +79,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
> > > > >  
> > > > >      s->io_buffer_size = io->len;
> > > > >  
> > > > > -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> > > > > +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL, NULL, NULL);
> > > > >      qemu_sglist_add(&s->sg, io->addr, io->len);
> > > > >      io->addr += io->len;
> > > > >      io->len = 0;
> > > > > @@ -141,7 +141,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
> > > > >      s->io_buffer_index = 0;
> > > > >      s->io_buffer_size = io->len;
> > > > >  
> > > > > -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> > > > > +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL, NULL, NULL);
> > > > >      qemu_sglist_add(&s->sg, io->addr, io->len);
> > > > >      io->addr += io->len;
> > > > >      io->len = 0;
> > > > > diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> > > > > index 4d95cc5..5879044 100644
> > > > > --- a/hw/ide/pci.c
> > > > > +++ b/hw/ide/pci.c
> > > > > @@ -183,4 +183,11 @@ void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
> > > > >              continue;
> > > > >          ide_create_drive(d->bus+bus[i], unit[i], hd_table[i]);
> > > > >      }
> > > > > +
> > > > > +    for (i = 0; i < 2; i++) {
> > > > > +        d->bmdma[i].rw = (void *) pci_memory_rw;
> > > > > +        d->bmdma[i].map = (void *) pci_memory_map;
> > > > > +        d->bmdma[i].unmap = (void *) pci_memory_unmap;
> > > > > +        d->bmdma[i].opaque = dev;
> > > > > +    }
> > > > >  }
> > > > 
> > > > These casts show something is wrong with the API, IMO.
> > > > 
> > > 
> > > Hm, here's an oversight on my part: I think I should provide explicit
> > > bmdma hooks, since pcibus_t is a uint64_t and target_phys_addr_t is a
> > > uint{32,64}_t depending on the guest machine, so it might be buggy on
> > > 32-bit wrt calling conventions. But that introduces yet another
> > > non-inlined function call :-(. That would drop the (void *) cast,
> > > though.
> > > 
> > > 
> > > 	Eduard
> > 
> > So we get away with it without casts but only because C compiler
> > will let us silently convert the types, possibly discarding
> > data in the process. Or we'll add a check that will try and detect
> > this, but there's no good way to report a DMA error to user.
> > IOW, if our code only works because target fits in pcibus, what good
> > is the abstraction and using distinct types?
> > 
> > This is why I think we need a generic DMA APIs using dma addresses.
> 
> The API was made so that it doesn't report errors. That's because the
> PCI bus doesn't provide any possibility of doing so (real devices can't
> retry transfers in case an I/O page fault occurs).

This is what I am saying. We can't deal with errors.

> In my previous generic IOMMU layer implementation pci_memory_*()
> returned non-zero on failure, but I decided to drop it when switching to
> a PCI-only (rather a PCI-specific) approach.
> 
> In case target_phys_addr_t no longer fits in pcibus_t by a simple
> implicit conversion, those explicit bmdma hooks I was going to add will
> do the necessary conversions.
> 
> The idea of using distinct types is two-fold: let the programmer know
> not to rely on them being the same thing, and let the compiler prevent
> him from shooting himself in the foot (like I did). Even if there is a
> dma_addr_t, some piece of code still needs to provide glue and
> conversion between the DMA code and bus-specific code.
> 
> 
> 	Eduard

Nothing I see here is bus-specific, really. Without an IOMMU, the
addresses that make sense are target addresses; with an IOMMU, whatever
the IOMMU supports. So make the IOMMU work with dma_addr_t and do the
conversion.
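
For illustration only: a minimal sketch of an IOMMU translation step
expressed in terms of dma_addr_t. ToyIOMMU and toy_iommu_translate() are
hypothetical, the single-window mapping is a toy model rather than the AMD
IOMMU page-table walk, and the type widths are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;           /* address as the device issues it */
typedef uint32_t target_phys_addr_t;   /* a 32-bit guest is assumed */

/* Toy IOMMU: one contiguous DMA window remapped into guest memory. */
typedef struct ToyIOMMU {
    dma_addr_t base;            /* first DMA address of the window */
    dma_addr_t size;            /* window length in bytes */
    target_phys_addr_t phys;    /* guest-physical start of the window */
} ToyIOMMU;

/* Returns 0 and stores the guest-physical address, or -1 on a fault. */
static int toy_iommu_translate(const ToyIOMMU *mmu, dma_addr_t addr,
                               target_phys_addr_t *paddr)
{
    assert(mmu != NULL && paddr != NULL);
    if (addr < mmu->base || addr - mmu->base >= mmu->size) {
        return -1;
    }
    /* The narrowing conversion lives here, in one checked place. */
    *paddr = mmu->phys + (target_phys_addr_t) (addr - mmu->base);
    return 0;
}
```

Nothing in the signature mentions PCI, so the same hook could sit behind any
bus that issues DMA addresses.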

-- 
MST

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 4/7] ide: use the PCI memory access interface
  2010-09-02 15:24             ` [Qemu-devel] " Avi Kivity
@ 2010-09-02 15:39               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-02 15:39 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Eduard - Gabriel Munteanu, joro, blauwirbel, paul, anthony,
	av1474, yamahata, kvm, qemu-devel

On Thu, Sep 02, 2010 at 06:24:25PM +0300, Avi Kivity wrote:
> That's not a practical long term solution.  Eventually everything
> gets turned on.

That's why I wanted a simple !iommu check and fallback.
This way unless it's really used there's no overhead.
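
For illustration only: a minimal sketch of such a check and fallback. The
iommu_translate field and toy_translate() are hypothetical names, and
guest_ram is a toy buffer standing in for guest memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t pcibus_t;

static uint8_t guest_ram[64];   /* toy guest memory */

typedef struct PCIDevice PCIDevice;
typedef pcibus_t PCITranslateFunc(PCIDevice *dev, pcibus_t addr);

struct PCIDevice {
    PCITranslateFunc *iommu_translate;  /* NULL when no IOMMU is attached */
};

/* Toy translation used only to exercise the slow path. */
static pcibus_t toy_translate(PCIDevice *dev, pcibus_t addr)
{
    (void) dev;
    return addr + 8;
}

static void pci_memory_rw(PCIDevice *dev, pcibus_t addr, void *buf,
                          pcibus_t len, int is_write)
{
    /* Only devices behind an IOMMU pay for the indirect call. */
    if (dev->iommu_translate) {
        addr = dev->iommu_translate(dev, addr);
    }

    assert(addr <= sizeof(guest_ram) && len <= sizeof(guest_ram) - addr);
    if (is_write) {
        memcpy(guest_ram + addr, buf, len);
    } else {
        memcpy(buf, guest_ram + addr, len);
    }
}
```

When no IOMMU is attached, the only added cost on the common path is one
predictable NULL test.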

> I don't really see a problem with the additional indirection.  By
> the time we reach actual hardware to satisfy the request,
> we'll have gone through many such indirections; modern processors deal
> very well with them.
> 
> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-02  8:51             ` Eduard - Gabriel Munteanu
@ 2010-09-02 16:05               ` Stefan Weil
  -1 siblings, 0 replies; 96+ messages in thread
From: Stefan Weil @ 2010-09-02 16:05 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: mst, kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

Am 02.09.2010 10:51, schrieb Eduard - Gabriel Munteanu:
> On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
>    
>> Please see my comments at the end of this mail.
>>
>>
>> Am 30.08.2010 00:08, schrieb Eduard - Gabriel Munteanu:
>>      
>>> PCI devices should access memory through pci_memory_*() instead of
>>> cpu_physical_memory_*(). This also provides support for translation and
>>> access checking in case an IOMMU is emulated.
>>>
>>> Memory maps are treated as remote IOTLBs (that is, translation caches
>>> belonging to the IOMMU-aware device itself). Clients (devices) must
>>> provide callbacks for map invalidation in case these maps are
>>> persistent beyond the current I/O context, e.g. AIO DMA transfers.
>>>
>>> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
>>> ---
>>>        
> [snip]
>
>    
>>> +static inline void pci_memory_read(PCIDevice *dev,
>>> + pcibus_t addr,
>>> + uint8_t *buf,
>>> + pcibus_t len)
>>> +{
>>> + pci_memory_rw(dev, addr, buf, len, 0);
>>> +}
>>> +
>>> +static inline void pci_memory_write(PCIDevice *dev,
>>> + pcibus_t addr,
>>> + const uint8_t *buf,
>>> + pcibus_t len)
>>> +{
>>> + pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
>>> +}
>>> +
>>> #endif
>>>        
>> The functions pci_memory_read and pci_memory_write not only read
>> or write byte data but many different data types which leads to
>> a lot of type casts in your other patches.
>>
>> I'd prefer "void *buf" and "const void *buf" in the argument lists.
>> Then all those type casts could be removed.
>>
>> Regards
>> Stefan Weil
>>      
> I only followed an approach similar to how cpu_physical_memory_{read,write}()
> is defined. I think I should change both cpu_physical_memory_* stuff and
> pci_memory_* stuff, not only the latter, if I decide to go on that
> approach.
>
>
> 	Eduard
>    


Yes, cpu_physical_memory_read, cpu_physical_memory_write
and cpu_physical_memory_rw should be changed, too.

They also require several type casts today.

But this change can be done in an independent patch.
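
To illustrate the point about casts, here is a minimal, self-contained
sketch. toy_memory_read() and toy_memory_write() are hypothetical stand-ins
backed by a plain array; they are not the QEMU functions, and only the
"void *buf" / "const void *buf" parameter types mirror the suggestion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest memory; not a QEMU structure. */
static uint8_t toy_ram[64];

/* With "void *buf", callers may pass a pointer to any object type. */
static void toy_memory_read(uint64_t addr, void *buf, size_t len)
{
    memcpy(buf, &toy_ram[addr], len);
}

static void toy_memory_write(uint64_t addr, const void *buf, size_t len)
{
    memcpy(&toy_ram[addr], buf, len);
}

/* A descriptor-like struct, as a device model might read from memory. */
struct toy_desc {
    uint32_t addr;
    uint16_t len;
    uint16_t flags;
};

/* Round-trips the struct through toy memory without a single cast. */
static int toy_roundtrip(void)
{
    struct toy_desc out = { 0x1000, 512, 1 };
    struct toy_desc in;

    assert(8 + sizeof(out) <= sizeof(toy_ram));

    toy_memory_write(8, &out, sizeof(out));   /* no (uint8_t *) cast */
    toy_memory_read(8, &in, sizeof(in));      /* no (uint8_t *) cast */

    return in.addr == out.addr && in.len == out.len &&
           in.flags == out.flags;
}
```

With "uint8_t *buf" in the prototypes, both calls in toy_roundtrip() would
need an explicit cast to compile without warnings.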

Stefan


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 4/7] ide: use the PCI memory access interface
  2010-09-02 15:39               ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-09-02 16:07                 ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-09-02 16:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eduard - Gabriel Munteanu, joro, blauwirbel, paul, anthony,
	av1474, yamahata, kvm, qemu-devel

  On 09/02/2010 06:39 PM, Michael S. Tsirkin wrote:
> On Thu, Sep 02, 2010 at 06:24:25PM +0300, Avi Kivity wrote:
>> That's not a practical long term solution.  Eventually everything
>> gets turned on.
> That's why I wanted a simple !iommu check and fallback.
> This way unless it's really used there's no overhead.
>

It's not very different from an indirect function call.  Modern branch 
predictors store the target function address and supply it ahead of time.

I've never seen a function call instruction in a profile.
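
The two dispatch styles being compared can be sketched as follows. This is
only an illustration under stated assumptions: struct toy_dev and the
toy_* / resolve_* functions are hypothetical names, not QEMU code, and the
sketch says nothing about which variant is faster in practice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*toy_translate_fn)(uint64_t addr);

struct toy_dev {
    toy_translate_fn translate;   /* NULL when no IOMMU is present */
};

static uint64_t toy_identity(uint64_t addr) { return addr; }
static uint64_t toy_offset(uint64_t addr)   { return addr + 0x1000; }

/* Variant A: test for "no IOMMU" first and skip translation. */
static uint64_t resolve_with_check(const struct toy_dev *dev, uint64_t addr)
{
    if (!dev->translate) {
        return addr;
    }
    return dev->translate(addr);
}

/* Variant B: always call through the pointer; a device without an
 * IOMMU installs the identity function instead of NULL. */
static uint64_t resolve_indirect(const struct toy_dev *dev, uint64_t addr)
{
    return dev->translate(addr);
}

/* Both variants must agree on the resulting address. */
static void toy_selfcheck(void)
{
    struct toy_dev plain_a = { NULL };
    struct toy_dev plain_b = { toy_identity };
    struct toy_dev mapped  = { toy_offset };

    assert(resolve_with_check(&plain_a, 0x42) == 0x42);
    assert(resolve_indirect(&plain_b, 0x42) == 0x42);
    assert(resolve_with_check(&mapped, 0x42) == 0x1042);
    assert(resolve_indirect(&mapped, 0x42) == 0x1042);
}
```

Variant A costs one conditional branch on every access; variant B costs one
indirect call on every access. Which one matters, if either, is what the
benchmark question in this thread is about.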

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-02 16:05               ` Stefan Weil
@ 2010-09-02 16:14                 ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-02 16:14 UTC (permalink / raw)
  To: Stefan Weil; +Cc: mst, kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

On Thu, Sep 02, 2010 at 06:05:39PM +0200, Stefan Weil wrote:
> On 02.09.2010 10:51, Eduard - Gabriel Munteanu wrote:

[snip]

> >> The functions pci_memory_read and pci_memory_write not only read
> >> or write byte data but many different data types which leads to
> >> a lot of type casts in your other patches.
> >>
> >> I'd prefer "void *buf" and "const void *buf" in the argument lists.
> >> Then all those type casts could be removed.
> >>
> >> Regards
> >> Stefan Weil
> >>      
> > I only followed an approach similar to how cpu_physical_memory_{read,write}()
> > is defined. I think I should change both cpu_physical_memory_* stuff and
> > pci_memory_* stuff, not only the latter, if I decide to go on that
> > approach.
> >
> >
> > 	Eduard
> >    
> 
> 
> Yes, cpu_physical_memory_read, cpu_physical_memory_write
> and cpu_physical_memory_rw should be changed, too.
> 
> They also require several type casts today.
> 
> But this change can be done in an independent patch.
> 
> Stefan

Roger, I'm on it. The existing casts could remain there AFAICT, so it's
a pretty simple change.


	Eduard


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-02  9:49         ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-09-04  9:01           ` Blue Swirl
  -1 siblings, 0 replies; 96+ messages in thread
From: Blue Swirl @ 2010-09-04  9:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eduard - Gabriel Munteanu, joro, paul, avi, anthony, av1474,
	yamahata, kvm, qemu-devel

On Thu, Sep 2, 2010 at 9:49 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Sep 02, 2010 at 11:40:58AM +0300, Eduard - Gabriel Munteanu wrote:
>> On Thu, Sep 02, 2010 at 08:28:26AM +0300, Michael S. Tsirkin wrote:
>> > On Sat, Aug 28, 2010 at 05:54:53PM +0300, Eduard - Gabriel Munteanu wrote:
>> > > PCI devices should access memory through pci_memory_*() instead of
>> > > cpu_physical_memory_*(). This also provides support for translation and
>> > > access checking in case an IOMMU is emulated.
>> > >
>> > > Memory maps are treated as remote IOTLBs (that is, translation caches
>> > > belonging to the IOMMU-aware device itself). Clients (devices) must
>> > > provide callbacks for map invalidation in case these maps are
>> > > persistent beyond the current I/O context, e.g. AIO DMA transfers.
>> > >
>> > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
>> >
>> >
>> > I am concerned about adding more pointer chasing on data path.
>> > Could we have
>> > 1. an iommu pointer in a device, inherited by secondary buses
>> >    when they are created and by devices from buses when they are attached.
>> > 2. translation pointer in the iommu instead of the bus
>>
>> The first solution I proposed was based on qdev, that is, each
>> DeviceState had an 'iommu' field. Translation would be done by
>> recursively looking in the parent bus/devs for an IOMMU.
>>
>> But Anthony said we're better off with bus-specific APIs, mostly because
>> (IIRC) there may be different types of addresses and it might be
>> difficult to abstract those properly.
>
> Well we ended up with casting
> away types to make pci callbacks fit in ide structure,
> and silently assuming that all addresses are in fact 64 bit.
> So maybe it's hard to abstract addresses properly, but
> it appears we'll have to, to avoid even worse problems.
>
>> I suppose I could revisit the idea by integrating the IOMMU in a
>> PCIDevice as opposed to a DeviceState.
>>
>> Anthony, Paul, any thoughts on this?
>
> Just to clarify: this is an optimization idea:
> instead of a bus walk on each access, do the walk
> when device is attached to the bus, and copy the iommu
> from the root to the device itself.
>
> This will also make it possible to create
> DMADeviceState structure which would have this iommu field,
> and we'd use this structure instead of the void pointers all over.
>
>
>> > 3. pci_memory_XX functions inline, doing fast path for non-iommu case:
>> >
>> >     if (__builtin_expect(!dev->iommu, 1))
>> >             return cpu_memory_rw
>>
>> But isn't this some sort of 'if (likely(!dev->iommu))' from the Linux
>> kernel? If so, it puts the IOMMU-enabled case at disadvantage.
>
> IOMMU has a ton of indirections anyway.
>
>> I suppose most emulated systems would have at least some theoretical
>> reasons to enable the IOMMU, e.g. as a GART replacement (say for 32-bit
>> PCI devices) or for userspace drivers.
>> So there are reasons to enable
>> the IOMMU even when you don't have a real host IOMMU and you're not
>> using nested guests.
>
> The time most people enable iommu for all devices in both real and virtualized
> systems appears distant, one of the reasons is because it has a lot of overhead.
> Let's start with not adding overhead for existing users, makes sense?

I think the goal architecture (not for IOMMU, but in general) is one
with zero copy DMA. This means we have stage one where the addresses
are translated to host pointers and stage two where the read/write
operation happens. The devices need to be converted.

Now, let's consider the IOMMU in this zero copy architecture. It's one
stage of address translation, for the access operation it will not
matter. We can add translation caching at device level (or even at any
intermediate levels), but that needs a cache invalidation callback
system as discussed earlier. This can be implemented later, we need
the zero copy stuff first.
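
The two stages described above can be sketched as follows. Everything here
is a toy under stated assumptions: the page size, the page table, and the
toy_dma_map() / toy_dma_fill() names are invented for illustration and do
not correspond to any QEMU interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 16u
#define TOY_NUM_PAGES 4u

/* Hypothetical guest RAM and a toy IOMMU page table that maps a bus
 * page number to a physical page number. */
static uint8_t toy_ram[TOY_PAGE_SIZE * TOY_NUM_PAGES];
static const uint32_t toy_page_table[TOY_NUM_PAGES] = { 2, 0, 3, 1 };

/* Stage one: translate a bus address to a host pointer. On return,
 * *len is clamped so that the mapping never crosses a page. */
static void *toy_dma_map(uint64_t bus_addr, size_t *len)
{
    uint64_t page = bus_addr / TOY_PAGE_SIZE;
    uint64_t off = bus_addr % TOY_PAGE_SIZE;

    assert(len != NULL);
    if (page >= TOY_NUM_PAGES) {
        return NULL;                      /* translation fault */
    }
    if (*len > TOY_PAGE_SIZE - off) {
        *len = TOY_PAGE_SIZE - off;
    }
    return &toy_ram[toy_page_table[page] * TOY_PAGE_SIZE + off];
}

/* Stage two: the device operates on host memory directly, without a
 * bounce buffer. Returns the number of bytes actually written. */
static size_t toy_dma_fill(uint64_t bus_addr, uint8_t value, size_t len)
{
    size_t done = 0;

    while (done < len) {
        size_t chunk = len - done;
        uint8_t *host = toy_dma_map(bus_addr + done, &chunk);

        if (!host) {
            break;
        }
        memset(host, value, chunk);
        done += chunk;
    }
    return done;
}
```

In this arrangement the translation cost is paid once per mapped chunk in
stage one, and stage two is a plain memory operation regardless of whether
an IOMMU sits in the path.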

Given this overall picture, I think eliminating some pointer
dereference overheads in non-zero copy architecture is a very
premature optimization and it may even direct the architecture to
wrong direction.

If the performance degradation at this point is not acceptable, we
could also postpone merging IOMMU until zero copy conversion has
happened, or make IOMMU a compile time option. But it would be nice to
back the decision by performance figures.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-04  9:01           ` [Qemu-devel] " Blue Swirl
@ 2010-09-05  7:10             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-05  7:10 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Eduard - Gabriel Munteanu, joro, paul, avi, anthony, av1474,
	yamahata, kvm, qemu-devel

On Sat, Sep 04, 2010 at 09:01:06AM +0000, Blue Swirl wrote:
> On Thu, Sep 2, 2010 at 9:49 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Sep 02, 2010 at 11:40:58AM +0300, Eduard - Gabriel Munteanu wrote:
> >> On Thu, Sep 02, 2010 at 08:28:26AM +0300, Michael S. Tsirkin wrote:
> >> > On Sat, Aug 28, 2010 at 05:54:53PM +0300, Eduard - Gabriel Munteanu wrote:
> >> > > PCI devices should access memory through pci_memory_*() instead of
> >> > > cpu_physical_memory_*(). This also provides support for translation and
> >> > > access checking in case an IOMMU is emulated.
> >> > >
> >> > > Memory maps are treated as remote IOTLBs (that is, translation caches
> >> > > belonging to the IOMMU-aware device itself). Clients (devices) must
> >> > > provide callbacks for map invalidation in case these maps are
> >> > > persistent beyond the current I/O context, e.g. AIO DMA transfers.
> >> > >
> >> > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> >> >
> >> >
> >> > I am concerned about adding more pointer chasing on data path.
> >> > Could we have
> >> > 1. an iommu pointer in a device, inherited by secondary buses
> >> >    when they are created and by devices from buses when they are attached.
> >> > 2. translation pointer in the iommu instead of the bus
> >>
> >> The first solution I proposed was based on qdev, that is, each
> >> DeviceState had an 'iommu' field. Translation would be done by
> >> recursively looking in the parent bus/devs for an IOMMU.
> >>
> >> But Anthony said we're better off with bus-specific APIs, mostly because
> >> (IIRC) there may be different types of addresses and it might be
> >> difficult to abstract those properly.
> >
> > Well we ended up with casting
> > away types to make pci callbacks fit in ide structure,
> > and silently assuming that all addresses are in fact 64 bit.
> > So maybe it's hard to abstract addresses properly, but
> > it appears we'll have to, to avoid even worse problems.
> >
> >> I suppose I could revisit the idea by integrating the IOMMU in a
> >> PCIDevice as opposed to a DeviceState.
> >>
> >> Anthony, Paul, any thoughts on this?
> >
> > Just to clarify: this is an optimization idea:
> > instead of a bus walk on each access, do the walk
> > when device is attached to the bus, and copy the iommu
> > from the root to the device itself.
> >
> > This will also make it possible to create
> > DMADeviceState structure which would have this iommu field,
> > and we'd use this structure instead of the void pointers all over.
> >
> >
> >> > 3. pci_memory_XX functions inline, doing fast path for non-iommu case:
> >> >
> >> >     if (__builtin_expect(!dev->iommu, 1))
> >> >             return cpu_memory_rw
> >>
> >> But isn't this some sort of 'if (likely(!dev->iommu))' from the Linux
> >> kernel? If so, it puts the IOMMU-enabled case at disadvantage.
> >
> > IOMMU has a ton of indirections anyway.
> >
> >> I suppose most emulated systems would have at least some theoretical
> >> reasons to enable the IOMMU, e.g. as a GART replacement (say for 32-bit
> >> PCI devices) or for userspace drivers.
> >> So there are reasons to enable
> >> the IOMMU even when you don't have a real host IOMMU and you're not
> >> using nested guests.
> >
> > The time most people enable iommu for all devices in both real and virtualized
> > systems appears distant, one of the reasons is because it has a lot of overhead.
> > Let's start with not adding overhead for existing users, makes sense?
> 
> I think the goal architecture (not for IOMMU, but in general) is one
> with zero copy DMA. This means we have stage one where the addresses
> are translated to host pointers and stage two where the read/write
> operation happens. The devices need to be converted.
> 
> Now, let's consider the IOMMU in this zero copy architecture. It's one
> stage of address translation, for the access operation it will not
> matter. We can add translation caching at device level (or even at any
> intermediate levels), but that needs a cache invalidation callback
> system as discussed earlier. This can be implemented later, we need
> the zero copy stuff first.
> 
> Given this overall picture, I think eliminating some pointer
> dereference overheads in non-zero copy architecture is a very
> premature optimization and it may even direct the architecture to
> wrong direction.
> 
> If the performance degradation at this point is not acceptable, we
> could also postpone merging IOMMU until zero copy conversion has
> happened, or make IOMMU a compile time option. But it would be nice to
> back the decision by performance figures.

I agree, a minimal benchmark showing no performance impact
when disabled would put these concerns to rest.

-- 
MST

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH RFC] dma_rw.h (was Re: [PATCH 0/7] AMD IOMMU emulation patchset v4)
  2010-08-28 14:54 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-09-13 20:01   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-13 20:01 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

So I think the following gives an idea of what an API might look
like that lets us avoid the scary hacks in e.g. the ide layer and
other generic layers that need to do DMA, without binding us to pci,
adding more complexity with callbacks, or losing type safety with
casts and void*.

Basically we have a DMADevice, which we can use container_of on
to get the enclosing PCIDevice, and a DMAMmu, which gets instantiated
in a specific MMU.
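
The container_of idea can be sketched in isolation as follows. The Toy*
types and the toy_container_of macro are hypothetical and written only for
this illustration; the real structures are the ones in the header below and
in QEMU's PCI code.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the usual container_of macro, minus its type check. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical generic DMA handle that bus-agnostic code passes around. */
typedef struct ToyDMADevice {
    void *mmu;                  /* NULL means no translation */
} ToyDMADevice;

/* Hypothetical bus-specific device that embeds the generic handle. */
typedef struct ToyPCIDevice {
    int devfn;
    ToyDMADevice dma;
} ToyPCIDevice;

/* Bus-specific code recovers its own type from the embedded member. */
static ToyPCIDevice *toy_pci_from_dma(ToyDMADevice *dma)
{
    assert(dma != NULL);
    return toy_container_of(dma, ToyPCIDevice, dma);
}

/* Generic code only ever sees a ToyDMADevice pointer. */
static int toy_generic_layer_devfn(ToyDMADevice *dma)
{
    return toy_pci_from_dma(dma)->devfn;
}
```

A generic layer such as ide would hold only the ToyDMADevice pointer, so it
needs neither a void pointer nor a PCI-specific callback to reach the MMU.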

This is not complete code - just a header - I might complete
this later if/when there's interest or hopefully someone interested
in iommu emulation will.

Notes:
The IOMMU_PERM_RW code seems unused, so I replaced
it with a plain is_write. Is it ever useful?

It seems that invalidate callback should be able to
get away with just a device, so I switched to that
from a void pointer for type safety.
Seems enough for the users I saw.
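
A minimal sketch of an invalidate callback that takes the device rather
than a void pointer is shown below. The Toy* names are hypothetical, and the
maps_live counter is invented purely to make the effect observable.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyDMADevice ToyDMADevice;

/* The callback receives the device itself, so no void * is needed. */
typedef int ToyInvalidateFunc(ToyDMADevice *dev);

struct ToyDMADevice {
    ToyInvalidateFunc *invalidate;
};

/* Hypothetical client that keeps maps alive across I/O contexts. */
typedef struct ToyIDE {
    ToyDMADevice dma;
    int maps_live;
} ToyIDE;

/* Client side: recover the enclosing state and drop cached maps. */
static int toy_ide_invalidate(ToyDMADevice *dev)
{
    ToyIDE *ide = (ToyIDE *)((char *)dev - offsetof(ToyIDE, dma));

    ide->maps_live = 0;
    return 0;
}

/* MMU side: notify a device that its cached translations are stale. */
static int toy_mmu_invalidate(ToyDMADevice *dev)
{
    assert(dev != NULL);
    return dev->invalidate ? dev->invalidate(dev) : 0;
}
```

The compiler now checks that every registered callback really accepts a
ToyDMADevice pointer, which a void pointer argument cannot guarantee.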

I saw devices do stl_le_phys and such, these
might need to be wrapped as well.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

diff --git a/hw/dma_rw.h b/hw/dma_rw.h
new file mode 100644
index 0000000..d63fd17
--- /dev/null
+++ b/hw/dma_rw.h
@@ -0,0 +1,122 @@
+#ifndef DMA_RW_H
+#define DMA_RW_H
+
+#include "qemu-common.h"
+
+/* We currently only have pci mmus, but using
+   a generic type makes it possible to use this
+   e.g. from the generic ide code without callbacks. */
+typedef uint64_t dma_addr_t;
+
+typedef struct DMAMmu DMAMmu;
+typedef struct DMADevice DMADevice;
+
+typedef int DMATranslateFunc(DMAMmu *mmu,
+                             DMADevice *dev,
+                             dma_addr_t addr,
+                             dma_addr_t *paddr,
+                             dma_addr_t *len,
+                             int is_write);
+
+typedef int DMAInvalidateMapFunc(DMADevice *);
+struct DMAMmu {
+	/* invalidate, etc. */
+	DMATranslateFunc *translate;
+};
+
+struct DMADevice {
+	DMAMmu *mmu;
+	DMAInvalidateMapFunc *invalidate;
+};
+
+void dma_device_init(DMADevice *, DMAMmu *, DMAInvalidateMapFunc *);
+
+static inline void dma_memory_rw(DMADevice *dev,
+                                 dma_addr_t addr,
+                                 void *buf,
+                                 uint32_t len,
+                                 int is_write)
+{
+    dma_addr_t paddr, plen;
+    int err;
+    /* Fast-path non-iommu.
+     * More importantly, makes it obvious what this function does. */
+    if (!dev->mmu) {
+        cpu_physical_memory_rw(addr, buf, len, is_write);
+        return;
+    }
+    while (len) {
+        err = dev->mmu->translate(dev->mmu, dev, addr, &paddr, &plen, is_write);
+        if (err) {
+            return;
+        }
+        /* The translation might be valid for larger regions. */
+        if (plen > len) {
+            plen = len;
+        }
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf = (uint8_t *)buf + plen;
+    }
+}
+
+void *dma_memory_map(DMADevice *dev,
+                            dma_addr_t addr,
+                            uint32_t *len,
+                            int is_write);
+void dma_memory_unmap(DMADevice *dev,
+		      void *buffer,
+		      uint32_t len,
+		      int is_write,
+		      uint32_t access_len);
+
+
+#define DEFINE_DMA_LD(suffix, size)                                       \
+static inline uint##size##_t                                              \
+dma_ld##suffix(DMADevice *dev, dma_addr_t addr)                           \
+{                                                                         \
+    int err;                                                              \
+    dma_addr_t paddr, plen;                                               \
+    if (!dev->mmu) {                                                      \
+        return ld##suffix##_phys(addr);                                   \
+    }                                                                     \
+    err = dev->mmu->translate(dev->mmu, dev,                              \
+                              addr, &paddr, &plen, 0 /* is_write */);     \
+    if (err || (plen < size / 8))                                         \
+        return 0;                                                         \
+                                                                          \
+    return ld##suffix##_phys(paddr);                                      \
+}
+
+#define DEFINE_DMA_ST(suffix, size)                                       \
+static inline void                                                        \
+dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)       \
+{                                                                         \
+    int err;                                                              \
+    dma_addr_t paddr, plen;                                               \
+    if (!dev->mmu) {                                                      \
+        st##suffix##_phys(addr, val);                                     \
+        return;                                                           \
+    }                                                                     \
+    err = dev->mmu->translate(dev->mmu, dev,                              \
+                              addr, &paddr, &plen, 1 /* is_write */);     \
+    if (err || (plen < size / 8))                                         \
+        return;                                                           \
+                                                                          \
+    st##suffix##_phys(paddr, val);                                        \
+}
+
+DEFINE_DMA_LD(ub, 8)
+DEFINE_DMA_LD(uw, 16)
+DEFINE_DMA_LD(l, 32)
+DEFINE_DMA_LD(q, 64)
+
+DEFINE_DMA_ST(b, 8)
+DEFINE_DMA_ST(w, 16)
+DEFINE_DMA_ST(l, 32)
+DEFINE_DMA_ST(q, 64)
+
+#endif
diff --git a/hw/pci.h b/hw/pci.h
index 1c6075e..9737f0e 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -5,6 +5,7 @@
 #include "qobject.h"
 
 #include "qdev.h"
+#include "dma_rw.h"
 
 /* PCI includes legacy ISA access.  */
 #include "isa.h"
@@ -119,6 +120,10 @@ enum {
 
 struct PCIDevice {
     DeviceState qdev;
+
+    /* For devices that do DMA. */
+    DMADevice dma;
+
     /* PCI config space */
     uint8_t *config;
 

^ permalink raw reply	[flat|nested] 96+ messages in thread
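
For readers following the thread, the translate-and-copy loop in the RFC header above can be tried outside QEMU. The following is a minimal standalone sketch, not QEMU code: the toy_* names, the 16-byte page size and the fixed page table are all assumptions made up for illustration, and only dma_addr_t mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t dma_addr_t;

#define TOY_PAGE_SIZE 16u
#define TOY_NUM_PAGES 4u

/* Hypothetical guest RAM plus a per-page remapping table (a toy IOMMU). */
static uint8_t toy_ram[TOY_PAGE_SIZE * TOY_NUM_PAGES];
static const uint8_t toy_page_map[TOY_NUM_PAGES] = { 2, 0, 3, 1 };

/* Translate one bus address; *len says how far the mapping stays contiguous. */
static int toy_translate(dma_addr_t addr, dma_addr_t *paddr, dma_addr_t *len)
{
    dma_addr_t page = addr / TOY_PAGE_SIZE;
    dma_addr_t off = addr % TOY_PAGE_SIZE;

    if (page >= TOY_NUM_PAGES) {
        return -1;                      /* unmapped: report a fault */
    }
    *paddr = (dma_addr_t)toy_page_map[page] * TOY_PAGE_SIZE + off;
    *len = TOY_PAGE_SIZE - off;
    return 0;
}

/* Copy len bytes, translating again whenever the current mapping runs out. */
static int toy_dma_rw(dma_addr_t addr, void *buf, uint32_t len, int is_write)
{
    uint8_t *p = buf;

    assert(buf != NULL);
    while (len) {
        dma_addr_t paddr, plen;

        if (toy_translate(addr, &paddr, &plen)) {
            return -1;
        }
        if (plen > len) {
            plen = len;                 /* mapping may cover more than needed */
        }
        if (is_write) {
            memcpy(&toy_ram[paddr], p, plen);
        } else {
            memcpy(p, &toy_ram[paddr], plen);
        }
        len -= plen;
        addr += plen;
        p += plen;
    }
    return 0;
}
```

Unlike the header in the patch, this sketch returns an error code when a translation fails, which is a design choice of the example rather than something the RFC specifies. A transfer that crosses a page boundary is split into one copy per mapping.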

* Re: [Qemu-devel] [PATCH RFC] dma_rw.h (was Re: [PATCH 0/7] AMD IOMMU emulation patchset v4)
  2010-09-13 20:01   ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-09-13 20:45     ` Anthony Liguori
  -1 siblings, 0 replies; 96+ messages in thread
From: Anthony Liguori @ 2010-09-13 20:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eduard - Gabriel Munteanu, kvm, joro, qemu-devel, blauwirbel,
	yamahata, paul, avi

On 09/13/2010 03:01 PM, Michael S. Tsirkin wrote:
> So I think the following will give the idea of what an API
> might look like that will let us avoid the scary hacks in
> e.g. the ide layer and other generic layers that need to do DMA,
> without either binding us to pci, adding more complexity with
> callbacks, or losing type safety with casts and void*.
>
> Basically we have DMADevice that we can use container_of on
> to get a PCIDevice from, and DMAMmu that will get instanciated
> in a specific MMU.
>
> This is not complete code - just a header - I might complete
> this later if/when there's interest or hopefully someone interested
> in iommu emulation will.
>
> Notes:
> the IOMMU_PERM_RW code seem unused, so I replaced
> this with plain is_write. Is it ever useful?
>
> It seems that invalidate callback should be able to
> get away with just a device, so I switched to that
> from a void pointer for type safety.
> Seems enough for the users I saw.
>
> I saw devices do stl_le_phys and such, these
> might need to be wrapped as well.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>    

One of the troubles with an interface like this is that I'm not sure a 
generic model universally works.

For instance, I know some PCI busses do transparent byte swapping.  For 
this to work, there has to be a notion of generic memory reads/writes 
vs. reads of a 32-bit, 16-bit, and 8-bit value.

With a generic API, we lose the flexibility to do this type of bus 
interface.
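
The distinction between bulk transfers and sized accesses can be made concrete with a minimal sketch. This is not QEMU code: ToyBus, toy_mem and the toy_* functions are hypothetical names, and the single swaps_bytes flag is an assumed stand-in for whatever a real byte-swapping bridge does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bus model: one flag says whether the bridge swaps bytes. */
typedef struct ToyBus {
    int swaps_bytes;
} ToyBus;

static uint8_t toy_mem[16];   /* stand-in for guest memory */

/* Bulk read: an untyped byte stream, so there is nothing to swap. */
static void toy_bus_read(const ToyBus *bus, uint32_t addr,
                         void *buf, uint32_t len)
{
    (void)bus;
    assert(addr <= sizeof(toy_mem) && len <= sizeof(toy_mem) - addr);
    memcpy(buf, &toy_mem[addr], len);
}

static uint32_t toy_bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Sized read: the width is known, so the bus can apply its swap. */
static uint32_t toy_bus_ldl(const ToyBus *bus, uint32_t addr)
{
    uint32_t v;

    assert(addr <= sizeof(toy_mem) - 4);
    /* Assemble a little-endian value independent of host byte order. */
    v = (uint32_t)toy_mem[addr] |
        ((uint32_t)toy_mem[addr + 1] << 8) |
        ((uint32_t)toy_mem[addr + 2] << 16) |
        ((uint32_t)toy_mem[addr + 3] << 24);
    return bus->swaps_bytes ? toy_bswap32(v) : v;
}
```

With only a bulk accessor, the bus has no way to know that four bytes form one 32-bit value, so the swap could not be applied at the bus level.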

Regards,

Anthony Liguori

> ---
>
> diff --git a/hw/dma_rw.h b/hw/dma_rw.h
> new file mode 100644
> index 0000000..d63fd17
> --- /dev/null
> +++ b/hw/dma_rw.h
> @@ -0,0 +1,122 @@
> +#ifndef DMA_RW_H
> +#define DMA_RW_H
> +
> +#include "qemu-common.h"
> +
> +/* We currently only have pci mmus, but using
> +   a generic type makes it possible to use this
> +   e.g. from the generic ide code without callbacks. */
> +typedef uint64_t dma_addr_t;
> +
> +typedef struct DMAMmu DMAMmu;
> +typedef struct DMADevice DMADevice;
> +
> +typedef int DMATranslateFunc(DMAMmu *mmu,
> +                             DMADevice *dev,
> +                             dma_addr_t addr,
> +                             dma_addr_t *paddr,
> +                             dma_addr_t *len,
> +                             int is_write);
> +
> +typedef int DMAInvalidateMapFunc(DMADevice *);
> +struct DMAMmu {
> +	/* invalidate, etc. */
> +	DmaTranslateFunc *translate;
> +};
> +
> +struct DMADevice {
> +	DMAMmu *mmu;
> +	DMAInvalidateMapFunc *invalidate;
> +};
> +
> +void dma_device_init(DMADevice *, DMAMmu *, DMAInvalidateMapFunc *);
> +
> +static inline void dma_memory_rw(DMADevice *dev,
> +				 dma_addr_t addr,
> +				 void *buf,
> +				 uint32_t len,
> +				 int is_write)
> +{
> +    uint32_t plen;
> +    /* Fast-path non-iommu.
> +     * More importantly, makes it obvious what this function does. */
> +    if (!dev->mmu) {
> +    	cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +    	return;
> +    }
> +    while (len) {
> +        err = dev->mmu->translate(iommu, dev, addr, &paddr, &plen, is_write);
> +        if (err) {
> +            return;
> +        }
> +
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len) {
> +            plen = len;
> +        }
> +
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                            dma_addr_t addr,
> +                            uint32_t *len,
> +                            int is_write);
> +void dma_memory_unmap(DMADevice *dev,
> +		      void *buffer,
> +		      uint32_t len,
> +		      int is_write,
> +		      uint32_t access_len);
> +
> +
> ++#define DEFINE_DMA_LD(suffix, size)                                       \
> ++uint##size##_t dma_ld##suffix(DMADevice *dev, dma_addr_t addr)            \
> ++{                                                                         \
> ++    int err;                                                              \
> ++    target_phys_addr_t paddr, plen;                                       \
> ++    if (!dev->mmu) {                                                      \
> ++        return ld##suffix##_phys(addr, val);                              \
> ++    }                                                                     \
> ++                                                                          \
> ++    err = dev->mmu->translate(dev->bus->iommu, dev,                       \
> ++                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
> ++    if (err || (plen < size / 8))                                         \
> ++        return 0;                                                         \
> ++                                                                          \
> ++    return ld##suffix##_phys(paddr);                                      \
> ++}
> ++
> ++#define DEFINE_DMA_ST(suffix, size)                                       \
> ++void dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)  \
> ++{                                                                         \
> ++    int err;                                                              \
> ++    target_phys_addr_t paddr, plen;                                       \
> ++                                                                          \
> ++    if (!dev->mmu) {                                                      \
> ++        st##suffix##_phys(addr, val);                                     \
> ++        return;                                                           \
> ++    }                                                                     \
> ++    err = dev->mmu->translate(dev->bus->iommu, dev,                       \
> ++                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
> ++    if (err || (plen < size / 8))                                         \
> ++        return;                                                           \
> ++                                                                          \
> ++    st##suffix##_phys(paddr, val);                                        \
> ++}
> +
> +DEFINE_DMA_LD(ub, 8)
> +DEFINE_DMA_LD(uw, 16)
> +DEFINE_DMA_LD(l, 32)
> +DEFINE_DMA_LD(q, 64)
> +
> +DEFINE_DMA_ST(b, 8)
> +DEFINE_DMA_ST(w, 16)
> +DEFINE_DMA_ST(l, 32)
> +DEFINE_DMA_ST(q, 64)
> +
> +#endif
> diff --git a/hw/pci.h b/hw/pci.h
> index 1c6075e..9737f0e 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -5,6 +5,7 @@
>   #include "qobject.h"
>
>   #include "qdev.h"
> +#include "dma_rw.h"
>
>   /* PCI includes legacy ISA access.  */
>   #include "isa.h"
> @@ -119,6 +120,10 @@ enum {
>
>   struct PCIDevice {
>       DeviceState qdev;
> +
> +    /* For devices that do DMA. */
> +    DMADevice dma;
> +
>       /* PCI config space */
>       uint8_t *config;
>
>
>    


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH RFC] dma_rw.h (was Re: [PATCH 0/7] AMD IOMMU emulation patchset v4)
  2010-09-13 20:01   ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-09-16  7:06     ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-16  7:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Mon, Sep 13, 2010 at 10:01:20PM +0200, Michael S. Tsirkin wrote:
> So I think the following will give the idea of what an API
> might look like that will let us avoid the scary hacks in
> e.g. the ide layer and other generic layers that need to do DMA,
> without either binding us to pci, adding more complexity with
> callbacks, or losing type safety with casts and void*.
> 
> Basically we have DMADevice that we can use container_of on
> to get a PCIDevice from, and DMAMmu that will get instanciated
> in a specific MMU.
> 
> This is not complete code - just a header - I might complete
> this later if/when there's interest or hopefully someone interested
> in iommu emulation will.

Hi,

I personally like this approach better. It also seems to make poisoning
cpu_physical_memory_*() easier if we convert every device to this API.
We could then ban cpu_physical_memory_*(), perhaps by requiring a
#define and #ifdef-ing those declarations.
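
A rough sketch of what such a guard might look like, squeezed into one file for illustration. The macro TOY_NEED_RAW_PHYS_ACCESS and the toy_* functions are hypothetical names, not anything that exists in the tree; in a real layout the guarded declaration would sit in a shared header and only core memory code would define the macro.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opt-in switch. Core memory code would define it before
 * including the shared header; converted device code would not. */
#define TOY_NEED_RAW_PHYS_ACCESS

static uint8_t toy_ram[64];   /* stand-in for guest memory */

/* ---- what the shared header might contain ---- */
#ifdef TOY_NEED_RAW_PHYS_ACCESS
static void toy_physical_memory_rw(uint64_t addr, void *buf,
                                   uint32_t len, int is_write)
{
    assert(addr <= sizeof(toy_ram) && len <= sizeof(toy_ram) - addr);
    if (is_write) {
        memcpy(&toy_ram[addr], buf, len);
    } else {
        memcpy(buf, &toy_ram[addr], len);
    }
}
#endif

/* ---- the wrapper that devices would be expected to call ---- */
static void toy_dma_rw(uint64_t addr, void *buf, uint32_t len, int is_write)
{
    /* No IOMMU in this toy, so the access passes straight through. */
    toy_physical_memory_rw(addr, buf, len, is_write);
}
```

A file that calls toy_physical_memory_rw() without defining the macro then fails to build on the undeclared function (depending on compiler settings this may need implicit declarations treated as errors), which is the effect poisoning is after.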

> Notes:
> the IOMMU_PERM_RW code seem unused, so I replaced
> this with plain is_write. Is it ever useful?

The original idea made provisions for stuff like full R/W memory maps.
In that case cpu_physical_memory_map() would call the translation /
checking function with perms == IOMMU_PERM_RW. That's not there yet so
it can be removed at the moment, especially since it only affects these
helpers.

Also, I'm not sure if there are other sorts of accesses besides reads
and writes we want to check or translate.
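
To illustrate the difference between a permission mask and a plain is_write flag, here is a small sketch. The IOMMU_PERM_* names come from this thread, but the numeric values and the two toy_* helpers are assumptions made for the example.

```c
#include <assert.h>

/* Flag names follow the thread; the numeric values are assumptions. */
enum {
    IOMMU_PERM_READ  = 1 << 0,
    IOMMU_PERM_WRITE = 1 << 1,
    IOMMU_PERM_RW    = IOMMU_PERM_READ | IOMMU_PERM_WRITE
};

/* An is_write flag can express only one direction at a time. */
static int toy_perm_from_is_write(int is_write)
{
    return is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
}

/* An access is allowed only if every requested bit was granted. */
static int toy_perm_allowed(int granted, int requested)
{
    assert((requested & ~IOMMU_PERM_RW) == 0);
    return (granted & requested) == requested;
}
```

A full read/write map would request IOMMU_PERM_RW in one check, whereas with is_write the caller would have to issue two separate checks.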

> It seems that invalidate callback should be able to
> get away with just a device, so I switched to that
> from a void pointer for type safety.
> Seems enough for the users I saw.

I think this makes matters too complicated. Normally, a single DMADevice
should be embedded within a <bus>Device, so doing this makes it really
hard to invalidate a specific map when there are more of them. It forces
device code to act as a bus, provide fake 'DMADevice's for each map and
dispatch translation to the real DMATranslateFunc. I see no other way.

If you really want more type-safety (although I think this is a case of
a true opaque identifying something only device code understands), I
have another proposal: have a DMAMap embedded in the opaque. Example
from dma-helpers.c:

typedef struct {
	DMADevice *owner;
	[...]
} DMAMap;

typedef struct {
	[...]
	DMAMap map;
	[...]
} DMAAIOCB;

/* The callback. */
static void dma_bdrv_cancel(DMAMap *map)
{
	DMAAIOCB *dbs = container_of(map, DMAAIOCB, map);

	[...]
}

The upside is we only need to pass the DMAMap. That can also contain
details of the actual map in case the device wants to release only the
relevant range and remap the rest.
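
A self-contained version of the same idea, as a sketch only. ToyRequest stands in for DMAAIOCB, toy_container_of for container_of, and the addr/len fields and toy_invalidate() are hypothetical additions; none of this is existing QEMU code.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for container_of. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct DMADevice DMADevice;   /* opaque in this sketch */

typedef struct DMAMap {
    DMADevice *owner;
    unsigned long addr;   /* hypothetical: start of the mapped range */
    unsigned long len;    /* hypothetical: length of the mapped range */
} DMAMap;

/* Hypothetical per-request state with the map embedded in it. */
typedef struct ToyRequest {
    int id;
    int cancelled;
    DMAMap map;
} ToyRequest;

typedef void ToyInvalidateMapFunc(DMAMap *map);

/* The callback recovers its own request from the embedded map. */
static void toy_request_cancel(DMAMap *map)
{
    ToyRequest *req = toy_container_of(map, ToyRequest, map);

    req->cancelled = 1;
}

/* IOMMU side: it only ever sees the typed DMAMap, never a void pointer. */
static void toy_invalidate(DMAMap *map, ToyInvalidateMapFunc *cb)
{
    assert(map != NULL && cb != NULL);
    cb(map);
}
```

With two requests outstanding on one device, invalidating the second request's map touches only that request, which is the case a per-device callback cannot distinguish.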

> I saw devices do stl_le_phys and such, these
> might need to be wrapped as well.

stl_le_phys() is defined and used only by hw/eepro100.c. That's already
dealt with by converting the device.


	Thanks,
	Eduard

> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ---
> 
> diff --git a/hw/dma_rw.h b/hw/dma_rw.h
> new file mode 100644
> index 0000000..d63fd17
> --- /dev/null
> +++ b/hw/dma_rw.h
> @@ -0,0 +1,122 @@
> +#ifndef DMA_RW_H
> +#define DMA_RW_H
> +
> +#include "qemu-common.h"
> +
> +/* We currently only have pci mmus, but using
> +   a generic type makes it possible to use this
> +   e.g. from the generic ide code without callbacks. */
> +typedef uint64_t dma_addr_t;
> +
> +typedef struct DMAMmu DMAMmu;
> +typedef struct DMADevice DMADevice;
> +
> +typedef int DMATranslateFunc(DMAMmu *mmu,
> +                             DMADevice *dev,
> +                             dma_addr_t addr,
> +                             dma_addr_t *paddr,
> +                             dma_addr_t *len,
> +                             int is_write);
> +
> +typedef int DMAInvalidateMapFunc(DMADevice *);
> +struct DMAMmu {
> +	/* invalidate, etc. */
> +	DmaTranslateFunc *translate;
> +};
> +
> +struct DMADevice {
> +	DMAMmu *mmu;
> +	DMAInvalidateMapFunc *invalidate;
> +};
> +
> +void dma_device_init(DMADevice *, DMAMmu *, DMAInvalidateMapFunc *);
> +
> +static inline void dma_memory_rw(DMADevice *dev,
> +				 dma_addr_t addr,
> +				 void *buf,
> +				 uint32_t len,
> +				 int is_write)
> +{
> +    uint32_t plen;
> +    /* Fast-path non-iommu.
> +     * More importantly, makes it obvious what this function does. */
> +    if (!dev->mmu) {
> +    	cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +    	return;
> +    }
> +    while (len) {
> +        err = dev->mmu->translate(iommu, dev, addr, &paddr, &plen, is_write);
> +        if (err) {
> +            return;
> +        }
> +                                      
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len) {
> +            plen = len;
> +        }
> +    
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                            dma_addr_t addr,
> +                            uint32_t *len,
> +                            int is_write);
> +void dma_memory_unmap(DMADevice *dev,
> +		      void *buffer,
> +		      uint32_t len,
> +		      int is_write,
> +		      uint32_t access_len);
> +
> +
> ++#define DEFINE_DMA_LD(suffix, size)                                       \
> ++uint##size##_t dma_ld##suffix(DMADevice *dev, dma_addr_t addr)            \
> ++{                                                                         \
> ++    int err;                                                              \
> ++    target_phys_addr_t paddr, plen;                                       \
> ++    if (!dev->mmu) {                                                      \
> ++        return ld##suffix##_phys(addr, val);                              \
> ++    }                                                                     \
> ++                                                                          \
> ++    err = dev->mmu->translate(dev->bus->iommu, dev,                       \
> ++                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
> ++    if (err || (plen < size / 8))                                         \
> ++        return 0;                                                         \
> ++                                                                          \
> ++    return ld##suffix##_phys(paddr);                                      \
> ++}
> ++
> ++#define DEFINE_DMA_ST(suffix, size)                                       \
> ++void dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)  \
> ++{                                                                         \
> ++    int err;                                                              \
> ++    target_phys_addr_t paddr, plen;                                       \
> ++                                                                          \
> ++    if (!dev->mmu) {                                                      \
> ++        st##suffix##_phys(addr, val);                                     \
> ++        return;                                                           \
> ++    }                                                                     \
> ++    err = dev->mmu->translate(dev->bus->iommu, dev,                       \
> ++                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
> ++    if (err || (plen < size / 8))                                         \
> ++        return;                                                           \
> ++                                                                          \
> ++    st##suffix##_phys(paddr, val);                                        \
> ++}
> +
> +DEFINE_DMA_LD(ub, 8)
> +DEFINE_DMA_LD(uw, 16)
> +DEFINE_DMA_LD(l, 32)
> +DEFINE_DMA_LD(q, 64)
> +
> +DEFINE_DMA_ST(b, 8)
> +DEFINE_DMA_ST(w, 16)
> +DEFINE_DMA_ST(l, 32)
> +DEFINE_DMA_ST(q, 64)
> +
> +#endif
> diff --git a/hw/pci.h b/hw/pci.h
> index 1c6075e..9737f0e 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -5,6 +5,7 @@
>  #include "qobject.h"
>  
>  #include "qdev.h"
> +#include "dma_rw.h"
>  
>  /* PCI includes legacy ISA access.  */
>  #include "isa.h"
> @@ -119,6 +120,10 @@ enum {
>  
>  struct PCIDevice {
>      DeviceState qdev;
> +
> +    /* For devices that do DMA. */
> +    DMADevice dma;
> +
>      /* PCI config space */
>      uint8_t *config;
>  

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [Qemu-devel] Re: [PATCH RFC] dma_rw.h (was Re: [PATCH 0/7] AMD IOMMU emulation patchset v4)
@ 2010-09-16  7:06     ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-16  7:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

On Mon, Sep 13, 2010 at 10:01:20PM +0200, Michael S. Tsirkin wrote:
> So I think the following will give the idea of what an API
> might look like that will let us avoid the scary hacks in
> e.g. the ide layer and other generic layers that need to do DMA,
> without either binding us to pci, adding more complexity with
> callbacks, or losing type safety with casts and void*.
> 
> Basically we have DMADevice that we can use container_of on
> to get a PCIDevice from, and DMAMmu that will get instantiated
> in a specific MMU.
> 
> This is not complete code - just a header - I might complete
> this later if/when there's interest or hopefully someone interested
> in iommu emulation will.

Hi,

I personally like this approach better. It also seems to make poisoning
cpu_physical_memory_*() easier if we convert every device to this API.
We could then ban cpu_physical_memory_*(), perhaps by requiring a
#define and #ifdef-ing those declarations.
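
A minimal sketch of that gating idea, squeezed into one file for brevity.
The macro name CPU_PHYSICAL_MEMORY_ALLOWED, the toy_ prefix and the toy RAM
array are assumptions made up for illustration, not existing QEMU names.
Only a file that defines the macro would see the raw accessor; device code
would see just the DMA wrapper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opt-in: only core memory code would define this. */
#define CPU_PHYSICAL_MEMORY_ALLOWED

static uint8_t toy_ram[32];     /* stand-in for guest RAM */

#ifdef CPU_PHYSICAL_MEMORY_ALLOWED
/* Raw accessor, hidden from any file that does not opt in. */
static void toy_cpu_physical_memory_rw(uint64_t addr, uint8_t *buf,
                                       int len, int is_write)
{
    assert(len >= 0 && addr + (uint64_t)len <= sizeof(toy_ram));
    if (is_write) {
        memcpy(&toy_ram[addr], buf, (size_t)len);
    } else {
        memcpy(buf, &toy_ram[addr], (size_t)len);
    }
}
#endif

/* The only entry point device code is meant to use. */
static void toy_dma_memory_rw(uint64_t addr, uint8_t *buf,
                              int len, int is_write)
{
    toy_cpu_physical_memory_rw(addr, buf, len, is_write);
}
```

In a real tree the wrapper would live in a core file that defines the
macro, so a device file that forgets to convert fails at compile time.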

> Notes:
> the IOMMU_PERM_RW code seems unused, so I replaced
> this with plain is_write. Is it ever useful?

The original idea made provisions for stuff like full R/W memory maps.
In that case cpu_physical_memory_map() would call the translation /
checking function with perms == IOMMU_PERM_RW. That's not there yet so
it can be removed at the moment, especially since it only affects these
helpers.
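
To make the perms idea concrete, here is a rough sketch of how a combined
R/W request could be checked. The IOMMU_PERM_* names come from the patchset,
but the bit values and the toy_check_perms() helper are assumptions for
illustration only.

```c
#include <assert.h>

/* Hypothetical values; the patchset may number these differently. */
enum {
    IOMMU_PERM_READ  = 1 << 0,
    IOMMU_PERM_WRITE = 1 << 1,
    IOMMU_PERM_RW    = IOMMU_PERM_READ | IOMMU_PERM_WRITE
};

/* Return 0 when every requested permission is granted, -1 otherwise. */
static int toy_check_perms(int granted, int requested)
{
    return ((granted & requested) == requested) ? 0 : -1;
}
```

A full R/W map would pass IOMMU_PERM_RW as the request, which a plain
is_write flag cannot express in a single call.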

Also, I'm not sure if there are other sorts of accesses besides reads
and writes we want to check or translate.

> It seems that invalidate callback should be able to
> get away with just a device, so I switched to that
> from a void pointer for type safety.
> Seems enough for the users I saw.

I think this makes matters too complicated. Normally, a single DMADevice
should be embedded within a <bus>Device, so doing this makes it really
hard to invalidate a specific map when there are more of them. It forces
device code to act as a bus, provide fake 'DMADevice's for each map and
dispatch translation to the real DMATranslateFunc. I see no other way.

If you really want more type-safety (although I think this is a case of
a true opaque identifying something only device code understands), I
have another proposal: have a DMAMap embedded in the opaque. Example
from dma-helpers.c:

typedef struct {
	DMADevice *owner;
	[...]
} DMAMap;

typedef struct {
	[...]
	DMAMap map;
	[...]
} DMAAIOCB;

/* The callback. */
static void dma_bdrv_cancel(DMAMap *map)
{
	DMAAIOCB *dbs = container_of(map, DMAAIOCB, map);

	[...]
}

The upside is we only need to pass the DMAMap. That can also contain
details of the actual map in case the device wants to release only the
relevant range and remap the rest.
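
For anyone who wants to try the pattern in isolation, a self-contained
sketch follows. The addr, len and cancelled fields are hypothetical
placeholders for the elided members above, and container_of is spelled out
so the file compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the usual container_of helper. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct DMADevice DMADevice;   /* opaque in this sketch */

typedef struct {
    DMADevice *owner;
    uint64_t addr;      /* hypothetical: start of the mapped range */
    uint64_t len;       /* hypothetical: length of the mapped range */
} DMAMap;

typedef struct {
    int cancelled;      /* hypothetical state touched by the callback */
    DMAMap map;         /* embedded, not pointed to */
} DMAAIOCB;

/* The callback gets only the DMAMap and recovers its container. */
static void dma_bdrv_cancel(DMAMap *map)
{
    DMAAIOCB *dbs = container_of(map, DMAAIOCB, map);

    dbs->cancelled = 1;
}
```

With two DMAAIOCBs owned by the same device, cancelling through one map
leaves the other untouched, which is the per-map granularity a bare
DMADevice pointer cannot give.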

> I saw devices do stl_le_phys and such, these
> might need to be wrapped as well.

stl_le_phys() is defined and used only by hw/eepro100.c. That's already
dealt with by converting the device.


	Thanks,
	Eduard

> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ---
> 
> diff --git a/hw/dma_rw.h b/hw/dma_rw.h
> new file mode 100644
> index 0000000..d63fd17
> --- /dev/null
> +++ b/hw/dma_rw.h
> @@ -0,0 +1,122 @@
> +#ifndef DMA_RW_H
> +#define DMA_RW_H
> +
> +#include "qemu-common.h"
> +
> +/* We currently only have pci mmus, but using
> +   a generic type makes it possible to use this
> +   e.g. from the generic ide code without callbacks. */
> +typedef uint64_t dma_addr_t;
> +
> +typedef struct DMAMmu DMAMmu;
> +typedef struct DMADevice DMADevice;
> +
> +typedef int DMATranslateFunc(DMAMmu *mmu,
> +                             DMADevice *dev,
> +                             dma_addr_t addr,
> +                             dma_addr_t *paddr,
> +                             dma_addr_t *len,
> +                             int is_write);
> +
> +typedef int DMAInvalidateMapFunc(DMADevice *);
> +struct DMAMmu {
> +	/* invalidate, etc. */
> +	DMATranslateFunc *translate;
> +};
> +
> +struct DMADevice {
> +	DMAMmu *mmu;
> +	DMAInvalidateMapFunc *invalidate;
> +};
> +
> +void dma_device_init(DMADevice *, DMAMmu *, DMAInvalidateMapFunc *);
> +
> +static inline void dma_memory_rw(DMADevice *dev,
> +				 dma_addr_t addr,
> +				 void *buf,
> +				 uint32_t len,
> +				 int is_write)
> +{
> +    int err;
> +    dma_addr_t paddr, plen;
> +    /* Fast-path non-iommu.
> +     * More importantly, makes it obvious what this function does. */
> +    if (!dev->mmu) {
> +        cpu_physical_memory_rw(addr, buf, len, is_write);
> +        return;
> +    }
> +    while (len) {
> +        err = dev->mmu->translate(dev->mmu, dev, addr, &paddr, &plen, is_write);
> +        if (err) {
> +            return;
> +        }
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len) {
> +            plen = len;
> +        }
> +    
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                            dma_addr_t addr,
> +                            uint32_t *len,
> +                            int is_write);
> +void dma_memory_unmap(DMADevice *dev,
> +		      void *buffer,
> +		      uint32_t len,
> +		      int is_write,
> +		      uint32_t access_len);
> +
> +
> ++#define DEFINE_DMA_LD(suffix, size)                                       \
> ++uint##size##_t dma_ld##suffix(DMADevice *dev, dma_addr_t addr)            \
> ++{                                                                         \
> ++    int err;                                                              \
> ++    target_phys_addr_t paddr, plen;                                       \
> ++    if (!dev->mmu) {                                                      \
> ++        return ld##suffix##_phys(addr);                                   \
> ++    }                                                                     \
> ++                                                                          \
> ++    err = dev->mmu->translate(dev->bus->iommu, dev,                       \
> ++                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
> ++    if (err || (plen < size / 8))                                         \
> ++        return 0;                                                         \
> ++                                                                          \
> ++    return ld##suffix##_phys(paddr);                                      \
> ++}
> ++
> ++#define DEFINE_DMA_ST(suffix, size)                                       \
> ++void dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)  \
> ++{                                                                         \
> ++    int err;                                                              \
> ++    target_phys_addr_t paddr, plen;                                       \
> ++                                                                          \
> ++    if (!dev->mmu) {                                                      \
> ++        st##suffix##_phys(addr, val);                                     \
> ++        return;                                                           \
> ++    }                                                                     \
> ++    err = dev->mmu->translate(dev->bus->iommu, dev,                       \
> ++                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
> ++    if (err || (plen < size / 8))                                         \
> ++        return;                                                           \
> ++                                                                          \
> ++    st##suffix##_phys(paddr, val);                                        \
> ++}
> +
> +DEFINE_DMA_LD(ub, 8)
> +DEFINE_DMA_LD(uw, 16)
> +DEFINE_DMA_LD(l, 32)
> +DEFINE_DMA_LD(q, 64)
> +
> +DEFINE_DMA_ST(b, 8)
> +DEFINE_DMA_ST(w, 16)
> +DEFINE_DMA_ST(l, 32)
> +DEFINE_DMA_ST(q, 64)
> +
> +#endif
> diff --git a/hw/pci.h b/hw/pci.h
> index 1c6075e..9737f0e 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -5,6 +5,7 @@
>  #include "qobject.h"
>  
>  #include "qdev.h"
> +#include "dma_rw.h"
>  
>  /* PCI includes legacy ISA access.  */
>  #include "isa.h"
> @@ -119,6 +120,10 @@ enum {
>  
>  struct PCIDevice {
>      DeviceState qdev;
> +
> +    /* For devices that do DMA. */
> +    DMADevice dma;
> +
>      /* PCI config space */
>      uint8_t *config;
>  

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Qemu-devel] [PATCH RFC] dma_rw.h (was Re: [PATCH 0/7] AMD IOMMU emulation patchset v4)
  2010-09-13 20:45     ` Anthony Liguori
@ 2010-09-16  7:12       ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 96+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-16  7:12 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Michael S. Tsirkin, kvm, joro, qemu-devel, blauwirbel, yamahata,
	paul, avi

On Mon, Sep 13, 2010 at 03:45:34PM -0500, Anthony Liguori wrote:
> On 09/13/2010 03:01 PM, Michael S. Tsirkin wrote:
> > So I think the following will give the idea of what an API
> > might look like that will let us avoid the scary hacks in
> > e.g. the ide layer and other generic layers that need to do DMA,
> > without either binding us to pci, adding more complexity with
> > callbacks, or losing type safety with casts and void*.
> >
> > Basically we have DMADevice that we can use container_of on
> > to get a PCIDevice from, and DMAMmu that will get instantiated
> > in a specific MMU.
> >
> > This is not complete code - just a header - I might complete
> > this later if/when there's interest or hopefully someone interested
> > in iommu emulation will.
> >
> > Notes:
> > the IOMMU_PERM_RW code seems unused, so I replaced
> > this with plain is_write. Is it ever useful?
> >
> > It seems that invalidate callback should be able to
> > get away with just a device, so I switched to that
> > from a void pointer for type safety.
> > Seems enough for the users I saw.
> >
> > I saw devices do stl_le_phys and such, these
> > might need to be wrapped as well.
> >
> > Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> >    
> 
> One of the troubles with an interface like this is that I'm not sure a 
> generic model universally works.
> 
> For instance, I know some PCI busses do transparent byte swapping.  For 
> this to work, there has to be a notion of generic memory reads/writes 
> vs. reads of a 32-bit, 16-bit, and 8-bit value.
> 
> With a generic API, we lose the flexibility to do this type of bus 
> interface.
> 
> Regards,
> 
> Anthony Liguori
> 

[snip]

I suppose additional callbacks that do the actual R/W could solve this.
If those aren't present, default to cpu_physical_memory_*().

It should be easy for such a callback to decide on a case-by-case basis
depending on the R/W transaction size, if this is ever needed.
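
Roughly what I have in mind, as a standalone sketch. The rw hook in DMAMmu,
the DMARwFunc type and all toy_ names are hypothetical and not part of the
posted header; toy_physical_rw() stands in for cpu_physical_memory_rw().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t toy_ram[16];                 /* stand-in for guest RAM */

/* Stand-in for the cpu_physical_memory_rw() default path. */
static void toy_physical_rw(uint64_t addr, uint8_t *buf, int len, int is_write)
{
    if (is_write) {
        memcpy(&toy_ram[addr], buf, (size_t)len);
    } else {
        memcpy(buf, &toy_ram[addr], (size_t)len);
    }
}

typedef struct DMAMmu DMAMmu;
typedef void DMARwFunc(DMAMmu *mmu, uint64_t addr, uint8_t *buf,
                       int len, int is_write);

struct DMAMmu {
    DMARwFunc *rw;      /* hypothetical optional hook; NULL means default */
};

/* Example hook: a bus that byte-swaps 32-bit transactions only. */
static void toy_swapping_rw(DMAMmu *mmu, uint64_t addr, uint8_t *buf,
                            int len, int is_write)
{
    uint8_t tmp[4];
    int i;

    (void)mmu;
    if (len != 4) {                 /* other sizes pass through untouched */
        toy_physical_rw(addr, buf, len, is_write);
        return;
    }
    if (is_write) {
        for (i = 0; i < 4; i++) {
            tmp[i] = buf[3 - i];
        }
        toy_physical_rw(addr, tmp, 4, 1);
    } else {
        toy_physical_rw(addr, tmp, 4, 0);
        for (i = 0; i < 4; i++) {
            buf[i] = tmp[3 - i];
        }
    }
}

/* Dispatch: use the hook when present, else the default path. */
static void toy_dma_rw(DMAMmu *mmu, uint64_t addr, uint8_t *buf,
                       int len, int is_write)
{
    if (mmu && mmu->rw) {
        mmu->rw(mmu, addr, buf, len, is_write);
    } else {
        toy_physical_rw(addr, buf, len, is_write);
    }
}
```

Buses that need no swapping leave the hook NULL and pay nothing beyond one
pointer test.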


	Eduard


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH RFC] dma_rw.h (was Re: [PATCH 0/7] AMD IOMMU emulation patchset v4)
  2010-09-16  7:06     ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-09-16  9:20       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-16  9:20 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: joro, blauwirbel, paul, avi, anthony, av1474, yamahata, kvm, qemu-devel

On Thu, Sep 16, 2010 at 10:06:16AM +0300, Eduard - Gabriel Munteanu wrote:
> On Mon, Sep 13, 2010 at 10:01:20PM +0200, Michael S. Tsirkin wrote:
> > So I think the following will give the idea of what an API
> > might look like that will let us avoid the scary hacks in
> > e.g. the ide layer and other generic layers that need to do DMA,
> > without either binding us to pci, adding more complexity with
> > callbacks, or losing type safety with casts and void*.
> > 
> > Basically we have DMADevice that we can use container_of on
> > > to get a PCIDevice from, and DMAMmu that will get instantiated
> > in a specific MMU.
> > 
> > This is not complete code - just a header - I might complete
> > this later if/when there's interest or hopefully someone interested
> > in iommu emulation will.
> 
> Hi,
> 
> I personally like this approach better. It also seems to make poisoning
> cpu_physical_memory_*() easier if we convert every device to this API.
> We could then ban cpu_physical_memory_*(), perhaps by requiring a
> #define and #ifdef-ing those declarations.
> 
> > Notes:
> > > the IOMMU_PERM_RW code seems unused, so I replaced
> > this with plain is_write. Is it ever useful?
> 
> The original idea made provisions for stuff like full R/W memory maps.
> In that case cpu_physical_memory_map() would call the translation /
> checking function with perms == IOMMU_PERM_RW. That's not there yet so
> it can be removed at the moment, especially since it only affects these
> helpers.
> 
> Also, I'm not sure if there are other sorts of accesses besides reads
> and writes we want to check or translate.
> 
> > It seems that invalidate callback should be able to
> > get away with just a device, so I switched to that
> > from a void pointer for type safety.
> > Seems enough for the users I saw.
> 
> I think this makes matters too complicated. Normally, a single DMADevice
> should be embedded within a <bus>Device,

No, DMADevice is a device that does DMA.
So e.g. a PCI device would embed one.
Remember, translations are per device, right?
DMAMmu is part of the iommu object.

> so doing this makes it really
> hard to invalidate a specific map when there are more of them. It forces
> device code to act as a bus, provide fake 'DMADevice's for each map and
> dispatch translation to the real DMATranslateFunc. I see no other way.
> 
> If you really want more type-safety (although I think this is a case of
> a true opaque identifying something only device code understands), I
> have another proposal: have a DMAMap embedded in the opaque. Example
> from dma-helpers.c:
> 
> typedef struct {
> 	DMADevice *owner;
> 	[...]
> } DMAMap;
> 
> typedef struct {
> 	[...]
> 	DMAMap map;
> 	[...]
> } DMAAIOCB;
> 
> /* The callback. */
> static void dma_bdrv_cancel(DMAMap *map)
> {
> 	DMAAIOCB *dbs = container_of(map, DMAAIOCB, map);
> 
> 	[...]
> }
> 
> The upside is we only need to pass the DMAMap. That can also contain
> details of the actual map in case the device wants to release only the
> relevant range and remap the rest.

Fine.
Or maybe DMAAIOCB (just make some letters lower case: DMAIocb?).
Everyone will use it anyway, right?

> > I saw devices do stl_le_phys and such, these
> > might need to be wrapped as well.
> 
> stl_le_phys() is defined and used only by hw/eepro100.c. That's already
> dealt with by converting the device.
> 

I see. Need to get around to adding some prefix to it to make this clear.

> 	Thanks,
> 	Eduard
> 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > ---
> > 
> > diff --git a/hw/dma_rw.h b/hw/dma_rw.h
> > new file mode 100644
> > index 0000000..d63fd17
> > --- /dev/null
> > +++ b/hw/dma_rw.h
> > @@ -0,0 +1,122 @@
> > +#ifndef DMA_RW_H
> > +#define DMA_RW_H
> > +
> > +#include "qemu-common.h"
> > +
> > +/* We currently only have pci mmus, but using
> > +   a generic type makes it possible to use this
> > +   e.g. from the generic ide code without callbacks. */
> > +typedef uint64_t dma_addr_t;
> > +
> > +typedef struct DMAMmu DMAMmu;
> > +typedef struct DMADevice DMADevice;
> > +
> > +typedef int DMATranslateFunc(DMAMmu *mmu,
> > +                             DMADevice *dev,
> > +                             dma_addr_t addr,
> > +                             dma_addr_t *paddr,
> > +                             dma_addr_t *len,
> > +                             int is_write);
> > +
> > +typedef int DMAInvalidateMapFunc(DMADevice *);
> > +struct DMAMmu {
> > +	/* invalidate, etc. */
> > +	DMATranslateFunc *translate;
> > +};
> > +
> > +struct DMADevice {
> > +	DMAMmu *mmu;
> > +	DMAInvalidateMapFunc *invalidate;
> > +};
> > +
> > +void dma_device_init(DMADevice *, DMAMmu *, DMAInvalidateMapFunc *);
> > +
> > +static inline void dma_memory_rw(DMADevice *dev,
> > +				 dma_addr_t addr,
> > +				 void *buf,
> > +				 uint32_t len,
> > +				 int is_write)
> > +{
> > +    int err;
> > +    dma_addr_t paddr, plen;
> > +    /* Fast-path non-iommu.
> > +     * More importantly, makes it obvious what this function does. */
> > +    if (!dev->mmu) {
> > +        cpu_physical_memory_rw(addr, buf, len, is_write);
> > +        return;
> > +    }
> > +    while (len) {
> > +        err = dev->mmu->translate(dev->mmu, dev, addr, &paddr, &plen, is_write);
> > +        if (err) {
> > +            return;
> > +        }
> > +        /* The translation might be valid for larger regions. */
> > +        if (plen > len) {
> > +            plen = len;
> > +        }
> > +    
> > +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> > +
> > +        len -= plen;
> > +        addr += plen;
> > +        buf += plen;
> > +    }
> > +}
> > +
> > +void *dma_memory_map(DMADevice *dev,
> > +                            dma_addr_t addr,
> > +                            uint32_t *len,
> > +                            int is_write);
> > +void dma_memory_unmap(DMADevice *dev,
> > +		      void *buffer,
> > +		      uint32_t len,
> > +		      int is_write,
> > +		      uint32_t access_len);
> > +
> > +
> > ++#define DEFINE_DMA_LD(suffix, size)                                       \
> > ++uint##size##_t dma_ld##suffix(DMADevice *dev, dma_addr_t addr)            \
> > ++{                                                                         \
> > ++    int err;                                                              \
> > ++    target_phys_addr_t paddr, plen;                                       \
> > ++    if (!dev->mmu) {                                                      \
> > ++        return ld##suffix##_phys(addr);                                   \
> > ++    }                                                                     \
> > ++                                                                          \
> > ++    err = dev->mmu->translate(dev->bus->iommu, dev,                       \
> > ++                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
> > ++    if (err || (plen < size / 8))                                         \
> > ++        return 0;                                                         \
> > ++                                                                          \
> > ++    return ld##suffix##_phys(paddr);                                      \
> > ++}
> > ++
> > ++#define DEFINE_DMA_ST(suffix, size)                                       \
> > ++void dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)  \
> > ++{                                                                         \
> > ++    int err;                                                              \
> > ++    target_phys_addr_t paddr, plen;                                       \
> > ++                                                                          \
> > ++    if (!dev->mmu) {                                                      \
> > ++        st##suffix##_phys(addr, val);                                     \
> > ++        return;                                                           \
> > ++    }                                                                     \
> > ++    err = dev->mmu->translate(dev->bus->iommu, dev,                       \
> > ++                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
> > ++    if (err || (plen < size / 8))                                         \
> > ++        return;                                                           \
> > ++                                                                          \
> > ++    st##suffix##_phys(paddr, val);                                        \
> > ++}
> > +
> > +DEFINE_DMA_LD(ub, 8)
> > +DEFINE_DMA_LD(uw, 16)
> > +DEFINE_DMA_LD(l, 32)
> > +DEFINE_DMA_LD(q, 64)
> > +
> > +DEFINE_DMA_ST(b, 8)
> > +DEFINE_DMA_ST(w, 16)
> > +DEFINE_DMA_ST(l, 32)
> > +DEFINE_DMA_ST(q, 64)
> > +
> > +#endif
> > diff --git a/hw/pci.h b/hw/pci.h
> > index 1c6075e..9737f0e 100644
> > --- a/hw/pci.h
> > +++ b/hw/pci.h
> > @@ -5,6 +5,7 @@
> >  #include "qobject.h"
> >  
> >  #include "qdev.h"
> > +#include "dma_rw.h"
> >  
> >  /* PCI includes legacy ISA access.  */
> >  #include "isa.h"
> > @@ -119,6 +120,10 @@ enum {
> >  
> >  struct PCIDevice {
> >      DeviceState qdev;
> > +
> > +    /* For devices that do DMA. */
> > +    DMADevice dma;
> > +
> >      /* PCI config space */
> >      uint8_t *config;
> >  

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [Qemu-devel] Re: [PATCH RFC] dma_rw.h (was Re: [PATCH 0/7] AMD IOMMU emulation patchset v4)
@ 2010-09-16  9:20       ` Michael S. Tsirkin
  0 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-16  9:20 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

On Thu, Sep 16, 2010 at 10:06:16AM +0300, Eduard - Gabriel Munteanu wrote:
> On Mon, Sep 13, 2010 at 10:01:20PM +0200, Michael S. Tsirkin wrote:
> > So I think the following will give the idea of what an API
> > might look like that will let us avoid the scary hacks in
> > e.g. the ide layer and other generic layers that need to do DMA,
> > without either binding us to pci, adding more complexity with
> > callbacks, or losing type safety with casts and void*.
> > 
> > Basically we have DMADevice that we can use container_of on
> > > to get a PCIDevice from, and DMAMmu that will get instantiated
> > in a specific MMU.
> > 
> > This is not complete code - just a header - I might complete
> > this later if/when there's interest or hopefully someone interested
> > in iommu emulation will.
> 
> Hi,
> 
> I personally like this approach better. It also seems to make poisoning
> cpu_physical_memory_*() easier if we convert every device to this API.
> We could then ban cpu_physical_memory_*(), perhaps by requiring a
> #define and #ifdef-ing those declarations.
> 
> > Notes:
> > > the IOMMU_PERM_RW code seems unused, so I replaced
> > this with plain is_write. Is it ever useful?
> 
> The original idea made provisions for stuff like full R/W memory maps.
> In that case cpu_physical_memory_map() would call the translation /
> checking function with perms == IOMMU_PERM_RW. That's not there yet so
> it can be removed at the moment, especially since it only affects these
> helpers.
> 
> Also, I'm not sure if there are other sorts of accesses besides reads
> and writes we want to check or translate.
> 
> > It seems that invalidate callback should be able to
> > get away with just a device, so I switched to that
> > from a void pointer for type safety.
> > Seems enough for the users I saw.
> 
> I think this makes matters too complicated. Normally, a single DMADevice
> should be embedded within a <bus>Device,

No, DMADevice is a device that does DMA.
So e.g. a PCI device would embed one.
Remember, translations are per device, right?
DMAMmu is part of the iommu object.

> so doing this makes it really
> hard to invalidate a specific map when there are more of them. It forces
> device code to act as a bus, provide fake 'DMADevice's for each map and
> dispatch translation to the real DMATranslateFunc. I see no other way.
> 
> If you really want more type-safety (although I think this is a case of
> a true opaque identifying something only device code understands), I
> have another proposal: have a DMAMap embedded in the opaque. Example
> from dma-helpers.c:
> 
> typedef struct {
> 	DMADevice *owner;
> 	[...]
> } DMAMap;
> 
> typedef struct {
> 	[...]
> 	DMAMap map;
> 	[...]
> } DMAAIOCB;
> 
> /* The callback. */
> static void dma_bdrv_cancel(DMAMap *map)
> {
> 	DMAAIOCB *dbs = container_of(map, DMAAIOCB, map);
> 
> 	[...]
> }
> 
> The upside is we only need to pass the DMAMap. That can also contain
> details of the actual map in case the device wants to release only the
> relevant range and remap the rest.

Fine.
Or maybe DMAAIOCB (just make some letters lower case: DMAIocb?).
Everyone will use it anyway, right?

> > I saw devices do stl_le_phys and such, these
> > might need to be wrapped as well.
> 
> stl_le_phys() is defined and used only by hw/eepro100.c. That's already
> dealt with by converting the device.
> 

I see. Need to get around to adding some prefix to it to make this clear.

> 	Thanks,
> 	Eduard
> 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > ---
> > 
> > diff --git a/hw/dma_rw.h b/hw/dma_rw.h
> > new file mode 100644
> > index 0000000..d63fd17
> > --- /dev/null
> > +++ b/hw/dma_rw.h
> > @@ -0,0 +1,122 @@
> > +#ifndef DMA_RW_H
> > +#define DMA_RW_H
> > +
> > +#include "qemu-common.h"
> > +
> > +/* We currently only have pci mmus, but using
> > +   a generic type makes it possible to use this
> > +   e.g. from the generic ide code without callbacks. */
> > +typedef uint64_t dma_addr_t;
> > +
> > +typedef struct DMAMmu DMAMmu;
> > +typedef struct DMADevice DMADevice;
> > +
> > +typedef int DMATranslateFunc(DMAMmu *mmu,
> > +                             DMADevice *dev,
> > +                             dma_addr_t addr,
> > +                             dma_addr_t *paddr,
> > +                             dma_addr_t *len,
> > +                             int is_write);
> > +
> > +typedef int DMAInvalidateMapFunc(DMADevice *);
> > +struct DMAMmu {
> > +	/* invalidate, etc. */
> > +	DMATranslateFunc *translate;
> > +};
> > +
> > +struct DMADevice {
> > +	DMAMmu *mmu;
> > +	DMAInvalidateMapFunc *invalidate;
> > +};
> > +
> > +void dma_device_init(DMADevice *, DMAMmu *, DMAInvalidateMapFunc *);
> > +
> > +static inline void dma_memory_rw(DMADevice *dev,
> > +				 dma_addr_t addr,
> > +				 void *buf,
> > +				 uint32_t len,
> > +				 int is_write)
> > +{
> > +    int err;
> > +    dma_addr_t paddr, plen;
> > +    /* Fast-path non-iommu.
> > +     * More importantly, makes it obvious what this function does. */
> > +    if (!dev->mmu) {
> > +        cpu_physical_memory_rw(addr, buf, len, is_write);
> > +        return;
> > +    }
> > +    while (len) {
> > +        err = dev->mmu->translate(dev->mmu, dev, addr, &paddr, &plen, is_write);
> > +        if (err) {
> > +            return;
> > +        }
> > +        /* The translation might be valid for larger regions. */
> > +        if (plen > len) {
> > +            plen = len;
> > +        }
> > +    
> > +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> > +
> > +        len -= plen;
> > +        addr += plen;
> > +        buf += plen;
> > +    }
> > +}
> > +
> > +void *dma_memory_map(DMADevice *dev,
> > +                            dma_addr_t addr,
> > +                            uint32_t *len,
> > +                            int is_write);
> > +void dma_memory_unmap(DMADevice *dev,
> > +		      void *buffer,
> > +		      uint32_t len,
> > +		      int is_write,
> > +		      uint32_t access_len);
> > +
> > +
> > ++#define DEFINE_DMA_LD(suffix, size)                                       \
> > ++uint##size##_t dma_ld##suffix(DMADevice *dev, dma_addr_t addr)            \
> > ++{                                                                         \
> > ++    int err;                                                              \
> > ++    target_phys_addr_t paddr, plen;                                       \
> > ++    if (!dev->mmu) {                                                      \
> > ++        return ld##suffix##_phys(addr, val);                              \
> > ++    }                                                                     \
> > ++                                                                          \
> > ++    err = dev->mmu->translate(dev->bus->iommu, dev,                       \
> > ++                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
> > ++    if (err || (plen < size / 8))                                         \
> > ++        return 0;                                                         \
> > ++                                                                          \
> > ++    return ld##suffix##_phys(paddr);                                      \
> > ++}
> > ++
> > ++#define DEFINE_DMA_ST(suffix, size)                                       \
> > ++void dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)  \
> > ++{                                                                         \
> > ++    int err;                                                              \
> > ++    target_phys_addr_t paddr, plen;                                       \
> > ++                                                                          \
> > ++    if (!dev->mmu) {                                                      \
> > ++        st##suffix##_phys(addr, val);                                     \
> > ++        return;                                                           \
> > ++    }                                                                     \
> > ++    err = dev->mmu->translate(dev->bus->iommu, dev,                       \
> > ++                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
> > ++    if (err || (plen < size / 8))                                         \
> > ++        return;                                                           \
> > ++                                                                          \
> > ++    st##suffix##_phys(paddr, val);                                        \
> > ++}
> > +
> > +DEFINE_DMA_LD(ub, 8)
> > +DEFINE_DMA_LD(uw, 16)
> > +DEFINE_DMA_LD(l, 32)
> > +DEFINE_DMA_LD(q, 64)
> > +
> > +DEFINE_DMA_ST(b, 8)
> > +DEFINE_DMA_ST(w, 16)
> > +DEFINE_DMA_ST(l, 32)
> > +DEFINE_DMA_ST(q, 64)
> > +
> > +#endif
> > diff --git a/hw/pci.h b/hw/pci.h
> > index 1c6075e..9737f0e 100644
> > --- a/hw/pci.h
> > +++ b/hw/pci.h
> > @@ -5,6 +5,7 @@
> >  #include "qobject.h"
> >  
> >  #include "qdev.h"
> > +#include "dma_rw.h"
> >  
> >  /* PCI includes legacy ISA access.  */
> >  #include "isa.h"
> > @@ -119,6 +120,10 @@ enum {
> >  
> >  struct PCIDevice {
> >      DeviceState qdev;
> > +
> > +    /* For devices that do DMA. */
> > +    DMADevice dma;
> > +
> >      /* PCI config space */
> >      uint8_t *config;
> >  

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Qemu-devel] [PATCH RFC] dma_rw.h (was Re: [PATCH 0/7] AMD IOMMU emulation patchset v4)
  2010-09-13 20:45     ` Anthony Liguori
@ 2010-09-16  9:35       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-16  9:35 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Eduard - Gabriel Munteanu, kvm, joro, qemu-devel, blauwirbel,
	yamahata, paul, avi

On Mon, Sep 13, 2010 at 03:45:34PM -0500, Anthony Liguori wrote:
> On 09/13/2010 03:01 PM, Michael S. Tsirkin wrote:
> >So I think the following will give the idea of what an API
> >might look like that will let us avoid the scary hacks in
> >e.g. the ide layer and other generic layers that need to do DMA,
> >without either binding us to pci, adding more complexity with
> >callbacks, or losing type safety with casts and void*.
> >
> >Basically we have DMADevice that we can use container_of on
> >to get a PCIDevice from, and DMAMmu that will get instantiated
> >in a specific MMU.
> >
> >This is not complete code - just a header - I might complete
> >this later if/when there's interest or hopefully someone interested
> >in iommu emulation will.
> >
> >Notes:
> >the IOMMU_PERM_RW code seems unused, so I replaced
> >this with plain is_write. Is it ever useful?
> >
> >It seems that invalidate callback should be able to
> >get away with just a device, so I switched to that
> >from a void pointer for type safety.
> >Seems enough for the users I saw.
> >
> >I saw devices do stl_le_phys and such, these
> >might need to be wrapped as well.
> >
> >Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> 
> One of the troubles with an interface like this is that I'm not sure
> a generic model universally works.
> 
> For instance, I know some PCI busses do transparent byte swapping.
> For this to work, there has to be a notion of generic memory
> reads/writes vs. reads of a 32-bit, 16-bit, and 8-bit value.
> 
> With a generic API, we lose the flexibility to do this type of bus
> interface.
> 
> Regards,
> 
> Anthony Liguori

Surely only the PCI root can do such tricks.
Anyway, I suspect what you refer to is byte swapping of config cycles
and similar IO done by the driver.  If a bus byteswapped a DMA transaction,
that would basically break DMA, as the driver would have to go and fix up
all data before passing it up to the OS. Right?

We'd have to add more wrappers to emulate such insanity,
as the MMU intentionally only handles translation.
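
To make the layering concrete, here is a minimal sketch of such a wrapper,
assuming a toy flat RAM array and a translate hook that does nothing but
address translation. All names here (toy_translate, toy_dma_ldl,
toy_dma_ldl_swapped, bswap32_sketch, guest_ram) are hypothetical and are
not part of the proposed header; the point is only that the swap sits on
top of the load and the translation step never touches the data.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t dma_addr_t;

/* Hypothetical stand-in for guest RAM, for illustration only. */
static uint8_t guest_ram[64];

/* Hypothetical translate hook: maps DMA addresses 0x1000.. onto guest_ram.
 * It only translates addresses; it never modifies data. */
static int toy_translate(dma_addr_t addr, dma_addr_t *paddr)
{
    if (addr >= 0x1000 && addr + 4 <= 0x1000 + sizeof(guest_ram)) {
        *paddr = addr - 0x1000;
        return 0;
    }
    return -1;
}

/* Portable 32-bit byte swap. */
static uint32_t bswap32_sketch(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Plain 32-bit DMA load: translate, then read in host byte order. */
static int toy_dma_ldl(dma_addr_t addr, uint32_t *val)
{
    dma_addr_t paddr;

    if (toy_translate(addr, &paddr)) {
        return -1;
    }
    memcpy(val, &guest_ram[paddr], sizeof(*val));
    return 0;
}

/* Wrapper for a (hypothetical) byte-swapping bus: same translation,
 * with the swap applied on top of the plain load. */
static int toy_dma_ldl_swapped(dma_addr_t addr, uint32_t *val)
{
    uint32_t raw;

    if (toy_dma_ldl(addr, &raw)) {
        return -1;
    }
    *val = bswap32_sketch(raw);
    return 0;
}
```

A device model sitting behind such a bus would call toy_dma_ldl_swapped()
instead of toy_dma_ldl(); the MMU side stays unchanged.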


> >---
> >
> >diff --git a/hw/dma_rw.h b/hw/dma_rw.h
> >new file mode 100644
> >index 0000000..d63fd17
> >--- /dev/null
> >+++ b/hw/dma_rw.h
> >@@ -0,0 +1,122 @@
> >+#ifndef DMA_RW_H
> >+#define DMA_RW_H
> >+
> >+#include "qemu-common.h"
> >+
> >+/* We currently only have pci mmus, but using
> >+   a generic type makes it possible to use this
> >+   e.g. from the generic ide code without callbacks. */
> >+typedef uint64_t dma_addr_t;
> >+
> >+typedef struct DMAMmu DMAMmu;
> >+typedef struct DMADevice DMADevice;
> >+
> >+typedef int DMATranslateFunc(DMAMmu *mmu,
> >+                             DMADevice *dev,
> >+                             dma_addr_t addr,
> >+                             dma_addr_t *paddr,
> >+                             dma_addr_t *len,
> >+                             int is_write);
> >+
> >+typedef int DMAInvalidateMapFunc(DMADevice *);
> >+struct DMAMmu {
> >+	/* invalidate, etc. */
> >+	DMATranslateFunc *translate;
> >+};
> >+
> >+struct DMADevice {
> >+	DMAMmu *mmu;
> >+	DMAInvalidateMapFunc *invalidate;
> >+};
> >+
> >+void dma_device_init(DMADevice *, DMAMmu *, DMAInvalidateMapFunc *);
> >+
> >+static inline void dma_memory_rw(DMADevice *dev,
> >+				 dma_addr_t addr,
> >+				 void *buf,
> >+				 uint32_t len,
> >+				 int is_write)
> >+{
> >+    uint32_t plen;
> >+    /* Fast-path non-iommu.
> >+     * More importantly, makes it obvious what this function does. */
> >+    if (!dev->mmu) {
> >+    	cpu_physical_memory_rw(paddr, buf, plen, is_write);
> >+    	return;
> >+    }
> >+    while (len) {
> >+        err = dev->mmu->translate(iommu, dev, addr,&paddr,&plen, is_write);
> >+        if (err) {
> >+            return;
> >+        }
> >+
> >+        /* The translation might be valid for larger regions. */
> >+        if (plen>  len) {
> >+            plen = len;
> >+        }
> >+
> >+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> >+
> >+        len -= plen;
> >+        addr += plen;
> >+        buf += plen;
> >+    }
> >+}
> >+
> >+void *dma_memory_map(DMADevice *dev,
> >+                            dma_addr_t addr,
> >+                            uint32_t *len,
> >+                            int is_write);
> >+void dma_memory_unmap(DMADevice *dev,
> >+		      void *buffer,
> >+		      uint32_t len,
> >+		      int is_write,
> >+		      uint32_t access_len);
> >+
> >+
> >++#define DEFINE_DMA_LD(suffix, size)                                       \
> >++uint##size##_t dma_ld##suffix(DMADevice *dev, dma_addr_t addr)            \
> >++{                                                                         \
> >++    int err;                                                              \
> >++    target_phys_addr_t paddr, plen;                                       \
> >++    if (!dev->mmu) {                                                      \
> >++        return ld##suffix##_phys(addr, val);                              \
> >++    }                                                                     \
> >++                                                                          \
> >++    err = dev->mmu->translate(dev->bus->iommu, dev,                       \
> >++                              addr,&paddr,&plen, IOMMU_PERM_READ);      \
> >++    if (err || (plen<  size / 8))                                         \
> >++        return 0;                                                         \
> >++                                                                          \
> >++    return ld##suffix##_phys(paddr);                                      \
> >++}
> >++
> >++#define DEFINE_DMA_ST(suffix, size)                                       \
> >++void dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)  \
> >++{                                                                         \
> >++    int err;                                                              \
> >++    target_phys_addr_t paddr, plen;                                       \
> >++                                                                          \
> >++    if (!dev->mmu) {                                                      \
> >++        st##suffix##_phys(addr, val);                                     \
> >++        return;                                                           \
> >++    }                                                                     \
> >++    err = dev->mmu->translate(dev->bus->iommu, dev,                       \
> >++                              addr,&paddr,&plen, IOMMU_PERM_WRITE);     \
> >++    if (err || (plen<  size / 8))                                         \
> >++        return;                                                           \
> >++                                                                          \
> >++    st##suffix##_phys(paddr, val);                                        \
> >++}
> >+
> >+DEFINE_DMA_LD(ub, 8)
> >+DEFINE_DMA_LD(uw, 16)
> >+DEFINE_DMA_LD(l, 32)
> >+DEFINE_DMA_LD(q, 64)
> >+
> >+DEFINE_DMA_ST(b, 8)
> >+DEFINE_DMA_ST(w, 16)
> >+DEFINE_DMA_ST(l, 32)
> >+DEFINE_DMA_ST(q, 64)
> >+
> >+#endif
> >diff --git a/hw/pci.h b/hw/pci.h
> >index 1c6075e..9737f0e 100644
> >--- a/hw/pci.h
> >+++ b/hw/pci.h
> >@@ -5,6 +5,7 @@
> >  #include "qobject.h"
> >
> >  #include "qdev.h"
> >+#include "dma_rw.h"
> >
> >  /* PCI includes legacy ISA access.  */
> >  #include "isa.h"
> >@@ -119,6 +120,10 @@ enum {
> >
> >  struct PCIDevice {
> >      DeviceState qdev;
> >+
> >+    /* For devices that do DMA. */
> >+    DMADevice dma;
> >+
> >      /* PCI config space */
> >      uint8_t *config;
> >
> >
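
The translate-and-copy loop in the quoted dma_memory_rw() can be tried out
in isolation with a sketch like the one below. It assumes a toy 4-page
"IOMMU" backed by a small array; toy_ram, toy_map, toy_translate and
toy_dma_memory_rw are hypothetical names used only for illustration, not
QEMU functions. Unlike the quoted draft, the sketch declares err, paddr
and plen inside the loop and advances a byte pointer rather than a void
pointer, so it compiles as standard C.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t dma_addr_t;

#define TOY_PAGE 16u

/* Hypothetical guest RAM: four pages of TOY_PAGE bytes each. */
static uint8_t toy_ram[4 * TOY_PAGE];

/* Hypothetical page table: DMA page i maps to RAM page toy_map[i]. */
static const uint8_t toy_map[4] = { 2, 0, 3, 1 };

/* Translate one DMA address; *plen is how many bytes remain valid
 * in the current page starting at addr. */
static int toy_translate(dma_addr_t addr, dma_addr_t *paddr,
                         dma_addr_t *plen, int is_write)
{
    dma_addr_t page = addr / TOY_PAGE;
    dma_addr_t off = addr % TOY_PAGE;

    (void)is_write; /* this toy table carries no permissions */
    if (page >= 4) {
        return -1;
    }
    *paddr = (dma_addr_t)toy_map[page] * TOY_PAGE + off;
    *plen = TOY_PAGE - off;
    return 0;
}

/* Copy len bytes to or from DMA address addr, one translation at a time. */
static int toy_dma_memory_rw(dma_addr_t addr, void *buf,
                             uint32_t len, int is_write)
{
    uint8_t *p = buf;

    while (len) {
        dma_addr_t paddr, plen;
        int err = toy_translate(addr, &paddr, &plen, is_write);

        if (err) {
            return err;
        }
        /* The translation might be valid for a larger region. */
        if (plen > len) {
            plen = len;
        }
        if (is_write) {
            memcpy(&toy_ram[paddr], p, plen);
        } else {
            memcpy(p, &toy_ram[paddr], plen);
        }
        len -= (uint32_t)plen;
        addr += plen;
        p += plen;
    }
    return 0;
}
```

A 20-byte write starting at DMA address 10 crosses a page boundary here, so
the loop runs twice and the data ends up in two non-adjacent RAM pages.
Returning the error instead of void is a choice made for the sketch, so
that a failed translation is visible to the caller.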

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Qemu-devel] [PATCH RFC] dma_rw.h (was Re: [PATCH 0/7] AMD IOMMU emulation patchset v4)
@ 2010-09-16  9:35       ` Michael S. Tsirkin
  0 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2010-09-16  9:35 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: kvm, joro, qemu-devel, blauwirbel, yamahata, paul,
	Eduard - Gabriel Munteanu, avi

On Mon, Sep 13, 2010 at 03:45:34PM -0500, Anthony Liguori wrote:
> On 09/13/2010 03:01 PM, Michael S. Tsirkin wrote:
> >So I think the following will give the idea of what an API
> >might look like that will let us avoid the scary hacks in
> >e.g. the ide layer and other generic layers that need to do DMA,
> >without either binding us to pci, adding more complexity with
> >callbacks, or losing type safety with casts and void*.
> >
> >Basically we have DMADevice that we can use container_of on
> >to get a PCIDevice from, and DMAMmu that will get instantiated
> >in a specific MMU.
> >
> >This is not complete code - just a header - I might complete
> >this later if/when there's interest or hopefully someone interested
> >in iommu emulation will.
> >
> >Notes:
> >the IOMMU_PERM_RW code seems unused, so I replaced
> >this with plain is_write. Is it ever useful?
> >
> >It seems that invalidate callback should be able to
> >get away with just a device, so I switched to that
> >from a void pointer for type safety.
> >Seems enough for the users I saw.
> >
> >I saw devices do stl_le_phys and such, these
> >might need to be wrapped as well.
> >
> >Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> 
> One of the troubles with an interface like this is that I'm not sure
> a generic model universally works.
> 
> For instance, I know some PCI busses do transparent byte swapping.
> For this to work, there has to be a notion of generic memory
> reads/writes vs. reads of a 32-bit, 16-bit, and 8-bit value.
> 
> With a generic API, we lose the flexibility to do this type of bus
> interface.
> 
> Regards,
> 
> Anthony Liguori

Surely only the PCI root can do such tricks.
Anyway, I suspect what you refer to is byte swapping of config cycles
and similar IO done by the driver.  If a bus byteswapped a DMA transaction,
that would basically break DMA, as the driver would have to go and fix up
all data before passing it up to the OS. Right?

We'd have to add more wrappers to emulate such insanity,
as the MMU intentionally only handles translation.


> >---
> >
> >diff --git a/hw/dma_rw.h b/hw/dma_rw.h
> >new file mode 100644
> >index 0000000..d63fd17
> >--- /dev/null
> >+++ b/hw/dma_rw.h
> >@@ -0,0 +1,122 @@
> >+#ifndef DMA_RW_H
> >+#define DMA_RW_H
> >+
> >+#include "qemu-common.h"
> >+
> >+/* We currently only have pci mmus, but using
> >+   a generic type makes it possible to use this
> >+   e.g. from the generic ide code without callbacks. */
> >+typedef uint64_t dma_addr_t;
> >+
> >+typedef struct DMAMmu DMAMmu;
> >+typedef struct DMADevice DMADevice;
> >+
> >+typedef int DMATranslateFunc(DMAMmu *mmu,
> >+                             DMADevic