* PCI Pass-through in Xen ARM - Draft 2.
@ 2015-06-28 18:38 Manish Jaggi
From: Manish Jaggi @ 2015-06-28 18:38 UTC (permalink / raw)
  To: xen-devel, Ian Campbell, Konrad Rzeszutek Wilk,
	Stefano Stabellini, Kulkarni, Ganapatrao, Prasun Kapoor,
	Julien Grall, Kumar, Vijaya

PCI Pass-through in Xen ARM
--------------------------

Draft 2

Index

1. Background

2. Basic PCI Support in Xen ARM
2.1 pci_hostbridge and pci_hostbridge_ops
2.2 PHYSDEVOP_HOSTBRIDGE_ADD hypercall

3. Dom0 Access PCI devices

4. DomU assignment of PCI device
4.1 Holes in guest memory space
4.2 New entries in xenstore for device BARs
4.3 Hypercall for bdf mapping notification to xen
4.4 Change in Linux PCI frontend - backend driver
  for MSI/X programming

5. NUMA and PCI passthrough

6. DomU pci device attach flow


Revision History
----------------
Changes from Draft 1
a) map_mmio hypercall removed (was present in the earlier draft)
b) device BAR mapping into the guest is no longer 1:1
c) holes in the guest address space (32-bit / 64-bit) for MMIO virtual BARs
d) xenstore entries added for a device's BAR info


1. Background of PCI passthrough
--------------------------------
Passthrough refers to assigning a PCI device to a guest domain (domU) such that
the guest has full control over the device. The MMIO space and interrupts are
managed by the guest itself, close to how a bare-metal kernel manages a device.

The device's access to the guest address space needs to be isolated and
protected. The SMMU (System MMU, the IOMMU on ARM) is programmed by the Xen
hypervisor to allow the device to access guest memory for data transfers and to
deliver MSI/X interrupts. In the case of MSI/X, the device writes to the ITS
interrupt translation register (GITS_TRANSLATER) in the ITS address space.

2. Basic PCI Support for ARM
----------------------------
The APIs to read/write the PCI configuration space are based on segment:bdf.
How the sbdf is mapped to a physical address is under the realm of the PCI
host controller.

ARM PCI support in Xen introduces PCI host controller drivers similar to what
exists in Linux. Each driver registers callbacks, which are invoked on matching
the compatible property of the PCI device tree node.

2.1 pci_hostbridge and pci_hostbridge_ops
The init function in each PCI host driver calls the following to register its
host bridge callbacks:
int pci_hostbridge_register(pci_hostbridge_t *pcihb);

struct pci_hostbridge_ops {
     u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
                                 u32 reg, u32 bytes);
     void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
                                 u32 reg, u32 bytes, u32 val);
};

struct pci_hostbridge {
     u32 segno;                       /* segment number assigned by dom0 */
     paddr_t cfg_base;                /* config space base address */
     paddr_t cfg_size;                /* config space size */
     struct dt_device_node *dt_node;  /* host controller device tree node */
     struct pci_hostbridge_ops ops;
     struct list_head list;           /* entry in pci_hostbridge_list */
};

A PCI config read would internally look as follows:
u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
{
     pci_hostbridge_t *pcihb;
     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
     {
         if ( pcihb->segno == seg )
             return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
     }
     /* No host bridge registered for this segment: return all ones. */
     return ~0U;
}
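
A minimal sketch of how a host controller driver could register itself,
assuming an ECAM-style controller; gen_pci_ops and the way the config window is
obtained from the node are assumptions of this sketch, while
pci_hostbridge_register() and struct pci_hostbridge are as defined above:

static int gen_pci_init(struct dt_device_node *node)
{
     u64 addr, size;
     pci_hostbridge_t *pcihb = xzalloc(pci_hostbridge_t);

     if ( !pcihb )
         return -ENOMEM;

     /* Take the first "reg" entry of the node as the ECAM config window. */
     if ( dt_device_get_address(node, 0, &addr, &size) )
     {
         xfree(pcihb);
         return -EINVAL;
     }

     pcihb->cfg_base = addr;
     pcihb->cfg_size = size;
     pcihb->dt_node  = node;
     pcihb->ops      = gen_pci_ops;   /* ECAM pci_conf_read/write callbacks */

     return pci_hostbridge_register(pcihb);
}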

2.2 PHYSDEVOP_pci_host_bridge_add hypercall

Xen code accesses PCI configuration space based on the sbdf received from the
guest. The order in which the PCI device tree nodes appear may not be the same
as the order of device enumeration in dom0. Thus there needs to be a mechanism
to bind the segment number assigned by dom0 to the PCI host controller. The
following hypercall is introduced:

#define PHYSDEVOP_pci_host_bridge_add    44
struct physdev_pci_host_bridge_add {
     /* IN */
     uint16_t seg;
     uint64_t cfg_base;
     uint64_t cfg_size;
};

This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
hypercall. The handler calls the following to update the segment number in the
pci_hostbridge:

int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t cfg_size);

Subsequent calls to pci_conf_read/write are completed by the pci_hostbridge_ops
of the respective pci_hostbridge.
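
A sketch of the setup routine, assuming host bridges were registered at boot
with their config window but without a segment number; matching by
cfg_base/cfg_size is an assumption of this sketch:

int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t cfg_size)
{
     pci_hostbridge_t *pcihb;

     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
     {
         if ( pcihb->cfg_base == cfg_base && pcihb->cfg_size == cfg_size )
         {
             /* Bind the segment number assigned by dom0 to this bridge. */
             pcihb->segno = segno;
             return 0;
         }
     }
     return -ENODEV;   /* no registered host bridge matches this window */
}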

3. Dom0 access PCI device
---------------------------------
As per the design of the Xen hypervisor, dom0 enumerates the PCI devices. For
each device the MMIO space has to be mapped in the stage-2 translation for
dom0. For dom0, Xen maps the ranges described in the PCI host controller nodes
into the stage-2 translation.

The GITS_TRANSLATER space (4KB) must be mapped in the stage-2 translation so
that MSI/X can work. This is done during vITS initialization for dom0/domU.
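
A sketch of the dom0 mapping step, assuming a 1:1 (gfn == mfn) mapping for
dom0; get_pci_window() is a hypothetical helper standing in for Xen's DT
parsing of the pci node's "ranges" property, and the exact signature of
map_mmio_regions() is taken as an assumption here:

static int map_hostbridge_windows_to_dom0(struct domain *d,
                                          const pci_hostbridge_t *pcihb)
{
     paddr_t base, size;
     unsigned int i = 0;
     int rc;

     /* Walk each MMIO window advertised by the host bridge node. */
     while ( !get_pci_window(pcihb->dt_node, i++, &base, &size) )
     {
         rc = map_mmio_regions(d, paddr_to_pfn(base),
                               DIV_ROUND_UP(size, PAGE_SIZE),
                               paddr_to_pfn(base));
         if ( rc )
             return rc;
     }
     return 0;
}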

4. DomU access / assignment PCI device
--------------------------------------
In the pci-attach flow, the toolstack reads the PCI configuration space BAR
registers. The toolstack has the guest memory map and the information about
the MMIO holes.

When the first PCI device is assigned to a domU, the toolstack allocates a
virtual BAR region from the MMIO hole area. The toolstack then issues the
xc_domain_memory_mapping domctl to map the region in the stage-2 translation.
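
A toolstack-side sketch of that mapping; alloc_virtual_bar() is a hypothetical
helper for picking a free range in the MMIO hole, while
xc_domain_memory_mapping() is the existing libxc wrapper for the
XEN_DOMCTL_memory_mapping domctl:

static int map_bar_to_guest(xc_interface *xch, uint32_t domid,
                            uint64_t phys_bar, uint64_t bar_size)
{
     /* Virtual BAR address allocated from the guest MMIO hole (assumed). */
     uint64_t virt_bar = alloc_virtual_bar(domid, bar_size);
     unsigned long nr = (bar_size + XC_PAGE_SIZE - 1) >> XC_PAGE_SHIFT;

     return xc_domain_memory_mapping(xch, domid,
                                     virt_bar >> XC_PAGE_SHIFT,  /* gfn */
                                     phys_bar >> XC_PAGE_SHIFT,  /* mfn */
                                     nr, 1 /* add mapping */);
}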

4.1 Holes in guest memory space
-------------------------------
Holes are added in the guest memory space for mapping PCI devices' BAR regions.
These are defined in arch-arm.h:

/* For 32bit */
GUEST_MMIO_HOLE0_BASE, GUEST_MMIO_HOLE0_SIZE

/* For 64bit */
GUEST_MMIO_HOLE1_BASE, GUEST_MMIO_HOLE1_SIZE
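
Illustrative values only (the actual layout is to be decided as part of the
guest memory map; the numbers below are placeholders, not the proposed
addresses):

/* 32-bit hole, below 4GB */
#define GUEST_MMIO_HOLE0_BASE   0x10000000ULL
#define GUEST_MMIO_HOLE0_SIZE   0x10000000ULL

/* 64-bit hole, above 4GB */
#define GUEST_MMIO_HOLE1_BASE   0x4000000000ULL
#define GUEST_MMIO_HOLE1_SIZE   0x4000000000ULL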

4.2 New entries in xenstore for device BARs
--------------------------------------------
The toolstack also updates the xenstore information for the device
(virtual BAR : physical BAR). This information is read by xen-pciback and
returned to pcifront for its configuration space accesses.
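
One possible xenstore layout (illustrative; the exact keys are not fixed by
this draft, and the vbar-*/pbar-* names are placeholders):

/local/domain/0/backend/pci/<domid>/0/dev-0    = "0000:01:00.0"
/local/domain/0/backend/pci/<domid>/0/vbar-0-0 = "0x10000000"   /* virtual BAR0 */
/local/domain/0/backend/pci/<domid>/0/pbar-0-0 = "0xe0000000"   /* physical BAR0 */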

4.3 Hypercall for bdf mapping notification to xen
-----------------------------------------------
#define PHYSDEVOP_map_sbdf              43
typedef struct {
     u32 s;    /* segment */
     u8  b;    /* bus */
     u8  df;   /* devfn */
     u16 res;  /* reserved */
} sbdf_t;
struct physdev_map_sbdf {
     int domain_id;
     sbdf_t    sbdf;     /* physical sbdf of the assigned device */
     sbdf_t    gsbdf;    /* guest (virtual) sbdf seen by the domU */
};

Each domain has a pdev list, which contains all the PCI devices assigned to it.
The pdev structure already has the sbdf information. The arch_pci_dev is
extended to contain the gsbdf information (gs - guest segment id).

Whenever there is a trap from the guest or an interrupt has to be injected, the
pdev list is iterated to find the matching gsbdf.
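
A sketch of that lookup; the arch.gsbdf field and the for_each_pdev() accessor
over the domain's pdev list are assumptions of this sketch rather than final
interfaces:

static const struct pci_dev *gsbdf_to_pdev(const struct domain *d,
                                           const sbdf_t *gsbdf)
{
     const struct pci_dev *pdev;

     for_each_pdev ( d, pdev )
     {
         if ( pdev->arch.gsbdf.s  == gsbdf->s &&
              pdev->arch.gsbdf.b  == gsbdf->b &&
              pdev->arch.gsbdf.df == gsbdf->df )
             return pdev;   /* pdev->seg/bus/devfn give the physical sbdf */
     }
     return NULL;
}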

4.4 Change in Linux PCI frontend - backend driver for MSI/X programming
------------------------------------------------------------------------
On the PCI frontend bus, the gicv3-its node is set as the msi-parent. There is
a single virtual ITS for a domU, just as there is only a single virtual PCI bus
in the domU. This ensures that the MSI configuration calls are handled by the
GICv3 ITS driver in the domU kernel rather than going through the
frontend-backend communication between dom0 and domU.

5. NUMA domU and vITS
-----------------------------
a) On NUMA systems a domU still has a single vITS node.
b) How can Xen identify the ITS to which a device is connected?
- By using the segment number to query an API which returns the PCI host
controller's device tree node (a sketch follows below):

struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno);

c) Query the interrupt parent of the PCI device tree node to find out the ITS.
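
A minimal sketch of that API, reusing the host bridge list populated by
pci_hostbridge_register()/pci_hostbridge_setup() above:

struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno)
{
     pci_hostbridge_t *pcihb;

     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
     {
         if ( pcihb->segno == segno )
             return pcihb->dt_node;
     }
     return NULL;
}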

6. DomU Bootup flow
---------------------
a. The domU boots without any PCI devices assigned. A daemon (xenwatch) listens
for events from xenstore. When a device is attached to the domU, the frontend
PCI bus driver starts enumerating the devices. The frontend driver communicates
with the backend driver in dom0 to read the PCI config space.

b. The backend driver returns the virtual BAR ranges, which are already mapped
in the domU stage-2 translation.

c. The device driver of the specific PCI device invokes methods to configure
the MSI/X interrupts, which are handled by the ITS driver in the domU kernel.
The reads/writes issued by the ITS driver are trapped in Xen. Xen's vITS code
finds the actual sbdf based on the map_sbdf hypercall information.
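
For reference, the flow above is triggered from dom0 with the existing xl
command (cold plug via the "pci" list in the domain config is expected to
follow the same path):

xl pci-attach <domain> <sbdf, e.g. 0000:01:00.0>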

* Re: PCI Pass-through in Xen ARM - Draft 2
@ 2015-07-05  6:07 Manish Jaggi
From: Manish Jaggi @ 2015-07-05  6:07 UTC (permalink / raw)
  To: Ian Campbell, xen-devel, Prasun Kapoor, Julien Grall, Kumar, Vijaya

>Ian Campbell Wrote:
>>On Mon, 2015-06-29 at 00:08 +0530, Manish Jaggi wrote:
>> PCI Pass-through in Xen ARM
>> --------------------------
>>
>> Draft 2
>>
>> Index
>>
>> 1. Background
>>
>> 2. Basic PCI Support in Xen ARM
>> 2.1 pci_hostbridge and pci_hostbridge_ops
>> 2.2 PHYSDEVOP_HOSTBRIDGE_ADD hypercall
>>
>> 3. Dom0 Access PCI devices
>>
>> 4. DomU assignment of PCI device
>> 4.1 Holes in guest memory space
>> 4.2 New entries in xenstore for device BARs
>> 4.3 Hypercall for bdf mapping noification to xen
>> 4.4 Change in Linux PCI FrontEnd - backend driver
>>   for MSI/X programming
>>
>> 5. NUMA and PCI passthrough
>>
>> 6. DomU pci device attach flow
>>
>>
>> Revision History
>> ----------------
>> Changes from Draft 1
>> a) map_mmio hypercall removed from earlier draft
>> b) device bar mapping into guest not 1:1
>> c) holes in guest address space 32bit / 64bit for MMIO virtual BARs
>> d) xenstore device's BAR info addition.
>>
>>
>> 1. Background of PCI passthrough
>> --------------------------------
>> Passthrough refers to assigning a pci device to a guest domain (domU) such
>> that
>> the guest has full control over the device.The MMIO space and interrupts are
>> managed by the guest itself, close to how a bare kernel manages a device.
>>
>> Device's access to guest address space needs to be isolated and protected.
>> SMMU
>> (System MMU - IOMMU in ARM) is programmed by xen hypervisor to allow device
>> access guest memory for data transfer and sending MSI/X interrupts. In case of
>> MSI/X  the device writes to GITS (ITS address space) Interrupt Translation
>> Register.
>>
>> 2. Basic PCI Support for ARM
>> ----------------------------
>> The apis to read write from pci configuration space are based on segment:bdf.
>> How the sbdf is mapped to a physical address is under the realm of the pci
>> host controller.
>>
>> ARM PCI support in Xen, introduces pci host controller similar to what exists
>> in Linux. Each drivers registers callbacks, which are invoked on matching the
>> compatible property in pci device tree node.
>>
>> 2.1:
>> The init function in the pci host driver calls to register hostbridge
>> callbacks:
>> int pci_hostbridge_register(pci_hostbridge_t *pcihb);
>>
>> struct pci_hostbridge_ops {
>>      u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
>>                                  u32 reg, u32 bytes);
>>      void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
>>                                  u32 reg, u32 bytes, u32 val);
>> };
>>
>> struct pci_hostbridge{
>>      u32 segno;
>>      paddr_t cfg_base;
>>      paddr_t cfg_size;
>>      struct dt_device_node *dt_node;
>>      struct pci_hostbridge_ops ops;
>>      struct list_head list;
>> };
>>
>> A pci conf read function would internally be as follows:
>> u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn,u32 reg, u32 bytes)
>> {
>>      pci_hostbridge_t *pcihb;
>>      list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>>      {
>>          if(pcihb->segno == seg)
>>              return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
>>      }
>>      return -1;
>> }
>>
>> 2.2 PHYSDEVOP_pci_host_bridge_add hypercall
>>
>> Xen code accesses PCI configuration space based on the sbdf received from the
>> guest. The order in which the pci device tree node appear may not be the same
>> order of device enumeration in dom0. Thus there needs to be a mechanism to
>> bind
>> the segment number assigned by dom0 to the pci host controller. The hypercall
>> is introduced:
>>
>> #define PHYSDEVOP_pci_host_bridge_add    44
>> struct physdev_pci_host_bridge_add {
>>      /* IN */
>>      uint16_t seg;
>>      uint64_t cfg_base;
>>      uint64_t cfg_size;
>> };
>>
>> This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
>> hypercall. The handler code invokes to update segment number in
>> pci_hostbridge:
>>
>> int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t
>> cfg_size);
>>
>> Subsequent calls to pci_conf_read/write are completed by the
>> pci_hostbridge_ops
>> of the respective pci_hostbridge.
>>
>> 3. Dom0 access PCI device
>> ---------------------------------
>> As per the design of xen hypervisor, dom0 enumerates the PCI devices. For each
>> device the MMIO space has to be mapped in the Stage2 translation for dom0.
>
>Here "device" is really host bridge, isn't it? i.e. this is done by
>mapping the entire MMIO window of each host bridge, not the individual
>BAR registers of each device one at a time.

No, "device" here means the PCIe endpoint (EP) device, not the root complex (RC).

>
>IOW this is functionality of the pci host driver's intitial setup, not
>something which is driven from the dom0 enumeration of the bus.
>
>>  For
>> dom0 xen maps the ranges in pci nodes in stage 2 translation.
>>
>> GITS_ITRANSLATER space (4k( must be programmed in Stage2 translation so that
>> MSI/X
>> must work. This is done in vits initialization in dom0/domU.
>
>This also happens at start of day, but what isn't mentioned is that
>(AIUI) the SMMU will need to be programmed to map each SBDF to the dom0
>p2m as the devices are discovered and reported. Right?
>
Yes, I will add an SMMU section in Draft 3.
>>
>> 4. DomU access / assignment PCI device
>> --------------------------------------
>> In the flow of pci-attach device, the toolkit
>
>I assume you mean "toolstack" throughout? If so then please run
>s/toolkit/toolstack/g so as to use the usual terminology.
>
yes

>>  will read the pci configuration
>> space BAR registers. The toolkit has the guest memory map and the information
>> of the MMIO holes.
>>
>> When the first pci device is assigned to domU, toolkit allocates a virtual
>> BAR region from the MMIO hole area. toolkit then sends domctl
>> xc_domain_memory_mapping
>> to map in stage2 translation.
>>
>> 4.1 Holes in guest memory space
>> ----------------------------
>> Holes are added in the guest memory space for mapping pci device's BAR
>> regions.
>> These are defined in arch-arm.h
>>
>> /* For 32bit */
>> GUEST_MMIO_HOLE0_BASE, GUEST_MMIO_HOLE0_SIZE
>>
>> /* For 64bit */
>> GUEST_MMIO_HOLE1_BASE , GUEST_MMIO_HOLE1_SIZE
>>
>> 4.2 New entries in xenstore for device BARs
>> --------------------------------------------
>> toolkit also updates the xenstore information for the device
>> (virtualbar:physical bar).
>> This information is read by xenpciback and returned to the pcifront driver
>> configuration
>> space accesses.
>>
>> 4.3 Hypercall for bdf mapping notification to xen
>                   ^v (I think) or maybe vs?
>
>> -----------------------------------------------
>> #define PHYSDEVOP_map_sbdf              43
>> typedef struct {
>>      u32 s;
>>      u8 b;
>>      u8 df;
>>      u16 res;
>> } sbdf_t;
>> struct physdev_map_sbdf {
>>      int domain_id;
>>      sbdf_t    sbdf;
>>      sbdf_t    gsbdf;
>> };
>>
>> Each domain has a pdev list, which contains the list of all pci devices. The
>> pdev structure already has a sbdf information. The arch_pci_dev is updated to
>> -------------------------------------------------------------
>> On the Pci frontend bus a msi-parent as gicv3-its is added.
>
>Are you talking about a device tree property or something else?
>
A device tree property. xl creates a device tree for the domU.
It is assumed that the ITS node is present in the domU device tree.

>Note that pcifront is not described in the DT, only in the xenstore
>structure. So a dt property is unlikely to be the right way to describe
>this.
>
>We need to think of some way of specifying this such that we don't tie
>ourselves into a single vits ABI.
>
Please suggest

>>  As there is a single
>> virtual its for a domU, as there is only a single virtual pci bus in domU.
>> This
>> ensures that the config_msi calls are handled by the gicv3 its driver in domU
>> kernel and not utilizing frontend-backend communication between dom0-domU.
>>
>> 5. NUMA domU and vITS
>> -----------------------------
>> a) On NUMA systems domU still have a single its node.
>> b) How can xen identify the ITS on which a device is connected.
>> - Using segment number query using api which gives pci host controllers
>> device node
>>
>> struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno)
>>
>> c) Query the interrupt parent of the pci device node to find out the its.
>>
>> 6. DomU Bootup flow
>> ---------------------
>> a. DomU boots up without any pci devices assigned.
>
>I don't think we can/should rule out cold plug at this stage. IOW it
>must be possible to boot a domU with PCI devices already assigned.
>
As per my understanding, the pcifront driver receives a notification from
xenwatch, upon which it starts enumeration.

see: pcifront_backend_changed()

>>  A daemon listens to events
>> from the xenstore. When a device is attached to domU, the frontend pci bus
>> driver
>> starts enumerating the devices.Front end driver communicates with backend
>> driver
>> in dom0 to read the pci config space.
>
>I'm afraid I don't follow any of this. What "daemon"? Is it in the front
>or backend? What does it do with the events it is listening for?
>
xenwatch
>>
>> b. backend driver returns the virtual BAR ranges which are already mapped in
>> domU
>> stage 2 translation.
>>
>> c. Device driver of the specific pci device invokes methods to configure the
>> msi/x interrupt which are handled by the its driver in domU kernel. The
>> read/writes
>> by the its driver are trapped in xen. ITS driver finds out the actual sbdf
>> based
>> on the map_sbdf hypercall information.
>
>

