* [RFC] ARM PCI Passthrough design document
From: Julien Grall @ 2017-05-26 17:14 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: edgar.iglesias, punit.agrawal, Wei Chen, Steve Capper,
	Andre Przywara, manish.jaggi, Julien Grall, vikrams, okaya, Goel,
	Sameer, xen-devel, Dave P Martin, Vijaya Kumar K, roger.pau

Hi all,

The document below is an RFC version of a design proposal for PCI
Passthrough in Xen on ARM. It aims to describe, from a high-level perspective,
the interaction with the different subsystems and how guests will be able
to discover and access PCI devices.

Currently on ARM, Xen does not have any knowledge about PCI devices. This
means that the IOMMU and interrupt controllers (such as the ITS) requiring
specific configuration will not work with PCI, even for DOM0.

The PCI Passthrough work can be divided into 2 phases:
        * Phase 1: Register all PCI devices in Xen => will allow
                   to use ITS and SMMU with PCI in Xen
        * Phase 2: Assign devices to guests

This document aims to describe the 2 phases, but for now only phase
1 is fully described.


I think I was able to gather all of the feedback and come up with a solution
that will satisfy all the parties. The design document has changed quite a lot
compared to the early draft sent a few months ago. The major changes are:
	* Provide more details on how PCI works on ARM and the interactions with
	the MSI controller and the IOMMU
	* Provide details on the existing host bridge implementations
	* Give more explanation and justification of the approach chosen
	* Describe the hypercalls used and how they should be called

Feedback is welcome.

Cheers,

--------------------------------------------------------------------------------

% PCI pass-through support on ARM
% Julien Grall <julien.grall@linaro.org>
% Draft B

# Preface

This document aims to describe the components required to enable PCI
pass-through on ARM.

This is an early draft and some questions are still unanswered. When this is
the case, the text will contain XXX.

# Introduction

PCI pass-through allows the guest to receive full control of physical PCI
devices. This means the guest will have full and direct access to the PCI
device.

On ARM, Xen supports a kind of guest that exploits the hardware virtualization
support as much as possible. The guest relies on PV drivers only for I/O
(e.g block, network), and interrupts come through the virtualized interrupt
controller, therefore there are no big changes required within the kernel.

As a consequence, it would be possible to replace PV drivers by assigning real
devices to the guest for I/O access. Xen on ARM would therefore be able to
run unmodified operating systems.

To achieve this goal, it looks more sensible to go towards emulating the
host bridge (there will be more details later). A guest would be able to take
advantage of the firmware tables, obviating the need for a specific driver
for Xen.

Thus, in this document we follow the emulated host bridge approach.

# PCI terminology

Each PCI device under a host bridge is uniquely identified by its Requester ID
(AKA RID). A Requester ID is a triplet of Bus number, Device number, and
Function.

When the platform has multiple host bridges, the software can add a fourth
number called Segment (sometimes called Domain) to differentiate host bridges.
A PCI device will then be uniquely identified by segment:bus:device:function
(AKA SBDF).

So given a specific SBDF, it would be possible to find the host bridge and the
RID associated to a PCI device. The pair (host bridge, RID) will often be used
to find the relevant information for configuring the different subsystems (e.g
IOMMU, MSI controller). For convenience, the rest of the document will use
SBDF to refer to the pair (host bridge, RID).
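
For illustration, below is a minimal sketch of how an SBDF could be packed and
unpacked. The helper names are hypothetical; the layout follows the usual PCI
convention (bus: 8 bits, device: 5 bits, function: 3 bits), so the BDF portion
is exactly the 16-bit Requester ID.

    /* Illustrative sketch only, not an existing Xen interface. */
    #include <stdint.h>

    #define PCI_DEVFN(dev, fn)  ((((dev) & 0x1f) << 3) | ((fn) & 0x07))
    #define PCI_BDF(bus, devfn) ((((bus) & 0xff) << 8) | ((devfn) & 0xff))

    static inline uint32_t sbdf_pack(uint16_t seg, uint8_t bus, uint8_t devfn)
    {
        /* Segment in the top 16 bits, BDF (i.e the RID) in the bottom 16. */
        return ((uint32_t)seg << 16) | PCI_BDF(bus, devfn);
    }

    static inline uint16_t sbdf_to_rid(uint32_t sbdf)
    {
        return sbdf & 0xffff; /* the Requester ID is the BDF portion */
    }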

# PCI host bridge

A PCI host bridge enables data transfer between a host processor and PCI bus
based devices. The bridge is used to access the configuration space of each
PCI device and, on some platforms, may also act as an MSI controller.

## Initialization of the PCI host bridge

Whilst it would be expected that the bootloader takes care of initializing
the PCI host bridge, on some platforms it is done in the Operating System.

This may include enabling/configuring the clocks that could be shared among
multiple devices.

## Accessing PCI configuration space

Accessing the PCI configuration space can be divided into 2 categories:
    * Indirect access, where the configuration spaces are multiplexed. An
    example would be the legacy method on x86 (e.g 0xcf8 and 0xcfc). On ARM a
    similar method is used by the R-Car PCIe root complex (see [12]).
    * ECAM access, where each configuration space has its own address space.

Whilst ECAM is a standard, some PCI host bridges will require specific fiddling
when accessing the registers (see thunder-ecam [13]).

In most cases, accessing all the PCI configuration spaces under a given PCI
host bridge will be done the same way (i.e either indirect access or ECAM
access). However, there are a few cases, dependent on the PCI devices accessed,
which will use different methods (see thunder-pem [14]).
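
As an illustration of the ECAM case, the sketch below computes a configuration
space address from the base address found in the firmware tables. The offset
layout (1MB per bus, 32KB per device, 4KB per function) is the one defined by
the PCIe specification; the helpers themselves are hypothetical.

    /* Minimal ECAM access sketch, assuming a fully ECAM-compliant bridge. */
    #include <stdint.h>

    static inline uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn,
                                       uint16_t reg)
    {
        return ((uint64_t)bus << 20) | ((uint64_t)(dev & 0x1f) << 15) |
               ((uint64_t)(fn & 0x7) << 12) | (reg & 0xfff);
    }

    static inline uint32_t ecam_read32(volatile uint8_t *ecam_base, uint8_t bus,
                                       uint8_t dev, uint8_t fn, uint16_t reg)
    {
        /* ecam_base is the (mapped) base address described by the firmware. */
        return *(volatile uint32_t *)(ecam_base +
                                      ecam_offset(bus, dev, fn, reg));
    }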

## Generic host bridge

For the purpose of this document, the term "generic host bridge" will be used
to describe any ECAM-compliant host bridge whose initialization, if required,
has already been done by the firmware/bootloader.

# Interaction of the PCI subsystem with other subsystems

In order to have a PCI device fully working, Xen will need to configure
other subsystems such as the IOMMU and the Interrupt Controller.

The interaction expected between the PCI subsystem and the other subsystems is:
    * Add a device
    * Remove a device
    * Assign a device to a guest
    * Deassign a device from a guest

XXX: Detail the interaction when assigning/deassigning device

In the following subsections, the interactions will be briefly described from a
higher level perspective. However, implementation details such as callbacks,
structures, etc. are beyond the scope of this document.

## IOMMU

The IOMMU will be used to isolate the PCI device when accessing memory (e.g
DMA and MSI doorbells). Often the IOMMU will be configured using a MasterID
(aka StreamID for the ARM SMMU) that can be deduced from the SBDF with the help
of the firmware tables (see below).

Whilst in theory all the memory transactions issued by a PCI device should
go through the IOMMU, on certain platforms some of the memory transactions may
not reach the IOMMU because they are interpreted by the host bridge. For
instance, this could happen if the MSI doorbell is built into the PCI host
bridge or for P2P traffic. See [6] for more details.

XXX: I think this could be solved by using direct mapping (e.g GFN == MFN);
this would mean the guest memory layout would be similar to the host one when
PCI devices are passed through => Detail it.

## Interrupt controller

PCI supports three kinds of interrupts: legacy interrupts, MSI and MSI-X. On
ARM, legacy interrupts will be mapped to SPIs. MSI and MSI-X will write their
payload in a doorbell belonging to an MSI controller.

### Existing MSI controllers

In this section some of the existing controllers and their interaction with
the devices will be briefly described. More details can be found in the
respective specifications of each MSI controller.

MSIs can be distinguished by some combination of
    * the Doorbell
        It is the MMIO address written to. Devices may be configured by
        software to write to arbitrary doorbells which they can address.
        An MSI controller may feature a number of doorbells.
    * the Payload
        Devices may be configured to write an arbitrary payload chosen by
        software. MSI controllers may have restrictions on permitted payload.
        Xen will have to sanitize the payload unless it is known to be always
        safe.
    * Sideband information accompanying the write
        Typically this is neither configurable nor probeable, and depends on
        the path taken through the memory system (i.e it is a property of the
        combination of MSI controller and device rather than a property of
        either in isolation).
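
As a purely illustrative model (not an existing Xen structure), the three
attributes above could be captured as follows when recording the MSI
programmed into a device:

    #include <stdint.h>

    /* Hypothetical sketch: an MSI seen from the MSI controller's point of
     * view. The sideband information (here the SBDF) is supplied by the
     * hardware and cannot be forged by the guest. */
    struct msi_desc_model {
        uint64_t doorbell; /* MMIO address the device is programmed to write */
        uint32_t payload;  /* value written, sanitized by Xen if necessary */
        uint32_t sbdf;     /* sideband: identifies the originating device */
    };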

### GICv3/GICv4 ITS

The Interrupt Translation Service (ITS) is an MSI controller designed by ARM
and integrated in the GICv3/GICv4 interrupt controller. For the specification
see [GICV3]. Each MSI/MSI-X will be mapped to a new type of interrupt called
LPI. This interrupt will be configured by the software using a pair (DeviceID,
EventID).

A platform may have multiple ITS blocks (e.g one per NUMA node), each of them
belonging to an ITS group.

The DeviceID is a unique identifier within an ITS group for each MSI-capable
device that can be deduced from the RID with the help of the firmware tables
(see below).

The EventID is a unique identifier to distinguish the different events sent
by a device.

The MSI payload will only contain the EventID as the DeviceID will be added
afterwards by the hardware in a way that will prevent any tampering.
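
To make the pair explicit, here is a simplified software model (not the actual
ITS command interface) of the translation performed for each incoming MSI:

    #include <stdint.h>

    /* Simplified model: each (DeviceID, EventID) pair maps to one LPI. */
    struct its_map_entry {
        uint32_t deviceid; /* deduced from the RID (IORT or msi-map) */
        uint32_t eventid;  /* chosen by software, written by the device */
        uint32_t lpi;      /* LPI injected into the interrupt controller */
    };

    static uint32_t its_lookup_lpi(const struct its_map_entry *map,
                                   unsigned int nr, uint32_t deviceid,
                                   uint32_t eventid)
    {
        for (unsigned int i = 0; i < nr; i++)
            if (map[i].deviceid == deviceid && map[i].eventid == eventid)
                return map[i].lpi;
        return 0; /* unmapped events are discarded in this model */
    }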

The [SBSA] appendix I describes the set of rules for the integration of the
ITS that any compliant platform should follow. Some of the rules explain the
security implications of a misbehaving device. Following them ensures that a
guest will never be able to trigger an MSI on behalf of another guest.

XXX: The security implication is described in the [SBSA] but I haven't found
any similar wording in the GICv3 specification. It is unclear to me if
non-SBSA compliant platforms (e.g embedded) will follow those rules.

### GICv2m

The GICv2m is an extension of the GICv2 to convert MSI/MSI-X writes to unique
interrupts. The specification can be found in the [SBSA] appendix E.

Depending on the platform, the GICv2m will provide one or multiple instances
of register frames. Each frame is composed of a doorbell and is associated
with a set of SPIs that can be discovered by reading the register MSI_TYPER.

On an MSI write, the payload will contain the SPI ID to generate. Note that
on some platforms the MSI payload may contain an offset from the base SPI
rather than the SPI itself.

The frame will only generate an SPI if the written value corresponds to an SPI
allocated to the frame. Each VM should have exclusive access to a frame to
ensure isolation and prevent a guest OS from triggering an MSI on behalf of
another guest OS.
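
Below is a sketch of the check a GICv2m frame applies (and that Xen could
mirror when deciding whether a payload is safe), assuming the MSI_TYPER layout
used by the Linux GICv2m driver: base SPI in bits [25:16] and number of SPIs
in bits [9:0].

    #include <stdbool.h>
    #include <stdint.h>

    /* Returns true if 'payload' is an SPI allocated to the frame described
     * by 'msi_typer'. Payloads expressed as an offset from the base SPI
     * (used on some platforms) would need a prior conversion. */
    static bool gicv2m_payload_is_valid(uint32_t msi_typer, uint32_t payload)
    {
        uint32_t base = (msi_typer >> 16) & 0x3ff;
        uint32_t nr   = msi_typer & 0x3ff;

        return payload >= base && payload < base + nr;
    }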

XXX: Linux seems to consider GICv2m as unsafe by default. From my understanding,
it is still unclear how we should proceed on Xen, as GICv2m should be safe
as long as the frame is only accessed by one guest.

### Other MSI controllers

Servers compliant with SBSA level 1 and higher will have to use either the ITS
or the GICv2m. However, these are by no means the only MSI controllers
available. A hardware vendor may decide to use a custom MSI controller which
can be integrated in the PCI host bridge.

Whether it will be possible to securely write an MSI will depend on the
MSI controller implementation.

XXX: I am happy to give a brief explanation of more MSI controllers (such
as Xilinx and Renesas) if people think it is necessary.

This design document does not pertain to a specific MSI controller and will try
to be as agnostic as possible. When possible, it will give insight into how to
integrate an MSI controller.

# Information available in the firmware tables

## ACPI

### Host bridges

The static table MCFG (see 4.2 in [1]) will describe the host bridges available
at boot and supporting ECAM. Unfortunately, there are platforms out there
(see [2]) that re-use MCFG to describe host bridges that are not fully ECAM
compatible.

This means that Xen needs to account for possible quirks in the host bridge.
The Linux community are working on a patch series for this, see [2] and [3],
where quirks will be detected with:
    * OEM ID
    * OEM Table ID
    * OEM Revision
    * PCI Segment
    * PCI bus number range (wildcard allowed)

Based on what Linux is currently doing, there are two kinds of quirks:
    * Accesses to the configuration space of certain sizes are not allowed
    * A specific driver is necessary for driving the host bridge

The former is straightforward to solve but the latter will require more thought.
Instantiation of a specific driver for the host controller can be easily done
if Xen has the information to detect it. However, those drivers may require
resources described in ASL (see [4] for instance).

The number of platforms requiring a specific PCI host bridge driver is
currently limited. Whilst it is not possible to predict the future, upcoming
platforms are expected to have fully ECAM-compliant PCI host bridges.
Therefore, given that Xen does not have any ASL parser, the suggested approach
is to hardcode the missing values. This could be revisited in the future if
necessary.
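
For illustration, a quirk table in Xen could be matched on the keys listed
above along the lines of the sketch below; the structure and helper are
hypothetical, not the Linux implementation.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    struct mcfg_quirk {
        char oem_id[7];        /* 6 characters from the MCFG header + NUL */
        char oem_table_id[9];  /* 8 characters + NUL */
        uint32_t oem_revision;
        uint16_t segment;
        uint8_t bus_start, bus_end; /* 0x00-0xff acts as a wildcard */
        const void *cfg_ops;   /* non-standard accessors/driver to use */
    };

    static bool mcfg_quirk_matches(const struct mcfg_quirk *q,
                                   const char *oem_id, const char *oem_table_id,
                                   uint32_t oem_revision, uint16_t segment,
                                   uint8_t bus_start, uint8_t bus_end)
    {
        return !strcmp(q->oem_id, oem_id) &&
               !strcmp(q->oem_table_id, oem_table_id) &&
               q->oem_revision == oem_revision &&
               q->segment == segment &&
               q->bus_start <= bus_start && q->bus_end >= bus_end;
    }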

### Finding information to configure IOMMU and MSI controller

The static table [IORT] will provide information that will help to deduce
data (such as MasterID and DeviceID) to configure both the IOMMU and the MSI
controller from a given SBDF.

### Finding which NUMA node a PCI device belongs to

On NUMA systems, the NUMA node associated with a PCI device can be found using
the _PXM method of the host bridge (?).

XXX: I am not entirely sure where the _PXM will be (i.e host bridge vs PCI
device).

## Device Tree

### Host bridges

Each Device Tree node associated with a host bridge will have at least the
following properties (see bindings in [8]):
    - device_type: will always be "pci".
    - compatible: a string indicating which driver to instantiate

The node may also contain optional properties such as:
    - linux,pci-domain: assigns a fixed segment number
    - bus-range: indicates the range of bus numbers supported

When the property linux,pci-domain is not present, the operating system would
have to allocate the segment number for each host bridge.

### Finding information to configure IOMMU and MSI controller

#### Configuring the IOMMU

The Device Tree provides a generic IOMMU binding (see [10]) which uses the
properties "iommu-map" and "iommu-map-mask" to describe the relationship
between a RID and a MasterID.

These properties will be present in the host bridge Device Tree node. From a
given SBDF, it will be possible to find the corresponding MasterID.

Note that the ARM SMMU also has a legacy binding (see [9]), but it does not
have a way to describe the relationship between RID and StreamID. Instead it
is assumed that StreamID == RID. This binding has now been deprecated in favor
of the generic IOMMU binding.
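
The translation described by the binding can be sketched as below (the types
and helper are illustrative, not an existing Xen interface); the "msi-map"
property described in the next section uses exactly the same scheme to produce
a DeviceID.

    #include <stdbool.h>
    #include <stdint.h>

    /* One (rid-base, output-base, length) entry of an "iommu-map". */
    struct iommu_map_entry {
        uint32_t rid_base;  /* first RID covered by this entry */
        uint32_t out_base;  /* MasterID corresponding to rid_base */
        uint32_t length;    /* number of RIDs covered */
    };

    static bool rid_to_masterid(const struct iommu_map_entry *map,
                                unsigned int nr, uint32_t map_mask,
                                uint32_t rid, uint32_t *masterid)
    {
        rid &= map_mask; /* "iommu-map-mask", all ones when absent */

        for (unsigned int i = 0; i < nr; i++) {
            if (rid >= map[i].rid_base &&
                rid < map[i].rid_base + map[i].length) {
                *masterid = map[i].out_base + (rid - map[i].rid_base);
                return true;
            }
        }
        return false; /* RID not described by the firmware tables */
    }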

#### Configuring the MSI controller

The relationship between the RID and data required to configure the MSI
controller (such as DeviceID) can be found using the property "msi-map"
(see [11]).

This property will be present in the host bridge Device Tree node. From a
given SBDF, it will be possible to find the corresponding DeviceID.

### Finding which NUMA node a PCI device belongs to

On NUMA systems, the NUMA node associated with a PCI device can be found using
the property "numa-node-id" (see [15]) present in the host bridge Device Tree
node.

# Discovering PCI devices

Whilst PCI devices are currently available in the hardware domain, the
hypervisor does not have any knowledge of them. The first step of supporting
PCI pass-through is to make Xen aware of the PCI devices.

Xen will require access to the PCI configuration space to retrieve information
about the PCI devices or to access it on behalf of the guest via the emulated
host bridge.

This means that Xen should be in charge of controlling the host bridge.
However, for some host controllers, this may be difficult to implement in Xen
because of dependencies on other components (e.g clocks, see more details in
the "PCI host bridge" section).

For this reason, the approach chosen in this document is to let the hardware
domain discover the host bridges, scan the PCI devices and then report
everything to Xen. This does not rule out the possibility of doing everything
without the help of the hardware domain in the future.

## Who is in charge of the host bridge?

Numerous host bridge implementations exist on ARM. Some of them require a
specific driver as they cannot be driven by a generic host bridge driver.
Porting those drivers may be complex due to dependencies on other components.

This could be seen as a signal to leave the host bridge drivers in the hardware
domain. Because Xen would need to access the configuration space, all the
accesses would have to be forwarded to the hardware domain, which in turn would
access the hardware.

In this design document, we are considering that the host bridge driver can
be ported to Xen. In case this is not possible, an interface to forward
configuration space accesses would need to be defined. The interface details
are out of scope.

## Discovering and registering host bridge

The approach taken in the document will require communication between Xen and
the hardware domain. In this case, they would need to agree on the segment
number associated with a host bridge. However, this number is not available in
the Device Tree case.

The hardware domain will register new host bridges using the existing hypercall
PHYSDEVOP_pci_mmcfg_reserved:

#define XEN_PCI_MMCFG_RESERVED 1

struct physdev_pci_mmcfg_reserved {
    /* IN */
    uint64_t    address;
    uint16_t    segment;
    /* Range of bus supported by the host bridge */
    uint8_t     start_bus;
    uint8_t     end_bus;

    uint32_t    flags;
};

Some of the host bridges may not have a separate configuration address space
region described in the firmware tables. To simplify the registration, the
field 'address' should contain the base address of one of the regions
described in the firmware tables:
    * For ACPI, it would be the base address specified in the MCFG or in the
    _CBA method.
    * For Device Tree, this would be any base address of a region
    specified in the "reg" property.

The field 'flags' is expected to have XEN_PCI_MMCFG_RESERVED set.

It is expected that this hypercall is called before any PCI device is
registered with Xen.

When the hardware domain is in charge of the host bridge, this hypercall will
be used to tell Xen about the existence of a host bridge in order to find the
associated information for configuring the MSI controller and the IOMMU.
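
As an illustration, the registration could be issued by the hardware domain
along the lines of the sketch below, assuming a Linux-style
HYPERVISOR_physdev_op() wrapper and the structure defined above; error
handling is omitted.

    /* Sketch: register one host bridge with Xen. */
    static int register_host_bridge(uint64_t cfg_base, uint16_t segment,
                                    uint8_t start_bus, uint8_t end_bus)
    {
        struct physdev_pci_mmcfg_reserved r = {
            .address   = cfg_base, /* base address from MCFG, _CBA or "reg" */
            .segment   = segment,
            .start_bus = start_bus,
            .end_bus   = end_bus,
            .flags     = XEN_PCI_MMCFG_RESERVED,
        };

        return HYPERVISOR_physdev_op(PHYSDEVOP_pci_mmcfg_reserved, &r);
    }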

## Discovering and registering PCI devices

The hardware domain will scan the host bridge to find the list of PCI devices
available and then report it to Xen using the existing hypercall
PHYSDEVOP_pci_device_add:

#define XEN_PCI_DEV_EXTFN   0x1
#define XEN_PCI_DEV_VIRTFN  0x2
#define XEN_PCI_DEV_PXM     0x4

struct physdev_pci_device_add {
    /* IN */
    uint16_t    seg;
    uint8_t     bus;
    uint8_t     devfn;
    uint32_t    flags;
    struct {
        uint8_t bus;
        uint8_t devfn;
    } physfn;
    /*
     * Optional parameters array.
     * First element ([0]) is PXM domain associated with the device (if
     * XEN_PCI_DEV_PXM is set)
     */
    uint32_t optarr[0];
};

When XEN_PCI_DEV_PXM is set in the field 'flags', optarr[0] will contain the
NUMA node ID associated with the device:
    * For ACPI, it would be the value returned by the method _PXM
    * For Device Tree, this would be the value found in the property "numa-node-id".
For more details see the section "Finding which NUMA node a PCI device belongs
to" in "ACPI" and "Device Tree".

XXX: I still don't fully understand how XEN_PCI_DEV_EXTFN and XEN_PCI_DEV_VIRTFN
will work. AFAICT, the former is used when the bus supports ARI and the only
usage is in the x86 IOMMU code. For the latter, this is related to SR-IOV but I
am not sure what devfn and physfn.devfn will correspond to.

Note that x86 currently provides two more hypercalls (PHYSDEVOP_manage_pci_add
and PHYSDEVOP_manage_pci_add_ext) to register PCI devices. However they are a
subset of the hypercall PHYSDEVOP_pci_device_add. Therefore, it is suggested
to leave them unimplemented on ARM.

## Removing PCI devices

The hardware domain will be in charge of telling Xen when a device has been
removed, using the existing hypercall PHYSDEVOP_pci_device_remove:

struct physdev_pci_device {
    /* IN */
    uint16_t    seg;
    uint8_t     bus;
    uint8_t     devfn;
};

Note that x86 currently provides one more hypercall (PHYSDEVOP_manage_pci_remove)
to remove PCI devices. However it does not allow passing a segment number.
Therefore it is suggested to leave it unimplemented on ARM.

# Glossary

ECAM: Enhanced Configuration Access Mechanism
SBDF: Segment Bus Device Function. The segment is a software concept.
MSI: Message Signaled Interrupt
MSI doorbell: MMIO address written to by a device to generate an MSI
SPI: Shared Peripheral Interrupt
LPI: Locality-specific Peripheral Interrupt
ITS: Interrupt Translation Service

# Specifications
[SBSA]  ARM-DEN-0029 v3.0
[GICV3] IHI0069C
[IORT]  DEN0049B

# Bibliography

[1] PCI firmware specification, rev 3.2
[2] https://www.spinics.net/lists/linux-pci/msg56715.html
[3] https://www.spinics.net/lists/linux-pci/msg56723.html
[4] https://www.spinics.net/lists/linux-pci/msg56728.html
[6] https://www.spinics.net/lists/kvm/msg140116.html
[7] http://www.firmware.org/1275/bindings/pci/pci2_1.pdf
[8] Documentation/devicetree/bindings/pci
[9] Documentation/devicetree/bindings/iommu/arm,smmu.txt
[10] Documentation/devicetree/bindings/pci/pci-iommu.txt
[11] Documentation/devicetree/bindings/pci/pci-msi.txt
[12] drivers/pci/host/pcie-rcar.c
[13] drivers/pci/host/pci-thunder-ecam.c
[14] drivers/pci/host/pci-thunder-pem.c
[15] Documentation/devicetree/bindings/numa.txt


* Re: [RFC] ARM PCI Passthrough design document
From: Manish Jaggi @ 2017-05-29  2:30 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: edgar.iglesias, okaya, Wei Chen, Steve Capper, Andre Przywara,
	manish.jaggi, punit.agrawal, vikrams, Goel, Sameer, xen-devel,
	Dave P Martin, Vijaya Kumar K, roger.pau

Hi Julien,

On 5/26/2017 10:44 PM, Julien Grall wrote:
[...]
> To achieve this goal, it looks more sensible to go towards emulating the
> host bridge (there will be more details later).
IIUC this means that domU would have an emulated host bridge and dom0 
will see the actual host bridge?
[...]
> Whilst in theory, all the memory transactions issued by a PCI device should
> go through the IOMMU, on certain platforms some of the memory transaction may
> not reach the IOMMU because they are interpreted by the host bridge. For
> instance, this could happen if the MSI doorbell is built into the PCI host
> bridge or for P2P traffic. See [6] for more details.
>
> XXX: I think this could be solved by using direct mapping (e.g GFN == MFN),
> this would mean the guest memory layout would be similar to the host one when
> PCI devices will be pass-throughed => Detail it.
In the example given in the IORT spec, for pci devices not behind an SMMU,
how would the writes from the device be protected.

[...]
> This means that Xen needs to account for possible quirks in the host bridge.
> The Linux community are working on a patch series for this, see [2] and [3],
> where quirks will be detected with:
>      * OEM ID
>      * OEM Table ID
>      * OEM Revision
>      * PCI Segment
>      * PCI bus number range (wildcard allowed)
>
> Based on what Linux is currently doing, there are two kind of quirks:
>      * Accesses to the configuration space of certain sizes are not allowed
>      * A specific driver is necessary for driving the host bridge
>
> The former is straightforward to solve but the latter will require more thought.
> Instantiation of a specific driver for the host controller can be easily done
> if Xen has the information to detect it.
So Xen would parse the MCFG to find a hb, then map the config space in 
dom0 stage2 ?
and then provide the same MCFG to dom0?

[...]
> ## Discovering and registering host bridge
>
> The approach taken in the document will require communication between Xen and
> the hardware domain. In this case, they would need to agree on the segment
> number associated to an host bridge. However, this number is not available in
> the Device Tree case.
>
> The hardware domain will register new host bridges using the existing hypercall
> PHYSDEV_mmcfg_reserved:
>
> #define XEN_PCI_MMCFG_RESERVED 1
>
> struct physdev_pci_mmcfg_reserved {
>      /* IN */
>      uint64_t    address;
>      uint16_t    segment;
>      /* Range of bus supported by the host bridge */
>      uint8_t     start_bus;
>      uint8_t     end_bus;
>
>      uint32_t    flags;
> }
So this hypercall is not required for ACPI?
[...]
> ## Discovering and registering PCI devices
>
> The hardware domain will scan the host bridge to find the list of PCI devices
> available and then report it to Xen using the existing hypercall
> PHYSDEV_pci_device_add:
>
> #define XEN_PCI_DEV_EXTFN   0x1
> #define XEN_PCI_DEV_VIRTFN  0x2
> #define XEN_PCI_DEV_PXM     0x3
>
> struct physdev_pci_device_add {
>      /* IN */
>      uint16_t    seg;
>      uint8_t     bus;
>      uint8_t     devfn;
>      uint32_t    flags;
>      struct {
>          uint8_t bus;
>          uint8_t devfn;
>      } physfn;
>      /*
>       * Optional parameters array.
>       * First element ([0]) is PXM domain associated with the device (if
>       * XEN_PCI_DEV_PXM is set)
>       */
>      uint32_t optarr[0];
> }
For mapping the MMIO space of the device in Stage2, we need to add 
support in Xen / via a map hypercall in linux/drivers/xen/pci.c
[...]



* Re: [RFC] ARM PCI Passthrough design document
From: Julien Grall @ 2017-05-29 18:14 UTC (permalink / raw)
  To: Manish Jaggi, Stefano Stabellini
  Cc: edgar.iglesias, okaya, Wei Chen, Steve Capper, Andre Przywara,
	manish.jaggi, punit.agrawal, vikrams, Goel, Sameer, xen-devel,
	Dave P Martin, Vijaya Kumar K, roger.pau



On 05/29/2017 03:30 AM, Manish Jaggi wrote:
> Hi Julien,

Hello Manish,

> On 5/26/2017 10:44 PM, Julien Grall wrote:
>> PCI pass-through allows the guest to receive full control of physical PCI
>> devices. This means the guest will have full and direct access to the PCI
>> device.
>>
>> ARM is supporting a kind of guest that exploits as much as possible
>> virtualization support in hardware. The guest will rely on PV driver only
>> for IO (e.g block, network) and interrupts will come through the
>> virtualized
>> interrupt controller, therefore there are no big changes required
>> within the
>> kernel.
>>
>> As a consequence, it would be possible to replace PV drivers by
>> assigning real
>> devices to the guest for I/O access. Xen on ARM would therefore be
>> able to
>> run unmodified operating system.
>>
>> To achieve this goal, it looks more sensible to go towards emulating the
>> host bridge (there will be more details later).
> IIUC this means that domU would have an emulated host bridge and dom0
> will see the actual host bridge?

You don't want the hardware domain and Xen accessing the configuration
space at the same time. So if Xen is in charge of the host bridge, then
an emulated host bridge should be exposed to the hardware domain.

Although, this depends on who is in charge of the host bridge.
As you may have noticed, this design document is proposing two ways to
handle configuration space accesses. At the moment any generic host bridge
(see the definition in the design document) will be handled in Xen and
the hardware domain will have an emulated host bridge.

If your host bridge is not a generic one, then the hardware domain will
be in charge of the host bridge, and any configuration access from Xen
will be forwarded to the hardware domain.

At the moment, as part of the first implementation, we are only looking
to implement a generic host bridge in Xen. We will decide on a
case-by-case basis for all the other host bridges whether we want to have
the driver in Xen.

[...]

>> ## IOMMU
>>
>> The IOMMU will be used to isolate the PCI device when accessing the
>> memory (e.g
>> DMA and MSI Doorbells). Often the IOMMU will be configured using a
>> MasterID
>> (aka StreamID for ARM SMMU)  that can be deduced from the SBDF with
>> the help
>> of the firmware tables (see below).
>>
>> Whilst in theory, all the memory transactions issued by a PCI device
>> should
>> go through the IOMMU, on certain platforms some of the memory
>> transaction may
>> not reach the IOMMU because they are interpreted by the host bridge. For
>> instance, this could happen if the MSI doorbell is built into the PCI
>> host
>> bridge or for P2P traffic. See [6] for more details.
>>
>> XXX: I think this could be solved by using direct mapping (e.g GFN ==
>> MFN),
>> this would mean the guest memory layout would be similar to the host
>> one when
>> PCI devices will be pass-throughed => Detail it.
> In the example given in the IORT spec, for pci devices not behind an SMMU,
> how would the writes from the device be protected.

I realize the XXX paragraph is quite confusing. I am not trying to solve
the problem where PCI devices are not protected behind an SMMU but
platforms where some transactions (e.g P2P or MSI doorbell accesses)
by-pass the SMMU.

You may still want to allow PCI passthrough in that case, because you
know that P2P cannot be done (or potentially disabled) and MSI doorbell
access is protected (for instance a write in the ITS doorbell will be
tagged with the device by the hardware). In order to support such
platforms you need to direct map the doorbell (e.g GFN == MFN) and carve
out the P2P region from the guest memory map. Hence the suggestion to
re-use the host memory layout for the guest.

Note that it does not mean the RAM region will be direct mapped. It is
only there to ease carving out memory regions by-passed by the SMMU.

[...]

>> ## ACPI
>>
>> ### Host bridges
>>
>> The static table MCFG (see 4.2 in [1]) will describe the host bridges
>> available
>> at boot and supporting ECAM. Unfortunately, there are platforms out there
>> (see [2]) that re-use MCFG to describe host bridge that are not fully
>> ECAM
>> compatible.
>>
>> This means that Xen needs to account for possible quirks in the host
>> bridge.
>> The Linux community are working on a patch series for this, see [2]
>> and [3],
>> where quirks will be detected with:
>>      * OEM ID
>>      * OEM Table ID
>>      * OEM Revision
>>      * PCI Segment
>>      * PCI bus number range (wildcard allowed)
>>
>> Based on what Linux is currently doing, there are two kind of quirks:
>>      * Accesses to the configuration space of certain sizes are not
>> allowed
>>      * A specific driver is necessary for driving the host bridge
>>
>> The former is straightforward to solve but the latter will require
>> more thought.
>> Instantiation of a specific driver for the host controller can be
>> easily done
>> if Xen has the information to detect it.
> So Xen would parse the MCFG to find a hb, then map the config space in
> dom0 stage2 ?
> and then provide the same MCFG to dom0?

These are implementation details. I have been really careful so far to
leave the implementation open, as it does not matter at this stage how we
are going to implement it in Xen.

[...]

>> ## Discovering and registering host bridge
>>
>> The approach taken in the document will require communication between
>> Xen and
>> the hardware domain. In this case, they would need to agree on the
>> segment
>> number associated to an host bridge. However, this number is not
>> available in
>> the Device Tree case.
>>
>> The hardware domain will register new host bridges using the existing
>> hypercall
>> PHYSDEV_mmcfg_reserved:
>>
>> #define XEN_PCI_MMCFG_RESERVED 1
>>
>> struct physdev_pci_mmcfg_reserved {
>>      /* IN */
>>      uint64_t    address;
>>      uint16_t    segment;
>>      /* Range of bus supported by the host bridge */
>>      uint8_t     start_bus;
>>      uint8_t     end_bus;
>>
>>      uint32_t    flags;
>> }
> So this hypercall is not required for ACPI?

This is not DT specific, as even with ACPI there are platforms not fully
ECAM compliant. As I said above, we will need to decide whether we want
to support non-ECAM compliant host bridges (e.g host bridges that need a
specific driver) in Xen. Likely this will be decided on a case-by-case
basis.

[...]

>> ## Discovering and registering PCI devices
>>
>> The hardware domain will scan the host bridge to find the list of PCI
>> devices
>> available and then report it to Xen using the existing hypercall
>> PHYSDEV_pci_device_add:
>>
>> #define XEN_PCI_DEV_EXTFN   0x1
>> #define XEN_PCI_DEV_VIRTFN  0x2
>> #define XEN_PCI_DEV_PXM     0x3
>>
>> struct physdev_pci_device_add {
>>      /* IN */
>>      uint16_t    seg;
>>      uint8_t     bus;
>>      uint8_t     devfn;
>>      uint32_t    flags;
>>      struct {
>>          uint8_t bus;
>>          uint8_t devfn;
>>      } physfn;
>>      /*
>>       * Optional parameters array.
>>       * First element ([0]) is PXM domain associated with the device (if
>>       * XEN_PCI_DEV_PXM is set)
>>       */
>>      uint32_t optarr[0];
>> }
> For mapping the MMIO space of the device in Stage2, we need to add
> support in Xen / via a map hypercall in linux/drivers/xen/pci.c

Mapping MMIO space in stage-2 is not PCI specific and is already addressed
in Xen 4.9 (see commit 80f9c31 "xen/arm: acpi: Map MMIO on fault in
stage-2 page table for the hardware domain"). So I don't understand why
we should care about that here...

Regards,

-- 
Julien Grall


* Re: [RFC] ARM PCI Passthrough design document
From: Manish Jaggi @ 2017-05-30  5:53 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: edgar.iglesias, okaya, Wei Chen, Steve Capper, Andre Przywara,
	manish.jaggi, punit.agrawal, vikrams, Goel, Sameer, xen-devel,
	Dave P Martin, Vijaya Kumar K, roger.pau

Hi Julien,

On 5/29/2017 11:44 PM, Julien Grall wrote:
>
>
> On 05/29/2017 03:30 AM, Manish Jaggi wrote:
>> Hi Julien,
>
> Hello Manish,
>
>> On 5/26/2017 10:44 PM, Julien Grall wrote:
>>> [...]
>>> To achieve this goal, it looks more sensible to go towards emulating 
>>> the
>>> host bridge (there will be more details later).
>> IIUC this means that domU would have an emulated host bridge and dom0
>> will see the actual host bridge?
>
> You don't want the hardware domain and Xen access the configuration 
> space at the same time. So if Xen is in charge of the host bridge, 
> then an emulated host bridge should be exposed to the hardware.
I believe in the x86 case dom0 and Xen do access the config space, in the
context of the PCI device add hypercall. That's when the pci_config_XXX
functions in Xen are called.
>
> Although, this is depending on who is in charge of the the host 
> bridge. As you may have noticed, this design document is proposing two 
> ways to handle configuration space access. At the moment any generic 
> host bridge (see the definition in the design document) will be 
> handled in Xen and the hardware domain will have an emulated host bridge.
>
So in the case of a generic host bridge, Xen will manage the config space
and provide an emulated interface to dom0, and accesses would be trapped
by Xen.
Essentially the goal is to scan all PCI devices and register them with
Xen (which in turn will configure the SMMU).
For a generic host bridge, this can be done either in dom0 or Xen. The
only doubt here is what extra benefit the emulated host bridge gives in
the case of dom0.

> If your host bridges is not a generic one, then the hardware domain 
> will be  in charge of the host bridges, any configuration access from 
> Xen will be forward to the hardware domain.
>
> At the moment, as part of the first implementation, we are only 
> looking to implement a generic host bridge in Xen. We will decide on 
> case by case basis for all the other host bridges whether we want to 
> have the driver in Xen.
agreed.
>
> [...]
>
>>> ## IOMMU
>>>
>>> The IOMMU will be used to isolate the PCI device when accessing the
>>> memory (e.g
>>> DMA and MSI Doorbells). Often the IOMMU will be configured using a
>>> MasterID
>>> (aka StreamID for ARM SMMU)  that can be deduced from the SBDF with
>>> the help
>>> of the firmware tables (see below).
>>>
>>> Whilst in theory, all the memory transactions issued by a PCI device
>>> should
>>> go through the IOMMU, on certain platforms some of the memory
>>> transaction may
>>> not reach the IOMMU because they are interpreted by the host bridge. 
>>> For
>>> instance, this could happen if the MSI doorbell is built into the PCI
>>> host
>>> bridge or for P2P traffic. See [6] for more details.
>>>
>>> XXX: I think this could be solved by using direct mapping (e.g GFN ==
>>> MFN),
>>> this would mean the guest memory layout would be similar to the host
>>> one when
>>> PCI devices will be pass-throughed => Detail it.
>> In the example given in the IORT spec, for pci devices not behind an 
>> SMMU,
>> how would the writes from the device be protected.
>
> I realize the XXX paragraph is quite confusing. I am not trying to 
> solve the problem where PCI devices are not protected behind an SMMU 
> but platform where some transactions (e.g P2P or MSI doorbell access) 
> are by-passing the SMMU.
>
> You may still want to allow PCI passthrough in that case, because you 
> know that P2P cannot be done (or potentially disabled) and MSI 
> doorbell access is protected (for instance a write in the ITS doorbell 
> will be tagged with the device by the hardware). In order to support 
> such platform you need to direct map the doorbel (e.g GFN == MFN) and 
> carve out the P2P region from the guest memory map. Hence the 
> suggestion to re-use the host memory layout for the guest.
>
> Note that it does not mean the RAM region will be direct mapped. It is 
> only there to ease carving out memory region by-passed by the SMMU.
>
> [...]
>
>>> ## ACPI
>>>
>>> ### Host bridges
>>>
>>> The static table MCFG (see 4.2 in [1]) will describe the host bridges
>>> available
>>> at boot and supporting ECAM. Unfortunately, there are platforms out 
>>> there
>>> (see [2]) that re-use MCFG to describe host bridge that are not fully
>>> ECAM
>>> compatible.
>>>
>>> This means that Xen needs to account for possible quirks in the host
>>> bridge.
>>> The Linux community are working on a patch series for this, see [2]
>>> and [3],
>>> where quirks will be detected with:
>>>      * OEM ID
>>>      * OEM Table ID
>>>      * OEM Revision
>>>      * PCI Segment
>>>      * PCI bus number range (wildcard allowed)
>>>
>>> Based on what Linux is currently doing, there are two kind of quirks:
>>>      * Accesses to the configuration space of certain sizes are not
>>> allowed
>>>      * A specific driver is necessary for driving the host bridge
>>>
>>> The former is straightforward to solve but the latter will require
>>> more thought.
>>> Instantiation of a specific driver for the host controller can be
>>> easily done
>>> if Xen has the information to detect it.
>> So Xen would parse the MCFG to find a hb, then map the config space in
>> dom0 stage2 ?
>> and then provide the same MCFG to dom0?
>
> This is implementation details. I have been really careful so far to 
> leave the implementation open as it does not matter at this stage how 
> we are going to implement it in Xen.
>
This matters in the case of stage-2 MMIO mappings, see below.
> [...]
>
>>> ## Discovering and registering host bridge
>>>
>>> The approach taken in the document will require communication between
>>> Xen and
>>> the hardware domain. In this case, they would need to agree on the
>>> segment
>>> number associated to an host bridge. However, this number is not
>>> available in
>>> the Device Tree case.
>>>
>>> The hardware domain will register new host bridges using the existing
>>> hypercall
>>> PHYSDEV_mmcfg_reserved:
>>>
>>> #define XEN_PCI_MMCFG_RESERVED 1
>>>
>>> struct physdev_pci_mmcfg_reserved {
>>>      /* IN */
>>>      uint64_t    address;
>>>      uint16_t    segment;
>>>      /* Range of bus supported by the host bridge */
>>>      uint8_t     start_bus;
>>>      uint8_t     end_bus;
>>>
>>>      uint32_t    flags;
>>> }
>> So this hypercall is not required for ACPI?
>
> This is not DT specific as even on ACPI there are platform not fully 
> ECAM compliant. As I said above, we will need to decide whether we 
> want to support non-ECAM compliant host bridges (e.g all host bridges 
> have a specific drivers) in Xen. Likely this will be on case by case 
> basis.
>
> [...]
>
>>> ## Discovering and registering PCI devices
>>>
>>> The hardware domain will scan the host bridge to find the list of PCI
>>> devices
>>> available and then report it to Xen using the existing hypercall
>>> PHYSDEV_pci_device_add:
>>>
>>> #define XEN_PCI_DEV_EXTFN   0x1
>>> #define XEN_PCI_DEV_VIRTFN  0x2
>>> #define XEN_PCI_DEV_PXM     0x3
>>>
>>> struct physdev_pci_device_add {
>>>      /* IN */
>>>      uint16_t    seg;
>>>      uint8_t     bus;
>>>      uint8_t     devfn;
>>>      uint32_t    flags;
>>>      struct {
>>>          uint8_t bus;
>>>          uint8_t devfn;
>>>      } physfn;
>>>      /*
>>>       * Optional parameters array.
>>>       * First element ([0]) is PXM domain associated with the device 
>>> (if
>>>       * XEN_PCI_DEV_PXM is set)
>>>       */
>>>      uint32_t optarr[0];
>>> }
>> For mapping the MMIO space of the device in Stage2, we need to add
>> support in Xen / via a map hypercall in linux/drivers/xen/pci.c
>
> Mapping MMIO space in stage-2 is not PCI specific and already 
> addressed in Xen 4.9 (see commit 80f9c31 "xen/arm: acpi: Map MMIO on 
> fault in stage-2 page table for the hardware domain"). So I don't 
> understand why we should care about that here...
>
This approach is ok, but IMHO we could have a more granular approach than 
trapping. For ACPI:
    - Xen parses the MCFG and can map the PCI host bridge (emulated / 
original) in stage-2 for dom0.
    - Device MMIO can be mapped in stage-2 alongside the pci_device_add 
call (see the sketch below).
What do you think?
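
As a rough illustration of that second item, here is a minimal sketch of how
a device's memory BARs could be mapped 1:1 into the hardware domain's stage-2
while handling PHYSDEVOP_pci_device_add. The helper pci_read_mem_bar() and
the overall flow are assumptions for illustration (only map_mmio_regions()
is meant to be Xen's existing stage-2 MMIO helper), not existing code:

/*
 * Hypothetical sketch: map a device's memory BARs (GFN == MFN) into the
 * hardware domain's stage-2 when the device is registered with Xen.
 */
static int map_device_bars_to_hwdom(struct domain *d, uint16_t seg,
                                    uint8_t bus, uint8_t devfn)
{
    unsigned int bar;

    for ( bar = 0; bar < 6; bar++ )
    {
        uint64_t addr, size;

        /* Assumed helper: base/size of memory BAR 'bar', skips I/O BARs. */
        if ( pci_read_mem_bar(seg, bus, devfn, bar, &addr, &size) || !size )
            continue;

        /* 1:1 mapping of the BAR region into dom0's p2m. */
        if ( map_mmio_regions(d, gaddr_to_gfn(addr), PFN_UP(size),
                              maddr_to_mfn(addr)) )
            return -EFAULT;
    }

    return 0;
}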

> Regards,
>



* Re: [RFC] ARM PCI Passthrough design document
  2017-05-26 17:14 [RFC] ARM PCI Passthrough design document Julien Grall
  2017-05-29  2:30 ` Manish Jaggi
@ 2017-05-30  7:40 ` Roger Pau Monné
  2017-05-30  9:54   ` Julien Grall
  2017-06-16  0:23 ` Stefano Stabellini
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 35+ messages in thread
From: Roger Pau Monné @ 2017-05-30  7:40 UTC (permalink / raw)
  To: Julien Grall
  Cc: edgar.iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Andre Przywara, manish.jaggi, punit.agrawal, vikrams, okaya,
	Goel, Sameer, xen-devel, Dave P Martin, Vijaya Kumar K

On Fri, May 26, 2017 at 06:14:09PM +0100, Julien Grall wrote:
[...]
> ## Who is in charge of the host bridge?
> 
> There are numerous implementation of host bridges which exist on ARM. A part of
> them requires a specific driver as they cannot be driven by a generic host bridge
> driver. Porting those drivers may be complex due to dependencies on other
> components.
> 
> This would be seen as signal to leave the host bridge drivers in the hardware
> domain. Because Xen would need to access the configuration space, all the access
> would have to be forwarded to hardware domain which in turn will access the
> hardware.

IMHO this is much more complicated than it seems from the paragraph
above. There is currently no way for Xen to forward PCI config space
accesses to any other entity. The closest thing Xen has to this would
possibly be IOREQ servers, but then you have to take into account that
in order to forward PCI config space accesses to Dom0 you *might* have to
schedule Dom0 (ie: context switch to it), perform the access and
then context switch back to Xen and get the value. I don't think the
PCI code is prepared for such asynchronous accesses at all.

> In this design document, we are considering that the host bridge driver can
> be ported in Xen. In the case it is not possible, a interface to forward
> configuration space access would need to be defined. The interface details
> is out of scope.

I think that you have to state that either the driver is ported to Xen or
the bridge will not be supported. I don't think it's feasible to forward
PCI config space accesses from Xen to Dom0 at all.

Roger.


* Re: [RFC] ARM PCI Passthrough design document
  2017-05-29 18:14   ` Julien Grall
  2017-05-30  5:53     ` Manish Jaggi
@ 2017-05-30  7:53     ` Roger Pau Monné
  2017-05-30  9:42       ` Julien Grall
  1 sibling, 1 reply; 35+ messages in thread
From: Roger Pau Monné @ 2017-05-30  7:53 UTC (permalink / raw)
  To: Julien Grall
  Cc: edgar.iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Manish Jaggi, manish.jaggi, punit.agrawal, vikrams, okaya, Goel,
	Sameer, Andre Przywara, xen-devel, Dave P Martin, Vijaya Kumar K

On Mon, May 29, 2017 at 07:14:55PM +0100, Julien Grall wrote:
> On 05/29/2017 03:30 AM, Manish Jaggi wrote:
> > On 5/26/2017 10:44 PM, Julien Grall wrote:
[...]
> > > ## Discovering and registering PCI devices
> > > 
> > > The hardware domain will scan the host bridge to find the list of PCI
> > > devices
> > > available and then report it to Xen using the existing hypercall
> > > PHYSDEV_pci_device_add:
> > > 
> > > #define XEN_PCI_DEV_EXTFN   0x1
> > > #define XEN_PCI_DEV_VIRTFN  0x2
> > > #define XEN_PCI_DEV_PXM     0x3
> > > 
> > > struct physdev_pci_device_add {
> > >      /* IN */
> > >      uint16_t    seg;
> > >      uint8_t     bus;
> > >      uint8_t     devfn;
> > >      uint32_t    flags;
> > >      struct {
> > >          uint8_t bus;
> > >          uint8_t devfn;
> > >      } physfn;
> > >      /*
> > >       * Optional parameters array.
> > >       * First element ([0]) is PXM domain associated with the device (if
> > >       * XEN_PCI_DEV_PXM is set)
> > >       */
> > >      uint32_t optarr[0];
> > > }
> > For mapping the MMIO space of the device in Stage2, we need to add
> > support in Xen / via a map hypercall in linux/drivers/xen/pci.c
> 
> Mapping MMIO space in stage-2 is not PCI specific and already addressed in
> Xen 4.9 (see commit 80f9c31 "xen/arm: acpi: Map MMIO on fault in stage-2
> page table for the hardware domain"). So I don't understand why we should
> care about that here...

I'm not sure what Manish means, but you should map the BARs of the
device when adding it to a domain. Doing the mapping on faults will work
with CPU accesses, but it's not going to work with SMMU faults, which are
asynchronous, and I don't think you can guarantee that the CPU is
always going to access the BARs before doing any DMA transactions to
them.
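
To map the BARs up front rather than on fault, their base and size need to
be known in advance; the PCI-standard way to size a BAR is the write-ones /
read-back sequence. A minimal sketch for a 32-bit memory BAR, assuming
config space accessors of the form pci_conf_read32/write32(seg, bus, dev,
func, reg); 64-bit and I/O BARs would need extra handling:

/* Sketch: size a 32-bit memory BAR by writing all ones and decoding the
 * writable bits. Not safe to do while the device is actively decoding. */
static uint32_t size_mem_bar32(unsigned int seg, unsigned int bus,
                               unsigned int dev, unsigned int func,
                               unsigned int bar_reg /* 0x10, 0x14, ... */)
{
    uint32_t orig = pci_conf_read32(seg, bus, dev, func, bar_reg);
    uint32_t mask;

    if ( orig & 1 )
        return 0; /* I/O BAR, not handled here */

    pci_conf_write32(seg, bus, dev, func, bar_reg, ~0u);
    mask = pci_conf_read32(seg, bus, dev, func, bar_reg) & ~0xfu;
    pci_conf_write32(seg, bus, dev, func, bar_reg, orig);

    return mask ? -mask : 0; /* size = two's complement of the writable bits */
}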

Note that Xen can also scan the bridge by itself and add the devices,
I'm not sure you need the PHYSDEV_pci_device_add hypercall.
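
For completeness, the kind of scan being referred to is the standard
vendor-ID probe over bus/device/function; a minimal sketch, again assuming a
pci_conf_read32(seg, bus, dev, func, reg)-style accessor, with bridge and
multifunction handling omitted:

/* Sketch: minimal enumeration of one bus using the vendor ID probe. */
static void scan_bus(unsigned int seg, unsigned int bus)
{
    unsigned int dev, func;

    for ( dev = 0; dev < 32; dev++ )
        for ( func = 0; func < 8; func++ )
        {
            uint32_t id = pci_conf_read32(seg, bus, dev, func, 0x00);

            if ( (id & 0xffff) == 0xffff )
                continue; /* no function here */

            /* Register seg:bus:dev.func with the PCI subsystem... */
        }
}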

Roger.


* Re: [RFC] ARM PCI Passthrough design document
  2017-05-30  5:53     ` Manish Jaggi
@ 2017-05-30  9:33       ` Julien Grall
  0 siblings, 0 replies; 35+ messages in thread
From: Julien Grall @ 2017-05-30  9:33 UTC (permalink / raw)
  To: Manish Jaggi, Stefano Stabellini
  Cc: edgar.iglesias, okaya, Wei Chen, Steve Capper, Andre Przywara,
	manish.jaggi, punit.agrawal, vikrams, Goel, Sameer, xen-devel,
	Dave P Martin, Vijaya Kumar K, roger.pau



On 30/05/17 06:53, Manish Jaggi wrote:
> Hi Julien,
>
> On 5/29/2017 11:44 PM, Julien Grall wrote:
>>
>>
>> On 05/29/2017 03:30 AM, Manish Jaggi wrote:
>>> Hi Julien,
>>
>> Hello Manish,
>>
>>> On 5/26/2017 10:44 PM, Julien Grall wrote:
>>>> PCI pass-through allows the guest to receive full control of
>>>> physical PCI
>>>> devices. This means the guest will have full and direct access to
>>>> the PCI
>>>> device.
>>>>
>>>> ARM is supporting a kind of guest that exploits as much as possible
>>>> virtualization support in hardware. The guest will rely on PV driver
>>>> only
>>>> for IO (e.g block, network) and interrupts will come through the
>>>> virtualized
>>>> interrupt controller, therefore there are no big changes required
>>>> within the
>>>> kernel.
>>>>
>>>> As a consequence, it would be possible to replace PV drivers by
>>>> assigning real
>>>> devices to the guest for I/O access. Xen on ARM would therefore be
>>>> able to
>>>> run unmodified operating system.
>>>>
>>>> To achieve this goal, it looks more sensible to go towards emulating
>>>> the
>>>> host bridge (there will be more details later).
>>> IIUC this means that domU would have an emulated host bridge and dom0
>>> will see the actual host bridge?
>>
>> You don't want the hardware domain and Xen access the configuration
>> space at the same time. So if Xen is in charge of the host bridge,
>> then an emulated host bridge should be exposed to the hardware.
> I believe in x86 case dom0 and Xen do access the config space. In the
> context of pci device add hypercall.
> Thats when the pci_config_XXX functions in xen are called.

I don't understand how this is related to what I said... If DOM0 has an 
emulated host bridge, it will not be possible for both of them to poke the 
real hardware at the same time, as only Xen would do hardware accesses.

>>
>> Although, this is depending on who is in charge of the the host
>> bridge. As you may have noticed, this design document is proposing two
>> ways to handle configuration space access. At the moment any generic
>> host bridge (see the definition in the design document) will be
>> handled in Xen and the hardware domain will have an emulated host bridge.
>>
> So in case of generic hb, xen will manage the config space and provide a
> emulated I/f to dom0, and accesses would be trapped by Xen.
> Essentially the goal is to scan all pci devices and register them with
> Xen (which in turn will configure the smmu).
> For a  generic hb, this can be done either in dom0/xen. The only doubt
> here is what extra benefit the emulated hb give in case of dom0.

Because then you don't have 2 entities accessing the hardware at the same 
time; you don't know how the hardware would behave in that case. You may 
also want to trap some registers for configuration. Note that this is what 
is already done on x86.
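
As a rough illustration of that kind of trapping, here is a sketch of an
emulated host bridge handler for the hardware domain that forwards most
accesses but filters writes to registers Xen wants to own. The function
names and the set of filtered registers are assumptions for illustration,
not a description of the existing x86 code:

/* Hypothetical sketch: hardware domain config space write through the
 * emulated host bridge. Reads would simply be forwarded to the hardware. */
static void hwdom_cfg_write32(unsigned int seg, unsigned int bus,
                              unsigned int dev, unsigned int func,
                              unsigned int reg, uint32_t val)
{
    switch ( reg & ~3u )
    {
    case 0x04:            /* Command/Status: Xen may want to own some bits */
    case 0x10 ... 0x24:   /* BARs: Xen decides where the MMIO regions live */
        /* Emulate: record the value, merge with what Xen wants to keep. */
        return;
    default:
        /* Anything Xen does not care about goes straight to the hardware. */
        pci_hw_write_config32(seg, bus, dev, func, reg, val);
    }
}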

[...]

>>> For mapping the MMIO space of the device in Stage2, we need to add
>>> support in Xen / via a map hypercall in linux/drivers/xen/pci.c
>>
>> Mapping MMIO space in stage-2 is not PCI specific and already
>> addressed in Xen 4.9 (see commit 80f9c31 "xen/arm: acpi: Map MMIO on
>> fault in stage-2 page table for the hardware domain"). So I don't
>> understand why we should care about that here...
>>
> This approach is ok.
> But we could have more granular approach than trapping IMHO.
> For ACPI
>    -xen parses MCFG and can map pci hb (emulated / original) in stage2
> for dom0
>    -device MMIO can be mapped in stage2 alongside pci_device_add call .
> What do you think?

There are plenty of ways to map MMIO today and again this is not related 
to this design document. It does not matter how you are going to map it 
(trapping, XENMEM_add_to_physmap, parsing MCFG, reading BARs...) at 
this stage.
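
For reference, the XENMEM_add_to_physmap route mentioned above could look
roughly like this from the hardware domain side. This is a sketch only,
assuming Linux's Xen interface headers and the ARM-specific
XENMAPSPACE_dev_mmio space; error handling and batching are omitted:

/* Sketch (Linux dom0 side): ask Xen to map one page of device MMIO,
 * machine frame 'mfn', at guest frame 'gfn' (1:1 if gfn == mfn). */
#include <xen/interface/xen.h>
#include <xen/interface/memory.h>
#include <asm/xen/hypercall.h>

static int map_dev_mmio_page(unsigned long gfn, unsigned long mfn)
{
    struct xen_add_to_physmap xatp = {
        .domid = DOMID_SELF,
        .space = XENMAPSPACE_dev_mmio,
        .idx   = mfn,  /* machine frame of the MMIO page */
        .gpfn  = gfn,  /* where it should appear in dom0 */
    };

    return HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
}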

-- 
Julien Grall


* Re: [RFC] ARM PCI Passthrough design document
  2017-05-30  7:53     ` Roger Pau Monné
@ 2017-05-30  9:42       ` Julien Grall
  0 siblings, 0 replies; 35+ messages in thread
From: Julien Grall @ 2017-05-30  9:42 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: edgar.iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Manish Jaggi, manish.jaggi, punit.agrawal, vikrams, okaya, Goel,
	Sameer, Andre Przywara, xen-devel, Dave P Martin, Vijaya Kumar K

Hi Roger,

On 30/05/17 08:53, Roger Pau Monné wrote:
> On Mon, May 29, 2017 at 07:14:55PM +0100, Julien Grall wrote:
>> On 05/29/2017 03:30 AM, Manish Jaggi wrote:
>>> On 5/26/2017 10:44 PM, Julien Grall wrote:
> [...]
>>>> ## Discovering and registering PCI devices
>>>>
>>>> The hardware domain will scan the host bridge to find the list of PCI
>>>> devices
>>>> available and then report it to Xen using the existing hypercall
>>>> PHYSDEV_pci_device_add:
>>>>
>>>> #define XEN_PCI_DEV_EXTFN   0x1
>>>> #define XEN_PCI_DEV_VIRTFN  0x2
>>>> #define XEN_PCI_DEV_PXM     0x3
>>>>
>>>> struct physdev_pci_device_add {
>>>>      /* IN */
>>>>      uint16_t    seg;
>>>>      uint8_t     bus;
>>>>      uint8_t     devfn;
>>>>      uint32_t    flags;
>>>>      struct {
>>>>          uint8_t bus;
>>>>          uint8_t devfn;
>>>>      } physfn;
>>>>      /*
>>>>       * Optional parameters array.
>>>>       * First element ([0]) is PXM domain associated with the device (if
>>>>       * XEN_PCI_DEV_PXM is set)
>>>>       */
>>>>      uint32_t optarr[0];
>>>> }
>>> For mapping the MMIO space of the device in Stage2, we need to add
>>> support in Xen / via a map hypercall in linux/drivers/xen/pci.c
>>
>> Mapping MMIO space in stage-2 is not PCI specific and already addressed in
>> Xen 4.9 (see commit 80f9c31 "xen/arm: acpi: Map MMIO on fault in stage-2
>> page table for the hardware domain"). So I don't understand why we should
>> care about that here...
>
> I'm not sure what Manish means, but you should map the BARs of the
> device when adding it to a domain.

This could be done when configuring the BARs. Today for DOM0, we rely 
either on trapping or XENMEM_add_to_physmap.

But I still don't understand why it matters so much for the design 
document. This is really an implementation detail.

> Doing mapping on faults will work
> with CPU accesses, but it's not going to work with SMMU faults, those are
> asynchronous, and I don't think you can guarantee that the CPU is
> always going to access the BARs before doing any DMA transactions to
> them.

Why would you do DMA using BARs? I thought DMA was only to/from memory?

>
> Note that Xen can also scan the bridge by itself and add the devices,
> I'm not sure you need the PHYSDEV_pci_device_add hypercall.

This should work today without any knowledge of PCI in Xen. I am not 
aware of any failures with the approach currently implemented. If you 
think it does not work, then please give a concrete example.

Cheers,

-- 
Julien Grall


* Re: [RFC] ARM PCI Passthrough design document
  2017-05-30  7:40 ` Roger Pau Monné
@ 2017-05-30  9:54   ` Julien Grall
  2017-06-16  0:31     ` Stefano Stabellini
  0 siblings, 1 reply; 35+ messages in thread
From: Julien Grall @ 2017-05-30  9:54 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: edgar.iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Andre Przywara, manish.jaggi, punit.agrawal, vikrams, okaya,
	Goel, Sameer, xen-devel, Dave P Martin, Vijaya Kumar K

Hi Roger,

On 30/05/17 08:40, Roger Pau Monné wrote:
> On Fri, May 26, 2017 at 06:14:09PM +0100, Julien Grall wrote:
> [...]
>> ## Who is in charge of the host bridge?
>>
>> There are numerous implementation of host bridges which exist on ARM. A part of
>> them requires a specific driver as they cannot be driven by a generic host bridge
>> driver. Porting those drivers may be complex due to dependencies on other
>> components.
>>
>> This would be seen as signal to leave the host bridge drivers in the hardware
>> domain. Because Xen would need to access the configuration space, all the access
>> would have to be forwarded to hardware domain which in turn will access the
>> hardware.
>
> IMHO this is much more complicated that what seems from the paragraph
> above. There is currently no way for Xen to forward PCI config space
> accesses to any other entity. The closer Xen has to this would be
> IOREQ servers possibly, but then you have to take into account that
> in order to forward PCI config spaces to Dom0 you *might* have to
> schedule the Dom0 (ie: context switch to it), perform the access and
> then context switch back to Xen and get the value. I don't think the
> PCI code is prepared for such asynchronous accesses at all.

I don't see any issue with scheduling DOM0... it is configuration space 
access, not BAR access. It does not matter if it is slow. What matters 
here is to be able to use the host bridges and do PCI passthrough with Xen.

Also, the PCI code is currently x86 specific and not prepared for ARM. 
That does not mean we should not get the code in shape to support ARM ;).

>
>> In this design document, we are considering that the host bridge driver can
>> be ported in Xen. In the case it is not possible, a interface to forward
>> configuration space access would need to be defined. The interface details
>> is out of scope.
>
> I think that you have to state that the driver is ported to Xen or the
> bridge will not be supported. I don't think it's feasible to forward
> PCI config space access from Xen to Dom0 at all.

Rather than arguing that the code is not ready for that, I would have 
appreciated if you gave technical details on why it is not feasible.

I have already given insights quite a few times on why it might be difficult 
to port a host bridge to Xen:
	- How do you configure the clocks? What if they are shared?
	- How about host bridges using indirect access (e.g cf8-like)? What do 
you expose to DOM0?
	- ....

Such host bridges will end up pulling a lot of code into Xen and require 
more design work than finding a way to forward configuration space accesses 
out of Xen. Those boards exist and people are looking at using Xen + PCI 
passthrough. So saying they are not supported is not the right solution 
here.

Anyway, I mentioned it in the design document to open a discussion; it is 
not something I am going to focus on for a first version of PCI pass-through.

Cheers,

-- 
Julien Grall


* Re: [RFC] ARM PCI Passthrough design document
  2017-05-26 17:14 [RFC] ARM PCI Passthrough design document Julien Grall
  2017-05-29  2:30 ` Manish Jaggi
  2017-05-30  7:40 ` Roger Pau Monné
@ 2017-06-16  0:23 ` Stefano Stabellini
  2017-06-20  0:19 ` Vikram Sethi
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 35+ messages in thread
From: Stefano Stabellini @ 2017-06-16  0:23 UTC (permalink / raw)
  To: Julien Grall
  Cc: edgar.iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Andre Przywara, manish.jaggi, punit.agrawal, vikrams, okaya,
	Goel, Sameer, xen-devel, Dave P Martin, Vijaya Kumar K,
	roger.pau

On Fri, 26 May 2017, Julien Grall wrote:
> Hi all,
> 
> The document below is an RFC version of a design proposal for PCI
> Passthrough in Xen on ARM. It aims to describe from an high level perspective
> the interaction with the different subsystems and how guest will be able
> to discover and access PCI.
> 
> Currently on ARM, Xen does not have any knowledge about PCI devices. This
> means that IOMMU and interrupt controller (such as ITS) requiring specific
> configuration will not work with PCI even with DOM0.
> 
> The PCI Passthrough work could be divided in 2 phases:
>         * Phase 1: Register all PCI devices in Xen => will allow
>                    to use ITS and SMMU with PCI in Xen
>         * Phase 2: Assign devices to guests
> 
> This document aims to describe the 2 phases, but for now only phase
> 1 is fully described.
> 
> 
> I think I was able to gather all of the feedbacks and come up with a solution
> that will satisfy all the parties. The design document has changed quite a lot
> compare to the early draft sent few months ago. The major changes are:
> 	* Provide more details how PCI works on ARM and the interactions with
> 	MSI controller and IOMMU
> 	* Provide details on the existing host bridge implementations
> 	* Give more explanation and justifications on the approach chosen 
> 	* Describing the hypercalls used and how they should be called
> 
> Feedbacks are welcomed.
> 
> Cheers,

Hi Julien,

I think this document is a very good first step in the right direction
and I fully agree with the approaches taken here.

I noticed a couple of grammar errors that I pointed out below.


> --------------------------------------------------------------------------------
> 
> % PCI pass-through support on ARM
> % Julien Grall <julien.grall@linaro.org>
> % Draft B
> 
> # Preface
> 
> This document aims to describe the components required to enable the PCI
> pass-through on ARM.
> 
> This is an early draft and some questions are still unanswered. When this is
> the case, the text will contain XXX.
> 
> # Introduction
> 
> PCI pass-through allows the guest to receive full control of physical PCI
> devices. This means the guest will have full and direct access to the PCI
> device.
> 
> ARM is supporting a kind of guest that exploits as much as possible
> virtualization support in hardware. The guest will rely on PV driver only
> for IO (e.g block, network) and interrupts will come through the virtualized
> interrupt controller, therefore there are no big changes required within the
> kernel.
> 
> As a consequence, it would be possible to replace PV drivers by assigning real
> devices to the guest for I/O access. Xen on ARM would therefore be able to
> run unmodified operating system.
> 
> To achieve this goal, it looks more sensible to go towards emulating the
> host bridge (there will be more details later). A guest would be able to take
> advantage of the firmware tables, obviating the need for a specific driver
> for Xen.
> 
> Thus, in this document we follow the emulated host bridge approach.
> 
> # PCI terminologies
> 
> Each PCI device under a host bridge is uniquely identified by its Requester ID
> (AKA RID). A Requester ID is a triplet of Bus number, Device number, and
> Function.
> 
> When the platform has multiple host bridges, the software can add a fourth
> number called Segment (sometimes called Domain) to differentiate host bridges.
> A PCI device will then uniquely by segment:bus:device:function (AKA SBDF).
> 
> So given a specific SBDF, it would be possible to find the host bridge and the
> RID associated to a PCI device. The pair (host bridge, RID) will often be used
> to find the relevant information for configuring the different subsystems (e.g
> IOMMU, MSI controller). For convenience, the rest of the document will use
> SBDF to refer to the pair (host bridge, RID).
> 
> # PCI host bridge
> 
> PCI host bridge enables data transfer between a host processor and PCI bus
> based devices. The bridge is used to access the configuration space of each
> PCI devices and, on some platform may also act as an MSI controller.
> 
> ## Initialization of the PCI host bridge
> 
> Whilst it would be expected that the bootloader takes care of initializing
> the PCI host bridge, on some platforms it is done in the Operating System.
> 
> This may include enabling/configuring the clocks that could be shared among
> multiple devices.
> 
> ## Accessing PCI configuration space
> 
> Accessing the PCI configuration space can be divided in 2 category:
>     * Indirect access, where the configuration spaces are multiplexed. An
>     example would be legacy method on x86 (e.g 0xcf8 and 0xcfc). On ARM a
>     similar method is used by PCIe RCar root complex (see [12]).
>     * ECAM access, each configuration space will have its own address space.
> 
> Whilst ECAM is a standard, some PCI host bridges will require specific fiddling
> when access the registers (see thunder-ecam [13]).
> 
> In most of the cases, accessing all the PCI configuration spaces under a
> given PCI host will be done the same way (i.e either indirect access or ECAM
> access). However, there are a few cases, dependent on the PCI devices accessed,
> which will use different methods (see thunder-pem [14]).
> 
> ## Generic host bridge
> 
> For the purpose of this document, the term "generic host bridge" will be used
> to describe any host bridge ECAM-compliant and the initialization, if required,
> will be already done by the firmware/bootloader.
> 
> # Interaction of the PCI subsystem with other subsystems
> 
> In order to have a PCI device fully working, Xen will need to configure
> other subsystems such as the IOMMU and the Interrupt Controller.
> 
> The interaction expected between the PCI subsystem and the other subsystems is:
>     * Add a device
>     * Remove a device
>     * Assign a device to a guest
>     * Deassign a device from a guest
> 
> XXX: Detail the interaction when assigning/deassigning device
> 
> In the following subsections, the interactions will be briefly described from a
> higher level perspective. However, implementation details such as callback,
> structure, etc... are beyond the scope of this document.
> 
> ## IOMMU
> 
> The IOMMU will be used to isolate the PCI device when accessing the memory (e.g
> DMA and MSI Doorbells). Often the IOMMU will be configured using a MasterID
> (aka StreamID for ARM SMMU)  that can be deduced from the SBDF with the help
> of the firmware tables (see below).
> 
> Whilst in theory, all the memory transactions issued by a PCI device should
> go through the IOMMU, on certain platforms some of the memory transaction may
> not reach the IOMMU because they are interpreted by the host bridge. For
> instance, this could happen if the MSI doorbell is built into the PCI host
> bridge or for P2P traffic. See [6] for more details.
> 
> XXX: I think this could be solved by using direct mapping (e.g GFN == MFN),
> this would mean the guest memory layout would be similar to the host one when
> PCI devices will be pass-throughed => Detail it.
> 
> ## Interrupt controller
> 
> PCI supports three kind of interrupts: legacy interrupt, MSI and MSI-X. On ARM,
> legacy interrupts will be mapped to SPIs. MSI and MSI-X will write their
> payload in a doorbell belonging to a MSI controller.
> 
> ### Existing MSI controllers
> 
> In this section some of the existing controllers and their interaction with
> the devices will be briefly described. More details can be found in the
> respective specifications of each MSI controller.
> 
> MSIs can be distinguished by some combination of
>     * the Doorbell
>         It is the MMIO address written to. Devices may be configured by
>         software to write to arbitrary doorbells which they can address.
>         An MSI controller may feature a number of doorbells.
>     * the Payload
>         Devices may be configured to write an arbitrary payload chosen by
>         software. MSI controllers may have restrictions on permitted payload.
>         Xen will have to sanitize the payload unless it is known to be always
>         safe.
>     * Sideband information accompanying the write
>         Typically this is neither configurable nor probeable, and depends on
>         the path taken through the memory system (i.e it is a property of the
>         combination of MSI controller and device rather than a property of
>         either in isolation).
> 
> ### GICv3/GICv4 ITS
> 
> The Interrupt Translation Service (ITS) is a MSI controller designed by ARM
> and integrated in the GICv3/GICv4 interrupt controller. For the specification
> see [GICV3]. Each MSI/MSI-X will be mapped to a new type of interrupt called
> LPI. This interrupt will be configured by the software using a pair (DeviceID,
> EventID).
> 
> A platform may have multiple ITS block (e.g one per NUMA node), each of them
> belong to an ITS group.
> 
> The DeviceID is a unique identifier with an ITS group for each MSI-capable
> device that can be deduced from the RID with the help of the firmware tables
> (see below).
> 
> The EventID is a unique identifier to distinguish different event sending
> by a device.
> 
> The MSI payload will only contain the EventID as the DeviceID will be added
> afterwards by the hardware in a way that will prevent any tampering.
> 
> The [SBSA] appendix I describes the set of rules for the integration of the
                      ^ redundant I


> ITS that any compliant platform should follow. Some of the rules will explain
> the security implication of a misbehaving devices. It ensures that a guest
> will never be able to trigger an MSI on behalf of another guest.
> 
> XXX: The security implication is described in the [SBSA] but I haven't found
> any similar working in the GICv3 specification. It is unclear to me if
> non-SBSA compliant platform (e.g embedded) will follow those rules.
> 
> ### GICv2m
> 
> The GICv2m is an extension of the GICv2 to convert MSI/MSI-X writes to unique
> interrupts. The specification can be found in the [SBSA] appendix E.
> 
> Depending on the platform, the GICv2m will provide one or multiple instance
> of register frames. Each frame is composed of a doorbell and associated to
> a set of SPIs that can be discovered by reading the register MSI_TYPER.
> 
> On an MSI write, the payload will contain the SPI ID to generate. Note that
> on some platform the MSI payload may contain an offset form the base SPI
> rather than the SPI itself.
> 
> The frame will only generate SPI if the written value corresponds to an SPI
> allocated to the frame. Each VM should have exclusity to the frame to ensure
                                               ^ exclusive access ?


> isolation and prevent a guest OS to trigger an MSI on-behalf of another guest
> OS.
> 
> XXX: Linux seems to consider GICv2m as unsafe by default. From my understanding,
> it is still unclear how we should proceed on Xen, as GICv2m should be safe
> as long as the frame is only accessed by one guest.

It seems to me that you are right


> ### Other MSI controllers
> 
> Servers compliant with SBSA level 1 and higher will have to use either ITS
> or GICv2m. However, it is by no means the only MSI controllers available.
> The hardware vendor may decide to use their custom MSI controller which can be
> integrated in the PCI host bridge.
> 
> Whether it will be possible to write securely an MSI will depend on the
> MSI controller implementations.
> 
> XXX: I am happy to give a brief explanation on more MSI controller (such
> as Xilinx and Renesas) if people think it is necessary.
> 
> This design document does not pertain to a specific MSI controller and will try
> to be as agnostic is possible. When possible, it will give insight how to
> integrate the MSI controller.
> 
> # Information available in the firmware tables
> 
> ## ACPI
> 
> ### Host bridges
> 
> The static table MCFG (see 4.2 in [1]) will describe the host bridges available
> at boot and supporting ECAM. Unfortunately, there are platforms out there
> (see [2]) that re-use MCFG to describe host bridge that are not fully ECAM
> compatible.
> 
> This means that Xen needs to account for possible quirks in the host bridge.
> The Linux community are working on a patch series for this, see [2] and [3],
> where quirks will be detected with:
>     * OEM ID
>     * OEM Table ID
>     * OEM Revision
>     * PCI Segment
>     * PCI bus number range (wildcard allowed)
> 
> Based on what Linux is currently doing, there are two kind of quirks:
>     * Accesses to the configuration space of certain sizes are not allowed
>     * A specific driver is necessary for driving the host bridge
> 
> The former is straightforward to solve but the latter will require more thought.
> Instantiation of a specific driver for the host controller can be easily done
> if Xen has the information to detect it. However, those drivers may require
> resources described in ASL (see [4] for instance).
> 
> The number of platforms requiring specific PCI host bridge driver is currently
> limited. Whilst it is not possible to predict the future, it will be expected
> upcoming platform to have fully ECAM compliant PCI host bridges. Therefore,
> given Xen does not have any ASL parser, the approach suggested is to hardcode
> the missing values. This could be revisit in the future if necessary.
> 
> ### Finding information to configure IOMMU and MSI controller
> 
> The static table [IORT] will provide information that will help to deduce
> data (such as MasterID and DeviceID) to configure both the IOMMU and the MSI
> controller from a given SBDF.
> 
> ## Finding which NUMA node a PCI device belongs to
> 
> On NUMA system, the NUMA node associated to a PCI device can be found using
> the _PXM method of the host bridge (?).
> 
> XXX: I am not entirely sure where the _PXM will be (i.e host bridge vs PCI
> device).
> 
> ## Device Tree
> 
> ### Host bridges
> 
> Each Device Tree node associated to a host bridge will have at least the
> following properties (see bindings in [8]):
>     - device_type: will always be "pci".
>     - compatible: a string indicating which driver to instanciate
> 
> The node may also contain optional properties such as:
>     - linux,pci-domain: assign a fix segment number
>     - bus-range: indicate the range of bus numbers supported
> 
> When the property linux,pci-domain is not present, the operating system would
> have to allocate the segment number for each host bridges.
> 
> ### Finding information to configure IOMMU and MSI controller
> 
> ### Configuring the IOMMU
> 
> The Device Treee provides a generic IOMMU bindings (see [10]) which uses the
> properties "iommu-map" and "iommu-map-mask" to described the relationship
> between RID and a MasterID.
> 
> These properties will be present in the host bridge Device Tree node. From a
> given SBDF, it will be possible to find the corresponding MasterID.
> 
> Note that the ARM SMMU also have a legacy binding (see [9]), but it does not
> have a way to describe the relationship between RID and StreamID. Instead it
> assumed that StreamID == RID. This binding has now been deprecated in favor
> of the generic IOMMU binding.
> 
> ### Configuring the MSI controller
> 
> The relationship between the RID and data required to configure the MSI
> controller (such as DeviceID) can be found using the property "msi-map"
> (see [11]).
> 
> This property will be present in the host bridge Device Tree node. From a
> given SBDF, it will be possible to find the corresponding MasterID.
> 
> ## Finding which NUMA node a PCI device belongs to
> 
> On NUMA system, the NUMA node associated to a PCI device can be found using
> the property "numa-node-id" (see [15]) presents in the host bridge Device Tree
> node.
> 
> # Discovering PCI devices
> 
> Whilst PCI devices are currently available in the hardware domain, the
> hypervisor does not have any knowledge of them. The first step of supporting
> PCI pass-through is to make Xen aware of the PCI devices.
> 
> Xen will require access to the PCI configuration space to retrieve information
> for the PCI devices or access it on behalf of the guest via the emulated
> host bridge.
> 
> This means that Xen should be in charge of controlling the host bridge. However,
> for some host controller, this may be difficult to implement in Xen because of
> depencencies on other components (e.g clocks, see more details in "PCI host
> bridge" section).
> 
> For this reason, the approach chosen in this document is to let the hardware
> domain to discover the host bridges, scan the PCI devices and then report
> everything to Xen. This does not rule out the possibility of doing everything
> without the help of the hardware domain in the future.
> 
> ## Who is in charge of the host bridge?
> 
> There are numerous implementation of host bridges which exist on ARM. A part of
> them requires a specific driver as they cannot be driven by a generic host bridge
> driver. Porting those drivers may be complex due to dependencies on other
> components.
> 
> This would be seen as signal to leave the host bridge drivers in the hardware
> domain. Because Xen would need to access the configuration space, all the access
> would have to be forwarded to hardware domain which in turn will access the
> hardware.
> 
> In this design document, we are considering that the host bridge driver can
> be ported in Xen. In the case it is not possible, a interface to forward
> configuration space access would need to be defined. The interface details
> is out of scope.
> 
> ## Discovering and registering host bridge
> 
> The approach taken in the document will require communication between Xen and
> the hardware domain. In this case, they would need to agree on the segment
> number associated to an host bridge. However, this number is not available in
> the Device Tree case.
> 
> The hardware domain will register new host bridges using the existing hypercall
> PHYSDEV_mmcfg_reserved:
> 
> #define XEN_PCI_MMCFG_RESERVED 1
> 
> struct physdev_pci_mmcfg_reserved {
>     /* IN */
>     uint64_t    address;
>     uint16_t    segment;
>     /* Range of bus supported by the host bridge */
>     uint8_t     start_bus;
>     uint8_t     end_bus;
> 
>     uint32_t    flags;
> }
> 
> Some of the host bridges may not have a separate configuration address space
> region described in the firmware tables. To simplify the registration, the
> field 'address' should contains the base address of one of the region
> described in the firmware tables.
>     * For ACPI, it would be the base address specified in the MCFG or in the
>     _CBA method.
>     * For Device Tree, this would be any base address of region
>     specified in the "reg" property.
> 
> The field 'flags' is expected to have XEN_PCI_MMCFG_RESERVED set.
> 
> It is expected that this hypercall is called before any PCI devices is
> registered to Xen.
> 
> When the hardware domain is in charge of the host bridge, this hypercall will
> be used to tell Xen the existence of an host bridge in order to find the
> associated information for configuring the MSI controller and the IOMMU.
> 
> ## Discovering and registering PCI devices
> 
> The hardware domain will scan the host bridge to find the list of PCI devices
> available and then report it to Xen using the existing hypercall
> PHYSDEV_pci_device_add:
> 
> #define XEN_PCI_DEV_EXTFN   0x1
> #define XEN_PCI_DEV_VIRTFN  0x2
> #define XEN_PCI_DEV_PXM     0x3
> 
> struct physdev_pci_device_add {
>     /* IN */
>     uint16_t    seg;
>     uint8_t     bus;
>     uint8_t     devfn;
>     uint32_t    flags;
>     struct {
>         uint8_t bus;
>         uint8_t devfn;
>     } physfn;
>     /*
>      * Optional parameters array.
>      * First element ([0]) is PXM domain associated with the device (if
>      * XEN_PCI_DEV_PXM is set)
>      */
>     uint32_t optarr[0];
> }
> 
> When XEN_PCI_DEV_PXM is set in the field 'flag', optarr[0] will contain the
> NUMA node ID associated with the device:
>     * For ACPI, it would be the value returned by the method _PXM
>     * For Device Tree, this would the value found in the property "numa-node-id".
> For more details see the section "Finding which NUMA node a PCI device belongs
> to" in "ACPI" and "Device Tree".
> 
> XXX: I still don't fully understand how XEN_PCI_DEV_EXTFN and XEN_PCI_DEV_VIRTFN
> wil work. AFAICT, the former is used with the bus support ARI and the only usage
> is in the x86 IOMMU code. For the latter, this is related to IOV but I am not
> sure what devfn and physfn.devfn will correspond too.
> 
> Note that x86 currently provides two more hypercalls (PHYSDEVOP_manage_pci_add
> and PHYSDEVOP_manage_pci_add_ext) to register PCI devices. However they are
> subset of the hypercall PHYSDEVOP_pci_device_add. Therefore, it is suggested
> to leave them unimplemented on ARM.
> 
> ## Removing PCI devices
> 
> The hardware domain will be in charge Xen a device has been removed using
> the existing hypercall PHYSDEV_pci_device_remove:
> 
> struct physdev_pci_device {
>     /* IN */
>     uint16_t    seg;
>     uint8_t     bus;
>     uint8_t     devfn;
> }
> 
> Note that x86 currently provide one more hypercall (PHYSDEVOP_manage_pci_remove)
> to remove PCI devices. However it does not allow to pass a segment number.
> Therefore it is suggested to leave unimplemented on ARM.
> 
> # Glossary
> 
> ECAM: Enhanced Configuration Mechanism
> SBDF: Segment Bus Device Function. The segment is a software concept.
> MSI: Message Signaled Interrupt
> MSI doorbell: MMIO address written to by a device to generate an MSI
> SPI: Shared Peripheral Interrupt
> LPI: Locality-specific Peripheral Interrupt
> ITS: Interrupt Translation Service
> 
> # Specifications
> [SBSA]  ARM-DEN-0029 v3.0
> [GICV3] IHI0069C
> [IORT]  DEN0049B
> 
> # Bibliography
> 
> [1] PCI firmware specification, rev 3.2
> [2] https://www.spinics.net/lists/linux-pci/msg56715.html
> [3] https://www.spinics.net/lists/linux-pci/msg56723.html
> [4] https://www.spinics.net/lists/linux-pci/msg56728.html
> [6] https://www.spinics.net/lists/kvm/msg140116.html
> [7] http://www.firmware.org/1275/bindings/pci/pci2_1.pdf
> [8] Documents/devicetree/bindings/pci
> [9] Documents/devicetree/bindings/iommu/arm,smmu.txt
> [10] Document/devicetree/bindings/pci/pci-iommu.txt
> [11] Documents/devicetree/bindings/pci/pci-msi.txt
> [12] drivers/pci/host/pcie-rcar.c
> [13] drivers/pci/host/pci-thunder-ecam.c
> [14] drivers/pci/host/pci-thunder-pem.c
> [15] Documents/devicetree/bindings/numa.txt
> 


* Re: [RFC] ARM PCI Passthrough design document
  2017-05-30  9:54   ` Julien Grall
@ 2017-06-16  0:31     ` Stefano Stabellini
  0 siblings, 0 replies; 35+ messages in thread
From: Stefano Stabellini @ 2017-06-16  0:31 UTC (permalink / raw)
  To: Julien Grall
  Cc: edgar.iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Andre Przywara, manish.jaggi, punit.agrawal, vikrams, okaya,
	Goel, Sameer, xen-devel, Dave P Martin, Vijaya Kumar K,
	Roger Pau Monné

On Tue, 30 May 2017, Julien Grall wrote:
> > > In this design document, we are considering that the host bridge driver
> > > can
> > > be ported in Xen. In the case it is not possible, a interface to forward
> > > configuration space access would need to be defined. The interface details
> > > is out of scope.
> > 
> > I think that you have to state that the driver is ported to Xen or the
> > bridge will not be supported. I don't think it's feasible to forward
> > PCI config space access from Xen to Dom0 at all.

Easy to say, but in practice there might be boards that we want to
support which require complex configurations.

Obviously having to send PCI config space read/write requests from Xen
to Dom0 is ugly and slow and doesn't match the Xen architecture, but it
might be the only solution in these cases. This is ARM: the ecosystem
has a lot more variety compared to x86, so a single approach might simply
not be possible.

Another ugly (and fragile) idea to solve this problem would be to
initialize those difficult PCI host bridges in Dom0, then cede control
of them from Dom0 to Xen: I expect that once they are initialized, Xen
might be able to drive them more easily, without getting entangled with
clocks and regulators. 


> Rather than arguing on the code is not ready for that. I would have
> appreciated if you gave technical details on why it is not feasible.
> 
> I already gave quite a few times insights on why it might be difficult to port
> an host bridges in Xen.
> 	- How do you configure the clock? What if they are shared?
> 	- How about host bridges using indirect access (e.g cf8 like)? What
> you expose to DOM0?
> 	- ....
> 
> Such host bridges will end up to pull a lot of code in Xen and require more
> design than finding about a way to forward configuration space in Xen. Those
> boards exists and people are looking at using Xen + PCI passthrough. So saying
> they are not supported is not the right solution here.

I agree


> Anyway, I mentioned it in the design document to open a discussion and not
> something I am going to focus for a first version of PCI pass-through.

Indeed: we'll cross that bridge when we get to it.


* Re: [RFC] ARM PCI Passthrough design document
  2017-05-26 17:14 [RFC] ARM PCI Passthrough design document Julien Grall
                   ` (2 preceding siblings ...)
  2017-06-16  0:23 ` Stefano Stabellini
@ 2017-06-20  0:19 ` Vikram Sethi
  2017-06-28 15:22   ` Julien Grall
  2017-07-19 14:41 ` Notes from PCI Passthrough design discussion at Xen Summit Punit Agrawal
  2018-01-22 11:10 ` [RFC] ARM PCI Passthrough design document Manish Jaggi
  5 siblings, 1 reply; 35+ messages in thread
From: Vikram Sethi @ 2017-06-20  0:19 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: edgar.iglesias, Sinan Kaya, Wei Chen, Steve Capper,
	Andre Przywara, manish.jaggi, punit.agrawal, Sameer Goel,
	xen-devel, Dave P Martin, Vijaya Kumar K, roger.pau

Hi Julien, 
Thanks for posting this. I think some additional topics need to be covered in the design document, in 3 main areas:

Hotplug: how will Xen support hotplug? Many root ports may require firmware hooks such as ACPI ASL to take care of platform-specific MMIO initialization on hotplug. Normally firmware (UEFI) would have done that platform-specific setup at boot.

AER: Will PCIe non-fatal and fatal errors (secondary bus reset for fatal) be recoverable in Xen?
Will drivers in the domains be notified about fatal errors so they can be quiesced before doing a secondary bus reset in Xen?
Will Xen support Firmware First error handling for AER? i.e. when the platform does Firmware First error handling and/or filtering of AER and sends the associated ACPI HEST logs to Xen.
How will AER notifications and logs be propagated to the domains: injected ACPI HEST?

PCIe DPC (Downstream Port Containment): will it be supported in Xen, with Xen registering for the DPC interrupt? When Xen brings the link back up, will it send a simulated hotplug event to dom0 to show the link is back up?

Thanks,
Vikram

-----Original Message-----
From: Julien Grall [mailto:julien.grall@linaro.org] 
Sent: Friday, May 26, 2017 12:14 PM
To: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@linaro.org>; xen-devel <xen-devel@lists.xenproject.org>; edgar.iglesias@xilinx.com; Steve Capper <Steve.Capper@arm.com>; punit.agrawal@arm.com; Wei Chen <Wei.Chen@arm.com>; Dave P Martin <Dave.Martin@arm.com>; Sameer Goel <sgoel@qti.qualcomm.com>; Sinan Kaya <okaya@qti.qualcomm.com>; Vikram Sethi <vikrams@qti.qualcomm.com>; roger.pau@citrix.com; manish.jaggi@caviumnetworks.com; Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>; Andre Przywara <andre.przywara@arm.com>
Subject: [RFC] ARM PCI Passthrough design document

Hi all,

The document below is an RFC version of a design proposal for PCI Passthrough in Xen on ARM. It aims to describe from an high level perspective the interaction with the different subsystems and how guest will be able to discover and access PCI.

Currently on ARM, Xen does not have any knowledge about PCI devices. This means that IOMMU and interrupt controller (such as ITS) requiring specific configuration will not work with PCI even with DOM0.

The PCI Passthrough work could be divided in 2 phases:
        * Phase 1: Register all PCI devices in Xen => will allow
                   to use ITS and SMMU with PCI in Xen
        * Phase 2: Assign devices to guests

This document aims to describe the 2 phases, but for now only phase
1 is fully described.


I think I was able to gather all of the feedbacks and come up with a solution that will satisfy all the parties. The design document has changed quite a lot compare to the early draft sent few months ago. The major changes are:
	* Provide more details how PCI works on ARM and the interactions with
	MSI controller and IOMMU
	* Provide details on the existing host bridge implementations
	* Give more explanation and justifications on the approach chosen 
	* Describing the hypercalls used and how they should be called

Feedbacks are welcomed.

Cheers,

--------------------------------------------------------------------------------

% PCI pass-through support on ARM
% Julien Grall <julien.grall@linaro.org> % Draft B

# Preface

This document aims to describe the components required to enable the PCI pass-through on ARM.

This is an early draft and some questions are still unanswered. When this is the case, the text will contain XXX.

# Introduction

PCI pass-through allows the guest to receive full control of physical PCI devices. This means the guest will have full and direct access to the PCI device.

ARM is supporting a kind of guest that exploits as much as possible virtualization support in hardware. The guest will rely on PV driver only for IO (e.g block, network) and interrupts will come through the virtualized interrupt controller, therefore there are no big changes required within the kernel.

As a consequence, it would be possible to replace PV drivers by assigning real devices to the guest for I/O access. Xen on ARM would therefore be able to run unmodified operating system.

To achieve this goal, it looks more sensible to go towards emulating the host bridge (there will be more details later). A guest would be able to take advantage of the firmware tables, obviating the need for a specific driver for Xen.

Thus, in this document we follow the emulated host bridge approach.

# PCI terminologies

Each PCI device under a host bridge is uniquely identified by its Requester ID (AKA RID). A Requester ID is a triplet of Bus number, Device number, and Function.

When the platform has multiple host bridges, the software can add a fourth number called Segment (sometimes called Domain) to differentiate host bridges.
A PCI device will then uniquely by segment:bus:device:function (AKA SBDF).

So given a specific SBDF, it would be possible to find the host bridge and the RID associated to a PCI device. The pair (host bridge, RID) will often be used to find the relevant information for configuring the different subsystems (e.g IOMMU, MSI controller). For convenience, the rest of the document will use SBDF to refer to the pair (host bridge, RID).

# PCI host bridge

PCI host bridge enables data transfer between a host processor and PCI bus based devices. The bridge is used to access the configuration space of each PCI devices and, on some platform may also act as an MSI controller.

## Initialization of the PCI host bridge

Whilst it would be expected that the bootloader takes care of initializing the PCI host bridge, on some platforms it is done in the Operating System.

This may include enabling/configuring the clocks that could be shared among multiple devices.

## Accessing PCI configuration space

Accessing the PCI configuration space can be divided in 2 category:
    * Indirect access, where the configuration spaces are multiplexed. An
    example would be legacy method on x86 (e.g 0xcf8 and 0xcfc). On ARM a
    similar method is used by PCIe RCar root complex (see [12]).
    * ECAM access, each configuration space will have its own address space.

Whilst ECAM is a standard, some PCI host bridges will require specific fiddling when access the registers (see thunder-ecam [13]).

In most of the cases, accessing all the PCI configuration spaces under a given PCI host will be done the same way (i.e either indirect access or ECAM access). However, there are a few cases, dependent on the PCI devices accessed, which will use different methods (see thunder-pem [14]).

## Generic host bridge

For the purpose of this document, the term "generic host bridge" will be used to describe any host bridge ECAM-compliant and the initialization, if required, will be already done by the firmware/bootloader.

# Interaction of the PCI subsystem with other subsystems

In order to have a PCI device fully working, Xen will need to configure other subsystems such as the IOMMU and the Interrupt Controller.

The interaction expected between the PCI subsystem and the other subsystems is:
    * Add a device
    * Remove a device
    * Assign a device to a guest
    * Deassign a device from a guest

XXX: Detail the interaction when assigning/deassigning device

In the following subsections, the interactions will be briefly described from a higher level perspective. However, implementation details such as callback, structure, etc... are beyond the scope of this document.

## IOMMU

The IOMMU will be used to isolate the PCI device when accessing the memory (e.g DMA and MSI Doorbells). Often the IOMMU will be configured using a MasterID (aka StreamID for ARM SMMU)  that can be deduced from the SBDF with the help of the firmware tables (see below).

Whilst in theory all the memory transactions issued by a PCI device should go through the IOMMU, on certain platforms some of the memory transactions may not reach the IOMMU because they are interpreted by the host bridge. For instance, this could happen if the MSI doorbell is built into the PCI host bridge or for P2P traffic. See [6] for more details.

XXX: I think this could be solved by using direct mapping (e.g. GFN == MFN); this would mean the guest memory layout would be similar to the host one when PCI devices are passed through => Detail it.

## Interrupt controller

PCI supports three kinds of interrupts: legacy interrupts, MSI and MSI-X. On ARM, legacy interrupts will be mapped to SPIs. MSI and MSI-X will write their payload to a doorbell belonging to an MSI controller.

### Existing MSI controllers

In this section some of the existing controllers and their interaction with the devices will be briefly described. More details can be found in the respective specifications of each MSI controller.

MSIs can be distinguished by some combination of the following (summarized in the sketch after this list):
    * the Doorbell
        It is the MMIO address written to. Devices may be configured by
        software to write to arbitrary doorbells which they can address.
        An MSI controller may feature a number of doorbells.
    * the Payload
        Devices may be configured to write an arbitrary payload chosen by
        software. MSI controllers may have restrictions on permitted payload.
        Xen will have to sanitize the payload unless it is known to be always
        safe.
    * Sideband information accompanying the write
        Typically this is neither configurable nor probeable, and depends on
        the path taken through the memory system (i.e it is a property of the
        combination of MSI controller and device rather than a property of
        either in isolation).
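
The sketch below summarizes these three pieces of information as a single tuple. It is purely illustrative (not an existing Xen structure), but it is essentially what Xen would need to know about an MSI to decide whether a given write is safe:

/* Illustrative only: the tuple identifying an MSI from Xen's point of view. */
struct msi_route {
    uint64_t doorbell;   /* MMIO address the device is programmed to write */
    uint32_t payload;    /* data written; may need to be sanitized by Xen  */
    uint32_t sideband;   /* e.g. RID/DeviceID added by the interconnect,   */
                         /* not under software control                     */
};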

### GICv3/GICv4 ITS

The Interrupt Translation Service (ITS) is an MSI controller designed by ARM and integrated in the GICv3/GICv4 interrupt controller. For the specification see [GICV3]. Each MSI/MSI-X will be mapped to a new type of interrupt called LPI. This interrupt will be configured by the software using a pair (DeviceID, EventID).

A platform may have multiple ITS blocks (e.g. one per NUMA node), each of them belonging to an ITS group.

The DeviceID is a unique identifier within an ITS group for each MSI-capable device; it can be deduced from the RID with the help of the firmware tables (see below).

The EventID is a unique identifier to distinguish the different events sent by a device.

The MSI payload will only contain the EventID as the DeviceID will be added afterwards by the hardware in a way that will prevent any tampering.
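
Conceptually, the translation is a two-level lookup: the DeviceID selects a per-device Interrupt Translation Table (ITT) and the EventID indexes into it to obtain the LPI. The sketch below is only a software model of that behaviour (in hardware the ITS walks its own Device Table and ITTs in memory); the structure and function names are made up for illustration.

/* Simplified model of the ITS translation (DeviceID, EventID) -> LPI. */
struct its_device {
    uint32_t *itt;        /* per-device table: EventID -> LPI number */
    uint32_t  nr_events;  /* number of EventIDs allocated            */
};

static uint32_t its_translate(const struct its_device *device_table,
                              uint32_t deviceid, uint32_t eventid)
{
    const struct its_device *dev = &device_table[deviceid];

    if ( eventid >= dev->nr_events )
        return 0;   /* unmapped event: no LPI is generated */

    return dev->itt[eventid];
}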

The [SBSA] appendix I describes the set of rules for the integration of the ITS that any compliant platform should follow. Some of the rules explain the security implications of misbehaving devices. They ensure that a guest will never be able to trigger an MSI on behalf of another guest.

XXX: The security implication is described in the [SBSA] but I haven't found any similar wording in the GICv3 specification. It is unclear to me if non-SBSA compliant platforms (e.g. embedded) will follow those rules.

### GICv2m

The GICv2m is an extension of the GICv2 to convert MSI/MSI-X writes to unique interrupts. The specification can be found in the [SBSA] appendix E.

Depending on the platform, the GICv2m will provide one or multiple instances of register frames. Each frame is composed of a doorbell and associated with a set of SPIs that can be discovered by reading the register MSI_TYPER.

On an MSI write, the payload will contain the SPI ID to generate. Note that on some platforms the MSI payload may contain an offset from the base SPI rather than the SPI itself.

The frame will only generate an SPI if the written value corresponds to an SPI allocated to the frame. Each VM should have exclusive access to a frame to ensure isolation and prevent a guest OS from triggering an MSI on behalf of another guest OS.
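
As an illustration, the sketch below decodes a frame's MSI_TYPER and checks an incoming payload against the SPIs owned by the frame. The register offsets and field layout follow the GICv2m description in [SBSA] appendix E; the helper itself is illustrative and not existing Xen code. On platforms where the payload is an offset from the base SPI, the check would be against the number of SPIs only.

#define V2M_MSI_TYPER      0x008   /* frame register: base SPI and count */
#define V2M_MSI_SETSPI_NS  0x040   /* doorbell written by the devices    */

/* Sketch: is the MSI payload an SPI actually allocated to this frame? */
static bool v2m_payload_is_valid(uint32_t msi_typer, uint32_t payload)
{
    uint32_t base = (msi_typer >> 16) & 0x3ff;   /* first SPI of the frame */
    uint32_t nr   = msi_typer & 0x3ff;           /* number of SPIs         */

    return payload >= base && payload < base + nr;
}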

XXX: Linux seems to consider GICv2m as unsafe by default. From my understanding, it is still unclear how we should proceed on Xen, as GICv2m should be safe as long as the frame is only accessed by one guest.

### Other MSI controllers

Servers compliant with SBSA level 1 and higher will have to use either the ITS or GICv2m. However, they are by no means the only MSI controllers available.
A hardware vendor may decide to use a custom MSI controller which can be integrated in the PCI host bridge.

Whether it will be possible to write securely an MSI will depend on the MSI controller implementations.

XXX: I am happy to give a brief explanation on more MSI controller (such as Xilinx and Renesas) if people think it is necessary.

This design document does not pertain to a specific MSI controller and will try to be as agnostic as possible. When possible, it will give insight into how to integrate the MSI controller.

# Information available in the firmware tables

## ACPI

### Host bridges

The static table MCFG (see 4.2 in [1]) will describe the host bridges available at boot that support ECAM. Unfortunately, there are platforms out there (see [2]) that re-use MCFG to describe host bridges that are not fully ECAM compatible.

This means that Xen needs to account for possible quirks in the host bridge.
The Linux community is working on a patch series for this (see [2] and [3]), where quirks will be detected with:
    * OEM ID
    * OEM Table ID
    * OEM Revision
    * PCI Segment
    * PCI bus number range (wildcard allowed)

Based on what Linux is currently doing, there are two kinds of quirks:
    * Accesses to the configuration space of certain sizes are not allowed
    * A specific driver is necessary for driving the host bridge

The former is straightforward to solve but the latter will require more thought.
Instantiation of a specific driver for the host controller can be easily done if Xen has the information to detect it. However, those drivers may require resources described in ASL (see [4] for instance).

The number of platforms requiring a specific PCI host bridge driver is currently limited. Whilst it is not possible to predict the future, upcoming platforms are expected to have fully ECAM-compliant PCI host bridges. Therefore, given that Xen does not have any ASL parser, the approach suggested is to hardcode the missing values. This could be revisited in the future if necessary.
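
For illustration, hardcoding such quirks in Xen could look like the sketch below, matching on the same fields used by Linux. The structure and function names are made up for this example.

/* Illustrative quirk descriptor for non-compliant ECAM host bridges. */
struct mcfg_quirk {
    char     oem_id[7];          /* ACPI OEM ID (6 characters)       */
    char     oem_table_id[9];    /* ACPI OEM Table ID (8 characters) */
    uint32_t oem_revision;
    uint16_t segment;
    uint8_t  bus_start, bus_end; /* 0x00-0xff acts as a wildcard     */
};

static bool mcfg_quirk_matches(const struct mcfg_quirk *q,
                               const char *oem_id, const char *oem_table_id,
                               uint32_t oem_revision, uint16_t segment,
                               uint8_t bus)
{
    return !strncmp(q->oem_id, oem_id, 6) &&
           !strncmp(q->oem_table_id, oem_table_id, 8) &&
           q->oem_revision == oem_revision &&
           q->segment == segment &&
           bus >= q->bus_start && bus <= q->bus_end;
}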

### Finding information to configure IOMMU and MSI controller

The static table [IORT] will provide information that will help to deduce data (such as MasterID and DeviceID) to configure both the IOMMU and the MSI controller from a given SBDF.

## Finding which NUMA node a PCI device belongs to

On a NUMA system, the NUMA node associated with a PCI device can be found using the _PXM method of the host bridge (?).

XXX: I am not entirely sure where the _PXM will be (i.e host bridge vs PCI device).

## Device Tree

### Host bridges

Each Device Tree node associated to a host bridge will have at least the following properties (see bindings in [8]):
    - device_type: will always be "pci".
    - compatible: a string indicating which driver to instantiate

The node may also contain optional properties such as:
    - linux,pci-domain: assign a fixed segment number
    - bus-range: indicate the range of bus numbers supported

When the property linux,pci-domain is not present, the operating system will have to allocate a segment number for each host bridge.

### Finding information to configure IOMMU and MSI controller

#### Configuring the IOMMU

The Device Tree provides a generic IOMMU binding (see [10]) which uses the properties "iommu-map" and "iommu-map-mask" to describe the relationship between a RID and a MasterID.

These properties will be present in the host bridge Device Tree node. From a given SBDF, it will be possible to find the corresponding MasterID.

Note that the ARM SMMU also has a legacy binding (see [9]), but it does not have a way to describe the relationship between RID and StreamID. Instead it assumes that StreamID == RID. This binding has now been deprecated in favor of the generic IOMMU binding.

#### Configuring the MSI controller

The relationship between the RID and data required to configure the MSI controller (such as DeviceID) can be found using the property "msi-map"
(see [11]).

This property will be present in the host bridge Device Tree node. From a given SBDF, it will be possible to find the corresponding DeviceID.
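
Both "msi-map" and "iommu-map" use the same translation scheme: the RID is first masked (with "msi-map-mask"/"iommu-map-mask", defaulting to 0xffff) and then matched against a list of (rid-base, target, output-base, length) entries; the output is output-base + (rid - rid-base). Below is a sketch of that lookup, applying equally to deriving a MasterID or a DeviceID (illustrative, not existing Xen code).

/* One entry of an "iommu-map"/"msi-map" property, ignoring the phandle
 * to the target IOMMU/MSI controller for simplicity.
 */
struct rid_map_entry {
    uint32_t rid_base;      /* first RID covered by this entry            */
    uint32_t output_base;   /* MasterID (iommu-map) or DeviceID (msi-map) */
    uint32_t length;        /* number of consecutive RIDs covered         */
};

static int rid_to_output(const struct rid_map_entry *map, unsigned int nr,
                         uint32_t rid, uint32_t map_mask, uint32_t *out)
{
    uint32_t masked = rid & map_mask;
    unsigned int i;

    for ( i = 0; i < nr; i++ )
        if ( masked >= map[i].rid_base &&
             masked < map[i].rid_base + map[i].length )
        {
            *out = map[i].output_base + (masked - map[i].rid_base);
            return 0;
        }

    return -1;   /* no translation: the device cannot be configured */
}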

## Finding which NUMA node a PCI device belongs to

On a NUMA system, the NUMA node associated with a PCI device can be found using the property "numa-node-id" (see [15]) present in the host bridge Device Tree node.

# Discovering PCI devices

Whilst PCI devices are currently available in the hardware domain, the hypervisor does not have any knowledge of them. The first step of supporting PCI pass-through is to make Xen aware of the PCI devices.

Xen will require access to the PCI configuration space to retrieve information for the PCI devices or access it on behalf of the guest via the emulated host bridge.

This means that Xen should be in charge of controlling the host bridge. However, for some host controllers, this may be difficult to implement in Xen because of dependencies on other components (e.g. clocks, see more details in the "PCI host bridge" section).

For this reason, the approach chosen in this document is to let the hardware domain discover the host bridges, scan the PCI devices and then report everything to Xen. This does not rule out the possibility of doing everything without the help of the hardware domain in the future.

## Who is in charge of the host bridge?

Numerous host bridge implementations exist on ARM. Some of them require a specific driver as they cannot be driven by a generic host bridge driver. Porting those drivers may be complex due to dependencies on other components.

This could be seen as a signal to leave the host bridge drivers in the hardware domain. Because Xen would need to access the configuration space, all the accesses would have to be forwarded to the hardware domain, which in turn would access the hardware.

In this design document, we are considering that the host bridge driver can be ported to Xen. In the case it is not possible, an interface to forward configuration space accesses would need to be defined. The interface details are out of scope.

## Discovering and registering host bridge

The approach taken in the document will require communication between Xen and the hardware domain. In this case, they would need to agree on the segment number associated with a host bridge. However, this number is not available in the Device Tree case.

The hardware domain will register new host bridges using the existing hypercall
PHYSDEVOP_pci_mmcfg_reserved:

#define XEN_PCI_MMCFG_RESERVED 1

struct physdev_pci_mmcfg_reserved {
    /* IN */
    uint64_t    address;
    uint16_t    segment;
    /* Range of bus supported by the host bridge */
    uint8_t     start_bus;
    uint8_t     end_bus;

    uint32_t    flags;
};

Some of the host bridges may not have a separate configuration address space region described in the firmware tables. To simplify the registration, the field 'address' should contain the base address of one of the regions described in the firmware tables:
    * For ACPI, it would be the base address specified in the MCFG or in the
    _CBA method.
    * For Device Tree, this would be any base address of region
    specified in the "reg" property.

The field 'flags' is expected to have XEN_PCI_MMCFG_RESERVED set.

It is expected that this hypercall is called before any PCI device is registered to Xen.

When the hardware domain is in charge of the host bridge, this hypercall will be used to tell Xen about the existence of a host bridge in order to find the associated information for configuring the MSI controller and the IOMMU.
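
As an illustration, registering a host bridge from the hardware domain could look like the sketch below. It is modelled on how Linux already issues physdev hypercalls on x86 (HYPERVISOR_physdev_op() being the usual guest-side wrapper); whether the ARM hardware domain reuses this exact plumbing is an assumption of the example.

/* Sketch: hardware domain registering one host bridge with Xen. */
int register_host_bridge(uint64_t cfg_base, uint16_t segment,
                         uint8_t start_bus, uint8_t end_bus)
{
    struct physdev_pci_mmcfg_reserved r = {
        .address   = cfg_base,   /* a base address from MCFG/_CBA or "reg" */
        .segment   = segment,
        .start_bus = start_bus,
        .end_bus   = end_bus,
        .flags     = XEN_PCI_MMCFG_RESERVED,
    };

    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_mmcfg_reserved, &r);
}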

## Discovering and registering PCI devices

The hardware domain will scan the host bridges to find the list of PCI devices available and then report them to Xen using the existing hypercall
PHYSDEVOP_pci_device_add:

#define XEN_PCI_DEV_EXTFN   0x1
#define XEN_PCI_DEV_VIRTFN  0x2
#define XEN_PCI_DEV_PXM     0x4

struct physdev_pci_device_add {
    /* IN */
    uint16_t    seg;
    uint8_t     bus;
    uint8_t     devfn;
    uint32_t    flags;
    struct {
        uint8_t bus;
        uint8_t devfn;
    } physfn;
    /*
     * Optional parameters array.
     * First element ([0]) is PXM domain associated with the device (if
     * XEN_PCI_DEV_PXM is set)
     */
    uint32_t optarr[0];
};

When XEN_PCI_DEV_PXM is set in the field 'flags', optarr[0] will contain the NUMA node ID associated with the device:
    * For ACPI, it would be the value returned by the method _PXM
    * For Device Tree, this would be the value found in the property "numa-node-id".
For more details see the section "Finding which NUMA node a PCI device belongs to" in "ACPI" and "Device Tree".

XXX: I still don't fully understand how XEN_PCI_DEV_EXTFN and XEN_PCI_DEV_VIRTFN will work. AFAICT, the former is used when the bus supports ARI and the only usage is in the x86 IOMMU code. For the latter, this is related to IOV but I am not sure what devfn and physfn.devfn will correspond to.

Note that x86 currently provides two more hypercalls (PHYSDEVOP_manage_pci_add and PHYSDEVOP_manage_pci_add_ext) to register PCI devices. However, they are a subset of the hypercall PHYSDEVOP_pci_device_add. Therefore, it is suggested to leave them unimplemented on ARM.
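
Similarly, reporting one discovered device with its NUMA node could look like the sketch below (modelled on the existing Linux usage of this hypercall on x86, where the PXM value is passed through optarr[0]; the wrapper structure is only there to provide storage for the optional array).

/* Sketch: hardware domain reporting a discovered PCI device to Xen. */
int report_pci_device(uint16_t seg, uint8_t bus, uint8_t devfn,
                      uint32_t numa_node)
{
    struct {
        struct physdev_pci_device_add add;
        uint32_t pxm;                  /* backs add.optarr[0]           */
    } args = {
        .add = {
            .seg   = seg,
            .bus   = bus,
            .devfn = devfn,
            .flags = XEN_PCI_DEV_PXM,  /* optarr[0] holds the NUMA node */
        },
        .pxm = numa_node,
    };

    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &args.add);
}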

## Removing PCI devices

The hardware domain will be in charge of telling Xen when a device has been removed, using the existing hypercall PHYSDEVOP_pci_device_remove:

struct physdev_pci_device {
    /* IN */
    uint16_t    seg;
    uint8_t     bus;
    uint8_t     devfn;
};

Note that x86 currently provides one more hypercall (PHYSDEVOP_manage_pci_remove) to remove PCI devices. However, it does not allow passing a segment number.
Therefore it is suggested to leave it unimplemented on ARM.

# Glossary

ECAM: Enhanced Configuration Access Mechanism
SBDF: Segment Bus Device Function. The segment is a software concept.
MSI: Message Signaled Interrupt
MSI doorbell: MMIO address written to by a device to generate an MSI
SPI: Shared Peripheral Interrupt
LPI: Locality-specific Peripheral Interrupt
ITS: Interrupt Translation Service

# Specifications
[SBSA]  ARM-DEN-0029 v3.0
[GICV3] IHI0069C
[IORT]  DEN0049B

# Bibliography

[1] PCI firmware specification, rev 3.2
[2] https://www.spinics.net/lists/linux-pci/msg56715.html
[3] https://www.spinics.net/lists/linux-pci/msg56723.html
[4] https://www.spinics.net/lists/linux-pci/msg56728.html
[6] https://www.spinics.net/lists/kvm/msg140116.html
[7] http://www.firmware.org/1275/bindings/pci/pci2_1.pdf
[8] Documentation/devicetree/bindings/pci
[9] Documentation/devicetree/bindings/iommu/arm,smmu.txt
[10] Documentation/devicetree/bindings/pci/pci-iommu.txt
[11] Documentation/devicetree/bindings/pci/pci-msi.txt
[12] drivers/pci/host/pcie-rcar.c
[13] drivers/pci/host/pci-thunder-ecam.c
[14] drivers/pci/host/pci-thunder-pem.c
[15] Documentation/devicetree/bindings/numa.txt

* Re: [RFC] ARM PCI Passthrough design document
  2017-06-20  0:19 ` Vikram Sethi
@ 2017-06-28 15:22   ` Julien Grall
  2017-06-29 15:17     ` Vikram Sethi
  2017-07-04  8:30     ` roger.pau
  0 siblings, 2 replies; 35+ messages in thread
From: Julien Grall @ 2017-06-28 15:22 UTC (permalink / raw)
  To: Vikram Sethi, Stefano Stabellini
  Cc: edgar.iglesias, Sinan Kaya, Wei Chen, Steve Capper,
	Andre Przywara, manish.jaggi, punit.agrawal, Sameer Goel,
	xen-devel, Dave P Martin, Vijaya Kumar K, roger.pau



On 20/06/17 01:19, Vikram Sethi wrote:
> Hi Julien,

Hi Vikram,

Thank you for your feedbacks.

> Thanks for posting this. I think some additional topics need to be covered in the design document, under 3 main topics:

I wanted to limit the scope of the PCI passthrough work to the strict 
minimum. I didn't consider hotplug and AER in the scope because they are 
optional features.

>
> Hotplug: how will Xen support hotplug? Many rootports may require firmware hooks such as ACPI ASL to take care of platform specific MMIO initialization on hotplug. Normally firmware (UEFI) would have done that platform specific setup at boot.

We don't have ASL support in Xen. So I would expect the hotplug to be 
handled by the hardware domain and then report it to Xen.

This would also fit quite well to the current design as the hardware 
domain will scan PCI devices at boot and then register them to Xen via 
an hypercall.

>
> AER: Will PCIe non-fatal and fatal errors (secondary bus reset for fatal) be recoverable in Xen?
> Will drivers in doms be notified about fatal errors so they can be quiesced before doing secondary bus reset in Xen?
> Will Xen support Firmware First Error handling for AER? i.e When platform does Firmware first error handling for AER and/or filtering of AER, sends associated ACPI HEST logs to Xen
> How will AER notification and logs be propagated to the doms: injected ACPI HEST?
>
> PCIe DPC (Downstream Port Containment): will it be supported in Xen, and Xen will register for DPC interrupt? When Xen brings the link back up will it send a simulated hotplug to dom0 to show link back up?

I don't feel it is necessary to look at AER for the first work of PCI 
passthrough. I consider it as a separate feature that could probably 
come with the RAS story.

At the moment, I don't know who is going to handle the error and even 
how they will be reported to the guest. But I don't think this will have 
any impact on our design choice here.

Let me know if you think it may have an impact.

Cheers,

-- 
Julien Grall


* Re: [RFC] ARM PCI Passthrough design document
  2017-06-28 15:22   ` Julien Grall
@ 2017-06-29 15:17     ` Vikram Sethi
  2017-07-03 14:35       ` Julien Grall
  2017-07-04  8:30     ` roger.pau
  1 sibling, 1 reply; 35+ messages in thread
From: Vikram Sethi @ 2017-06-29 15:17 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: edgar.iglesias, Sinan Kaya, Wei Chen, Steve Capper,
	Andre Przywara, manish.jaggi, punit.agrawal, Sameer Goel,
	xen-devel, Dave P Martin, Vijaya Kumar K, roger.pau

Hi Julien, 
My thoughts are that while it is not essential to recover from AER and DPC initially, it is critical to at least take the slot offline and notify drivers so they quiesce.
Without this basic handling, it is possible to create backups in some hardware that result in CPU hangs for loads to adapter MMIO/cfg space and we don't want that.
i.e it is probably OK to lose the slot/adapter in initial implementation, but IMO it is NOT ok to crash/reboot the system by having watchdog kick in.
We do need to minimally describe what we will do with the AER and DPC interrupts: are they first handled by Xen and sent as "emulated" interrupt to owning domain?
Or are the interrupts ignored in initial implementation (not a good idea IMO)?

Hotplug also does not need to be solved right away. But we need to at least walk through the flows and convince ourselves we are not painting ourselves in a corner.
I will be in Budapest for Xen developer summit and we can walk through the ACPI hotplug flow and see how that *could* fit into proposed Xen design.

Thanks,
Vikram

-----Original Message-----
From: Julien Grall [mailto:julien.grall@linaro.org] 
Sent: Wednesday, June 28, 2017 10:23 AM
To: Vikram Sethi <vikrams@qti.qualcomm.com>; Stefano Stabellini <sstabellini@kernel.org>
Cc: xen-devel <xen-devel@lists.xenproject.org>; edgar.iglesias@xilinx.com; Steve Capper <Steve.Capper@arm.com>; punit.agrawal@arm.com; Wei Chen <Wei.Chen@arm.com>; Dave P Martin <Dave.Martin@arm.com>; Sameer Goel <sgoel@qti.qualcomm.com>; Sinan Kaya <okaya@qti.qualcomm.com>; roger.pau@citrix.com; manish.jaggi@caviumnetworks.com; Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>; Andre Przywara <andre.przywara@arm.com>
Subject: Re: [RFC] ARM PCI Passthrough design document



On 20/06/17 01:19, Vikram Sethi wrote:
> Hi Julien,

Hi Vikram,

Thank you for your feedbacks.

> Thanks for posting this. I think some additional topics need to be covered in the design document, under 3 main topics:

I wanted to limit the scope of the PCI passthrough work to the strict minimum. I didn't consider hotplug and AER in the scope because it is optional feature.

>
> Hotplug: how will Xen support hotplug? Many rootports may require firmware hooks such as ACPI ASL to take care of platform specific MMIO initialization on hotplug. Normally firmware (UEFI) would have done that platform specific setup at boot.

We don't have ASL support in Xen. So I would expect the hotplug to be 
handled by the hardware domain and then report it to Xen.

This would also fit quite well to the current design as the hardware 
domain will scan PCI devices at boot and then register them to Xen via 
an hypercall.

>
> AER: Will PCIe non-fatal and fatal errors (secondary bus reset for fatal) be recoverable in Xen?
> Will drivers in doms be notified about fatal errors so they can be quiesced before doing secondary bus reset in Xen?
> Will Xen support Firmware First Error handling for AER? i.e When platform does Firmware first error handling for AER and/or filtering of AER, sends associated ACPI HEST logs to Xen
> How will AER notification and logs be propagated to the doms: injected ACPI HEST?
>
> PCIe DPC (Downstream Port Containment): will it be supported in Xen, and Xen will register for DPC interrupt? When Xen brings the link back up will it send a simulated hotplug to dom0 to show link back up?

I don't feel it is necessary to look at AER for the first work of PCI 
passthrough. I consider it as a separate feature that could probably 
come with the RAS story.

At the moment, I don't know who is going to handle the error and even 
how they will be reported to the guest. But I don't think this will have 
any impact on our design choice here.

Let me know if you think it may have an impact.

Cheers,

-- 
Julien Grall

* Re: [RFC] ARM PCI Passthrough design document
  2017-06-29 15:17     ` Vikram Sethi
@ 2017-07-03 14:35       ` Julien Grall
  0 siblings, 0 replies; 35+ messages in thread
From: Julien Grall @ 2017-07-03 14:35 UTC (permalink / raw)
  To: Vikram Sethi, Stefano Stabellini
  Cc: edgar.iglesias, Sinan Kaya, Wei Chen, Steve Capper,
	Andre Przywara, manish.jaggi, punit.agrawal, Sameer Goel,
	xen-devel, Dave P Martin, Vijaya Kumar K, roger.pau



On 29/06/17 16:17, Vikram Sethi wrote:
> Hi Julien,

Hi Vikram,

> My thoughts are that while it is not essential to recover from AER and DPC initially, it is critical to at least take the slot offline and notify drivers so they quiesce.
> Without this basic handling, it is possible to create backups in some hardware that result in CPU hangs for loads to adapter MMIO/cfg space and we don't want that.
> i.e it is probably OK to lose the slot/adapter in initial implementation, but IMO it is NOT ok to crash/reboot the system by having watchdog kick in.
> We do need to minimally describe what we will do with the AER and DPC interrupts: are they first handled by Xen and sent as "emulated" interrupt to owning domain?
> Or are the interrupts ignored in initial implementation (not a good idea IMO)?

I don't think it is possible to ask everything to be supported in the 
initial implementation. We have to draw a line so we can get a tech 
preview support in Xen as soon as possible.

At the moment, I am focusing on the foundation that will be required for 
all the boards. I have put them in my low priority tasks because AER, 
DPC, hotplug are optional features and hence not available everywhere.

Feel free to send me a proposal for the design document, patch series if 
you want them to be included in the initial implementation.

>
> Hotplug also does not need to be solved right away. But we need to at least walk through the flows and convince ourselves we are not painting ourselves in a corner.
> I will be in Budapest for Xen developer summit and we can walk through the ACPI hotplug flow and see how that *could* fit into proposed Xen design.

Glad to know that. Let's schedule some discussions during the summit.

Cheers,

-- 
Julien Grall


* Re: [RFC] ARM PCI Passthrough design document
  2017-06-28 15:22   ` Julien Grall
  2017-06-29 15:17     ` Vikram Sethi
@ 2017-07-04  8:30     ` roger.pau
  2017-07-06 20:55       ` Vikram Sethi
  1 sibling, 1 reply; 35+ messages in thread
From: roger.pau @ 2017-07-04  8:30 UTC (permalink / raw)
  To: Julien Grall
  Cc: edgar.iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Andre Przywara, manish.jaggi, punit.agrawal, Vikram Sethi,
	Sinan Kaya, Sameer Goel, xen-devel, Dave P Martin,
	Vijaya Kumar K

Hello,

My 2cents on what are the plans on PVH/x86.

On Wed, Jun 28, 2017 at 04:22:48PM +0100, Julien Grall wrote:
> 
> 
> On 20/06/17 01:19, Vikram Sethi wrote:
> > Hi Julien,
> 
> Hi Vikram,
> 
> Thank you for your feedbacks.
> 
> > Thanks for posting this. I think some additional topics need to be covered in the design document, under 3 main topics:
> 
> I wanted to limit the scope of the PCI passthrough work to the strict
> minimum. I didn't consider hotplug and AER in the scope because it is
> optional feature.
> 
> > 
> > Hotplug: how will Xen support hotplug? Many rootports may require firmware hooks such as ACPI ASL to take care of platform specific MMIO initialization on hotplug. Normally firmware (UEFI) would have done that platform specific setup at boot.
> 
> We don't have ASL support in Xen. So I would expect the hotplug to be
> handled by the hardware domain and then report it to Xen.
> 
> This would also fit quite well to the current design as the hardware domain
> will scan PCI devices at boot and then register them to Xen via an
> hypercall.

Hotplug will be done using an hypercall. We already have them in place
for PV, and this is simply going to be reused:

Hotplug PCI devices:
PHYSDEVOP_manage_pci_add{_ext}

hotplug MMCFG (ECAM) regions:
PHYSDEVOP_pci_mmcfg_reserved

> > 
> > AER: Will PCIe non-fatal and fatal errors (secondary bus reset for fatal) be recoverable in Xen?
> > Will drivers in doms be notified about fatal errors so they can be quiesced before doing secondary bus reset in Xen?
> > Will Xen support Firmware First Error handling for AER? i.e When platform does Firmware first error handling for AER and/or filtering of AER, sends associated ACPI HEST logs to Xen
> > How will AER notification and logs be propagated to the doms: injected ACPI HEST?

Hm, I'm not sure I follow here, I don't see AER tied to ACPI. AER is a
PCIe capability, and according to the spec can be setup completely
independent to ACPI.

In any case, Xen can trap or hide the capability from guests, Xen
could possibly even emulate AER somehow if that's more suitable (ie:
guest sets up AER, Xen traps accesses to this capability and filters
the errors Xen wants to handle itself vs the errors that should be
propagated to the guest).

The biggest issue I see with AER (and DPC) is that it requires an
interrupt. So Xen would have to stole one (or more) interrupts from
the guest in order to make use of those capabilities if they are to be
exclusively managed by Xen. This could be done by simply telling the
guest the device has less MSI/MSI-X interrupts than it really has.

> > PCIe DPC (Downstream Port Containment): will it be supported in Xen, and Xen will register for DPC interrupt? When Xen brings the link back up will it send a simulated hotplug to dom0 to show link back up?
> 
> I don't feel it is necessary to look at AER for the first work of PCI
> passthrough. I consider it as a separate feature that could probably come
> with the RAS story.
> 
> At the moment, I don't know who is going to handle the error and even how
> they will be reported to the guest. But I don't think this will have any
> impact on our design choice here.
> 
> Let me know if you think it may have an impact.

As Julien said, I think that you probably know more about AER/DPC than
we do, so it would be good if you could go over the design document
and make sure that the current approach can work with the way you
intend to use AER/DPC.

Thanks, Roger.


* Re: [RFC] ARM PCI Passthrough design document
  2017-07-04  8:30     ` roger.pau
@ 2017-07-06 20:55       ` Vikram Sethi
  2017-07-07  8:49         ` Roger Pau Monné
  0 siblings, 1 reply; 35+ messages in thread
From: Vikram Sethi @ 2017-07-06 20:55 UTC (permalink / raw)
  To: roger.pau, 'Julien Grall'
  Cc: edgar.iglesias, 'Stefano Stabellini', 'Wei Chen',
	'Steve Capper', 'Andre Przywara',
	manish.jaggi, punit.agrawal, 'Vikram Sethi',
	'Sinan Kaya', 'Sameer Goel', 'xen-devel',
	'Dave P Martin', 'Vijaya Kumar K'

Hi Roger,
Thanks for your comments. My responses inline.


> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of
> roger.pau@citrix.com
> Sent: Tuesday, July 4, 2017 3:31 AM
> To: Julien Grall <julien.grall@linaro.org>
> Cc: edgar.iglesias@xilinx.com; Stefano Stabellini <sstabellini@kernel.org>; 
> Wei
> Chen <Wei.Chen@arm.com>; Steve Capper <Steve.Capper@arm.com>; Andre
> Przywara <andre.przywara@arm.com>; manish.jaggi@caviumnetworks.com;
> punit.agrawal@arm.com; Vikram Sethi <vikrams@qti.qualcomm.com>; Sinan
> Kaya <okaya@qti.qualcomm.com>; Sameer Goel <sgoel@qti.qualcomm.com>;
> xen-devel <xen-devel@lists.xenproject.org>; Dave P Martin
> <Dave.Martin@arm.com>; Vijaya Kumar K
> <Vijaya.Kumar@caviumnetworks.com>
> Subject: Re: [Xen-devel] [RFC] ARM PCI Passthrough design document
>
> Hello,
>
> My 2cents on what are the plans on PVH/x86.
>
> On Wed, Jun 28, 2017 at 04:22:48PM +0100, Julien Grall wrote:
> >
> >
> > On 20/06/17 01:19, Vikram Sethi wrote:
> > > Hi Julien,
> >
> > Hi Vikram,
> >
> > Thank you for your feedbacks.
> >
> > > Thanks for posting this. I think some additional topics need to be covered 
> > > in
> the design document, under 3 main topics:
> >
> > I wanted to limit the scope of the PCI passthrough work to the strict
> > minimum. I didn't consider hotplug and AER in the scope because it is
> > optional feature.
> >
> > >
> > > Hotplug: how will Xen support hotplug? Many rootports may require
> firmware hooks such as ACPI ASL to take care of platform specific MMIO
> initialization on hotplug. Normally firmware (UEFI) would have done that
> platform specific setup at boot.
> >
> > We don't have ASL support in Xen. So I would expect the hotplug to be
> > handled by the hardware domain and then report it to Xen.
> >
> > This would also fit quite well to the current design as the hardware
> > domain will scan PCI devices at boot and then register them to Xen via
> > an hypercall.
>
> Hotplug will be done using an hypercall. We already have them in place for PV,
> and this is simply going to be reused:
>
> Hotplug PCI devices:
> PHYSDEVOP_manage_pci_add{_ext}
>
> hotplug MMCFG (ECAM) regions:
> PHYSDEVOP_pci_mmcfg_reserved
>
> > >
> > > AER: Will PCIe non-fatal and fatal errors (secondary bus reset for fatal) 
> > > be
> recoverable in Xen?
> > > Will drivers in doms be notified about fatal errors so they can be 
> > > quiesced
> before doing secondary bus reset in Xen?
> > > Will Xen support Firmware First Error handling for AER? i.e When
> > > platform does Firmware first error handling for AER and/or filtering of 
> > > AER,
> sends associated ACPI HEST logs to Xen How will AER notification and logs be
> propagated to the doms: injected ACPI HEST?
>
> Hm, I'm not sure I follow here, I don't see AER tied to ACPI. AER is a PCIe
> capability, and according to the spec can be setup completely independent to
> ACPI.
>
True, it can be independent if not using firmware first AER handling (FFH). But 
Firmware tells the OS whether firmware first is in use.
If FFH is in use, the AER interrupt goes to firmware and then firmware processes 
the AER logs, filters errors, and sends a ACPI HEST log with the filtered AER 
regs to OS along with an ACPI event/interrupt. Kernel is not supposed to touch 
the AER registers directly in this case, but act on the register values in the 
HEST log.
http://elixir.free-electrons.com/linux/latest/source/drivers/pci/pcie/aer/aerdrv_acpi.c#L94
If Firmware is using FFH, Xen will get a HEST log with AER registers, and must 
parse those registers instead of reading AER config space.
After the AER registers have been parsed (either from HEST log or native Xen AER 
interrupt handler), at least for fatal errors, Xen needs to send notification to 
the DOM with the device passthrough so that it's driver(s) can be quiesced (via 
callbacks to dev->driver->err_handler->error_detected for linux) before hot 
reset/secondary bus reset.

Whether FFH is in use or not, Xen has 2 choices in how to present the error to 
doms for quiescing before secondary bus reset:
a. Send a HEST log and ACPI interrupt/event to dom if it booted ACPI dom and 
linux dom calls aer_recover_queue from ACPI ghes path 
http://elixir.free-electrons.com/linux/latest/source/drivers/pci/pcie/aer/aerdrv_core.c#L592
b. Present a Root port wired interrupt source in dom ACPI/DT, and inject that 
irq in the GIC LR registers. When dom kernel processes the interrupt and queries 
config space AER, Xen emulates the AER values it wants the dom to see (in FFH 
case based on register values in HEST), and if FFH was in use, not actually 
allow the dom to clear out the AER registers.

Option b is probably better/easier since it works for ACPI/DT dom.

In my view this is the basic AER error handling leaving the devices 
inaccessible.
To recover/resume the devices, the owning dom would need to signal Xen once all 
its driver(s) have quiesced, letting Xen know it is ok to do the secondary bus 
reset (for AER fatal errors). The best way to signal this would be to let the 
dom try to hit SBR in the Root port bridge control register in config space, and 
Xen traps that and actually does the BCR.SBR write.

Since Xen controls the ECAM config space access in Julien's proposed design, I 
don't see any fundamental issues with the above flow fitting into the design.

> In any case, Xen can trap or hide the capability from guests, Xen could 
> possibly
> even emulate AER somehow if that's more suitable (ie:
> guest sets up AER, Xen traps accesses to this capability and filters the 
> errors
> Xen wants to handle itself vs the errors that should be propagated to the
> guest).
>
> The biggest issue I see with AER (and DPC) is that it requires an interrupt. 
> So
> Xen would have to stole one (or more) interrupts from the guest in order to
> make use of those capabilities if they are to be exclusively managed by Xen.
> This could be done by simply telling the guest the device has less MSI/MSI-X
> interrupts than it really has.
>
> > > PCIe DPC (Downstream Port Containment): will it be supported in Xen, and
> Xen will register for DPC interrupt? When Xen brings the link back up will it 
> send
> a simulated hotplug to dom0 to show link back up?
> >
> > I don't feel it is necessary to look at AER for the first work of PCI
> > passthrough. I consider it as a separate feature that could probably
> > come with the RAS story.
> >
> > At the moment, I don't know who is going to handle the error and even
> > how they will be reported to the guest. But I don't think this will
> > have any impact on our design choice here.
> >
> > Let me know if you think it may have an impact.
>
> As Julien said, I think that you probably know more about AER/DPC than we do,
> so it would be good if you could go over the design document and mare sure
> that the current approach can work with the way you intend to use AER/DPC.
>

I think what I wrote above supplements the design, and I don't see any 
fundamental issue.
Let me know if you have any questions or concerns with proposed flow.

> Thanks, Roger.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel


Thanks,
Vikram
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.




* Re: [RFC] ARM PCI Passthrough design document
  2017-07-06 20:55       ` Vikram Sethi
@ 2017-07-07  8:49         ` Roger Pau Monné
  2017-07-07 21:50           ` Stefano Stabellini
  0 siblings, 1 reply; 35+ messages in thread
From: Roger Pau Monné @ 2017-07-07  8:49 UTC (permalink / raw)
  To: Vikram Sethi
  Cc: edgar.iglesias, 'Stefano Stabellini', 'Wei Chen',
	'Steve Capper', 'Andre Przywara',
	manish.jaggi, 'Julien Grall', 'Vikram Sethi',
	punit.agrawal, 'Sameer Goel', 'xen-devel',
	'Sinan Kaya', 'Dave P Martin',
	'Vijaya Kumar K'

On Thu, Jul 06, 2017 at 03:55:28PM -0500, Vikram Sethi wrote:
> > > > AER: Will PCIe non-fatal and fatal errors (secondary bus reset for fatal) 
> > > > be
> > recoverable in Xen?
> > > > Will drivers in doms be notified about fatal errors so they can be 
> > > > quiesced
> > before doing secondary bus reset in Xen?
> > > > Will Xen support Firmware First Error handling for AER? i.e When
> > > > platform does Firmware first error handling for AER and/or filtering of 
> > > > AER,
> > sends associated ACPI HEST logs to Xen How will AER notification and logs be
> > propagated to the doms: injected ACPI HEST?
> >
> > Hm, I'm not sure I follow here, I don't see AER tied to ACPI. AER is a PCIe
> > capability, and according to the spec can be setup completely independent to
> > ACPI.
> >
> True, it can be independent if not using firmware first AER handling (FFH). But 
> Firmware tells the OS whether firmware first is in use.
> If FFH is in use, the AER interrupt goes to firmware and then firmware processes 

I'm sorry, but how is the firmware supposed to know which interrupt is
AER using? That's AFAIK setup in the PCI AER capabilities, and
depends on whether the OS configures the device to use MSI or MSI-X.

Is there some kind of side-band mechanism that delivers the AER
interrupt using a different method?

> the AER logs, filters errors, and sends a ACPI HEST log with the filtered AER 
> regs to OS along with an ACPI event/interrupt. Kernel is not supposed to touch 
> the AER registers directly in this case, but act on the register values in the 
> HEST log.
> http://elixir.free-electrons.com/linux/latest/source/drivers/pci/pcie/aer/aerdrv_acpi.c#L94

That's not a problem IMHO, Xen could even mask the AER capability from
the Dom0/guest completely if needed.

> If Firmware is using FFH, Xen will get a HEST log with AER registers, and must 
> parse those registers instead of reading AER config space.

Xen will not get an event, it's going to be delivered to Dom0 because
when using ACPI Dom0 is the OSPM (not Xen). I assume this event is
going to be notified by triggering an interrupt from the ACPI SCI?

> After the AER registers have been parsed (either from HEST log or native Xen AER 
> interrupt handler), at least for fatal errors, Xen needs to send notification to 
> the DOM with the device passthrough so that it's driver(s) can be quiesced (via 
> callbacks to dev->driver->err_handler->error_detected for linux) before hot 
> reset/secondary bus reset.

I don't think this is relevant/true given the statement above (Dom0
being OSPM and receiving the event).

> Whether FFH is in use or not, Xen has 2 choices in how to present the error to 
> doms for quiescing before secondary bus reset:

How is this secondary bus reset performed?

Is it something specific to each bridge or it's a standard
interface?

Can it be done directly by Dom0, or should it be done by Xen?

> a. Send a HEST log and ACPI interrupt/event to dom if it booted ACPI dom and 
> linux dom calls aer_recover_queue from ACPI ghes path 
> http://elixir.free-electrons.com/linux/latest/source/drivers/pci/pcie/aer/aerdrv_core.c#L592b. Present a Root port wired interrupt source in dom ACPI/DT, and inject that 
> irq in the GIC LR registers. When dom kernel processes the interrupt and queries 

You lost me here, I have no knowledge of ARM, and I don't know what
GIC LR is at all.

> config space AER, Xen emulates the AER values it wants the dom to see (in FFH 
> case based on register values in HEST), and if FFH was in use, not actually 
> allow the dom to clear out the AER registers.
> 
> Option b is probably better/easier since it works for ACPI/DT dom.

So as I understand it, the flow is the following:

1. Hardware generates an error.
2. This error triggers an interrupt that's delivered to Dom0 (either
   using an ACPI SCI or a specific AER MSI vector)
3. *Someone* has to do a secondary bus reset.

My question would be, who (either Xen or Dom0) should perform the bus
reset? (and why).

> In my view this is the basic AER error handling leaving the devices 
> inaccessible.
> To recover/resume the devices, the owning dom would need to signal Xen once all 
> its driver(s) have quiesced, letting Xen know it is ok to do the secondary bus 
> reset (for AER fatal errors). The best way to signal this would be to let the 
> dom try to hit SBR in the Root port bridge control register in config space, and 
> Xen traps that and actually does the BCR.SBR write.
>
> Since Xen controls the ECAM config space access in Julien's proposed design, I 
> don't see any fundamental issues with the above flow fitting into the design.

I think it's very hard for me (or Julien) to know exactly how all the
PCI capabilities behave and interact with other components (like
ACPI).

You seem to have a good amount of knowledge about this stuff, would
you mind writing your proposal as a diff to Julien's original
proposal, so that it can be properly reviewed and merged into the
design document?

Thanks, Roger.


* Re: [RFC] ARM PCI Passthrough design document
  2017-07-07  8:49         ` Roger Pau Monné
@ 2017-07-07 21:50           ` Stefano Stabellini
  2017-07-07 23:40             ` Vikram Sethi
  2017-07-08  7:34             ` Roger Pau Monné
  0 siblings, 2 replies; 35+ messages in thread
From: Stefano Stabellini @ 2017-07-07 21:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: edgar.iglesias, 'Stefano Stabellini',
	Vikram Sethi, 'Wei Chen', 'Steve Capper',
	'Andre Przywara', manish.jaggi, 'Julien Grall',
	'Vikram Sethi', punit.agrawal, 'Sameer Goel',
	'xen-devel', 'Sinan Kaya',
	'Dave P Martin', 'Vijaya Kumar K'

On Fri, 7 Jul 2017, Roger Pau Monné wrote:
> On Thu, Jul 06, 2017 at 03:55:28PM -0500, Vikram Sethi wrote:
> > > > > AER: Will PCIe non-fatal and fatal errors (secondary bus reset for fatal) 
> > > > > be
> > > recoverable in Xen?
> > > > > Will drivers in doms be notified about fatal errors so they can be 
> > > > > quiesced
> > > before doing secondary bus reset in Xen?
> > > > > Will Xen support Firmware First Error handling for AER? i.e When
> > > > > platform does Firmware first error handling for AER and/or filtering of 
> > > > > AER,
> > > sends associated ACPI HEST logs to Xen How will AER notification and logs be
> > > propagated to the doms: injected ACPI HEST?
> > >
> > > Hm, I'm not sure I follow here, I don't see AER tied to ACPI. AER is a PCIe
> > > capability, and according to the spec can be setup completely independent to
> > > ACPI.
> > >
> > True, it can be independent if not using firmware first AER handling (FFH). But 
> > Firmware tells the OS whether firmware first is in use.
> > If FFH is in use, the AER interrupt goes to firmware and then firmware processes 
> 
> I'm sorry, but how is the firmware supposed to know which interrupt is
> AER using? That's AFAIK setup in the PCI AER capabilities, and
> depends on whether the OS configures the device to use MSI or MSI-X.
> 
> Is there some kind of side-band mechanism that delivers the AER
> interrupt using a different method?
> 
> > the AER logs, filters errors, and sends a ACPI HEST log with the filtered AER 
> > regs to OS along with an ACPI event/interrupt. Kernel is not supposed to touch 
> > the AER registers directly in this case, but act on the register values in the 
> > HEST log.
> > http://elixir.free-electrons.com/linux/latest/source/drivers/pci/pcie/aer/aerdrv_acpi.c#L94
> 
> That's not a problem IMHO, Xen could even mask the AER capability from
> the Dom0/guest completely if needed.
> 
> > If Firmware is using FFH, Xen will get a HEST log with AER registers, and must 
> > parse those registers instead of reading AER config space.
> 
> Xen will not get an event, it's going to be delivered to Dom0 because
> when using ACPI Dom0 is the OSPM (not Xen). I assume this event is
> going to be notified by triggering an interrupt from the ACPI SCI?

It is still possible to get the event in Xen, either by having Dom0 tell
Xen about it, or by moving ACPI SCI handling into Xen. If we move ACPI SCI
handling in Xen, we could still forward a virtual SCI interrupt to Dom0
in cases where Xen decides that Dom0 should be the one handling the
event. In other cases, where Xen knows how to handle the event, then
nothing would be sent to Dom0. Would that work?


> > After the AER registers have been parsed (either from HEST log or native Xen AER 
> > interrupt handler), at least for fatal errors, Xen needs to send notification to 
> > the DOM with the device passthrough so that it's driver(s) can be quiesced (via 
> > callbacks to dev->driver->err_handler->error_detected for linux) before hot 
> > reset/secondary bus reset.
> 
> I don't think this is relevant/true given the statement above (Dom0
> being OSPM and receiving the event).
> 
> > Whether FFH is in use or not, Xen has 2 choices in how to present the error to 
> > doms for quiescing before secondary bus reset:
> 
> How is this secondary bus reset performed?

It is based on writing to PCI config space registers
(drivers/pci/pci.c:pci_reset_secondary_bus). If Xen is in charge of
ECAM, it shouldn't be an issue for Xen to do it.


> Is it something specific to each bridge or it's a standard
> interface?
> 
> Can it be done directly by Dom0, or should it be done by Xen?
> 
> > a. Send a HEST log and ACPI interrupt/event to dom if it booted ACPI dom and 
> > linux dom calls aer_recover_queue from ACPI ghes path 
> > http://elixir.free-electrons.com/linux/latest/source/drivers/pci/pcie/aer/aerdrv_core.c#L592b. Present a Root port wired interrupt source in dom ACPI/DT, and inject that 
> > irq in the GIC LR registers. When dom kernel processes the interrupt and queries 
> 
> You lost me here, I have no knowledge of ARM, and I don't know what
> GIC LR is at all.

GIC LRs are registers specific to the ARM Generic Interrupt Controller
that allow an hypervisor to inject interrupts into a guest.  Vikram is
saying that the irq could be injected into the guest.


> > config space AER, Xen emulates the AER values it wants the dom to see (in FFH 
> > case based on register values in HEST), and if FFH was in use, not actually 
> > allow the dom to clear out the AER registers.
> > 
> > Option b is probably better/easier since it works for ACPI/DT dom.
> 
> So as I understand it, the flow is the following:
> 
> 1. Hardware generates an error.
> 2. This error triggers an interrupt that's delivered to Dom0 (either
>    using an ACPI SCI or a specific AER MSI vector)
> 3. *Someone* has to do a secondary bus reset.
> 
> My question would be, who (either Xen or Dom0) should perform the bus
> reset? (and why).

I am interested in Vikram's reply, he knows more than me about this.
However, my gut feeling is that it's best to do it in Xen because
otherwise Xen might end up having to wait for Dom0 for the completion of
the reset. The operation is not short and it includes a couple of
sleeps: each sleep is an opportunity to trap into Xen again and risk
descheduling the Dom0 vcpu.


> > In my view this is the basic AER error handling leaving the devices 
> > inaccessible.
> > To recover/resume the devices, the owning dom would need to signal Xen once all 
> > its driver(s) have quiesced, letting Xen know it is ok to do the secondary bus 
> > reset (for AER fatal errors). The best way to signal this would be to let the 
> > dom try to hit SBR in the Root port bridge control register in config space, and 
> > Xen traps that and actually does the BCR.SBR write.
> >
> > Since Xen controls the ECAM config space access in Julien's proposed design, I 
> > don't see any fundamental issues with the above flow fitting into the design.
> 
> I think it's very hard for me (or Julien) to know exactly how all the
> PCI capabilities behave and interact with other components (like
> ACPI).
> 
> You seem to have a good amount of knowledge about this stuff, would
> you mind writing your proposal as a diff to Julien's original
> proposal, so that it can be properly reviewed and merged into the
> design document?


* Re: [RFC] ARM PCI Passthrough design document
  2017-07-07 21:50           ` Stefano Stabellini
@ 2017-07-07 23:40             ` Vikram Sethi
  2017-07-08  7:34             ` Roger Pau Monné
  1 sibling, 0 replies; 35+ messages in thread
From: Vikram Sethi @ 2017-07-07 23:40 UTC (permalink / raw)
  To: 'Stefano Stabellini', roger.pau
  Cc: edgar.iglesias, punit.agrawal, 'Wei Chen',
	'Steve Capper', 'Andre Przywara',
	manish.jaggi, 'Julien Grall', 'Vikram Sethi',
	'Sinan Kaya', 'Sameer Goel', 'xen-devel',
	'Dave P Martin', 'Vijaya Kumar K'




> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of
> Stefano Stabellini
> Sent: Friday, July 7, 2017 4:50 PM
> To: Roger Pau Monné <roger.pau@citrix.com>
> Cc: edgar.iglesias@xilinx.com; 'Stefano Stabellini' <sstabellini@kernel.org>;
> Vikram Sethi <vikrams@codeaurora.org>; 'Wei Chen' <Wei.Chen@arm.com>;
> 'Steve Capper' <Steve.Capper@arm.com>; 'Andre Przywara'
> <andre.przywara@arm.com>; manish.jaggi@caviumnetworks.com; 'Julien
> Grall' <julien.grall@linaro.org>; 'Vikram Sethi' <vikrams@qti.qualcomm.com>;
> punit.agrawal@arm.com; 'Sameer Goel' <sgoel@qti.qualcomm.com>; 'xen-
> devel' <xen-devel@lists.xenproject.org>; 'Sinan Kaya'
> <okaya@qti.qualcomm.com>; 'Dave P Martin' <Dave.Martin@arm.com>;
> 'Vijaya Kumar K' <Vijaya.Kumar@caviumnetworks.com>
> Subject: Re: [Xen-devel] [RFC] ARM PCI Passthrough design document
>
> On Fri, 7 Jul 2017, Roger Pau Monné wrote:
> > On Thu, Jul 06, 2017 at 03:55:28PM -0500, Vikram Sethi wrote:
> > > > > > AER: Will PCIe non-fatal and fatal errors (secondary bus reset
> > > > > > for fatal) be
> > > > recoverable in Xen?
> > > > > > Will drivers in doms be notified about fatal errors so they
> > > > > > can be quiesced
> > > > before doing secondary bus reset in Xen?
> > > > > > Will Xen support Firmware First Error handling for AER? i.e
> > > > > > When platform does Firmware first error handling for AER
> > > > > > and/or filtering of AER,
> > > > sends associated ACPI HEST logs to Xen How will AER notification
> > > > and logs be propagated to the doms: injected ACPI HEST?
> > > >
> > > > Hm, I'm not sure I follow here, I don't see AER tied to ACPI. AER
> > > > is a PCIe capability, and according to the spec can be setup
> > > > completely independent to ACPI.
> > > >
> > > True, it can be independent if not using firmware first AER handling
> > > (FFH). But Firmware tells the OS whether firmware first is in use.
> > > If FFH is in use, the AER interrupt goes to firmware and then
> > > firmware processes
> >
> > I'm sorry, but how is the firmware supposed to know which interrupt is
> > AER using? That's AFAIK setup in the PCI AER capabilities, and depends
> > on whether the OS configures the device to use MSI or MSI-X.
> >
> > Is there some kind of side-band mechanism that delivers the AER
> > interrupt using a different method?
> >
The AER interrupt is not generated by the device that sends the "AER message" to 
root port, it is from the root port aka "event collector" itself. i.e the 
endpoint/adapter sends an AER message to root port and root port sends interrupt 
to CPU
Firmware should just KNOW what the IRQ number for the root port is for AER when 
it is doing firmware first error handling (assuming the Root port generated a 
wired interrupt for AER).

The other part to this is, how do Firmware and OS exchange what is the 
event/interrupt number when FW sends the AER HEST log to this OS. This comes 
from ACPI GHES.
See 
http://elixir.free-electrons.com/linux/latest/source/drivers/acpi/apei/ghes.c#L954
There can be many possibilities such as SCI, IRQ/GSIV, GPIO event etc

> > > the AER logs, filters errors, and sends a ACPI HEST log with the
> > > filtered AER regs to OS along with an ACPI event/interrupt. Kernel
> > > is not supposed to touch the AER registers directly in this case,
> > > but act on the register values in the HEST log.
> > > http://elixir.free-electrons.com/linux/latest/source/drivers/pci/pci
> > > e/aer/aerdrv_acpi.c#L94
> >
> > That's not a problem IMHO, Xen could even mask the AER capability from
> > the Dom0/guest completely if needed.
> >
> > > If Firmware is using FFH, Xen will get a HEST log with AER
> > > registers, and must parse those registers instead of reading AER config
> space.
> >
> > Xen will not get an event, it's going to be delivered to Dom0 because
> > when using ACPI Dom0 is the OSPM (not Xen). I assume this event is
> > going to be notified by triggering an interrupt from the ACPI SCI?
>

See above. It is obtained from GHES and can be SCI, GSIV, GPIO signal etc.

> It is still possible to get the event in Xen, either by having Dom0 tell Xen 
> about
> it, or my moving ACPI SCI handling in Xen. If we move ACPI SCI handling in 
> Xen,
> we could still forward a virtual SCI interrupt to Dom0 in cases where Xen
> decides that Dom0 should be the one handling the event. In other cases,
> where Xen knows how to handle the event, then nothing would be sent to
> Dom0. Would that work?
>

It could work for GSIV/irq or SCI. But one of the possibilities is a ACPI 6.1 
GED interrupt (GED= generic event device, yes there are way too many acronyms in 
ACPI :) ) and this requires ASL to be run, so would need dom0.
See https://patchwork.kernel.org/patch/8115901/
>
> > > After the AER registers have been parsed (either from HEST log or
> > > native Xen AER interrupt handler), at least for fatal errors, Xen
> > > needs to send notification to the DOM with the device passthrough so
> > > that it's driver(s) can be quiesced (via callbacks to
> > > dev->driver->err_handler->error_detected for linux) before hot
> reset/secondary bus reset.
> >
> > I don't think this is relevant/true given the statement above (Dom0
> > being OSPM and receiving the event).
> >

Sure, if dom0 gets the AER interrupt or ACPI "event" for FFH, then there is no 
need to forward anything.

> > > Whether FFH is in use or not, Xen has 2 choices in how to present
> > > the error to doms for quiescing before secondary bus reset:
> >
> > How is this secondary bus reset performed?
>
> It is based on writing to PCI config space registers
> (drivers/pci/pci.c:pci_reset_secondary_bus). If Xen is in charge of ECAM, it
> shouldn't be an issue for Xen to do it.
>
>
> > Is it something specific to each bridge or it's a standard interface?
> >
> > Can it be done directly by Dom0, or should it be done by Xen?
> >

Triggering the secondary bus reset is straightforward: it is a PCI-defined bit
(SBR) in the root port's Bridge Control register.
It could be done in either Xen or dom0, but it probably makes sense to do it where
the config cycles are being "controlled" and by whoever is doing the PCI probe.
I had misunderstood Julien's design to mean PCI probing was being done by Xen,
but on a second read he's saying the dom0/hw domain does the PCI probe and
notifies Xen of the config.

BTW this does raise the question of who reads the root port's Access Control
Services (ACS) config space capability to decide what the safest "unit" of
assignment to doms is: Xen or dom0?
Clearly Xen should be the one deciding whether the root port (and any switches,
if present) supports ACS upstream forwarding etc., and whether it is safe to
assign just a function/VF or whether the entire PCI tree under the root port is
the "minimum" assignable entity.
For background see
http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
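As an illustration of the kind of check involved (not from the design document;
the helper name is made up), the isolation-relevant ACS bits can be probed much
like Linux/VFIO does, using the standard register and flag names from
<uapi/linux/pci_regs.h>:

    #include <linux/pci.h>

    /* Does this root/downstream port isolate its subtree well enough for
     * per-function assignment? If not, the whole subtree below it is the
     * minimum assignable unit. */
    static bool port_isolates_downstream(struct pci_dev *port)
    {
        int pos = pci_find_ext_capability(port, PCI_EXT_CAP_ID_ACS);
        u16 ctrl;
        u16 want = PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF;

        if (!pos)
            return false;    /* no ACS capability at all */

        pci_read_config_word(port, pos + PCI_ACS_CTRL, &ctrl);
        return (ctrl & want) == want;
    }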
So is Xen issuing ECAM-based config cycles to the root port config space without
serialization with dom0, which can also issue root port and downstream config
accesses? I'm not sure whether this can be an issue or not.
The other thing that I haven't fully processed yet is: when dom0 sends
information piecemeal to Xen with "here's a root port with SBDF1, here's some
device with SBDF2", can Xen accurately reconstruct the entire PCI tree?
This is important because there could be PCIe switches under the root port which
may or may not support ACS, so Xen has to know where in the tree the "minimum
safe assignment" unit is.

> > > a. Send a HEST log and ACPI interrupt/event to dom if it booted ACPI
> > > dom and linux dom calls aer_recover_queue from ACPI ghes path
> > > http://elixir.free-electrons.com/linux/latest/source/drivers/pci/pcie/aer/aerdrv_core.c#L592
> > > b. Present a Root port wired interrupt source in dom ACPI/DT, and
> > > inject that irq in the GIC LR registers. When dom kernel processes
> > > the interrupt and queries
> >
> > You lost me here, I have no knowledge of ARM, and I don't know what
> > GIC LR is at all.
>
> GIC LRs are registers specific to the ARM Generic Interrupt Controller that 
> allow
> an hypervisor to inject interrupts into a guest.  Vikram is saying that the 
> irq
> could be injected into the guest.
>
>
> > > config space AER, Xen emulates the AER values it wants the dom to
> > > see (in FFH case based on register values in HEST), and if FFH was
> > > in use, not actually allow the dom to clear out the AER registers.
> > >
> > > Option b is probably better/easier since it works for ACPI/DT dom.
> >
> > So as I understand it, the flow is the following:
> >
> > 1. Hardware generates an error.
> > 2. This error triggers an interrupt that's delivered to Dom0 (either
> >    using an ACPI SCI or a specific AER MSI vector)
> > 3. *Someone* has to do a secondary bus reset.
> >
> > My question would be, who (either Xen or Dom0) should perform the bus
> > reset? (and why).
>
> I am interested in Vikram's reply, he knows more than me about this.
> However, my gut feeling is that it's best to do it in Xen because otherwise
> Xen might end up having to wait for Dom0 for the completion of the reset.
> The operation is now short and it includes a couple of sleeps: each sleep
> is an opportunity to trap into Xen again and risk descheduling the Dom0 vcpu.
>

Linux dom0 will attempt a secondary bus reset config access to the root port
anyway for fatal errors. I had earlier misunderstood that Xen would be trapping
and issuing all config cycles, but since dom0 is controlling the config space
access, we might as well let the dom0-issued SBR go through.
Yes, there is a write to assert the SBR bit in Bridge Control, a wait of a
millisecond or so, and another write to deassert it.
Is your concern that if dom0 gets descheduled between the assert and the deassert
the recovery is delayed? Yes, that is true, but it should be tolerable I think.
Since devices are getting reset and drivers reinitialized, there will always be
a hiccup/gap/some temporary disruption.
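For reference, a minimal sketch of that assert/wait/deassert sequence,
mirroring the pci_reset_secondary_bus() flow referenced earlier in the thread
(the delays are illustrative, not normative):

    #include <linux/pci.h>
    #include <linux/delay.h>

    /* Secondary bus reset via the SBR bit in the bridge's Bridge Control
     * register, as discussed above. */
    static void do_secondary_bus_reset(struct pci_dev *bridge)
    {
        u16 ctrl;

        pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &ctrl);

        ctrl |= PCI_BRIDGE_CTL_BUS_RESET;              /* assert SBR */
        pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
        msleep(2);                                     /* keep reset asserted >= 1ms */

        ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;             /* deassert SBR */
        pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
        msleep(1000);                                  /* give devices time to recover */
    }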

>
> > > In my view this is the basic AER error handling leaving the devices
> > > inaccessible.
> > > To recover/resume the devices, the owning dom would need to signal
> > > Xen once all its driver(s) have quiesced, letting Xen know it is ok
> > > to do the secondary bus reset (for AER fatal errors). The best way
> > > to signal this would be to let the dom try to hit SBR in the Root
> > > port bridge control register in config space, and Xen traps that and 
> > > actually
> > > does the BCR.SBR write.
> > >
> > > Since Xen controls the ECAM config space access in Julien's proposed
> > > design, I don't see any fundamental issues with the above flow fitting 
> > > into
> > > the design.
> >
> > I think it's very hard for me (or Julien) to know exactly how all the
> > PCI capabilities behave and interact with other components (like
> > ACPI).
> >
> > You seem to have a good amount of knowledge about this stuff, would
> > you mind writing your proposal as a diff to Julien's original
> > proposal, so that it can be properly reviewed and merged into the
> > design document?

Thanks,
Vikram
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.





_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] ARM PCI Passthrough design document
  2017-07-07 21:50           ` Stefano Stabellini
  2017-07-07 23:40             ` Vikram Sethi
@ 2017-07-08  7:34             ` Roger Pau Monné
  2018-01-19 10:34               ` Manish Jaggi
  1 sibling, 1 reply; 35+ messages in thread
From: Roger Pau Monné @ 2017-07-08  7:34 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: edgar.iglesias, punit.agrawal, Vikram Sethi, 'Wei Chen',
	'Steve Capper', 'Andre Przywara',
	manish.jaggi, 'Julien Grall', 'Vikram Sethi',
	'Sinan Kaya', 'Sameer Goel', 'xen-devel',
	'Dave P Martin', 'Vijaya Kumar K'

On Fri, Jul 07, 2017 at 02:50:01PM -0700, Stefano Stabellini wrote:
> On Fri, 7 Jul 2017, Roger Pau Monné wrote:
> > On Thu, Jul 06, 2017 at 03:55:28PM -0500, Vikram Sethi wrote:
> > > > > > AER: Will PCIe non-fatal and fatal errors (secondary bus reset for fatal) 
> > > > > > be
> > > > recoverable in Xen?
> > > > > > Will drivers in doms be notified about fatal errors so they can be 
> > > > > > quiesced
> > > > before doing secondary bus reset in Xen?
> > > > > > Will Xen support Firmware First Error handling for AER? i.e When
> > > > > > platform does Firmware first error handling for AER and/or filtering of 
> > > > > > AER,
> > > > sends associated ACPI HEST logs to Xen How will AER notification and logs be
> > > > propagated to the doms: injected ACPI HEST?
> > > >
> > > > Hm, I'm not sure I follow here, I don't see AER tied to ACPI. AER is a PCIe
> > > > capability, and according to the spec can be setup completely independent to
> > > > ACPI.
> > > >
> > > True, it can be independent if not using firmware first AER handling (FFH). But 
> > > Firmware tells the OS whether firmware first is in use.
> > > If FFH is in use, the AER interrupt goes to firmware and then firmware processes 
> > 
> > I'm sorry, but how is the firmware supposed to know which interrupt is
> > AER using? That's AFAIK setup in the PCI AER capabilities, and
> > depends on whether the OS configures the device to use MSI or MSI-X.
> > 
> > Is there some kind of side-band mechanism that delivers the AER
> > interrupt using a different method?
> > 
> > > the AER logs, filters errors, and sends a ACPI HEST log with the filtered AER 
> > > regs to OS along with an ACPI event/interrupt. Kernel is not supposed to touch 
> > > the AER registers directly in this case, but act on the register values in the 
> > > HEST log.
> > > http://elixir.free-electrons.com/linux/latest/source/drivers/pci/pcie/aer/aerdrv_acpi.c#L94
> > 
> > That's not a problem IMHO, Xen could even mask the AER capability from
> > the Dom0/guest completely if needed.
> > 
> > > If Firmware is using FFH, Xen will get a HEST log with AER registers, and must 
> > > parse those registers instead of reading AER config space.
> > 
> > Xen will not get an event, it's going to be delivered to Dom0 because
> > when using ACPI Dom0 is the OSPM (not Xen). I assume this event is
> > going to be notified by triggering an interrupt from the ACPI SCI?
> 
> It is still possible to get the event in Xen, either by having Dom0 tell
> Xen about it, or my moving ACPI SCI handling in Xen. If we move ACPI SCI
> handling in Xen, we could still forward a virtual SCI interrupt to Dom0
> in cases where Xen decides that Dom0 should be the one handling the
> event. In other cases, where Xen knows how to handle the event, then
> nothing would be sent to Dom0. Would that work?

Maybe that's different on ARM vs x86, but when receiving the SCI
interrupt the OSPM has to execute some AML in order to figure out
which event has triggered. Even if Xen can trap the SCI, it has no way
to execute AML, and that in any case can only be done by one entity,
the OSPM.

IMHO, for this to be viable Dom0 should notify the event to Xen.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Notes from PCI Passthrough design discussion at Xen Summit
  2017-05-26 17:14 [RFC] ARM PCI Passthrough design document Julien Grall
                   ` (3 preceding siblings ...)
  2017-06-20  0:19 ` Vikram Sethi
@ 2017-07-19 14:41 ` Punit Agrawal
  2017-07-20  3:54   ` Manish Jaggi
  2018-01-22 11:10 ` [RFC] ARM PCI Passthrough design document Manish Jaggi
  5 siblings, 1 reply; 35+ messages in thread
From: Punit Agrawal @ 2017-07-19 14:41 UTC (permalink / raw)
  To: Julien Grall
  Cc: edgar.iglesias, Stefano Stabellini, Jan Beulich, Wei Chen,
	Steve Capper, Andre Przywara, manish.jaggi, okaya, vikrams, Goel,
	Sameer, xen-devel, Dave P Martin, Vijaya Kumar K, roger.pau


I took some notes for the PCI Passthrough design discussion at Xen
Summit. Due to the wide range of topics covered, the notes got sparser
towards the end of the session. I've tried to attribute names against
comments but have very likely got things mixed up. Apologies in advance.

Although the session was well attended, some of the more active
discussions involved - Julien Grall, Stefano Stabillini, Roger Pau
Monné, Jan Beulich, Vikram Sethi. I'm sure I am missing some folks here.

Please do point out any mistakes I've made for the audience's benefit.

* Discovery of PCI hostbridges
  - Dom0 will be responsible for scanning the ECAM for devices and
    register them with Xen. This approach is chosen due to variety of
    non-standard PCI controllers on ARM platforms and the desire to
    not duplicate driver code between Linux and Xen.
  - Jan, Roger: Bus scan needs to happen before device discovery,
    otherwise there is a small window where Xen doesn't know which host
    bridge the device is registered on (as it'll likely only refer to the
    segment number).
  - Roger: Registering config space with Xen before device discovery
    will allow the hypervisor to set access traps for certain
    functionality as appropriate.
  - Jan: Xen and Dom0 have to agree on the PCI segment number mapping
    to host bridges. This is so that for future calls, Dom0 and
    hypervisor can communicate using sBDF without ambiguity. 
  - Julien: Dom0 will register the config space address and segment
    number. mcfg_add will be used to pass the segment to Xen (see the
    sketch after this list).
  - PCI segment - it's purely a software construct to identify
    different host bridges.
  - Some discussion on whether boot devices need to be on
    Segment 0. Technically, MCFG is only required to describe Segment
    0 - other host bridges can be described in AML.
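  For illustration, this is roughly how an x86 Linux Dom0 reports an
  ECAM/MCFG region to Xen today via the existing PHYSDEVOP_pci_mmcfg_reserved
  hypercall; the ARM flow discussed above would pass similar information
  (base address, segment, bus range), though the exact hypercall is still
  under discussion. The values below are hypothetical.

      #include <xen/interface/physdev.h>   /* struct physdev_pci_mmcfg_reserved */
      #include <asm/xen/hypercall.h>

      static int report_ecam_to_xen(void)
      {
          struct physdev_pci_mmcfg_reserved r = {
              .address   = 0x40000000,             /* ECAM base address (example) */
              .segment   = 0,                      /* PCI segment number */
              .start_bus = 0,
              .end_bus   = 255,
              .flags     = XEN_PCI_MMCFG_RESERVED, /* region reserved by firmware */
          };

          return HYPERVISOR_physdev_op(PHYSDEVOP_pci_mmcfg_reserved, &r);
      }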

* Configuration accesses for non-ECAM-compliant host bridges
  - Julien proposed these to be forwarded to Dom0 for handling.
  - Audience: What kind of non-compliance are we talking about? If
    they are simple, can they be implemented in Xen in a few lines of
    code?
  - A few different types
    - restrictions on access size, e.g., only certain sizes supported 
    - register multiplexing via a window; similar to the legacy x86 PCI
      access mechanism (contrast with the plain ECAM layout sketched
      after this list)
    - ECAM compliant but with special casing for different devices
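  A minimal sketch for contrast (standard PCIe ECAM layout, not from the
  notes themselves): with ECAM each function gets its own 4KiB config
  window, so a config access is a plain MMIO access at a computed offset,
  whereas the indirect schemes above multiplex accesses through an
  address/data window.

      /* ECAM offset: bus[27:20] | device[19:15] | function[14:12] | reg[11:0] */
      static inline void *ecam_cfg_addr(void *ecam_base, unsigned int bus,
                                        unsigned int dev, unsigned int fn,
                                        unsigned int reg)
      {
          return (char *)ecam_base +
                 ((bus << 20) | (dev << 15) | (fn << 12) | (reg & 0xfff));
      }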

* Support on 32bit platforms
  - Is there enough address space to map ECAM into Dom0? The maximum ECAM
    size is 256MB.

* PCI ACS support
  - Vikram: Xen needs to be aware of the PCI device topology to
    correctly setup device groups for passthrough
  - Jan, Roger: IIRC, Xen is already aware of the device topology,
    though it doesn't use ACS to work out which devices need to be
    passed to a guest as a group.
  - Stefano: There was support in xend (previous Xen toolstack) but the
    functionality has not yet been ported to libxl.

* Implementation milestones
  - Julien provided a summary of breakdown
    - M0 - design document, currently under discussion on xen-devel
    - M1 - PCI support in Xen
      - Xen aware of PCI devices (via Dom0 registration)
    - M2 - Guest PCIe passthrough
      - Julien: Some complexity in dealing with Legacy interrupts as they can be shared.
      - Roger: MSIs mandatory for PCIe. So legacy interrupts can be
        tackled at a later stage.
    - M3 - testing
      - Fuzzing. Jan: If implemented, it'll be better than what x86
        currently has.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Notes from PCI Passthrough design discussion at Xen Summit
  2017-07-19 14:41 ` Notes from PCI Passthrough design discussion at Xen Summit Punit Agrawal
@ 2017-07-20  3:54   ` Manish Jaggi
  2017-07-20  8:24     ` Roger Pau Monné
  0 siblings, 1 reply; 35+ messages in thread
From: Manish Jaggi @ 2017-07-20  3:54 UTC (permalink / raw)
  To: Punit Agrawal, Julien Grall
  Cc: edgar.iglesias, Stefano Stabellini, Jan Beulich, Wei Chen,
	Steve Capper, Andre Przywara, manish.jaggi, okaya, vikrams, Goel,
	Sameer, xen-devel, Dave P Martin, Vijaya Kumar K, roger.pau

Hi Punit,

On 7/19/2017 8:11 PM, Punit Agrawal wrote:
> I took some notes for the PCI Passthrough design discussion at Xen
> Summit. Due to the wide range of topics covered, the notes got sparser
> towards the end of the session. I've tried to attribute names against
> comments but have very likely got things mixed up. Apologies in advance.
I was curious whether any discussions happened on the RC Emu (config space
emulation) as per slide 18:
https://schd.ws/hosted_files/xendeveloperanddesignsummit2017/76/slides.pdf
> Although the session was well attended, some of the more active
> discussions involved - Julien Grall, Stefano Stabillini, Roger Pau
> Monné, Jan Beulich, Vikram Sethi. I'm sure I am missing some folks here.
>
> Please do point out any mistakes I've made for the audience's benefit.
>
> * Discovery of PCI hostbridges
>    - Dom0 will be responsible for scanning the ECAM for devices and
>      register them with Xen. This approach is chosen due to variety of
>      non-standard PCI controllers on ARM platforms and the desire to
>      not duplicate driver code between Linux and Xen.
>    - Jan, Roger: Bus scan needs to happer before device discovery
>      otherwise a small window where Xen doesn't know which host bridge
>      the device is registered on (as it'll likely only refer to the
>      segment number).
>    - Roger: Registering config space with Xen before device discovery
>      will allow the hypervisor to set access traps for certain
>      functionality as appropriate.
>    - Jan: Xen and Dom0 have to agree on the PCI segment number mapping
>      to host bridges. This is so that for future calls, Dom0 and
>      hypervisor can communicate using sBDF without ambiguity.
>    - Julien: Dom0 will register config space address and segment
>      number. mcfg_add will be used to pass the segment to Xen.
>    - PCI segment - it's purely a software construct so identify
>      different host bridges.
>    - Some discussion on whether boot devices need to be on
>      Segment 0. Technically, MCFG is only required to describe Segment
>      0 - other host bridges can be described in AML.
>
> * Configuration accesses for non-ecam compliant host bridge
>    - Julien proposed these to be forwarded to Dom0 for handling.
>    - Audience: What kind of non-compliance are we talking about? If
>      they are simple, can they be implemented in Xen in a few lines of
>      code?
>    - A few different types
>      - restrictions on access size, e.g., only certain sizes supported
>      - register multiplexing via a window; similar to legacy x86 PCI
>        access mechanism
>      - ECAM compliant but with special casing for different devices
>
> * Support on 32bit platforms
>    - Is there enough address space to map ECAM into Dom0. Maximum ECAM
>      size is 256MB.
>
> * PCI ACS support
>    - Vikram: Xen needs to be aware of the PCI device topology to
>      correctly setup device groups for passthrough
>    - Jan: Roger: IIRC, Xen is already aware of the device topology
>      thought it doesn't use ACS to work out which devices need to be
>      passed to guest as a group.
>    - Stefano: There was support in xend (previous Xen toolstack) but the
>      functionality has not yet been ported to libxl.
>
> * Implementation milestones
>    - Julien provided a summary of breakdown
>      - M0 - design document, currently under discussion on xen-devel
>      - M1 - PCI support in Xen
>        - Xen aware of PCI devices (via Dom0 registration)
>      - M2 - Guest PCIe passthrough
>        - Julien: Some complexity in dealing with Legacy interrupts as they can be shared.
>        - Roger: MSIs mandatory for PCIe. So legacy interrupts can be
>          tackled at a later stage.
>      - M3 - testing
>        - fuzzing. Jan: If implemented it'll be better than what x86
>          currently have.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Notes from PCI Passthrough design discussion at Xen Summit
  2017-07-20  3:54   ` Manish Jaggi
@ 2017-07-20  8:24     ` Roger Pau Monné
  2017-07-20  9:32       ` Manish Jaggi
  0 siblings, 1 reply; 35+ messages in thread
From: Roger Pau Monné @ 2017-07-20  8:24 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: edgar.iglesias, Stefano Stabellini, Jan Beulich, Wei Chen,
	Steve Capper, Andre Przywara, manish.jaggi, Punit Agrawal,
	Julien Grall, vikrams, okaya, Goel, Sameer, xen-devel,
	Dave P Martin, Vijaya Kumar K

On Thu, Jul 20, 2017 at 09:24:36AM +0530, Manish Jaggi wrote:
> Hi Punit,
> 
> On 7/19/2017 8:11 PM, Punit Agrawal wrote:
> > I took some notes for the PCI Passthrough design discussion at Xen
> > Summit. Due to the wide range of topics covered, the notes got sparser
> > towards the end of the session. I've tried to attribute names against
> > comments but have very likely got things mixed up. Apologies in advance.
> Was curious if any discussions happened on the RC Emu (config space
> emulation) as per slide 18
> https://schd.ws/hosted_files/xendeveloperanddesignsummit2017/76/slides.pdf

Part of this is already posted on the list (ATM for x86 only) but the
PCI specification (and therefore the config space emulation) is not
tied to any arch:

https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg03698.html

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Notes from PCI Passthrough design discussion at Xen Summit
  2017-07-20  8:24     ` Roger Pau Monné
@ 2017-07-20  9:32       ` Manish Jaggi
  2017-07-20 10:29         ` Roger Pau Monné
  2017-07-20 10:41         ` Julien Grall
  0 siblings, 2 replies; 35+ messages in thread
From: Manish Jaggi @ 2017-07-20  9:32 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: edgar.iglesias, Stefano Stabellini, Jan Beulich, Wei Chen,
	Steve Capper, Andre Przywara, manish.jaggi, Punit Agrawal,
	Julien Grall, vikrams, okaya, Goel, Sameer, xen-devel,
	Dave P Martin, Vijaya Kumar K

Hi Roger,

On 7/20/2017 1:54 PM, Roger Pau Monné wrote:
> On Thu, Jul 20, 2017 at 09:24:36AM +0530, Manish Jaggi wrote:
>> Hi Punit,
>>
>> On 7/19/2017 8:11 PM, Punit Agrawal wrote:
>>> I took some notes for the PCI Passthrough design discussion at Xen
>>> Summit. Due to the wide range of topics covered, the notes got sparser
>>> towards the end of the session. I've tried to attribute names against
>>> comments but have very likely got things mixed up. Apologies in advance.
>> Was curious if any discussions happened on the RC Emu (config space
>> emulation) as per slide 18
>> https://schd.ws/hosted_files/xendeveloperanddesignsummit2017/76/slides.pdf
> Part of this is already posted on the list (ATM for x86 only) but the
> PCI specification (and therefore the config space emulation) is not
> tied to any arch:
>
> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg03698.html
From the summary, I have a question on:
"
  - Roger: Registering config space with Xen before device discovery
   will allow the hypervisor to set access traps for certain
  functionality as appropriate"

Will the traps do emulation or something else?
Is the config space emulation only for DomU, or is it for Dom0 as well?
Slide 18 shows it only for DomU?

-manish

> Roger.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Notes from PCI Passthrough design discussion at Xen Summit
  2017-07-20  9:32       ` Manish Jaggi
@ 2017-07-20 10:29         ` Roger Pau Monné
  2017-07-20 10:47           ` Julien Grall
  2017-07-20 11:02           ` Manish Jaggi
  2017-07-20 10:41         ` Julien Grall
  1 sibling, 2 replies; 35+ messages in thread
From: Roger Pau Monné @ 2017-07-20 10:29 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: edgar.iglesias, Stefano Stabellini, Jan Beulich, Wei Chen,
	Steve Capper, Andre Przywara, manish.jaggi, Punit Agrawal,
	Julien Grall, vikrams, okaya, Goel, Sameer, xen-devel,
	Dave P Martin, Vijaya Kumar K

On Thu, Jul 20, 2017 at 03:02:19PM +0530, Manish Jaggi wrote:
> Hi Roger,
> 
> On 7/20/2017 1:54 PM, Roger Pau Monné wrote:
> > On Thu, Jul 20, 2017 at 09:24:36AM +0530, Manish Jaggi wrote:
> > > Hi Punit,
> > > 
> > > On 7/19/2017 8:11 PM, Punit Agrawal wrote:
> > > > I took some notes for the PCI Passthrough design discussion at Xen
> > > > Summit. Due to the wide range of topics covered, the notes got sparser
> > > > towards the end of the session. I've tried to attribute names against
> > > > comments but have very likely got things mixed up. Apologies in advance.
> > > Was curious if any discussions happened on the RC Emu (config space
> > > emulation) as per slide 18
> > > https://schd.ws/hosted_files/xendeveloperanddesignsummit2017/76/slides.pdf
> > Part of this is already posted on the list (ATM for x86 only) but the
> > PCI specification (and therefore the config space emulation) is not
> > tied to any arch:
> > 
> > https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg03698.html
> From the summary, I have a  questions on
> "
>  - Roger: Registering config space with Xen before device discovery
>   will allow the hypervisor to set access traps for certain
>  functionality as appropriate"
> 
> Traps will do emulation or something else ?

Have you read the series?

What else could the traps do? I'm not sure I understand the question.

>  Is the config space emulation only for DomU or it for Dom0 as well ?

Again, have you read the series? This is explained in the cover letter
(0/9).

On x86 this is initially for Dom0 only, DomU will continue to use QEMU
until the implementation inside the hypervisor (vPCI) is complete
enough to handle DomU securely.

> Slide 18 shows only for DomU ?

ARM folks believe this is not needed for Dom0 in the ARM case. I don't
have an opinion; I know it's certainly mandatory for x86 PVH Dom0.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Notes from PCI Passthrough design discussion at Xen Summit
  2017-07-20  9:32       ` Manish Jaggi
  2017-07-20 10:29         ` Roger Pau Monné
@ 2017-07-20 10:41         ` Julien Grall
  2017-07-20 11:00           ` Manish Jaggi
  1 sibling, 1 reply; 35+ messages in thread
From: Julien Grall @ 2017-07-20 10:41 UTC (permalink / raw)
  To: Manish Jaggi, Roger Pau Monné
  Cc: edgar.iglesias, Stefano Stabellini, Jan Beulich, Wei Chen,
	Steve Capper, Andre Przywara, manish.jaggi, Punit Agrawal,
	vikrams, okaya, Goel, Sameer, xen-devel, Dave P Martin,
	Vijaya Kumar K



On 20/07/17 10:32, Manish Jaggi wrote:
> Hi Roger,
>
> On 7/20/2017 1:54 PM, Roger Pau Monné wrote:
>> On Thu, Jul 20, 2017 at 09:24:36AM +0530, Manish Jaggi wrote:
>>> Hi Punit,
>>>
>>> On 7/19/2017 8:11 PM, Punit Agrawal wrote:
>>>> I took some notes for the PCI Passthrough design discussion at Xen
>>>> Summit. Due to the wide range of topics covered, the notes got sparser
>>>> towards the end of the session. I've tried to attribute names against
>>>> comments but have very likely got things mixed up. Apologies in
>>>> advance.
>>> Was curious if any discussions happened on the RC Emu (config space
>>> emulation) as per slide 18
>>> https://schd.ws/hosted_files/xendeveloperanddesignsummit2017/76/slides.pdf
>>>
>> Part of this is already posted on the list (ATM for x86 only) but the
>> PCI specification (and therefore the config space emulation) is not
>> tied to any arch:
>>
>> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg03698.html
>>
> From the summary, I have a  questions on
> "
>  - Roger: Registering config space with Xen before device discovery
>   will allow the hypervisor to set access traps for certain
>  functionality as appropriate"
>
> Traps will do emulation or something else ?
>  Is the config space emulation only for DomU or it for Dom0 as well ?
> Slide 18 shows only for DomU ?

My slides are not meant to be read without the talk. In this particular 
case, this is only explaining how passthrough will work for DomU.

Roger's series is at the moment focusing on emulating a fully
ECAM-compliant hostbridge for the hardware domain. This is because Xen
and the hardware domain should not access the configuration space at the
same time. We may also perform some tasks (e.g. MSI mapping, memory
mapping) or sanitizing when the configuration space is updated by the
hardware domain.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Notes from PCI Passthrough design discussion at Xen Summit
  2017-07-20 10:29         ` Roger Pau Monné
@ 2017-07-20 10:47           ` Julien Grall
  2017-07-20 11:06             ` Roger Pau Monné
  2017-07-20 11:02           ` Manish Jaggi
  1 sibling, 1 reply; 35+ messages in thread
From: Julien Grall @ 2017-07-20 10:47 UTC (permalink / raw)
  To: Roger Pau Monné, Manish Jaggi
  Cc: edgar.iglesias, Stefano Stabellini, Jan Beulich, Wei Chen,
	Steve Capper, Andre Przywara, manish.jaggi, Punit Agrawal,
	vikrams, okaya, Goel, Sameer, xen-devel, Dave P Martin,
	Vijaya Kumar K



On 20/07/17 11:29, Roger Pau Monné wrote:
> On Thu, Jul 20, 2017 at 03:02:19PM +0530, Manish Jaggi wrote:
>> Hi Roger,
>>
>> On 7/20/2017 1:54 PM, Roger Pau Monné wrote:
>>> On Thu, Jul 20, 2017 at 09:24:36AM +0530, Manish Jaggi wrote:
>>>> Hi Punit,
>>>>
>>>> On 7/19/2017 8:11 PM, Punit Agrawal wrote:
>>>>> I took some notes for the PCI Passthrough design discussion at Xen
>>>>> Summit. Due to the wide range of topics covered, the notes got sparser
>>>>> towards the end of the session. I've tried to attribute names against
>>>>> comments but have very likely got things mixed up. Apologies in advance.
>>>> Was curious if any discussions happened on the RC Emu (config space
>>>> emulation) as per slide 18
>>>> https://schd.ws/hosted_files/xendeveloperanddesignsummit2017/76/slides.pdf
>>> Part of this is already posted on the list (ATM for x86 only) but the
>>> PCI specification (and therefore the config space emulation) is not
>>> tied to any arch:
>>>
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg03698.html
>> From the summary, I have a  questions on
>> "
>>  - Roger: Registering config space with Xen before device discovery
>>   will allow the hypervisor to set access traps for certain
>>  functionality as appropriate"
>>
>> Traps will do emulation or something else ?
>
> Have you read the series?
>
> What else could the traps do? I'm not sure I understand the question.
>
>>  Is the config space emulation only for DomU or it for Dom0 as well ?
>
> Again, have you read the series? This is explained in the cover letter
> (0/9).
>
> On x86 this is initially for Dom0 only, DomU will continue to use QEMU
> until the implementation inside the hypervisor (vPCI) is complete
> enough to handle DomU securely.
>
>> Slide 18 shows only for DomU ?
>
> ARM folks believe this is not needed for Dom0 in the ARM case, I don't
> have an opinion, I know it's certainly mandatory for x86 PVH Dom0.

That was 8 months ago; you managed to convince me we should also trap
for DOM0 the last time we met at the Haymakers :).

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Notes from PCI Passthrough design discussion at Xen Summit
  2017-07-20 10:41         ` Julien Grall
@ 2017-07-20 11:00           ` Manish Jaggi
  2017-07-20 12:24             ` Julien Grall
  0 siblings, 1 reply; 35+ messages in thread
From: Manish Jaggi @ 2017-07-20 11:00 UTC (permalink / raw)
  To: Julien Grall, Roger Pau Monné
  Cc: edgar.iglesias, Stefano Stabellini, Jan Beulich, Wei Chen,
	Steve Capper, Andre Przywara, manish.jaggi, Punit Agrawal,
	vikrams, okaya, Goel, Sameer, xen-devel, Dave P Martin,
	Vijaya Kumar K

HI Julien,

On 7/20/2017 4:11 PM, Julien Grall wrote:
>
>
> On 20/07/17 10:32, Manish Jaggi wrote:
>> Hi Roger,
>>
>> On 7/20/2017 1:54 PM, Roger Pau Monné wrote:
>>> On Thu, Jul 20, 2017 at 09:24:36AM +0530, Manish Jaggi wrote:
>>>> Hi Punit,
>>>>
>>>> On 7/19/2017 8:11 PM, Punit Agrawal wrote:
>>>>> I took some notes for the PCI Passthrough design discussion at Xen
>>>>> Summit. Due to the wide range of topics covered, the notes got 
>>>>> sparser
>>>>> towards the end of the session. I've tried to attribute names against
>>>>> comments but have very likely got things mixed up. Apologies in
>>>>> advance.
>>>> Was curious if any discussions happened on the RC Emu (config space
>>>> emulation) as per slide 18
>>>> https://schd.ws/hosted_files/xendeveloperanddesignsummit2017/76/slides.pdf 
>>>>
>>>>
>>> Part of this is already posted on the list (ATM for x86 only) but the
>>> PCI specification (and therefore the config space emulation) is not
>>> tied to any arch:
>>>
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg03698.html 
>>>
>>>
>> From the summary, I have a  questions on
>> "
>>  - Roger: Registering config space with Xen before device discovery
>>   will allow the hypervisor to set access traps for certain
>>  functionality as appropriate"
>>
>> Traps will do emulation or something else ?
>>  Is the config space emulation only for DomU or it for Dom0 as well ?
>> Slide 18 shows only for DomU ?
>
> My slides are not meant to be read without the talk. In this 
> particular case, this is only explaining how passthrough will work for 
> DomU.
>
Thanks for the clarification.
Ah ok, the single slide created confusion; it would have been nice to add
one more describing Dom0 config access. I will wait for the video to be
posted.
> Roger series is at the moment focusing on emulating a fully ECAM 
> compliant hostbridge for the hardware domain. This is because Xen and 
> the hardware domain should not access the configuration space at the 
> same time. 
Yes, as discussed for this topic on the list a few weeks back.
> We may also perform some tasks (i.e MSI mapping, memory mapping) or 
> sanitizing when the configuration space is updated by the hardware 
> domain.
>
> Cheers,
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Notes from PCI Passthrough design discussion at Xen Summit
  2017-07-20 10:29         ` Roger Pau Monné
  2017-07-20 10:47           ` Julien Grall
@ 2017-07-20 11:02           ` Manish Jaggi
  1 sibling, 0 replies; 35+ messages in thread
From: Manish Jaggi @ 2017-07-20 11:02 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: edgar.iglesias, Stefano Stabellini, Jan Beulich, Wei Chen,
	Steve Capper, Andre Przywara, manish.jaggi, Punit Agrawal,
	Julien Grall, vikrams, okaya, Goel, Sameer, xen-devel,
	Dave P Martin, Vijaya Kumar K

Hi Roger,

On 7/20/2017 3:59 PM, Roger Pau Monné wrote:
> On Thu, Jul 20, 2017 at 03:02:19PM +0530, Manish Jaggi wrote:
>> Hi Roger,
>>
>> On 7/20/2017 1:54 PM, Roger Pau Monné wrote:
>>> On Thu, Jul 20, 2017 at 09:24:36AM +0530, Manish Jaggi wrote:
>>>> Hi Punit,
>>>>
>>>> On 7/19/2017 8:11 PM, Punit Agrawal wrote:
>>>>> I took some notes for the PCI Passthrough design discussion at Xen
>>>>> Summit. Due to the wide range of topics covered, the notes got sparser
>>>>> towards the end of the session. I've tried to attribute names against
>>>>> comments but have very likely got things mixed up. Apologies in advance.
>>>> Was curious if any discussions happened on the RC Emu (config space
>>>> emulation) as per slide 18
>>>> https://schd.ws/hosted_files/xendeveloperanddesignsummit2017/76/slides.pdf
>>> Part of this is already posted on the list (ATM for x86 only) but the
>>> PCI specification (and therefore the config space emulation) is not
>>> tied to any arch:
>>>
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg03698.html
>>  From the summary, I have a  questions on
>> "
>>   - Roger: Registering config space with Xen before device discovery
>>    will allow the hypervisor to set access traps for certain
>>   functionality as appropriate"
>>
>> Traps will do emulation or something else ?
> Have you read the series?
>
> What else could the traps do? I'm not sure I understand the question.
>
>>   Is the config space emulation only for DomU or it for Dom0 as well ?
> Again, have you read the series? This is explained in the cover letter
> (0/9).
>
> On x86 this is initially for Dom0 only, DomU will continue to use QEMU
> until the implementation inside the hypervisor (vPCI) is complete
> enough to handle DomU securely.
>
>> Slide 18 shows only for DomU ?
> ARM folks believe this is not needed for Dom0 in the ARM case, I don't
> have an opinion, I know it's certainly mandatory for x86 PVH Dom0.
Julien clarified about Slide18.
> Roger.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Notes from PCI Passthrough design discussion at Xen Summit
  2017-07-20 10:47           ` Julien Grall
@ 2017-07-20 11:06             ` Roger Pau Monné
  2017-07-20 11:52               ` Julien Grall
  0 siblings, 1 reply; 35+ messages in thread
From: Roger Pau Monné @ 2017-07-20 11:06 UTC (permalink / raw)
  To: Julien Grall
  Cc: edgar.iglesias, Stefano Stabellini, Jan Beulich, Wei Chen,
	Steve Capper, Manish Jaggi, manish.jaggi, Punit Agrawal, vikrams,
	okaya, Goel, Sameer, Andre Przywara, xen-devel, Dave P Martin,
	Vijaya Kumar K

On Thu, Jul 20, 2017 at 11:47:04AM +0100, Julien Grall wrote:
> > > Slide 18 shows only for DomU ?
> > 
> > ARM folks believe this is not needed for Dom0 in the ARM case, I don't
> > have an opinion, I know it's certainly mandatory for x86 PVH Dom0.
> 
> That was 8 months ago, you managed to convince me we should also trap for
> DOM0 last time we met at the Haymakers :).

Right, my bad. I was indeed confused. We spoke during the design
session about ARM probably not needing to trap MSI/MSI-X accesses (which
x86 must do).

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Notes from PCI Passthrough design discussion at Xen Summit
  2017-07-20 11:06             ` Roger Pau Monné
@ 2017-07-20 11:52               ` Julien Grall
  0 siblings, 0 replies; 35+ messages in thread
From: Julien Grall @ 2017-07-20 11:52 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: edgar.iglesias, Stefano Stabellini, Jan Beulich, Wei Chen,
	Steve Capper, Manish Jaggi, manish.jaggi, Punit Agrawal, vikrams,
	okaya, Goel, Sameer, Andre Przywara, xen-devel, Dave P Martin,
	Vijaya Kumar K



On 20/07/17 12:06, Roger Pau Monné wrote:
> On Thu, Jul 20, 2017 at 11:47:04AM +0100, Julien Grall wrote:
>>>> Slide 18 shows only for DomU ?
>>>
>>> ARM folks believe this is not needed for Dom0 in the ARM case, I don't
>>> have an opinion, I know it's certainly mandatory for x86 PVH Dom0.
>>
>> That was 8 months ago, you managed to convince me we should also trap for
>> DOM0 last time we met at the Haymakers :).
>
> Right, my bad. I was indeed confused. We spoke during the design
> session about ARM not needing to trap MSI/MSI-X probably (which x86
> must do).

It will depend on the MSI controller. For the GICv3 ITS, Xen will not
need to trap them for Dom0 because we expose the same number of
controllers as the host and the MSIs will be configured directly via the
virtual interrupt controller.

This might be different for other controllers, but I haven't fully
looked at them yet.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Notes from PCI Passthrough design discussion at Xen Summit
  2017-07-20 11:00           ` Manish Jaggi
@ 2017-07-20 12:24             ` Julien Grall
  0 siblings, 0 replies; 35+ messages in thread
From: Julien Grall @ 2017-07-20 12:24 UTC (permalink / raw)
  To: Manish Jaggi, Roger Pau Monné
  Cc: edgar.iglesias, Stefano Stabellini, Jan Beulich, Wei Chen,
	Steve Capper, Andre Przywara, manish.jaggi, Punit Agrawal,
	vikrams, okaya, Goel, Sameer, xen-devel, Dave P Martin,
	Vijaya Kumar K



On 20/07/17 12:00, Manish Jaggi wrote:
> On 7/20/2017 4:11 PM, Julien Grall wrote:
>>
>>
>> On 20/07/17 10:32, Manish Jaggi wrote:
>>> Hi Roger,
>>>
>>> On 7/20/2017 1:54 PM, Roger Pau Monné wrote:
>>>> On Thu, Jul 20, 2017 at 09:24:36AM +0530, Manish Jaggi wrote:
>>>>> Hi Punit,
>>>>>
>>>>> On 7/19/2017 8:11 PM, Punit Agrawal wrote:
>>>>>> I took some notes for the PCI Passthrough design discussion at Xen
>>>>>> Summit. Due to the wide range of topics covered, the notes got
>>>>>> sparser
>>>>>> towards the end of the session. I've tried to attribute names against
>>>>>> comments but have very likely got things mixed up. Apologies in
>>>>>> advance.
>>>>> Was curious if any discussions happened on the RC Emu (config space
>>>>> emulation) as per slide 18
>>>>> https://schd.ws/hosted_files/xendeveloperanddesignsummit2017/76/slides.pdf
>>>>>
>>>>>
>>>> Part of this is already posted on the list (ATM for x86 only) but the
>>>> PCI specification (and therefore the config space emulation) is not
>>>> tied to any arch:
>>>>
>>>> https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg03698.html
>>>>
>>>>
>>> From the summary, I have a  questions on
>>> "
>>>  - Roger: Registering config space with Xen before device discovery
>>>   will allow the hypervisor to set access traps for certain
>>>  functionality as appropriate"
>>>
>>> Traps will do emulation or something else ?
>>>  Is the config space emulation only for DomU or it for Dom0 as well ?
>>> Slide 18 shows only for DomU ?
>>
>> My slides are not meant to be read without the talk. In this
>> particular case, this is only explaining how passthrough will work for
>> DomU.
>>
> Thanks for clarification.
> Ah ok, The single slide created confusion, It would be nice if you have
> added one more describing dom0 config access. I will wait for the video
> to get posted.

Well as I said my slides are not meant to be used without the talk.

Now, if you want the longer story: the decision for Dom0 is more blurred.
As written in the design document and also reported in the notes from
Punit, supporting all the hostbridges in Xen may not be possible.

At the moment, we are thinking of only supporting fully ECAM-compliant
hostbridges in Xen (i.e. the ones not requiring a specific PCI hostbridge
driver). We might bend the rule on a case-by-case basis in the future.

The hostbridges not supported in Xen will be driven by the hardware
domain, so all configuration accesses will be forwarded to the hardware
domain. The way to communicate between Xen and the hardware domain is
still undecided and out of scope of this design document.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] ARM PCI Passthrough design document
  2017-07-08  7:34             ` Roger Pau Monné
@ 2018-01-19 10:34               ` Manish Jaggi
  0 siblings, 0 replies; 35+ messages in thread
From: Manish Jaggi @ 2018-01-19 10:34 UTC (permalink / raw)
  To: Roger Pau Monné, Stefano Stabellini
  Cc: edgar.iglesias, punit.agrawal, Vikram Sethi, 'Wei Chen',
	'Steve Capper', 'Andre Przywara',
	manish.jaggi, 'Julien Grall', 'Vikram Sethi',
	'Sinan Kaya', 'Sameer Goel', 'xen-devel',
	'Dave P Martin', 'Vijaya Kumar K'

Hi Roger/Vikram/Stefano,


On 07/08/2017 01:04 PM, Roger Pau Monné wrote:
> On Fri, Jul 07, 2017 at 02:50:01PM -0700, Stefano Stabellini wrote:
>> On Fri, 7 Jul 2017, Roger Pau Monné wrote:
>>> On Thu, Jul 06, 2017 at 03:55:28PM -0500, Vikram Sethi wrote:
>>>>>>> AER: Will PCIe non-fatal and fatal errors (secondary bus reset for fatal)
>>>>>>> be
>>>>> recoverable in Xen?
>>>>>>> Will drivers in doms be notified about fatal errors so they can be
>>>>>>> quiesced
>>>>> before doing secondary bus reset in Xen?
>>>>>>> Will Xen support Firmware First Error handling for AER? i.e When
>>>>>>> platform does Firmware first error handling for AER and/or filtering of
>>>>>>> AER,
>>>>> sends associated ACPI HEST logs to Xen How will AER notification and logs be
>>>>> propagated to the doms: injected ACPI HEST?
>>>>>
>>>>> Hm, I'm not sure I follow here, I don't see AER tied to ACPI. AER is a PCIe
>>>>> capability, and according to the spec can be setup completely independent to
>>>>> ACPI.
>>>>>
>>>> True, it can be independent if not using firmware first AER handling (FFH). But
>>>> Firmware tells the OS whether firmware first is in use.
>>>> If FFH is in use, the AER interrupt goes to firmware and then firmware processes
>>> I'm sorry, but how is the firmware supposed to know which interrupt is
>>> AER using? That's AFAIK setup in the PCI AER capabilities, and
>>> depends on whether the OS configures the device to use MSI or MSI-X.
>>>
>>> Is there some kind of side-band mechanism that delivers the AER
>>> interrupt using a different method?
>>>
>>>> the AER logs, filters errors, and sends a ACPI HEST log with the filtered AER
>>>> regs to OS along with an ACPI event/interrupt. Kernel is not supposed to touch
>>>> the AER registers directly in this case, but act on the register values in the
>>>> HEST log.
>>>> http://elixir.free-electrons.com/linux/latest/source/drivers/pci/pcie/aer/aerdrv_acpi.c#L94
>>> That's not a problem IMHO, Xen could even mask the AER capability from
>>> the Dom0/guest completely if needed.
>>>
>>>> If Firmware is using FFH, Xen will get a HEST log with AER registers, and must
>>>> parse those registers instead of reading AER config space.
>>> Xen will not get an event, it's going to be delivered to Dom0 because
>>> when using ACPI Dom0 is the OSPM (not Xen). I assume this event is
>>> going to be notified by triggering an interrupt from the ACPI SCI?
>> It is still possible to get the event in Xen, either by having Dom0 tell
>> Xen about it, or my moving ACPI SCI handling in Xen. If we move ACPI SCI
>> handling in Xen, we could still forward a virtual SCI interrupt to Dom0
>> in cases where Xen decides that Dom0 should be the one handling the
>> event. In other cases, where Xen knows how to handle the event, then
>> nothing would be sent to Dom0. Would that work?
> Maybe that's different on ARM vs x86, but when receiving the SCI
> interrupt the OSPM has to execute some AML in order to figure out
> which event has triggered. Even if Xen can trap the SCI, it has no way
> to execute AML, and that in any case can only be done by one entity,
> the OSPM.
>
> IMHO, for this to be viable Dom0 should notify the event to Xen.
Any further update on this discussion ?
>
> Roger.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] ARM PCI Passthrough design document
  2017-05-26 17:14 [RFC] ARM PCI Passthrough design document Julien Grall
                   ` (4 preceding siblings ...)
  2017-07-19 14:41 ` Notes from PCI Passthrough design discussion at Xen Summit Punit Agrawal
@ 2018-01-22 11:10 ` Manish Jaggi
  5 siblings, 0 replies; 35+ messages in thread
From: Manish Jaggi @ 2018-01-22 11:10 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: edgar.iglesias, okaya, Wei Chen, Steve Capper, Andre Przywara,
	manish.jaggi, punit.agrawal, vikrams, Goel, Sameer, xen-devel,
	Dave P Martin, Vijaya Kumar K, roger.pau



On 05/26/2017 10:44 PM, Julien Grall wrote:
> Hi all,
Hi Julien,

General consolidated comments first:

Review Comments:

a. The document talks about the high-level design and does not go into
implementation details and detailed code flows, so these are missing if
adding such detail is intended.

b. The document only covers PCI device assignment from the POV of the
hardware domain, but it does not talk about the high-level flow of
PHYSDEVOP_pci_device_add (a sketch of that hypercall is given after this
list of comments).

c. In the mail chain there was a discussion about Xen only touching the
config space. Can you add that discussion, and the one on config space
emulation, here?

d. Please resolve the sections marked as XXX in the document. We can
revisit this review after that.

e. Please provide separate flow descriptions for DT and ACPI; it will
help in understanding.

f. A general picture of how guest domain device assignment would work at
a high level. As you are covering it in phase 2, you can add more detail
later. This would really help complete the understanding of the design.
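For reference on comment (b), this is roughly how a Linux hardware domain
registers a discovered PCI device with Xen today via PHYSDEVOP_pci_device_add;
the ARM design would use the same or a similar hypercall. The SBDF values below
are hypothetical and only illustrate the information passed.

    #include <xen/interface/physdev.h>   /* struct physdev_pci_device_add */
    #include <asm/xen/hypercall.h>

    /* Example: register device 0000:01:00.0 (segment 0, bus 1, devfn 0). */
    static int register_device_with_xen(void)
    {
        struct physdev_pci_device_add add = {
            .seg   = 0,
            .bus   = 1,
            .devfn = 0,     /* device 0, function 0 */
            .flags = 0,     /* e.g. XEN_PCI_DEV_VIRTFN when registering a VF */
        };

        return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
    }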

Apart from that the document looks ok.

WBR
-Manish
>
> The document below is an RFC version of a design proposal for PCI
> Passthrough in Xen on ARM. It aims to describe from an high level perspective
> the interaction with the different subsystems and how guest will be able
> to discover and access PCI.
>
> Currently on ARM, Xen does not have any knowledge about PCI devices. This
> means that IOMMU and interrupt controller (such as ITS) requiring specific
> configuration will not work with PCI even with DOM0.
>
> The PCI Passthrough work could be divided in 2 phases:
>          * Phase 1: Register all PCI devices in Xen => will allow
>                     to use ITS and SMMU with PCI in Xen
>          * Phase 2: Assign devices to guests
>
> This document aims to describe the 2 phases, but for now only phase
> 1 is fully described.
>
>
> I think I was able to gather all of the feedbacks and come up with a solution
> that will satisfy all the parties. The design document has changed quite a lot
> compare to the early draft sent few months ago. The major changes are:
> 	* Provide more details how PCI works on ARM and the interactions with
> 	MSI controller and IOMMU
> 	* Provide details on the existing host bridge implementations
> 	* Give more explanation and justifications on the approach chosen
> 	* Describing the hypercalls used and how they should be called
>
> Feedbacks are welcomed.
>
> Cheers,
>
> --------------------------------------------------------------------------------
>
> % PCI pass-through support on ARM
> % Julien Grall <julien.grall@linaro.org>
> % Draft B
>
> # Preface
>
> This document aims to describe the components required to enable the PCI
> pass-through on ARM.
>
> This is an early draft and some questions are still unanswered. When this is
> the case, the text will contain XXX.
>
> # Introduction
>
> PCI pass-through allows the guest to receive full control of physical PCI
> devices. This means the guest will have full and direct access to the PCI
> device.
>
> ARM is supporting a kind of guest that exploits as much as possible
> virtualization support in hardware. The guest will rely on PV driver only
> for IO (e.g block, network) and interrupts will come through the virtualized
> interrupt controller, therefore there are no big changes required within the
> kernel.
>
> As a consequence, it would be possible to replace PV drivers by assigning real
> devices to the guest for I/O access. Xen on ARM would therefore be able to
> run unmodified operating system.
>
> To achieve this goal, it looks more sensible to go towards emulating the
> host bridge (there will be more details later). A guest would be able to take
> advantage of the firmware tables, obviating the need for a specific driver
> for Xen.
>
> Thus, in this document we follow the emulated host bridge approach.
>
> # PCI terminologies
>
> Each PCI device under a host bridge is uniquely identified by its Requester ID
> (AKA RID). A Requester ID is a triplet of Bus number, Device number, and
> Function.
>
> When the platform has multiple host bridges, the software can add a fourth
> number called Segment (sometimes called Domain) to differentiate host bridges.
> A PCI device will then uniquely by segment:bus:device:function (AKA SBDF).
>
> So given a specific SBDF, it would be possible to find the host bridge and the
> RID associated to a PCI device. The pair (host bridge, RID) will often be used
> to find the relevant information for configuring the different subsystems (e.g
> IOMMU, MSI controller). For convenience, the rest of the document will use
> SBDF to refer to the pair (host bridge, RID).
>
> # PCI host bridge
>
> A PCI host bridge enables data transfer between a host processor and PCI-bus
> based devices. The bridge is used to access the configuration space of each
> PCI device and, on some platforms, may also act as an MSI controller.
>
> ## Initialization of the PCI host bridge
>
> Whilst it would be expected that the bootloader takes care of initializing
> the PCI host bridge, on some platforms it is done in the Operating System.
>
> This may include enabling/configuring the clocks that could be shared among
> multiple devices.
>
> ## Accessing PCI configuration space
>
> Accessing the PCI configuration space can be divided into 2 categories:
>      * Indirect access, where the configuration spaces are multiplexed. An
>      example would be legacy method on x86 (e.g 0xcf8 and 0xcfc). On ARM a
>      similar method is used by PCIe RCar root complex (see [12]).
>      * ECAM access, each configuration space will have its own address space.
>
> Whilst ECAM is a standard, some PCI host bridges will require specific fiddling
> when accessing the registers (see thunder-ecam [13]).
>
> In most of the cases, accessing all the PCI configuration spaces under a
> given PCI host will be done the same way (i.e either indirect access or ECAM
> access). However, there are a few cases, dependent on the PCI devices accessed,
> which will use different methods (see thunder-pem [14]).
>
> ## Generic host bridge
>
> For the purpose of this document, the term "generic host bridge" will be used
> to describe any ECAM-compliant host bridge whose initialization, if required,
> has already been done by the firmware/bootloader.
>
> # Interaction of the PCI subsystem with other subsystems
>
> In order to have a PCI device fully working, Xen will need to configure
> other subsystems such as the IOMMU and the Interrupt Controller.
>
> The interaction expected between the PCI subsystem and the other subsystems is:
>      * Add a device
>      * Remove a device
>      * Assign a device to a guest
>      * Deassign a device from a guest
>
> XXX: Detail the interaction when assigning/deassigning device
>
> In the following subsections, the interactions will be briefly described from a
> higher level perspective. However, implementation details such as callback,
> structure, etc... are beyond the scope of this document.
>
> ## IOMMU
>
> The IOMMU will be used to isolate the PCI device when accessing the memory (e.g
> DMA and MSI Doorbells). Often the IOMMU will be configured using a MasterID
> (aka StreamID for ARM SMMU)  that can be deduced from the SBDF with the help
> of the firmware tables (see below).
>
> Whilst in theory, all the memory transactions issued by a PCI device should
> go through the IOMMU, on certain platforms some of the memory transaction may
> not reach the IOMMU because they are interpreted by the host bridge. For
> instance, this could happen if the MSI doorbell is built into the PCI host
> bridge or for P2P traffic. See [6] for more details.
>
> XXX: I think this could be solved by using direct mapping (e.g GFN == MFN),
> this would mean the guest memory layout would be similar to the host one when
> PCI devices will be pass-throughed => Detail it.
>
> ## Interrupt controller
>
> PCI supports three kinds of interrupts: legacy interrupt, MSI and MSI-X. On ARM,
> legacy interrupts will be mapped to SPIs. MSI and MSI-X will write their
> payload in a doorbell belonging to a MSI controller.
>
> ### Existing MSI controllers
>
> In this section some of the existing controllers and their interaction with
> the devices will be briefly described. More details can be found in the
> respective specifications of each MSI controller.
>
> MSIs can be distinguished by some combination of
>      * the Doorbell
>          It is the MMIO address written to. Devices may be configured by
>          software to write to arbitrary doorbells which they can address.
>          An MSI controller may feature a number of doorbells.
>      * the Payload
>          Devices may be configured to write an arbitrary payload chosen by
>          software. MSI controllers may have restrictions on permitted payload.
>          Xen will have to sanitize the payload unless it is known to be always
>          safe.
>      * Sideband information accompanying the write
>          Typically this is neither configurable nor probeable, and depends on
>          the path taken through the memory system (i.e it is a property of the
>          combination of MSI controller and device rather than a property of
>          either in isolation).
>
> ### GICv3/GICv4 ITS
>
> The Interrupt Translation Service (ITS) is a MSI controller designed by ARM
> and integrated in the GICv3/GICv4 interrupt controller. For the specification
> see [GICV3]. Each MSI/MSI-X will be mapped to a new type of interrupt called
> LPI. This interrupt will be configured by the software using a pair (DeviceID,
> EventID).
>
> A platform may have multiple ITS blocks (e.g. one per NUMA node), each of them
> belonging to an ITS group.
>
> The DeviceID is a unique identifier within an ITS group for each MSI-capable
> device and can be deduced from the RID with the help of the firmware tables
> (see below).
>
> The EventID is a unique identifier to distinguish the different events sent
> by a device.
>
> The MSI payload will only contain the EventID as the DeviceID will be added
> afterwards by the hardware in a way that will prevent any tampering.
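>
> For illustration, the translation performed by the ITS can be summarised as
> follows (conceptual sketch only; this is neither the hardware table layout
> nor existing Xen code):
>
> /*
>  * The DeviceID selects a per-device Interrupt Translation Table (ITT),
>  * the EventID indexes into it, and the entry gives the LPI to raise.
>  */
> struct itt_entry {
>     uint32_t lpi;          /* physical LPI configured by software */
>     uint32_t collection;   /* target redistributor (collection) */
> };
>
> static uint32_t its_translate(struct itt_entry **device_table,
>                               uint32_t device_id, uint32_t event_id)
> {
>     return device_table[device_id][event_id].lpi;
> }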
>
> Appendix I of the [SBSA] describes the set of rules for the integration of
> the ITS that any compliant platform should follow. Some of the rules explain
> the security implications of misbehaving devices. They ensure that a guest
> will never be able to trigger an MSI on behalf of another guest.
>
> XXX: The security implications are described in the [SBSA] but I haven't
> found any similar wording in the GICv3 specification. It is unclear to me
> whether non-SBSA-compliant platforms (e.g. embedded) will follow those rules.
>
> ### GICv2m
>
> The GICv2m is an extension of the GICv2 to convert MSI/MSI-X writes to unique
> interrupts. The specification can be found in the [SBSA] appendix E.
>
> Depending on the platform, the GICv2m will provide one or multiple instances
> of register frames. Each frame is composed of a doorbell and associated with
> a set of SPIs that can be discovered by reading the register MSI_TYPER.
>
> On an MSI write, the payload will contain the SPI ID to generate. Note that
> on some platforms the MSI payload may contain an offset from the base SPI
> rather than the SPI itself.
>
> The frame will only generate an SPI if the written value corresponds to an
> SPI allocated to the frame. Each VM should have exclusive access to a frame
> to ensure isolation and prevent a guest OS from triggering an MSI on behalf
> of another guest OS.
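>
> As an illustration of the sanitizing Xen may have to do when trapping the
> MSI configuration of a device, a minimal sketch follows (assuming the
> MSI_TYPER layout from [SBSA] appendix E, i.e. base SPI in bits [25:16] and
> number of SPIs in bits [9:0]; the helper name is hypothetical):
>
> #define MSI_TYPER_BASE_SPI(t)  (((t) >> 16) & 0x3ff)
> #define MSI_TYPER_NR_SPI(t)    ((t) & 0x3ff)
>
> /* Check that a payload only targets SPIs owned by the GICv2m frame. */
> static bool gicv2m_payload_is_valid(uint32_t typer, uint32_t payload)
> {
>     uint32_t base = MSI_TYPER_BASE_SPI(typer);
>     uint32_t nr   = MSI_TYPER_NR_SPI(typer);
>
>     return payload >= base && payload < base + nr;
> }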
>
> XXX: Linux seems to consider GICv2m as unsafe by default. From my understanding,
> it is still unclear how we should proceed on Xen, as GICv2m should be safe
> as long as the frame is only accessed by one guest.
>
> ### Other MSI controllers
>
> Servers compliant with SBSA level 1 and higher will have to use either the
> ITS or the GICv2m. However, these are by no means the only MSI controllers
> available. A hardware vendor may decide to use a custom MSI controller,
> which can be integrated into the PCI host bridge.
>
> Whether it will be possible to generate an MSI securely will depend on the
> MSI controller implementation.
>
> XXX: I am happy to give a brief explanation of more MSI controllers (such
> as Xilinx and Renesas) if people think it is necessary.
>
> This design document does not pertain to a specific MSI controller and will
> try to be as agnostic as possible. Where possible, it will give insight into
> how to integrate an MSI controller.
>
> # Information available in the firmware tables
>
> ## ACPI
>
> ### Host bridges
>
> The static table MCFG (see 4.2 in [1]) will describe the host bridges that
> are available at boot and support ECAM. Unfortunately, there are platforms
> out there (see [2]) that re-use the MCFG to describe host bridges that are
> not fully ECAM-compatible.
>
> This means that Xen needs to account for possible quirks in the host bridge.
> The Linux community is working on a patch series for this (see [2] and [3]),
> where quirks will be detected with:
>      * OEM ID
>      * OEM Table ID
>      * OEM Revision
>      * PCI Segment
>      * PCI bus number range (wildcard allowed)
>
> Based on what Linux is currently doing, there are two kinds of quirks:
>      * Accesses to the configuration space of certain sizes are not allowed
>      * A specific driver is necessary for driving the host bridge
>
> The former is straightforward to solve but the latter will require more thought.
> Instantiation of a specific driver for the host controller can be easily done
> if Xen has the information to detect it. However, those drivers may require
> resources described in ASL (see [4] for instance).
>
> The number of platforms requiring a specific PCI host bridge driver is
> currently limited. Whilst it is not possible to predict the future, upcoming
> platforms are expected to have fully ECAM-compliant PCI host bridges.
> Therefore, given that Xen does not have any ASL parser, the approach
> suggested is to hardcode the missing values. This could be revisited in the
> future if necessary.
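>
> For illustration, the quirk detection could boil down to a static table
> matched against the MCFG header, along the lines of the sketch below (names
> are hypothetical and do not reflect existing Xen code; standard headers are
> omitted for brevity):
>
> struct mcfg_quirk {
>     char     oem_id[6];          /* ACPI OEM ID */
>     char     oem_table_id[8];    /* ACPI OEM Table ID */
>     uint32_t oem_revision;
>     uint16_t segment;            /* PCI segment, or MCFG_SEG_ANY */
>     uint8_t  start_bus, end_bus; /* bus number range (wildcard allowed) */
>     const void *ops;             /* config accessors/driver to instantiate */
> };
>
> #define MCFG_SEG_ANY 0xffff
>
> static bool mcfg_quirk_match(const struct mcfg_quirk *q, const char *oem_id,
>                              const char *oem_table_id, uint32_t rev,
>                              uint16_t seg, uint8_t bus)
> {
>     return !memcmp(q->oem_id, oem_id, sizeof(q->oem_id)) &&
>            !memcmp(q->oem_table_id, oem_table_id, sizeof(q->oem_table_id)) &&
>            q->oem_revision == rev &&
>            (q->segment == MCFG_SEG_ANY || q->segment == seg) &&
>            bus >= q->start_bus && bus <= q->end_bus;
> }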
>
> ### Finding information to configure IOMMU and MSI controller
>
> The static table [IORT] will provide the information needed to deduce the
> data (such as the MasterID and DeviceID) used to configure both the IOMMU
> and the MSI controller from a given SBDF.
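>
> For illustration, each IORT node carries an array of ID mappings, and
> translating an input ID (e.g. a RID) into an output ID (e.g. a DeviceID or
> StreamID) is a simple offset within a range, roughly as sketched below
> (simplified; this is not the exact table layout defined by [IORT]):
>
> struct iort_id_mapping {
>     uint32_t input_base;     /* lowest input ID covered by the mapping */
>     uint32_t nr_ids;         /* size of the range */
>     uint32_t output_base;    /* output ID corresponding to input_base */
>     /* The output reference (target SMMU/ITS node) is omitted here. */
> };
>
> static bool iort_map_id(const struct iort_id_mapping *map, unsigned int nr,
>                         uint32_t input_id, uint32_t *output_id)
> {
>     unsigned int i;
>
>     for ( i = 0; i < nr; i++ )
>     {
>         if ( input_id >= map[i].input_base &&
>              input_id < map[i].input_base + map[i].nr_ids )
>         {
>             *output_id = map[i].output_base +
>                          (input_id - map[i].input_base);
>             return true;
>         }
>     }
>
>     return false;
> }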
>
> ### Finding which NUMA node a PCI device belongs to
>
> On NUMA systems, the NUMA node associated with a PCI device can be found
> using the _PXM method of the host bridge (?).
>
> XXX: I am not entirely sure where the _PXM will be (i.e host bridge vs PCI
> device).
>
> ## Device Tree
>
> ### Host bridges
>
> Each Device Tree node associated with a host bridge will have at least the
> following properties (see bindings in [8]):
>      - device_type: will always be "pci".
>      - compatible: a string indicating which driver to instantiate
>
> The node may also contain optional properties such as:
>      - linux,pci-domain: assigns a fixed segment number
>      - bus-range: indicates the range of bus numbers supported
>
> When the property linux,pci-domain is not present, the operating system will
> have to allocate a segment number for each host bridge.
>
> ### Finding information to configure IOMMU and MSI controller
>
> #### Configuring the IOMMU
>
> The Device Tree provides a generic IOMMU binding (see [10]) which uses the
> properties "iommu-map" and "iommu-map-mask" to describe the relationship
> between a RID and a MasterID.
>
> These properties will be present in the host bridge Device Tree node. From a
> given SBDF, it will be possible to find the corresponding MasterID.
>
> Note that the ARM SMMU also has a legacy binding (see [9]), but it does not
> have a way to describe the relationship between the RID and the StreamID.
> Instead it assumes that StreamID == RID. This binding has now been deprecated
> in favor of the generic IOMMU binding.
>
> #### Configuring the MSI controller
>
> The relationship between the RID and data required to configure the MSI
> controller (such as DeviceID) can be found using the property "msi-map"
> (see [11]).
>
> This property will be present in the host bridge Device Tree node. From a
> given SBDF, it will be possible to find the corresponding DeviceID.
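>
> Both "iommu-map"/"iommu-map-mask" and "msi-map"/"msi-map-mask" follow the
> same translation scheme. For illustration, a sketch of the lookup (mirroring
> the bindings in [10] and [11]; not existing Xen code):
>
> /*
>  * One entry of an "iommu-map"/"msi-map" property:
>  * (rid base, target, output base, length).
>  * The phandle to the target IOMMU/MSI controller is omitted here.
>  */
> struct rid_map_entry {
>     uint32_t rid_base;
>     uint32_t output_base;   /* MasterID (iommu-map) or DeviceID (msi-map) base */
>     uint32_t length;
> };
>
> static bool map_rid(const struct rid_map_entry *map, unsigned int nr,
>                     uint32_t map_mask, uint32_t rid, uint32_t *out)
> {
>     uint32_t masked = rid & map_mask;
>     unsigned int i;
>
>     for ( i = 0; i < nr; i++ )
>     {
>         if ( masked >= map[i].rid_base &&
>              masked < map[i].rid_base + map[i].length )
>         {
>             *out = map[i].output_base + (masked - map[i].rid_base);
>             return true;
>         }
>     }
>
>     return false;
> }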
>
> ### Finding which NUMA node a PCI device belongs to
>
> On NUMA systems, the NUMA node associated with a PCI device can be found
> using the property "numa-node-id" (see [15]) present in the host bridge
> Device Tree node.
>
> # Discovering PCI devices
>
> Whilst PCI devices are currently available in the hardware domain, the
> hypervisor does not have any knowledge of them. The first step of supporting
> PCI pass-through is to make Xen aware of the PCI devices.
>
> Xen will require access to the PCI configuration space to retrieve information
> for the PCI devices or access it on behalf of the guest via the emulated
> host bridge.
>
> This means that Xen should be in charge of controlling the host bridge.
> However, for some host controllers this may be difficult to implement in Xen
> because of dependencies on other components (e.g. clocks; see more details in
> the "PCI host bridge" section).
>
> For this reason, the approach chosen in this document is to let the hardware
> domain discover the host bridges, scan the PCI devices and then report
> everything to Xen. This does not rule out the possibility of doing everything
> without the help of the hardware domain in the future.
>
> ## Who is in charge of the host bridge?
>
> There are numerous host bridge implementations on ARM. Some of them require
> a specific driver as they cannot be driven by a generic host bridge driver.
> Porting those drivers may be complex due to dependencies on other components.
>
> This could be seen as a signal to leave the host bridge drivers in the
> hardware domain. Because Xen would need to access the configuration space,
> all accesses would have to be forwarded to the hardware domain, which in turn
> would access the hardware.
>
> In this design document, we are considering that the host bridge driver can
> be ported to Xen. In case this is not possible, an interface to forward
> configuration space accesses would need to be defined. The interface details
> are out of scope of this document.
>
> ## Discovering and registering host bridge
Please clarify whether this would be required both for ACPI and DT, or only
for DT.
>
> The approach taken in this document will require communication between Xen
> and the hardware domain. In this case, they would need to agree on the
> segment number associated with a host bridge. However, this number is not
> available in the Device Tree case.
>
> The hardware domain will register new host bridges using the existing
> hypercall PHYSDEVOP_pci_mmcfg_reserved:
>
> #define XEN_PCI_MMCFG_RESERVED 1
>
> struct physdev_pci_mmcfg_reserved {
>      /* IN */
>      uint64_t    address;
>      uint16_t    segment;
>      /* Range of bus supported by the host bridge */
>      uint8_t     start_bus;
>      uint8_t     end_bus;
>
>      uint32_t    flags;
> };
>
> Some of the host bridges may not have a separate configuration address space
> region described in the firmware tables. To simplify the registration, the
> field 'address' should contain the base address of one of the regions
> described in the firmware tables:
>      * For ACPI, it would be the base address specified in the MCFG or in the
>      _CBA method.
>      * For Device Tree, this would be any base address of region
>      specified in the "reg" property.
>
> The field 'flags' is expected to have XEN_PCI_MMCFG_RESERVED set.
>
> It is expected that this hypercall is called before any PCI device is
> registered with Xen.
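>
> For illustration, the registration from the hardware domain could look like
> the sketch below (assuming the Linux HYPERVISOR_physdev_op() wrapper;
> 'ecam_base' and 'rc' are placeholders):
>
> struct physdev_pci_mmcfg_reserved r = {
>     .address   = ecam_base,   /* base address from the MCFG/_CBA or "reg" */
>     .segment   = 0,
>     .start_bus = 0,
>     .end_bus   = 255,
>     .flags     = XEN_PCI_MMCFG_RESERVED,
> };
>
> rc = HYPERVISOR_physdev_op(PHYSDEVOP_pci_mmcfg_reserved, &r);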
>
> When the hardware domain is in charge of the host bridge, this hypercall will
> be used to tell Xen about the existence of a host bridge, in order to find
> the associated information for configuring the MSI controller and the IOMMU.
>
> ## Discovering and registering PCI devices
>
> The hardware domain will scan the host bridges to find the list of PCI
> devices available and then report it to Xen using the existing hypercall
> PHYSDEVOP_pci_device_add:
>
> #define XEN_PCI_DEV_EXTFN   0x1
> #define XEN_PCI_DEV_VIRTFN  0x2
> #define XEN_PCI_DEV_PXM     0x4
>
> struct physdev_pci_device_add {
>      /* IN */
>      uint16_t    seg;
>      uint8_t     bus;
>      uint8_t     devfn;
>      uint32_t    flags;
>      struct {
>          uint8_t bus;
>          uint8_t devfn;
>      } physfn;
>      /*
>       * Optional parameters array.
>       * First element ([0]) is PXM domain associated with the device (if
>       * XEN_PCI_DEV_PXM is set)
>       */
>      uint32_t optarr[0];
> };
>
> When XEN_PCI_DEV_PXM is set in the field 'flags', optarr[0] will contain the
> NUMA node ID associated with the device:
>      * For ACPI, it would be the value returned by the method _PXM
>      * For Device Tree, this would be the value found in the property "numa-node-id".
> For more details see the section "Finding which NUMA node a PCI device belongs
> to" in "ACPI" and "Device Tree".
>
> XXX: I still don't fully understand how XEN_PCI_DEV_EXTFN and XEN_PCI_DEV_VIRTFN
> will work. AFAICT, the former is used when the bus supports ARI and the only
> usage is in the x86 IOMMU code. For the latter, this is related to SR-IOV but
> I am not sure what devfn and physfn.devfn will correspond to.
>
> Note that x86 currently provides two more hypercalls (PHYSDEVOP_manage_pci_add
> and PHYSDEVOP_manage_pci_add_ext) to register PCI devices. However, they are a
> subset of the hypercall PHYSDEVOP_pci_device_add. Therefore, it is suggested
> to leave them unimplemented on ARM.
>
> ## Removing PCI devices
>
> The hardware domain will be in charge of telling Xen that a device has been
> removed, using the existing hypercall PHYSDEVOP_pci_device_remove:
>
> struct physdev_pci_device {
>      /* IN */
>      uint16_t    seg;
>      uint8_t     bus;
>      uint8_t     devfn;
> };
>
> Note that x86 currently provides one more hypercall (PHYSDEVOP_manage_pci_remove)
> to remove PCI devices. However, it does not allow passing a segment number.
> Therefore it is suggested to leave it unimplemented on ARM.
Please add a flow from the Linux hypercall down to the SMMU API calls. This
would make the picture clearer.
> # Glossary
>
> ECAM: Enhanced Configuration Access Mechanism
> SBDF: Segment Bus Device Function. The segment is a software concept.
> MSI: Message Signaled Interrupt
> MSI doorbell: MMIO address written to by a device to generate an MSI
> SPI: Shared Peripheral Interrupt
> LPI: Locality-specific Peripheral Interrupt
> ITS: Interrupt Translation Service
>
> # Specifications
> [SBSA]  ARM-DEN-0029 v3.0
> [GICV3] IHI0069C
> [IORT]  DEN0049B
>
> # Bibliography
>
> [1] PCI firmware specification, rev 3.2
> [2] https://www.spinics.net/lists/linux-pci/msg56715.html
> [3] https://www.spinics.net/lists/linux-pci/msg56723.html
> [4] https://www.spinics.net/lists/linux-pci/msg56728.html
> [6] https://www.spinics.net/lists/kvm/msg140116.html
> [7] http://www.firmware.org/1275/bindings/pci/pci2_1.pdf
> [8] Documentation/devicetree/bindings/pci
> [9] Documentation/devicetree/bindings/iommu/arm,smmu.txt
> [10] Documentation/devicetree/bindings/pci/pci-iommu.txt
> [11] Documentation/devicetree/bindings/pci/pci-msi.txt
> [12] drivers/pci/host/pcie-rcar.c
> [13] drivers/pci/host/pci-thunder-ecam.c
> [14] drivers/pci/host/pci-thunder-pem.c
> [15] Documentation/devicetree/bindings/numa.txt

