* [early RFC] ARM PCI Passthrough design document
@ 2016-12-29 14:04 Julien Grall
  2016-12-29 14:16 ` Jaggi, Manish
                   ` (5 more replies)
  0 siblings, 6 replies; 82+ messages in thread
From: Julien Grall @ 2016-12-29 14:04 UTC (permalink / raw)
  To: xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Campbell Sean, Jiandi An, Punit Agrawal,
	alistair.francis, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

Hi all,

The document below is an early version of a design
proposal for PCI Passthrough in Xen. It aims to
describe, from a high-level perspective, the interaction
with the different subsystems and how guests will be able
to discover and access PCI devices.

I am aware that a similar design has been posted recently
by Cavium (see [1]), however the approach to expose PCI
to guests is different. We have requests to run unmodified
baremetal OSes on Xen; such a guest would directly
access the devices and no PV drivers would be used.

That's why this design is based on emulating a root controller.
This also has the advantage of keeping the VM interface as close
to baremetal as possible, allowing the guest to use firmware tables
to discover the devices.

Currently on ARM, Xen does not have any knowledge about PCI devices.
This means that the IOMMU and the interrupt controller (such as the ITS),
which require per-device configuration, will not work with PCI even
for DOM0.

The PCI Passthrough work could be divided into two phases:
	* Phase 1: Register all PCI devices in Xen => will allow
		   ITS and SMMU to be used with PCI in Xen
	* Phase 2: Assign devices to guests

This document aims to describe both phases, but for now only phase
1 is fully described.

I am sending the design document now to start gathering feedback on
phase 1.

Cheers,

[1] https://lists.xen.org/archives/html/xen-devel/2016-12/msg00224.html 

========================
% PCI pass-through support on ARM
% Julien Grall <julien.grall@linaro.org>
% Draft A

# Preface

This document aims to describe the components required to enable PCI
passthrough on ARM.

This is an early draft and some questions are still unanswered; where this is
the case, the text will contain XXX.

# Introduction

PCI passthrough allows giving control of physical PCI devices to a guest. This
means that the guest will have full and direct access to the PCI device.

Xen on ARM supports a single kind of guest, which exploits the hardware
virtualization support as much as possible. Such a guest relies on PV drivers
only for I/O (e.g. block, network); interrupts come through the virtualized
interrupt controller. This means that no big changes are required
within the guest kernel.

As a consequence, it would be possible to replace PV drivers by assigning real
devices to the guest for I/O access. Xen on ARM would therefore be able to
run an unmodified operating system.

To achieve this goal, it looks more sensible to go towards emulating the
host bridge (we will go into more detail later). A guest would be able
to take advantage of the firmware tables, obviating the need for a
Xen-specific driver.

Thus in this document we follow the emulated host bridge approach.

# PCI terminology

Each PCI device under a host bridge is uniquely identified by its Requester ID
(AKA RID). A Requester ID is a triplet of Bus number, Device number, and
Function.

When the platform has multiple host bridges, the software can add a fourth
number called Segment to differentiate host bridges. A PCI device is
then uniquely identified by segment:bus:device:function (AKA SBDF).

So given a specific SBDF, it is possible to find the host bridge and the
RID associated with a PCI device.
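
As an illustration, here is a minimal C sketch of how a SBDF could be packed
and split back into a segment and a RID. The encoding and names are purely
illustrative and not an existing Xen interface:

#include <stdint.h>

/* Illustrative SBDF encoding: seg[31:16] bus[15:8] dev[7:3] fn[2:0].
 * The RID is simply the low 16 bits (bus:device:function). */
typedef uint32_t sbdf_t;
typedef uint16_t rid_t;

static inline sbdf_t sbdf_pack(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t fn)
{
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)(dev & 0x1f) << 3) | (fn & 0x7);
}

static inline uint16_t sbdf_segment(sbdf_t sbdf) { return sbdf >> 16; }
static inline rid_t    sbdf_rid(sbdf_t sbdf)     { return sbdf & 0xffff; }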

# Interaction of the PCI subsystem with other subsystems

In order to have a PCI device fully working, Xen will need to configure
other subsystems such as the SMMU and the Interrupt Controller.

The interactions expected between the PCI subsystem and the other subsystems are:
    * Add a device
    * Remove a device
    * Assign a device to a guest
    * Deassign a device from a guest

XXX: Detail the interaction when assigning/deassigning a device

The following subsections will briefly describe the interactions from a
higher-level perspective. Implementation details (callbacks, structures, ...)
are out of scope.

## SMMU

The SMMU will be used to isolate PCI devices when they access memory (for
instance DMA and MSI doorbells). Often the SMMU will be configured using
a StreamID (SID) that can be deduced from the RID with the help of the firmware
tables (see below).

Whilst in theory all the memory transactions issued by a PCI device should
go through the SMMU, on certain platforms some of the memory transactions may
not reach the SMMU because they are interpreted by the host bridge. For
instance this could happen if the MSI doorbell is built into the PCI host
bridge. See [6] for more details.

XXX: I think this could be solved by using the host memory layout when
creating a guest with PCI devices => Detail it.

## Interrupt controller

PCI supports three kinds of interrupts: legacy interrupts, MSI and MSI-X. On
ARM, legacy interrupts will be mapped to SPIs. MSI and MSI-X will be
mapped to either SPIs or LPIs.

Whilst SPIs can be programmed using an interrupt number, LPIs are
identified via a (DeviceID, EventID) pair when configured through the ITS.

The DeviceID is a unique identifier for each MSI-capable device that can
be deduced from the RID with the help of the firmware tables (see below).

XXX: Figure out if something is necessary for GICv2m

# Information available in the firmware tables

## ACPI

### Host bridges

The static table MCFG (see 4.2 in [1]) describes the host bridges available
at boot and supporting ECAM. Unfortunately there are platforms out there
(see [2]) that re-use MCFG to describe host bridges that are not fully ECAM
compatible.

This means that Xen needs to account for possible quirks in the host bridge.
The Linux community is working on a patch series for this (see [2] and [3])
where quirks will be detected with (a sketch of a matching quirk table entry
follows the list):
    * OEM ID
    * OEM Table ID
    * OEM Revision
    * PCI Segment (from _SEG)
    * PCI bus number range (from _CRS, wildcard allowed)
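
For illustration only, a quirk table entry matching on the fields above could
look like the sketch below. The field and type names are hypothetical and do
not reflect the actual layout used by the Linux series in [2] and [3]:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for one host bridge quirk. */
struct mcfg_quirk {
    char     oem_id[7];               /* ACPI OEM ID (6 chars + NUL) */
    char     oem_table_id[9];         /* ACPI OEM Table ID (8 chars + NUL) */
    uint32_t oem_revision;
    uint16_t segment;                 /* from _SEG */
    uint8_t  bus_start, bus_end;      /* from _CRS; 0x00-0xff acts as a wildcard */
    const struct pci_config_ops *ops; /* accessors to use for this bridge */
};

static bool mcfg_quirk_match(const struct mcfg_quirk *q,
                             const char *oem_id, const char *oem_table_id,
                             uint32_t rev, uint16_t seg, uint8_t bus)
{
    return !strncmp(q->oem_id, oem_id, 6) &&
           !strncmp(q->oem_table_id, oem_table_id, 8) &&
           q->oem_revision == rev &&
           q->segment == seg &&
           bus >= q->bus_start && bus <= q->bus_end;
}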

Based on what Linux is currently doing, there are two kinds of quirks:
    * Accesses to the configuration space of certain sizes are not allowed
    * A specific driver is necessary for driving the host bridge

The former is straightforward to solve; the latter will require more thought.
Instantiation of a specific driver for the host controller can be easily done
if Xen has the information to detect it. However, those drivers may require
resources described in ASL (see [4] for instance).

XXX: Need more investigation to know whether the missing information should
be passed by DOM0 or hardcoded in the driver.

### Finding the StreamID and DeviceID

The static table IORT (see [5]) will provide information that will help to
deduce the StreamID and DeviceID from a given RID.

## Device Tree

### Host bridges

Each Device Tree node associated with a host bridge will have at least the
following properties (see bindings in [8]):
    - device_type: will always be "pci".
    - compatible: a string indicating which driver to instantiate

The node may also contain optional properties such as:
    - linux,pci-domain: assigns a fixed segment number
    - bus-range: indicates the range of bus numbers supported

When the property linux,pci-domain is not present, the operating system would
have to allocate a segment number for each host bridge. Because the
algorithm used to allocate the segment is not specified, it is necessary for
DOM0 and Xen to agree on the numbers before any PCI device is added. A sketch
of how these properties could be read follows.
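
The sketch below shows, for illustration, how the two optional properties
could be read with the standard libfdt API. The helper name and defaults are
made up; fdt_getprop() and fdt32_to_cpu() are the regular libfdt calls:

#include <libfdt.h>
#include <stdint.h>

/* Read the optional segment and bus range of a host bridge node.
 * The defaults mirror what an OS assumes when the properties are absent. */
static void read_bridge_props(const void *fdt, int node, int *segment,
                              uint8_t *bus_start, uint8_t *bus_end)
{
    const fdt32_t *prop;
    int len;

    *segment = -1;      /* no fixed segment: must be allocated by the OS */
    *bus_start = 0;
    *bus_end = 0xff;

    prop = fdt_getprop(fdt, node, "linux,pci-domain", &len);
    if (prop && len == sizeof(fdt32_t))
        *segment = fdt32_to_cpu(prop[0]);

    prop = fdt_getprop(fdt, node, "bus-range", &len);
    if (prop && len == 2 * sizeof(fdt32_t)) {
        *bus_start = fdt32_to_cpu(prop[0]);
        *bus_end   = fdt32_to_cpu(prop[1]);
    }
}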

### Finding the StreamID and DeviceID

#### StreamID

The first existing binding for the SMMU (see [9]) didn't have a way to describe
the relationship between RID and StreamID; it was assumed that
StreamID == RequesterID. This binding has now been deprecated in favor of a
generic binding (see [10]) which uses the property "iommu-map" to describe the
relationship between a RID, the associated IOMMU and the StreamID.

#### DeviceID

The relationship between the RID and the DeviceID can be found using the
property "msi-map" (see [11]). A sketch of the RID translation used by both
"iommu-map" and "msi-map" is given below.

# Discovering PCI devices

Whilst PCI devices are currently available in DOM0, the hypervisor does not
have any knowledge of them. The first step of supporting PCI passthrough is
to make Xen aware of the PCI devices.

Xen will require access to the PCI configuration space to retrieve information
about the PCI devices, or to access it on behalf of a guest via the emulated
host bridge.

## Discovering and registering host bridges

Neither ACPI nor Device Tree provides enough information to fully
instantiate a host bridge driver. In the case of ACPI, some data may come
from ASL, whilst for Device Tree the segment number is not available.

So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
with all the relevant information. This will be done via a new hypercall
PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:

struct physdev_pci_host_bridge_add
{
    /* IN */
    uint16_t seg;
    /* Range of buses supported by the host bridge */
    uint8_t  bus_start;
    uint8_t  bus_nr;
    uint32_t res0;  /* Padding */
    /* Information about the configuration space region */
    uint64_t cfg_base;
    uint64_t cfg_size;
};

DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
bridge available on the platform. When Xen receives the hypercall, the
driver associated with the host bridge will be instantiated.
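
For illustration, the DOM0 side could look like the sketch below. The
hypercall number is hypothetical (PHYSDEVOP_pci_host_bridge_add does not exist
yet) and the wrapper follows the Linux HYPERVISOR_physdev_op() convention:

#include <stdint.h>

/* Hypothetical: the number would be allocated when the hypercall is added. */
#define PHYSDEVOP_pci_host_bridge_add 44

/* struct physdev_pci_host_bridge_add as defined above. */
struct physdev_pci_host_bridge_add {
    uint16_t seg;
    uint8_t  bus_start;
    uint8_t  bus_nr;
    uint32_t res0;
    uint64_t cfg_base;
    uint64_t cfg_size;
};

extern int HYPERVISOR_physdev_op(int cmd, void *arg);

/* Called by DOM0 once for each host bridge it has discovered. */
static int report_host_bridge(uint16_t seg, uint8_t bus_start, uint8_t bus_nr,
                              uint64_t cfg_base, uint64_t cfg_size)
{
    struct physdev_pci_host_bridge_add op = {
        .seg       = seg,
        .bus_start = bus_start,
        .bus_nr    = bus_nr,
        .cfg_base  = cfg_base,
        .cfg_size  = cfg_size,
    };

    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &op);
}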

XXX: Shall we limit DOM0's access to the configuration space from that
moment?

## Discovering and registering PCI devices

Similarly to x86, PCI devices will be discovered by DOM0 and registered
using the hypercalls PHYSDEVOP_pci_add_device or PHYSDEVOP_manage_pci_add_ext.

By default all the PCI devices will be assigned to DOM0. So Xen would have
to configure the SMMU and the Interrupt Controller to allow DOM0 to use the
PCI devices. As mentioned earlier, those subsystems will require the StreamID
and DeviceID; both can be deduced from the RID. A conceptual sketch of this
flow is given below.
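
To tie the pieces together, the Xen-side flow for adding a device to DOM0
could look like the sketch below. Every type and helper name here is a
hypothetical placeholder for the firmware-table lookups and the SMMU/ITS
configuration described earlier:

#include <stdint.h>

struct domain;
struct host_bridge;

/* Hypothetical helpers wrapping the IORT / iommu-map / msi-map lookups and
 * the SMMU/ITS configuration. */
int fw_rid_to_streamid(const struct host_bridge *hb, uint16_t rid, uint32_t *sid);
int fw_rid_to_deviceid(const struct host_bridge *hb, uint16_t rid, uint32_t *devid);
int smmu_assign_stream(struct domain *d, uint32_t sid);
int its_map_device(struct domain *d, uint32_t devid);

/* Register a PCI device reported by DOM0 and assign it to DOM0 by default. */
static int pci_device_add(struct domain *dom0, struct host_bridge *hb,
                          uint16_t rid)
{
    uint32_t sid, devid;
    int rc;

    rc = fw_rid_to_streamid(hb, rid, &sid);
    if (!rc)
        rc = smmu_assign_stream(dom0, sid);
    if (!rc)
        rc = fw_rid_to_deviceid(hb, rid, &devid);
    if (!rc)
        rc = its_map_device(dom0, devid);

    return rc;
}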

XXX: How to hide PCI devices from DOM0?

# Glossary

ECAM: Enhanced Configuration Access Mechanism
SBDF: Segment Bus Device Function. The segment is a software concept.
MSI: Message Signaled Interrupt
SPI: Shared Peripheral Interrupt
LPI: Locality-specific Peripheral Interrupt
ITS: Interrupt Translation Service

# Bibliography

[1] PCI firmware specification, rev 3.2
[2] https://www.spinics.net/lists/linux-pci/msg56715.html
[3] https://www.spinics.net/lists/linux-pci/msg56723.html
[4] https://www.spinics.net/lists/linux-pci/msg56728.html
[5] http://infocenter.arm.com/help/topic/com.arm.doc.den0049b/DEN0049B_IO_Remapping_Table.pdf
[6] https://www.spinics.net/lists/kvm/msg140116.html
[7] http://www.firmware.org/1275/bindings/pci/pci2_1.pdf
[8] Documentation/devicetree/bindings/pci
[9] Documentation/devicetree/bindings/iommu/arm,smmu.txt
[10] Documentation/devicetree/bindings/pci/pci-iommu.txt
[11] Documentation/devicetree/bindings/pci/pci-msi.txt



* Re: [early RFC] ARM PCI Passthrough design document
  2016-12-29 14:04 [early RFC] ARM PCI Passthrough design document Julien Grall
@ 2016-12-29 14:16 ` Jaggi, Manish
  2016-12-29 17:03   ` Julien Grall
  2017-01-04  0:24 ` Stefano Stabellini
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 82+ messages in thread
From: Jaggi, Manish @ 2016-12-29 14:16 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Kapoor, Prasun, Nair, Jayachandran, Wei Chen, Campbell Sean,
	Jiandi An, Punit Agrawal, alistair.francis, Roger Pau Monné,
	Shanker Donthineni, Steve Capper



Hi Julien,


Wouldnt it be better if the design proposed by cavium be extended by discussions and comeup with an agreeable to all design.

I didnt see any comments on the one I posted.

Putting an altogether new design without commenting on the one posted a month back, might not be a right approach



Regards,
Manish Jaggi

* Re: [early RFC] ARM PCI Passthrough design document
  2016-12-29 14:16 ` Jaggi, Manish
@ 2016-12-29 17:03   ` Julien Grall
  2016-12-29 18:41     ` Jaggi, Manish
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2016-12-29 17:03 UTC (permalink / raw)
  To: Jaggi, Manish, xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Kapoor, Prasun, Nair, Jayachandran, Wei Chen, Campbell Sean,
	Jiandi An, Punit Agrawal, alistair.francis, Roger Pau Monné,
	Shanker Donthineni, Steve Capper



On 29/12/2016 14:16, Jaggi, Manish wrote:
> Hi Julien,

Hello Manish,

>
> Wouldnt it be better if the design proposed by cavium be extended by
> discussions and comeup with an agreeable to all design.

As I mentioned in my mail, this design is a completely different 
approach (emulation vs PV). This is a distinct proposal because 
emulation vs PV impact a lot the overall design.

> I didnt see any comments on the one I posted.

Whilst I haven't commented on your design document, I have read 
carefully your last version of the design. But even after 5 version and 
nearly 2 years of work this is still DT and ITS focused. No words about 
interrupt legacy, no words about ACPI... And as you can see in this 
draft, ACPI will have an impact on the overall.

Some part of this design document is based on all the discussion we had 
over last year on your design. However, most of the comments have not 
been addressed despite the fact they have been repeated multiple time by 
various reviewers. For example the bus number has not been added 
PHYSDEVOP_pci_host_bridge_add as requested in one of the first version 
of the design.

> Putting an altogether new design without commenting on the one posted a
> month back, might not be a right approach

Speaking for myself, my bandwidth is limited and I am going to 
prioritize review on series where my comments have been addressed.

Regards,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2016-12-29 17:03   ` Julien Grall
@ 2016-12-29 18:41     ` Jaggi, Manish
  2016-12-29 19:38       ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Jaggi, Manish @ 2016-12-29 18:41 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Kapoor, Prasun, Nair, Jayachandran, Wei Chen, Campbell Sean,
	Jiandi An, Punit Agrawal, alistair.francis, Roger Pau Monné,
	Shanker Donthineni, Steve Capper



Hi Julien,

________________________________
From: Julien Grall <julien.grall@linaro.org>
Sent: Thursday, December 29, 2016 10:33 PM
To: Jaggi, Manish; xen-devel; Stefano Stabellini
Cc: Edgar Iglesias (edgar.iglesias@xilinx.com); Steve Capper; Punit Agrawal; Wei Chen; Campbell Sean; Shanker Donthineni; Jiandi An; Roger Pau Monné; alistair.francis@xilinx.com; Kapoor, Prasun; Nair, Jayachandran
Subject: Re: [early RFC] ARM PCI Passthrough design document



On 29/12/2016 14:16, Jaggi, Manish wrote:
> Hi Julien,

Hello Manish,

>
> Wouldnt it be better if the design proposed by cavium be extended by
> discussions and comeup with an agreeable to all design.

As I mentioned in my mail, this design is a completely different
approach (emulation vs PV).
[manish] It would have been better if you had suggested in the design posted by me, that the following 1.2.3 points would change.
Since a design is already been posted, it makes sense to focus on that and reach a common understanding. And is a bit _rude_.

This is a distinct proposal because
emulation vs PV impact a lot the overall design.

[manish] There are 3 aspects
a. changes needed in Xen / Dom0 for
- registering of pci host controller driver in xen
- mapping between sdbf and streamid
- adding enumerated pci devices to xen dom0
- making devices use SMMU in dom0

b.1 How domU is assigned  a PCI device.
b.2 How a domU PCI driver reads configuration space
I think only at this point PV vs emulation matters. As of now the frontend backend driver allow reading PCI space.
Adding an its node in domU device tree will make traps to xen for ITS emulation.

c. DT vs ACPI
I havent seen in your design how it is captured to support both dt and acpi together.
A good appraoch would be to extend Draft5 with ACPI.

> I didnt see any comments on the one I posted.

Whilst I haven't commented on your design document, I have read
carefully your last version of the design. But even after 5 version and
nearly 2 years of work this is still DT and ITS focused.
[manish] Two reasons for  that
a) PCI driver in linux was evolving and only after msi-map we have a way to map sbdf with steamID
b) Market interest in Xen ARM64

No words about
interrupt legacy,
[Manish] At the time of Xen dev summit 2015 we agreed to keep legacy as a secondary item, so that we can get something in xen for PCI pt.
no words about ACPI... And as you can see in this
draft, ACPI will have an impact on the overall.

Some part of this design document is based on all the discussion we had
over last year on your design. However, most of the comments have not
been addressed despite the fact they have been repeated multiple time by
various reviewers. For example the bus number has not been added
PHYSDEVOP_pci_host_bridge_add as requested in one of the first version
of the design.
[manish] I disagree, since the same pci node is passed to dom0 and xen and it has a bus-nr property
there is no need for it.
Moreover this hypercall was suggested by Ian and the requirement was to only add segno so that xen and dom0 bind to a PCI RC.

> Putting an altogether new design without commenting on the one posted a
> month back, might not be a right approach

Speaking for myself, my bandwidth is limited and I am going to
prioritize review on series where my comments have been addressed.

[manish] All have limited bandwidth and priorities are loaded.  Your comments are dully taken care of in the document, espcially sbdf-streamID mapping.
I would have appreciated that you could have sent a draft 6 with your additions so that a good design be produced.

Regards,

--
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2016-12-29 18:41     ` Jaggi, Manish
@ 2016-12-29 19:38       ` Julien Grall
  0 siblings, 0 replies; 82+ messages in thread
From: Julien Grall @ 2016-12-29 19:38 UTC (permalink / raw)
  To: Jaggi, Manish, xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Kapoor, Prasun, Nair, Jayachandran, Wei Chen, Campbell Sean,
	Jiandi An, Punit Agrawal, alistair.francis, Roger Pau Monné,
	Shanker Donthineni, Steve Capper



On 29/12/2016 18:41, Jaggi, Manish wrote:
> Hi Julien,

Please configure your e-mail client to properly quote an e-mail. The 
[manish] solution is hard to read.

> ------------------------------------------------------------------------
> *From:* Julien Grall <julien.grall@linaro.org>
> *Sent:* Thursday, December 29, 2016 10:33 PM
> *To:* Jaggi, Manish; xen-devel; Stefano Stabellini
> *Cc:* Edgar Iglesias (edgar.iglesias@xilinx.com); Steve Capper; Punit
> Agrawal; Wei Chen; Campbell Sean; Shanker Donthineni; Jiandi An; Roger
> Pau Monné; alistair.francis@xilinx.com; Kapoor, Prasun; Nair, Jayachandran
> *Subject:* Re: [early RFC] ARM PCI Passthrough design document
>
>
>
> On 29/12/2016 14:16, Jaggi, Manish wrote:
>> Hi Julien,
>
> Hello Manish,
>
>>
>> Wouldnt it be better if the design proposed by cavium be extended by
>> discussions and comeup with an agreeable to all design.
>
> As I mentioned in my mail, this design is a completely different
> approach (emulation vs PV).
> [manish] It would have been better if you had suggested in the design
> posted by me, that the following 1.2.3 points would change.
> Since a design is already been posted, it makes sense to focus on that
> and reach a common understanding. And is a bit _rude_.

Before saying it is rude, beware there was a gap of more than a year 
between v4 and v5. I also warned you on IRC that I was working on a new 
design document over the last couple of months. So please don't act as if 
you were not aware.

>
> This is a distinct proposal because
> emulation vs PV impact a lot the overall design.
>
> [manish] There are 3 aspects
> a. changes needed in Xen / Dom0 for
> - registering of pci host controller driver in xen
> - mapping between sdbf and streamid
> - adding enumerated pci devices to xen dom0
> - making devices use SMMU in dom0
>
> b.1 How domU is assigned  a PCI device.
> b.2 How a domU PCI driver reads configuration space
> I think only at this point PV vs emulation matters. As of now the
> frontend backend driver allow reading PCI space.
> Adding an its node in domU device tree will make traps to xen for ITS
> emulation.
>
> c. DT vs ACPI
> I havent seen in your design how it is captured to support both dt and
> acpi together.
> A good appraoch would be to extend Draft5 with ACPI.

Please read again section "Information available in the firmware tables".

>
>> I didnt see any comments on the one I posted.
>
> Whilst I haven't commented on your design document, I have read
> carefully your last version of the design. But even after 5 version and
> nearly 2 years of work this is still DT and ITS focused.
> [manish] Two reasons for  that
> a) PCI driver in linux was evolving and only after msi-map we have a way
> to map sbdf with steamID

You are wrong here. msi-map is only here to do the mapping between RID 
and DeviceID. Not between RID and StreamID. Please read again the 
documentation (Documentation/devicetree/bindings/pci/pci-msi.txt). For 
RID to StreamID you want to look at the property iommu-map 
(Documentation/devicetree/bindings/pci/pci-iommu.txt).

[...]
>
> No words about
> interrupt legacy,
> [Manish] At the time of Xen dev summit 2015 we agreed to keep legacy as
> a secondary item, so that we can get something in xen for PCI pt.

Even if we don't implement it right now, we need to think about 
it as this may modify the overall design. What I want to avoid is having 
to redo everything in the future just because we knowingly ignored a bit.

> no words about ACPI... And as you can see in this
> draft, ACPI will have an impact on the overall.
>
> Some part of this design document is based on all the discussion we had
> over last year on your design. However, most of the comments have not
> been addressed despite the fact they have been repeated multiple time by
> various reviewers. For example the bus number has not been added
> PHYSDEVOP_pci_host_bridge_add as requested in one of the first version
> of the design.
> [manish] I disagree, since the same pci node is passed to dom0 and xen
> and it has a bus-nr property
> there is no need for it.

It is a bit annoying to have to repeat this: PCI passthrough is not only 
for the DT/ITS use case but also for ACPI, GICv2m... This hypercall will be 
part of the stable ABI. Once it is defined, it can never change. So we 
have to make it correct from the beginning.

If you still disagree, please explain why. Ignoring a request is not the 
right thing to do if you want to see people reviewing your design document.

> Moreover this hypercall was suggested by Ian and the requirement was to
> only add segno so that xen and dom0 bind to a PCI RC.
>
>> Putting an altogether new design without commenting on the one posted a
>> month back, might not be a right approach
>
> Speaking for myself, my bandwidth is limited and I am going to
> prioritize review on series where my comments have been addressed.
>
> [manish] All have limited bandwidth and priorities are loaded.  Your
> comments are dully taken care of in the document, espcially
> sbdf-streamID mapping.

No it is not. You are still mixing DeviceID and StreamID. You added an 
hypercall without any justification and way to use it. I know we 
discussed about this on v4 and I was concerned about quirk for platform 
with a wrong RID. I was expecting from you to check if my concern was 
valid and come up with a justification.

Anyway, you are focusing on one comment, and not looking at the rest of 
them.

> I would have appreciated that you could have sent a draft 6 with your
> additions so that a good design be produced.

I already explained why I have sent a separate design document. I have 
been writing this design document before you sent the v5.

I first thought about re-using your design doc as a skeleton. But it was 
not fitting the way I wanted to describe the approach. Currently, I 
don't care about the implementation details. The important bits is the 
high level architecture and the justifications.

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2016-12-29 14:04 [early RFC] ARM PCI Passthrough design document Julien Grall
  2016-12-29 14:16 ` Jaggi, Manish
@ 2017-01-04  0:24 ` Stefano Stabellini
  2017-01-24 14:28   ` Julien Grall
  2017-01-06 15:12 ` Roger Pau Monné
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-01-04  0:24 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Jiandi An,
	Punit Agrawal, alistair.francis, Shanker Donthineni, xen-devel,
	manish.jaggi, Campbell Sean, Roger Pau Monné

On Thu, 29 Dec 2016, Julien Grall wrote:
> Hi all,
> 
> The document below is an early version of a design
> proposal for PCI Passthrough in Xen. It aims to
> describe from an high level perspective the interaction
> with the different subsystems and how guest will be able
> to discover and access PCI.
> 
> I am aware that a similar design has been posted recently
> by Cavium (see [1]), however the approach to expose PCI
> to guest is different. We have request to run unmodified
> baremetal OS on Xen, a such guest would directly
> access the devices and no PV drivers will be used.
> 
> That's why this design is based on emulating a root controller.
> This also has the advantage to have the VM interface as close
> as baremetal allowing the guest to use firmware tables to discover
> the devices.
> 
> Currently on ARM, Xen does not have any knowledge about PCI devices.
> This means that IOMMU and interrupt controller (such as ITS)
> requiring specific configuration will not work with PCI even with
> DOM0.
> 
> The PCI Passthrough work could be divided in 2 phases:
> 	* Phase 1: Register all PCI devices in Xen => will allow
> 		   to use ITS and SMMU with PCI in Xen
>         * Phase 2: Assign devices to guests
> 
> This document aims to describe the 2 phases, but for now only phase
> 1 is fully described.
> 
> I have sent the design document to start to gather feedback on
> phase 1.
> 
> Cheers,
> 
> [1] https://lists.xen.org/archives/html/xen-devel/2016-12/msg00224.html 
> 
> ========================
> % PCI pass-through support on ARM
> % Julien Grall <julien.grall@linaro.org>
> % Draft A
> 
> # Preface
> 
> This document aims to describe the components required to enable PCI
> passthrough on ARM.
> 
> This is an early draft and some questions are still unanswered, when this is
> the case the text will contain XXX.
> 
> # Introduction
> 
> PCI passthrough allows to give control of physical PCI devices to guest. This
> means that the guest will have full and direct access to the PCI device.
> 
> ARM is supporting one kind of guest that is exploiting as much as possible
> virtualization support in hardware. The guest will rely on PV driver only
> for IO (e.g block, network), interrupts will come through the virtualized
> interrupt controller. This means that there are no big changes required
> within the kernel.
> 
> By consequence, it would be possible to replace PV drivers by assigning real
  ^ As a consequence


> devices to the guest for I/O access. Xen on ARM would therefore be able to
> run unmodified operating system.
> 
> To achieve this goal, it looks more sensible to go towards emulating the
> host bridge (we will go into more details later). A guest would be able
> to take advantage of the firmware tables and obviating the need for a specific
> driver for Xen.
> 
> Thus in this document we follow the emulated host bridge approach.
> 
> # PCI terminologies
> 
> Each PCI device under a host bridge is uniquely identified by its Requester ID
> (AKA RID). A Requester ID is a triplet of Bus number, Device number, and
> Function.
> 
> When the platform has multiple host bridges, the software can add fourth
                                                                   ^ a fourth

> number called Segment to differentiate host bridges. A PCI device will
> then uniquely by segment:bus:device:function (AKA SBDF).
> 
> So given a specific SBDF, it would be possible to find the host bridge and the
> RID associated to a PCI device.
> 
> # Interaction of the PCI subsystem with other subsystems
> 
> In order to have a PCI device fully working, Xen will need to configure
> other subsystems subsytems such as the SMMU and the Interrupt Controller.
                      ^ repetition

> The interaction expected between the PCI subsystem and the other is:
>     * Add a device
>     * Remove a device
>     * Assign a device to a guest
>     * Deassign a device from a guest
> 
> XXX: Detail the interaction when assigning/deassigning device
> 
> The following subsections will briefly describe the interaction from an
> higher level perspective. Implementation details (callback, structure...)
> is out of scope.
> 
> ## SMMU
> 
> The SMMU will be used to isolate the PCI device when accessing the memory
> (for instance DMA and MSI Doorbells). Often the SMMU will be configured using
> a StreamID (SID) that can be deduced from the RID with the help of the firmware
> tables (see below).
> 
> Whilst in theory all the memory transaction issued by a PCI device should
                                      ^ transactions

> go through the SMMU, on certain platforms some of the memory transaction may
                                                                ^ transactions

> not reach the SMMU because they are interpreted by the host bridge. For
> instance this could happen if the MSI doorbell is built into the PCI host
> bridge. See [6] for more details.
> 
> XXX: I think this could be solved by using the host memory layout when
> creating a guest with PCI devices => Detail it.
> 
> ## Interrupt controller
> 
> PCI supports three kind of interrupts: legacy interrupt, MSI and MSI-X. On ARM
> legacy interrupts will be mapped to SPIs. MSI and MSI-x will be
> either mapped to SPIs or LPIs.
> 
> Whilst SPIs can be programmed using an interrupt number, LPIs can be
> identified via a pair (DeviceID, EventID) when configure through the ITS.
> 
> The DeviceID is a unique identifier for each MSI-capable device that can
> be deduced from the RID with the help of the firmware tables (see below).
> 
> XXX: Figure out if something is necessary for GICv2m
> 
> # Information available in the firmware tables
> 
> ## ACPI
> 
> ### Host bridges
> 
> The static table MCFG (see 4.2 in [1]) will describe the host bridges available
> at boot and supporting ECAM. Unfortunately there are platforms out there
> (see [2]) that re-use MCFG to describe host bridge that are not fully ECAM
> compatible.
> 
> This means that Xen needs to account for possible quirks in the host bridge.
> The Linux community are working on a patch series for see (see [2] and [3])
                                                    ^ for this, see


> where quirks will be detected with:
>     * OEM ID
>     * OEM Table ID
>     * OEM Revision
>     * PCI Segment (from _SEG)
>     * PCI bus number range (from _CRS, wildcard allowed)
> 
> Based on what Linux is currently doing, there are two kind of quirks:
>     * Accesses to the configuration space of certain sizes are not allowed
>     * A specific driver is necessary for driving the host bridge
> 
> The former is straight forward to solve, the latter will require more thought.
                ^ straightforward


> Instantiation of a specific driver for the host controller can be easily done
> if Xen has the information to detect it. However, those drivers may require
> resources described in ASL (see [4] for instance).
>
> XXX: Need more investigation to know whether the missing information should
> be passed by DOM0 or hardcoded in the driver.

Given that we are talking about quirks here, it would be better to just
hardcode them in the drivers, if possible.


> ### Finding the StreamID and DeviceID
> 
> The static table IORT (see [5]) will provide information that will help to
> deduce the StreamID and DeviceID from a given RID.
> 
> ## Device Tree
> 
> ### Host bridges
> 
> Each Device Tree node associated to a host bridge will have at least the
> following properties (see bindings in [8]):
>     - device_type: will always be "pci".
>     - compatible: a string indicating which driver to instantiate
> 
> The node may also contain optional properties such as:
>     - linux,pci-domain: assign a fix segment number
>     - bus-range: indicate the range of bus numbers supported
> 
> When the property linux,pci-domain is not present, the operating system would
> have to allocate the segment number for each host bridges. Because the
> algorithm to allocate the segment is not specified, it is necessary for
> DOM0 and Xen to agree on the number before any PCI is been added.
> 
> ### Finding the StreamID and DeviceID
> 
> ### StreamID
> 
> The first binding existing (see [9]) for SMMU didn't have a way to describe the
> relationship between RID and StreamID, it was assumed that StreamID == RequesterID.
> This bindins has now been deprecated in favor of a generic binding (see [10])
       ^binding


> which will use the property "iommu-map" to describe the relationship between
> an RID, the associated IOMMU and the StreamID.
> 
> ### DeviceID
> 
> The relationship between the RID and the DeviceID can be found using the
> property "msi-map" (see [11]).
> 
> # Discovering PCI devices
> 
> Whilst PCI devices are currently available in DOM0, the hypervisor does not
> have any knowledge of them. The first step of supporting PCI passthrough is
> to make Xen aware of the PCI devices.
> 
> Xen will require access to the PCI configuration space to retrieve information
> for the PCI devices or access it on behalf of the guest via the emulated
> host bridge.
> 
> ## Discovering and register hostbridge
> 
> Both ACPI and Device Tree do not provide enough information to fully
> instantiate an host bridge driver. In the case of ACPI, some data may come
> from ASL,

The data available from ASL is just to initialize quirks and non-ECAM
controllers, right? Given that SBSA mandates ECAM, and we assume that
ACPI is mostly (if not only) for servers, then I think it is safe to say
that in the case of ACPI we should have all the info to fully
instantiate an host bridge driver.


> whilst for Device Tree the segment number is not available.
> 
> So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> with all the relevant informations. This will be done via a new hypercall
> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:

I understand that the main purpose of this hypercall is to get Xen and Dom0 to
agree on the segment numbers, but why is it necessary? If Dom0 has an
emulated controller like any other guest, do we care what segment numbers
Dom0 will use?


> struct physdev_pci_host_bridge_add
> {
>     /* IN */
>     uint16_t seg;
>     /* Range of bus supported by the host bridge */
>     uint8_t  bus_start;
>     uint8_t  bus_nr;
>     uint32_t res0;  /* Padding */
>     /* Information about the configuration space region */
>     uint64_t cfg_base;
>     uint64_t cfg_size;
> }
> 
> DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
> bridge available on the platform. When Xen is receiving the hypercall, the
> the driver associated to the host bridge will be instantiated.

I think we should mention the relationship with the existing
PHYSDEVOP_pci_mmcfg_reserved hypercall.


> XXX: Shall we limit DOM0 the access to the configuration space from that
> moment?

If we can, we should


> ## Discovering and register PCI
> 
> Similarly to x86, PCI devices will be discovered by DOM0 and register
> using the hypercalls PHYSDEVOP_pci_add_device or PHYSDEVOP_manage_pci_add_ext.
> 
> By default all the PCI devices will be assigned to DOM0. So Xen would have
> to configure the SMMU and Interrupt Controller to allow DOM0 to use the PCI
> devices. As mentioned earlier, those subsystems will require the StreamID
> and DeviceID. Both can be deduced from the RID.
> 
> XXX: How to hide PCI devices from DOM0?
> 
> # Glossary
> 
> ECAM: Enhanced Configuration Mechanism
> SBDF: Segment Bus Device Function. The segment is a software concept.
> MSI: Message Signaled Interrupt
> SPI: Shared Peripheral Interrupt
> LPI: Locality-specific Peripheral Interrupt
> ITS: Interrupt Translation Service
> 
> # Bibliography
> 
> [1] PCI firmware specification, rev 3.2
> [2] https://www.spinics.net/lists/linux-pci/msg56715.html
> [3] https://www.spinics.net/lists/linux-pci/msg56723.html
> [4] https://www.spinics.net/lists/linux-pci/msg56728.html
> [5] http://infocenter.arm.com/help/topic/com.arm.doc.den0049b/DEN0049B_IO_Remapping_Table.pdf
> [6] https://www.spinics.net/lists/kvm/msg140116.html
> [7] http://www.firmware.org/1275/bindings/pci/pci2_1.pdf
> [8] Documents/devicetree/bindings/pci
> [9] Documents/devicetree/bindings/iommu/arm,smmu.txt
> [10] Document/devicetree/bindings/pci/pci-iommu.txt
> [11] Documents/devicetree/bindings/pci/pci-msi.txt
> 


* Re: [early RFC] ARM PCI Passthrough design document
  2016-12-29 14:04 [early RFC] ARM PCI Passthrough design document Julien Grall
  2016-12-29 14:16 ` Jaggi, Manish
  2017-01-04  0:24 ` Stefano Stabellini
@ 2017-01-06 15:12 ` Roger Pau Monné
  2017-01-06 21:16   ` Stefano Stabellini
  2017-01-24 17:17   ` Julien Grall
  2017-01-06 16:27 ` Edgar E. Iglesias
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 82+ messages in thread
From: Roger Pau Monné @ 2017-01-06 15:12 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Jiandi An,
	Punit Agrawal, alistair.francis, Shanker Donthineni, xen-devel,
	manish.jaggi, Campbell Sean

On Thu, Dec 29, 2016 at 02:04:15PM +0000, Julien Grall wrote:
> Hi all,
> 
> The document below is an early version of a design
> proposal for PCI Passthrough in Xen. It aims to
> describe from an high level perspective the interaction
> with the different subsystems and how guest will be able
> to discover and access PCI.
> 
> I am aware that a similar design has been posted recently
> by Cavium (see [1]), however the approach to expose PCI
> to guest is different. We have request to run unmodified
> baremetal OS on Xen, a such guest would directly
> access the devices and no PV drivers will be used.
> 
> That's why this design is based on emulating a root controller.
> This also has the advantage to have the VM interface as close
> as baremetal allowing the guest to use firmware tables to discover
> the devices.
> 
> Currently on ARM, Xen does not have any knowledge about PCI devices.
> This means that IOMMU and interrupt controller (such as ITS)
> requiring specific configuration will not work with PCI even with
> DOM0.
> 
> The PCI Passthrough work could be divided in 2 phases:
> 	* Phase 1: Register all PCI devices in Xen => will allow
> 		   to use ITS and SMMU with PCI in Xen
>         * Phase 2: Assign devices to guests
> 
> This document aims to describe the 2 phases, but for now only phase
> 1 is fully described.
> 
> I have sent the design document to start to gather feedback on
> phase 1.

Thanks, this approach looks quite similar to what I have in mind for PVHv2
DomU/Dom0 pci-passthrough.

> Cheers,
> 
> [1] https://lists.xen.org/archives/html/xen-devel/2016-12/msg00224.html 
> 
> ========================
> % PCI pass-through support on ARM
> % Julien Grall <julien.grall@linaro.org>
> % Draft A
> 
> # Preface
> 
> This document aims to describe the components required to enable PCI
> passthrough on ARM.
> 
> This is an early draft and some questions are still unanswered, when this is
> the case the text will contain XXX.
> 
> # Introduction
> 
> PCI passthrough allows to give control of physical PCI devices to guest. This
> means that the guest will have full and direct access to the PCI device.
> 
> ARM is supporting one kind of guest that is exploiting as much as possible
> virtualization support in hardware. The guest will rely on PV driver only
> for IO (e.g block, network), interrupts will come through the virtualized
> interrupt controller. This means that there are no big changes required
> within the kernel.
> 
> By consequence, it would be possible to replace PV drivers by assigning real
> devices to the guest for I/O access. Xen on ARM would therefore be able to
> run unmodified operating system.
> 
> To achieve this goal, it looks more sensible to go towards emulating the
> host bridge (we will go into more details later). A guest would be able
> to take advantage of the firmware tables and obviating the need for a specific
> driver for Xen.
> 
> Thus in this document we follow the emulated host bridge approach.
> 
> # PCI terminologies
> 
> Each PCI device under a host bridge is uniquely identified by its Requester ID
> (AKA RID). A Requester ID is a triplet of Bus number, Device number, and
> Function.
> 
> When the platform has multiple host bridges, the software can add fourth
> number called Segment to differentiate host bridges. A PCI device will
> then uniquely by segment:bus:device:function (AKA SBDF).

From my reading of the above sentence, this implies that the segment is an
arbitrary number chosen by the OS? Isn't this picked from the MCFG ACPI table?

> So given a specific SBDF, it would be possible to find the host bridge and the
> RID associated to a PCI device.
> 
> # Interaction of the PCI subsystem with other subsystems
> 
> In order to have a PCI device fully working, Xen will need to configure
> other subsystems subsytems such as the SMMU and the Interrupt Controller.
                   ^ duplicated.
> 
> The interaction expected between the PCI subsystem and the other is:
                                                         ^ this seems quite
                                                         confusing, what's "the
                                                         other"?
>     * Add a device
>     * Remove a device
>     * Assign a device to a guest
>     * Deassign a device from a guest
> 
> XXX: Detail the interaction when assigning/deassigning device

Assigning a device will probably entangle setting up some direct MMIO mappings
(BARs and ROMs) plus a bunch of traps in order to perform emulation of accesses
to the PCI config space (or those can be setup when a new bridge is registered
with Xen).

> The following subsections will briefly describe the interaction from an
> higher level perspective. Implementation details (callback, structure...)
> is out of scope.
> 
> ## SMMU
> 
> The SMMU will be used to isolate the PCI device when accessing the memory
> (for instance DMA and MSI Doorbells). Often the SMMU will be configured using
> a StreamID (SID) that can be deduced from the RID with the help of the firmware
> tables (see below).
> 
> Whilst in theory all the memory transaction issued by a PCI device should
> go through the SMMU, on certain platforms some of the memory transaction may
> not reach the SMMU because they are interpreted by the host bridge. For
> instance this could happen if the MSI doorbell is built into the PCI host

I would elaborate on what is a MSI doorbell.

> bridge. See [6] for more details.
> 
> XXX: I think this could be solved by using the host memory layout when
> creating a guest with PCI devices => Detail it.

I'm not really sure I follow here, but if this write to the MSI doorbell
doesn't go through the SMMU, and instead is handled by the bridge, isn't there
a chance that a guest might be able to write anywhere in physical memory?

Or this only happens when a guest writes to a MSI doorbell that's trapped by
the bridge and not forwarded anywhere else?

> ## Interrupt controller
> 
> PCI supports three kind of interrupts: legacy interrupt, MSI and MSI-X. On ARM
> legacy interrupts will be mapped to SPIs. MSI and MSI-x will be
> either mapped to SPIs or LPIs.
> 
> Whilst SPIs can be programmed using an interrupt number, LPIs can be
> identified via a pair (DeviceID, EventID) when configure through the ITS.
                                                          ^d

> 
> The DeviceID is a unique identifier for each MSI-capable device that can
> be deduced from the RID with the help of the firmware tables (see below).
> 
> XXX: Figure out if something is necessary for GICv2m
> 
> # Information available in the firmware tables
> 
> ## ACPI
> 
> ### Host bridges
> 
> The static table MCFG (see 4.2 in [1]) will describe the host bridges available
> at boot and supporting ECAM. Unfortunately there are platforms out there
> (see [2]) that re-use MCFG to describe host bridge that are not fully ECAM
                                                    ^s

> compatible.
> 
> This means that Xen needs to account for possible quirks in the host bridge.
> The Linux community are working on a patch series for see (see [2] and [3])
> where quirks will be detected with:
>     * OEM ID
>     * OEM Table ID
>     * OEM Revision
>     * PCI Segment (from _SEG)
>     * PCI bus number range (from _CRS, wildcard allowed)

So the segment and bus number range need to be fetched from ACPI objects? Is
that because the information in the MCFG is lacking/wrong?
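
As a rough illustration of such a quirk table (loosely modelled on the Linux
series referenced above; all names and types here are made up):

    /* Hypothetical sketch of MCFG quirk matching. The match keys are the
     * ones listed above; a wildcard bus range is allowed. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define BUS_ANY 0xff                 /* wildcard for the bus range */

    struct pci_ecam_ops;                 /* quirky config space accessors */

    struct mcfg_quirk {
        char oem_id[7];                  /* from the ACPI table header */
        char oem_table_id[9];
        uint32_t oem_revision;
        uint16_t segment;                /* from _SEG */
        uint8_t bus_start, bus_end;      /* from _CRS, BUS_ANY = wildcard */
        const struct pci_ecam_ops *ops;
    };

    static bool quirk_matches(const struct mcfg_quirk *q, const char *oem_id,
                              const char *oem_table_id, uint32_t rev,
                              uint16_t seg, uint8_t bus_start, uint8_t bus_end)
    {
        return !strncmp(q->oem_id, oem_id, 6) &&
               !strncmp(q->oem_table_id, oem_table_id, 8) &&
               q->oem_revision == rev && q->segment == seg &&
               (q->bus_start == BUS_ANY ||
                (q->bus_start <= bus_start && bus_end <= q->bus_end));
    }
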

> 
> Based on what Linux is currently doing, there are two kind of quirks:
>     * Accesses to the configuration space of certain sizes are not allowed
>     * A specific driver is necessary for driving the host bridge

Hm, so what are the issues that make these bridges need specific drivers?

This might be quite problematic if you also have to emulate this broken
behavior inside of Xen (because Dom0 is using a specific driver).

> The former is straight forward to solve, the latter will require more thought.
> Instantiation of a specific driver for the host controller can be easily done
> if Xen has the information to detect it. However, those drivers may require
> resources described in ASL (see [4] for instance).
> 
> XXX: Need more investigation to know whether the missing information should
> be passed by DOM0 or hardcoded in the driver.

... or poke the ThunderX guys with a pointy stick until they get their act
together.

> ### Finding the StreamID and DeviceID
> 
> The static table IORT (see [5]) will provide information that will help to
> deduce the StreamID and DeviceID from a given RID.
> 
> ## Device Tree
> 
> ### Host bridges
> 
> Each Device Tree node associated to a host bridge will have at least the
> following properties (see bindings in [8]):
>     - device_type: will always be "pci".
>     - compatible: a string indicating which driver to instantiate
> 
> The node may also contain optional properties such as:
>     - linux,pci-domain: assign a fix segment number
>     - bus-range: indicate the range of bus numbers supported
> 
> When the property linux,pci-domain is not present, the operating system would
> have to allocate the segment number for each host bridges. Because the
> algorithm to allocate the segment is not specified, it is necessary for
> DOM0 and Xen to agree on the number before any PCI is been added.

Since this is all static, can't Xen just assign segments and bus ranges for
bridges that lack them? (also, why is it "linux,pci-domain" instead of just
"pci-domain"?)

> ### Finding the StreamID and DeviceID
> 
> ### StreamID
> 
> The first binding existing (see [9]) for SMMU didn't have a way to describe the
> relationship between RID and StreamID, it was assumed that StreamID == RequesterID.
> This bindins has now been deprecated in favor of a generic binding (see [10])
> which will use the property "iommu-map" to describe the relationship between
> an RID, the associated IOMMU and the StreamID.
> 
> ### DeviceID
> 
> The relationship between the RID and the DeviceID can be found using the
> property "msi-map" (see [11]).
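
Both properties use the same translation scheme, so a rough sketch of the RID
lookup (structure and function names below are invented for illustration) is:

    /* Illustrative sketch of the "iommu-map"/"msi-map" lookup: a masked
     * RID is translated to a StreamID or DeviceID via
     * (rid-base, parent, output-base, length) tuples. */
    #include <stdbool.h>
    #include <stdint.h>

    struct rid_map_entry {
        uint32_t rid_base;   /* first RID covered by this entry */
        uint32_t out_base;   /* StreamID/DeviceID corresponding to rid_base */
        uint32_t length;     /* number of consecutive RIDs covered */
    };

    static bool map_rid(const struct rid_map_entry *map, unsigned int nr,
                        uint32_t map_mask, uint32_t rid, uint32_t *out)
    {
        uint32_t masked = rid & map_mask;   /* *-map-mask, default all ones */
        unsigned int i;

        for ( i = 0; i < nr; i++ )
        {
            if ( masked >= map[i].rid_base &&
                 masked - map[i].rid_base < map[i].length )
            {
                *out = masked - map[i].rid_base + map[i].out_base;
                return true;
            }
        }

        return false;   /* RID not described by the firmware tables */
    }
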
> 
> # Discovering PCI devices
> 
> Whilst PCI devices are currently available in DOM0, the hypervisor does not
> have any knowledge of them. The first step of supporting PCI passthrough is
> to make Xen aware of the PCI devices.
> 
> Xen will require access to the PCI configuration space to retrieve information
> for the PCI devices or access it on behalf of the guest via the emulated

I know this is not the intention, but the above sentence makes it look like
Xen is using an emulated host bridge IMHO (although I'm not a native speaker,
so I could be wrong).

> host bridge.
> 
> ## Discovering and register hostbridge
> 
> Both ACPI and Device Tree do not provide enough information to fully
> instantiate an host bridge driver. In the case of ACPI, some data may come
> from ASL, whilst for Device Tree the segment number is not available.

For device tree, can't you just add a pci-domain to each bridge device in the
DT if none is specified?

For ACPI I understand that it's harder. Maybe ARM can somehow ensure that MCFG
tables completely describe the system, so that you don't need this anymore.

> So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> with all the relevant informations. This will be done via a new hypercall
> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> 
> struct physdev_pci_host_bridge_add
> {
>     /* IN */
>     uint16_t seg;
>     /* Range of bus supported by the host bridge */
>     uint8_t  bus_start;
>     uint8_t  bus_nr;
>     uint32_t res0;  /* Padding */
>     /* Information about the configuration space region */
>     uint64_t cfg_base;
>     uint64_t cfg_size;
> }

Why do you need the cfg_size attribute? Isn't it always going to be 4096 bytes
in size?

If that field is removed you could use the PHYSDEVOP_pci_mmcfg_reserved
hypercall.
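
For what it's worth, with plain ECAM the region covers the whole bus range
decoded by the bridge rather than a single function: each function has 4K of
configuration space, so one bus takes 32 devices x 8 functions x 4K = 1MB.
A minimal sketch (values and helper name are hypothetical):

    /* Illustrative only: size of an ECAM region as a function of the
     * number of buses decoded by the host bridge. */
    #include <stdint.h>

    static uint64_t ecam_cfg_size(unsigned int nr_buses)
    {
        return (uint64_t)nr_buses << 20;   /* nr_buses * 1MB */
    }

    /* e.g. a bridge decoding buses 0-7 => cfg_size = 8MB, with cfg_base
     * being the physical base of the ECAM window from the firmware
     * tables. */
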

> DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
> bridge available on the platform. When Xen is receiving the hypercall, the
> the driver associated to the host bridge will be instantiated.
> 
> XXX: Shall we limit DOM0 the access to the configuration space from that
> moment?

Most definitely yes, you should instantiate an emulated bridge over the real
one, in order to proxy Dom0 accesses to the PCI configuration space. You for
example don't want Dom0 moving the position of the BARs of PCI devices without
Xen being aware (and properly changing the second stage translation).

> ## Discovering and register PCI
> 
> Similarly to x86, PCI devices will be discovered by DOM0 and register
> using the hypercalls PHYSDEVOP_pci_add_device or PHYSDEVOP_manage_pci_add_ext.

Why do you need this? If you have access to the bridges you can scan them from
Xen and discover the devices AFAICT.

> By default all the PCI devices will be assigned to DOM0. So Xen would have
> to configure the SMMU and Interrupt Controller to allow DOM0 to use the PCI
> devices. As mentioned earlier, those subsystems will require the StreamID
> and DeviceID. Both can be deduced from the RID.
> 
> XXX: How to hide PCI devices from DOM0?

By adding the ACPI namespace of the device to the STAO and blocking Dom0
access to this device in the emulated bridge that Dom0 will have access to
(returning 0xFFFF when Dom0 tries to read the vendor ID from the PCI header).
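
A minimal sketch of that filtering in an emulated bridge's config-read path
(all names below are hypothetical, this is not existing Xen code):

    /* Illustrative sketch: make a hidden device look absent to Dom0 by
     * returning all ones, which reads as vendor ID 0xFFFF. */
    #include <stdbool.h>
    #include <stdint.h>

    static bool dev_hidden_from_dom0(uint32_t sbdf)
    {
        /* Policy hook: e.g. devices listed in the STAO or reserved for
         * passthrough. Stubbed for the sketch. */
        (void)sbdf;
        return false;
    }

    static uint32_t hw_cfg_read(uint32_t sbdf, unsigned int reg,
                                unsigned int size)
    {
        /* Would access the real host bridge; stubbed for the sketch. */
        (void)sbdf; (void)reg; (void)size;
        return 0;
    }

    static uint32_t vbridge_cfg_read(uint32_t sbdf, unsigned int reg,
                                     unsigned int size)
    {
        if ( dev_hidden_from_dom0(sbdf) )
            return ~0U;                  /* no device present */

        return hw_cfg_read(sbdf, reg, size);
    }
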

Roger.


* Re: [early RFC] ARM PCI Passthrough design document
  2016-12-29 14:04 [early RFC] ARM PCI Passthrough design document Julien Grall
                   ` (2 preceding siblings ...)
  2017-01-06 15:12 ` Roger Pau Monné
@ 2017-01-06 16:27 ` Edgar E. Iglesias
  2017-01-06 21:12   ` Stefano Stabellini
  2017-01-19  5:09 ` Manish Jaggi
  2017-05-19  6:38 ` Goel, Sameer
  5 siblings, 1 reply; 82+ messages in thread
From: Edgar E. Iglesias @ 2017-01-06 16:27 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Chen, Campbell Sean, Jiandi An,
	Punit Agrawal, alistair.francis, xen-devel, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

On Thu, Dec 29, 2016 at 02:04:15PM +0000, Julien Grall wrote:
> Hi all,
> 
> The document below is an early version of a design
> proposal for PCI Passthrough in Xen. It aims to
> describe from an high level perspective the interaction
> with the different subsystems and how guest will be able
> to discover and access PCI.
> 
> I am aware that a similar design has been posted recently
> by Cavium (see [1]), however the approach to expose PCI
> to guest is different. We have request to run unmodified
> baremetal OS on Xen, a such guest would directly
> access the devices and no PV drivers will be used.
> 
> That's why this design is based on emulating a root controller.
> This also has the advantage to have the VM interface as close
> as baremetal allowing the guest to use firmware tables to discover
> the devices.
> 
> Currently on ARM, Xen does not have any knowledge about PCI devices.
> This means that IOMMU and interrupt controller (such as ITS)
> requiring specific configuration will not work with PCI even with
> DOM0.
> 
> The PCI Passthrough work could be divided in 2 phases:
> 	* Phase 1: Register all PCI devices in Xen => will allow
> 		   to use ITS and SMMU with PCI in Xen
>         * Phase 2: Assign devices to guests
> 
> This document aims to describe the 2 phases, but for now only phase
> 1 is fully described.

Thanks Julien,

A question.
IIUC, Dom0 will own the real host bridge and DomUs will access a
virtual emulated one.
In the case of an ECAM compatible host bridge that only needs to
be initialized via a host bridge specific register sequence,
do I understand correctly that the amount of emulation would be
very small (just enough to fool the guest that the init sequence
passed)? Beyond that we could have a generic ECAM emulator/mappings?
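
(For reference, the generic ECAM part is essentially just an address
calculation; a rough sketch, with made-up names:)

    /* Illustrative sketch of a generic ECAM access: the offset is fully
     * determined by bus/device/function/register, which is why a generic
     * emulator or mapping is enough once the bridge itself has been
     * initialised. */
    #include <stdint.h>

    static inline uint64_t ecam_offset(uint8_t bus, uint8_t dev,
                                       uint8_t fn, uint16_t reg)
    {
        return ((uint64_t)bus << 20) | ((uint64_t)(dev & 0x1f) << 15) |
               ((uint64_t)(fn & 0x7) << 12) | (reg & 0xfff);
    }

    /* A config space read is then simply a load from
     * cfg_base + ecam_offset(bus, dev, fn, reg). */
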

How will we handle BAR setups?
Will we filter and make sure guests don't try to do funny stuff?
Perhaps Xen already has code for this (I'm guessing it does).

Thanks,
Edgar



> 
> I have sent the design document to start to gather feedback on
> phase 1.
> 
> Cheers,
> 
> [1] https://lists.xen.org/archives/html/xen-devel/2016-12/msg00224.html 
> 
> ========================
> % PCI pass-through support on ARM
> % Julien Grall <julien.grall@linaro.org>
> % Draft A
> 
> # Preface
> 
> This document aims to describe the components required to enable PCI
> passthrough on ARM.
> 
> This is an early draft and some questions are still unanswered, when this is
> the case the text will contain XXX.
> 
> # Introduction
> 
> PCI passthrough allows to give control of physical PCI devices to guest. This
> means that the guest will have full and direct access to the PCI device.
> 
> ARM is supporting one kind of guest that is exploiting as much as possible
> virtualization support in hardware. The guest will rely on PV driver only
> for IO (e.g block, network), interrupts will come through the virtualized
> interrupt controller. This means that there are no big changes required
> within the kernel.
> 
> By consequence, it would be possible to replace PV drivers by assigning real
> devices to the guest for I/O access. Xen on ARM would therefore be able to
> run unmodified operating system.
> 
> To achieve this goal, it looks more sensible to go towards emulating the
> host bridge (we will go into more details later). A guest would be able
> to take advantage of the firmware tables and obviating the need for a specific
> driver for Xen.
> 
> Thus in this document we follow the emulated host bridge approach.
> 
> # PCI terminologies
> 
> Each PCI device under a host bridge is uniquely identified by its Requester ID
> (AKA RID). A Requester ID is a triplet of Bus number, Device number, and
> Function.
> 
> When the platform has multiple host bridges, the software can add fourth
> number called Segment to differentiate host bridges. A PCI device will
> then uniquely by segment:bus:device:function (AKA SBDF).
> 
> So given a specific SBDF, it would be possible to find the host bridge and the
> RID associated to a PCI device.
> 
> # Interaction of the PCI subsystem with other subsystems
> 
> In order to have a PCI device fully working, Xen will need to configure
> other subsystems subsytems such as the SMMU and the Interrupt Controller.
> 
> The interaction expected between the PCI subsystem and the other is:
>     * Add a device
>     * Remove a device
>     * Assign a device to a guest
>     * Deassign a device from a guest
> 
> XXX: Detail the interaction when assigning/deassigning device
> 
> The following subsections will briefly describe the interaction from an
> higher level perspective. Implementation details (callback, structure...)
> is out of scope.
> 
> ## SMMU
> 
> The SMMU will be used to isolate the PCI device when accessing the memory
> (for instance DMA and MSI Doorbells). Often the SMMU will be configured using
> a StreamID (SID) that can be deduced from the RID with the help of the firmware
> tables (see below).
> 
> Whilst in theory all the memory transaction issued by a PCI device should
> go through the SMMU, on certain platforms some of the memory transaction may
> not reach the SMMU because they are interpreted by the host bridge. For
> instance this could happen if the MSI doorbell is built into the PCI host
> bridge. See [6] for more details.
> 
> XXX: I think this could be solved by using the host memory layout when
> creating a guest with PCI devices => Detail it.
> 
> ## Interrupt controller
> 
> PCI supports three kind of interrupts: legacy interrupt, MSI and MSI-X. On ARM
> legacy interrupts will be mapped to SPIs. MSI and MSI-x will be
> either mapped to SPIs or LPIs.
> 
> Whilst SPIs can be programmed using an interrupt number, LPIs can be
> identified via a pair (DeviceID, EventID) when configure through the ITS.
> 
> The DeviceID is a unique identifier for each MSI-capable device that can
> be deduced from the RID with the help of the firmware tables (see below).
> 
> XXX: Figure out if something is necessary for GICv2m
> 
> # Information available in the firmware tables
> 
> ## ACPI
> 
> ### Host bridges
> 
> The static table MCFG (see 4.2 in [1]) will describe the host bridges available
> at boot and supporting ECAM. Unfortunately there are platforms out there
> (see [2]) that re-use MCFG to describe host bridge that are not fully ECAM
> compatible.
> 
> This means that Xen needs to account for possible quirks in the host bridge.
> The Linux community are working on a patch series for see (see [2] and [3])
> where quirks will be detected with:
>     * OEM ID
>     * OEM Table ID
>     * OEM Revision
>     * PCI Segment (from _SEG)
>     * PCI bus number range (from _CRS, wildcard allowed)
> 
> Based on what Linux is currently doing, there are two kind of quirks:
>     * Accesses to the configuration space of certain sizes are not allowed
>     * A specific driver is necessary for driving the host bridge
> 
> The former is straight forward to solve, the latter will require more thought.
> Instantiation of a specific driver for the host controller can be easily done
> if Xen has the information to detect it. However, those drivers may require
> resources described in ASL (see [4] for instance).
> 
> XXX: Need more investigation to know whether the missing information should
> be passed by DOM0 or hardcoded in the driver.
> 
> ### Finding the StreamID and DeviceID
> 
> The static table IORT (see [5]) will provide information that will help to
> deduce the StreamID and DeviceID from a given RID.
> 
> ## Device Tree
> 
> ### Host bridges
> 
> Each Device Tree node associated to a host bridge will have at least the
> following properties (see bindings in [8]):
>     - device_type: will always be "pci".
>     - compatible: a string indicating which driver to instantiate
> 
> The node may also contain optional properties such as:
>     - linux,pci-domain: assign a fix segment number
>     - bus-range: indicate the range of bus numbers supported
> 
> When the property linux,pci-domain is not present, the operating system would
> have to allocate the segment number for each host bridges. Because the
> algorithm to allocate the segment is not specified, it is necessary for
> DOM0 and Xen to agree on the number before any PCI is been added.
> 
> ### Finding the StreamID and DeviceID
> 
> ### StreamID
> 
> The first binding existing (see [9]) for SMMU didn't have a way to describe the
> relationship between RID and StreamID, it was assumed that StreamID == RequesterID.
> This bindins has now been deprecated in favor of a generic binding (see [10])
> which will use the property "iommu-map" to describe the relationship between
> an RID, the associated IOMMU and the StreamID.
> 
> ### DeviceID
> 
> The relationship between the RID and the DeviceID can be found using the
> property "msi-map" (see [11]).
> 
> # Discovering PCI devices
> 
> Whilst PCI devices are currently available in DOM0, the hypervisor does not
> have any knowledge of them. The first step of supporting PCI passthrough is
> to make Xen aware of the PCI devices.
> 
> Xen will require access to the PCI configuration space to retrieve information
> for the PCI devices or access it on behalf of the guest via the emulated
> host bridge.
> 
> ## Discovering and register hostbridge
> 
> Both ACPI and Device Tree do not provide enough information to fully
> instantiate an host bridge driver. In the case of ACPI, some data may come
> from ASL, whilst for Device Tree the segment number is not available.
> 
> So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> with all the relevant informations. This will be done via a new hypercall
> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> 
> struct physdev_pci_host_bridge_add
> {
>     /* IN */
>     uint16_t seg;
>     /* Range of bus supported by the host bridge */
>     uint8_t  bus_start;
>     uint8_t  bus_nr;
>     uint32_t res0;  /* Padding */
>     /* Information about the configuration space region */
>     uint64_t cfg_base;
>     uint64_t cfg_size;
> }
> 
> DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
> bridge available on the platform. When Xen is receiving the hypercall, the
> the driver associated to the host bridge will be instantiated.
> 
> XXX: Shall we limit DOM0 the access to the configuration space from that
> moment?
> 
> ## Discovering and register PCI
> 
> Similarly to x86, PCI devices will be discovered by DOM0 and register
> using the hypercalls PHYSDEVOP_pci_add_device or PHYSDEVOP_manage_pci_add_ext.
> 
> By default all the PCI devices will be assigned to DOM0. So Xen would have
> to configure the SMMU and Interrupt Controller to allow DOM0 to use the PCI
> devices. As mentioned earlier, those subsystems will require the StreamID
> and DeviceID. Both can be deduced from the RID.
> 
> XXX: How to hide PCI devices from DOM0?
> 
> # Glossary
> 
> ECAM: Enhanced Configuration Mechanism
> SBDF: Segment Bus Device Function. The segment is a software concept.
> MSI: Message Signaled Interrupt
> SPI: Shared Peripheral Interrupt
> LPI: Locality-specific Peripheral Interrupt
> ITS: Interrupt Translation Service
> 
> # Bibliography
> 
> [1] PCI firmware specification, rev 3.2
> [2] https://www.spinics.net/lists/linux-pci/msg56715.html
> [3] https://www.spinics.net/lists/linux-pci/msg56723.html
> [4] https://www.spinics.net/lists/linux-pci/msg56728.html
> [5] http://infocenter.arm.com/help/topic/com.arm.doc.den0049b/DEN0049B_IO_Remapping_Table.pdf
> [6] https://www.spinics.net/lists/kvm/msg140116.html
> [7] http://www.firmware.org/1275/bindings/pci/pci2_1.pdf
> [8] Documents/devicetree/bindings/pci
> [9] Documents/devicetree/bindings/iommu/arm,smmu.txt
> [10] Document/devicetree/bindings/pci/pci-iommu.txt
> [11] Documents/devicetree/bindings/pci/pci-msi.txt
> 


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-06 16:27 ` Edgar E. Iglesias
@ 2017-01-06 21:12   ` Stefano Stabellini
  2017-01-09 17:50     ` Edgar E. Iglesias
  0 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-01-06 21:12 UTC (permalink / raw)
  To: Edgar E. Iglesias
  Cc: Stefano Stabellini, Wei Chen, Steve Capper, Jiandi An,
	Julien Grall, alistair.francis, Punit Agrawal,
	Shanker Donthineni, xen-devel, manish.jaggi, Campbell Sean,
	Roger Pau Monné

On Fri, 6 Jan 2017, Edgar E. Iglesias wrote:
> On Thu, Dec 29, 2016 at 02:04:15PM +0000, Julien Grall wrote:
> > Hi all,
> > 
> > The document below is an early version of a design
> > proposal for PCI Passthrough in Xen. It aims to
> > describe from an high level perspective the interaction
> > with the different subsystems and how guest will be able
> > to discover and access PCI.
> > 
> > I am aware that a similar design has been posted recently
> > by Cavium (see [1]), however the approach to expose PCI
> > to guest is different. We have request to run unmodified
> > baremetal OS on Xen, a such guest would directly
> > access the devices and no PV drivers will be used.
> > 
> > That's why this design is based on emulating a root controller.
> > This also has the advantage to have the VM interface as close
> > as baremetal allowing the guest to use firmware tables to discover
> > the devices.
> > 
> > Currently on ARM, Xen does not have any knowledge about PCI devices.
> > This means that IOMMU and interrupt controller (such as ITS)
> > requiring specific configuration will not work with PCI even with
> > DOM0.
> > 
> > The PCI Passthrough work could be divided in 2 phases:
> > 	* Phase 1: Register all PCI devices in Xen => will allow
> > 		   to use ITS and SMMU with PCI in Xen
> >         * Phase 2: Assign devices to guests
> > 
> > This document aims to describe the 2 phases, but for now only phase
> > 1 is fully described.
> 
> Thanks Julien,
> 
> A question.
> IIUC, Dom0 will own the real host bridge and DomUs will access a
> virtual emulated one.
> In the case of an ECAM compatible host bridge that only needs to
> be initialized via a host bridge specific register sequence,
> do I understand correctly that the amount of emulation would be
> very small (just enough to fool the guest that the init sequence
> passed). Beyond that we could have a generic ECAM emulator/mappings?

I think so.


> How will we handle BAR setups?
> Will we filter and make sure guests don't try to do funny stuff?
> Perhaps Xen already has code for this (I'm guessing it does).

Yes, we'll have to filter guest accesses. There is already some code in
Xen to do that, especially in regard to MSI and MSI-X setup.
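
A very rough sketch of the kind of BAR filtering meant here (purely
illustrative, simplified to a 32-bit memory BAR with the low flag bits
ignored; this is not the existing Xen code and all names are made up):

    /* Illustrative sketch: let the guest do the usual all-ones sizing
     * probe, but only accept the final address after the emulator has
     * validated it and updated the stage-2 mapping. */
    #include <stdbool.h>
    #include <stdint.h>

    struct vbar {
        uint64_t guest_addr;   /* address currently exposed to the guest */
        uint64_t size;         /* BAR size (power of two) */
        bool     sizing;       /* guest wrote 0xffffffff to probe the size */
    };

    static void vbar_write32(struct vbar *bar, uint32_t val)
    {
        if ( val == 0xffffffffU )
        {
            bar->sizing = true;          /* next read returns the size mask */
            return;
        }

        bar->sizing = false;
        bar->guest_addr = val & ~(bar->size - 1);
        /* At this point the emulator would sanity check the new address
         * and update the stage-2 mapping to point at the real BAR. */
    }

    static uint32_t vbar_read32(const struct vbar *bar)
    {
        return bar->sizing ? (uint32_t)~(bar->size - 1)
                           : (uint32_t)bar->guest_addr;
    }
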


> > 
> > I have sent the design document to start to gather feedback on
> > phase 1.
> > 
> > Cheers,
> > 
> > [1] https://lists.xen.org/archives/html/xen-devel/2016-12/msg00224.html 
> > 
> > ========================
> > % PCI pass-through support on ARM
> > % Julien Grall <julien.grall@linaro.org>
> > % Draft A
> > 
> > # Preface
> > 
> > This document aims to describe the components required to enable PCI
> > passthrough on ARM.
> > 
> > This is an early draft and some questions are still unanswered, when this is
> > the case the text will contain XXX.
> > 
> > # Introduction
> > 
> > PCI passthrough allows to give control of physical PCI devices to guest. This
> > means that the guest will have full and direct access to the PCI device.
> > 
> > ARM is supporting one kind of guest that is exploiting as much as possible
> > virtualization support in hardware. The guest will rely on PV driver only
> > for IO (e.g block, network), interrupts will come through the virtualized
> > interrupt controller. This means that there are no big changes required
> > within the kernel.
> > 
> > By consequence, it would be possible to replace PV drivers by assigning real
> > devices to the guest for I/O access. Xen on ARM would therefore be able to
> > run unmodified operating system.
> > 
> > To achieve this goal, it looks more sensible to go towards emulating the
> > host bridge (we will go into more details later). A guest would be able
> > to take advantage of the firmware tables and obviating the need for a specific
> > driver for Xen.
> > 
> > Thus in this document we follow the emulated host bridge approach.
> > 
> > # PCI terminologies
> > 
> > Each PCI device under a host bridge is uniquely identified by its Requester ID
> > (AKA RID). A Requester ID is a triplet of Bus number, Device number, and
> > Function.
> > 
> > When the platform has multiple host bridges, the software can add fourth
> > number called Segment to differentiate host bridges. A PCI device will
> > then uniquely by segment:bus:device:function (AKA SBDF).
> > 
> > So given a specific SBDF, it would be possible to find the host bridge and the
> > RID associated to a PCI device.
> > 
> > # Interaction of the PCI subsystem with other subsystems
> > 
> > In order to have a PCI device fully working, Xen will need to configure
> > other subsystems subsytems such as the SMMU and the Interrupt Controller.
> > 
> > The interaction expected between the PCI subsystem and the other is:
> >     * Add a device
> >     * Remove a device
> >     * Assign a device to a guest
> >     * Deassign a device from a guest
> > 
> > XXX: Detail the interaction when assigning/deassigning device
> > 
> > The following subsections will briefly describe the interaction from an
> > higher level perspective. Implementation details (callback, structure...)
> > is out of scope.
> > 
> > ## SMMU
> > 
> > The SMMU will be used to isolate the PCI device when accessing the memory
> > (for instance DMA and MSI Doorbells). Often the SMMU will be configured using
> > a StreamID (SID) that can be deduced from the RID with the help of the firmware
> > tables (see below).
> > 
> > Whilst in theory all the memory transaction issued by a PCI device should
> > go through the SMMU, on certain platforms some of the memory transaction may
> > not reach the SMMU because they are interpreted by the host bridge. For
> > instance this could happen if the MSI doorbell is built into the PCI host
> > bridge. See [6] for more details.
> > 
> > XXX: I think this could be solved by using the host memory layout when
> > creating a guest with PCI devices => Detail it.
> > 
> > ## Interrupt controller
> > 
> > PCI supports three kind of interrupts: legacy interrupt, MSI and MSI-X. On ARM
> > legacy interrupts will be mapped to SPIs. MSI and MSI-x will be
> > either mapped to SPIs or LPIs.
> > 
> > Whilst SPIs can be programmed using an interrupt number, LPIs can be
> > identified via a pair (DeviceID, EventID) when configure through the ITS.
> > 
> > The DeviceID is a unique identifier for each MSI-capable device that can
> > be deduced from the RID with the help of the firmware tables (see below).
> > 
> > XXX: Figure out if something is necessary for GICv2m
> > 
> > # Information available in the firmware tables
> > 
> > ## ACPI
> > 
> > ### Host bridges
> > 
> > The static table MCFG (see 4.2 in [1]) will describe the host bridges available
> > at boot and supporting ECAM. Unfortunately there are platforms out there
> > (see [2]) that re-use MCFG to describe host bridge that are not fully ECAM
> > compatible.
> > 
> > This means that Xen needs to account for possible quirks in the host bridge.
> > The Linux community are working on a patch series for see (see [2] and [3])
> > where quirks will be detected with:
> >     * OEM ID
> >     * OEM Table ID
> >     * OEM Revision
> >     * PCI Segment (from _SEG)
> >     * PCI bus number range (from _CRS, wildcard allowed)
> > 
> > Based on what Linux is currently doing, there are two kind of quirks:
> >     * Accesses to the configuration space of certain sizes are not allowed
> >     * A specific driver is necessary for driving the host bridge
> > 
> > The former is straight forward to solve, the latter will require more thought.
> > Instantiation of a specific driver for the host controller can be easily done
> > if Xen has the information to detect it. However, those drivers may require
> > resources described in ASL (see [4] for instance).
> > 
> > XXX: Need more investigation to know whether the missing information should
> > be passed by DOM0 or hardcoded in the driver.
> > 
> > ### Finding the StreamID and DeviceID
> > 
> > The static table IORT (see [5]) will provide information that will help to
> > deduce the StreamID and DeviceID from a given RID.
> > 
> > ## Device Tree
> > 
> > ### Host bridges
> > 
> > Each Device Tree node associated to a host bridge will have at least the
> > following properties (see bindings in [8]):
> >     - device_type: will always be "pci".
> >     - compatible: a string indicating which driver to instantiate
> > 
> > The node may also contain optional properties such as:
> >     - linux,pci-domain: assign a fix segment number
> >     - bus-range: indicate the range of bus numbers supported
> > 
> > When the property linux,pci-domain is not present, the operating system would
> > have to allocate the segment number for each host bridges. Because the
> > algorithm to allocate the segment is not specified, it is necessary for
> > DOM0 and Xen to agree on the number before any PCI is been added.
> > 
> > ### Finding the StreamID and DeviceID
> > 
> > ### StreamID
> > 
> > The first binding existing (see [9]) for SMMU didn't have a way to describe the
> > relationship between RID and StreamID, it was assumed that StreamID == RequesterID.
> > This bindins has now been deprecated in favor of a generic binding (see [10])
> > which will use the property "iommu-map" to describe the relationship between
> > an RID, the associated IOMMU and the StreamID.
> > 
> > ### DeviceID
> > 
> > The relationship between the RID and the DeviceID can be found using the
> > property "msi-map" (see [11]).
> > 
> > # Discovering PCI devices
> > 
> > Whilst PCI devices are currently available in DOM0, the hypervisor does not
> > have any knowledge of them. The first step of supporting PCI passthrough is
> > to make Xen aware of the PCI devices.
> > 
> > Xen will require access to the PCI configuration space to retrieve information
> > for the PCI devices or access it on behalf of the guest via the emulated
> > host bridge.
> > 
> > ## Discovering and register hostbridge
> > 
> > Both ACPI and Device Tree do not provide enough information to fully
> > instantiate an host bridge driver. In the case of ACPI, some data may come
> > from ASL, whilst for Device Tree the segment number is not available.
> > 
> > So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> > with all the relevant informations. This will be done via a new hypercall
> > PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> > 
> > struct physdev_pci_host_bridge_add
> > {
> >     /* IN */
> >     uint16_t seg;
> >     /* Range of bus supported by the host bridge */
> >     uint8_t  bus_start;
> >     uint8_t  bus_nr;
> >     uint32_t res0;  /* Padding */
> >     /* Information about the configuration space region */
> >     uint64_t cfg_base;
> >     uint64_t cfg_size;
> > }
> > 
> > DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
> > bridge available on the platform. When Xen is receiving the hypercall, the
> > the driver associated to the host bridge will be instantiated.
> > 
> > XXX: Shall we limit DOM0 the access to the configuration space from that
> > moment?
> > 
> > ## Discovering and register PCI
> > 
> > Similarly to x86, PCI devices will be discovered by DOM0 and register
> > using the hypercalls PHYSDEVOP_pci_add_device or PHYSDEVOP_manage_pci_add_ext.
> > 
> > By default all the PCI devices will be assigned to DOM0. So Xen would have
> > to configure the SMMU and Interrupt Controller to allow DOM0 to use the PCI
> > devices. As mentioned earlier, those subsystems will require the StreamID
> > and DeviceID. Both can be deduced from the RID.
> > 
> > XXX: How to hide PCI devices from DOM0?
> > 
> > # Glossary
> > 
> > ECAM: Enhanced Configuration Mechanism
> > SBDF: Segment Bus Device Function. The segment is a software concept.
> > MSI: Message Signaled Interrupt
> > SPI: Shared Peripheral Interrupt
> > LPI: Locality-specific Peripheral Interrupt
> > ITS: Interrupt Translation Service
> > 
> > # Bibliography
> > 
> > [1] PCI firmware specification, rev 3.2
> > [2] https://www.spinics.net/lists/linux-pci/msg56715.html
> > [3] https://www.spinics.net/lists/linux-pci/msg56723.html
> > [4] https://www.spinics.net/lists/linux-pci/msg56728.html
> > [5] http://infocenter.arm.com/help/topic/com.arm.doc.den0049b/DEN0049B_IO_Remapping_Table.pdf
> > [6] https://www.spinics.net/lists/kvm/msg140116.html
> > [7] http://www.firmware.org/1275/bindings/pci/pci2_1.pdf
> > [8] Documents/devicetree/bindings/pci
> > [9] Documents/devicetree/bindings/iommu/arm,smmu.txt
> > [10] Document/devicetree/bindings/pci/pci-iommu.txt
> > [11] Documents/devicetree/bindings/pci/pci-msi.txt
> > 
> 


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-06 15:12 ` Roger Pau Monné
@ 2017-01-06 21:16   ` Stefano Stabellini
  2017-01-24 17:17   ` Julien Grall
  1 sibling, 0 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-01-06 21:16 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Punit Agrawal, Steve Capper,
	Jiandi An, Julien Grall, alistair.francis, Shanker Donthineni,
	xen-devel, manish.jaggi, Campbell Sean

On Fri, 6 Jan 2017, Roger Pau Monné wrote:
> > bridge. See [6] for more details.
> > 
> > XXX: I think this could be solved by using the host memory layout when
> > creating a guest with PCI devices => Detail it.
> 
> I'm not really sure I follow here, but if this write to the MSI doorbell
> doesn't go through the SMMU, and instead is handled by the bridge, isn't there
> a chance that a gust might be able to write anywhere in physical memory?
> 
> Or this only happens when a guest writes to a MSI doorbell that's trapped by
> the bridge and not forwarded anywhere else?

It only happens when a device (not a cpu) writes to the MSI doorbell.


> > So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> > with all the relevant informations. This will be done via a new hypercall
> > PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> > 
> > struct physdev_pci_host_bridge_add
> > {
> >     /* IN */
> >     uint16_t seg;
> >     /* Range of bus supported by the host bridge */
> >     uint8_t  bus_start;
> >     uint8_t  bus_nr;
> >     uint32_t res0;  /* Padding */
> >     /* Information about the configuration space region */
> >     uint64_t cfg_base;
> >     uint64_t cfg_size;
> > }
> 
> Why do you need to cfg_size attribute? Isn't it always going to be 4096 bytes
> in size?
> 
> If that field is removed you could use the PHYSDEVOP_pci_mmcfg_reserved
> hypercalls.
> 
> > DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
> > bridge available on the platform. When Xen is receiving the hypercall, the
> > the driver associated to the host bridge will be instantiated.
> > 
> > XXX: Shall we limit DOM0 the access to the configuration space from that
> > moment?
> 
> Most definitely yes, you should instantiate an emulated bridge over the real
> one, in order to proxy Dom0 accesses to the PCI configuration space. You for
> example don't want Dom0 moving the position of the BARs of PCI devices without
> Xen being aware (and properly changing the second stage translation).
> 
> > ## Discovering and register PCI
> > 
> > Similarly to x86, PCI devices will be discovered by DOM0 and register
> > using the hypercalls PHYSDEVOP_pci_add_device or PHYSDEVOP_manage_pci_add_ext.
> 
> Why do you need this? If you have access to the bridges you can scan them from
> Xen and discover the devices AFAICT.

I think the same


> > By default all the PCI devices will be assigned to DOM0. So Xen would have
> > to configure the SMMU and Interrupt Controller to allow DOM0 to use the PCI
> > devices. As mentioned earlier, those subsystems will require the StreamID
> > and DeviceID. Both can be deduced from the RID.
> > 
> > XXX: How to hide PCI devices from DOM0?
> 
> By adding the ACPI namespace of the device to the STAO and blocking Dom0
> access to this device in the emulated bridge that Dom0 will have access to
> (returning 0xFFFF when Dom0 tries to read the vendor ID from the PCI header).

Good suggestion


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-06 21:12   ` Stefano Stabellini
@ 2017-01-09 17:50     ` Edgar E. Iglesias
  0 siblings, 0 replies; 82+ messages in thread
From: Edgar E. Iglesias @ 2017-01-09 17:50 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Punit Agrawal, Wei Chen, Campbell Sean, Jiandi An, Julien Grall,
	alistair.francis, xen-devel, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

On Fri, Jan 06, 2017 at 01:12:44PM -0800, Stefano Stabellini wrote:
> On Fri, 6 Jan 2017, Edgar E. Iglesias wrote:
> > On Thu, Dec 29, 2016 at 02:04:15PM +0000, Julien Grall wrote:
> > > Hi all,
> > > 
> > > The document below is an early version of a design
> > > proposal for PCI Passthrough in Xen. It aims to
> > > describe from an high level perspective the interaction
> > > with the different subsystems and how guest will be able
> > > to discover and access PCI.
> > > 
> > > I am aware that a similar design has been posted recently
> > > by Cavium (see [1]), however the approach to expose PCI
> > > to guest is different. We have request to run unmodified
> > > baremetal OS on Xen, a such guest would directly
> > > access the devices and no PV drivers will be used.
> > > 
> > > That's why this design is based on emulating a root controller.
> > > This also has the advantage to have the VM interface as close
> > > as baremetal allowing the guest to use firmware tables to discover
> > > the devices.
> > > 
> > > Currently on ARM, Xen does not have any knowledge about PCI devices.
> > > This means that IOMMU and interrupt controller (such as ITS)
> > > requiring specific configuration will not work with PCI even with
> > > DOM0.
> > > 
> > > The PCI Passthrough work could be divided in 2 phases:
> > > 	* Phase 1: Register all PCI devices in Xen => will allow
> > > 		   to use ITS and SMMU with PCI in Xen
> > >         * Phase 2: Assign devices to guests
> > > 
> > > This document aims to describe the 2 phases, but for now only phase
> > > 1 is fully described.
> > 
> > Thanks Julien,
> > 
> > A question.
> > IIUC, Dom0 will own the real host bridge and DomUs will access a
> > virtual emulated one.
> > In the case of an ECAM compatible host bridge that only needs to
> > be initialized via a host bridge specific register sequence,
> > do I understand correctly that the amount of emulation would be
> > very small (just enough to fool the guest that the init sequence
> > passed). Beyond that we could have a generic ECAM emulator/mappings?
> 
> I think so.
> 
> 
> > How will we handle BAR setups?
> > Will we filter and make sure guests don't try to do funny stuff?
> > Perhaps Xen already has code for this (I'm guessing it does).
> 
> Yes, we'll have to filter guest accesses. There is already some code in
> Xen to do that, especially in regard to MSI and MSI-X setup.
> 

Thanks for clarifying!

I think the proposal looks good so far.

Cheers,
Edgar


* Re: [early RFC] ARM PCI Passthrough design document
  2016-12-29 14:04 [early RFC] ARM PCI Passthrough design document Julien Grall
                   ` (3 preceding siblings ...)
  2017-01-06 16:27 ` Edgar E. Iglesias
@ 2017-01-19  5:09 ` Manish Jaggi
  2017-01-24 17:43   ` Julien Grall
  2017-05-19  6:38 ` Goel, Sameer
  5 siblings, 1 reply; 82+ messages in thread
From: Manish Jaggi @ 2017-01-19  5:09 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Campbell Sean, Kapoor, Prasun, Jiandi An,
	Punit Agrawal, alistair.francis, jnair, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

Hi Julien,

On 12/29/2016 07:34 PM, Julien Grall wrote:
> Hi all,
> 
> The document below is an early version of a design
> proposal for PCI Passthrough in Xen. It aims to
> describe from an high level perspective the interaction
> with the different subsystems and how guest will be able
> to discover and access PCI.
> 
> I am aware that a similar design has been posted recently
> by Cavium (see [1]), however the approach to expose PCI
> to guest is different. We have request to run unmodified
> baremetal OS on Xen, a such guest would directly
> access the devices and no PV drivers will be used.
> 
> That's why this design is based on emulating a root controller.
> This also has the advantage to have the VM interface as close
> as baremetal allowing the guest to use firmware tables to discover
> the devices.
> 
> Currently on ARM, Xen does not have any knowledge about PCI devices.
> This means that IOMMU and interrupt controller (such as ITS)
> requiring specific configuration will not work with PCI even with
> DOM0.
> 
> The PCI Passthrough work could be divided in 2 phases:
> 	* Phase 1: Register all PCI devices in Xen => will allow
> 		   to use ITS and SMMU with PCI in Xen
>         * Phase 2: Assign devices to guests
> 
> This document aims to describe the 2 phases, but for now only phase
> 1 is fully described.
> 
> I have sent the design document to start to gather feedback on
> phase 1.
> 
> Cheers,
> 
> [1] https://lists.xen.org/archives/html/xen-devel/2016-12/msg00224.html 
> 
> ========================
> % PCI pass-through support on ARM
> % Julien Grall <julien.grall@linaro.org>
> % Draft A
> 
> # Preface
> 
> This document aims to describe the components required to enable PCI
> passthrough on ARM.
> 
> This is an early draft and some questions are still unanswered, when this is
> the case the text will contain XXX.
> 
> # Introduction
> 
> PCI passthrough allows to give control of physical PCI devices to guest. This
> means that the guest will have full and direct access to the PCI device.
> 
> ARM is supporting one kind of guest that is exploiting as much as possible
> virtualization support in hardware. The guest will rely on PV driver only
> for IO (e.g block, network), interrupts will come through the virtualized
> interrupt controller. This means that there are no big changes required
> within the kernel.
> 
> By consequence, it would be possible to replace PV drivers by assigning real
> devices to the guest for I/O access. Xen on ARM would therefore be able to
> run unmodified operating system.
> 
> To achieve this goal, it looks more sensible to go towards emulating the
> host bridge (we will go into more details later). A guest would be able
> to take advantage of the firmware tables and obviating the need for a specific
> driver for Xen.
> 
> Thus in this document we follow the emulated host bridge approach.
> 
> # PCI terminologies
> 
> Each PCI device under a host bridge is uniquely identified by its Requester ID
> (AKA RID). A Requester ID is a triplet of Bus number, Device number, and
> Function.
> 
> When the platform has multiple host bridges, the software can add fourth
> number called Segment to differentiate host bridges. A PCI device will
> then uniquely by segment:bus:device:function (AKA SBDF).
> 
> So given a specific SBDF, it would be possible to find the host bridge and the
> RID associated to a PCI device.
> 
> # Interaction of the PCI subsystem with other subsystems
> 
> In order to have a PCI device fully working, Xen will need to configure
> other subsystems subsytems such as the SMMU and the Interrupt Controller.
> 
> The interaction expected between the PCI subsystem and the other is:
>     * Add a device
>     * Remove a device
>     * Assign a device to a guest
>     * Deassign a device from a guest
> 
> XXX: Detail the interaction when assigning/deassigning device
> 
> The following subsections will briefly describe the interaction from an
> higher level perspective. Implementation details (callback, structure...)
> is out of scope.
> 
> ## SMMU
> 
> The SMMU will be used to isolate the PCI device when accessing the memory
> (for instance DMA and MSI Doorbells). Often the SMMU will be configured using
> a StreamID (SID) that can be deduced from the RID with the help of the firmware
> tables (see below).
> 
> Whilst in theory all the memory transaction issued by a PCI device should
> go through the SMMU, on certain platforms some of the memory transaction may
> not reach the SMMU because they are interpreted by the host bridge. For
> instance this could happen if the MSI doorbell is built into the PCI host
> bridge. See [6] for more details.
> 
> XXX: I think this could be solved by using the host memory layout when
> creating a guest with PCI devices => Detail it.
> 
> ## Interrupt controller
> 
> PCI supports three kind of interrupts: legacy interrupt, MSI and MSI-X. On ARM
> legacy interrupts will be mapped to SPIs. MSI and MSI-x will be
> either mapped to SPIs or LPIs.
> 
> Whilst SPIs can be programmed using an interrupt number, LPIs can be
> identified via a pair (DeviceID, EventID) when configure through the ITS.
> 
> The DeviceID is a unique identifier for each MSI-capable device that can
> be deduced from the RID with the help of the firmware tables (see below).
> 
> XXX: Figure out if something is necessary for GICv2m
> 
> # Information available in the firmware tables
> 
> ## ACPI
> 
> ### Host bridges
> 
> The static table MCFG (see 4.2 in [1]) will describe the host bridges available
> at boot and supporting ECAM. Unfortunately there are platforms out there
> (see [2]) that re-use MCFG to describe host bridge that are not fully ECAM
> compatible.
> 
> This means that Xen needs to account for possible quirks in the host bridge.
> The Linux community are working on a patch series for see (see [2] and [3])
> where quirks will be detected with:
>     * OEM ID
>     * OEM Table ID
>     * OEM Revision
>     * PCI Segment (from _SEG)
>     * PCI bus number range (from _CRS, wildcard allowed)
> 
> Based on what Linux is currently doing, there are two kind of quirks:
>     * Accesses to the configuration space of certain sizes are not allowed
>     * A specific driver is necessary for driving the host bridge
> 
> The former is straight forward to solve, the latter will require more thought.
> Instantiation of a specific driver for the host controller can be easily done
> if Xen has the information to detect it. However, those drivers may require
> resources described in ASL (see [4] for instance).
> 
> XXX: Need more investigation to know whether the missing information should
> be passed by DOM0 or hardcoded in the driver.
> 
> ### Finding the StreamID and DeviceID
> 
> The static table IORT (see [5]) will provide information that will help to
> deduce the StreamID and DeviceID from a given RID.
> 
> ## Device Tree
> 
> ### Host bridges
> 
> Each Device Tree node associated to a host bridge will have at least the
> following properties (see bindings in [8]):
>     - device_type: will always be "pci".
>     - compatible: a string indicating which driver to instantiate
> 
> The node may also contain optional properties such as:
>     - linux,pci-domain: assign a fix segment number
>     - bus-range: indicate the range of bus numbers supported
> 
> When the property linux,pci-domain is not present, the operating system would
> have to allocate the segment number for each host bridges. Because the
> algorithm to allocate the segment is not specified, it is necessary for
> DOM0 and Xen to agree on the number before any PCI is been added.
> 
> ### Finding the StreamID and DeviceID
> 
> ### StreamID
> 
> The first binding existing (see [9]) for SMMU didn't have a way to describe the
> relationship between RID and StreamID, it was assumed that StreamID == RequesterID.
> This bindins has now been deprecated in favor of a generic binding (see [10])
> which will use the property "iommu-map" to describe the relationship between
> an RID, the associated IOMMU and the StreamID.
> 
> ### DeviceID
> 
> The relationship between the RID and the DeviceID can be found using the
> property "msi-map" (see [11]).
> 
> # Discovering PCI devices
> 
> Whilst PCI devices are currently available in DOM0, the hypervisor does not
> have any knowledge of them. The first step of supporting PCI passthrough is
> to make Xen aware of the PCI devices.
> 
> Xen will require access to the PCI configuration space to retrieve information
> for the PCI devices or access it on behalf of the guest via the emulated
> host bridge.
> 
> ## Discovering and register hostbridge
> 
> Both ACPI and Device Tree do not provide enough information to fully
> instantiate an host bridge driver. In the case of ACPI, some data may come
> from ASL, whilst for Device Tree the segment number is not available.
> 
> So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> with all the relevant informations. This will be done via a new hypercall
> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> 
> struct physdev_pci_host_bridge_add
> {
>     /* IN */
>     uint16_t seg;
>     /* Range of bus supported by the host bridge */
>     uint8_t  bus_start;
>     uint8_t  bus_nr;
>     uint32_t res0;  /* Padding */
>     /* Information about the configuration space region */
>     uint64_t cfg_base;
>     uint64_t cfg_size;
> }
> 
> DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
> bridge available on the platform. When Xen is receiving the hypercall, the
> the driver associated to the host bridge will be instantiated.
> 

I think PCI passthrough and DOM0 with ACPI enumerating devices on PCI are
separate features.
Without Xen mapping the PCI config space region in stage-2 for dom0, an ACPI
dom0 won't boot. Currently Xen does that for DT.

So can we have two design documents?
a) PCI passthrough
b) ACPI dom0/domU support in Xen and Linux
- this may include:
b.1 Passing IORT to Dom0 without SMMU
b.2 Hypercall to map PCI config space in dom0
b.3 <more>

What do you think?


> XXX: Shall we limit DOM0 the access to the configuration space from that
> moment?
> 
> ## Discovering and register PCI
> 
> Similarly to x86, PCI devices will be discovered by DOM0 and register
> using the hypercalls PHYSDEVOP_pci_add_device or PHYSDEVOP_manage_pci_add_ext.
> 
> By default all the PCI devices will be assigned to DOM0. So Xen would have
> to configure the SMMU and Interrupt Controller to allow DOM0 to use the PCI
> devices. As mentioned earlier, those subsystems will require the StreamID
> and DeviceID. Both can be deduced from the RID.
> 
> XXX: How to hide PCI devices from DOM0?
> 
> # Glossary
> 
> ECAM: Enhanced Configuration Mechanism
> SBDF: Segment Bus Device Function. The segment is a software concept.
> MSI: Message Signaled Interrupt
> SPI: Shared Peripheral Interrupt
> LPI: Locality-specific Peripheral Interrupt
> ITS: Interrupt Translation Service
> 
> # Bibliography
> 
> [1] PCI firmware specification, rev 3.2
> [2] https://www.spinics.net/lists/linux-pci/msg56715.html
> [3] https://www.spinics.net/lists/linux-pci/msg56723.html
> [4] https://www.spinics.net/lists/linux-pci/msg56728.html
> [5] http://infocenter.arm.com/help/topic/com.arm.doc.den0049b/DEN0049B_IO_Remapping_Table.pdf
> [6] https://www.spinics.net/lists/kvm/msg140116.html
> [7] http://www.firmware.org/1275/bindings/pci/pci2_1.pdf
> [8] Documents/devicetree/bindings/pci
> [9] Documents/devicetree/bindings/iommu/arm,smmu.txt
> [10] Document/devicetree/bindings/pci/pci-iommu.txt
> [11] Documents/devicetree/bindings/pci/pci-msi.txt
> 


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-04  0:24 ` Stefano Stabellini
@ 2017-01-24 14:28   ` Julien Grall
  2017-01-24 20:07     ` Stefano Stabellini
  2017-01-25  4:23     ` Manish Jaggi
  0 siblings, 2 replies; 82+ messages in thread
From: Julien Grall @ 2017-01-24 14:28 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Campbell Sean, Jiandi An, Punit Agrawal,
	alistair.francis, xen-devel, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

Hi Stefano,

On 04/01/17 00:24, Stefano Stabellini wrote:
> On Thu, 29 Dec 2016, Julien Grall wrote:

[...]

>> # Introduction
>>
>> PCI passthrough allows to give control of physical PCI devices to guest. This
>> means that the guest will have full and direct access to the PCI device.
>>
>> ARM is supporting one kind of guest that is exploiting as much as possible
>> virtualization support in hardware. The guest will rely on PV driver only
>> for IO (e.g block, network), interrupts will come through the virtualized
>> interrupt controller. This means that there are no big changes required
>> within the kernel.
>>
>> By consequence, it would be possible to replace PV drivers by assigning real
>   ^ As a consequence

I will fix all the typos in the next version.

>
>
>> devices to the guest for I/O access. Xen on ARM would therefore be able to
>> run unmodified operating system.

[...]

>> Instantiation of a specific driver for the host controller can be easily done
>> if Xen has the information to detect it. However, those drivers may require
>> resources described in ASL (see [4] for instance).
>>
>> XXX: Need more investigation to know whether the missing information should
>> be passed by DOM0 or hardcoded in the driver.
>
> Given that we are talking about quirks here, it would be better to just
> hardcode them in the drivers, if possible.

Indeed, hardcoding would be the preferred way, to avoid introducing a new 
hypercall for quirks.

For instance, in the case of ThunderX (see commit 44f22bd "PCI: Add 
MCFG quirks for Cavium ThunderX pass2.x host controller"), some regions 
are read from ACPI. What I'd like to understand is whether this could be 
hardcoded or whether it can change between platforms? If it can change, 
is there a way in ACPI to differentiate two platforms?

Maybe this is a question that Cavium can answer? (in CC).


[...]

>> ## Discovering and register hostbridge
>>
>> Both ACPI and Device Tree do not provide enough information to fully
>> instantiate an host bridge driver. In the case of ACPI, some data may come
>> from ASL,
>
> The data available from ASL is just to initialize quirks and non-ECAM
> controllers, right? Given that SBSA mandates ECAM, and we assume that
> ACPI is mostly (if not only) for servers, then I think it is safe to say
> that in the case of ACPI we should have all the info to fully
> instantiate an host bridge driver.

From the spec, the MCFG will only describe host bridges available at 
boot (see 4.2 in "PCI firmware specification, rev 3.2"). All the other 
host bridges will be described in ASL.

So we need DOM0 to tell Xen about the latter host bridges.

>
>
>> whilst for Device Tree the segment number is not available.
>>
>> So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
>> with all the relevant informations. This will be done via a new hypercall
>> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
>
> I understand that the main purpose of this hypercall is to get Xen and Dom0 to
> agree on the segment numbers, but why is it necessary? If Dom0 has an
> emulated contoller like any other guest, do we care what segment numbers
> Dom0 will use?

I was not planning to have an emulated controller for DOM0. The physical 
one is not necessarily ECAM compliant, so we would have to either emulate 
the physical one (meaning multiple different emulations) or an 
ECAM-compliant one.

The latter is not possible because you don't know if there is enough 
free MMIO space for the emulation.

In the case of ARM, I don't see much point in emulating the host 
bridge for DOM0. The only thing we need in Xen is to access the 
configuration space; we don't care about driving the host bridge. So I 
would let DOM0 deal with that.

Also, I don't see any reason on ARM to trap DOM0 configuration space 
access. MSIs will be configured using the interrupt controller, and DOM0 
is a trusted domain.

>
>
>> struct physdev_pci_host_bridge_add
>> {
>>     /* IN */
>>     uint16_t seg;
>>     /* Range of bus supported by the host bridge */
>>     uint8_t  bus_start;
>>     uint8_t  bus_nr;
>>     uint32_t res0;  /* Padding */
>>     /* Information about the configuration space region */
>>     uint64_t cfg_base;
>>     uint64_t cfg_size;
>> }
>>
>> DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
>> bridge available on the platform. When Xen is receiving the hypercall, the
>> the driver associated to the host bridge will be instantiated.
>
> I think we should mention the relationship with the existing
> PHYSDEVOP_pci_mmcfg_reserved hypercall.

Sorry, I did not spot this hypercall until now. From a brief look, this 
hypercall would be redundant, but I will investigate a bit more.

>
>
>> XXX: Shall we limit DOM0 the access to the configuration space from that
>> moment?
>
> If we can, we should

What would the benefits be? For now, I see a big drawback: resetting a 
PCI device would need to be done in Xen rather than DOM0. As you may 
know, there are a lot of quirks for reset.

So for me, it looks more sensible to handle this in DOM0 and give DOM0 
full access to the configuration space. After all, it is a trusted domain.

Regards,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-06 15:12 ` Roger Pau Monné
  2017-01-06 21:16   ` Stefano Stabellini
@ 2017-01-24 17:17   ` Julien Grall
  2017-01-25 11:42     ` Roger Pau Monné
  1 sibling, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-01-24 17:17 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Jiandi An,
	Punit Agrawal, alistair.francis, Shanker Donthineni, xen-devel,
	manish.jaggi, Campbell Sean

Hi Roger,

On 06/01/17 15:12, Roger Pau Monné wrote:
> On Thu, Dec 29, 2016 at 02:04:15PM +0000, Julien Grall wrote:
>> Hi all,
>>
>> The document below is an early version of a design
>> proposal for PCI Passthrough in Xen. It aims to
>> describe from an high level perspective the interaction
>> with the different subsystems and how guest will be able
>> to discover and access PCI.
>>
>> I am aware that a similar design has been posted recently
>> by Cavium (see [1]), however the approach to expose PCI
>> to guest is different. We have request to run unmodified
>> baremetal OS on Xen, a such guest would directly
>> access the devices and no PV drivers will be used.
>>
>> That's why this design is based on emulating a root controller.
>> This also has the advantage to have the VM interface as close
>> as baremetal allowing the guest to use firmware tables to discover
>> the devices.
>>
>> Currently on ARM, Xen does not have any knowledge about PCI devices.
>> This means that IOMMU and interrupt controller (such as ITS)
>> requiring specific configuration will not work with PCI even with
>> DOM0.
>>
>> The PCI Passthrough work could be divided in 2 phases:
>> 	* Phase 1: Register all PCI devices in Xen => will allow
>> 		   to use ITS and SMMU with PCI in Xen
>>         * Phase 2: Assign devices to guests
>>
>> This document aims to describe the 2 phases, but for now only phase
>> 1 is fully described.
>>
>> I have sent the design document to start to gather feedback on
>> phase 1.
>
> Thanks, this approach looks quite similar to what I have in mind for PVHv2
> DomU/Dom0 pci-passthrough.
>
>> Cheers,
>>
>> [1] https://lists.xen.org/archives/html/xen-devel/2016-12/msg00224.html
>>
>> ========================
>> % PCI pass-through support on ARM
>> % Julien Grall <julien.grall@linaro.org>
>> % Draft A
>>
>> # Preface
>>
>> This document aims to describe the components required to enable PCI
>> passthrough on ARM.
>>
>> This is an early draft and some questions are still unanswered, when this is
>> the case the text will contain XXX.
>>
>> # Introduction
>>
>> PCI passthrough allows to give control of physical PCI devices to guest. This
>> means that the guest will have full and direct access to the PCI device.
>>
>> ARM is supporting one kind of guest that is exploiting as much as possible
>> virtualization support in hardware. The guest will rely on PV driver only
>> for IO (e.g block, network), interrupts will come through the virtualized
>> interrupt controller. This means that there are no big changes required
>> within the kernel.
>>
>> By consequence, it would be possible to replace PV drivers by assigning real
>> devices to the guest for I/O access. Xen on ARM would therefore be able to
>> run unmodified operating system.
>>
>> To achieve this goal, it looks more sensible to go towards emulating the
>> host bridge (we will go into more details later). A guest would be able
>> to take advantage of the firmware tables and obviating the need for a specific
>> driver for Xen.
>>
>> Thus in this document we follow the emulated host bridge approach.
>>
>> # PCI terminologies
>>
>> Each PCI device under a host bridge is uniquely identified by its Requester ID
>> (AKA RID). A Requester ID is a triplet of Bus number, Device number, and
>> Function.
>>
>> When the platform has multiple host bridges, the software can add fourth
>> number called Segment to differentiate host bridges. A PCI device will
>> then uniquely by segment:bus:device:function (AKA SBDF).
>
> From my reading of the above sentence, this implies that the segment is an
> arbitrary number chosen by the OS? Isn't this picked from the MCFG ACPI table?

The number is chosen by the software. In the case of ACPI, it is 
"hardcoded" in the MCFG table, but for Device Tree this number could be 
chosen by the OS unless the property "linux,pci-domain" is present.

>
>> So given a specific SBDF, it would be possible to find the host bridge and the
>> RID associated to a PCI device.
>>
>> # Interaction of the PCI subsystem with other subsystems
>>
>> In order to have a PCI device fully working, Xen will need to configure
>> other subsystems subsytems such as the SMMU and the Interrupt Controller.
>                    ^ duplicated.
>>
>> The interaction expected between the PCI subsystem and the other is:
>                                                          ^ this seems quite
>                                                          confusing, what's "the
>                                                          other"?

By "other" I meant "IOMMU and Interrupt Controller". Would the wording 
"and the other subsystems" be better?

>>     * Add a device
>>     * Remove a device
>>     * Assign a device to a guest
>>     * Deassign a device from a guest
>>
>> XXX: Detail the interaction when assigning/deassigning device
>
> Assigning a device will probably entangle setting up some direct MMIO mappings
> (BARs and ROMs) plus a bunch of traps in order to perform emulation of accesses
> to the PCI config space (or those can be setup when a new bridge is registered
> with Xen).

I am planning to detail the root complex emulation in a separate 
section. I sent the design document before writing it.

In brief, I would expect the registration of a new bridge to set up the 
traps to emulate accesses to the PCI configuration space. On ARM, the 
first approach will rely on the OS to set up the BARs and ROMs. So they 
will be mapped by the PCI configuration space emulation.

The reason for relying on the OS to set up the BARs/ROMs is to reduce 
the work to do for a first version. Otherwise we would have to add code 
in the toolstack to decide where to place the BARs/ROMs. I don't think 
it is a lot of work, but it is not that important because it does not 
require a stable ABI (this is an interaction between the hypervisor and 
the toolstack). Furthermore, Linux (at least on ARM) assigns the BARs 
at setup time. From my understanding, this is the expected behavior with 
both DT (the DT has a property to skip the scan) and ACPI.
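
As a rough illustration of that trap path: once an access faults inside 
an emulated ECAM window, turning the offset into an SBDF plus register 
offset is simple bit slicing. A hypothetical sketch, not existing Xen 
code:

/* Hypothetical sketch: decoding an access trapped on an emulated ECAM
 * window.  In ECAM each function gets a 4KB configuration space:
 * bits [27:20] bus, [19:15] device, [14:12] function, [11:0] register. */
struct sbdf_access {
    uint16_t seg;
    uint8_t  bus, dev, fn;
    uint16_t reg;
};

static void ecam_decode(uint16_t seg, uint64_t offset,
                        struct sbdf_access *out)
{
    out->seg = seg;                   /* segment of the trapped window */
    out->bus = (offset >> 20) & 0xff;
    out->dev = (offset >> 15) & 0x1f;
    out->fn  = (offset >> 12) & 0x7;
    out->reg = offset & 0xfff;
}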

>
>> The following subsections will briefly describe the interaction from an
>> higher level perspective. Implementation details (callback, structure...)
>> is out of scope.
>>
>> ## SMMU
>>
>> The SMMU will be used to isolate the PCI device when accessing the memory
>> (for instance DMA and MSI Doorbells). Often the SMMU will be configured using
>> a StreamID (SID) that can be deduced from the RID with the help of the firmware
>> tables (see below).
>>
>> Whilst in theory all the memory transaction issued by a PCI device should
>> go through the SMMU, on certain platforms some of the memory transaction may
>> not reach the SMMU because they are interpreted by the host bridge. For
>> instance this could happen if the MSI doorbell is built into the PCI host
>
> I would elaborate on what is a MSI doorbell.

I can add an explanation in the glossary.

>
>> bridge. See [6] for more details.
>>
>> XXX: I think this could be solved by using the host memory layout when
>> creating a guest with PCI devices => Detail it.
>
> I'm not really sure I follow here, but if this write to the MSI doorbell
> doesn't go through the SMMU, and instead is handled by the bridge, isn't there
> a chance that a gust might be able to write anywhere in physical memory?

The problem is more subtle. On some platforms the MSI doorbell is 
built into the host bridge. Some of those host bridges will intercept any 
access to this doorbell coming from the PCI devices and interpret it 
directly rather than sending it through the SMMU.

This means that the physical address of the MSI doorbell will always be 
interpreted. Even if the guest is using an intermediate address, it 
will be treated as a physical address because the SMMU has been 
bypassed.

Furthermore, some platforms may have other sets of addresses not going 
through the SMMU (such as P2P traffic). So we have to prevent mapping 
anything onto those regions.
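
To make that last point concrete, any mapping requested for a guest 
would have to be checked against a per-platform list of such 
host-interpreted regions. A minimal sketch, where the reserved_regions 
table and its entry are made up (bool, uint64_t and ARRAY_SIZE as 
provided by the Xen headers):

/* Hypothetical sketch: reject a proposed guest mapping if it overlaps a
 * host region that bypasses the SMMU (MSI doorbell, P2P windows, ...). */
struct mem_region {
    uint64_t base;
    uint64_t size;
};

static const struct mem_region reserved_regions[] = {
    { 0x40000000ULL, 0x00010000ULL }, /* e.g. a built-in MSI doorbell */
};

static bool overlaps_reserved(uint64_t base, uint64_t size)
{
    unsigned int i;

    for ( i = 0; i < ARRAY_SIZE(reserved_regions); i++ )
    {
        const struct mem_region *r = &reserved_regions[i];

        if ( base < r->base + r->size && r->base < base + size )
            return true;
    }

    return false;
}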

>
> Or this only happens when a guest writes to a MSI doorbell that's trapped by
> the bridge and not forwarded anywhere else?

See above.

>
>> ## Interrupt controller
>>
>> PCI supports three kind of interrupts: legacy interrupt, MSI and MSI-X. On ARM
>> legacy interrupts will be mapped to SPIs. MSI and MSI-x will be
>> either mapped to SPIs or LPIs.
>>
>> Whilst SPIs can be programmed using an interrupt number, LPIs can be
>> identified via a pair (DeviceID, EventID) when configure through the ITS.
>                                                           ^d
>
>>
>> The DeviceID is a unique identifier for each MSI-capable device that can
>> be deduced from the RID with the help of the firmware tables (see below).
>>
>> XXX: Figure out if something is necessary for GICv2m
>>
>> # Information available in the firmware tables
>>
>> ## ACPI
>>
>> ### Host bridges
>>
>> The static table MCFG (see 4.2 in [1]) will describe the host bridges available
>> at boot and supporting ECAM. Unfortunately there are platforms out there
>> (see [2]) that re-use MCFG to describe host bridge that are not fully ECAM
>                                                     ^s
>
>> compatible.
>>
>> This means that Xen needs to account for possible quirks in the host bridge.
>> The Linux community are working on a patch series for see (see [2] and [3])
>> where quirks will be detected with:
>>     * OEM ID
>>     * OEM Table ID
>>     * OEM Revision
>>     * PCI Segment (from _SEG)
>>     * PCI bus number range (from _CRS, wildcard allowed)
>
> So segment and bus number range needs to be fetched from ACPI objects? Is that
> because the information in the MCFG is lacking/wrong?

All the host bridges will be described in ASL. Only the ones available at 
boot will be described in the MCFG. So it looks more sensible to rely on 
the ASL from Linux's POV.

>
>>
>> Based on what Linux is currently doing, there are two kind of quirks:
>>     * Accesses to the configuration space of certain sizes are not allowed
>>     * A specific driver is necessary for driving the host bridge
>
> Hm, so what are the issues that make this bridges need specific drivers?
>
> This might be quite problematic if you also have to emulate this broken
> behavior inside of Xen (because Dom0 is using a specific driver).

I am not expecting to emulate the configuration space access for DOM0. I 
know you mentioned that it would be necessary to hide PCI devices used by 
Xen (such as the UART) from DOM0, or for configuring MSIs. But for ARM, 
the UART is integrated in the SoC and MSIs will be configured through the 
interrupt controller.

>
>> The former is straight forward to solve, the latter will require more thought.
>> Instantiation of a specific driver for the host controller can be easily done
>> if Xen has the information to detect it. However, those drivers may require
>> resources described in ASL (see [4] for instance).
>>
>> XXX: Need more investigation to know whether the missing information should
>> be passed by DOM0 or hardcoded in the driver.
>
> ... or poke the ThunderX guys with a pointy stick until they get their act
> together.

I would love to do that, but the platform is already out. So I am afraid 
that we have to deal with it.

Although I am hoping *fingers crossed* that future platforms will be 
fully ECAM compliant.

>
>> ### Finding the StreamID and DeviceID
>>
>> The static table IORT (see [5]) will provide information that will help to
>> deduce the StreamID and DeviceID from a given RID.
>>
>> ## Device Tree
>>
>> ### Host bridges
>>
>> Each Device Tree node associated to a host bridge will have at least the
>> following properties (see bindings in [8]):
>>     - device_type: will always be "pci".
>>     - compatible: a string indicating which driver to instantiate
>>
>> The node may also contain optional properties such as:
>>     - linux,pci-domain: assign a fix segment number
>>     - bus-range: indicate the range of bus numbers supported
>>
>> When the property linux,pci-domain is not present, the operating system would
>> have to allocate the segment number for each host bridges. Because the
>> algorithm to allocate the segment is not specified, it is necessary for
>> DOM0 and Xen to agree on the number before any PCI is been added.
>
> Since this is all static, can't Xen just assign segment and bus-ranges for
> bridges that lack them? (also why it's "linux,pci-domain", instead of just
> "pci-domain"?)

I am not the one who decided the name of those properties. This is from 
the existing binding in Linux (I thought it was obvious with the link [8] 
to the binding).

Usually any property added by the Linux community (i.e. not part 
of the Open Firmware standards) will be prefixed by "linux,". So I would 
rather not diverge from the existing binding.

The lack of bus-ranges is not an issue because it has been formalized in 
the binding: "If absent, defaults to <0 255> (i.e all buses)".

>
>> ### Finding the StreamID and DeviceID
>>
>> ### StreamID
>>
>> The first binding existing (see [9]) for SMMU didn't have a way to describe the
>> relationship between RID and StreamID, it was assumed that StreamID == RequesterID.
>> This bindins has now been deprecated in favor of a generic binding (see [10])
>> which will use the property "iommu-map" to describe the relationship between
>> an RID, the associated IOMMU and the StreamID.
>>
>> ### DeviceID
>>
>> The relationship between the RID and the DeviceID can be found using the
>> property "msi-map" (see [11]).
>>
>> # Discovering PCI devices
>>
>> Whilst PCI devices are currently available in DOM0, the hypervisor does not
>> have any knowledge of them. The first step of supporting PCI passthrough is
>> to make Xen aware of the PCI devices.
>>
>> Xen will require access to the PCI configuration space to retrieve information
>> for the PCI devices or access it on behalf of the guest via the emulated
>
> I know this is not the intention, but the above sentence makes it look like
> Xen is using an emulated host bridge IMHO (although I'm not a native speaker
> anyway, so I can be wrong).

How about "Xen will require access to the host PCI configuration space..."?

>
>> host bridge.
>>
>> ## Discovering and register hostbridge
>>
>> Both ACPI and Device Tree do not provide enough information to fully
>> instantiate an host bridge driver. In the case of ACPI, some data may come
>> from ASL, whilst for Device Tree the segment number is not available.
>
> For device-tree can't you just add a pci-domain to each bridge device on the DT
> if none is specified?

"linux,pci-domain" is a Linux-specific property. We have been avoiding 
re-using Linux-specific properties recently (see the case of xen,uefi-*). 
So we would have to introduce a new one.

> For ACPI I understand that it's harder. Maybe ARM can somehow assure that MCFG
> tables completely describe the system, so that you don't need this anymore.

This is not up to ARM but the spec. The PCI spec specifies that the MCFG 
will only describe host bridges available at boot. The rest will be in ASL.

>
>> So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
>> with all the relevant informations. This will be done via a new hypercall
>> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
>>
>> struct physdev_pci_host_bridge_add
>> {
>>     /* IN */
>>     uint16_t seg;
>>     /* Range of bus supported by the host bridge */
>>     uint8_t  bus_start;
>>     uint8_t  bus_nr;
>>     uint32_t res0;  /* Padding */
>>     /* Information about the configuration space region */
>>     uint64_t cfg_base;
>>     uint64_t cfg_size;
>> }
>
> Why do you need to cfg_size attribute? Isn't it always going to be 4096 bytes
> in size?

The cfg_size is here to help us match the corresponding node in the 
device tree. The cfg_size may differ depending on how the hardware has 
implemented access to the configuration space.

But to be fair, I think we can do without this field. For ACPI, the 
size will vary with the number of buses handled and can be deduced. 
For DT, the base address and bus range should be enough to find the 
associated node.
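
For the ACPI/ECAM case the deduction is indeed straightforward, since 
each bus consumes 1MB of the ECAM window (32 devices x 8 functions x 
4KB). A small sketch:

/* For a plain ECAM window the size follows directly from the bus range:
 * 4KB per function, 8 functions per device, 32 devices per bus = 1MB/bus. */
static inline uint64_t ecam_size(uint8_t bus_start, uint8_t bus_end)
{
    return ((uint64_t)(bus_end - bus_start) + 1) << 20;
}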

>
> If that field is removed you could use the PHYSDEVOP_pci_mmcfg_reserved
> hypercalls.
>
>> DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
>> bridge available on the platform. When Xen is receiving the hypercall, the
>> the driver associated to the host bridge will be instantiated.
>>
>> XXX: Shall we limit DOM0 the access to the configuration space from that
>> moment?
>
> Most definitely yes, you should instantiate an emulated bridge over the real
> one, in order to proxy Dom0 accesses to the PCI configuration space. You for
> example don't want Dom0 moving the position of the BARs of PCI devices without
> Xen being aware (and properly changing the second stage translation).

The problem is that on ARM we don't have a single way to access the 
configuration space. So we would need different emulators in Xen, which I 
don't like unless there is a strong reason to do it.

We could prevent DOM0 from modifying the position of the BARs after 
setup. I also remember you mentioned MSI configuration; for ARM this is 
done via the interrupt controller.

>
>> ## Discovering and register PCI
>>
>> Similarly to x86, PCI devices will be discovered by DOM0 and register
>> using the hypercalls PHYSDEVOP_pci_add_device or PHYSDEVOP_manage_pci_add_ext.
>
> Why do you need this? If you have access to the bridges you can scan them from
> Xen and discover the devices AFAICT.

I am a bit confused. Are you saying that you plan to ditch them for PVH? 
If so, why are they called by Linux today?

>
>> By default all the PCI devices will be assigned to DOM0. So Xen would have
>> to configure the SMMU and Interrupt Controller to allow DOM0 to use the PCI
>> devices. As mentioned earlier, those subsystems will require the StreamID
>> and DeviceID. Both can be deduced from the RID.
>>
>> XXX: How to hide PCI devices from DOM0?
>
> By adding the ACPI namespace of the device to the STAO and blocking Dom0
> access to this device in the emulated bridge that Dom0 will have access to
> (returning 0xFFFF when Dom0 tries to read the vendor ID from the PCI header).

Sorry, I was not clear here. By hiding, I meant DOM0 not instantiating a 
driver (similar to xen-pciback.hide). We still want DOM0 to access the 
PCI config space in order to reset the device. Unless you plan to import 
all the reset quirks into Xen?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-19  5:09 ` Manish Jaggi
@ 2017-01-24 17:43   ` Julien Grall
  2017-01-25  4:37     ` Manish Jaggi
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-01-24 17:43 UTC (permalink / raw)
  To: Manish Jaggi, xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Campbell Sean, Kapoor, Prasun, Jiandi An,
	Punit Agrawal, alistair.francis, jnair, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper



On 19/01/17 05:09, Manish Jaggi wrote:
> Hi Julien,

Hello Manish,

Please trim the quoted e-mail; it is a bit annoying to try to find where 
you answered.

> On 12/29/2016 07:34 PM, Julien Grall wrote:
>> DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
>> bridge available on the platform. When Xen is receiving the hypercall, the
>> the driver associated to the host bridge will be instantiated.
>>
>
> I think, PCI passthrough and DOM0 w/ACPI enumerating devices on PCI are separate features.
> Without Xen mapping PCI config space region in stage2 of dom0, ACPI dom0 wont boot.
> Currently for dt xen does that.
>
> So can we have 2 design documents
> a) PCI passthrough
> b) ACPI dom0/domU support in Xen and Linux
> - this may include:
> b.1 Passing IORT to Dom0 without smmu
> b.2 Hypercall to map PCI config space in dom0
> b.3 <more>
>
> What do you think?

I don't think ACPI should be treated in a separate design document. The 
support of ACPI may affect some of the decisions (such as the hypercalls) 
and we need to know them now.

Regarding the ECAM region not being mapped: this is not related to PCI 
passthrough but to how MMIO regions are mapped with ACPI. This is a 
separate subject already under discussion (see [1]).

Cheers,

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01607.html

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-24 14:28   ` Julien Grall
@ 2017-01-24 20:07     ` Stefano Stabellini
  2017-01-25 11:21       ` Roger Pau Monné
  2017-01-25 18:53       ` Julien Grall
  2017-01-25  4:23     ` Manish Jaggi
  1 sibling, 2 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-01-24 20:07 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Jiandi An,
	Punit Agrawal, alistair.francis, Shanker Donthineni, xen-devel,
	manish.jaggi, Campbell Sean, Roger Pau Monné

On Tue, 24 Jan 2017, Julien Grall wrote:
> > > ## Discovering and register hostbridge
> > > 
> > > Both ACPI and Device Tree do not provide enough information to fully
> > > instantiate an host bridge driver. In the case of ACPI, some data may come
> > > from ASL,
> > 
> > The data available from ASL is just to initialize quirks and non-ECAM
> > controllers, right? Given that SBSA mandates ECAM, and we assume that
> > ACPI is mostly (if not only) for servers, then I think it is safe to say
> > that in the case of ACPI we should have all the info to fully
> > instantiate an host bridge driver.
> 
> From the spec, the MCFG will only describe host bridge available at boot (see
> 4.2 in "PCI firmware specification, rev 3.2"). All the other host bridges will
> be described in ASL.
> 
> So we need DOM0 to feed Xen about the latter host bridges.

Unfortunately PCI specs are only accessible to PCI SIG member
organizations. In other words, I cannot read the doc.

Could you please explain what kind of host bridges are not expected to
be available at boot? Do you know of any examples?


> > > whilst for Device Tree the segment number is not available.
> > > 
> > > So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> > > with all the relevant informations. This will be done via a new hypercall
> > > PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> > 
> > I understand that the main purpose of this hypercall is to get Xen and Dom0
> > to
> > agree on the segment numbers, but why is it necessary? If Dom0 has an
> > emulated contoller like any other guest, do we care what segment numbers
> > Dom0 will use?
> 
> I was not planning to have a emulated controller for DOM0. The physical one is
> not necessarily ECAM compliant so we would have to either emulate the physical
> one (meaning multiple different emulation) or an ECAM compliant.
> 
> The latter is not possible because you don't know if there is enough free MMIO
> space for the emulation.
> 
> In the case on ARM, I don't see much the point to emulate the host bridge for
> DOM0. The only thing we need in Xen is to access the configuration space, we
> don't have about driving the host bridge. So I would let DOM0 dealing with
> that.
> 
> Also, I don't see any reason for ARM to trap DOM0 configuration space access.
> The MSI will be configured using the interrupt controller and it is a trusted
> Domain.

These last few sentences raise a lot of questions. Maybe I am missing
something. You might want to clarify the strategy for Dom0 and DomUs,
and how they differ, in the next version of the doc.

At some point you wrote "Instantiation of a specific driver for the host
controller can be easily done if Xen has the information to detect it.
However, those drivers may require resources described in ASL." Does it
mean you plan to drive the physical host bridge from Xen and Dom0
simultaneously?

Otherwise, if Dom0 is the only one to drive the physical host bridge,
and Xen is the one to provide the emulated host bridge, how are DomU PCI
config reads and writes supposed to work in details?  How is MSI
configuration supposed to work?


> > > XXX: Shall we limit DOM0 the access to the configuration space from that
> > > moment?
> > 
> > If we can, we should
> 
> Why would be the benefits? For now, I see a big drawback: resetting a PCI
> devices would need to be done in Xen rather than DOM0. As you may now there
> are a lot of quirks for reset.
> 
> So for me, it looks more sensible to handle this in DOM0 and let DOM0 a full
> access to the configuration space. Overall he is a trusted domain.

PCI reset is worth of its own chapter in the doc :-)

Dom0 is a trusted domain, but when possible, I consider it an improvement
to limit the amount of trust we put in it. Also, as I wrote above, I
don't understand what the plan is to deal with concurrent accesses to
the host bridge from Dom0 and Xen.

In any case, regarding PCI reset, we should dig out past discussions on
the merits of doing reset in the hypervisor vs. dom0. I agree
introducing PCI reset quirks in Xen is not nice but I recall that
XenClient did it to avoid possible misbehaviors of the device. We need
to be careful about ordering PCI reset against domain destruction. I
couldn't find any email discussions to reference, maybe it is worth
contacting the OpenXT guys about it.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-24 14:28   ` Julien Grall
  2017-01-24 20:07     ` Stefano Stabellini
@ 2017-01-25  4:23     ` Manish Jaggi
  1 sibling, 0 replies; 82+ messages in thread
From: Manish Jaggi @ 2017-01-25  4:23 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Campbell Sean, Jiandi An, Punit Agrawal,
	alistair.francis, xen-devel, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

Hi Julien/Stefano,

On 01/24/2017 07:58 PM, Julien Grall wrote:
> Hi Stefano,
> 
> On 04/01/17 00:24, Stefano Stabellini wrote:
>> On Thu, 29 Dec 2016, Julien Grall wrote:
> 
> [...]
> 
>>> # Introduction
>>>
>>> PCI passthrough allows to give control of physical PCI devices to guest. This
>>> means that the guest will have full and direct access to the PCI device.
>>>
>>> ARM is supporting one kind of guest that is exploiting as much as possible
>>> virtualization support in hardware. The guest will rely on PV driver only
>>> for IO (e.g block, network), interrupts will come through the virtualized
>>> interrupt controller. This means that there are no big changes required
>>> within the kernel.
>>>
>>> By consequence, it would be possible to replace PV drivers by assigning real
>>   ^ As a consequence
> 
> I will fix all the typoes in the next version.
> 
>>
>>
>>> devices to the guest for I/O access. Xen on ARM would therefore be able to
>>> run unmodified operating system.
> 
> [...]
> 
>>> Instantiation of a specific driver for the host controller can be easily done
>>> if Xen has the information to detect it. However, those drivers may require
>>> resources described in ASL (see [4] for instance).
Q: would these drivers (like ecam/pem) be added in Xen?
If yes, how would Xen have the information to detect the host controller compatible string?
Should it be passed in the hypercall physdev_pci_host_bridge_add below?
>>>
>>> XXX: Need more investigation to know whether the missing information should
>>> be passed by DOM0 or hardcoded in the driver.
>>
>> Given that we are talking about quirks here, it would be better to just
>> hardcode them in the drivers, if possible.
> 
> Indeed hardcoded would be the preferred way to avoid introduce new hypercall for quirk.
> 
> For instance, in the case of Thunder-X (see commit 44f22bd "PCI: Add MCFG quirks for Cavium ThunderX pass2.x host controller) some region are read from ACPI. What I'd like to understand is whether
> this could be hardcoded or can it change between platform? If it can change, is there a way in ACPI to differentiate 2 platforms?
> 
> Maybe this is a question that Cavium can answer? (in CC).
> 
I think it is ok to hardcode.
You might need to see 648d93f "PCI: Add MCFG quirks for Cavium ThunderX pass1.x host controller" as well.

> 
> [...]
> 
>>> ## Discovering and register hostbridge
>>>
>>> Both ACPI and Device Tree do not provide enough information to fully
>>> instantiate an host bridge driver. In the case of ACPI, some data may come
>>> from ASL,
>>
>> The data available from ASL is just to initialize quirks and non-ECAM
>> controllers, right? Given that SBSA mandates ECAM, and we assume that
>> ACPI is mostly (if not only) for servers, then I think it is safe to say
>> that in the case of ACPI we should have all the info to fully
>> instantiate an host bridge driver.
> 
> From the spec, the MCFG will only describe host bridge available at boot (see 4.2 in "PCI firmware specification, rev 3.2"). All the other host bridges will be described in ASL.
> 
> So we need DOM0 to feed Xen about the latter host bridges.
> 
>>
>>
>>> whilst for Device Tree the segment number is not available.
>>>
>>> So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
>>> with all the relevant informations. This will be done via a new hypercall
>>> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
>>
>> I understand that the main purpose of this hypercall is to get Xen and Dom0 to
>> agree on the segment numbers, but why is it necessary? If Dom0 has an
>> emulated contoller like any other guest, do we care what segment numbers
>> Dom0 will use?
> 
> I was not planning to have a emulated controller for DOM0. The physical one is not necessarily ECAM compliant so we would have to either emulate the physical one (meaning multiple different emulation)
> or an ECAM compliant.
> 
> The latter is not possible because you don't know if there is enough free MMIO space for the emulation.
> 
> In the case on ARM, I don't see much the point to emulate the host bridge for DOM0. The only thing we need in Xen is to access the configuration space, we don't have about driving the host bridge. So
> I would let DOM0 dealing with that.
> 
> Also, I don't see any reason for ARM to trap DOM0 configuration space access. The MSI will be configured using the interrupt controller and it is a trusted Domain.
> 
>>
>>
>>> struct physdev_pci_host_bridge_add
>>> {
>>>     /* IN */
>>>     uint16_t seg;
>>>     /* Range of bus supported by the host bridge */
>>>     uint8_t  bus_start;
>>>     uint8_t  bus_nr;
>>>     uint32_t res0;  /* Padding */
>>>     /* Information about the configuration space region */
>>>     uint64_t cfg_base;
>>>     uint64_t cfg_size;
>>> }
>>>
>>> DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
>>> bridge available on the platform. When Xen is receiving the hypercall, the
>>> the driver associated to the host bridge will be instantiated.
>>
>> I think we should mention the relationship with the existing
>> PHYSDEVOP_pci_mmcfg_reserved hypercall.
[...]

-Manish


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-24 17:43   ` Julien Grall
@ 2017-01-25  4:37     ` Manish Jaggi
  2017-01-25 15:25       ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Manish Jaggi @ 2017-01-25  4:37 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Campbell Sean, Kapoor, Prasun, Jiandi An,
	Punit Agrawal, alistair.francis, jnair, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper



On 01/24/2017 11:13 PM, Julien Grall wrote:
> 
> 
> On 19/01/17 05:09, Manish Jaggi wrote:
>> Hi Julien,
> 
> Hello Manish,
[snip]

>> I think, PCI passthrough and DOM0 w/ACPI enumerating devices on PCI are separate features.
>> Without Xen mapping PCI config space region in stage2 of dom0, ACPI dom0 wont boot.
>> Currently for dt xen does that.
>>
>> So can we have 2 design documents
>> a) PCI passthrough
>> b) ACPI dom0/domU support in Xen and Linux
>> - this may include:
>> b.1 Passing IORT to Dom0 without smmu
>> b.2 Hypercall to map PCI config space in dom0
>> b.3 <more>
>>
>> What do you think?
> 
> I don't think ACPI should be treated in a separate design document.
As PCI passthrough support will take time to mature, why should we hold the ACPI design?
If I can boot dom0/domU with ACPI as it works with DT today, it would be a good milestone.
Later, when the PCI passthrough design matures and is implemented, the support can be extended.
> The support of ACPI may affect some of the decisions (such as hypercall) and we have to know them now.
> 
Still, it can be independent, with only the dependent features implemented, or placeholders can be added.
> Regarding the ECAM region not mapped. This is not related to PCI passthrough but how MMIO are mapped with ACPI. This is a separate subject already in discussion (see [1]).
> 
What about IORT generation for Dom0 without SMMU?
I believe it is not dependent on [1].
> Cheers,
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01607.html
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-24 20:07     ` Stefano Stabellini
@ 2017-01-25 11:21       ` Roger Pau Monné
  2017-01-25 18:53       ` Julien Grall
  1 sibling, 0 replies; 82+ messages in thread
From: Roger Pau Monné @ 2017-01-25 11:21 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Punit Agrawal, Wei Chen, Steve Capper, Jiandi An, Julien Grall,
	alistair.francis, Shanker Donthineni, xen-devel, manish.jaggi,
	Campbell Sean

On Tue, Jan 24, 2017 at 12:07:16PM -0800, Stefano Stabellini wrote:
> On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > ## Discovering and register hostbridge
> > > > 
> > > > Both ACPI and Device Tree do not provide enough information to fully
> > > > instantiate an host bridge driver. In the case of ACPI, some data may come
> > > > from ASL,
> > > 
> > > The data available from ASL is just to initialize quirks and non-ECAM
> > > controllers, right? Given that SBSA mandates ECAM, and we assume that
> > > ACPI is mostly (if not only) for servers, then I think it is safe to say
> > > that in the case of ACPI we should have all the info to fully
> > > instantiate an host bridge driver.
> > 
> > From the spec, the MCFG will only describe host bridge available at boot (see
> > 4.2 in "PCI firmware specification, rev 3.2"). All the other host bridges will
> > be described in ASL.
> > 
> > So we need DOM0 to feed Xen about the latter host bridges.
> 
> Unfortunately PCI specs are only accessible by PCI SIG members
> organizations. In other words, I cannot read the doc.

I know, I had to register at PCI-SIG in order to access the specs, which makes
no sense to me.

> Could you please explain what kind of host bridges are not expected to
> be available at boot? Do you know of any examples?

It's possible for host bridges to be hot-plugged during runtime according to
the spec. Those bridges will not appear in the MCFG, but will have objects in
the ACPI namespace with _CBA methods that will return this information.

I have never seen anything like this on real systems, and to put an example
FreeBSD AFAICT will only detect host bridges present in the MCFG (and I've
never seen anyone complaining about it).
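
To tie this back to the proposed interface: for such bridges Dom0 would 
evaluate _SEG, _CBA and the bus range from _CRS, then forward the result 
to Xen. A hypothetical sketch using the layout from the design document; 
the hypercall number and the issue_physdev_op() wrapper are placeholders, 
since neither exists yet:

#include <stdint.h>

/* Hypothetical sketch only: PHYSDEVOP_pci_host_bridge_add is a proposed
 * hypercall, so the number below is a placeholder. */
#define PHYSDEVOP_pci_host_bridge_add  0 /* placeholder value */

struct physdev_pci_host_bridge_add {
    uint16_t seg;        /* from _SEG */
    uint8_t  bus_start;  /* from the _CRS bus range */
    uint8_t  bus_nr;
    uint32_t res0;       /* padding */
    uint64_t cfg_base;   /* from _CBA (or the MCFG for boot-time bridges) */
    uint64_t cfg_size;
};

/* Stand-in for whatever mechanism Dom0 would really use to issue it. */
int issue_physdev_op(unsigned int cmd, void *arg);

static int report_host_bridge(uint16_t seg, uint8_t bus_start,
                              uint8_t bus_nr, uint64_t cfg_base,
                              uint64_t cfg_size)
{
    struct physdev_pci_host_bridge_add add = {
        .seg       = seg,
        .bus_start = bus_start,
        .bus_nr    = bus_nr,
        .cfg_base  = cfg_base,
        .cfg_size  = cfg_size,
    };

    return issue_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);
}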

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-24 17:17   ` Julien Grall
@ 2017-01-25 11:42     ` Roger Pau Monné
  2017-01-31 15:59       ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Roger Pau Monné @ 2017-01-25 11:42 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Jiandi An,
	Punit Agrawal, alistair.francis, Shanker Donthineni, xen-devel,
	manish.jaggi, Campbell Sean

On Tue, Jan 24, 2017 at 05:17:06PM +0000, Julien Grall wrote:
> Hi Roger,
> 
> On 06/01/17 15:12, Roger Pau Monné wrote:
> > On Thu, Dec 29, 2016 at 02:04:15PM +0000, Julien Grall wrote:
> > > So given a specific SBDF, it would be possible to find the host bridge and the
> > > RID associated to a PCI device.
> > > 
> > > # Interaction of the PCI subsystem with other subsystems
> > > 
> > > In order to have a PCI device fully working, Xen will need to configure
> > > other subsystems subsytems such as the SMMU and the Interrupt Controller.
> >                    ^ duplicated.
> > > 
> > > The interaction expected between the PCI subsystem and the other is:
> >                                                          ^ this seems quite
> >                                                          confusing, what's "the
> >                                                          other"?
> 
> By "other" I meant "IOMMU and Interrupt Controller". Would the wording "and
> the other subsystems" be better?

Yes, I think so.

> > >     * Add a device
> > >     * Remove a device
> > >     * Assign a device to a guest
> > >     * Deassign a device from a guest
> > > 
> > > XXX: Detail the interaction when assigning/deassigning device
> > 
> > Assigning a device will probably entangle setting up some direct MMIO mappings
> > (BARs and ROMs) plus a bunch of traps in order to perform emulation of accesses
> > to the PCI config space (or those can be setup when a new bridge is registered
> > with Xen).
> 
> I am planning to details the root complex emulation in a separate section. I
> sent the design document before writing it.
> 
> In brief, I would expect the registration of a new bridge to setup the trap
> to emulation access to the PCI configuration space. On ARM, the first
> approach will rely on the OS to setup the BARs and ROMs. So they will be
> mapped by the PCI configuration space emulation.
> 
> The reason on relying on the OS to setup the BARs/ROMs reducing the work to
> do for a first version. Otherwise we would have to add code in the toolstack
> to decide where to place the BARs/ROMs. I don't think it is a lot of work,
> but it is not that important because it does not require a stable ABI (this
> is an interaction between the hypervisor and the toolstack). Furthermore,
> Linux (at least on ARM) is assigning the BARs at the setup. From my
> understanding, this is the expected behavior with both DT (the DT has a
> property to skip the scan) and ACPI.

This approach might work for Dom0, but for DomU you certainly need to know
where the MMIO regions of a device are, and either the toolstack or Xen needs
to setup this in advance (or at least mark which MMIO regions are available to
the DomU). Allowing a DomU to map random MMIO regions is certainly a security
issue.

> > 
> > > ## Interrupt controller
> > > 
> > > PCI supports three kind of interrupts: legacy interrupt, MSI and MSI-X. On ARM
> > > legacy interrupts will be mapped to SPIs. MSI and MSI-x will be
> > > either mapped to SPIs or LPIs.
> > > 
> > > Whilst SPIs can be programmed using an interrupt number, LPIs can be
> > > identified via a pair (DeviceID, EventID) when configure through the ITS.
> >                                                           ^d
> > 
> > > 
> > > The DeviceID is a unique identifier for each MSI-capable device that can
> > > be deduced from the RID with the help of the firmware tables (see below).
> > > 
> > > XXX: Figure out if something is necessary for GICv2m
> > > 
> > > # Information available in the firmware tables
> > > 
> > > ## ACPI
> > > 
> > > ### Host bridges
> > > 
> > > The static table MCFG (see 4.2 in [1]) will describe the host bridges available
> > > at boot and supporting ECAM. Unfortunately there are platforms out there
> > > (see [2]) that re-use MCFG to describe host bridge that are not fully ECAM
> >                                                     ^s
> > 
> > > compatible.
> > > 
> > > This means that Xen needs to account for possible quirks in the host bridge.
> > > The Linux community are working on a patch series for see (see [2] and [3])
> > > where quirks will be detected with:
> > >     * OEM ID
> > >     * OEM Table ID
> > >     * OEM Revision
> > >     * PCI Segment (from _SEG)
> > >     * PCI bus number range (from _CRS, wildcard allowed)
> > 
> > So segment and bus number range needs to be fetched from ACPI objects? Is that
> > because the information in the MCFG is lacking/wrong?
> 
> All the host bridges will be described in ASL. Only the one available at
> boot will be described in the MCFG. So it looks more sensible to rely on the
> ASL from Linux POV.

Yes, that's right. We need to rely on PHYSDEVOP_pci_mmcfg_reserved or similar
so that Dom0 can tell Xen about hotplug host bridges found in the ACPI
namespace.

> > 
> > > 
> > > Based on what Linux is currently doing, there are two kind of quirks:
> > >     * Accesses to the configuration space of certain sizes are not allowed
> > >     * A specific driver is necessary for driving the host bridge
> > 
> > Hm, so what are the issues that make this bridges need specific drivers?
> > 
> > This might be quite problematic if you also have to emulate this broken
> > behavior inside of Xen (because Dom0 is using a specific driver).
> 
> I am not expecting to emulate the configuration space access for DOM0. I
> know you mentioned that it would be necessary to hide PCI used by Xen (such
> as the UART) to DOM0 or configuring MSI. But for ARM, the UART is integrated
> in the SOC and MSI will be configured through the interrupt controller.

Right, we certainly need to do it for x86, but I don't know that much of the
ARM architecture in order to know if that's needed or not. I'm also wondering
if having both Xen and the Dom0 directly accessing the ECAM area is fine, even
if they use different cache mapping attributes?

> > > So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> > > with all the relevant informations. This will be done via a new hypercall
> > > PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> > > 
> > > struct physdev_pci_host_bridge_add
> > > {
> > >     /* IN */
> > >     uint16_t seg;
> > >     /* Range of bus supported by the host bridge */
> > >     uint8_t  bus_start;
> > >     uint8_t  bus_nr;
> > >     uint32_t res0;  /* Padding */
> > >     /* Information about the configuration space region */
> > >     uint64_t cfg_base;
> > >     uint64_t cfg_size;
> > > }
> > 
> > Why do you need to cfg_size attribute? Isn't it always going to be 4096 bytes
> > in size?
> 
> The cfg_size is here to help us to match the corresponding node in the
> device tree. The cfg_size may differ depending on how the hardware has
> implemented the access to the configuration space.

But certainly cfg_base needs to be aligned to a PAGE_SIZE? And according to the
spec cfg_size cannot be bigger than 4KB (PAGE_SIZE), so in any case you will
end up mapping a whole 4KB page, because that's the minimum granularity of the
p2m?

> But to be fair, I think we can deal without this property. For ACPI, the
> size will vary following the number of bus handled and can be deduced. For
> DT, the base address and bus range should be enough to find the associated
> node.
> 
> > 
> > If that field is removed you could use the PHYSDEVOP_pci_mmcfg_reserved
> > hypercalls.
> > 
> > > DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
> > > bridge available on the platform. When Xen is receiving the hypercall, the
> > > the driver associated to the host bridge will be instantiated.
> > > 
> > > XXX: Shall we limit DOM0 the access to the configuration space from that
> > > moment?
> > 
> > Most definitely yes, you should instantiate an emulated bridge over the real
> > one, in order to proxy Dom0 accesses to the PCI configuration space. You for
> > example don't want Dom0 moving the position of the BARs of PCI devices without
> > Xen being aware (and properly changing the second stage translation).
> 
> The problem is on ARM we don't have a single way to access the configuration
> space. So we would need different emulator in Xen, which I don't like unless
> there is a strong reason to do it.
> 
> We could avoid DOM0s to modify the position of the BARs after setup. I also
> remembered you mention about MSI configuration, for ARM this is done via the
> interrupt controller.
> 
> > 
> > > ## Discovering and register PCI
> > > 
> > > Similarly to x86, PCI devices will be discovered by DOM0 and register
> > > using the hypercalls PHYSDEVOP_pci_add_device or PHYSDEVOP_manage_pci_add_ext.
> > 
> > Why do you need this? If you have access to the bridges you can scan them from
> > Xen and discover the devices AFAICT.
> 
> I am a bit confused. Are you saying that you plan to ditch them for PVH? If
> so, why are they called by Linux today?

I think we can get away with PHYSDEVOP_pci_mmcfg_reserved only, but maybe I'm
missing something. AFAICT Xen should be able to gather all the other data by
itself from the PCI config space once it knows the details about the host
bridge.
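
For reference, such a scan is straightforward once the ECAM window is 
known: probe every bus/device/function and treat an all-ones vendor ID 
as "no device". A minimal sketch, where ecam_read16() is an assumed 
accessor over the mapped window and the types come from the Xen headers:

#define PCI_VENDOR_ID_REG  0x00

uint16_t ecam_read16(void *ecam_base, uint8_t bus, uint8_t dev,
                     uint8_t fn, uint16_t reg); /* assumed helper */

static void scan_host_bridge(void *ecam_base, uint8_t bus_start,
                             uint8_t bus_end)
{
    unsigned int bus, dev, fn;

    for ( bus = bus_start; bus <= bus_end; bus++ )
        for ( dev = 0; dev < 32; dev++ )
            for ( fn = 0; fn < 8; fn++ )
            {
                uint16_t vendor = ecam_read16(ecam_base, bus, dev, fn,
                                              PCI_VENDOR_ID_REG);

                if ( vendor == 0xffff )
                    continue; /* no function present here */

                /* ... register (bus, dev, fn) with the PCI subsystem ... */
            }
}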

> > 
> > > By default all the PCI devices will be assigned to DOM0. So Xen would have
> > > to configure the SMMU and Interrupt Controller to allow DOM0 to use the PCI
> > > devices. As mentioned earlier, those subsystems will require the StreamID
> > > and DeviceID. Both can be deduced from the RID.
> > > 
> > > XXX: How to hide PCI devices from DOM0?
> > 
> > By adding the ACPI namespace of the device to the STAO and blocking Dom0
> > access to this device in the emulated bridge that Dom0 will have access to
> > (returning 0xFFFF when Dom0 tries to read the vendor ID from the PCI header).
> 
> Sorry I was not clear here. By hiding, I meant DOM0 not instantiating a
> driver (similarly to xen-pciback.hide). We still want DOM0 to access the PCI
> config space in order to reset the device. Unless you plan to import all the
> reset quirks in Xen?

I don't have a clear opinion here, and I don't know all the details of these 
reset hacks.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-25  4:37     ` Manish Jaggi
@ 2017-01-25 15:25       ` Julien Grall
  2017-01-30  7:41         ` Manish Jaggi
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-01-25 15:25 UTC (permalink / raw)
  To: Manish Jaggi, xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Campbell Sean, Kapoor, Prasun, Jiandi An,
	Punit Agrawal, alistair.francis, jnair, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

Hello Manish,

On 25/01/17 04:37, Manish Jaggi wrote:
> On 01/24/2017 11:13 PM, Julien Grall wrote:
>>
>>
>> On 19/01/17 05:09, Manish Jaggi wrote:
>>> I think, PCI passthrough and DOM0 w/ACPI enumerating devices on PCI are separate features.
>>> Without Xen mapping PCI config space region in stage2 of dom0, ACPI dom0 wont boot.
>>> Currently for dt xen does that.
>>>
>>> So can we have 2 design documents
>>> a) PCI passthrough
>>> b) ACPI dom0/domU support in Xen and Linux
>>> - this may include:
>>> b.1 Passing IORT to Dom0 without smmu
>>> b.2 Hypercall to map PCI config space in dom0
>>> b.3 <more>
>>>
>>> What do you think?
>>
>> I don't think ACPI should be treated in a separate design document.
> As PCI passthrough support will take time to mature, why should we hold the ACPI design ?
> If I can boot dom0/domU with ACPI as it works with dt today, it would be a good milestone.

The way PCI works on DT today is a hack. There is no SMMU support, 
and the first version of GICv3 ITS support will contain hardcoded 
DeviceIDs (or very similar).

The current hack will introduce problems on platforms where a specific 
host controller driver is necessary to access the configuration space. 
Indeed, at the beginning Xen may not have a driver available (this will 
depend on contributions), but we still need to be able to use PCI with Xen.

We chose this way on DT because we didn't know when PCI passthrough 
would be added to Xen.

As mentioned in the introduction of the design document, I envision PCI 
passthrough implementation in 2 phases:
	- Phase 1: Register all PCI devices in Xen => will allow to use ITS and 
SMMU with PCI in Xen
	- Phase 2: Assign devices to guests

This design document will cover both phases because they are tied 
together. But the implementation can be decoupled; it would be possible 
(and is also my plan) to see the 2 phases upstreamed in different Xen releases.

Phase 1 will cover everything necessary for Xen to discover and register 
PCI devices. This includes the ACPI support (IORT, ...).

I see little point in having a temporary solution for ACPI that will 
require review bandwidth. It would be better to put this bandwidth 
into getting a good design document.

When we brainstormed about PCI passthrough, we identified some tasks 
that could be done in parallel with the design document. The list I have 
in mind is:
	* SMMUv3: I am aware of a company working on this
	* GICv3-ITS: work done by ARM (see [2])
	* IORT: it is required to discover ITSes and SMMUs with ACPI, so it can 
at least be parsed (I will speak about hiding some parts from DOM0 later)
	* PCI support for SMMUv2

There are quite a few companies willing to contribute to PCI 
passthrough. So we need some coordination to avoid redundancy. Please 
get in touch with me if you are interested in working on one of these items.

> Later when PCI passthrough design gets mature and implemented the support can be extended.
>> The support of ACPI may affect some of the decisions (such as hypercall) and we have to know them now.
>>
> Still it can be an independent with only dependent features implemented or placeholders can be addded
>> Regarding the ECAM region not mapped. This is not related to PCI passthrough but how MMIO are mapped with ACPI. This is a separate subject already in discussion (see [1]).
>>
> What about IORT generation for Dom0 without smmu ?

Looking at the IORT, the ITS node takes the StreamID as input when 
the device is protected by an SMMU.

Rather than removing the SMMU node in the IORT, I would blacklist them 
using the STAO table (or maybe we could introduce a disable flag in the 
IORT?).

Stefano, IIRC you took part in the design of the IORT. How did you envision 
hiding the SMMU from DOM0?

> I believe, It is not dependent on [1]
>> Cheers,
>>
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01607.html
>>

Cheers,

[2] 
https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg02742.html

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-24 20:07     ` Stefano Stabellini
  2017-01-25 11:21       ` Roger Pau Monné
@ 2017-01-25 18:53       ` Julien Grall
  2017-01-31 16:53         ` Edgar E. Iglesias
                           ` (2 more replies)
  1 sibling, 3 replies; 82+ messages in thread
From: Julien Grall @ 2017-01-25 18:53 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Campbell Sean, Andrew Cooper, Jiandi An, Punit Agrawal,
	alistair.francis, xen-devel, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

Hi Stefano,

On 24/01/17 20:07, Stefano Stabellini wrote:
> On Tue, 24 Jan 2017, Julien Grall wrote:
>>>> ## Discovering and register hostbridge
>>>>
>>>> Both ACPI and Device Tree do not provide enough information to fully
>>>> instantiate an host bridge driver. In the case of ACPI, some data may come
>>>> from ASL,
>>>
>>> The data available from ASL is just to initialize quirks and non-ECAM
>>> controllers, right? Given that SBSA mandates ECAM, and we assume that
>>> ACPI is mostly (if not only) for servers, then I think it is safe to say
>>> that in the case of ACPI we should have all the info to fully
>>> instantiate an host bridge driver.
>>
>> From the spec, the MCFG will only describe host bridge available at boot (see
>> 4.2 in "PCI firmware specification, rev 3.2"). All the other host bridges will
>> be described in ASL.
>>
>> So we need DOM0 to feed Xen about the latter host bridges.
>
> Unfortunately PCI specs are only accessible by PCI SIG members
> organizations. In other words, I cannot read the doc.
>
> Could you please explain what kind of host bridges are not expected to
> be available at boot? Do you know of any examples?

Roger answered this in a reply to this e-mail, so I will 
skip it. Let me know if you need more details.

>
>
>>>> whilst for Device Tree the segment number is not available.
>>>>
>>>> So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
>>>> with all the relevant informations. This will be done via a new hypercall
>>>> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
>>>
>>> I understand that the main purpose of this hypercall is to get Xen and Dom0
>>> to
>>> agree on the segment numbers, but why is it necessary? If Dom0 has an
>>> emulated contoller like any other guest, do we care what segment numbers
>>> Dom0 will use?
>>
>> I was not planning to have a emulated controller for DOM0. The physical one is
>> not necessarily ECAM compliant so we would have to either emulate the physical
>> one (meaning multiple different emulation) or an ECAM compliant.
>>
>> The latter is not possible because you don't know if there is enough free MMIO
>> space for the emulation.
>>
>> In the case on ARM, I don't see much the point to emulate the host bridge for
>> DOM0. The only thing we need in Xen is to access the configuration space, we
>> don't have about driving the host bridge. So I would let DOM0 dealing with
>> that.
>>
>> Also, I don't see any reason for ARM to trap DOM0 configuration space access.
>> The MSI will be configured using the interrupt controller and it is a trusted
>> Domain.
>
> These last you sentences raise a lot of questions. Maybe I am missing
> something. You might want to clarify the strategy for Dom0 and DomUs,
> and how they differ, in the next version of the doc.
>
> At some point you wrote "Instantiation of a specific driver for the host
> controller can be easily done if Xen has the information to detect it.
> However, those drivers may require resources described in ASL." Does it
> mean you plan to drive the physical host bridge from Xen and Dom0
> simultaneously?

I may miss some bits, so feel free to correct me if I am wrong.

My understanding is that a host bridge can be divided into 2 parts:
	- Initialization of the host bridge
	- Access to the configuration space

For a generic host bridge, there is no initialization. However, some 
host bridges (e.g xgene, xilinx) may require specific setup, such as 
configuring clocks. Given that Xen only requires access to the 
configuration space, I was thinking to let DOM0 initialize the host 
bridge. This would avoid importing a lot of code into Xen, however it 
means that we need to know when the host bridge has been initialized 
before accessing the configuration space.
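
To make the split concrete, here is a minimal sketch of the kind of 
per-bridge state Xen could keep under that model (nothing below exists 
in Xen today, all names and types are purely illustrative):

struct pci_host_bridge {
    uint16_t segment;              /* segment number reported by DOM0 */
    uint8_t  bus_start, bus_end;   /* decoded bus range */
    uint64_t cfg_base;             /* physical base of the config window */
    bool     ready;                /* set once DOM0 reports the bridge initialized */
    const struct pci_cfg_ops *ops; /* config space accessors only, no init hook */
};

struct pci_cfg_ops {
    int (*read)(struct pci_host_bridge *bridge, uint32_t sbdf,
                uint16_t reg, unsigned int len, uint32_t *val);
    int (*write)(struct pci_host_bridge *bridge, uint32_t sbdf,
                 uint16_t reg, unsigned int len, uint32_t val);
};

Xen would simply refuse (or defer) configuration space accesses until 
"ready" is set, however DOM0 ends up signalling that.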

Now regarding the configuration space, I think we can divide it into 2 
categories:
	- indirect access, where the configuration space accesses are 
multiplexed. An example would be the legacy method on x86 (e.g 0xcf8 
and 0xcfc). A similar method is used by the x-gene PCI driver ([1]).
	- ECAM-like access, where each PCI configuration space has its own 
address space. I said "ECAM-like" because some host bridges require 
some bit fiddling when accessing registers (see thunder-ecam [2]).

There are also host bridges that mix both indirect access and ECAM-like 
access depending on the device configuration space accessed (see 
thunder-pem [3]).
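
To illustrate the difference between the two categories (read32/write32, 
ADDR_REG and DATA_REG below are placeholders, not an existing API):

#include <stdint.h>

static uint32_t ecam_read32(volatile void *ecam_base, uint8_t bus,
                            uint8_t dev, uint8_t fn, uint16_t reg)
{
    /* ECAM: every function has its own 4KB window, so a config access
     * is a plain MMIO access at a computed offset. */
    uint32_t off = ((uint32_t)bus << 20) | ((uint32_t)dev << 15) |
                   ((uint32_t)fn << 12) | (reg & 0xfff);
    return read32((volatile char *)ecam_base + off);
}

static uint32_t indirect_read32(uint8_t bus, uint8_t dev, uint8_t fn,
                                uint16_t reg)
{
    /* Indirect: a single address/data register pair is multiplexed for
     * all devices, so concurrent users must serialise around it. */
    write32(ADDR_REG, (1u << 31) | ((uint32_t)bus << 16) |
                      ((uint32_t)dev << 11) | ((uint32_t)fn << 8) |
                      (reg & 0xfc));
    return read32(DATA_REG);
}

The second form is the reason DOM0 and Xen cannot both poke an 
indirect-access bridge without some arbitration.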

When using an ECAM-like host bridge, I don't think it will be an issue 
to have both DOM0 and Xen accessing the configuration space at the same 
time, although we need to define who is doing what. In the general 
case, DOM0 should not touch an assigned PCI device. The only possible 
interaction would be resetting a device (see my answer below).

When using indirect access, we cannot let DOM0 and Xen access any PCI 
configuration space at the same time. So I think we would have to 
emulate the physical host controller.

Unless we have a strong requirement to trap DOM0 accesses to the 
configuration space, I would keep the emulation to the strict minimum 
(e.g for indirect access) to avoid ending up handling all the quirks 
of ECAM-like host bridges.

If we need to trap the configuration space, I would suggest the 
following for ECAM-like host bridges:
	- For physical host bridges that do not require initialization and are 
nearly ECAM compatible (e.g require register fiddling) => replace with 
a generic host bridge emulation for DOM0
	- For physical host bridges that require initialization but are ECAM 
compatible (e.g AFAICT xilinx [4]) => trap the ECAM accesses but let 
DOM0 handle the host bridge initialization
	- For all other host bridges => I don't know if there are host bridges 
falling into this category. I also don't have any idea how to handle them.

>
> Otherwise, if Dom0 is the only one to drive the physical host bridge,
> and Xen is the one to provide the emulated host bridge, how are DomU PCI
> config reads and writes supposed to work in details?

I think I have answered this question with my explanation above. Let 
me know if that is not the case.

 >  How is MSI configuration supposed to work?

For GICv3 ITS, the MSI will be configured with the eventID (which is 
unique per device) and the address of the doorbell. The linkage between 
the LPI and the "MSI" will be done through the ITS.

For GICv2m, the MSI will be configured with an SPI (or an offset on 
some GICv2m implementations) and the address of the doorbell. Note that 
for DOM0 SPIs are mapped 1:1.

So in both cases, I don't think it is necessary to trap MSI 
configuration for DOM0. This may not be true if we want to handle other 
MSI controllers.
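
As a rough illustration of what would end up in a device's MSI 
capability in each case (doorbell addresses are platform specific, 
everything below is illustrative):

#include <stdint.h>

struct msi_msg {
    uint64_t address;   /* doorbell the device writes to */
    uint32_t data;      /* payload identifying the interrupt */
};

static struct msi_msg compose_its_msi(uint64_t its_doorbell, uint32_t event_id)
{
    /* GICv3 ITS: the ITS maps (DeviceID, eventID) to an LPI. */
    return (struct msi_msg){ .address = its_doorbell, .data = event_id };
}

static struct msi_msg compose_v2m_msi(uint64_t v2m_doorbell, uint32_t spi)
{
    /* GICv2m: the payload is the SPI number (for DOM0, SPIs are 1:1). */
    return (struct msi_msg){ .address = v2m_doorbell, .data = spi };
}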

I have in mind the xilinx MSI controller (embedded in the host bridge? 
[4]) and the xgene MSI controller ([5]), but I have no idea how they 
work or whether we need to support them. Maybe Edgar could share 
details on the Xilinx one?

>
>
>>>> XXX: Shall we limit DOM0 the access to the configuration space from that
>>>> moment?
>>>
>>> If we can, we should
>>
>> Why would be the benefits? For now, I see a big drawback: resetting a PCI
>> devices would need to be done in Xen rather than DOM0. As you may now there
>> are a lot of quirks for reset.
>>
>> So for me, it looks more sensible to handle this in DOM0 and let DOM0 a full
>> access to the configuration space. Overall he is a trusted domain.
>
> PCI reset is worth of its own chapter in the doc :-)
>
> Dom0 is a trusted domain, but when possible, I consider an improvement
> to limit the amount of trust we put in it. Also, as I wrote above, I
> don't understand what is the plan to deal with concurrent accesses to
> the host bridge from Dom0 and Xen.

I believe I have given more details now :). If it sounds sensible, I 
will add it to the next version of the design doc.

>
> In any case, regarding PCI reset, we should dig out past discussions on
> the merits of doing reset in the hypervisor vs. dom0. I agree
> introducing PCI reset quirks in Xen is not nice but I recall that
> XenClient did it to avoid possible misbehaviors of the device. We need
> to be careful about ordering PCI reset against domain destruction. I
> couldn't find any email discussions to reference, maybe it is worth
> contacting the OpenXT guys about it.

I have a vague recollection of this code from when I was working at 
XenClient.

I took a brief look at the OpenXT Xen patchqueue [6] and was not able 
to find a patch to reset PCI in Xen. However, they have a patch to fix 
the one in Linux [7].

There are some ex-XenClient folks working on OpenXT. I will ask them to 
see if they remember anything. Also CCing Andrew, just in case he knows 
the story.

Cheers,

[1] drivers/pci/host/pci-xgene.c
[2] drivers/pci/host/pci-thunder-ecam.c
[3] drivers/pci/host/pci-thunder-pem.c
[4] drivers/pci/host/pcie-xilinx-nwl.c
[5] drivers/pci/host/pcie-xgene-msi.c
[6] https://github.com/OpenXT-Extras/xen-common-pq
[7] 
https://github.com/OpenXT-Extras/linux-3.11-pq/blob/master/master/pci-pt-flr

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-25 15:25       ` Julien Grall
@ 2017-01-30  7:41         ` Manish Jaggi
  2017-01-31 13:33           ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Manish Jaggi @ 2017-01-30  7:41 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Campbell Sean, Kapoor, Prasun, Jiandi An,
	Punit Agrawal, alistair.francis, jnair, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

Hello Julien,

On 01/25/2017 08:55 PM, Julien Grall wrote:
> Hello Manish,
> 
> On 25/01/17 04:37, Manish Jaggi wrote:
>> On 01/24/2017 11:13 PM, Julien Grall wrote:
>>>
>>>
>>> On 19/01/17 05:09, Manish Jaggi wrote:
>>>> I think, PCI passthrough and DOM0 w/ACPI enumerating devices on PCI are separate features.
>>>> Without Xen mapping PCI config space region in stage2 of dom0, ACPI dom0 wont boot.
>>>> Currently for dt xen does that.
>>>>
>>>> So can we have 2 design documents
>>>> a) PCI passthrough
>>>> b) ACPI dom0/domU support in Xen and Linux
>>>> - this may include:
>>>> b.1 Passing IORT to Dom0 without smmu
>>>> b.2 Hypercall to map PCI config space in dom0
>>>> b.3 <more>
>>>>
>>>> What do you think?
>>>
>>> I don't think ACPI should be treated in a separate design document.
>> As PCI passthrough support will take time to mature, why should we hold the ACPI design ?
>> If I can boot dom0/domU with ACPI as it works with dt today, it would be a good milestone.
> 
> The way PCI is working on DT today is a hack.
Can you please elaborate on why it is a hack?
> There is no SMMU support
SMMU support can be turned on and off with iommu=0, and also by not having an smmu node in the device tree.
So not having SMMU support for dom0 is not a hack IMHO.
domUs can continue with PV devices.

And if you consider running without an SMMU a hack, then may I suggest we use this as a phase 0 for ACPI.

> and the first version of GICv3 ITS support will contain hardcoded DeviceID (or very similar). 
I disagree with this: why should it contain a hardcoded DeviceID, and what prevents it today technically?
Can you please elaborate?
If you are OK with having a first limited version of GICv3 ITS, why not have a phase 0 for ACPI?

> 
> The current hack will introduce problem on platform where a specific host controller is necessary to access the configuration space.
The specific host controller can be accessed by dom0 with Xen mapping it in stage-2, so we don't need a driver, right?
Can you please elaborate on the problem?
> Indeed, at the beginning Xen may not have a driver available (this
> will depend on the contribution), but we still need to be able to use PCI with Xen. 
ACPI dom0 boot can and should be done without smmu support.

> We chose this way on DT because we didn't know when the PCI passthrough will be added in Xen.
not a technical argument.

> 
> As mentioned in the introduction of the design document, I envision PCI passthrough implementation in 2 phases:
>     - Phase 1: Register all PCI devices in Xen => will allow to use ITS and SMMU with PCI in Xen
>     - Phase 2: Assign devices to guests
> 
I think 3 phases; let's add phase 0.
- Phase 0: Dom0 ACPI without SMMU, DomU with PV devices, ITS in Xen

> This design document will cover both phases because they are tight together. But the implementation can be decoupled, it would be possible (and also my plan) to see the 2 phases upstreamed in
> different Xen release.
> 
> Phase 1, will cover anything necessary for Xen to discover and register PCI devices. This include the ACPI support (IORT,...).
> 
> I see little point to have a temporary solution for ACPI that will require bandwidth review. It would be better to put this bandwidth focusing on getting a good design document.
I disagree; it is not a temporary solution. There are several use cases where PCI pass-through is not required but ACPI is.
> 
> When we brainstormed about PCI passthrough, we identified some tasks that could be done in parallel of the design document. The list I have in mind is:
>     * SMMUv3: I am aware of a company working on this
>     * GICv3-ITS: work done by ARM (see [2])
>     * IORT: it is required to discover ITSes and SMMU with ACPI. So it can at least be parsed (I will speak about hiding some part to DOM0 later)
>     * PCI support for SMMUv2
> 
> There are quite a few companies willing to contribute to PCI passthrough. So we need some coordination to avoid redundancy. Please get in touch with me if you are interested to work on one of these
> items.
> 
Will mail you.
>> Later when PCI passthrough design gets mature and implemented the support can be extended.
>>> The support of ACPI may affect some of the decisions (such as hypercall) and we have to know them now.
>>>
>> Still it can be an independent with only dependent features implemented or placeholders can be addded
>>> Regarding the ECAM region not mapped. This is not related to PCI passthrough but how MMIO are mapped with ACPI. This is a separate subject already in discussion (see [1]).
>>>
>> What about IORT generation for Dom0 without smmu ?
> 
> Looking at the IORT, the ITS node is taking the StreamID in input when the device is protected by an SMMU.
> 
> Rather than removing the SMMU node in the IORT, I would blacklist them using the STAO table (or maybe we could introduce a disable flag in the IORT?).
> 
> Stefano, IIRC you took part of the design of IORT. How did you envision hiding SMMU from DOM0?
> 
>> I believe, It is not dependent on [1]
>>> Cheers,
>>>
>>> [1] https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01607.html
>>>
> 
> Cheers,
> 
> [2] https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg02742.html
> 


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-30  7:41         ` Manish Jaggi
@ 2017-01-31 13:33           ` Julien Grall
  0 siblings, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-01-31 13:33 UTC (permalink / raw)
  To: Manish Jaggi, xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Campbell Sean, Kapoor, Prasun, Jiandi An,
	Punit Agrawal, alistair.francis, jnair, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper



On 30/01/17 07:41, Manish Jaggi wrote:
> Hello Julien,
>
> On 01/25/2017 08:55 PM, Julien Grall wrote:
>> Hello Manish,
>>
>> On 25/01/17 04:37, Manish Jaggi wrote:
>>> On 01/24/2017 11:13 PM, Julien Grall wrote:
>>>>
>>>>
>>>> On 19/01/17 05:09, Manish Jaggi wrote:
>>>>> I think, PCI passthrough and DOM0 w/ACPI enumerating devices on PCI are separate features.
>>>>> Without Xen mapping PCI config space region in stage2 of dom0, ACPI dom0 wont boot.
>>>>> Currently for dt xen does that.
>>>>>
>>>>> So can we have 2 design documents
>>>>> a) PCI passthrough
>>>>> b) ACPI dom0/domU support in Xen and Linux
>>>>> - this may include:
>>>>> b.1 Passing IORT to Dom0 without smmu
>>>>> b.2 Hypercall to map PCI config space in dom0
>>>>> b.3 <more>
>>>>>
>>>>> What do you think?
>>>>
>>>> I don't think ACPI should be treated in a separate design document.
>>> As PCI passthrough support will take time to mature, why should we hold the ACPI design ?
>>> If I can boot dom0/domU with ACPI as it works with dt today, it would be a good milestone.
>>
>> The way PCI is working on DT today is a hack.
> Can you please elaborate why it is a hack ?

I think I gave enough explanation in my previous e-mail as to why I 
consider it a hack.

>> There is no SMMU support
> SMMU support can be turned on and off by iommu=0 and also by not having an smmu node in device tree.
> So not having an smmu support for dom0 is not a hack IMHO.
> domUs can continue with PV devices
>
> And if you term without smmu as a hack, if I may suggest lets use this as a phase 0 for ACPI.
>
>> and the first version of GICv3 ITS support will contain hardcoded DeviceID (or very similar).
> I have a disagreement on this, why should it contain hardcoded device ID, what prevents it today technically?

As you may know, PCI devices are not described in the firmware tables 
but are instead discoverable via the host bridges. We don't have this 
support in Xen today.

You may argue that we could use the existing physdev hypercalls. 
However, they are not enough to find the DeviceID associated with a 
specific device. Indeed, we are not able to find the host bridge DT 
node in order to translate the RID.
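
For reference, the translation we need is the one described by the host 
bridge's msi-map property; a minimal sketch, assuming a parsed table of 
mapping entries (none of this exists in Xen today):

#include <stdint.h>

struct msi_map_entry {
    uint32_t rid_base;       /* first RID covered by this entry */
    uint32_t deviceid_base;  /* DeviceID corresponding to rid_base */
    uint32_t length;         /* number of RIDs covered */
};

static int rid_to_deviceid(const struct msi_map_entry *map, unsigned int nr,
                           uint32_t rid, uint32_t *deviceid)
{
    for ( unsigned int i = 0; i < nr; i++ )
    {
        if ( rid >= map[i].rid_base &&
             rid < map[i].rid_base + map[i].length )
        {
            *deviceid = map[i].deviceid_base + (rid - map[i].rid_base);
            return 0;
        }
    }
    return -1; /* no mapping found: the DeviceID cannot be derived */
}

Without knowing which host bridge (and therefore which msi-map) a RID 
belongs to, Xen cannot build that table.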

> Can you please elaborate.
> If you are ok to have a first limited version of GICV3 ITS why not have a Phase0 for ACPI?

See my answer below.

>
>>
>> The current hack will introduce problem on platform where a specific host controller is necessary to access the configuration space.
> The specific host controller can be accessed by dom0 with Xen mapping stage2, then we dont need a driver? right?
> Can you please elaborate on the problem?
>> Indeed, at the beginning Xen may not have a driver available (this
>> will depend on the contribution), but we still need to be able to use PCI with Xen.
> ACPI dom0 boot can and should be done without smmu support.
>
>> We chose this way on DT because we didn't know when the PCI passthrough will be added in Xen.
> not a technical argument.

That's not a helpful comment... Some arguments are not necessarily 
technical but are also based on forward planning and review bandwidth.

What you are arguing about is whether PCI support for ACPI should land 
in Xen two weeks earlier or not. We both want to get PCI support with 
ACPI into Xen as soon as possible, but from an open source perspective, 
there is little point in having a phase 0 for ACPI as it will likely 
land in the same release.

One way to get things going faster is to provide reviews.

>>
>> As mentioned in the introduction of the design document, I envision PCI passthrough implementation in 2 phases:
>>     - Phase 1: Register all PCI devices in Xen => will allow to use ITS and SMMU with PCI in Xen
>>     - Phase 2: Assign devices to guests
>>
> I think 3 phases, Lets add phase 0.
> - Phase 0: Dom0 ACPI without SMMU, DomU with PV devices, ITS in Xen
>
>> This design document will cover both phases because they are tight together. But the implementation can be decoupled, it would be possible (and also my plan) to see the 2 phases upstreamed in
>> different Xen release.
>>
>> Phase 1, will cover anything necessary for Xen to discover and register PCI devices. This include the ACPI support (IORT,...).
>>
>> I see little point to have a temporary solution for ACPI that will require bandwidth review. It would be better to put this bandwidth focusing on getting a good design document.
> I disagree, it is not a temporary solution. There are several use cases where PCI pass-through is not required but ACPI is.

Sometimes I wonder if you read what I wrote. Phase 1, "Discovering 
PCI", is exactly what you are looking for.

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-25 11:42     ` Roger Pau Monné
@ 2017-01-31 15:59       ` Julien Grall
  2017-01-31 22:03         ` Stefano Stabellini
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-01-31 15:59 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Jiandi An,
	Punit Agrawal, alistair.francis, Shanker Donthineni, xen-devel,
	manish.jaggi, Campbell Sean

Hi Roger,

On 25/01/17 11:42, Roger Pau Monné wrote:
> On Tue, Jan 24, 2017 at 05:17:06PM +0000, Julien Grall wrote:
>> On 06/01/17 15:12, Roger Pau Monné wrote:
>>> On Thu, Dec 29, 2016 at 02:04:15PM +0000, Julien Grall wrote:
>>>>     * Add a device
>>>>     * Remove a device
>>>>     * Assign a device to a guest
>>>>     * Deassign a device from a guest
>>>>
>>>> XXX: Detail the interaction when assigning/deassigning device
>>>
>>> Assigning a device will probably entangle setting up some direct MMIO mappings
>>> (BARs and ROMs) plus a bunch of traps in order to perform emulation of accesses
>>> to the PCI config space (or those can be setup when a new bridge is registered
>>> with Xen).
>>
>> I am planning to details the root complex emulation in a separate section. I
>> sent the design document before writing it.
>>
>> In brief, I would expect the registration of a new bridge to setup the trap
>> to emulation access to the PCI configuration space. On ARM, the first
>> approach will rely on the OS to setup the BARs and ROMs. So they will be
>> mapped by the PCI configuration space emulation.
>>
>> The reason on relying on the OS to setup the BARs/ROMs reducing the work to
>> do for a first version. Otherwise we would have to add code in the toolstack
>> to decide where to place the BARs/ROMs. I don't think it is a lot of work,
>> but it is not that important because it does not require a stable ABI (this
>> is an interaction between the hypervisor and the toolstack). Furthermore,
>> Linux (at least on ARM) is assigning the BARs at the setup. From my
>> understanding, this is the expected behavior with both DT (the DT has a
>> property to skip the scan) and ACPI.
>
> This approach might work for Dom0, but for DomU you certainly need to know
> where the MMIO regions of a device are, and either the toolstack or Xen needs
> to setup this in advance (or at least mark which MMIO regions are available to
> the DomU). Allowing a DomU to map random MMIO regions is certainly a security
> issue.

I agree here. I provided more feedback in an answer to Stefano; I 
would like your input there too if possible. See

<8ca91073-09e7-57ca-9063-b47e0aced39d@linaro.org>

[...]

>>>
>>>>
>>>> Based on what Linux is currently doing, there are two kind of quirks:
>>>>     * Accesses to the configuration space of certain sizes are not allowed
>>>>     * A specific driver is necessary for driving the host bridge
>>>
>>> Hm, so what are the issues that make this bridges need specific drivers?
>>>
>>> This might be quite problematic if you also have to emulate this broken
>>> behavior inside of Xen (because Dom0 is using a specific driver).
>>
>> I am not expecting to emulate the configuration space access for DOM0. I
>> know you mentioned that it would be necessary to hide PCI used by Xen (such
>> as the UART) to DOM0 or configuring MSI. But for ARM, the UART is integrated
>> in the SOC and MSI will be configured through the interrupt controller.
>
> Right, we certainly need to do it for x86, but I don't know that much of the
> ARM architecture in order to know if that's needed or not. I'm also wondering
> if having both Xen and the Dom0 directly accessing the ECAM area is fine, even
> if they use different cache mapping attributes?

I don't know much about x86, but on ARM we can specify caching 
attributes in the stage-2 page tables (aka EPT on x86). The MMU will 
use the stricter memory attributes between stage-2 and the guest page 
tables.

In the case of ECAM, we could disable caching in the stage-2 page 
tables, so ECAM accesses will always be uncached.

>
>>>> So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
>>>> with all the relevant informations. This will be done via a new hypercall
>>>> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
>>>>
>>>> struct physdev_pci_host_bridge_add
>>>> {
>>>>     /* IN */
>>>>     uint16_t seg;
>>>>     /* Range of bus supported by the host bridge */
>>>>     uint8_t  bus_start;
>>>>     uint8_t  bus_nr;
>>>>     uint32_t res0;  /* Padding */
>>>>     /* Information about the configuration space region */
>>>>     uint64_t cfg_base;
>>>>     uint64_t cfg_size;
>>>> }
>>>
>>> Why do you need to cfg_size attribute? Isn't it always going to be 4096 bytes
>>> in size?
>>
>> The cfg_size is here to help us to match the corresponding node in the
>> device tree. The cfg_size may differ depending on how the hardware has
>> implemented the access to the configuration space.
>
> But certainly cfg_base needs to be aligned to a PAGE_SIZE? And according to the
> spec cfg_size cannot be bigger than 4KB (PAGE_SIZE), so in any case you will
> end up mapping a whole 4KB page, because that's the minimum granularity of the
> p2m?

cfg_size would be a multiple of 4KB as each configuration space has 
its own region. But as you mentioned later, we could re-use 
PHYSDEVOP_pci_mmcfg_reserved.

>
>> But to be fair, I think we can deal without this property. For ACPI, the
>> size will vary following the number of bus handled and can be deduced. For
>> DT, the base address and bus range should be enough to find the associated
>> node.
>>
>>>
>>> If that field is removed you could use the PHYSDEVOP_pci_mmcfg_reserved
>>> hypercalls.
>>>
>>>> DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
>>>> bridge available on the platform. When Xen is receiving the hypercall, the
>>>> the driver associated to the host bridge will be instantiated.
>>>>
>>>> XXX: Shall we limit DOM0 the access to the configuration space from that
>>>> moment?
>>>
>>> Most definitely yes, you should instantiate an emulated bridge over the real
>>> one, in order to proxy Dom0 accesses to the PCI configuration space. You for
>>> example don't want Dom0 moving the position of the BARs of PCI devices without
>>> Xen being aware (and properly changing the second stage translation).
>>
>> The problem is on ARM we don't have a single way to access the configuration
>> space. So we would need different emulator in Xen, which I don't like unless
>> there is a strong reason to do it.
>>
>> We could avoid DOM0s to modify the position of the BARs after setup. I also
>> remembered you mention about MSI configuration, for ARM this is done via the
>> interrupt controller.
>>
>>>
>>>> ## Discovering and register PCI
>>>>
>>>> Similarly to x86, PCI devices will be discovered by DOM0 and register
>>>> using the hypercalls PHYSDEVOP_pci_add_device or PHYSDEVOP_manage_pci_add_ext.
>>>
>>> Why do you need this? If you have access to the bridges you can scan them from
>>> Xen and discover the devices AFAICT.
>>
>> I am a bit confused. Are you saying that you plan to ditch them for PVH? If
>> so, why are they called by Linux today?
>
> I think we can get away with PHYSDEVOP_pci_mmcfg_reserved only, but maybe I'm
> missing something. AFAICT Xen should be able to gather all the other data by
> itself from the PCI config space once it knows the details about the host
> bridge.

From my understanding, some host bridges need to be configured before 
they can be used (TBC). Bringing this initialization into Xen may be 
complex. For instance, the xgene host bridge (see 
linux/drivers/pci/host/pci-xgene.c) requires enabling a clock.

I would leave the initialization of the host bridge in Linux; if we are 
doing the scanning in Xen, we will need a hypercall to let Xen know 
when a host bridge has been initialized.
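
Purely for illustration, such a notification could be as small as the 
structure below (or be folded into PHYSDEVOP_pci_host_bridge_add 
itself, issued only once the bridge is up); the name is made up:

struct physdev_pci_host_bridge_ready
{
    /* IN */
    uint16_t seg;        /* segment of the host bridge initialized by DOM0 */
    uint8_t  bus_start;  /* start of its bus range, to identify the bridge */
    uint8_t  pad;
};

Xen would only start scanning/accessing the configuration space behind 
that bridge once it has received this notification.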

I gave a bit more background in my answer to Stefano, so I would 
recommend continuing the conversation there.


>
>>>
>>>> By default all the PCI devices will be assigned to DOM0. So Xen would have
>>>> to configure the SMMU and Interrupt Controller to allow DOM0 to use the PCI
>>>> devices. As mentioned earlier, those subsystems will require the StreamID
>>>> and DeviceID. Both can be deduced from the RID.
>>>>
>>>> XXX: How to hide PCI devices from DOM0?
>>>
>>> By adding the ACPI namespace of the device to the STAO and blocking Dom0
>>> access to this device in the emulated bridge that Dom0 will have access to
>>> (returning 0xFFFF when Dom0 tries to read the vendor ID from the PCI header).
>>
>> Sorry I was not clear here. By hiding, I meant DOM0 not instantiating a
>> driver (similarly to xen-pciback.hide). We still want DOM0 to access the PCI
>> config space in order to reset the device. Unless you plan to import all the
>> reset quirks in Xen?
>
> I don't have a clear opinion here, and I don't know all thew details of this
> reset hacks.

Actually, I looked at the Linux code (see __pci_dev_reset in 
drivers/pci/pci.c) and there are fewer quirks than I expected. The list 
of quirks can be found in pci_dev_reset_methods in drivers/pci/quirks.c.

There are a few ways to reset a device (see __pci_dev_reset); they all 
look to be based on accessing the configuration space. So I guess it 
should be fine to import that into Xen. Any opinions?
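
For what it is worth, the most common of those paths (PCIe Function 
Level Reset) boils down to a couple of configuration space accesses, 
roughly as sketched below (register offsets and bits are from the PCIe 
spec; find_pcie_cap(), the config accessors and mdelay() stand in for 
whatever the final code uses):

#define PCI_EXP_DEVCAP       0x04          /* offset within the PCIe capability */
#define PCI_EXP_DEVCAP_FLR   (1u << 28)    /* Function Level Reset capable */
#define PCI_EXP_DEVCTL       0x08
#define PCI_EXP_DEVCTL_FLR   (1u << 15)    /* Initiate Function Level Reset */

static int pcie_flr(uint32_t sbdf)
{
    uint16_t cap = find_pcie_cap(sbdf);    /* locate the PCI Express capability */
    uint16_t ctl;

    if ( !cap )
        return -1;
    if ( !(pci_cfg_read32(sbdf, cap + PCI_EXP_DEVCAP) & PCI_EXP_DEVCAP_FLR) )
        return -1;                         /* device does not support FLR */

    ctl = pci_cfg_read16(sbdf, cap + PCI_EXP_DEVCTL);
    pci_cfg_write16(sbdf, cap + PCI_EXP_DEVCTL, ctl | PCI_EXP_DEVCTL_FLR);
    mdelay(100);                           /* spec: wait 100ms after FLR */
    return 0;
}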

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-25 18:53       ` Julien Grall
@ 2017-01-31 16:53         ` Edgar E. Iglesias
  2017-01-31 17:09           ` Julien Grall
  2017-01-31 21:58         ` Stefano Stabellini
  2017-02-01 10:55         ` Roger Pau Monné
  2 siblings, 1 reply; 82+ messages in thread
From: Edgar E. Iglesias @ 2017-01-31 16:53 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Chen, Campbell Sean, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, xen-devel,
	Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> Hi Stefano,
> 
> On 24/01/17 20:07, Stefano Stabellini wrote:
> >On Tue, 24 Jan 2017, Julien Grall wrote:
> >>>>## Discovering and register hostbridge
> >>>>
> >>>>Both ACPI and Device Tree do not provide enough information to fully
> >>>>instantiate an host bridge driver. In the case of ACPI, some data may come
> >>>>from ASL,
> >>>
> >>>The data available from ASL is just to initialize quirks and non-ECAM
> >>>controllers, right? Given that SBSA mandates ECAM, and we assume that
> >>>ACPI is mostly (if not only) for servers, then I think it is safe to say
> >>>that in the case of ACPI we should have all the info to fully
> >>>instantiate an host bridge driver.
> >>
> >>From the spec, the MCFG will only describe host bridge available at boot (see
> >>4.2 in "PCI firmware specification, rev 3.2"). All the other host bridges will
> >>be described in ASL.
> >>
> >>So we need DOM0 to feed Xen about the latter host bridges.
> >
> >Unfortunately PCI specs are only accessible by PCI SIG members
> >organizations. In other words, I cannot read the doc.
> >
> >Could you please explain what kind of host bridges are not expected to
> >be available at boot? Do you know of any examples?
> 
> Roger answered to this answer in on a reply to this e-mail. So I will skip
> it. Let me know if you need for details.
> 
> >
> >
> >>>>whilst for Device Tree the segment number is not available.
> >>>>
> >>>>So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> >>>>with all the relevant informations. This will be done via a new hypercall
> >>>>PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> >>>
> >>>I understand that the main purpose of this hypercall is to get Xen and Dom0
> >>>to
> >>>agree on the segment numbers, but why is it necessary? If Dom0 has an
> >>>emulated contoller like any other guest, do we care what segment numbers
> >>>Dom0 will use?
> >>
> >>I was not planning to have a emulated controller for DOM0. The physical one is
> >>not necessarily ECAM compliant so we would have to either emulate the physical
> >>one (meaning multiple different emulation) or an ECAM compliant.
> >>
> >>The latter is not possible because you don't know if there is enough free MMIO
> >>space for the emulation.
> >>
> >>In the case on ARM, I don't see much the point to emulate the host bridge for
> >>DOM0. The only thing we need in Xen is to access the configuration space, we
> >>don't have about driving the host bridge. So I would let DOM0 dealing with
> >>that.
> >>
> >>Also, I don't see any reason for ARM to trap DOM0 configuration space access.
> >>The MSI will be configured using the interrupt controller and it is a trusted
> >>Domain.
> >
> >These last you sentences raise a lot of questions. Maybe I am missing
> >something. You might want to clarify the strategy for Dom0 and DomUs,
> >and how they differ, in the next version of the doc.
> >
> >At some point you wrote "Instantiation of a specific driver for the host
> >controller can be easily done if Xen has the information to detect it.
> >However, those drivers may require resources described in ASL." Does it
> >mean you plan to drive the physical host bridge from Xen and Dom0
> >simultaneously?
> 
> I may miss some bits, so feel free to correct me if I am wrong.
> 
> My understanding is host bridge can be divided in 2 parts:
> 	- Initialization of the host bridge
> 	- Access the configuration space
> 
> For generic host bridge, the initialization is inexistent. However some host
> bridge (e.g xgene, xilinx) may require some specific setup and also
> configuring clocks. Given that Xen only requires to access the configuration
> space, I was thinking to let DOM0 initialization the host bridge. This would
> avoid to import a lot of code in Xen, however this means that we need to
> know when the host bridge has been initialized before accessing the
> configuration space.


Yes, that's correct.
There's a sequence on the ZynqMP that involves assigning Gigabit Transceivers
to PCI (GTs are shared among PCIe, USB, SATA and the Display Port),
enabling clocks and configuring a few registers to enable ECAM and MSI.

I'm not sure if this could be done prior to starting Xen. Perhaps.
If so, bootloaders would have to know ahead of time what devices
the GTs are supposed to be configured for.



> 
> Now regarding the configuration space, I think we can divide in 2 category:
> 	- indirect access, the configuration space are multiplexed. An example
> would be the legacy method on x86 (e.g 0xcf8 and 0xcfc). A similar method is
> used for x-gene PCI driver ([1]).
> 	- ECAM like access, where each PCI configuration space will have it is own
> address space. I said "ECAM like" because some host bridge will require some
> bits fiddling when accessing register (see thunder-ecam [2])
> 
> There are also host bridges that mix both indirect access and ECAM like
> access depending on the device configuration space accessed (see thunder-pem
> [3]).
> 
> When using ECAM like host bridge, I don't think it will be an issue to have
> both DOM0 and Xen accessing configuration space at the same time. Although,
> we need to define who is doing what. In general case, DOM0 should not
> touched an assigned PCI device. The only possible interaction would be
> resetting a device (see my answer below).
> 
> When using indirect access, we cannot let DOM0 and Xen accessing any PCI
> configuration space at the same time. So I think we would have to emulate
> the physical host controller.
> 
> Unless we have a big requirement to trap DOM0 access to the configuration
> space, I would only keep the emulation to the strict minimum (e.g for
> indirect access) to avoid ending-up handling all the quirks for ECAM like
> host bridge.
> 
> If we need to trap the configuration space, I would suggest the following
> for ECAM like host bridge:
> 	- For physical host bridge that does not require initialization and is
> nearly ECAM compatible (e.g require register fiddling) => replace by a
> generic host bridge emulation for DOM0
> 	- For physical host bridge that require initialization but is ECAM
> compatible (e.g AFAICT xilinx [4]) => trap the ECAM access but let DOM0
> handling the host bridge initialization

Sounds good to me.


> 	- For all other host bridges => I don't know if there are host bridges
> falling under this category. I also don't have any idea how to handle this.
> 
> >
> >Otherwise, if Dom0 is the only one to drive the physical host bridge,
> >and Xen is the one to provide the emulated host bridge, how are DomU PCI
> >config reads and writes supposed to work in details?
> 
> I think I have answered to this question with my explanation above. Let me
> know if it is not the case.
> 
> >  How is MSI configuration supposed to work?
> 
> For GICv3 ITS, the MSI will be configured with the eventID (it is uniq
> per-device) and the address of the doorbell. The linkage between the LPI and
> "MSI" will be done through the ITS.
> 
> For GICv2m, the MSI will be configured with an SPIs (or offset on some
> GICv2m) and the address of the doorbell. Note that for DOM0 SPIs are mapped
> 1:1.
> 
> So in both case, I don't think it is necessary to trap MSI configuration for
> DOM0. This may not be true if we want to handle other MSI controller.
> 
> I have in mind the xilinx MSI controller (embedded in the host bridge? [4])
> and xgene MSI controller ([5]). But I have no idea how they work and if we
> need to support them. Maybe Edgar could share details on the Xilinx one?


The Xilinx controller has 2 dedicated SPIs and pages for MSIs. AFAIK, there's no
way to protect the MSI doorbells from misconfigured end-points raising malicious EventIDs.
So perhaps trapped config accesses from domUs can help by adding this protection
as drivers configure the device.

On Linux, once an MSI hits, the kernel takes the SPI interrupt, reads
out the EventID from a FIFO in the controller and injects a new IRQ into
the kernel.

I hope that helps!
Best regards,
Edgar


> 
> >
> >
> >>>>XXX: Shall we limit DOM0 the access to the configuration space from that
> >>>>moment?
> >>>
> >>>If we can, we should
> >>
> >>Why would be the benefits? For now, I see a big drawback: resetting a PCI
> >>devices would need to be done in Xen rather than DOM0. As you may now there
> >>are a lot of quirks for reset.
> >>
> >>So for me, it looks more sensible to handle this in DOM0 and let DOM0 a full
> >>access to the configuration space. Overall he is a trusted domain.
> >
> >PCI reset is worth of its own chapter in the doc :-)
> >
> >Dom0 is a trusted domain, but when possible, I consider an improvement
> >to limit the amount of trust we put in it. Also, as I wrote above, I
> >don't understand what is the plan to deal with concurrent accesses to
> >the host bridge from Dom0 and Xen.
> 
> I believe I gave more details now :). If it sounds sensible, I will add it
> in the next version of the design doc.
> 
> >
> >In any case, regarding PCI reset, we should dig out past discussions on
> >the merits of doing reset in the hypervisor vs. dom0. I agree
> >introducing PCI reset quirks in Xen is not nice but I recall that
> >XenClient did it to avoid possible misbehaviors of the device. We need
> >to be careful about ordering PCI reset against domain destruction. I
> >couldn't find any email discussions to reference, maybe it is worth
> >contacting the OpenXT guys about it.
> 
> I've got a vague recall of this code back when I was working at XenClient.
> 
> I gave a brief look to the Xen patchqueue [6] of openxt and was not able to
> find a patch to reset PCI in Xen. However, the have a patch to fix the one
> in Linux [7].
> 
> There is some ex-XenClient working on OpenXt. I will ask them to see if they
> remember anything. Also CC Andrew, just in case he knows the story.
> 
> Cheers,
> 
> [1] drivers/pci/host/pci-xgene.c
> [2] drivers/pci/host/pci-thunder-ecam.c
> [3] drivers/pci/host/pci-thunder-pem.c
> [4] drivers/pci/host/pcie-xilinx-nwl.c
> [5] drivers/pci/host/pcie-xgene-msi.c
> [6] https://github.com/OpenXT-Extras/xen-common-pq
> [7]
> https://github.com/OpenXT-Extras/linux-3.11-pq/blob/master/master/pci-pt-flr
> 
> -- 
> Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-31 16:53         ` Edgar E. Iglesias
@ 2017-01-31 17:09           ` Julien Grall
  2017-01-31 19:06             ` Edgar E. Iglesias
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-01-31 17:09 UTC (permalink / raw)
  To: Edgar E. Iglesias
  Cc: Stefano Stabellini, Wei Chen, Campbell Sean, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, xen-devel,
	Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

Hi Edgar,

Thank you for the feedback.

On 31/01/17 16:53, Edgar E. Iglesias wrote:
> On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
>> On 24/01/17 20:07, Stefano Stabellini wrote:
>>> On Tue, 24 Jan 2017, Julien Grall wrote:
>> For generic host bridge, the initialization is inexistent. However some host
>> bridge (e.g xgene, xilinx) may require some specific setup and also
>> configuring clocks. Given that Xen only requires to access the configuration
>> space, I was thinking to let DOM0 initialization the host bridge. This would
>> avoid to import a lot of code in Xen, however this means that we need to
>> know when the host bridge has been initialized before accessing the
>> configuration space.
>
>
> Yes, that's correct.
> There's a sequence on the ZynqMP that involves assiging Gigabit Transceivers
> to PCI (GTs are shared among PCIe, USB, SATA and the Display Port),
> enabling clocks and configuring a few registers to enable ECAM and MSI.
>
> I'm not sure if this could be done prior to starting Xen. Perhaps.
> If so, bootloaders would have to know a head of time what devices
> the GTs are supposed to be configured for.

I've got further questions regarding the Gigabit Transceivers. You 
mention they are shared; do you mean that multiple devices can use a GT 
at the same time? Or does the software decide at startup which device 
will use a given GT? If so, how does the software make this decision?

>> 	- For all other host bridges => I don't know if there are host bridges
>> falling under this category. I also don't have any idea how to handle this.
>>
>>>
>>> Otherwise, if Dom0 is the only one to drive the physical host bridge,
>>> and Xen is the one to provide the emulated host bridge, how are DomU PCI
>>> config reads and writes supposed to work in details?
>>
>> I think I have answered to this question with my explanation above. Let me
>> know if it is not the case.
>>
>>>  How is MSI configuration supposed to work?
>>
>> For GICv3 ITS, the MSI will be configured with the eventID (it is uniq
>> per-device) and the address of the doorbell. The linkage between the LPI and
>> "MSI" will be done through the ITS.
>>
>> For GICv2m, the MSI will be configured with an SPIs (or offset on some
>> GICv2m) and the address of the doorbell. Note that for DOM0 SPIs are mapped
>> 1:1.
>>
>> So in both case, I don't think it is necessary to trap MSI configuration for
>> DOM0. This may not be true if we want to handle other MSI controller.
>>
>> I have in mind the xilinx MSI controller (embedded in the host bridge? [4])
>> and xgene MSI controller ([5]). But I have no idea how they work and if we
>> need to support them. Maybe Edgar could share details on the Xilinx one?
>
>
> The Xilinx controller has 2 dedicated SPIs and pages for MSIs. AFAIK, there's no
> way to protect the MSI doorbells from mal-configured end-points raising malicious EventIDs.
> So perhaps trapped config accesses from domUs can help by adding this protection
> as drivers configure the device.
>
> On Linux, Once MSI's hit, the kernel takes the SPI interrupts, reads
> out the EventID from a FIFO in the controller and injects a new IRQ into
> the kernel.

It might be early to ask, but how do you expect MSI to work with a 
DomU on your hardware? Does your MSI controller support virtualization? 
Or are you looking for a different way to inject MSIs?

>
> I hope that helps!

It helped, thank you!

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-31 17:09           ` Julien Grall
@ 2017-01-31 19:06             ` Edgar E. Iglesias
  2017-01-31 22:08               ` Stefano Stabellini
  2017-02-01 19:04               ` Julien Grall
  0 siblings, 2 replies; 82+ messages in thread
From: Edgar E. Iglesias @ 2017-01-31 19:06 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Chen, Campbell Sean, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, xen-devel,
	Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
> Hi Edgar,
> 
> Thank you for the feedbacks.

Hi Julien,

> 
> On 31/01/17 16:53, Edgar E. Iglesias wrote:
> >On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> >>On 24/01/17 20:07, Stefano Stabellini wrote:
> >>>On Tue, 24 Jan 2017, Julien Grall wrote:
> >>For generic host bridge, the initialization is inexistent. However some host
> >>bridge (e.g xgene, xilinx) may require some specific setup and also
> >>configuring clocks. Given that Xen only requires to access the configuration
> >>space, I was thinking to let DOM0 initialization the host bridge. This would
> >>avoid to import a lot of code in Xen, however this means that we need to
> >>know when the host bridge has been initialized before accessing the
> >>configuration space.
> >
> >
> >Yes, that's correct.
> >There's a sequence on the ZynqMP that involves assiging Gigabit Transceivers
> >to PCI (GTs are shared among PCIe, USB, SATA and the Display Port),
> >enabling clocks and configuring a few registers to enable ECAM and MSI.
> >
> >I'm not sure if this could be done prior to starting Xen. Perhaps.
> >If so, bootloaders would have to know a head of time what devices
> >the GTs are supposed to be configured for.
> 
> I've got further questions regarding the Gigabit Transceivers. You mention
> they are shared, do you mean that multiple devices can use a GT at the same
> time? Or the software is deciding at startup which device will use a given
> GT? If so, how does the software make this decision?

Software will decide at startup. AFAIK, the allocation is normally done
once but I guess that in theory you could design boards that could switch
at runtime. I'm not sure we need to worry about that use-case though.

The details can be found here:
https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf

I suggest looking at pages 672 and 733.



> 
> >>	- For all other host bridges => I don't know if there are host bridges
> >>falling under this category. I also don't have any idea how to handle this.
> >>
> >>>
> >>>Otherwise, if Dom0 is the only one to drive the physical host bridge,
> >>>and Xen is the one to provide the emulated host bridge, how are DomU PCI
> >>>config reads and writes supposed to work in details?
> >>
> >>I think I have answered to this question with my explanation above. Let me
> >>know if it is not the case.
> >>
> >>> How is MSI configuration supposed to work?
> >>
> >>For GICv3 ITS, the MSI will be configured with the eventID (it is uniq
> >>per-device) and the address of the doorbell. The linkage between the LPI and
> >>"MSI" will be done through the ITS.
> >>
> >>For GICv2m, the MSI will be configured with an SPIs (or offset on some
> >>GICv2m) and the address of the doorbell. Note that for DOM0 SPIs are mapped
> >>1:1.
> >>
> >>So in both case, I don't think it is necessary to trap MSI configuration for
> >>DOM0. This may not be true if we want to handle other MSI controller.
> >>
> >>I have in mind the xilinx MSI controller (embedded in the host bridge? [4])
> >>and xgene MSI controller ([5]). But I have no idea how they work and if we
> >>need to support them. Maybe Edgar could share details on the Xilinx one?
> >
> >
> >The Xilinx controller has 2 dedicated SPIs and pages for MSIs. AFAIK, there's no
> >way to protect the MSI doorbells from mal-configured end-points raising malicious EventIDs.
> >So perhaps trapped config accesses from domUs can help by adding this protection
> >as drivers configure the device.
> >
> >On Linux, Once MSI's hit, the kernel takes the SPI interrupts, reads
> >out the EventID from a FIFO in the controller and injects a new IRQ into
> >the kernel.
> 
> It might be early to ask, but how do you expect  MSI to work with DOMU on
> your hardware? Does your MSI controller supports virtualization? Or are you
> looking for a different way to inject MSI?

MSI support in HW is quite limited for supporting domUs and will require SW hacks :-(

Anyway, something along these lines might work (see the sketch after the list):

* Trap domU CPU writes to MSI descriptors in config space.
  Force the real MSI descriptors to the address of the doorbell area.
  Force the real MSI descriptors to use a device-unique EventID allocated by Xen.
  Remember what EventID the domU requested per device and descriptor.

* Xen or Dom0 takes the real SPI generated when the device writes into the doorbell area.
  At this point, we can read out the EventID from the MSI FIFO and map it to the one requested by the domU.
  Xen or Dom0 then injects the expected EventID into the domU.
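
Roughly, the doorbell SPI handler for the second bullet could look like
this (msi_fifo_*, msi_remap_lookup() and inject_msi() are made-up
helpers, not existing interfaces):

static void msi_doorbell_spi_handler(struct domain *d)
{
    while ( msi_fifo_has_data() )
    {
        uint32_t hw_event = msi_fifo_pop();  /* EventID raised by the device */
        uint32_t guest_event;

        /* Translate back to the EventID the domU programmed when we
         * trapped its config space write. */
        if ( msi_remap_lookup(d, hw_event, &guest_event) == 0 )
            inject_msi(d, guest_event);
        /* else: unknown/malicious EventID -> drop it. */
    }
}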

Do you have any good ideas? :-)

Cheers,
Edgar


> 
> >
> >I hope that helps!
> 
> It helped thank you!
> 
> Cheers,
> 
> -- 
> Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-25 18:53       ` Julien Grall
  2017-01-31 16:53         ` Edgar E. Iglesias
@ 2017-01-31 21:58         ` Stefano Stabellini
  2017-02-01 20:12           ` Julien Grall
  2017-02-01 10:55         ` Roger Pau Monné
  2 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-01-31 21:58 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, Shanker Donthineni,
	xen-devel, manish.jaggi, Campbell Sean, Roger Pau Monné

On Wed, 25 Jan 2017, Julien Grall wrote:
> > > > > whilst for Device Tree the segment number is not available.
> > > > > 
> > > > > So Xen needs to rely on DOM0 to discover the host bridges and notify
> > > > > Xen
> > > > > with all the relevant informations. This will be done via a new
> > > > > hypercall
> > > > > PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> > > > 
> > > > I understand that the main purpose of this hypercall is to get Xen and
> > > > Dom0
> > > > to
> > > > agree on the segment numbers, but why is it necessary? If Dom0 has an
> > > > emulated contoller like any other guest, do we care what segment numbers
> > > > Dom0 will use?
> > > 
> > > I was not planning to have a emulated controller for DOM0. The physical
> > > one is
> > > not necessarily ECAM compliant so we would have to either emulate the
> > > physical
> > > one (meaning multiple different emulation) or an ECAM compliant.
> > > 
> > > The latter is not possible because you don't know if there is enough free
> > > MMIO
> > > space for the emulation.
> > > 
> > > In the case on ARM, I don't see much the point to emulate the host bridge
> > > for
> > > DOM0. The only thing we need in Xen is to access the configuration space,
> > > we
> > > don't have about driving the host bridge. So I would let DOM0 dealing with
> > > that.
> > > 
> > > Also, I don't see any reason for ARM to trap DOM0 configuration space
> > > access.
> > > The MSI will be configured using the interrupt controller and it is a
> > > trusted
> > > Domain.
> > 
> > These last you sentences raise a lot of questions. Maybe I am missing
> > something. You might want to clarify the strategy for Dom0 and DomUs,
> > and how they differ, in the next version of the doc.
> > 
> > At some point you wrote "Instantiation of a specific driver for the host
> > controller can be easily done if Xen has the information to detect it.
> > However, those drivers may require resources described in ASL." Does it
> > mean you plan to drive the physical host bridge from Xen and Dom0
> > simultaneously?
> 
> I may miss some bits, so feel free to correct me if I am wrong.
> 
> My understanding is host bridge can be divided in 2 parts:
> 	- Initialization of the host bridge
> 	- Access the configuration space
> 
> For generic host bridge, the initialization is inexistent. However some host
> bridge (e.g xgene, xilinx) may require some specific setup and also
> configuring clocks. Given that Xen only requires to access the configuration
> space, I was thinking to let DOM0 initialization the host bridge. This would
> avoid to import a lot of code in Xen, however this means that we need to know
> when the host bridge has been initialized before accessing the configuration
> space.

I prefer to avoid a split-mind approach, where some PCI things are
initialized/owned by one component and some others are initialized/owned
by another component. It creates complexity. Of course, we have to face
the reality that the alternatives might be worse, but let's take a look
at the other options first.

How hard would it be to bring the PCI host bridge initialization into
Xen, for example in the case of the Xilinx ZynqMP? Traditionally, PCI
host bridges have not required any initialization on x86. PCI is still
new to the ARM ecosystem. I think it is reasonable to expect that going
forward, as the ARM ecosystem matures, PCI host bridges will require
little to no initialization on ARM too.


> Now regarding the configuration space, I think we can divide in 2 category:
> 	- indirect access, the configuration space are multiplexed. An example
> would be the legacy method on x86 (e.g 0xcf8 and 0xcfc). A similar method is
> used for x-gene PCI driver ([1]).
> 	- ECAM like access, where each PCI configuration space will have it is
> own address space. I said "ECAM like" because some host bridge will require
> some bits fiddling when accessing register (see thunder-ecam [2])
> 
> There are also host bridges that mix both indirect access and ECAM like access
> depending on the device configuration space accessed (see thunder-pem [3]).
> 
> When using ECAM like host bridge, I don't think it will be an issue to have
> both DOM0 and Xen accessing configuration space at the same time. Although, we
> need to define who is doing what. In general case, DOM0 should not touched an
> assigned PCI device. The only possible interaction would be resetting a device
> (see my answer below).

Even if the hardware allows it, I think it is a bad idea to access the
same hardware component from two different entities simultaneously.

I suggest we trap Dom0 reads/writes to ECAM and execute them in Xen,
which I think is what x86 does today.
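
To spell out what I mean (purely illustrative, no such handler exists
today): Xen would catch the fault on the ECAM region, decode the RID
from the offset and perform the access on the physical bridge on Dom0's
behalf, filtering where needed. ecam_region_for() and phys_cfg_read()
below are placeholders:

static int dom0_ecam_read(uint64_t fault_addr, unsigned int len,
                          uint64_t *val)
{
    struct ecam_region *r = ecam_region_for(fault_addr);
    uint64_t off = fault_addr - r->base;

    /* ECAM layout: bus[27:20] dev[19:15] fn[14:12] reg[11:0]. */
    uint8_t  bus = off >> 20;
    uint8_t  dev = (off >> 15) & 0x1f;
    uint8_t  fn  = (off >> 12) & 0x7;
    uint16_t reg = off & 0xfff;

    /* Filtering (e.g. hiding devices by returning ~0) would go here. */
    *val = phys_cfg_read(r->segment, bus, dev, fn, reg, len);
    return 1; /* handled */
}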


> When using indirect access, we cannot let DOM0 and Xen accessing any PCI
> configuration space at the same time. So I think we would have to emulate the
> physical host controller.
> 
> Unless we have a big requirement to trap DOM0 access to the configuration
> space, I would only keep the emulation to the strict minimum (e.g for indirect
> access) to avoid ending-up handling all the quirks for ECAM like host bridge.
> 
> If we need to trap the configuration space, I would suggest the following for
> ECAM like host bridge:
> 	- For physical host bridge that does not require initialization and is
> nearly ECAM compatible (e.g require register fiddling) => replace by a generic
> host bridge emulation for DOM0

Sounds good.


> 	- For physical host bridge that require initialization but is ECAM
> compatible (e.g AFAICT xilinx [4]) => trap the ECAM access but let DOM0
> handling the host bridge initialization

I would consider doing the initialization in Xen. It would simplify the
architecture significantly.


> 	- For all other host bridges => I don't know if there are host bridges
> falling under this category. I also don't have any idea how to handle this.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-31 15:59       ` Julien Grall
@ 2017-01-31 22:03         ` Stefano Stabellini
  2017-02-01 10:28           ` Roger Pau Monné
  0 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-01-31 22:03 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Jiandi An,
	Punit Agrawal, alistair.francis, Shanker Donthineni, xen-devel,
	manish.jaggi, Campbell Sean, Roger Pau Monné

On Tue, 31 Jan 2017, Julien Grall wrote:
> > > > > By default all the PCI devices will be assigned to DOM0. So Xen would
> > > > > have
> > > > > to configure the SMMU and Interrupt Controller to allow DOM0 to use
> > > > > the PCI
> > > > > devices. As mentioned earlier, those subsystems will require the
> > > > > StreamID
> > > > > and DeviceID. Both can be deduced from the RID.
> > > > > 
> > > > > XXX: How to hide PCI devices from DOM0?
> > > > 
> > > > By adding the ACPI namespace of the device to the STAO and blocking Dom0
> > > > access to this device in the emulated bridge that Dom0 will have access
> > > > to
> > > > (returning 0xFFFF when Dom0 tries to read the vendor ID from the PCI
> > > > header).
> > > 
> > > Sorry I was not clear here. By hiding, I meant DOM0 not instantiating a
> > > driver (similarly to xen-pciback.hide). We still want DOM0 to access the
> > > PCI
> > > config space in order to reset the device. Unless you plan to import all
> > > the
> > > reset quirks in Xen?
> > 
> > I don't have a clear opinion here, and I don't know all thew details of this
> > reset hacks.
> 
> Actually I looked at the Linux code (see __pci_dev_reset in drivers/pci/pci.c)
> and there are less quirks than I expected. The list of quirks can be found in
> pci_dev_reset_methods in drivers/pci/quirks.c.
> 
> There are few way to reset a device (see __pci_dev_reset), they look all based
> on accessing the configuration space. So I guess it should be fine to import
> that in Xen. Any opinions?

I think it is a good idea: we don't want to end up with a motley
solution with bits and pieces scattered across the system. If we give
Xen ownership of PCI, it should be Xen that does the device reset. Thus,
it would be OK to import those functions into the hypervisor.
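
To give an idea of how little code is involved, below is a minimal sketch of
the most common method, a PCIe Function Level Reset, done purely through the
configuration space. The pci_conf_read*/pci_conf_write16/mdelay accessors are
placeholders, not Xen's actual API.

#include <stdbool.h>
#include <stdint.h>

#define PCI_CAPABILITY_LIST   0x34
#define PCI_CAP_ID_EXP        0x10
#define PCI_EXP_DEVCAP        0x04              /* offset within the capability */
#define PCI_EXP_DEVCAP_FLR    (1u << 28)
#define PCI_EXP_DEVCTL        0x08
#define PCI_EXP_DEVCTL_FLR    (1u << 15)

extern uint8_t  pci_conf_read8(uint32_t sbdf, uint16_t reg);
extern uint16_t pci_conf_read16(uint32_t sbdf, uint16_t reg);
extern uint32_t pci_conf_read32(uint32_t sbdf, uint16_t reg);
extern void     pci_conf_write16(uint32_t sbdf, uint16_t reg, uint16_t val);
extern void     mdelay(unsigned int ms);

static uint8_t find_cap(uint32_t sbdf, uint8_t cap_id)
{
    uint8_t pos = pci_conf_read8(sbdf, PCI_CAPABILITY_LIST);

    while (pos) {
        if (pci_conf_read8(sbdf, pos) == cap_id)
            return pos;
        pos = pci_conf_read8(sbdf, pos + 1);    /* follow the next pointer */
    }
    return 0;
}

static bool pci_flr(uint32_t sbdf)
{
    uint8_t exp = find_cap(sbdf, PCI_CAP_ID_EXP);

    if (!exp ||
        !(pci_conf_read32(sbdf, exp + PCI_EXP_DEVCAP) & PCI_EXP_DEVCAP_FLR))
        return false;                           /* no FLR, fall back to a quirk */

    pci_conf_write16(sbdf, exp + PCI_EXP_DEVCTL,
                     pci_conf_read16(sbdf, exp + PCI_EXP_DEVCTL) |
                     PCI_EXP_DEVCTL_FLR);
    mdelay(100);                                /* spec allows up to 100ms to settle */
    return true;
}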


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-31 19:06             ` Edgar E. Iglesias
@ 2017-01-31 22:08               ` Stefano Stabellini
  2017-02-01 19:04               ` Julien Grall
  1 sibling, 0 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-01-31 22:08 UTC (permalink / raw)
  To: Edgar E. Iglesias
  Cc: Stefano Stabellini, Wei Chen, Campbell Sean, Andrew Cooper,
	Jiandi An, Julien Grall, Steve Capper, alistair.francis,
	Punit Agrawal, xen-devel, manish.jaggi, Shanker Donthineni,
	Roger Pau Monné

On Tue, 31 Jan 2017, Edgar E. Iglesias wrote:
> On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
> > Hi Edgar,
> > 
> > Thank you for the feedbacks.
> 
> Hi Julien,
> 
> > 
> > On 31/01/17 16:53, Edgar E. Iglesias wrote:
> > >On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > >>On 24/01/17 20:07, Stefano Stabellini wrote:
> > >>>On Tue, 24 Jan 2017, Julien Grall wrote:
> > >>For generic host bridge, the initialization is inexistent. However some host
> > >>bridge (e.g xgene, xilinx) may require some specific setup and also
> > >>configuring clocks. Given that Xen only requires to access the configuration
> > >>space, I was thinking to let DOM0 initialization the host bridge. This would
> > >>avoid to import a lot of code in Xen, however this means that we need to
> > >>know when the host bridge has been initialized before accessing the
> > >>configuration space.
> > >
> > >
> > >Yes, that's correct.
> > >There's a sequence on the ZynqMP that involves assiging Gigabit Transceivers
> > >to PCI (GTs are shared among PCIe, USB, SATA and the Display Port),
> > >enabling clocks and configuring a few registers to enable ECAM and MSI.
> > >
> > >I'm not sure if this could be done prior to starting Xen. Perhaps.
> > >If so, bootloaders would have to know a head of time what devices
> > >the GTs are supposed to be configured for.
> > 
> > I've got further questions regarding the Gigabit Transceivers. You mention
> > they are shared, do you mean that multiple devices can use a GT at the same
> > time? Or the software is deciding at startup which device will use a given
> > GT? If so, how does the software make this decision?
> 
> Software will decide at startup. AFAIK, the allocation is normally done
> once but I guess that in theory you could design boards that could switch
> at runtime. I'm not sure we need to worry about that use-case though.
> 
> The details can be found here:
> https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> 
> I suggest looking at pages 672 and 733.
> 
> 
> 
> > 
> > >>	- For all other host bridges => I don't know if there are host bridges
> > >>falling under this category. I also don't have any idea how to handle this.
> > >>
> > >>>
> > >>>Otherwise, if Dom0 is the only one to drive the physical host bridge,
> > >>>and Xen is the one to provide the emulated host bridge, how are DomU PCI
> > >>>config reads and writes supposed to work in details?
> > >>
> > >>I think I have answered to this question with my explanation above. Let me
> > >>know if it is not the case.
> > >>
> > >>> How is MSI configuration supposed to work?
> > >>
> > >>For GICv3 ITS, the MSI will be configured with the eventID (it is uniq
> > >>per-device) and the address of the doorbell. The linkage between the LPI and
> > >>"MSI" will be done through the ITS.
> > >>
> > >>For GICv2m, the MSI will be configured with an SPIs (or offset on some
> > >>GICv2m) and the address of the doorbell. Note that for DOM0 SPIs are mapped
> > >>1:1.
> > >>
> > >>So in both case, I don't think it is necessary to trap MSI configuration for
> > >>DOM0. This may not be true if we want to handle other MSI controller.
> > >>
> > >>I have in mind the xilinx MSI controller (embedded in the host bridge? [4])
> > >>and xgene MSI controller ([5]). But I have no idea how they work and if we
> > >>need to support them. Maybe Edgar could share details on the Xilinx one?
> > >
> > >
> > >The Xilinx controller has 2 dedicated SPIs and pages for MSIs. AFAIK, there's no
> > >way to protect the MSI doorbells from mal-configured end-points raising malicious EventIDs.
> > >So perhaps trapped config accesses from domUs can help by adding this protection
> > >as drivers configure the device.
> > >
> > >On Linux, Once MSI's hit, the kernel takes the SPI interrupts, reads
> > >out the EventID from a FIFO in the controller and injects a new IRQ into
> > >the kernel.
> > 
> > It might be early to ask, but how do you expect  MSI to work with DOMU on
> > your hardware? Does your MSI controller supports virtualization? Or are you
> > looking for a different way to inject MSI?
> 
> MSI support in HW is quite limited to support domU and will require SW hacks :-(
> 
> Anyway, something along the lines of this might work:
> 
> * Trap domU CPU writes to MSI descriptors in config space.
>   Force real MSI descriptors to the address of the door bell area.
>   Force real MSI descriptors to use a specific device unique Event ID allocated by Xen.
>   Remember what EventID domU requested per device and descriptor.
> 
> * Xen or Dom0 take the real SPI generated when device writes into the doorbell area.
>   At this point, we can read out the EventID from the MSI FIFO and map it to the one requested from domU.
>   Xen or Dom0 inject the expected EventID into domU
> 
> Do you have any good ideas? :-)

That's pretty much the same workflow as for Xen on x86. It's doable, and
we already have a lot of code to implement it, although it is scattered
across Xen, Dom0, and QEMU, which is a pain. It's one of the reasons I am
insisting on having only one component own PCI.
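
For the record, a rough sketch of the data structure such a workflow implies
(all names hypothetical): whichever component owns the doorbell keeps a
per-host-EventID table so that, when the SPI fires and the FIFO is drained,
each host EventID can be mapped back to the (domain, guest EventID) pair the
domU originally programmed.

#include <stdbool.h>
#include <stdint.h>

#define NR_HOST_EVENTIDS 1024

struct msi_remap_entry {
    uint16_t domid;          /* domain owning the device */
    uint32_t guest_eventid;  /* EventID the guest asked for */
    bool     valid;
};

static struct msi_remap_entry msi_remap[NR_HOST_EVENTIDS];

extern void inject_guest_eventid(uint16_t domid, uint32_t eventid); /* hypothetical */

/* Called when trapping the domU's MSI configuration: allocate a host
 * EventID and remember what the guest requested. */
static int msi_remap_alloc(uint16_t domid, uint32_t guest_eventid)
{
    for (unsigned int i = 0; i < NR_HOST_EVENTIDS; i++) {
        if (!msi_remap[i].valid) {
            msi_remap[i] = (struct msi_remap_entry){ domid, guest_eventid, true };
            return i;        /* program the real MSI descriptor with this ID */
        }
    }
    return -1;               /* out of host EventIDs */
}

/* Called from the SPI handler after reading a host EventID out of the FIFO. */
static void msi_deliver(uint32_t host_eventid)
{
    if (host_eventid >= NR_HOST_EVENTIDS || !msi_remap[host_eventid].valid)
        return;

    inject_guest_eventid(msi_remap[host_eventid].domid,
                         msi_remap[host_eventid].guest_eventid);
}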


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-31 22:03         ` Stefano Stabellini
@ 2017-02-01 10:28           ` Roger Pau Monné
  2017-02-01 18:45             ` Stefano Stabellini
  0 siblings, 1 reply; 82+ messages in thread
From: Roger Pau Monné @ 2017-02-01 10:28 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Punit Agrawal, Wei Chen, Steve Capper, Jiandi An, Julien Grall,
	alistair.francis, Shanker Donthineni, xen-devel, manish.jaggi,
	Campbell Sean

On Tue, Jan 31, 2017 at 02:03:16PM -0800, Stefano Stabellini wrote:
> On Tue, 31 Jan 2017, Julien Grall wrote:
> > > > > > By default all the PCI devices will be assigned to DOM0. So Xen would
> > > > > > have
> > > > > > to configure the SMMU and Interrupt Controller to allow DOM0 to use
> > > > > > the PCI
> > > > > > devices. As mentioned earlier, those subsystems will require the
> > > > > > StreamID
> > > > > > and DeviceID. Both can be deduced from the RID.
> > > > > > 
> > > > > > XXX: How to hide PCI devices from DOM0?
> > > > > 
> > > > > By adding the ACPI namespace of the device to the STAO and blocking Dom0
> > > > > access to this device in the emulated bridge that Dom0 will have access
> > > > > to
> > > > > (returning 0xFFFF when Dom0 tries to read the vendor ID from the PCI
> > > > > header).
> > > > 
> > > > Sorry I was not clear here. By hiding, I meant DOM0 not instantiating a
> > > > driver (similarly to xen-pciback.hide). We still want DOM0 to access the
> > > > PCI
> > > > config space in order to reset the device. Unless you plan to import all
> > > > the
> > > > reset quirks in Xen?
> > > 
> > > I don't have a clear opinion here, and I don't know all thew details of this
> > > reset hacks.
> > 
> > Actually I looked at the Linux code (see __pci_dev_reset in drivers/pci/pci.c)
> > and there are less quirks than I expected. The list of quirks can be found in
> > pci_dev_reset_methods in drivers/pci/quirks.c.
> > 
> > There are few way to reset a device (see __pci_dev_reset), they look all based
> > on accessing the configuration space. So I guess it should be fine to import
> > that in Xen. Any opinions?
> 
> I think it is a good idea: we don't want to end up with a motley
> solution with bits and pieces scattered across the system. If we give
> Xen ownership over PCI, it should be Xen to do device reset. Thus, it
> would be OK to import those functions into the hypervisor.

+1. Then AFAICT PCI-passthrough would be completely handled by Xen, without
needing any Dom0 kernel interaction? (apart from the toolstack, of course)

Roger.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-25 18:53       ` Julien Grall
  2017-01-31 16:53         ` Edgar E. Iglesias
  2017-01-31 21:58         ` Stefano Stabellini
@ 2017-02-01 10:55         ` Roger Pau Monné
  2017-02-01 18:50           ` Stefano Stabellini
  2017-02-02 12:38           ` Julien Grall
  2 siblings, 2 replies; 82+ messages in thread
From: Roger Pau Monné @ 2017-02-01 10:55 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, Shanker Donthineni,
	xen-devel, manish.jaggi, Campbell Sean

On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> Hi Stefano,
> 
> On 24/01/17 20:07, Stefano Stabellini wrote:
> > On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > > whilst for Device Tree the segment number is not available.
> > > > > 
> > > > > So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> > > > > with all the relevant informations. This will be done via a new hypercall
> > > > > PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> > > > 
> > > > I understand that the main purpose of this hypercall is to get Xen and Dom0
> > > > to
> > > > agree on the segment numbers, but why is it necessary? If Dom0 has an
> > > > emulated contoller like any other guest, do we care what segment numbers
> > > > Dom0 will use?
> > > 
> > > I was not planning to have a emulated controller for DOM0. The physical one is
> > > not necessarily ECAM compliant so we would have to either emulate the physical
> > > one (meaning multiple different emulation) or an ECAM compliant.
> > > 
> > > The latter is not possible because you don't know if there is enough free MMIO
> > > space for the emulation.
> > > 
> > > In the case on ARM, I don't see much the point to emulate the host bridge for
> > > DOM0. The only thing we need in Xen is to access the configuration space, we
> > > don't have about driving the host bridge. So I would let DOM0 dealing with
> > > that.
> > > 
> > > Also, I don't see any reason for ARM to trap DOM0 configuration space access.
> > > The MSI will be configured using the interrupt controller and it is a trusted
> > > Domain.
> > 
> > These last you sentences raise a lot of questions. Maybe I am missing
> > something. You might want to clarify the strategy for Dom0 and DomUs,
> > and how they differ, in the next version of the doc.
> > 
> > At some point you wrote "Instantiation of a specific driver for the host
> > controller can be easily done if Xen has the information to detect it.
> > However, those drivers may require resources described in ASL." Does it
> > mean you plan to drive the physical host bridge from Xen and Dom0
> > simultaneously?
> 
> I may miss some bits, so feel free to correct me if I am wrong.
> 
> My understanding is host bridge can be divided in 2 parts:
> 	- Initialization of the host bridge
> 	- Access the configuration space
> 
> For generic host bridge, the initialization is inexistent. However some host
> bridge (e.g xgene, xilinx) may require some specific setup and also
> configuring clocks. Given that Xen only requires to access the configuration
> space, I was thinking to let DOM0 initialization the host bridge. This would
> avoid to import a lot of code in Xen, however this means that we need to
> know when the host bridge has been initialized before accessing the
> configuration space.

Can the bridge be initialized without Dom0 having access to the ECAM area? If
that's possible I would do:

1. Dom0 initializes the bridge (whatever that involves).
2. Dom0 calls PHYSDEVOP_pci_mmcfg_reserved to register the bridge with Xen:
 2.1 Xen scans the bridge and detects the devices.
 2.2 Xen maps the ECAM area into Dom0 stage-2 p2m.
3. Dom0 scans the bridge &c (whatever is done on native).
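
For illustration, a sketch of the Dom0 side of step 2. The struct layout
follows what xen/include/public/physdev.h defines for
PHYSDEVOP_pci_mmcfg_reserved on x86, to the best of my recollection; the
hypercall wrapper below is a hypothetical stand-in for the real plumbing.

#include <stdint.h>

struct physdev_pci_mmcfg_reserved {
    uint64_t address;      /* base of the ECAM window         */
    uint16_t segment;      /* PCI segment the window decodes  */
    uint8_t  start_bus;    /* first bus covered by the window */
    uint8_t  end_bus;      /* last bus covered by the window  */
    uint32_t flags;
};

/* Hypothetical wrapper: issues PHYSDEVOP_pci_mmcfg_reserved to Xen. */
extern long physdev_pci_mmcfg_reserved(struct physdev_pci_mmcfg_reserved *arg);

/* Called by Dom0 once the bridge has been initialized (step 2 above). */
static long register_host_bridge(uint64_t ecam_base, uint16_t segment,
                                 uint8_t start_bus, uint8_t end_bus)
{
    struct physdev_pci_mmcfg_reserved r = {
        .address   = ecam_base,
        .segment   = segment,
        .start_bus = start_bus,
        .end_bus   = end_bus,
        .flags     = 0,
    };

    return physdev_pci_mmcfg_reserved(&r);
}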

> Now regarding the configuration space, I think we can divide in 2 category:
> 	- indirect access, the configuration space are multiplexed. An example
> would be the legacy method on x86 (e.g 0xcf8 and 0xcfc). A similar method is
> used for x-gene PCI driver ([1]).
> 	- ECAM like access, where each PCI configuration space will have it is own
> address space. I said "ECAM like" because some host bridge will require some
> bits fiddling when accessing register (see thunder-ecam [2])
> 
> There are also host bridges that mix both indirect access and ECAM like
> access depending on the device configuration space accessed (see thunder-pem
> [3]).

Hay! Sounds like fun...

> When using ECAM like host bridge, I don't think it will be an issue to have
> both DOM0 and Xen accessing configuration space at the same time. Although,
> we need to define who is doing what. In general case, DOM0 should not
> touched an assigned PCI device. The only possible interaction would be
> resetting a device (see my answer below).

Iff Xen is really going to perform the reset of passthrough devices, then I
don't see any reason to expose those devices to Dom0 at all. IMHO you should
hide them from ACPI and ideally prevent Dom0 from interacting with them using
the PCI configuration space (although that would require trapping accesses
to the PCI config space, which AFAIK you would like to avoid).
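
If trapping is used, the emulation needed for the "hide" part is tiny; a
minimal sketch (hypothetical names), in line with the earlier suggestion of
returning all-ones so Dom0 sees no device at that slot:

#include <stdbool.h>
#include <stdint.h>

extern bool pci_device_is_hidden(uint16_t seg, uint8_t bus, uint8_t devfn);
extern uint32_t pci_conf_read(uint16_t seg, uint8_t bus, uint8_t devfn,
                              uint16_t reg, unsigned int size);

static uint32_t dom0_conf_read(uint16_t seg, uint8_t bus, uint8_t devfn,
                               uint16_t reg, unsigned int size)
{
    if (pci_device_is_hidden(seg, bus, devfn))
        return ~0u >> (32 - 8 * size);   /* 0xffff vendor ID => no device */

    return pci_conf_read(seg, bus, devfn, reg, size);
}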

> When using indirect access, we cannot let DOM0 and Xen accessing any PCI
> configuration space at the same time. So I think we would have to emulate
> the physical host controller.
> 
> Unless we have a big requirement to trap DOM0 access to the configuration
> space, I would only keep the emulation to the strict minimum (e.g for
> indirect access) to avoid ending-up handling all the quirks for ECAM like
> host bridge.
> 
> If we need to trap the configuration space, I would suggest the following
> for ECAM like host bridge:
> 	- For physical host bridge that does not require initialization and is
> nearly ECAM compatible (e.g require register fiddling) => replace by a
> generic host bridge emulation for DOM0
> 	- For physical host bridge that require initialization but is ECAM
> compatible (e.g AFAICT xilinx [4]) => trap the ECAM access but let DOM0
> handling the host bridge initialization
> 	- For all other host bridges => I don't know if there are host bridges
> falling under this category. I also don't have any idea how to handle this.

Without knowing much about this it's hard for me to have an opinion. IMHO you
should prevent Dom0 from accessing the configuration space of devices it's not
supposed to manage (eg: passthrough), but since Dom0 is trusted I guess you can
find some other way to tell Dom0 to avoid poking at those devices, and avoid
trapping accesses to the configuration space.

Roger.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-01 10:28           ` Roger Pau Monné
@ 2017-02-01 18:45             ` Stefano Stabellini
  0 siblings, 0 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-02-01 18:45 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Jiandi An,
	Punit Agrawal, Julien Grall, alistair.francis,
	Shanker Donthineni, xen-devel, manish.jaggi, Campbell Sean

On Wed, 1 Feb 2017, Roger Pau Monné wrote:
> On Tue, Jan 31, 2017 at 02:03:16PM -0800, Stefano Stabellini wrote:
> > On Tue, 31 Jan 2017, Julien Grall wrote:
> > > > > > > By default all the PCI devices will be assigned to DOM0. So Xen would
> > > > > > > have
> > > > > > > to configure the SMMU and Interrupt Controller to allow DOM0 to use
> > > > > > > the PCI
> > > > > > > devices. As mentioned earlier, those subsystems will require the
> > > > > > > StreamID
> > > > > > > and DeviceID. Both can be deduced from the RID.
> > > > > > > 
> > > > > > > XXX: How to hide PCI devices from DOM0?
> > > > > > 
> > > > > > By adding the ACPI namespace of the device to the STAO and blocking Dom0
> > > > > > access to this device in the emulated bridge that Dom0 will have access
> > > > > > to
> > > > > > (returning 0xFFFF when Dom0 tries to read the vendor ID from the PCI
> > > > > > header).
> > > > > 
> > > > > Sorry I was not clear here. By hiding, I meant DOM0 not instantiating a
> > > > > driver (similarly to xen-pciback.hide). We still want DOM0 to access the
> > > > > PCI
> > > > > config space in order to reset the device. Unless you plan to import all
> > > > > the
> > > > > reset quirks in Xen?
> > > > 
> > > > I don't have a clear opinion here, and I don't know all thew details of this
> > > > reset hacks.
> > > 
> > > Actually I looked at the Linux code (see __pci_dev_reset in drivers/pci/pci.c)
> > > and there are less quirks than I expected. The list of quirks can be found in
> > > pci_dev_reset_methods in drivers/pci/quirks.c.
> > > 
> > > There are few way to reset a device (see __pci_dev_reset), they look all based
> > > on accessing the configuration space. So I guess it should be fine to import
> > > that in Xen. Any opinions?
> > 
> > I think it is a good idea: we don't want to end up with a motley
> > solution with bits and pieces scattered across the system. If we give
> > Xen ownership over PCI, it should be Xen to do device reset. Thus, it
> > would be OK to import those functions into the hypervisor.
> 
> +1. Then AFAICT PCI-passthrough would be completely handled by Xen, without
> needing any Dom0 kernel interaction? (apart from the toolstack, of course)

indeed


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-01 10:55         ` Roger Pau Monné
@ 2017-02-01 18:50           ` Stefano Stabellini
  2017-02-10  9:48             ` Roger Pau Monné
  2017-02-02 12:38           ` Julien Grall
  1 sibling, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-02-01 18:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Punit Agrawal, Steve Capper,
	Andrew Cooper, Jiandi An, Julien Grall, alistair.francis,
	Shanker Donthineni, xen-devel, manish.jaggi, Campbell Sean

On Wed, 1 Feb 2017, Roger Pau Monné wrote:
> On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > Hi Stefano,
> > 
> > On 24/01/17 20:07, Stefano Stabellini wrote:
> > > On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > > > whilst for Device Tree the segment number is not available.
> > > > > > 
> > > > > > So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> > > > > > with all the relevant informations. This will be done via a new hypercall
> > > > > > PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> > > > > 
> > > > > I understand that the main purpose of this hypercall is to get Xen and Dom0
> > > > > to
> > > > > agree on the segment numbers, but why is it necessary? If Dom0 has an
> > > > > emulated contoller like any other guest, do we care what segment numbers
> > > > > Dom0 will use?
> > > > 
> > > > I was not planning to have a emulated controller for DOM0. The physical one is
> > > > not necessarily ECAM compliant so we would have to either emulate the physical
> > > > one (meaning multiple different emulation) or an ECAM compliant.
> > > > 
> > > > The latter is not possible because you don't know if there is enough free MMIO
> > > > space for the emulation.
> > > > 
> > > > In the case on ARM, I don't see much the point to emulate the host bridge for
> > > > DOM0. The only thing we need in Xen is to access the configuration space, we
> > > > don't have about driving the host bridge. So I would let DOM0 dealing with
> > > > that.
> > > > 
> > > > Also, I don't see any reason for ARM to trap DOM0 configuration space access.
> > > > The MSI will be configured using the interrupt controller and it is a trusted
> > > > Domain.
> > > 
> > > These last you sentences raise a lot of questions. Maybe I am missing
> > > something. You might want to clarify the strategy for Dom0 and DomUs,
> > > and how they differ, in the next version of the doc.
> > > 
> > > At some point you wrote "Instantiation of a specific driver for the host
> > > controller can be easily done if Xen has the information to detect it.
> > > However, those drivers may require resources described in ASL." Does it
> > > mean you plan to drive the physical host bridge from Xen and Dom0
> > > simultaneously?
> > 
> > I may miss some bits, so feel free to correct me if I am wrong.
> > 
> > My understanding is host bridge can be divided in 2 parts:
> > 	- Initialization of the host bridge
> > 	- Access the configuration space
> > 
> > For generic host bridge, the initialization is inexistent. However some host
> > bridge (e.g xgene, xilinx) may require some specific setup and also
> > configuring clocks. Given that Xen only requires to access the configuration
> > space, I was thinking to let DOM0 initialization the host bridge. This would
> > avoid to import a lot of code in Xen, however this means that we need to
> > know when the host bridge has been initialized before accessing the
> > configuration space.
> 
> Can the bridge be initialized without Dom0 having access to the ECAM area? If
> that's possible I would do:
> 
> 1. Dom0 initializes the bridge (whatever that involves).
> 2. Dom0 calls PHYSDEVOP_pci_mmcfg_reserved to register the bridge with Xen:
>  2.1 Xen scans the bridge and detects the devices.
>  2.2 Xen maps the ECAM area into Dom0 stage-2 p2m.
> 3. Dom0 scans the bridge &c (whatever is done on native).

This doesn't seem too bad. We could live with it. But I would still
consider doing the bridge initialization in Xen if it is not too
inconvenient. It would be great to have Dom0 be just like any other
domain with regard to PCI.


> > Now regarding the configuration space, I think we can divide in 2 category:
> > 	- indirect access, the configuration space are multiplexed. An example
> > would be the legacy method on x86 (e.g 0xcf8 and 0xcfc). A similar method is
> > used for x-gene PCI driver ([1]).
> > 	- ECAM like access, where each PCI configuration space will have it is own
> > address space. I said "ECAM like" because some host bridge will require some
> > bits fiddling when accessing register (see thunder-ecam [2])
> > 
> > There are also host bridges that mix both indirect access and ECAM like
> > access depending on the device configuration space accessed (see thunder-pem
> > [3]).
> 
> Hay! Sounds like fun...
> 
> > When using ECAM like host bridge, I don't think it will be an issue to have
> > both DOM0 and Xen accessing configuration space at the same time. Although,
> > we need to define who is doing what. In general case, DOM0 should not
> > touched an assigned PCI device. The only possible interaction would be
> > resetting a device (see my answer below).
> 
> Iff Xen is really going to perform the reset of passthrough devices, then I
> don't see any reason to expose those devices to Dom0 at all, IMHO you should
> hide them from ACPI and ideally prevent Dom0 from interacting with them using
> the PCI configuration space (although that would require trapping on accesses
> to the PCI config space, which AFAIK you would like to avoid).

Right! A much cleaner solution! If we are going to have Xen handle ECAM
and emulate PCI host bridges, then we should go all the way and have
Xen do everything related to PCI.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-31 19:06             ` Edgar E. Iglesias
  2017-01-31 22:08               ` Stefano Stabellini
@ 2017-02-01 19:04               ` Julien Grall
  2017-02-01 19:31                 ` Stefano Stabellini
                                   ` (2 more replies)
  1 sibling, 3 replies; 82+ messages in thread
From: Julien Grall @ 2017-02-01 19:04 UTC (permalink / raw)
  To: Edgar E. Iglesias
  Cc: Stefano Stabellini, Wei Chen, Campbell Sean, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, xen-devel,
	Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

Hi Edgar,

On 31/01/2017 19:06, Edgar E. Iglesias wrote:
> On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
>> On 31/01/17 16:53, Edgar E. Iglesias wrote:
>>> On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
>>>> On 24/01/17 20:07, Stefano Stabellini wrote:
>>>>> On Tue, 24 Jan 2017, Julien Grall wrote:
>>>> For generic host bridge, the initialization is inexistent. However some host
>>>> bridge (e.g xgene, xilinx) may require some specific setup and also
>>>> configuring clocks. Given that Xen only requires to access the configuration
>>>> space, I was thinking to let DOM0 initialization the host bridge. This would
>>>> avoid to import a lot of code in Xen, however this means that we need to
>>>> know when the host bridge has been initialized before accessing the
>>>> configuration space.
>>>
>>>
>>> Yes, that's correct.
>>> There's a sequence on the ZynqMP that involves assiging Gigabit Transceivers
>>> to PCI (GTs are shared among PCIe, USB, SATA and the Display Port),
>>> enabling clocks and configuring a few registers to enable ECAM and MSI.
>>>
>>> I'm not sure if this could be done prior to starting Xen. Perhaps.
>>> If so, bootloaders would have to know a head of time what devices
>>> the GTs are supposed to be configured for.
>>
>> I've got further questions regarding the Gigabit Transceivers. You mention
>> they are shared, do you mean that multiple devices can use a GT at the same
>> time? Or the software is deciding at startup which device will use a given
>> GT? If so, how does the software make this decision?
>
> Software will decide at startup. AFAIK, the allocation is normally done
> once but I guess that in theory you could design boards that could switch
> at runtime. I'm not sure we need to worry about that use-case though.
>
> The details can be found here:
> https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
>
> I suggest looking at pages 672 and 733.

Thank you for the documentation. I am trying to understand whether we could
move the initialization into Xen as suggested by Stefano. I looked at the
driver in Linux and the code looks simple, with few dependencies.
However, I was not able to find where the Gigabit Transceivers are
configured. Do you have any link to the code for that?

This would also mean that the MSI interrupt controller would be moved into
Xen, which I think is a more sensible design (see more below).

>>
>>>> 	- For all other host bridges => I don't know if there are host bridges
>>>> falling under this category. I also don't have any idea how to handle this.
>>>>
>>>>>
>>>>> Otherwise, if Dom0 is the only one to drive the physical host bridge,
>>>>> and Xen is the one to provide the emulated host bridge, how are DomU PCI
>>>>> config reads and writes supposed to work in details?
>>>>
>>>> I think I have answered to this question with my explanation above. Let me
>>>> know if it is not the case.
>>>>
>>>>> How is MSI configuration supposed to work?
>>>>
>>>> For GICv3 ITS, the MSI will be configured with the eventID (it is uniq
>>>> per-device) and the address of the doorbell. The linkage between the LPI and
>>>> "MSI" will be done through the ITS.
>>>>
>>>> For GICv2m, the MSI will be configured with an SPIs (or offset on some
>>>> GICv2m) and the address of the doorbell. Note that for DOM0 SPIs are mapped
>>>> 1:1.
>>>>
>>>> So in both case, I don't think it is necessary to trap MSI configuration for
>>>> DOM0. This may not be true if we want to handle other MSI controller.
>>>>
>>>> I have in mind the xilinx MSI controller (embedded in the host bridge? [4])
>>>> and xgene MSI controller ([5]). But I have no idea how they work and if we
>>>> need to support them. Maybe Edgar could share details on the Xilinx one?
>>>
>>>
>>> The Xilinx controller has 2 dedicated SPIs and pages for MSIs. AFAIK, there's no
>>> way to protect the MSI doorbells from mal-configured end-points raising malicious EventIDs.
>>> So perhaps trapped config accesses from domUs can help by adding this protection
>>> as drivers configure the device.
>>>
>>> On Linux, Once MSI's hit, the kernel takes the SPI interrupts, reads
>>> out the EventID from a FIFO in the controller and injects a new IRQ into
>>> the kernel.
>>
>> It might be early to ask, but how do you expect  MSI to work with DOMU on
>> your hardware? Does your MSI controller supports virtualization? Or are you
>> looking for a different way to inject MSI?
>
> MSI support in HW is quite limited to support domU and will require SW hacks :-(
>
> Anyway, something along the lines of this might work:
>
> * Trap domU CPU writes to MSI descriptors in config space.
>   Force real MSI descriptors to the address of the door bell area.
>   Force real MSI descriptors to use a specific device unique Event ID allocated by Xen.
>   Remember what EventID domU requested per device and descriptor.
>
> * Xen or Dom0 take the real SPI generated when device writes into the doorbell area.
>   At this point, we can read out the EventID from the MSI FIFO and map it to the one requested from domU.
>   Xen or Dom0 inject the expected EventID into domU
>
> Do you have any good ideas? :-)

From my understanding your MSI controller is embedded in the hostbridge,
right? If so, the MSIs would need to be handled where the host bridge is
initialized (i.e. either Xen or DOM0).

From a design point of view, it would make more sense to have the MSI
controller driver in Xen, as the hostbridge emulation for guests will also
live there.

So if we receive MSIs in Xen, we need to figure out a way for DOM0 and
guests to receive MSIs. Using the same mechanism for both would be best,
and I guess non-PV if possible. I know you are looking to boot an
unmodified OS in a VM. This would mean we need to emulate the MSI
controller and potentially the Xilinx PCI controller. How much are you
willing to modify the OS?

Regarding the MSI doorbell, I have seen it is configured by the software
using the physical address of a page allocated in RAM. When a PCI device
writes into the doorbell, does the access go through the SMMU?

Regardless of the answer, I think we would need to map the MSI doorbell
page in the guest. Meaning that even if we trap MSI configuration
accesses, a guest could DMA into the page. So if I am not mistaken, MSI
would be insecure in this case :/.

Or maybe we could avoid mapping the doorbell in the guest and let Xen
receive an SMMU abort. When receiving the SMMU abort, Xen could sanitize
the value and write into the real MSI doorbell. Not sure if it would
work though.
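
A very rough sketch of what that abort path could look like (all names and
the doorbell address are hypothetical, and with the caveat above that the
fault path may simply be too slow):

#include <stdbool.h>
#include <stdint.h>

extern uint16_t domain_owning_device(uint32_t streamid);
extern bool     eventid_belongs_to_domain(uint16_t domid, uint32_t eventid);
extern void     write_physical_doorbell(uint32_t eventid);

/* Called from the SMMU fault handler with the faulting StreamID, the IOVA
 * targeted by the device and the data it tried to write. */
static bool handle_doorbell_fault(uint32_t streamid, uint64_t iova,
                                  uint32_t data)
{
    const uint64_t DOORBELL_IOVA = 0xfe440040ULL;  /* hypothetical address */
    uint16_t domid = domain_owning_device(streamid);

    if (iova != DOORBELL_IOVA)
        return false;                  /* not a doorbell write: a real fault */

    if (!eventid_belongs_to_domain(domid, data))
        return true;                   /* drop spoofed EventIDs silently */

    write_physical_doorbell(data);     /* replay the sanitized write */
    return true;
}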

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-01 19:04               ` Julien Grall
@ 2017-02-01 19:31                 ` Stefano Stabellini
  2017-02-01 20:24                   ` Julien Grall
  2017-02-02 15:33                 ` Edgar E. Iglesias
  2017-02-02 15:40                 ` Roger Pau Monné
  2 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-02-01 19:31 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Andrew Cooper, Jiandi An, Punit Agrawal, alistair.francis,
	Shanker Donthineni, xen-devel, manish.jaggi, Campbell Sean,
	Roger Pau Monné

On Wed, 1 Feb 2017, Julien Grall wrote:
> Hi Edgar,
> 
> On 31/01/2017 19:06, Edgar E. Iglesias wrote:
> > On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
> > > On 31/01/17 16:53, Edgar E. Iglesias wrote:
> > > > On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > > > On 24/01/17 20:07, Stefano Stabellini wrote:
> > > > > > On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > > For generic host bridge, the initialization is inexistent. However
> > > > > some host
> > > > > bridge (e.g xgene, xilinx) may require some specific setup and also
> > > > > configuring clocks. Given that Xen only requires to access the
> > > > > configuration
> > > > > space, I was thinking to let DOM0 initialization the host bridge. This
> > > > > would
> > > > > avoid to import a lot of code in Xen, however this means that we need
> > > > > to
> > > > > know when the host bridge has been initialized before accessing the
> > > > > configuration space.
> > > > 
> > > > 
> > > > Yes, that's correct.
> > > > There's a sequence on the ZynqMP that involves assiging Gigabit
> > > > Transceivers
> > > > to PCI (GTs are shared among PCIe, USB, SATA and the Display Port),
> > > > enabling clocks and configuring a few registers to enable ECAM and MSI.
> > > > 
> > > > I'm not sure if this could be done prior to starting Xen. Perhaps.
> > > > If so, bootloaders would have to know a head of time what devices
> > > > the GTs are supposed to be configured for.
> > > 
> > > I've got further questions regarding the Gigabit Transceivers. You mention
> > > they are shared, do you mean that multiple devices can use a GT at the
> > > same
> > > time? Or the software is deciding at startup which device will use a given
> > > GT? If so, how does the software make this decision?
> > 
> > Software will decide at startup. AFAIK, the allocation is normally done
> > once but I guess that in theory you could design boards that could switch
> > at runtime. I'm not sure we need to worry about that use-case though.
> > 
> > The details can be found here:
> > https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > 
> > I suggest looking at pages 672 and 733.
> 
> Thank you for the documentation. I am trying to understand if we could move
> initialization in Xen as suggested by Stefano. I looked at the driver in Linux
> and the code looks simple not many dependencies. However, I was not able to
> find where the Gigabit Transceivers are configured. Do you have any link to
> the code for that?
> 
> This would also mean that the MSI interrupt controller will be moved in Xen.
> Which I think is a more sensible design (see more below).
> 
> > > 
> > > > > 	- For all other host bridges => I don't know if there are host
> > > > > bridges
> > > > > falling under this category. I also don't have any idea how to handle
> > > > > this.
> > > > > 
> > > > > > 
> > > > > > Otherwise, if Dom0 is the only one to drive the physical host
> > > > > > bridge,
> > > > > > and Xen is the one to provide the emulated host bridge, how are DomU
> > > > > > PCI
> > > > > > config reads and writes supposed to work in details?
> > > > > 
> > > > > I think I have answered to this question with my explanation above.
> > > > > Let me
> > > > > know if it is not the case.
> > > > > 
> > > > > > How is MSI configuration supposed to work?
> > > > > 
> > > > > For GICv3 ITS, the MSI will be configured with the eventID (it is uniq
> > > > > per-device) and the address of the doorbell. The linkage between the
> > > > > LPI and
> > > > > "MSI" will be done through the ITS.
> > > > > 
> > > > > For GICv2m, the MSI will be configured with an SPIs (or offset on some
> > > > > GICv2m) and the address of the doorbell. Note that for DOM0 SPIs are
> > > > > mapped
> > > > > 1:1.
> > > > > 
> > > > > So in both case, I don't think it is necessary to trap MSI
> > > > > configuration for
> > > > > DOM0. This may not be true if we want to handle other MSI controller.
> > > > > 
> > > > > I have in mind the xilinx MSI controller (embedded in the host bridge?
> > > > > [4])
> > > > > and xgene MSI controller ([5]). But I have no idea how they work and
> > > > > if we
> > > > > need to support them. Maybe Edgar could share details on the Xilinx
> > > > > one?
> > > > 
> > > > 
> > > > The Xilinx controller has 2 dedicated SPIs and pages for MSIs. AFAIK,
> > > > there's no
> > > > way to protect the MSI doorbells from mal-configured end-points raising
> > > > malicious EventIDs.
> > > > So perhaps trapped config accesses from domUs can help by adding this
> > > > protection
> > > > as drivers configure the device.
> > > > 
> > > > On Linux, Once MSI's hit, the kernel takes the SPI interrupts, reads
> > > > out the EventID from a FIFO in the controller and injects a new IRQ into
> > > > the kernel.
> > > 
> > > It might be early to ask, but how do you expect  MSI to work with DOMU on
> > > your hardware? Does your MSI controller supports virtualization? Or are
> > > you
> > > looking for a different way to inject MSI?
> > 
> > MSI support in HW is quite limited to support domU and will require SW hacks
> > :-(
> > 
> > Anyway, something along the lines of this might work:
> > 
> > * Trap domU CPU writes to MSI descriptors in config space.
> >   Force real MSI descriptors to the address of the door bell area.
> >   Force real MSI descriptors to use a specific device unique Event ID
> > allocated by Xen.
> >   Remember what EventID domU requested per device and descriptor.
> > 
> > * Xen or Dom0 take the real SPI generated when device writes into the
> > doorbell area.
> >   At this point, we can read out the EventID from the MSI FIFO and map it to
> > the one requested from domU.
> >   Xen or Dom0 inject the expected EventID into domU
> > 
> > Do you have any good ideas? :-)
> 
> From my understanding your MSI controller is embedded in the hostbridge,
> right? If so, the MSIs would need to be handled where the host bridge will be
> initialized (e.g either Xen or DOM0).
> 
> From a design point of view, it would make more sense to have the MSI
> controller driver in Xen as the hostbridge emulation for guest will also live
> there.
> 
> So if we receive MSI in Xen, we need to figure out a way for DOM0 and guest to
> receive MSI. The same way would be the best, and I guess non-PV if possible. I
> know you are looking to boot unmodified OS in a VM. This would mean we need to
> emulate the MSI controller and potentially xilinx PCI controller. How much are
> you willing to modify the OS?
> 
> Regarding the MSI doorbell, I have seen it is configured by the software using
> a physical address of a page allocated in the RAM. When the PCI devices is
> writing into the doorbell does the access go through the SMMU?
> 
> Regardless the answer, I think we would need to map the MSI doorbell page in
> the guest.

Why? We should be able to handle the case by trapping and emulating PCI
config accesses. Xen can force the real MSI descriptors to use whatever
Xen wants them to use. With an SMMU, we need to find a way to map the
MSI doorbell in the SMMU pagetable to allow the device to write to it.
Without SMMU, it's unneeded.
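
A small sketch of that interception (hypothetical helpers; offsets follow the
standard MSI capability, 32-bit address variant; the doorbell address is made
up): on a guest write to the MSI capability, Xen substitutes the physical
doorbell address and a Xen-allocated EventID while remembering what the guest
asked for.

#include <stdint.h>

#define MSI_MSG_ADDR   0x4   /* offset within the MSI capability */
#define MSI_MSG_DATA   0x8   /* 32-bit address variant */

extern void     pci_conf_write32(uint32_t sbdf, uint16_t reg, uint32_t val);
extern uint32_t xen_alloc_host_eventid(uint16_t domid, uint32_t sbdf);
extern void     vpci_record_guest_msi(uint16_t domid, uint32_t sbdf,
                                      uint64_t gaddr, uint32_t gdata);

static void msi_cap_write(uint16_t domid, uint32_t sbdf, uint8_t cap,
                          uint16_t reg, uint32_t val)
{
    /* Physical doorbell of the host MSI controller (hypothetical). */
    const uint32_t DOORBELL_PA = 0xfe440040;

    if (reg == cap + MSI_MSG_ADDR) {
        vpci_record_guest_msi(domid, sbdf, val, 0);   /* remember guest addr */
        pci_conf_write32(sbdf, reg, DOORBELL_PA);     /* force real doorbell */
    } else if (reg == cap + MSI_MSG_DATA) {
        vpci_record_guest_msi(domid, sbdf, 0, val);   /* remember guest data */
        pci_conf_write32(sbdf, reg, xen_alloc_host_eventid(domid, sbdf));
    } else {
        pci_conf_write32(sbdf, reg, val);             /* pass through */
    }
}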


> Meaning that even if we trap MSI configuration access, a guess
> could DMA in the page. So if I am not mistaken, MSI would be insecure in this
> case :/.

That's right: if a device capable of DMA to an arbitrary address in
memory is assigned to the guest, the guest can write to the MSI doorbell
when an SMMU is present; without an SMMU, the guest can write to any
address in memory. Completely insecure.

It is the same security compromise offered by PV PCI passthrough today
with no VT-d on the platform. I think it's still usable in some cases,
but we need to be very clear about its security properties.


> Or maybe we could avoid mapping the doorbell in the guest and let Xen receive
> an SMMU abort. When receiving the SMMU abort, Xen could sanitize the value and
> write into the real MSI doorbell. Not sure if it would works thought.

I thought that SMMU aborts are too slow for this?


* Re: [early RFC] ARM PCI Passthrough design document
  2017-01-31 21:58         ` Stefano Stabellini
@ 2017-02-01 20:12           ` Julien Grall
  0 siblings, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-02-01 20:12 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Pooya.Keshavarzi, Dirk Behme, Wei Chen, Campbell Sean,
	Andrew Cooper, Jiandi An, Punit Agrawal, Iurii Mykhalskyi,
	alistair.francis, xen-devel, Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

Hi Stefano,

On 31/01/2017 21:58, Stefano Stabellini wrote:
> On Wed, 25 Jan 2017, Julien Grall wrote:
>>>>>> whilst for Device Tree the segment number is not available.
>>>>>>
>>>>>> So Xen needs to rely on DOM0 to discover the host bridges and notify
>>>>>> Xen
>>>>>> with all the relevant informations. This will be done via a new
>>>>>> hypercall
>>>>>> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
>>>>>
>>>>> I understand that the main purpose of this hypercall is to get Xen and
>>>>> Dom0
>>>>> to
>>>>> agree on the segment numbers, but why is it necessary? If Dom0 has an
>>>>> emulated contoller like any other guest, do we care what segment numbers
>>>>> Dom0 will use?
>>>>
>>>> I was not planning to have a emulated controller for DOM0. The physical
>>>> one is
>>>> not necessarily ECAM compliant so we would have to either emulate the
>>>> physical
>>>> one (meaning multiple different emulation) or an ECAM compliant.
>>>>
>>>> The latter is not possible because you don't know if there is enough free
>>>> MMIO
>>>> space for the emulation.
>>>>
>>>> In the case on ARM, I don't see much the point to emulate the host bridge
>>>> for
>>>> DOM0. The only thing we need in Xen is to access the configuration space,
>>>> we
>>>> don't have about driving the host bridge. So I would let DOM0 dealing with
>>>> that.
>>>>
>>>> Also, I don't see any reason for ARM to trap DOM0 configuration space
>>>> access.
>>>> The MSI will be configured using the interrupt controller and it is a
>>>> trusted
>>>> Domain.
>>>
>>> These last you sentences raise a lot of questions. Maybe I am missing
>>> something. You might want to clarify the strategy for Dom0 and DomUs,
>>> and how they differ, in the next version of the doc.
>>>
>>> At some point you wrote "Instantiation of a specific driver for the host
>>> controller can be easily done if Xen has the information to detect it.
>>> However, those drivers may require resources described in ASL." Does it
>>> mean you plan to drive the physical host bridge from Xen and Dom0
>>> simultaneously?
>>
>> I may miss some bits, so feel free to correct me if I am wrong.
>>
>> My understanding is host bridge can be divided in 2 parts:
>> 	- Initialization of the host bridge
>> 	- Access the configuration space
>>
>> For generic host bridge, the initialization is inexistent. However some host
>> bridge (e.g xgene, xilinx) may require some specific setup and also
>> configuring clocks. Given that Xen only requires to access the configuration
>> space, I was thinking to let DOM0 initialization the host bridge. This would
>> avoid to import a lot of code in Xen, however this means that we need to know
>> when the host bridge has been initialized before accessing the configuration
>> space.
>
> I prefer to avoid a split-mind approach, where some PCI things are
> initialized/owned by one component and some others are initialized/owned
> by another component. It creates complexity. Of course, we have to face
> the reality that the alternatives might be worse, but let's take a look
> at the other options first.
>
> How hard would it be to bring the PCI host bridge initialization in Xen,
> for example in the case of the Xilinx ZynqMP? Traditionally, PCI host
> bridges have not required any initialization on x86. PCI is still new to
> the ARM ecosystems. I think it is reasonable to expect that going
> forward, as the ARM ecosystem matures, PCI host bridges will require
> little to no initialization on ARM too.

I would agree for servers, but I am less sure for embedded systems. You
may want to save address space or even power, potentially requiring a
custom hostbridge. I hope I am wrong here.

I think the Xilinx host bridge is the simplest case. I am trying to
understand it better in a separate e-mail (see
<a1120a60-b859-c7ff-9d4a-553c330669f1@linaro.org>).

There are more complex hostbridges such as X-Gene [1] and R-Car [2].
If we take the example of the Renesas Salvator board used in automotive
(GlobalLogic and Bosch are working on Xen support [3]), it contains an
R-Car PCI root complex; below is a part of the DTS:

/* External PCIe clock - can be overridden by the board */
pcie_bus_clk: pcie_bus {
	compatible = "fixed-clock";
	#clock-cells = <0>;
	clock-frequency = <0>;
};

pciec0: pcie@fe000000 {
	compatible = "renesas,pcie-r8a7795";
	reg = <0 0xfe000000 0 0x80000>;
	#address-cells = <3>;
	#size-cells = <2>;
	bus-range = <0x00 0xff>;
	device_type = "pci";
	ranges = <0x01000000 0 0x00000000 0 0xfe100000 0 0x00100000
		  0x02000000 0 0xfe200000 0 0xfe200000 0 0x00200000
		  0x02000000 0 0x30000000 0 0x30000000 0 0x08000000
		  0x42000000 0 0x38000000 0 0x38000000 0 0x08000000>;
	/* Map all possible DDR as inbound ranges */
	dma-ranges = <0x42000000 0 0x40000000 0 0x40000000 0 0x40000000>;
	interrupts = <GIC_SPI 116 IRQ_TYPE_LEVEL_HIGH>,
		     <GIC_SPI 117 IRQ_TYPE_LEVEL_HIGH>,
		     <GIC_SPI 118 IRQ_TYPE_LEVEL_HIGH>;
	#interrupt-cells = <1>;
	interrupt-map-mask = <0 0 0 0>;
	interrupt-map = <0 0 0 0 &gic GIC_SPI 116 IRQ_TYPE_LEVEL_HIGH>;
	clocks = <&cpg CPG_MOD 319>, <&pcie_bus_clk>;
	clock-names = "pcie", "pcie_bus";
	power-domains = <&sysc R8A7795_PD_ALWAYS_ON>;
	status = "disabled";
};

The PCI controller depends on 2 clocks, one of which requires a specific 
driver. It also contains a power domain, which I guess will require some 
configuration and would need to be shared with Linux.

Furthermore, the R-Car driver has a specific way to access the
configuration space (see rcar_pcie_config_access). It is actually the
first root complex I have found that falls under the category "For all
other host bridges" in my previous mail.

Lastly, the MSI controller is integrated in the root complex here too.

So I think the R-Car root complex is the kind of hardware that would
require merging half of Linux into Xen and potentially emulating some
parts of the hardware (such as the clocks) for DOM0.

I don't have any good idea here that does not involve DOM0. I would be
happy to know what other people think.

Note that I don't think we can possibly say we don't support PCI 
passthrough.

>> Now regarding the configuration space, I think we can divide in 2 category:
>> 	- indirect access, the configuration space are multiplexed. An example
>> would be the legacy method on x86 (e.g 0xcf8 and 0xcfc). A similar method is
>> used for x-gene PCI driver ([1]).
>> 	- ECAM like access, where each PCI configuration space will have it is
>> own address space. I said "ECAM like" because some host bridge will require
>> some bits fiddling when accessing register (see thunder-ecam [2])
>>
>> There are also host bridges that mix both indirect access and ECAM like access
>> depending on the device configuration space accessed (see thunder-pem [3]).
>>
>> When using ECAM like host bridge, I don't think it will be an issue to have
>> both DOM0 and Xen accessing configuration space at the same time. Although, we
>> need to define who is doing what. In general case, DOM0 should not touched an
>> assigned PCI device. The only possible interaction would be resetting a device
>> (see my answer below).
>
> Even if the hardware allows it, I think it is a bad idea to access the
> same hardware component from two different entities simultaneously.
>
> I suggest we trap Dom0 reads/writes to ECAM, and execute them in Xen,
> which I think it's what x86 does today.

FWIW, Roger confirmed this to me IRL. So I will update the design document
to specify that DOM0 accesses are trapped, even if we may not take
advantage of it today.

>
>
>> When using indirect access, we cannot let DOM0 and Xen accessing any PCI
>> configuration space at the same time. So I think we would have to emulate the
>> physical host controller.
>>
>> Unless we have a big requirement to trap DOM0 access to the configuration
>> space, I would only keep the emulation to the strict minimum (e.g for indirect
>> access) to avoid ending-up handling all the quirks for ECAM like host bridge.
>>
>> If we need to trap the configuration space, I would suggest the following for
>> ECAM like host bridge:
>> 	- For physical host bridge that does not require initialization and is
>> nearly ECAM compatible (e.g require register fiddling) => replace by a generic
>> host bridge emulation for DOM0
>
> Sounds good.
>
>
>> 	- For physical host bridge that require initialization but is ECAM
>> compatible (e.g AFAICT xilinx [4]) => trap the ECAM access but let DOM0
>> handling the host bridge initialization
>
> I would consider doing the initialization in Xen. It would simplify the
> architecture significantly.

See above for an example where it does not fit.

>> 	- For all other host bridges => I don't know if there are host bridges
>> falling under this category. I also don't have any idea how to handle this.

Cheers,

[1] linux/drivers/pci/host/pci-xgene.c
[2] linux/drivers/pci/host/pcie-rcar.c
[3] https://lists.xen.org/archives/html/xen-devel/2016-11/msg00594.html

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-01 19:31                 ` Stefano Stabellini
@ 2017-02-01 20:24                   ` Julien Grall
  0 siblings, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-02-01 20:24 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Edgar E. Iglesias, Wei Chen, Campbell Sean, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, xen-devel,
	Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

Hi Stefano,

On 01/02/2017 19:31, Stefano Stabellini wrote:
> On Wed, 1 Feb 2017, Julien Grall wrote:
>> On 31/01/2017 19:06, Edgar E. Iglesias wrote:
>>> On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
>>>> On 31/01/17 16:53, Edgar E. Iglesias wrote:
>>>>> On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
>>>>>> On 24/01/17 20:07, Stefano Stabellini wrote:
>>>>>>> On Tue, 24 Jan 2017, Julien Grall wrote:
>> From my understanding your MSI controller is embedded in the hostbridge,
>> right? If so, the MSIs would need to be handled where the host bridge will be
>> initialized (e.g either Xen or DOM0).
>>
>> From a design point of view, it would make more sense to have the MSI
>> controller driver in Xen as the hostbridge emulation for guest will also live
>> there.
>>
>> So if we receive MSI in Xen, we need to figure out a way for DOM0 and guest to
>> receive MSI. The same way would be the best, and I guess non-PV if possible. I
>> know you are looking to boot unmodified OS in a VM. This would mean we need to
>> emulate the MSI controller and potentially xilinx PCI controller. How much are
>> you willing to modify the OS?
>>
>> Regarding the MSI doorbell, I have seen it is configured by the software using
>> a physical address of a page allocated in the RAM. When the PCI devices is
>> writing into the doorbell does the access go through the SMMU?
>>
>> Regardless the answer, I think we would need to map the MSI doorbell page in
>> the guest.
>
> Why? We should be able to handle the case by trapping and emulating PCI
> config accesses. Xen can force the real MSI descriptors to use whatever
> Xen wants them to use. With an SMMU, we need to find a way to map the
> MSI doorbell in the SMMU pagetable to allow the device to write to it.
> Without SMMU, it's unneeded.

My point was about guests using an SMMU; if you want to support PCI
passthrough without an SMMU, then that is another subject and I would
rather postpone this discussion.

>
>
>> Meaning that even if we trap MSI configuration access, a guess
>> could DMA in the page. So if I am not mistaken, MSI would be insecure in this
>> case :/.
>
> That's right: if a device capable of DMA to an arbitrary address in
> memory is assigned to the guest, the guest can write to the MSI doorbell
> if an SMMU is present, otherwise, the guest can write to any address in
> memory without SMMU. Completely insecure.
>
> It is the same security compromised offered by PV PCI passthrough today
> with no VT-D on the platform. I think it's still usable in some cases,
> but we need to be very clear about its security properties.

The guest would have to be mapped 1:1 in order to do DMA. And this is 
not supported today.

>
>
>> Or maybe we could avoid mapping the doorbell in the guest and let Xen receive
>> an SMMU abort. When receiving the SMMU abort, Xen could sanitize the value and
>> write into the real MSI doorbell. Not sure if it would works thought.
>
> I thought that SMMU aborts are too slow for this?

I have no idea here. However, I think it would be better than a security
hole, so this could be an option for the user.

Cheers,

--
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-01 10:55         ` Roger Pau Monné
  2017-02-01 18:50           ` Stefano Stabellini
@ 2017-02-02 12:38           ` Julien Grall
  2017-02-02 23:06             ` Stefano Stabellini
  1 sibling, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-02-02 12:38 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, Shanker Donthineni,
	xen-devel, manish.jaggi, Campbell Sean

Hi Roger,

On 01/02/17 10:55, Roger Pau Monné wrote:
> On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 24/01/17 20:07, Stefano Stabellini wrote:
>>> On Tue, 24 Jan 2017, Julien Grall wrote:
>>>>>> whilst for Device Tree the segment number is not available.
>>>>>>
>>>>>> So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
>>>>>> with all the relevant informations. This will be done via a new hypercall
>>>>>> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
>>>>>
>>>>> I understand that the main purpose of this hypercall is to get Xen and Dom0
>>>>> to
>>>>> agree on the segment numbers, but why is it necessary? If Dom0 has an
>>>>> emulated contoller like any other guest, do we care what segment numbers
>>>>> Dom0 will use?
>>>>
>>>> I was not planning to have a emulated controller for DOM0. The physical one is
>>>> not necessarily ECAM compliant so we would have to either emulate the physical
>>>> one (meaning multiple different emulation) or an ECAM compliant.
>>>>
>>>> The latter is not possible because you don't know if there is enough free MMIO
>>>> space for the emulation.
>>>>
>>>> In the case on ARM, I don't see much the point to emulate the host bridge for
>>>> DOM0. The only thing we need in Xen is to access the configuration space, we
>>>> don't have about driving the host bridge. So I would let DOM0 dealing with
>>>> that.
>>>>
>>>> Also, I don't see any reason for ARM to trap DOM0 configuration space access.
>>>> The MSI will be configured using the interrupt controller and it is a trusted
>>>> Domain.
>>>
>>> These last you sentences raise a lot of questions. Maybe I am missing
>>> something. You might want to clarify the strategy for Dom0 and DomUs,
>>> and how they differ, in the next version of the doc.
>>>
>>> At some point you wrote "Instantiation of a specific driver for the host
>>> controller can be easily done if Xen has the information to detect it.
>>> However, those drivers may require resources described in ASL." Does it
>>> mean you plan to drive the physical host bridge from Xen and Dom0
>>> simultaneously?
>>
>> I may miss some bits, so feel free to correct me if I am wrong.
>>
>> My understanding is host bridge can be divided in 2 parts:
>> 	- Initialization of the host bridge
>> 	- Access the configuration space
>>
>> For generic host bridge, the initialization is inexistent. However some host
>> bridge (e.g xgene, xilinx) may require some specific setup and also
>> configuring clocks. Given that Xen only requires to access the configuration
>> space, I was thinking to let DOM0 initialization the host bridge. This would
>> avoid to import a lot of code in Xen, however this means that we need to
>> know when the host bridge has been initialized before accessing the
>> configuration space.
>
> Can the bridge be initialized without Dom0 having access to the ECAM area? If
> that's possible I would do:
>
> 1. Dom0 initializes the bridge (whatever that involves).
> 2. Dom0 calls PHYSDEVOP_pci_mmcfg_reserved to register the bridge with Xen:
>  2.1 Xen scans the bridge and detects the devices.
>  2.2 Xen maps the ECAM area into Dom0 stage-2 p2m.
> 3. Dom0 scans the bridge &c (whatever is done on native).

As Stefano suggested, we should try to initialize the host bridge in Xen 
when possible. This will avoid a split interaction, and save our hair too :).

I am looking at different host bridges to see how much code would be 
required in Xen to handle them. I think the Xilinx root complex is an 
easy one (see the discussion in [1]) and it is manageable to get the 
code into Xen.

But some are much more complex: for instance the R-Car (see discussion 
in [2]) requires clocks, uses a specific way to access the configuration 
space, and has the MSI controller integrated in the root complex. This 
would require some work with DOM0. I will mention the problem in the 
design document but am not going to address it at the moment (too complex), 
although we will have to support it at some point as the root complex 
is used in automotive boards (see [3]).

For now I will address:
	- ECAM compliant/ECAM-like root complexes
	- Root complexes with simple initialization

For DT, I would fall back to mapping the root complex to DOM0 if we 
don't support it, so DOM0 could still use PCI.

For ACPI, I am expecting all platforms to be ECAM compliant or to require 
only a few quirks. So I would mandate support for the root complex in Xen 
in order to get PCI supported.
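
To illustrate, here is a rough sketch of the kind of per-bridge abstraction
I have in mind in Xen. None of these names exist in Xen today, it is purely
illustrative; the point is only that ECAM-compliant bridges, "ECAM-like"
bridges with register quirks, and bridges needing a simple initialization
would differ only in which hooks they fill in:

#include <stdint.h>

struct pci_host_bridge;

struct pci_host_bridge_ops {
    /* Optional simple setup (clocks, enabling ECAM, ...); NULL if none. */
    int (*init)(struct pci_host_bridge *bridge);
    /* Config space accessors: plain ECAM and quirky "ECAM-like"
     * bridges would differ only here. */
    int (*config_read)(struct pci_host_bridge *bridge, uint8_t bus,
                       uint8_t devfn, unsigned int reg, unsigned int size,
                       uint32_t *val);
    int (*config_write)(struct pci_host_bridge *bridge, uint8_t bus,
                        uint8_t devfn, unsigned int reg, unsigned int size,
                        uint32_t val);
};

struct pci_host_bridge {
    uint16_t segment;
    uint8_t bus_start, bus_end;
    void *cfg_base;                       /* mapped ECAM window, if any */
    const struct pci_host_bridge_ops *ops;
};

/* Generic ECAM offset: (bus << 20) | (devfn << 12) | reg. */
static void *ecam_cfg_addr(struct pci_host_bridge *b, uint8_t bus,
                           uint8_t devfn, unsigned int reg)
{
    return (uint8_t *)b->cfg_base +
           (((uint32_t)(bus - b->bus_start) << 20) |
            ((uint32_t)devfn << 12) | reg);
}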

>
>> Now regarding the configuration space, I think we can divide in 2 category:
>> 	- indirect access, the configuration space are multiplexed. An example
>> would be the legacy method on x86 (e.g 0xcf8 and 0xcfc). A similar method is
>> used for x-gene PCI driver ([1]).
>> 	- ECAM like access, where each PCI configuration space will have it is own
>> address space. I said "ECAM like" because some host bridge will require some
>> bits fiddling when accessing register (see thunder-ecam [2])
>>
>> There are also host bridges that mix both indirect access and ECAM like
>> access depending on the device configuration space accessed (see thunder-pem
>> [3]).
>
> Hay! Sounds like fun...
>
>> When using ECAM like host bridge, I don't think it will be an issue to have
>> both DOM0 and Xen accessing configuration space at the same time. Although,
>> we need to define who is doing what. In general case, DOM0 should not
>> touched an assigned PCI device. The only possible interaction would be
>> resetting a device (see my answer below).
>
> Iff Xen is really going to perform the reset of passthrough devices, then I
> don't see any reason to expose those devices to Dom0 at all, IMHO you should
> hide them from ACPI and ideally prevent Dom0 from interacting with them using
> the PCI configuration space (although that would require trapping on accesses
> to the PCI config space, which AFAIK you would like to avoid).

I was indeed planning to avoid trapping PCI config space accesses, but you 
and Stefano changed my mind. It does not cost too much to trap ECAM 
accesses, and trapping would be necessary on non-ECAM bridges anyway.

This will also simplify hiding a PCI device from DOM0: Xen can do it by 
making the config space of the device unavailable to DOM0 (similar to the 
pciback.hide option today).
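
A rough sketch of what the trap handler could look like (hypothetical
helpers, not existing Xen code): reads of a hidden device's config space
return all-ones, which is what an OS sees for an absent device, and writes
are dropped.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */
bool device_hidden_from_dom0(uint16_t seg, uint8_t bus, uint8_t devfn);
bool hw_cfg_read(uint16_t seg, uint8_t bus, uint8_t devfn,
                 unsigned int reg, unsigned int size, uint32_t *val);
bool hw_cfg_write(uint16_t seg, uint8_t bus, uint8_t devfn,
                  unsigned int reg, unsigned int size, uint32_t val);

static bool dom0_cfg_read(uint16_t seg, uint8_t bus, uint8_t devfn,
                          unsigned int reg, unsigned int size, uint32_t *val)
{
    if ( device_hidden_from_dom0(seg, bus, devfn) )
    {
        *val = ~0u;          /* absent devices read as all-ones */
        return true;
    }
    return hw_cfg_read(seg, bus, devfn, reg, size, val);  /* forward to HW */
}

static bool dom0_cfg_write(uint16_t seg, uint8_t bus, uint8_t devfn,
                           unsigned int reg, unsigned int size, uint32_t val)
{
    if ( device_hidden_from_dom0(seg, bus, devfn) )
        return true;         /* silently discard the write */
    return hw_cfg_write(seg, bus, devfn, reg, size, val);
}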

Cheers,

[1] <a1120a60-b859-c7ff-9d4a-553c330669f1@linaro.org>
[2] <616043e2-82d6-9f64-94fc-5c836d41818f@linaro.org>
[3] https://www.renesas.com/en-us/solutions/automotive/products/rcar-h3.html

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-01 19:04               ` Julien Grall
  2017-02-01 19:31                 ` Stefano Stabellini
@ 2017-02-02 15:33                 ` Edgar E. Iglesias
  2017-02-02 23:12                   ` Stefano Stabellini
  2017-02-13 15:35                   ` Julien Grall
  2017-02-02 15:40                 ` Roger Pau Monné
  2 siblings, 2 replies; 82+ messages in thread
From: Edgar E. Iglesias @ 2017-02-02 15:33 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Chen, Campbell Sean, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, xen-devel,
	Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

On Wed, Feb 01, 2017 at 07:04:43PM +0000, Julien Grall wrote:
> Hi Edgar,
> 
> On 31/01/2017 19:06, Edgar E. Iglesias wrote:
> >On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
> >>On 31/01/17 16:53, Edgar E. Iglesias wrote:
> >>>On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> >>>>On 24/01/17 20:07, Stefano Stabellini wrote:
> >>>>>On Tue, 24 Jan 2017, Julien Grall wrote:
> >>>>For generic host bridge, the initialization is inexistent. However some host
> >>>>bridge (e.g xgene, xilinx) may require some specific setup and also
> >>>>configuring clocks. Given that Xen only requires to access the configuration
> >>>>space, I was thinking to let DOM0 initialization the host bridge. This would
> >>>>avoid to import a lot of code in Xen, however this means that we need to
> >>>>know when the host bridge has been initialized before accessing the
> >>>>configuration space.
> >>>
> >>>
> >>>Yes, that's correct.
> >>>There's a sequence on the ZynqMP that involves assiging Gigabit Transceivers
> >>>to PCI (GTs are shared among PCIe, USB, SATA and the Display Port),
> >>>enabling clocks and configuring a few registers to enable ECAM and MSI.
> >>>
> >>>I'm not sure if this could be done prior to starting Xen. Perhaps.
> >>>If so, bootloaders would have to know a head of time what devices
> >>>the GTs are supposed to be configured for.
> >>
> >>I've got further questions regarding the Gigabit Transceivers. You mention
> >>they are shared, do you mean that multiple devices can use a GT at the same
> >>time? Or the software is deciding at startup which device will use a given
> >>GT? If so, how does the software make this decision?
> >
> >Software will decide at startup. AFAIK, the allocation is normally done
> >once but I guess that in theory you could design boards that could switch
> >at runtime. I'm not sure we need to worry about that use-case though.
> >
> >The details can be found here:
> >https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> >
> >I suggest looking at pages 672 and 733.
> 
> Thank you for the documentation. I am trying to understand if we could move
> initialization in Xen as suggested by Stefano. I looked at the driver in
> Linux and the code looks simple not many dependencies. However, I was not
> able to find where the Gigabit Transceivers are configured. Do you have any
> link to the code for that?

Hi Julien,

I suspect that this setup has previously been done by the initial bootloader,
which is auto-generated from the design configuration tools.

Now, this is moving into Linux.
There's a specific driver that does that, but AFAICS it has not been upstreamed yet.
You can see it here:
https://github.com/Xilinx/linux-xlnx/blob/master/drivers/phy/phy-zynqmp.c

DTS nodes that need a PHY can then just refer to it; here's an example from SATA:
&sata {
        phy-names = "sata-phy";
        phys = <&lane3 PHY_TYPE_SATA 1 3 150000000>;
};

I'll see if I can find working examples for PCIe on the ZCU102. Then I'll share
DTS, Kernel etc.

If you are looking for a platform to get started, an option could be for me to get you a build of
our QEMU that includes models for the PCIe controller, MSI and SMMU connections.
These models are friendly wrt. PHY configs and initialization sequences; they will
accept pretty much any sequence and still work. This would allow you to focus on
architectural issues rather than the exact details of init sequences (which we can
deal with later).



> 
> This would also mean that the MSI interrupt controller will be moved in Xen.
> Which I think is a more sensible design (see more below).
> 
> >>
> >>>>	- For all other host bridges => I don't know if there are host bridges
> >>>>falling under this category. I also don't have any idea how to handle this.
> >>>>
> >>>>>
> >>>>>Otherwise, if Dom0 is the only one to drive the physical host bridge,
> >>>>>and Xen is the one to provide the emulated host bridge, how are DomU PCI
> >>>>>config reads and writes supposed to work in details?
> >>>>
> >>>>I think I have answered to this question with my explanation above. Let me
> >>>>know if it is not the case.
> >>>>
> >>>>>How is MSI configuration supposed to work?
> >>>>
> >>>>For GICv3 ITS, the MSI will be configured with the eventID (it is uniq
> >>>>per-device) and the address of the doorbell. The linkage between the LPI and
> >>>>"MSI" will be done through the ITS.
> >>>>
> >>>>For GICv2m, the MSI will be configured with an SPIs (or offset on some
> >>>>GICv2m) and the address of the doorbell. Note that for DOM0 SPIs are mapped
> >>>>1:1.
> >>>>
> >>>>So in both case, I don't think it is necessary to trap MSI configuration for
> >>>>DOM0. This may not be true if we want to handle other MSI controller.
> >>>>
> >>>>I have in mind the xilinx MSI controller (embedded in the host bridge? [4])
> >>>>and xgene MSI controller ([5]). But I have no idea how they work and if we
> >>>>need to support them. Maybe Edgar could share details on the Xilinx one?
> >>>
> >>>
> >>>The Xilinx controller has 2 dedicated SPIs and pages for MSIs. AFAIK, there's no
> >>>way to protect the MSI doorbells from mal-configured end-points raising malicious EventIDs.
> >>>So perhaps trapped config accesses from domUs can help by adding this protection
> >>>as drivers configure the device.
> >>>
> >>>On Linux, Once MSI's hit, the kernel takes the SPI interrupts, reads
> >>>out the EventID from a FIFO in the controller and injects a new IRQ into
> >>>the kernel.
> >>
> >>It might be early to ask, but how do you expect  MSI to work with DOMU on
> >>your hardware? Does your MSI controller supports virtualization? Or are you
> >>looking for a different way to inject MSI?
> >
> >MSI support in HW is quite limited to support domU and will require SW hacks :-(
> >
> >Anyway, something along the lines of this might work:
> >
> >* Trap domU CPU writes to MSI descriptors in config space.
> >  Force real MSI descriptors to the address of the door bell area.
> >  Force real MSI descriptors to use a specific device unique Event ID allocated by Xen.
> >  Remember what EventID domU requested per device and descriptor.
> >
> >* Xen or Dom0 take the real SPI generated when device writes into the doorbell area.
> >  At this point, we can read out the EventID from the MSI FIFO and map it to the one requested from domU.
> >  Xen or Dom0 inject the expected EventID into domU
> >
> >Do you have any good ideas? :-)
> 
> From my understanding your MSI controller is embedded in the hostbridge,
> right? If so, the MSIs would need to be handled where the host bridge will
> be initialized (e.g either Xen or DOM0).

Yes, it is.

> 
> From a design point of view, it would make more sense to have the MSI
> controller driver in Xen as the hostbridge emulation for guest will also
> live there.
> 
> So if we receive MSI in Xen, we need to figure out a way for DOM0 and guest
> to receive MSI. The same way would be the best, and I guess non-PV if
> possible. I know you are looking to boot unmodified OS in a VM. This would
> mean we need to emulate the MSI controller and potentially xilinx PCI
> controller. How much are you willing to modify the OS?

Today, we have not yet implemented PCIe drivers for our baremetal SDK. So
things are very open and we could design with pretty much anything in mind.

Yes, we could perhaps include a very small model with most registers dummied.
Implementing the MSI read FIFO would allow us to:

1. Inject the MSI doorbell SPI into guests. The guest will then see the same
   IRQ as on real HW.

2. The guest reads the host-controller registers (MSI FIFO) to get the signaled MSI.
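
Roughly, the Xen side of that flow could look like this (purely
illustrative C, all names are made up for the example):

#include <stdbool.h>
#include <stdint.h>

#define MSI_DOORBELL_SPI 54   /* made-up SPI number, illustration only */

struct domain;

/* Hypothetical helpers, not existing Xen code. */
bool hw_msi_fifo_pop(uint32_t *hw_event);
bool msi_lookup_owner(uint32_t hw_event, struct domain **d,
                      uint32_t *guest_event);
void vmsi_fifo_push(struct domain *d, uint32_t guest_event);
void vgic_inject_spi(struct domain *d, unsigned int spi);

/* Handler for the physical MSI doorbell SPI: drain the hardware FIFO,
 * translate each EventID back to the one the guest programmed, queue it
 * in the emulated FIFO and inject the virtual doorbell SPI. */
void msi_doorbell_spi_handler(void)
{
    uint32_t hw_event;

    while ( hw_msi_fifo_pop(&hw_event) )
    {
        struct domain *d;
        uint32_t guest_event;

        if ( !msi_lookup_owner(hw_event, &d, &guest_event) )
            continue;                         /* spurious or unassigned */

        vmsi_fifo_push(d, guest_event);       /* backs the emulated FIFO */
        vgic_inject_spi(d, MSI_DOORBELL_SPI); /* same IRQ as on real HW */
    }
}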



> Regarding the MSI doorbell, I have seen it is configured by the software
> using a physical address of a page allocated in the RAM. When the PCI
> devices is writing into the doorbell does the access go through the SMMU?

That's a good question. On our QEMU model it does, but I'll have to dig a little to see if that is the case on real HW as well.

> Regardless the answer, I think we would need to map the MSI doorbell page in
> the guest. Meaning that even if we trap MSI configuration access, a guess
> could DMA in the page. So if I am not mistaken, MSI would be insecure in
> this case :/.
> 
> Or maybe we could avoid mapping the doorbell in the guest and let Xen
> receive an SMMU abort. When receiving the SMMU abort, Xen could sanitize the
> value and write into the real MSI doorbell. Not sure if it would works
> thought.

Yeah, this is a problem.
I'm not sure if SMMU aborts would work, because I don't think we know the value of the data written when we take the abort.
Without the data, I'm not sure how we would distinguish between different MSIs from the same device.

Also, even if the MSI doorbell were protected by the SMMU, all PCI devices are presented with the same AXI master ID.
BTW, this master-ID SMMU limitation is a showstopper for domU guests, isn't it?
Or do you have ideas around that? Perhaps some PV way to request mappings for DMA?

Best regards,
Edgar


> 
> Cheers,
> 
> -- 
> Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-01 19:04               ` Julien Grall
  2017-02-01 19:31                 ` Stefano Stabellini
  2017-02-02 15:33                 ` Edgar E. Iglesias
@ 2017-02-02 15:40                 ` Roger Pau Monné
  2017-02-13 16:22                   ` Julien Grall
  2 siblings, 1 reply; 82+ messages in thread
From: Roger Pau Monné @ 2017-02-02 15:40 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Andrew Cooper, Jiandi An, Punit Agrawal, alistair.francis,
	Shanker Donthineni, xen-devel, manish.jaggi, Campbell Sean

On Wed, Feb 01, 2017 at 07:04:43PM +0000, Julien Grall wrote:
> Or maybe we could avoid mapping the doorbell in the guest and let Xen
> receive an SMMU abort. When receiving the SMMU abort, Xen could sanitize the
> value and write into the real MSI doorbell. Not sure if it would works
> thought.

AFAIK (and I might be wrong) you can only know the address that caused the
fault, but not the data that the device attempted to write there. TBH, I wouldn't
expect this approach to work.

Roger.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-02 12:38           ` Julien Grall
@ 2017-02-02 23:06             ` Stefano Stabellini
  2017-03-08 19:06               ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-02-02 23:06 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, Shanker Donthineni,
	xen-devel, manish.jaggi, Campbell Sean, Roger Pau Monné

On Thu, 2 Feb 2017, Julien Grall wrote:
> Hi Roger,
> 
> On 01/02/17 10:55, Roger Pau Monné wrote:
> > On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 24/01/17 20:07, Stefano Stabellini wrote:
> > > > On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > > > > whilst for Device Tree the segment number is not available.
> > > > > > > 
> > > > > > > So Xen needs to rely on DOM0 to discover the host bridges and
> > > > > > > notify Xen
> > > > > > > with all the relevant informations. This will be done via a new
> > > > > > > hypercall
> > > > > > > PHYSDEVOP_pci_host_bridge_add. The layout of the structure will
> > > > > > > be:
> > > > > > 
> > > > > > I understand that the main purpose of this hypercall is to get Xen
> > > > > > and Dom0
> > > > > > to
> > > > > > agree on the segment numbers, but why is it necessary? If Dom0 has
> > > > > > an
> > > > > > emulated contoller like any other guest, do we care what segment
> > > > > > numbers
> > > > > > Dom0 will use?
> > > > > 
> > > > > I was not planning to have a emulated controller for DOM0. The
> > > > > physical one is
> > > > > not necessarily ECAM compliant so we would have to either emulate the
> > > > > physical
> > > > > one (meaning multiple different emulation) or an ECAM compliant.
> > > > > 
> > > > > The latter is not possible because you don't know if there is enough
> > > > > free MMIO
> > > > > space for the emulation.
> > > > > 
> > > > > In the case on ARM, I don't see much the point to emulate the host
> > > > > bridge for
> > > > > DOM0. The only thing we need in Xen is to access the configuration
> > > > > space, we
> > > > > don't have about driving the host bridge. So I would let DOM0 dealing
> > > > > with
> > > > > that.
> > > > > 
> > > > > Also, I don't see any reason for ARM to trap DOM0 configuration space
> > > > > access.
> > > > > The MSI will be configured using the interrupt controller and it is a
> > > > > trusted
> > > > > Domain.
> > > > 
> > > > These last you sentences raise a lot of questions. Maybe I am missing
> > > > something. You might want to clarify the strategy for Dom0 and DomUs,
> > > > and how they differ, in the next version of the doc.
> > > > 
> > > > At some point you wrote "Instantiation of a specific driver for the host
> > > > controller can be easily done if Xen has the information to detect it.
> > > > However, those drivers may require resources described in ASL." Does it
> > > > mean you plan to drive the physical host bridge from Xen and Dom0
> > > > simultaneously?
> > > 
> > > I may miss some bits, so feel free to correct me if I am wrong.
> > > 
> > > My understanding is host bridge can be divided in 2 parts:
> > > 	- Initialization of the host bridge
> > > 	- Access the configuration space
> > > 
> > > For generic host bridge, the initialization is inexistent. However some
> > > host
> > > bridge (e.g xgene, xilinx) may require some specific setup and also
> > > configuring clocks. Given that Xen only requires to access the
> > > configuration
> > > space, I was thinking to let DOM0 initialization the host bridge. This
> > > would
> > > avoid to import a lot of code in Xen, however this means that we need to
> > > know when the host bridge has been initialized before accessing the
> > > configuration space.
> > 
> > Can the bridge be initialized without Dom0 having access to the ECAM area?
> > If
> > that's possible I would do:
> > 
> > 1. Dom0 initializes the bridge (whatever that involves).
> > 2. Dom0 calls PHYSDEVOP_pci_mmcfg_reserved to register the bridge with Xen:
> >  2.1 Xen scans the bridge and detects the devices.
> >  2.2 Xen maps the ECAM area into Dom0 stage-2 p2m.
> > 3. Dom0 scans the bridge &c (whatever is done on native).
> 
> As Stefano suggested, we should try to initialize the hostbridge in Xen when
> possible. This will avoid a split interaction and our hair too :).
> 
> I am looking at different hostbridge to see how much code would be required in
> Xen to handle them. I think the Xilinx root-complex is an easy one (see the
> discussion in [1]) and it is manageable to get the code in Xen.
> 
> But some are much more complex, for instance the R-Car (see discussion in [2])
> requires clocks, use a specific way to access configuration space and has the
> MSI controller integrated in the root complex. This would require some work
> with DOM0. I will mention the problem in the design document but not going to
> address it at the moment (too complex). Although, we would have to support it
> at some point as the root complex is used in automotive board (see [3]).
> 
> For now I will address:
> 	- ECAM compliant/ECAM like root complex
> 	- Root complex with simple initialization
> 
> For DT, I would have a fallback on mapping the root complex to DOM0 if we
> don't support it. So DOM0 could still use PCI.
> 
> For ACPI, I am expecting all the platform ECAM compliant or require few
> quirks. So I would mandate the support of the root complex in Xen in order to
> get PCI supported.

Sounds good. Ack.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-02 15:33                 ` Edgar E. Iglesias
@ 2017-02-02 23:12                   ` Stefano Stabellini
  2017-02-02 23:44                     ` Edgar E. Iglesias
  2017-02-13 15:35                   ` Julien Grall
  1 sibling, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-02-02 23:12 UTC (permalink / raw)
  To: Edgar E. Iglesias
  Cc: Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, alistair.francis, Punit Agrawal,
	Shanker Donthineni, xen-devel, manish.jaggi, Campbell Sean,
	Roger Pau Monné

On Thu, 2 Feb 2017, Edgar E. Iglesias wrote:
> On Wed, Feb 01, 2017 at 07:04:43PM +0000, Julien Grall wrote:
> > Hi Edgar,
> > 
> > On 31/01/2017 19:06, Edgar E. Iglesias wrote:
> > >On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
> > >>On 31/01/17 16:53, Edgar E. Iglesias wrote:
> > >>>On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > >>>>On 24/01/17 20:07, Stefano Stabellini wrote:
> > >>>>>On Tue, 24 Jan 2017, Julien Grall wrote:
> > >>>>For generic host bridge, the initialization is inexistent. However some host
> > >>>>bridge (e.g xgene, xilinx) may require some specific setup and also
> > >>>>configuring clocks. Given that Xen only requires to access the configuration
> > >>>>space, I was thinking to let DOM0 initialization the host bridge. This would
> > >>>>avoid to import a lot of code in Xen, however this means that we need to
> > >>>>know when the host bridge has been initialized before accessing the
> > >>>>configuration space.
> > >>>
> > >>>
> > >>>Yes, that's correct.
> > >>>There's a sequence on the ZynqMP that involves assiging Gigabit Transceivers
> > >>>to PCI (GTs are shared among PCIe, USB, SATA and the Display Port),
> > >>>enabling clocks and configuring a few registers to enable ECAM and MSI.
> > >>>
> > >>>I'm not sure if this could be done prior to starting Xen. Perhaps.
> > >>>If so, bootloaders would have to know a head of time what devices
> > >>>the GTs are supposed to be configured for.
> > >>
> > >>I've got further questions regarding the Gigabit Transceivers. You mention
> > >>they are shared, do you mean that multiple devices can use a GT at the same
> > >>time? Or the software is deciding at startup which device will use a given
> > >>GT? If so, how does the software make this decision?
> > >
> > >Software will decide at startup. AFAIK, the allocation is normally done
> > >once but I guess that in theory you could design boards that could switch
> > >at runtime. I'm not sure we need to worry about that use-case though.
> > >
> > >The details can be found here:
> > >https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > >
> > >I suggest looking at pages 672 and 733.
> > 
> > Thank you for the documentation. I am trying to understand if we could move
> > initialization in Xen as suggested by Stefano. I looked at the driver in
> > Linux and the code looks simple not many dependencies. However, I was not
> > able to find where the Gigabit Transceivers are configured. Do you have any
> > link to the code for that?
> 
> Hi Julien,
> 
> I suspect that this setup has previously been done by the initial bootloader
> auto-generated from design configuration tools.
> 
> Now, this is moving into Linux.
> There's a specific driver that does that but AFAICS, it has not been upstreamed yet.
> You can see it here:
> https://github.com/Xilinx/linux-xlnx/blob/master/drivers/phy/phy-zynqmp.c
> 
> DTS nodes that need a PHY can then just refer to it, here's an example from SATA:
> &sata {
>         phy-names = "sata-phy";
>         phys = <&lane3 PHY_TYPE_SATA 1 3 150000000>;
> };
> 
> I'll see if I can find working examples for PCIe on the ZCU102. Then I'll share
> DTS, Kernel etc.
> 
> If you are looking for a platform to get started, an option could be if I get you a build of
> our QEMU that includes models for the PCIe controller, MSI and SMMU connections.
> These models are friendly wrt. PHY configs and initialization sequences, it will
> accept pretty much any sequence and still work. This would allow you to focus on
> architectural issues rather than exact details of init sequences (which we can
> deal with later).
> 
> 
> 
> > 
> > This would also mean that the MSI interrupt controller will be moved in Xen.
> > Which I think is a more sensible design (see more below).
> > 
> > >>
> > >>>>	- For all other host bridges => I don't know if there are host bridges
> > >>>>falling under this category. I also don't have any idea how to handle this.
> > >>>>
> > >>>>>
> > >>>>>Otherwise, if Dom0 is the only one to drive the physical host bridge,
> > >>>>>and Xen is the one to provide the emulated host bridge, how are DomU PCI
> > >>>>>config reads and writes supposed to work in details?
> > >>>>
> > >>>>I think I have answered to this question with my explanation above. Let me
> > >>>>know if it is not the case.
> > >>>>
> > >>>>>How is MSI configuration supposed to work?
> > >>>>
> > >>>>For GICv3 ITS, the MSI will be configured with the eventID (it is uniq
> > >>>>per-device) and the address of the doorbell. The linkage between the LPI and
> > >>>>"MSI" will be done through the ITS.
> > >>>>
> > >>>>For GICv2m, the MSI will be configured with an SPIs (or offset on some
> > >>>>GICv2m) and the address of the doorbell. Note that for DOM0 SPIs are mapped
> > >>>>1:1.
> > >>>>
> > >>>>So in both case, I don't think it is necessary to trap MSI configuration for
> > >>>>DOM0. This may not be true if we want to handle other MSI controller.
> > >>>>
> > >>>>I have in mind the xilinx MSI controller (embedded in the host bridge? [4])
> > >>>>and xgene MSI controller ([5]). But I have no idea how they work and if we
> > >>>>need to support them. Maybe Edgar could share details on the Xilinx one?
> > >>>
> > >>>
> > >>>The Xilinx controller has 2 dedicated SPIs and pages for MSIs. AFAIK, there's no
> > >>>way to protect the MSI doorbells from mal-configured end-points raising malicious EventIDs.
> > >>>So perhaps trapped config accesses from domUs can help by adding this protection
> > >>>as drivers configure the device.
> > >>>
> > >>>On Linux, Once MSI's hit, the kernel takes the SPI interrupts, reads
> > >>>out the EventID from a FIFO in the controller and injects a new IRQ into
> > >>>the kernel.
> > >>
> > >>It might be early to ask, but how do you expect  MSI to work with DOMU on
> > >>your hardware? Does your MSI controller supports virtualization? Or are you
> > >>looking for a different way to inject MSI?
> > >
> > >MSI support in HW is quite limited to support domU and will require SW hacks :-(
> > >
> > >Anyway, something along the lines of this might work:
> > >
> > >* Trap domU CPU writes to MSI descriptors in config space.
> > >  Force real MSI descriptors to the address of the door bell area.
> > >  Force real MSI descriptors to use a specific device unique Event ID allocated by Xen.
> > >  Remember what EventID domU requested per device and descriptor.
> > >
> > >* Xen or Dom0 take the real SPI generated when device writes into the doorbell area.
> > >  At this point, we can read out the EventID from the MSI FIFO and map it to the one requested from domU.
> > >  Xen or Dom0 inject the expected EventID into domU
> > >
> > >Do you have any good ideas? :-)
> > 
> > From my understanding your MSI controller is embedded in the hostbridge,
> > right? If so, the MSIs would need to be handled where the host bridge will
> > be initialized (e.g either Xen or DOM0).
> 
> Yes, it is.
> 
> > 
> > From a design point of view, it would make more sense to have the MSI
> > controller driver in Xen as the hostbridge emulation for guest will also
> > live there.
> > 
> > So if we receive MSI in Xen, we need to figure out a way for DOM0 and guest
> > to receive MSI. The same way would be the best, and I guess non-PV if
> > possible. I know you are looking to boot unmodified OS in a VM. This would
> > mean we need to emulate the MSI controller and potentially xilinx PCI
> > controller. How much are you willing to modify the OS?
> 
> Today, we have not yet implemented PCIe drivers for our baremetal SDK. So
> things are very open and we could design with pretty much anything in mind.
> 
> Yes, we could perhaps include a very small model with most registers dummied.
> Implementing the MSI read FIFO would allow us to:
> 
> 1. Inject the MSI doorbell SPI into guests. The guest will then see the same
>    IRQ as on real HW.
> 
> 2. Guest reads host-controller registers (MSI FIFO) to get the signaled MSI.
> 
> 
> 
> > Regarding the MSI doorbell, I have seen it is configured by the software
> > using a physical address of a page allocated in the RAM. When the PCI
> > devices is writing into the doorbell does the access go through the SMMU?
> 
> That's a good question. On our QEMU model it does, but I'll have to dig a little to see if that is the case on real HW aswell.
> 
> > Regardless the answer, I think we would need to map the MSI doorbell page in
> > the guest. Meaning that even if we trap MSI configuration access, a guess
> > could DMA in the page. So if I am not mistaken, MSI would be insecure in
> > this case :/.
> > 
> > Or maybe we could avoid mapping the doorbell in the guest and let Xen
> > receive an SMMU abort. When receiving the SMMU abort, Xen could sanitize the
> > value and write into the real MSI doorbell. Not sure if it would works
> > thought.
> 
> Yeah, this is a problem.
> I'm not sure if SMMU aborts would work because I don't think we know the value of the data written when we take the abort.
> Without the data, I'm not sure how we would distinguish between different MSI's from the same device.
> 
> Also, even if the MSI doorbell would be protected by the SMMU, all PCI devices are presented with the same AXI Master ID.

Does that mean that from the SMMU perspective you can only assign them
all or none?


> BTW, this master-ID SMMU limitation is a showstopper for domU guests isn't it?
> Or do you have ideas around that? Perhaps some PV way to request mappings for DMA?

No, we don't have anything like that. There are too many device-specific
ways to request DMAs to do that. For devices that cannot be effectively
protected by the IOMMU, we support assignment (on x86), but only in an
insecure fashion.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-02 23:12                   ` Stefano Stabellini
@ 2017-02-02 23:44                     ` Edgar E. Iglesias
  2017-02-10  1:01                       ` Stefano Stabellini
  0 siblings, 1 reply; 82+ messages in thread
From: Edgar E. Iglesias @ 2017-02-02 23:44 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Edgar E. Iglesias, Punit Agrawal, Wei Chen, Steve Capper,
	Andrew Cooper, Jiandi An, Julien Grall, alistair.francis,
	Campbell Sean, xen-devel, manish.jaggi, Shanker Donthineni,
	Roger Pau Monné

On Thu, Feb 02, 2017 at 03:12:52PM -0800, Stefano Stabellini wrote:
> On Thu, 2 Feb 2017, Edgar E. Iglesias wrote:
> > On Wed, Feb 01, 2017 at 07:04:43PM +0000, Julien Grall wrote:
> > > Hi Edgar,
> > > 
> > > On 31/01/2017 19:06, Edgar E. Iglesias wrote:
> > > >On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
> > > >>On 31/01/17 16:53, Edgar E. Iglesias wrote:
> > > >>>On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > >>>>On 24/01/17 20:07, Stefano Stabellini wrote:
> > > >>>>>On Tue, 24 Jan 2017, Julien Grall wrote:
> > > >>>>For generic host bridge, the initialization is inexistent. However some host
> > > >>>>bridge (e.g xgene, xilinx) may require some specific setup and also
> > > >>>>configuring clocks. Given that Xen only requires to access the configuration
> > > >>>>space, I was thinking to let DOM0 initialization the host bridge. This would
> > > >>>>avoid to import a lot of code in Xen, however this means that we need to
> > > >>>>know when the host bridge has been initialized before accessing the
> > > >>>>configuration space.
> > > >>>
> > > >>>
> > > >>>Yes, that's correct.
> > > >>>There's a sequence on the ZynqMP that involves assiging Gigabit Transceivers
> > > >>>to PCI (GTs are shared among PCIe, USB, SATA and the Display Port),
> > > >>>enabling clocks and configuring a few registers to enable ECAM and MSI.
> > > >>>
> > > >>>I'm not sure if this could be done prior to starting Xen. Perhaps.
> > > >>>If so, bootloaders would have to know a head of time what devices
> > > >>>the GTs are supposed to be configured for.
> > > >>
> > > >>I've got further questions regarding the Gigabit Transceivers. You mention
> > > >>they are shared, do you mean that multiple devices can use a GT at the same
> > > >>time? Or the software is deciding at startup which device will use a given
> > > >>GT? If so, how does the software make this decision?
> > > >
> > > >Software will decide at startup. AFAIK, the allocation is normally done
> > > >once but I guess that in theory you could design boards that could switch
> > > >at runtime. I'm not sure we need to worry about that use-case though.
> > > >
> > > >The details can be found here:
> > > >https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > >
> > > >I suggest looking at pages 672 and 733.
> > > 
> > > Thank you for the documentation. I am trying to understand if we could move
> > > initialization in Xen as suggested by Stefano. I looked at the driver in
> > > Linux and the code looks simple not many dependencies. However, I was not
> > > able to find where the Gigabit Transceivers are configured. Do you have any
> > > link to the code for that?
> > 
> > Hi Julien,
> > 
> > I suspect that this setup has previously been done by the initial bootloader
> > auto-generated from design configuration tools.
> > 
> > Now, this is moving into Linux.
> > There's a specific driver that does that but AFAICS, it has not been upstreamed yet.
> > You can see it here:
> > https://github.com/Xilinx/linux-xlnx/blob/master/drivers/phy/phy-zynqmp.c
> > 
> > DTS nodes that need a PHY can then just refer to it, here's an example from SATA:
> > &sata {
> >         phy-names = "sata-phy";
> >         phys = <&lane3 PHY_TYPE_SATA 1 3 150000000>;
> > };
> > 
> > I'll see if I can find working examples for PCIe on the ZCU102. Then I'll share
> > DTS, Kernel etc.
> > 
> > If you are looking for a platform to get started, an option could be if I get you a build of
> > our QEMU that includes models for the PCIe controller, MSI and SMMU connections.
> > These models are friendly wrt. PHY configs and initialization sequences, it will
> > accept pretty much any sequence and still work. This would allow you to focus on
> > architectural issues rather than exact details of init sequences (which we can
> > deal with later).
> > 
> > 
> > 
> > > 
> > > This would also mean that the MSI interrupt controller will be moved in Xen.
> > > Which I think is a more sensible design (see more below).
> > > 
> > > >>
> > > >>>>	- For all other host bridges => I don't know if there are host bridges
> > > >>>>falling under this category. I also don't have any idea how to handle this.
> > > >>>>
> > > >>>>>
> > > >>>>>Otherwise, if Dom0 is the only one to drive the physical host bridge,
> > > >>>>>and Xen is the one to provide the emulated host bridge, how are DomU PCI
> > > >>>>>config reads and writes supposed to work in details?
> > > >>>>
> > > >>>>I think I have answered to this question with my explanation above. Let me
> > > >>>>know if it is not the case.
> > > >>>>
> > > >>>>>How is MSI configuration supposed to work?
> > > >>>>
> > > >>>>For GICv3 ITS, the MSI will be configured with the eventID (it is uniq
> > > >>>>per-device) and the address of the doorbell. The linkage between the LPI and
> > > >>>>"MSI" will be done through the ITS.
> > > >>>>
> > > >>>>For GICv2m, the MSI will be configured with an SPIs (or offset on some
> > > >>>>GICv2m) and the address of the doorbell. Note that for DOM0 SPIs are mapped
> > > >>>>1:1.
> > > >>>>
> > > >>>>So in both case, I don't think it is necessary to trap MSI configuration for
> > > >>>>DOM0. This may not be true if we want to handle other MSI controller.
> > > >>>>
> > > >>>>I have in mind the xilinx MSI controller (embedded in the host bridge? [4])
> > > >>>>and xgene MSI controller ([5]). But I have no idea how they work and if we
> > > >>>>need to support them. Maybe Edgar could share details on the Xilinx one?
> > > >>>
> > > >>>
> > > >>>The Xilinx controller has 2 dedicated SPIs and pages for MSIs. AFAIK, there's no
> > > >>>way to protect the MSI doorbells from mal-configured end-points raising malicious EventIDs.
> > > >>>So perhaps trapped config accesses from domUs can help by adding this protection
> > > >>>as drivers configure the device.
> > > >>>
> > > >>>On Linux, Once MSI's hit, the kernel takes the SPI interrupts, reads
> > > >>>out the EventID from a FIFO in the controller and injects a new IRQ into
> > > >>>the kernel.
> > > >>
> > > >>It might be early to ask, but how do you expect  MSI to work with DOMU on
> > > >>your hardware? Does your MSI controller supports virtualization? Or are you
> > > >>looking for a different way to inject MSI?
> > > >
> > > >MSI support in HW is quite limited to support domU and will require SW hacks :-(
> > > >
> > > >Anyway, something along the lines of this might work:
> > > >
> > > >* Trap domU CPU writes to MSI descriptors in config space.
> > > >  Force real MSI descriptors to the address of the door bell area.
> > > >  Force real MSI descriptors to use a specific device unique Event ID allocated by Xen.
> > > >  Remember what EventID domU requested per device and descriptor.
> > > >
> > > >* Xen or Dom0 take the real SPI generated when device writes into the doorbell area.
> > > >  At this point, we can read out the EventID from the MSI FIFO and map it to the one requested from domU.
> > > >  Xen or Dom0 inject the expected EventID into domU
> > > >
> > > >Do you have any good ideas? :-)
> > > 
> > > From my understanding your MSI controller is embedded in the hostbridge,
> > > right? If so, the MSIs would need to be handled where the host bridge will
> > > be initialized (e.g either Xen or DOM0).
> > 
> > Yes, it is.
> > 
> > > 
> > > From a design point of view, it would make more sense to have the MSI
> > > controller driver in Xen as the hostbridge emulation for guest will also
> > > live there.
> > > 
> > > So if we receive MSI in Xen, we need to figure out a way for DOM0 and guest
> > > to receive MSI. The same way would be the best, and I guess non-PV if
> > > possible. I know you are looking to boot unmodified OS in a VM. This would
> > > mean we need to emulate the MSI controller and potentially xilinx PCI
> > > controller. How much are you willing to modify the OS?
> > 
> > Today, we have not yet implemented PCIe drivers for our baremetal SDK. So
> > things are very open and we could design with pretty much anything in mind.
> > 
> > Yes, we could perhaps include a very small model with most registers dummied.
> > Implementing the MSI read FIFO would allow us to:
> > 
> > 1. Inject the MSI doorbell SPI into guests. The guest will then see the same
> >    IRQ as on real HW.
> > 
> > 2. Guest reads host-controller registers (MSI FIFO) to get the signaled MSI.
> > 
> > 
> > 
> > > Regarding the MSI doorbell, I have seen it is configured by the software
> > > using a physical address of a page allocated in the RAM. When the PCI
> > > devices is writing into the doorbell does the access go through the SMMU?
> > 
> > That's a good question. On our QEMU model it does, but I'll have to dig a little to see if that is the case on real HW aswell.
> > 
> > > Regardless the answer, I think we would need to map the MSI doorbell page in
> > > the guest. Meaning that even if we trap MSI configuration access, a guess
> > > could DMA in the page. So if I am not mistaken, MSI would be insecure in
> > > this case :/.
> > > 
> > > Or maybe we could avoid mapping the doorbell in the guest and let Xen
> > > receive an SMMU abort. When receiving the SMMU abort, Xen could sanitize the
> > > value and write into the real MSI doorbell. Not sure if it would works
> > > thought.
> > 
> > Yeah, this is a problem.
> > I'm not sure if SMMU aborts would work because I don't think we know the value of the data written when we take the abort.
> > Without the data, I'm not sure how we would distinguish between different MSI's from the same device.
> > 
> > Also, even if the MSI doorbell would be protected by the SMMU, all PCI devices are presented with the same AXI Master ID.
> 
> Does that mean that from the SMMU perspective you can only assign them
> all or none?

Unfortunately yes.


> > BTW, this master-ID SMMU limitation is a showstopper for domU guests isn't it?
> > Or do you have ideas around that? Perhaps some PV way to request mappings for DMA?
> 
> No, we don't have anything like that. There are too many device specific
> ways to request DMAs to do that. For devices that cannot be effectively
> protected by IOMMU, (on x86) we support assignment but only in an
> insecure fashion.

OK, I see.

A possible hack could be to allocate a chunk of DDR dedicated to PCI DMA.
PCI DMA devices could be locked down to only be able to access this memory + the MSI doorbell.
Guests could still screw each other up, but at least it becomes harder to read/write directly from each other's OS memory.
It may not be worth the effort though....

Cheers,
Edgar





* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-02 23:44                     ` Edgar E. Iglesias
@ 2017-02-10  1:01                       ` Stefano Stabellini
  2017-02-13 15:39                         ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-02-10  1:01 UTC (permalink / raw)
  To: Edgar E. Iglesias
  Cc: Edgar E. Iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Andrew Cooper, Jiandi An, Julien Grall, alistair.francis,
	Punit Agrawal, Campbell Sean, xen-devel, manish.jaggi,
	Shanker Donthineni, Roger Pau Monné

On Fri, 3 Feb 2017, Edgar E. Iglesias wrote:
> On Thu, Feb 02, 2017 at 03:12:52PM -0800, Stefano Stabellini wrote:
> > On Thu, 2 Feb 2017, Edgar E. Iglesias wrote:
> > > On Wed, Feb 01, 2017 at 07:04:43PM +0000, Julien Grall wrote:
> > > > Hi Edgar,
> > > > 
> > > > On 31/01/2017 19:06, Edgar E. Iglesias wrote:
> > > > >On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
> > > > >>On 31/01/17 16:53, Edgar E. Iglesias wrote:
> > > > >>>On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > > >>>>On 24/01/17 20:07, Stefano Stabellini wrote:
> > > > >>>>>On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > >>>>For generic host bridge, the initialization is inexistent. However some host
> > > > >>>>bridge (e.g xgene, xilinx) may require some specific setup and also
> > > > >>>>configuring clocks. Given that Xen only requires to access the configuration
> > > > >>>>space, I was thinking to let DOM0 initialization the host bridge. This would
> > > > >>>>avoid to import a lot of code in Xen, however this means that we need to
> > > > >>>>know when the host bridge has been initialized before accessing the
> > > > >>>>configuration space.
> > > > >>>
> > > > >>>
> > > > >>>Yes, that's correct.
> > > > >>>There's a sequence on the ZynqMP that involves assiging Gigabit Transceivers
> > > > >>>to PCI (GTs are shared among PCIe, USB, SATA and the Display Port),
> > > > >>>enabling clocks and configuring a few registers to enable ECAM and MSI.
> > > > >>>
> > > > >>>I'm not sure if this could be done prior to starting Xen. Perhaps.
> > > > >>>If so, bootloaders would have to know a head of time what devices
> > > > >>>the GTs are supposed to be configured for.
> > > > >>
> > > > >>I've got further questions regarding the Gigabit Transceivers. You mention
> > > > >>they are shared, do you mean that multiple devices can use a GT at the same
> > > > >>time? Or the software is deciding at startup which device will use a given
> > > > >>GT? If so, how does the software make this decision?
> > > > >
> > > > >Software will decide at startup. AFAIK, the allocation is normally done
> > > > >once but I guess that in theory you could design boards that could switch
> > > > >at runtime. I'm not sure we need to worry about that use-case though.
> > > > >
> > > > >The details can be found here:
> > > > >https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > >
> > > > >I suggest looking at pages 672 and 733.
> > > > 
> > > > Thank you for the documentation. I am trying to understand if we could move
> > > > initialization in Xen as suggested by Stefano. I looked at the driver in
> > > > Linux and the code looks simple not many dependencies. However, I was not
> > > > able to find where the Gigabit Transceivers are configured. Do you have any
> > > > link to the code for that?
> > > 
> > > Hi Julien,
> > > 
> > > I suspect that this setup has previously been done by the initial bootloader
> > > auto-generated from design configuration tools.
> > > 
> > > Now, this is moving into Linux.
> > > There's a specific driver that does that but AFAICS, it has not been upstreamed yet.
> > > You can see it here:
> > > https://github.com/Xilinx/linux-xlnx/blob/master/drivers/phy/phy-zynqmp.c
> > > 
> > > DTS nodes that need a PHY can then just refer to it, here's an example from SATA:
> > > &sata {
> > >         phy-names = "sata-phy";
> > >         phys = <&lane3 PHY_TYPE_SATA 1 3 150000000>;
> > > };
> > > 
> > > I'll see if I can find working examples for PCIe on the ZCU102. Then I'll share
> > > DTS, Kernel etc.
> > > 
> > > If you are looking for a platform to get started, an option could be if I get you a build of
> > > our QEMU that includes models for the PCIe controller, MSI and SMMU connections.
> > > These models are friendly wrt. PHY configs and initialization sequences, it will
> > > accept pretty much any sequence and still work. This would allow you to focus on
> > > architectural issues rather than exact details of init sequences (which we can
> > > deal with later).
> > > 
> > > 
> > > 
> > > > 
> > > > This would also mean that the MSI interrupt controller will be moved in Xen.
> > > > Which I think is a more sensible design (see more below).
> > > > 
> > > > >>
> > > > >>>>	- For all other host bridges => I don't know if there are host bridges
> > > > >>>>falling under this category. I also don't have any idea how to handle this.
> > > > >>>>
> > > > >>>>>
> > > > >>>>>Otherwise, if Dom0 is the only one to drive the physical host bridge,
> > > > >>>>>and Xen is the one to provide the emulated host bridge, how are DomU PCI
> > > > >>>>>config reads and writes supposed to work in details?
> > > > >>>>
> > > > >>>>I think I have answered to this question with my explanation above. Let me
> > > > >>>>know if it is not the case.
> > > > >>>>
> > > > >>>>>How is MSI configuration supposed to work?
> > > > >>>>
> > > > >>>>For GICv3 ITS, the MSI will be configured with the eventID (it is uniq
> > > > >>>>per-device) and the address of the doorbell. The linkage between the LPI and
> > > > >>>>"MSI" will be done through the ITS.
> > > > >>>>
> > > > >>>>For GICv2m, the MSI will be configured with an SPIs (or offset on some
> > > > >>>>GICv2m) and the address of the doorbell. Note that for DOM0 SPIs are mapped
> > > > >>>>1:1.
> > > > >>>>
> > > > >>>>So in both case, I don't think it is necessary to trap MSI configuration for
> > > > >>>>DOM0. This may not be true if we want to handle other MSI controller.
> > > > >>>>
> > > > >>>>I have in mind the xilinx MSI controller (embedded in the host bridge? [4])
> > > > >>>>and xgene MSI controller ([5]). But I have no idea how they work and if we
> > > > >>>>need to support them. Maybe Edgar could share details on the Xilinx one?
> > > > >>>
> > > > >>>
> > > > >>>The Xilinx controller has 2 dedicated SPIs and pages for MSIs. AFAIK, there's no
> > > > >>>way to protect the MSI doorbells from mal-configured end-points raising malicious EventIDs.
> > > > >>>So perhaps trapped config accesses from domUs can help by adding this protection
> > > > >>>as drivers configure the device.
> > > > >>>
> > > > >>>On Linux, Once MSI's hit, the kernel takes the SPI interrupts, reads
> > > > >>>out the EventID from a FIFO in the controller and injects a new IRQ into
> > > > >>>the kernel.
> > > > >>
> > > > >>It might be early to ask, but how do you expect  MSI to work with DOMU on
> > > > >>your hardware? Does your MSI controller supports virtualization? Or are you
> > > > >>looking for a different way to inject MSI?
> > > > >
> > > > >MSI support in HW is quite limited to support domU and will require SW hacks :-(
> > > > >
> > > > >Anyway, something along the lines of this might work:
> > > > >
> > > > >* Trap domU CPU writes to MSI descriptors in config space.
> > > > >  Force real MSI descriptors to the address of the door bell area.
> > > > >  Force real MSI descriptors to use a specific device unique Event ID allocated by Xen.
> > > > >  Remember what EventID domU requested per device and descriptor.
> > > > >
> > > > >* Xen or Dom0 take the real SPI generated when device writes into the doorbell area.
> > > > >  At this point, we can read out the EventID from the MSI FIFO and map it to the one requested from domU.
> > > > >  Xen or Dom0 inject the expected EventID into domU
> > > > >
> > > > >Do you have any good ideas? :-)
> > > > 
> > > > From my understanding your MSI controller is embedded in the hostbridge,
> > > > right? If so, the MSIs would need to be handled where the host bridge will
> > > > be initialized (e.g either Xen or DOM0).
> > > 
> > > Yes, it is.
> > > 
> > > > 
> > > > From a design point of view, it would make more sense to have the MSI
> > > > controller driver in Xen as the hostbridge emulation for guest will also
> > > > live there.
> > > > 
> > > > So if we receive MSI in Xen, we need to figure out a way for DOM0 and guest
> > > > to receive MSI. The same way would be the best, and I guess non-PV if
> > > > possible. I know you are looking to boot unmodified OS in a VM. This would
> > > > mean we need to emulate the MSI controller and potentially xilinx PCI
> > > > controller. How much are you willing to modify the OS?
> > > 
> > > Today, we have not yet implemented PCIe drivers for our baremetal SDK. So
> > > things are very open and we could design with pretty much anything in mind.
> > > 
> > > Yes, we could perhaps include a very small model with most registers dummied.
> > > Implementing the MSI read FIFO would allow us to:
> > > 
> > > 1. Inject the MSI doorbell SPI into guests. The guest will then see the same
> > >    IRQ as on real HW.
> > > 
> > > 2. Guest reads host-controller registers (MSI FIFO) to get the signaled MSI.
> > > 
> > > 
> > > 
> > > > Regarding the MSI doorbell, I have seen it is configured by the software
> > > > using a physical address of a page allocated in the RAM. When the PCI
> > > > devices is writing into the doorbell does the access go through the SMMU?
> > > 
> > > That's a good question. On our QEMU model it does, but I'll have to dig a little to see if that is the case on real HW aswell.
> > > 
> > > > Regardless the answer, I think we would need to map the MSI doorbell page in
> > > > the guest. Meaning that even if we trap MSI configuration access, a guess
> > > > could DMA in the page. So if I am not mistaken, MSI would be insecure in
> > > > this case :/.
> > > > 
> > > > Or maybe we could avoid mapping the doorbell in the guest and let Xen
> > > > receive an SMMU abort. When receiving the SMMU abort, Xen could sanitize the
> > > > value and write into the real MSI doorbell. Not sure if it would works
> > > > thought.
> > > 
> > > Yeah, this is a problem.
> > > I'm not sure if SMMU aborts would work because I don't think we know the value of the data written when we take the abort.
> > > Without the data, I'm not sure how we would distinguish between different MSI's from the same device.
> > > 
> > > Also, even if the MSI doorbell would be protected by the SMMU, all PCI devices are presented with the same AXI Master ID.
> > 
> > Does that mean that from the SMMU perspective you can only assign them
> > all or none?
> 
> Unfortunately yes.
> 
> 
> > > BTW, this master-ID SMMU limitation is a showstopper for domU guests isn't it?
> > > Or do you have ideas around that? Perhaps some PV way to request mappings for DMA?
> > 
> > No, we don't have anything like that. There are too many device specific
> > ways to request DMAs to do that. For devices that cannot be effectively
> > protected by IOMMU, (on x86) we support assignment but only in an
> > insecure fashion.
> 
> OK, I see.
> 
> A possible hack could be to allocate a chunk of DDR dedicated for PCI DMA.
> PCI DMA devs could be locked in to only be able to access this mem + MSI doorbell.
> Guests can still screw each other up but at least it becomes harder to read/write directly from each others OS memory.
> It may not be worth the effort though....

Actually, we do have the swiotlb in Dom0, which can be used to bounce
DMA requests over a buffer that has been previously set up to be DMA-safe
using a hypercall. That is how the swiotlb is used on x86. On ARM it is
used to issue cache flushes via hypercall, but it could be adapted to do
both. It would degrade performance, due to the additional memcpy, but it
would work, I believe.
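
Conceptually the bounce is just one extra copy through a region that is known
to be safe for device DMA. A minimal sketch, assuming the DMA-safe pool has
already been set up elsewhere (e.g. via a hypercall), and using made-up names
rather than the real swiotlb interfaces:

/*
 * Conceptual sketch of bouncing an outbound DMA buffer through a region
 * that is known to be safe for device DMA.  The pool setup (and whatever
 * hypercall makes it DMA-safe) is assumed to have happened elsewhere; none
 * of these names are real swiotlb or Xen interfaces.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bounce_pool {
    void     *vaddr;     /* kernel mapping of the DMA-safe region       */
    uint64_t  dma_base;  /* bus address the device uses for the region  */
    size_t    size;
    size_t    next;      /* trivial bump allocator, for illustration    */
};

/* Map a driver buffer for a device-bound transfer (this is the extra memcpy). */
uint64_t bounce_map_out(struct bounce_pool *pool, const void *buf, size_t len)
{
    size_t off = pool->next;

    if ( off + len > pool->size )
        return 0;   /* pool exhausted; real code would fall back or fail */

    memcpy((char *)pool->vaddr + off, buf, len);
    pool->next += len;

    return pool->dma_base + off;   /* programmed into the device for DMA */
}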


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-01 18:50           ` Stefano Stabellini
@ 2017-02-10  9:48             ` Roger Pau Monné
  2017-02-10 10:11               ` Paul Durrant
  0 siblings, 1 reply; 82+ messages in thread
From: Roger Pau Monné @ 2017-02-10  9:48 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Punit Agrawal, Anshul Makkar, Wei Chen, Steve Capper,
	Andrew Cooper, Jiandi An, Julien Grall, alistair.francis,
	Paul.Durrant, Shanker Donthineni, xen-devel, manish.jaggi,
	Campbell Sean

On Wed, Feb 01, 2017 at 10:50:49AM -0800, Stefano Stabellini wrote:
> On Wed, 1 Feb 2017, Roger Pau Monné wrote:
> > On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 24/01/17 20:07, Stefano Stabellini wrote:
> > > > On Tue, 24 Jan 2017, Julien Grall wrote:
> > > When using ECAM like host bridge, I don't think it will be an issue to have
> > > both DOM0 and Xen accessing configuration space at the same time. Although,
> > > we need to define who is doing what. In general case, DOM0 should not
> > > touched an assigned PCI device. The only possible interaction would be
> > > resetting a device (see my answer below).
> > 
> > Iff Xen is really going to perform the reset of passthrough devices, then I
> > don't see any reason to expose those devices to Dom0 at all, IMHO you should
> > hide them from ACPI and ideally prevent Dom0 from interacting with them using
> > the PCI configuration space (although that would require trapping on accesses
> > to the PCI config space, which AFAIK you would like to avoid).
> 
> Right! A much cleaner solution! If we are going to have Xen handle ECAM
> and emulating PCI host bridges, then we should go all the way and have
> Xen do everything about PCI.

Replying here because this thread has become so long that it's hard to find a good
place to put this information.

I've recently been told (f2f) that more complex passthrough (like Nvidia vGPU
or Intel XenGT) works in a slightly different way, which seems to be a bit
incompatible with what we are proposing. I've been told that Nvidia vGPU
passthrough requires a driver in Dom0 (closed-source Nvidia code AFAIK), and
that upon loading this driver a bunch of virtual functions appear out of the
blue on the PCI bus.

Now, if we completely hide passed-through devices from Dom0, it would be
impossible to load this driver, and thus to make the virtual functions appear.
I would like someone that's more familiar with this to comment, so I'm adding
Paul and Anshul to the conversation.

To give some context to them: we are currently discussing completely hiding
passed-through PCI devices from Dom0, and having Xen perform the reset of the
device. This would apply to PVH and ARM. Can you comment on whether such an
approach would work with things like vGPU passthrough?

Roger.



* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-10  9:48             ` Roger Pau Monné
@ 2017-02-10 10:11               ` Paul Durrant
  2017-02-10 12:57                 ` Roger Pau Monne
  0 siblings, 1 reply; 82+ messages in thread
From: Paul Durrant @ 2017-02-10 10:11 UTC (permalink / raw)
  To: Roger Pau Monne, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Punit Agrawal, Anshul Makkar, Wei Chen, Steve Capper,
	Andrew Cooper, Jiandi An, Julien Grall, alistair.francis,
	Shanker Donthineni, xen-devel, manish.jaggi, Campbell Sean

> -----Original Message-----
> From: Roger Pau Monne
> Sent: 10 February 2017 09:49
> To: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@linaro.org>; xen-devel <xen-
> devel@lists.xenproject.org>; Edgar Iglesias (edgar.iglesias@xilinx.com)
> <edgar.iglesias@xilinx.com>; Steve Capper <Steve.Capper@arm.com>; Punit
> Agrawal <punit.agrawal@arm.com>; Wei Chen <Wei.Chen@arm.com>;
> Campbell Sean <scampbel@codeaurora.org>; Shanker Donthineni
> <shankerd@codeaurora.org>; Jiandi An <anjiandi@codeaurora.org>;
> manish.jaggi@caviumnetworks.com; alistair.francis@xilinx.com; Andrew
> Cooper <Andrew.Cooper3@citrix.com>; Anshul Makkar
> <anshul.makkar@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>
> Subject: Re: [early RFC] ARM PCI Passthrough design document
> 
> On Wed, Feb 01, 2017 at 10:50:49AM -0800, Stefano Stabellini wrote:
> > On Wed, 1 Feb 2017, Roger Pau Monné wrote:
> > > On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > > Hi Stefano,
> > > >
> > > > On 24/01/17 20:07, Stefano Stabellini wrote:
> > > > > On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > When using ECAM like host bridge, I don't think it will be an issue to
> have
> > > > both DOM0 and Xen accessing configuration space at the same time.
> Although,
> > > > we need to define who is doing what. In general case, DOM0 should
> not
> > > > touched an assigned PCI device. The only possible interaction would be
> > > > resetting a device (see my answer below).
> > >
> > > Iff Xen is really going to perform the reset of passthrough devices, then I
> > > don't see any reason to expose those devices to Dom0 at all, IMHO you
> should
> > > hide them from ACPI and ideally prevent Dom0 from interacting with
> them using
> > > the PCI configuration space (although that would require trapping on
> accesses
> > > to the PCI config space, which AFAIK you would like to avoid).
> >
> > Right! A much cleaner solution! If we are going to have Xen handle ECAM
> > and emulating PCI host bridges, then we should go all the way and have
> > Xen do everything about PCI.
> 
> Replying here because this thread has become so long that's hard to find a
> good
> place to put this information.
> 
> I've recently been told (f2f), that more complex passthrough (like Nvidia
> vGPU
> or Intel XenGT) work in a slightly different way, which seems to be a bit
> incompatible with what we are proposing. I've been told that Nvidia vGPU
> passthrough requires a driver in Dom0 (closed-source Nvidia code AFAIK),
> and
> that upon loading this driver a bunch of virtual functions appear out of the
> blue in the PCI bus.
> 
> Now, if we completely hide passed-through devices from Dom0, it would be
> impossible to load this driver, and thus to make the virtual functions appear.
> I would like someone that's more familiar with this to comment, so I'm
> adding
> Paul and Anshul to the conversation.
> 
> To give some context to them, we were currently discussing to completely
> hide
> passthrough PCI devices from Dom0, and have Xen perform the reset of the
> device. This would apply to PVH and ARM. Can you comment on whether
> such
> approach would work with things like vGPU passthrough?

Neither NVIDIA vGPU nor Intel GVT-g is pass-through. They both use emulation to synthesize GPU devices for guests and then use the actual GPU to service the commands sent by the guest driver to the virtual GPU. So, I think they fall outside the discussion here.
AMD MxGPU is somewhat different in that it is an almost-SRIOV solution. I say 'almost' because the VFs are not truly independent, so some interception of accesses to certain registers is required so that arbitration can be applied, or they can be blocked. In this case a dedicated driver in dom0 is required, and I believe it needs access to both the PF and all the VFs to function correctly. However, once initial set-up is done, I think the VFs could then be hidden from dom0. The PF is never passed through and so there should be no issue in leaving it visible to dom0.

There is a further complication with GVT-d (Intel's term for GPU pass-through) because I believe there is some initial set-up required and some supporting emulation (e.g. Intel's guest driver expects there to be an ISA bridge along with the GPU) which may need access to the real GPU. It is also possible that, once this set-up is done, the GPU can then be hidden from dom0, but I'm not sure because I was not involved with that code.

Full pass-through of NVIDIA and AMD GPUs does not involve access from dom0 at all though, so I don't think there should be any complication there.

Does that all make sense?

  Paul

> 
> Roger.



* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-10 10:11               ` Paul Durrant
@ 2017-02-10 12:57                 ` Roger Pau Monne
  2017-02-10 13:02                   ` Paul Durrant
  0 siblings, 1 reply; 82+ messages in thread
From: Roger Pau Monne @ 2017-02-10 12:57 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Anshul Makkar, Wei Chen, Steve Capper,
	Andrew Cooper, Jiandi An, Punit Agrawal, Julien Grall,
	alistair.francis, Shanker Donthineni, xen-devel, manish.jaggi,
	Campbell Sean

On Fri, Feb 10, 2017 at 10:11:53AM +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Roger Pau Monne
> > Sent: 10 February 2017 09:49
> > To: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Julien Grall <julien.grall@linaro.org>; xen-devel <xen-
> > devel@lists.xenproject.org>; Edgar Iglesias (edgar.iglesias@xilinx.com)
> > <edgar.iglesias@xilinx.com>; Steve Capper <Steve.Capper@arm.com>; Punit
> > Agrawal <punit.agrawal@arm.com>; Wei Chen <Wei.Chen@arm.com>;
> > Campbell Sean <scampbel@codeaurora.org>; Shanker Donthineni
> > <shankerd@codeaurora.org>; Jiandi An <anjiandi@codeaurora.org>;
> > manish.jaggi@caviumnetworks.com; alistair.francis@xilinx.com; Andrew
> > Cooper <Andrew.Cooper3@citrix.com>; Anshul Makkar
> > <anshul.makkar@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>
> > Subject: Re: [early RFC] ARM PCI Passthrough design document
> > 
> > On Wed, Feb 01, 2017 at 10:50:49AM -0800, Stefano Stabellini wrote:
> > > On Wed, 1 Feb 2017, Roger Pau Monné wrote:
> > > > On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > > > Hi Stefano,
> > > > >
> > > > > On 24/01/17 20:07, Stefano Stabellini wrote:
> > > > > > On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > > When using ECAM like host bridge, I don't think it will be an issue to
> > have
> > > > > both DOM0 and Xen accessing configuration space at the same time.
> > Although,
> > > > > we need to define who is doing what. In general case, DOM0 should
> > not
> > > > > touched an assigned PCI device. The only possible interaction would be
> > > > > resetting a device (see my answer below).
> > > >
> > > > Iff Xen is really going to perform the reset of passthrough devices, then I
> > > > don't see any reason to expose those devices to Dom0 at all, IMHO you
> > should
> > > > hide them from ACPI and ideally prevent Dom0 from interacting with
> > them using
> > > > the PCI configuration space (although that would require trapping on
> > accesses
> > > > to the PCI config space, which AFAIK you would like to avoid).
> > >
> > > Right! A much cleaner solution! If we are going to have Xen handle ECAM
> > > and emulating PCI host bridges, then we should go all the way and have
> > > Xen do everything about PCI.
> > 
> > Replying here because this thread has become so long that's hard to find a
> > good
> > place to put this information.
> > 
> > I've recently been told (f2f), that more complex passthrough (like Nvidia
> > vGPU
> > or Intel XenGT) work in a slightly different way, which seems to be a bit
> > incompatible with what we are proposing. I've been told that Nvidia vGPU
> > passthrough requires a driver in Dom0 (closed-source Nvidia code AFAIK),
> > and
> > that upon loading this driver a bunch of virtual functions appear out of the
> > blue in the PCI bus.
> > 
> > Now, if we completely hide passed-through devices from Dom0, it would be
> > impossible to load this driver, and thus to make the virtual functions appear.
> > I would like someone that's more familiar with this to comment, so I'm
> > adding
> > Paul and Anshul to the conversation.
> > 
> > To give some context to them, we were currently discussing to completely
> > hide
> > passthrough PCI devices from Dom0, and have Xen perform the reset of the
> > device. This would apply to PVH and ARM. Can you comment on whether
> > such
> > approach would work with things like vGPU passthrough?
> 
> Neither NVIDIA vGPU nor Intel GVT-g are pass-through. They both use emulation to synthesize GPU devices for guests and then use the actual GPU to service the commands sent by the guest driver to the virtual GPU. So, I think they fall outside the discussion here.

So in this case those devices would simply be assigned to Dom0, and everything
would be trapped/emulated there? (by QEMU or whatever dm we are using)

> AMD MxGPU is somewhat different in that it is an almost-SRIOV solution. I say 'almost' because the VF's are not truly independent and so some interception of accesses to certain registers is required, so that arbitration can be applied, or they can be blocked. In this case a dedicated driver in dom0 is required, and I believe it needs access to both the PF and all the VFs to function correctly. However, once initial set-up is done, I think the VFs could then be hidden from dom0. The PF is never passed-through and so there should be no issue in leaving it visible to dom0.

The approach we were thinking of is hiding everything from Dom0 when it
boots, so that Dom0 would never really see those devices. This would be done by
Xen scanning the PCI bus and any ECAM areas. Devices that first need to be
assigned to Dom0 and then hidden were not part of the approach here.
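
For reference, scanning an ECAM area boils down to probing every
bus/device/function slot and checking the vendor ID; a bare sketch, with the
ECAM accessor and the registration helper left as placeholders:

/*
 * Minimal sketch of walking an ECAM window: the config space address of a
 * bus:device:function is a fixed function of its geographical position.
 * ecam_read32() and handle_new_device() are placeholders.
 */
#include <stdint.h>

extern uint32_t ecam_read32(uint64_t addr);
extern void handle_new_device(unsigned int bus, unsigned int dev,
                              unsigned int fn, uint32_t id);

#define ECAM_OFFSET(bus, dev, fn, reg) \
    (((uint64_t)(bus) << 20) | ((dev) << 15) | ((fn) << 12) | (reg))

void scan_ecam(uint64_t ecam_base, unsigned int start_bus, unsigned int end_bus)
{
    for ( unsigned int bus = start_bus; bus <= end_bus; bus++ )
        for ( unsigned int dev = 0; dev < 32; dev++ )
            for ( unsigned int fn = 0; fn < 8; fn++ )
            {
                uint32_t id = ecam_read32(ecam_base +
                                          ECAM_OFFSET(bus, dev, fn, 0));

                if ( (id & 0xffff) == 0xffff )
                    continue;   /* no function at this slot */

                /* Register segment:bus:dev:fn with Xen's PCI subsystem. */
                handle_new_device(bus, dev, fn, id);
            }
}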

> There is a further complication with GVT-d (Intel's term for GPU pass-through) also because I believe there is also some initial set-up required and some supporting emulation (e.g. Intel's guest driver expects there to be an ISA bridge along with the GPU) which may need access to the real GPU. It is also possible that, once this set-up is done, the GPU can then be hidden from dom0 but I'm not sure because I was not involved with that code.

And then I guess some MMIO regions are assigned to the guest, and some dm
performs the trapping of the accesses to the configuration space?

> Full pass-through of NVIDIA and AMD GPUs does not involve access from dom0 at all though, so I don't think there should be any complication there.

Yes, in that case they would be treated as regular PCI devices, no involvement
from Dom0 would be needed. I'm more worried about these mixed cases, where some
Dom0 interaction is needed in order to perform the passthrough.

> Does that all make sense?

I guess so. Could you please keep an eye on further design documents? Just to
make sure that what's described here would work for the more complex
passthrough scenarios that XenServer supports.

Thanks, Roger.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-10 12:57                 ` Roger Pau Monne
@ 2017-02-10 13:02                   ` Paul Durrant
  2017-02-10 21:04                     ` Stefano Stabellini
  0 siblings, 1 reply; 82+ messages in thread
From: Paul Durrant @ 2017-02-10 13:02 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Anshul Makkar, Wei Chen, Steve Capper,
	Andrew Cooper, Jiandi An, Punit Agrawal, Julien Grall,
	alistair.francis, Shanker Donthineni, xen-devel, manish.jaggi,
	Campbell Sean

> -----Original Message-----
[snip]
> > Neither NVIDIA vGPU nor Intel GVT-g are pass-through. They both use
> emulation to synthesize GPU devices for guests and then use the actual GPU
> to service the commands sent by the guest driver to the virtual GPU. So, I
> think they fall outside the discussion here.
> 
> So in this case those devices would simply be assigned to Dom0, and
> everything
> would be trapped/emulated there? (by QEMU or whatever dm we are using)
> 

Basically, yes. (Actually QEMU isn't the dm in either case).

> > AMD MxGPU is somewhat different in that it is an almost-SRIOV solution. I
> say 'almost' because the VF's are not truly independent and so some
> interception of accesses to certain registers is required, so that arbitration
> can be applied, or they can be blocked. In this case a dedicated driver in
> dom0 is required, and I believe it needs access to both the PF and all the VFs
> to function correctly. However, once initial set-up is done, I think the VFs
> could then be hidden from dom0. The PF is never passed-through and so
> there should be no issue in leaving it visible to dom0.
> 
> The approach we where thinking of is hiding everything from Dom0 when it
> boots, so that Dom0 would never really see those devices. This would be
> done by
> Xen scanning the PCI bus and any ECAM areas. DEvices that first need to be
> assigned to Dom0 and then hidden where not part of the approach here.

That won't work for MxGPU then.

> 
> > There is a further complication with GVT-d (Intel's term for GPU pass-
> through) also because I believe there is also some initial set-up required and
> some supporting emulation (e.g. Intel's guest driver expects there to be an
> ISA bridge along with the GPU) which may need access to the real GPU. It is
> also possible that, once this set-up is done, the GPU can then be hidden from
> dom0 but I'm not sure because I was not involved with that code.
> 
> And then I guess some MMIO regions are assigned to the guest, and some
> dm
> performs the trapping of the accesses to the configuration space?
> 

Well, that's how passthrough to HVM guests works in general at the moment. My point was that there's still some need to see the device in the tools domain before it gets passed through.

> > Full pass-through of NVIDIA and AMD GPUs does not involve access from
> dom0 at all though, so I don't think there should be any complication there.
> 
> Yes, in that case they would be treated as regular PCI devices, no
> involvement
> from Dom0 would be needed. I'm more worried about this mixed cases,
> where some
> Dom0 interaction is needed in order to perform the passthrough.
> 
> > Does that all make sense?
> 
> I guess, could you please keep an eye on further design documents? Just to
> make sure that what's described here would work for the more complex
> passthrough scenarios that XenServer supports.

Ok, I will watch the list more closely for pass-through discussions, but please keep me cc-ed on anything you think may be relevant.

Thanks,

  Paul

> 
> Thanks, Roger.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-10 13:02                   ` Paul Durrant
@ 2017-02-10 21:04                     ` Stefano Stabellini
  0 siblings, 0 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-02-10 21:04 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Punit Agrawal, Steve Capper,
	Andrew Cooper, Jiandi An, Julien Grall, alistair.francis,
	Shanker Donthineni, xen-devel, Anshul Makkar, manish.jaggi,
	Campbell Sean, Roger Pau Monne

On Fri, 10 Feb 2017, Paul Durrant wrote:
> > -----Original Message-----
> [snip]
> > > Neither NVIDIA vGPU nor Intel GVT-g are pass-through. They both use
> > emulation to synthesize GPU devices for guests and then use the actual GPU
> > to service the commands sent by the guest driver to the virtual GPU. So, I
> > think they fall outside the discussion here.
> > 
> > So in this case those devices would simply be assigned to Dom0, and
> > everything
> > would be trapped/emulated there? (by QEMU or whatever dm we are using)
> > 
> 
> Basically, yes. (Actually QEMU isn't the dm in either case).
> 
> > > AMD MxGPU is somewhat different in that it is an almost-SRIOV solution. I
> > say 'almost' because the VF's are not truly independent and so some
> > interception of accesses to certain registers is required, so that arbitration
> > can be applied, or they can be blocked. In this case a dedicated driver in
> > dom0 is required, and I believe it needs access to both the PF and all the VFs
> > to function correctly. However, once initial set-up is done, I think the VFs
> > could then be hidden from dom0. The PF is never passed-through and so
> > there should be no issue in leaving it visible to dom0.
> > 
> > The approach we where thinking of is hiding everything from Dom0 when it
> > boots, so that Dom0 would never really see those devices. This would be
> > done by
> > Xen scanning the PCI bus and any ECAM areas. DEvices that first need to be
> > assigned to Dom0 and then hidden where not part of the approach here.
> 
> That won't work for MxGPU then.
> 
> > 
> > > There is a further complication with GVT-d (Intel's term for GPU pass-
> > through) also because I believe there is also some initial set-up required and
> > some supporting emulation (e.g. Intel's guest driver expects there to be an
> > ISA bridge along with the GPU) which may need access to the real GPU. It is
> > also possible that, once this set-up is done, the GPU can then be hidden from
> > dom0 but I'm not sure because I was not involved with that code.
> > 
> > And then I guess some MMIO regions are assigned to the guest, and some
> > dm
> > performs the trapping of the accesses to the configuration space?
> > 
> 
> Well, that's how passthrough to HVM guests works in general at the moment. My point was that there's still some need to see the device in the tools domain before it gets passed through.

I understand and I think it is OK. Pretty much as you wrote, these are
not passthrough scenarios; they are a sort of hardware-supported
emulated/PV graphics (for lack of a better term), so it's natural for
these devices to be assigned to dom0 (or another backend domain).


> > > Full pass-through of NVIDIA and AMD GPUs does not involve access from
> > dom0 at all though, so I don't think there should be any complication there.
> > 
> > Yes, in that case they would be treated as regular PCI devices, no
> > involvement
> > from Dom0 would be needed. I'm more worried about this mixed cases,
> > where some
> > Dom0 interaction is needed in order to perform the passthrough.
> > 
> > > Does that all make sense?
> > 
> > I guess, could you please keep an eye on further design documents? Just to
> > make sure that what's described here would work for the more complex
> > passthrough scenarios that XenServer supports.
> 
> Ok, I will watch the list more closely for pass-through discussions, but please keep me cc-ed on anything you think may be relevant.

Thank you, Paul


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-02 15:33                 ` Edgar E. Iglesias
  2017-02-02 23:12                   ` Stefano Stabellini
@ 2017-02-13 15:35                   ` Julien Grall
  2017-02-22  4:03                     ` Edgar E. Iglesias
  1 sibling, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-02-13 15:35 UTC (permalink / raw)
  To: Edgar E. Iglesias
  Cc: Stefano Stabellini, Wei Chen, Campbell Sean, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, xen-devel,
	Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

On 02/02/17 15:33, Edgar E. Iglesias wrote:
> On Wed, Feb 01, 2017 at 07:04:43PM +0000, Julien Grall wrote:
>> On 31/01/2017 19:06, Edgar E. Iglesias wrote:
>>> On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
>> Thank you for the documentation. I am trying to understand if we could move
>> initialization in Xen as suggested by Stefano. I looked at the driver in
>> Linux and the code looks simple not many dependencies. However, I was not
>> able to find where the Gigabit Transceivers are configured. Do you have any
>> link to the code for that?
>
> Hi Julien,

Hi Edgar,

>
> I suspect that this setup has previously been done by the initial bootloader
> auto-generated from design configuration tools.
>
> Now, this is moving into Linux.

Do you know why they decided to move the code into Linux? What would be the 
problem with letting the bootloader configure the GT?

> There's a specific driver that does that but AFAICS, it has not been upstreamed yet.
> You can see it here:
> https://github.com/Xilinx/linux-xlnx/blob/master/drivers/phy/phy-zynqmp.c
>
> DTS nodes that need a PHY can then just refer to it, here's an example from SATA:
> &sata {
>         phy-names = "sata-phy";
>         phys = <&lane3 PHY_TYPE_SATA 1 3 150000000>;
> };
>
> I'll see if I can find working examples for PCIe on the ZCU102. Then I'll share
> DTS, Kernel etc.

I've found a device tree on the Xilinx GitHub for the ZCU102 
(zynqmp-zcu102.dts); it looks like there is no use of a PHY for the PCIe 
so far.

Let's imagine that in the future PCIe will use the PHY. If we decide to 
initialize the hostbridge in Xen, we would also have to pull the PHY 
code into the hypervisor. Leaving aside the problem of pulling more code 
into Xen, this is not nice because the PHY is used by different 
components (e.g. SATA, USB). So Xen and DOM0 would have to share the PHY.

From Xen's point of view, the best solution would be the bootloader 
initializing the PHY before starting Xen. So we can keep all the 
hostbridge handling (initialization + access) in Xen.

If that is not possible, then I would prefer to see the hostbridge 
initialization in DOM0.

>
> If you are looking for a platform to get started, an option could be if I get you a build of
> our QEMU that includes models for the PCIe controller, MSI and SMMU connections.
> These models are friendly wrt. PHY configs and initialization sequences, it will
> accept pretty much any sequence and still work. This would allow you to focus on
> architectural issues rather than exact details of init sequences (which we can
> deal with later).

 From my understanding, the problem is where the hostbridge should be 
initialized. In an ideal world, I think this is the job of the 
bootloader. If that is not possible then, depending on the complexity, 
the initialization would have to be done either in Xen or in DOM0.

I guess this could be decided on a case-by-case basis. I will suggest 
different possibilities in the design document.

[...]

>>
>> From a design point of view, it would make more sense to have the MSI
>> controller driver in Xen as the hostbridge emulation for guest will also
>> live there.
>>
>> So if we receive MSI in Xen, we need to figure out a way for DOM0 and guest
>> to receive MSI. The same way would be the best, and I guess non-PV if
>> possible. I know you are looking to boot unmodified OS in a VM. This would
>> mean we need to emulate the MSI controller and potentially xilinx PCI
>> controller. How much are you willing to modify the OS?
>
> Today, we have not yet implemented PCIe drivers for our baremetal SDK. So
> things are very open and we could design with pretty much anything in mind.
>
> Yes, we could perhaps include a very small model with most registers dummied.
> Implementing the MSI read FIFO would allow us to:
>
> 1. Inject the MSI doorbell SPI into guests. The guest will then see the same
>    IRQ as on real HW.
>
> 2. Guest reads host-controller registers (MSI FIFO) to get the signaled MSI.

The Xilinx PCIe hostbridge is not the only hostbridge with an embedded 
MSI controller. So I would like to see a generic solution if possible. 
This would avoid increasing the amount of emulation code required in Xen.

My concern with a FIFO is that it will require an upper bound to avoid 
using too much memory in Xen. What if the FIFO is full? Will you drop MSIs?

>> Regarding the MSI doorbell, I have seen it is configured by the software
>> using a physical address of a page allocated in the RAM. When the PCI
>> devices is writing into the doorbell does the access go through the SMMU?
>
> That's a good question. On our QEMU model it does, but I'll have to dig a little to see if that is the case on real HW aswell.
>
>> Regardless the answer, I think we would need to map the MSI doorbell page in
>> the guest. Meaning that even if we trap MSI configuration access, a guess
>> could DMA in the page. So if I am not mistaken, MSI would be insecure in
>> this case :/.
>>
>> Or maybe we could avoid mapping the doorbell in the guest and let Xen
>> receive an SMMU abort. When receiving the SMMU abort, Xen could sanitize the
>> value and write into the real MSI doorbell. Not sure if it would works
>> thought.
>
> Yeah, this is a problem.
> I'm not sure if SMMU aborts would work because I don't think we know the value of the data written when we take the abort.
> Without the data, I'm not sure how we would distinguish between different MSI's from the same device.

You are right, you don't get the data written and therefore it is not 
possible to distinguish MSIs. I got confused with the data abort trap.

>
> Also, even if the MSI doorbell would be protected by the SMMU, all PCI devices are presented with the same AXI Master ID.
> BTW, this master-ID SMMU limitation is a showstopper for domU guests isn't it?

That limitation is only for your current version of the hardware, correct?

> Or do you have ideas around that? Perhaps some PV way to request mappings for DMA?

Guest memory would have to be direct-mapped as we do for DOM0. However, 
it means the guest should be able to parse the firmware tables (DT, ACPI) 
in order to know where the RAM banks have been positioned.

Cheers,

--
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-10  1:01                       ` Stefano Stabellini
@ 2017-02-13 15:39                         ` Julien Grall
  2017-02-13 19:59                           ` Stefano Stabellini
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-02-13 15:39 UTC (permalink / raw)
  To: Stefano Stabellini, Edgar E. Iglesias
  Cc: Edgar E. Iglesias, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, Campbell Sean,
	xen-devel, manish.jaggi, Shanker Donthineni, Roger Pau Monné

Hi Stefano,

On 10/02/17 01:01, Stefano Stabellini wrote:
> On Fri, 3 Feb 2017, Edgar E. Iglesias wrote:
>> A possible hack could be to allocate a chunk of DDR dedicated for PCI DMA.
>> PCI DMA devs could be locked in to only be able to access this mem + MSI doorbell.
>> Guests can still screw each other up but at least it becomes harder to read/write directly from each others OS memory.
>> It may not be worth the effort though....
>
> Actually, we do have the swiotlb in Dom0, which can be used to bounce
> DMA requests over a buffer that has been previously setup to be DMA safe
> using an hypercall. That is how the swiotlb is used on x86. On ARM it is
> used to issue cache flushes via hypercall, but it could be adapted to do
> both. It would degrade performance, due to the additional memcpy, but it
> would work, I believe.

A while ago, Globallogic suggested using direct memory mapping for the 
guest to allow the guest to use DMA on platforms not supporting an SMMU.

I believe we can use the same trick on platforms where the SMMU cannot 
distinguish PCI devices.
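
A minimal sketch of what such a direct (1:1) mapping amounts to at the p2m 
level, using a made-up helper rather than Xen's actual p2m interfaces:

/*
 * Sketch of direct-mapping a guest RAM bank so that guest frame numbers
 * equal machine frame numbers (gfn == mfn).  With a 1:1 mapping, the
 * addresses the guest programs into a device for DMA are valid bus
 * addresses even without an SMMU.  p2m_map_page() is a placeholder, not
 * Xen's real p2m interface.
 */
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

struct domain;
extern int p2m_map_page(struct domain *d, uint64_t gfn, uint64_t mfn);

int direct_map_bank(struct domain *d, uint64_t bank_start, uint64_t bank_size)
{
    for ( uint64_t addr = bank_start; addr < bank_start + bank_size;
          addr += PAGE_SIZE )
    {
        int rc = p2m_map_page(d, addr >> PAGE_SHIFT, addr >> PAGE_SHIFT);

        if ( rc )
            return rc;
    }

    return 0;
}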

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-02 15:40                 ` Roger Pau Monné
@ 2017-02-13 16:22                   ` Julien Grall
  0 siblings, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-02-13 16:22 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Edgar E. Iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Andrew Cooper, Jiandi An, Punit Agrawal, alistair.francis,
	Shanker Donthineni, xen-devel, manish.jaggi, Campbell Sean

Hi Roger,

On 02/02/17 15:40, Roger Pau Monné wrote:
> On Wed, Feb 01, 2017 at 07:04:43PM +0000, Julien Grall wrote:
>> Or maybe we could avoid mapping the doorbell in the guest and let Xen
>> receive an SMMU abort. When receiving the SMMU abort, Xen could sanitize the
>> value and write into the real MSI doorbell. Not sure if it would works
>> thought.
>
> AFAIK (and I might be wrong) you can only know the address that caused the
> fault, but not the data that was attempted to be written there. TBH, I wouldn't
> expect this approach to work.

You are right, I got confused with the data abort path. So I guess there 
is no way to do secure MSI in this case :/

Cheers,

>
> Roger.
>


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-13 15:39                         ` Julien Grall
@ 2017-02-13 19:59                           ` Stefano Stabellini
  2017-02-14 17:21                             ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-02-13 19:59 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Andrew Cooper, Jiandi An, Punit Agrawal, alistair.francis,
	Campbell Sean, xen-devel, Edgar E. Iglesias, manish.jaggi,
	Shanker Donthineni, Roger Pau Monné

On Mon, 13 Feb 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 10/02/17 01:01, Stefano Stabellini wrote:
> > On Fri, 3 Feb 2017, Edgar E. Iglesias wrote:
> > > A possible hack could be to allocate a chunk of DDR dedicated for PCI DMA.
> > > PCI DMA devs could be locked in to only be able to access this mem + MSI
> > > doorbell.
> > > Guests can still screw each other up but at least it becomes harder to
> > > read/write directly from each others OS memory.
> > > It may not be worth the effort though....
> > 
> > Actually, we do have the swiotlb in Dom0, which can be used to bounce
> > DMA requests over a buffer that has been previously setup to be DMA safe
> > using an hypercall. That is how the swiotlb is used on x86. On ARM it is
> > used to issue cache flushes via hypercall, but it could be adapted to do
> > both. It would degrade performance, due to the additional memcpy, but it
> > would work, I believe.
> 
> A while ago, Globallogic suggested to use direct memory mapping for the guest
> to allow guest using DMA on platform not supporting SMMU.
> 
> I believe we can use the same trick on platform where SMMUs can not
> distinguish PCI devices.

Yes, that would work, but only on platforms with a very limited number
of guests. However, it might still be a very common use-case on a
platform such as the Zynq MPSoC.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-13 19:59                           ` Stefano Stabellini
@ 2017-02-14 17:21                             ` Julien Grall
  2017-02-14 18:20                               ` Stefano Stabellini
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-02-14 17:21 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Edgar E. Iglesias, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, Campbell Sean,
	xen-devel, Edgar E. Iglesias, manish.jaggi, Shanker Donthineni,
	Roger Pau Monné

Hi Stefano,

On 02/13/2017 07:59 PM, Stefano Stabellini wrote:
> On Mon, 13 Feb 2017, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 10/02/17 01:01, Stefano Stabellini wrote:
>>> On Fri, 3 Feb 2017, Edgar E. Iglesias wrote:
>>>> A possible hack could be to allocate a chunk of DDR dedicated for PCI DMA.
>>>> PCI DMA devs could be locked in to only be able to access this mem + MSI
>>>> doorbell.
>>>> Guests can still screw each other up but at least it becomes harder to
>>>> read/write directly from each others OS memory.
>>>> It may not be worth the effort though....
>>>
>>> Actually, we do have the swiotlb in Dom0, which can be used to bounce
>>> DMA requests over a buffer that has been previously setup to be DMA safe
>>> using an hypercall. That is how the swiotlb is used on x86. On ARM it is
>>> used to issue cache flushes via hypercall, but it could be adapted to do
>>> both. It would degrade performance, due to the additional memcpy, but it
>>> would work, I believe.
>>
>> A while ago, Globallogic suggested to use direct memory mapping for the guest
>> to allow guest using DMA on platform not supporting SMMU.
>>
>> I believe we can use the same trick on platform where SMMUs can not
>> distinguish PCI devices.
>
> Yes, that would work, but only on platforms with a very limited number
> of guests. However, it might still be a very common use-case on a
> platform such as the Zynq MPSoC.

Can you explain why you think this could only work with a limited number
of guests?

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-14 17:21                             ` Julien Grall
@ 2017-02-14 18:20                               ` Stefano Stabellini
  2017-02-14 20:18                                 ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-02-14 18:20 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Wei Chen, Steve Capper,
	Andrew Cooper, Jiandi An, Punit Agrawal, alistair.francis,
	Campbell Sean, xen-devel, Edgar E. Iglesias, manish.jaggi,
	Shanker Donthineni, Roger Pau Monné

On Tue, 14 Feb 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 02/13/2017 07:59 PM, Stefano Stabellini wrote:
> > On Mon, 13 Feb 2017, Julien Grall wrote:
> >> Hi Stefano,
> >>
> >> On 10/02/17 01:01, Stefano Stabellini wrote:
> >>> On Fri, 3 Feb 2017, Edgar E. Iglesias wrote:
> >>>> A possible hack could be to allocate a chunk of DDR dedicated for PCI DMA.
> >>>> PCI DMA devs could be locked in to only be able to access this mem + MSI
> >>>> doorbell.
> >>>> Guests can still screw each other up but at least it becomes harder to
> >>>> read/write directly from each others OS memory.
> >>>> It may not be worth the effort though....
> >>>
> >>> Actually, we do have the swiotlb in Dom0, which can be used to bounce
> >>> DMA requests over a buffer that has been previously setup to be DMA safe
> >>> using an hypercall. That is how the swiotlb is used on x86. On ARM it is
> >>> used to issue cache flushes via hypercall, but it could be adapted to do
> >>> both. It would degrade performance, due to the additional memcpy, but it
> >>> would work, I believe.
> >>
> >> A while ago, Globallogic suggested to use direct memory mapping for the guest
> >> to allow guest using DMA on platform not supporting SMMU.
> >>
> >> I believe we can use the same trick on platform where SMMUs can not
> >> distinguish PCI devices.
> >
> > Yes, that would work, but only on platforms with a very limited number
> > of guests. However, it might still be a very common use-case on a
> > platform such as the Zynq MPSoC.
> 
> Can you explain why you think this could only work with limited number
> of guests?

Because the memory regions would need to be mapped 1:1, right? And devices
often have sub-4G DMA address limitations?

I can see how it could work well with 1-4 guests, but I don't think it
could work in a typical server environment with many more guests. Or am
I missing something?


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-14 18:20                               ` Stefano Stabellini
@ 2017-02-14 20:18                                 ` Julien Grall
  0 siblings, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-02-14 20:18 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Edgar E. Iglesias, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, Campbell Sean,
	xen-devel, Edgar E. Iglesias, manish.jaggi, Shanker Donthineni,
	Roger Pau Monné

Hi Stefano,

On 02/14/2017 06:20 PM, Stefano Stabellini wrote:
> On Tue, 14 Feb 2017, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 02/13/2017 07:59 PM, Stefano Stabellini wrote:
>>> On Mon, 13 Feb 2017, Julien Grall wrote:
>>>> Hi Stefano,
>>>>
>>>> On 10/02/17 01:01, Stefano Stabellini wrote:
>>>>> On Fri, 3 Feb 2017, Edgar E. Iglesias wrote:
>>>>>> A possible hack could be to allocate a chunk of DDR dedicated for PCI DMA.
>>>>>> PCI DMA devs could be locked in to only be able to access this mem + MSI
>>>>>> doorbell.
>>>>>> Guests can still screw each other up but at least it becomes harder to
>>>>>> read/write directly from each others OS memory.
>>>>>> It may not be worth the effort though....
>>>>>
>>>>> Actually, we do have the swiotlb in Dom0, which can be used to bounce
>>>>> DMA requests over a buffer that has been previously setup to be DMA safe
>>>>> using an hypercall. That is how the swiotlb is used on x86. On ARM it is
>>>>> used to issue cache flushes via hypercall, but it could be adapted to do
>>>>> both. It would degrade performance, due to the additional memcpy, but it
>>>>> would work, I believe.
>>>>
>>>> A while ago, Globallogic suggested to use direct memory mapping for the guest
>>>> to allow guest using DMA on platform not supporting SMMU.
>>>>
>>>> I believe we can use the same trick on platform where SMMUs can not
>>>> distinguish PCI devices.
>>>
>>> Yes, that would work, but only on platforms with a very limited number
>>> of guests. However, it might still be a very common use-case on a
>>> platform such as the Zynq MPSoC.
>>
>> Can you explain why you think this could only work with limited number
>> of guests?
>
> Because the memory regions would need to be mapped 1:1, right?

Correct. In your case, the DMA buffer would have to be contiguous in 
memory.

> And often
> devices have less than 4G DMA addresses limitations?

Many platforms have more than 4GB of memory today; I would be surprised 
if devices still have this 32-bit DMA address limitation. But maybe I am 
wrong here.

If that is the case, you would still need to have memory set aside below 
4GB for the swiotlb.

>
> I can see how it could work well with 1-4 guests, but I don't think it
> could work in a typical server environment with many more guests. Or am
> I missing something?

I expect all servers to be SBSA compliant, and AFAICT the SBSA mandates 
an SMMU for I/O virtualization (see section 8.6 in ARM-DEN-0029 v3.0).

Furthermore, for embedded use the cost of the swiotlb might not be 
acceptable (you add an extra copy).

In the server case, I would not bother to properly support platforms with 
a broken SMMU. For embedded, I think it would be acceptable to have direct 
mapping.

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-13 15:35                   ` Julien Grall
@ 2017-02-22  4:03                     ` Edgar E. Iglesias
  2017-02-23 16:47                       ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Edgar E. Iglesias @ 2017-02-22  4:03 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Chen, Campbell Sean, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, xen-devel,
	Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper

On Mon, Feb 13, 2017 at 03:35:19PM +0000, Julien Grall wrote:
> On 02/02/17 15:33, Edgar E. Iglesias wrote:
> >On Wed, Feb 01, 2017 at 07:04:43PM +0000, Julien Grall wrote:
> >>On 31/01/2017 19:06, Edgar E. Iglesias wrote:
> >>>On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
> >>Thank you for the documentation. I am trying to understand if we could move
> >>initialization in Xen as suggested by Stefano. I looked at the driver in
> >>Linux and the code looks simple not many dependencies. However, I was not
> >>able to find where the Gigabit Transceivers are configured. Do you have any
> >>link to the code for that?
> >
> >Hi Julien,
> 
> Hi Edgar,

Hi Julien,

Sorry for the late reply..


> 
> >
> >I suspect that this setup has previously been done by the initial bootloader
> >auto-generated from design configuration tools.
> >
> >Now, this is moving into Linux.
> 
> Do you know why they decide to move the code in Linux? What would be the
> problem to let the bootloader configuring the GT?


No, I'm not sure why this approach was not used. The only thing I can think of
is a runtime configuration approach.


> 
> >There's a specific driver that does that but AFAICS, it has not been upstreamed yet.
> >You can see it here:
> >https://github.com/Xilinx/linux-xlnx/blob/master/drivers/phy/phy-zynqmp.c
> >
> >DTS nodes that need a PHY can then just refer to it, here's an example from SATA:
> >&sata {
> >        phy-names = "sata-phy";
> >        phys = <&lane3 PHY_TYPE_SATA 1 3 150000000>;
> >};
> >
> >I'll see if I can find working examples for PCIe on the ZCU102. Then I'll share
> >DTS, Kernel etc.
> 
> I've found a device tree on the github from the ZCU102: zynqmp-zcu102.dts,
> it looks like there is no use of PHY for the pcie so far.
> 
> Lets imagine in the future, pcie will use the PHY. If we decide to
> initialize the hostbridge in Xen, we would also have to pull the PHY code in
> the hypervisor. Leaving aside the problem to pull more code in Xen, this is
> not nice because the PHY is used by different components (e.g SATA, USB). So
> Xen and DOM0 would have to share the PHY.
> 
> For Xen POV, the best solution would be the bootloader initializing the PHY
> because starting Xen. So we can keep all the hostbridge (initialization +
> access) in Xen.
> 
> If it is not possible, then I would prefer to see the hostbridge
> initialization in DOM0.

Yes, I agree that the GT setup in the bootloader is very attractive.
I don't think the setup sequence is complicated; we can perhaps even do it
on the command line in u-boot or xsdb. I'll have to check.


> 
> >
> >If you are looking for a platform to get started, an option could be if I get you a build of
> >our QEMU that includes models for the PCIe controller, MSI and SMMU connections.
> >These models are friendly wrt. PHY configs and initialization sequences, it will
> >accept pretty much any sequence and still work. This would allow you to focus on
> >architectural issues rather than exact details of init sequences (which we can
> >deal with later).
> 
> From my understanding the problem is where the hostbridge should be
> initialized. In an ideal world, I think this is the goal of the bootloader.
> If it is not possible then depending on the complexity, the initialization
> would have to be done either in Xen or DOM0.
> 
> I guess this could be decided on case by case basis. I will suggest
> different possibility in the design document.
> 
> [...]
> 
> >>
> >>From a design point of view, it would make more sense to have the MSI
> >>controller driver in Xen as the hostbridge emulation for guest will also
> >>live there.
> >>
> >>So if we receive MSI in Xen, we need to figure out a way for DOM0 and guest
> >>to receive MSI. The same way would be the best, and I guess non-PV if
> >>possible. I know you are looking to boot unmodified OS in a VM. This would
> >>mean we need to emulate the MSI controller and potentially xilinx PCI
> >>controller. How much are you willing to modify the OS?
> >
> >Today, we have not yet implemented PCIe drivers for our baremetal SDK. So
> >things are very open and we could design with pretty much anything in mind.
> >
> >Yes, we could perhaps include a very small model with most registers dummied.
> >Implementing the MSI read FIFO would allow us to:
> >
> >1. Inject the MSI doorbell SPI into guests. The guest will then see the same
> >   IRQ as on real HW.
> >
> >2. Guest reads host-controller registers (MSI FIFO) to get the signaled MSI.
> 
> The Xilinx PCIe hostbridge is not the only hostbridge having MSI controller
> embedded. So I would like to see a generic solution if possible. This would
> avoid to increase the code required for emulation in Xen.
> 
> My concern with a FIFO is it will require an upper bound to avoid using to
> much memory in Xen. What if the FIFO is full? Will you drop MSI?

The FIFO I'm referring to is a FIFO in the MSI controller itself.
I agree that this wouldn't be generic though...


> 
> >>Regarding the MSI doorbell, I have seen it is configured by the software
> >>using a physical address of a page allocated in the RAM. When the PCI
> >>devices is writing into the doorbell does the access go through the SMMU?
> >
> >That's a good question. On our QEMU model it does, but I'll have to dig a little to see if that is the case on real HW aswell.
> >
> >>Regardless the answer, I think we would need to map the MSI doorbell page in
> >>the guest. Meaning that even if we trap MSI configuration access, a guess
> >>could DMA in the page. So if I am not mistaken, MSI would be insecure in
> >>this case :/.
> >>
> >>Or maybe we could avoid mapping the doorbell in the guest and let Xen
> >>receive an SMMU abort. When receiving the SMMU abort, Xen could sanitize the
> >>value and write into the real MSI doorbell. Not sure if it would works
> >>thought.
> >
> >Yeah, this is a problem.
> >I'm not sure if SMMU aborts would work because I don't think we know the value of the data written when we take the abort.
> >Without the data, I'm not sure how we would distinguish between different MSI's from the same device.
> 
> You are right, you don't get the data written and therefore it is not
> possible to distinguish MSIs. I got confused with the data abort trap.
> 
> >
> >Also, even if the MSI doorbell would be protected by the SMMU, all PCI devices are presented with the same AXI Master ID.
> >BTW, this master-ID SMMU limitation is a showstopper for domU guests isn't it?
> 
> That's limitation is only for your current version of the hardware correct?

Yes :-)


> 
> >Or do you have ideas around that? Perhaps some PV way to request mappings for DMA?
> 
> Guest memory would have to be direct mapped as we do for DOM0. However, it
> means the guest should be able to parse the firmware table (DT, ACPI) in
> order to know where the RAM banks has been positioned.
> 
> Cheers,
> 
> --
> Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-22  4:03                     ` Edgar E. Iglesias
@ 2017-02-23 16:47                       ` Julien Grall
  2017-03-02 21:13                         ` Edgar E. Iglesias
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-02-23 16:47 UTC (permalink / raw)
  To: Edgar E. Iglesias
  Cc: Stefano Stabellini, Wei Chen, Campbell Sean, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, xen-devel,
	Roger Pau Monné,
	manish.jaggi, Shanker Donthineni, Steve Capper


Hi Edgar,

On 22/02/17 04:03, Edgar E. Iglesias wrote:
> On Mon, Feb 13, 2017 at 03:35:19PM +0000, Julien Grall wrote:
>> On 02/02/17 15:33, Edgar E. Iglesias wrote:
>>> On Wed, Feb 01, 2017 at 07:04:43PM +0000, Julien Grall wrote:
>>>> On 31/01/2017 19:06, Edgar E. Iglesias wrote:
>>>>> On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
>>> I'll see if I can find working examples for PCIe on the ZCU102. Then I'll share
>>> DTS, Kernel etc.
>>
>> I've found a device tree on the github from the ZCU102: zynqmp-zcu102.dts,
>> it looks like there is no use of PHY for the pcie so far.
>>
>> Lets imagine in the future, pcie will use the PHY. If we decide to
>> initialize the hostbridge in Xen, we would also have to pull the PHY code in
>> the hypervisor. Leaving aside the problem to pull more code in Xen, this is
>> not nice because the PHY is used by different components (e.g SATA, USB). So
>> Xen and DOM0 would have to share the PHY.
>>
>> For Xen POV, the best solution would be the bootloader initializing the PHY
>> because starting Xen. So we can keep all the hostbridge (initialization +
>> access) in Xen.
>>
>> If it is not possible, then I would prefer to see the hostbridge
>> initialization in DOM0.
>
>>>
>>> I suspect that this setup has previously been done by the initial bootloader
>>> auto-generated from design configuration tools.
>>>
>>> Now, this is moving into Linux.
>>
>> Do you know why they decide to move the code in Linux? What would be the
>> problem to let the bootloader configuring the GT?
>
>
> No, I'm not sure why this approach was not used. The only thing I can think of
> is a runtime configuration approach.
>
>
>>
>>> There's a specific driver that does that but AFAICS, it has not been upstreamed yet.
>>> You can see it here:
>>> https://github.com/Xilinx/linux-xlnx/blob/master/drivers/phy/phy-zynqmp.c
>>>
>>> DTS nodes that need a PHY can then just refer to it, here's an example from SATA:
>>> &sata {
>>>        phy-names = "sata-phy";
>>>        phys = <&lane3 PHY_TYPE_SATA 1 3 150000000>;
>>> };
>>>
> Yes, I agree that the GT setup in the bootloader is very attractive.
> I don't think hte setup sequence is complicated, we can perhaps even do it
> on the commandline in u-boot or xsdb. I'll have to check.

That might simplify things for Xen. I would be happy to consider any 
other solutions. It would probably be worth kicking off a separate thread 
regarding how to support the Xilinx host controller in Xen.

For now, I will explain in the design document the different situations 
we can encounter with a hostbridge and will leave the design for the 
initialization bits open.


[...]

>>>>
>>> >From a design point of view, it would make more sense to have the MSI
>>>> controller driver in Xen as the hostbridge emulation for guest will also
>>>> live there.
>>>>
>>>> So if we receive MSI in Xen, we need to figure out a way for DOM0 and guest
>>>> to receive MSI. The same way would be the best, and I guess non-PV if
>>>> possible. I know you are looking to boot unmodified OS in a VM. This would
>>>> mean we need to emulate the MSI controller and potentially xilinx PCI
>>>> controller. How much are you willing to modify the OS?
>>>
>>> Today, we have not yet implemented PCIe drivers for our baremetal SDK. So
>>> things are very open and we could design with pretty much anything in mind.
>>>
>>> Yes, we could perhaps include a very small model with most registers dummied.
>>> Implementing the MSI read FIFO would allow us to:
>>>
>>> 1. Inject the MSI doorbell SPI into guests. The guest will then see the same
>>>   IRQ as on real HW.
>>>
>>> 2. Guest reads host-controller registers (MSI FIFO) to get the signaled MSI.
>>
>> The Xilinx PCIe hostbridge is not the only hostbridge having MSI controller
>> embedded. So I would like to see a generic solution if possible. This would
>> avoid to increase the code required for emulation in Xen.
>>
>> My concern with a FIFO is it will require an upper bound to avoid using to
>> much memory in Xen. What if the FIFO is full? Will you drop MSI?
>
> The FIFO I'm refering to is a FIFO in the MSI controller itself.

Sorry if that was unclear. I was trying to explain what the issue would 
be with emulating this kind of MSI controller in Xen, not with using it 
in Xen.

> I agree that this wouldn't be generic though....

An idea would be to emulate a GICv2m frame (see appendix E in 
ARM-DEN-0029 v3.0) for the guest. The frame is able to handle a certain 
number of SPIs. Each MSI will be presented as a unique SPI, and the 
SPI <-> MSI association is left at the discretion of the driver.

A guest will discover the number of SPIs by reading the MSI_TYPER 
register. To configure an MSI, the guest will compose the message using 
the GICv2m doorbell (see the MSI_SETSPI_NS register in the frame) and 
the SPI it allocated. As the PCI host bridge will be emulated for the 
guest, any write to the MSI space would be trapped. Xen would then 
allocate a host MSI, compose a new message using the doorbell of the 
Xilinx MSI controller and write it into the host PCI configuration space.

MSIs will be received by the hypervisor, which will look up the domain 
they need to be injected into and inject the corresponding SPI.
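
To make this a bit more concrete, below is a very rough sketch of the 
trap path in C. It is purely illustrative: the helper names are made up 
and do not correspond to existing Xen interfaces.

/* Illustrative sketch only: the helpers below are made-up names. */
#include <stdint.h>

struct vmsi_binding {
    uint32_t guest_spi;   /* SPI the guest picked in the virtual GICv2m frame */
    uint32_t host_msi;    /* MSI allocated on the physical (Xilinx) controller */
};

/* Made-up helpers standing in for the real Xen internals. */
extern uint64_t vgicv2m_doorbell(void);      /* virtual MSI_SETSPI_NS address */
extern uint64_t xilinx_msi_doorbell(void);   /* physical doorbell address     */
extern uint32_t xilinx_msi_alloc(void);
extern int      pci_msi_program(uint32_t sbdf, uint64_t addr, uint32_t data);
extern void     vgic_inject_spi(uint32_t domid, uint32_t spi);

/* Trapped write of a passed-through device's MSI address/data pair. */
static int trap_msi_config_write(uint32_t sbdf, struct vmsi_binding *b,
                                 uint64_t guest_addr, uint32_t guest_data)
{
    if ( guest_addr != vgicv2m_doorbell() )
        return -1;

    /* Remember the SPI the guest chose... */
    b->guest_spi = guest_data;

    /* ...and program the real device with a host MSI instead. */
    b->host_msi = xilinx_msi_alloc();
    return pci_msi_program(sbdf, xilinx_msi_doorbell(), b->host_msi);
}

/* When the physical MSI fires, inject the SPI the guest configured. */
static void handle_host_msi(uint32_t domid, const struct vmsi_binding *b)
{
    vgic_inject_spi(domid, b->guest_spi);
}

Essentially the only per-(device, vector) state Xen would have to keep 
is the guest SPI <-> host MSI association.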

The frame is always 4KB and the MSI doorbell register is embedded in 
it. This means we cannot map the virtual GICv2m MSI doorbell directly 
onto the Xilinx MSI doorbell. The same problem will also arise when 
using a virtual ITS, because a guest may have devices assigned that use 
different physical ITSes. However, each ITS has its own doorbell, so we 
would have to map all the ITS doorbells in the guest as we may not know 
which ITS will be used for hotplugged devices.

To solve this problem, I would suggest having a reserved range in the 
guest address space to map MSI doorbells.

This solution is the most generic I have in mind. The driver for the 
guest is very simple and the amount of emulation required is quite 
limited. Any opinions?

I am also open to any other suggestions.

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-23 16:47                       ` Julien Grall
@ 2017-03-02 21:13                         ` Edgar E. Iglesias
  0 siblings, 0 replies; 82+ messages in thread
From: Edgar E. Iglesias @ 2017-03-02 21:13 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Wei Chen, Campbell Sean,
	Andrew Cooper, Jiandi An, Punit Agrawal, Steve Capper,
	alistair.francis, xen-devel, manish.jaggi, Shanker Donthineni,
	Roger Pau Monné

On Thu, Feb 23, 2017 at 04:47:19PM +0000, Julien Grall wrote:
> 
> Hi Edgar,
> 
> On 22/02/17 04:03, Edgar E. Iglesias wrote:
> >On Mon, Feb 13, 2017 at 03:35:19PM +0000, Julien Grall wrote:
> >>On 02/02/17 15:33, Edgar E. Iglesias wrote:
> >>>On Wed, Feb 01, 2017 at 07:04:43PM +0000, Julien Grall wrote:
> >>>>On 31/01/2017 19:06, Edgar E. Iglesias wrote:
> >>>>>On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
> >>>I'll see if I can find working examples for PCIe on the ZCU102. Then I'll share
> >>>DTS, Kernel etc.
> >>
> >>I've found a device tree on the github from the ZCU102: zynqmp-zcu102.dts,
> >>it looks like there is no use of PHY for the pcie so far.
> >>
> >>Lets imagine in the future, pcie will use the PHY. If we decide to
> >>initialize the hostbridge in Xen, we would also have to pull the PHY code in
> >>the hypervisor. Leaving aside the problem to pull more code in Xen, this is
> >>not nice because the PHY is used by different components (e.g SATA, USB). So
> >>Xen and DOM0 would have to share the PHY.
> >>
> >>For Xen POV, the best solution would be the bootloader initializing the PHY
> >>because starting Xen. So we can keep all the hostbridge (initialization +
> >>access) in Xen.
> >>
> >>If it is not possible, then I would prefer to see the hostbridge
> >>initialization in DOM0.
> >
> >>>
> >>>I suspect that this setup has previously been done by the initial bootloader
> >>>auto-generated from design configuration tools.
> >>>
> >>>Now, this is moving into Linux.
> >>
> >>Do you know why they decide to move the code in Linux? What would be the
> >>problem to let the bootloader configuring the GT?
> >
> >
> >No, I'm not sure why this approach was not used. The only thing I can think of
> >is a runtime configuration approach.
> >
> >
> >>
> >>>There's a specific driver that does that but AFAICS, it has not been upstreamed yet.
> >>>You can see it here:
> >>>https://github.com/Xilinx/linux-xlnx/blob/master/drivers/phy/phy-zynqmp.c
> >>>
> >>>DTS nodes that need a PHY can then just refer to it, here's an example from SATA:
> >>>&sata {
> >>>       phy-names = "sata-phy";
> >>>       phys = <&lane3 PHY_TYPE_SATA 1 3 150000000>;
> >>>};
> >>>
> >Yes, I agree that the GT setup in the bootloader is very attractive.
> >I don't think hte setup sequence is complicated, we can perhaps even do it
> >on the commandline in u-boot or xsdb. I'll have to check.
> 
> That might simplify things for Xen. I would be happy to consider any other
> solutions. It might probably be worth to kick a separate thread regarding
> how to support Xilinx hostcontroller in Xen.
> 
> For now, I will explain in the design document the different situation we
> can encounter with an hostbridge and will leave open the design for
> initialization bits.
> 
> 
> [...]
> 
> >>>>
> >>>>From a design point of view, it would make more sense to have the MSI
> >>>>controller driver in Xen as the hostbridge emulation for guest will also
> >>>>live there.
> >>>>
> >>>>So if we receive MSI in Xen, we need to figure out a way for DOM0 and guest
> >>>>to receive MSI. The same way would be the best, and I guess non-PV if
> >>>>possible. I know you are looking to boot unmodified OS in a VM. This would
> >>>>mean we need to emulate the MSI controller and potentially xilinx PCI
> >>>>controller. How much are you willing to modify the OS?
> >>>
> >>>Today, we have not yet implemented PCIe drivers for our baremetal SDK. So
> >>>things are very open and we could design with pretty much anything in mind.
> >>>
> >>>Yes, we could perhaps include a very small model with most registers dummied.
> >>>Implementing the MSI read FIFO would allow us to:
> >>>
> >>>1. Inject the MSI doorbell SPI into guests. The guest will then see the same
> >>>  IRQ as on real HW.
> >>>
> >>>2. Guest reads host-controller registers (MSI FIFO) to get the signaled MSI.
> >>
> >>The Xilinx PCIe hostbridge is not the only hostbridge having MSI controller
> >>embedded. So I would like to see a generic solution if possible. This would
> >>avoid to increase the code required for emulation in Xen.
> >>
> >>My concern with a FIFO is it will require an upper bound to avoid using to
> >>much memory in Xen. What if the FIFO is full? Will you drop MSI?
> >
> >The FIFO I'm refering to is a FIFO in the MSI controller itself.
> 
> Sorry if it was unclear. I was trying to explain what would be the issue to
> emulate this kind of MSI controller in Xen not using them in Xen.
> 
> >I agree that this wouldn't be generic though....
> 
> An idea would be to emulate a GICv2m frame (see appendix E in ARM-DEN-0029
> v3.0) for the guest. The frame is able to handle a certain number of SPIs.
> Each MSI will be presented as a uniq SPI. The association SPI <-> MSI is
> left at the discretion of the driver.
> 
> A guest will discover the number of SPIs by reading the register MSI_TYPER.
> To initialize MSI, the guest will compose the message using the GICv2m
> doorbell (see register MSI_SETSPI_NS in the frame) and the SPI allocated. As
> the PCI hostbridge will be emulated for the guest, any write to the MSI
> space would be trapped. Then, I would expect Xen to allocate an host MSI,
> compose a new message using the doorbell of the Xilinx MSI controller and
> then write into the host PCI configuration space.
> 
> MSI will be received by the hypervisor that will look-up for the domain
> where it needs to be injected and will inject the SPI configured by the Xen.
> 
> The frame is always 4KB and the msi is embedded in it. This means we cannot
> map the virtual GICv2m MSI doorbell into the Xilinx MSI doorbell. The
> problem will also happen when using virtual ITS because a guest may have
> devices assigned using different physical ITS. However each ITS has it's own
> doorbell, therefore we would have to map all the ITS doorbell in the guest
> as we may not know which ITS will be used for hotplug devices.
> 
> To solve this problem, I would suggest to have a reserved range in the guest
> address space to map MSI doorbell.
> 
> This solution is the most generic I have in mind. The driver for the guest
> is very simple and the amount of emulation required is quite limited. Any
> opinions?

Yes, GICv2m is probably as generic and simple as we can get.
It sounds good as a starting point; if we run into something, we can reconsider.

Thanks,
Edgar



> 
> I am also open to any other suggestions.
> 
> Cheers,
> 
> -- 
> Julien Grall
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel


* Re: [early RFC] ARM PCI Passthrough design document
  2017-02-02 23:06             ` Stefano Stabellini
@ 2017-03-08 19:06               ` Julien Grall
  2017-03-08 19:12                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-03-08 19:06 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Steve Capper, Andrew Cooper, Jiandi An, Punit Agrawal,
	alistair.francis, Shanker Donthineni, xen-devel, manish.jaggi,
	Campbell Sean, Roger Pau Monné

Hi,

On 02/02/17 23:06, Stefano Stabellini wrote:
> On Thu, 2 Feb 2017, Julien Grall wrote:
>> On 01/02/17 10:55, Roger Pau Monné wrote:
>>> On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
>>>> On 24/01/17 20:07, Stefano Stabellini wrote:
>>>>> On Tue, 24 Jan 2017, Julien Grall wrote:
>> For DT, I would have a fallback on mapping the root complex to DOM0 if we
>> don't support it. So DOM0 could still use PCI.
>>
>> For ACPI, I am expecting all the platform ECAM compliant or require few
>> quirks. So I would mandate the support of the root complex in Xen in order to
>> get PCI supported.
>
> Sound good. Ack.

I am currently rewriting the design document to take into account all 
the comments and to follow the path of having the host bridge in Xen, 
with DOM0 getting an emulated one.

I began to look at scanning and configuring PCI devices in Xen. Looking 
at the PCI firmware specification, the firmware is not required to 
configure the BAR registers other than for boot and console devices. 
This means an Operating System (or, in our case, the hypervisor) may 
have to configure some devices.

In order to configure the BAR registers, Xen would need to know where 
the PCI resources are. On ACPI they can be found in the ASL, which Xen 
is not able to parse. In the case of Device Tree, we can retrieve the 
PCI resources using the "ranges" property.
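
The sizing/assignment itself is standard PCI; what is missing is the 
window to allocate from. As a minimal sketch (not actual Xen code; the 
config-space accessors and the bump allocator are assumptions), sizing 
and placing a 32-bit memory BAR inside a window taken from "ranges" 
would look roughly like:

/* Sketch only: size a 32-bit memory BAR and place it inside a host
 * bridge MMIO window discovered from the DT "ranges" property. */
#include <stdint.h>

extern uint32_t pci_conf_read32(uint32_t sbdf, unsigned int reg);
extern void pci_conf_write32(uint32_t sbdf, unsigned int reg, uint32_t val);

#define PCI_BASE_ADDRESS_0         0x10
#define PCI_BASE_ADDRESS_SPACE_IO  0x1
#define PCI_BASE_ADDRESS_MEM_MASK  0xfffffff0u

/* MMIO window parsed from the host bridge's "ranges" property. */
struct pci_mmio_window {
    uint64_t next_free;   /* simple bump allocator for the sketch */
    uint64_t end;
};

static int assign_mem_bar32(uint32_t sbdf, unsigned int bar,
                            struct pci_mmio_window *win)
{
    unsigned int reg = PCI_BASE_ADDRESS_0 + bar * 4;
    uint32_t orig = pci_conf_read32(sbdf, reg);
    uint32_t sz;
    uint64_t addr;

    if ( orig & PCI_BASE_ADDRESS_SPACE_IO )
        return 0;                           /* not a memory BAR */

    /* Standard sizing: write all ones, read back, restore. */
    pci_conf_write32(sbdf, reg, ~0u);
    sz = pci_conf_read32(sbdf, reg) & PCI_BASE_ADDRESS_MEM_MASK;
    pci_conf_write32(sbdf, reg, orig);
    if ( !sz )
        return 0;                           /* unimplemented BAR */
    sz = -sz;                               /* size is a power of two */

    /* Place it, naturally aligned, inside the window from "ranges". */
    addr = (win->next_free + sz - 1) & ~(uint64_t)(sz - 1);
    if ( addr + sz > win->end )
        return -1;
    win->next_free = addr + sz;

    pci_conf_write32(sbdf, reg,
                     (uint32_t)addr | (orig & ~PCI_BASE_ADDRESS_MEM_MASK));
    return 0;
}

The interesting part is therefore not the sizing but where the window 
comes from, hence the options below.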

I can see a couple of solutions:
	1# Rely on DOM0 to do the PCI configuration. This means that DOM0 
should see all the PCI devices, and therefore it will not be possible to 
hide a device from DOM0 even if we know at boot that it will be used by 
a guest (i.e. something similar to pciback.hide but handled directly in 
Xen).
	2# Add an ASL interpreter in Xen. Roger mentioned that OpenBSD has a 
DSDT parser in about 4000 lines (see [1]).

Any opinions?

Cheers,

[1] https://github.com/openbsd/src/blob/master/sys/dev/acpi/dsdt.c

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-08 19:06               ` Julien Grall
@ 2017-03-08 19:12                 ` Konrad Rzeszutek Wilk
  2017-03-08 19:55                   ` Stefano Stabellini
  2017-03-09  2:59                   ` Roger Pau Monné
  0 siblings, 2 replies; 82+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-03-08 19:12 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, Campbell Sean,
	xen-devel, manish.jaggi, Shanker Donthineni, Roger Pau Monné

On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> Hi,
> 
> On 02/02/17 23:06, Stefano Stabellini wrote:
> > On Thu, 2 Feb 2017, Julien Grall wrote:
> > > On 01/02/17 10:55, Roger Pau Monné wrote:
> > > > On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > > > On 24/01/17 20:07, Stefano Stabellini wrote:
> > > > > > On Tue, 24 Jan 2017, Julien Grall wrote:
> > > For DT, I would have a fallback on mapping the root complex to DOM0 if we
> > > don't support it. So DOM0 could still use PCI.
> > > 
> > > For ACPI, I am expecting all the platform ECAM compliant or require few
> > > quirks. So I would mandate the support of the root complex in Xen in order to
> > > get PCI supported.
> > 
> > Sound good. Ack.
> 
> I am currently rewriting the design document to take into account all the
> comments and follow the path to have the host bridge in Xen and DOM0 will
> get an emulated one.
> 
> I began to look at scanning and configuring PCI devices in Xen. Looking at
> the PCI firmware specification, the firmware is not required to configure
> the BAR register other than for boot and console devices. This means an
> Operating System (or the hypervisor in our case) may have to configure some
> devices.
> 
> In order to configure the BAR register, Xen would need to know where are the
> PCI resources. On ACPI they can be found in ASL, however Xen is not able to
> parse it. In the case of Device Tree with can retrieve the PCI resources
> using the property "ranges".
> 
> I can see a couple of solutions:
> 	1# Rely on DOM0 to do the PCI configuration. This means that DOM0 should
> see all the PCI devices and therefore will not be possible to hide from DOM0
> if we know at boot a device will be used by a guest (i.e something similar
> to pciback.hide but directly handled in Xen).

.. this, as for SR-IOV devices you need the drivers to kick the hardware
into generating the new bus addresses. And those (along with the BAR regions)
are not visible in ACPI (they are constructed dynamically).


> 	2# Add an ASL interpreter in Xen. Roger mentioned that openbsd as a DSDT
> parser in 4000 lines (see [1]).
> 
> Any opinions?
> 
> Cheers,
> 
> [1] https://github.com/openbsd/src/blob/master/sys/dev/acpi/dsdt.c
> 
> -- 
> Julien Grall
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-08 19:12                 ` Konrad Rzeszutek Wilk
@ 2017-03-08 19:55                   ` Stefano Stabellini
  2017-03-08 21:51                     ` Julien Grall
  2017-03-09  2:59                   ` Roger Pau Monné
  1 sibling, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-03-08 19:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, alistair.francis, Punit Agrawal,
	Campbell Sean, xen-devel, manish.jaggi, Shanker Donthineni,
	Roger Pau Monné

On Wed, 8 Mar 2017, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > Hi,
> > 
> > On 02/02/17 23:06, Stefano Stabellini wrote:
> > > On Thu, 2 Feb 2017, Julien Grall wrote:
> > > > On 01/02/17 10:55, Roger Pau Monné wrote:
> > > > > On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > > > > On 24/01/17 20:07, Stefano Stabellini wrote:
> > > > > > > On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > For DT, I would have a fallback on mapping the root complex to DOM0 if we
> > > > don't support it. So DOM0 could still use PCI.
> > > > 
> > > > For ACPI, I am expecting all the platform ECAM compliant or require few
> > > > quirks. So I would mandate the support of the root complex in Xen in order to
> > > > get PCI supported.
> > > 
> > > Sound good. Ack.
> > 
> > I am currently rewriting the design document to take into account all the
> > comments and follow the path to have the host bridge in Xen and DOM0 will
> > get an emulated one.
> > 
> > I began to look at scanning and configuring PCI devices in Xen. Looking at
> > the PCI firmware specification, the firmware is not required to configure
> > the BAR register other than for boot and console devices. This means an
> > Operating System (or the hypervisor in our case) may have to configure some
> > devices.
> > 
> > In order to configure the BAR register, Xen would need to know where are the
> > PCI resources. On ACPI they can be found in ASL, however Xen is not able to
> > parse it. In the case of Device Tree with can retrieve the PCI resources
> > using the property "ranges".
> > 
> > I can see a couple of solutions:
> > 	1# Rely on DOM0 to do the PCI configuration. This means that DOM0 should
> > see all the PCI devices and therefore will not be possible to hide from DOM0
> > if we know at boot a device will be used by a guest (i.e something similar
> > to pciback.hide but directly handled in Xen).
> 
> .. this as for SR-IOV devices you need the drivers to kick the hardware
> to generate the new bus addresses. And those (along with the BAR regions) are
> not visible in ACPI (they are constructued dynamically).

Yes indeed. In truth, SR-IOV is a much bigger problem than the BARs. In
reality, the BARs are always set up by the firmware (in all cases I have
seen), but SR-IOV definitely needs the Linux driver to poke the device.
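
For what it's worth, the generic part of that poke is standardized in the
SR-IOV capability: write NumVFs, then set VF Enable, after which the VFs
start answering config cycles. A rough sketch (offsets as in the SR-IOV
spec; the config accessors and sriov_cap_offset() are assumed helpers);
the vendor-specific "sauce" is whatever the PF driver has to do around it:

/* Sketch only: enable VFs through the PF's SR-IOV capability. */
#include <stdint.h>

#define PCI_SRIOV_CTRL        0x08   /* SR-IOV Control */
#define PCI_SRIOV_CTRL_VFE    0x01   /* VF Enable */
#define PCI_SRIOV_TOTAL_VF    0x0e
#define PCI_SRIOV_NUM_VF      0x10

extern uint16_t cfg_read16(uint32_t sbdf, unsigned int reg);
extern void cfg_write16(uint32_t sbdf, unsigned int reg, uint16_t val);
extern unsigned int sriov_cap_offset(uint32_t sbdf);  /* ext. cap 0x0010 */

static int enable_vfs(uint32_t pf_sbdf, uint16_t num_vfs)
{
    unsigned int pos = sriov_cap_offset(pf_sbdf);
    uint16_t ctrl;

    if ( !pos || num_vfs > cfg_read16(pf_sbdf, pos + PCI_SRIOV_TOTAL_VF) )
        return -1;

    cfg_write16(pf_sbdf, pos + PCI_SRIOV_NUM_VF, num_vfs);
    ctrl = cfg_read16(pf_sbdf, pos + PCI_SRIOV_CTRL);
    cfg_write16(pf_sbdf, pos + PCI_SRIOV_CTRL, ctrl | PCI_SRIOV_CTRL_VFE);
    /* After this (plus a short settle delay) the VFs start responding
     * to config cycles; only now does anything "see" them on the bus. */
    return 0;
}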


> > 	2# Add an ASL interpreter in Xen. Roger mentioned that openbsd as a DSDT
> > parser in 4000 lines (see [1]).
> > 
> > Any opinions?


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-08 19:55                   ` Stefano Stabellini
@ 2017-03-08 21:51                     ` Julien Grall
  0 siblings, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-03-08 21:51 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Steve Capper, Andrew Cooper, Jiandi An, Punit Agrawal,
	alistair.francis, Campbell Sean, xen-devel, manish.jaggi,
	Shanker Donthineni, Roger Pau Monné

Hi Stefano,

On 08/03/2017 19:55, Stefano Stabellini wrote:
> On Wed, 8 Mar 2017, Konrad Rzeszutek Wilk wrote:
>> On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
>>> Hi,
>>>
>>> On 02/02/17 23:06, Stefano Stabellini wrote:
>>>> On Thu, 2 Feb 2017, Julien Grall wrote:
>>>>> On 01/02/17 10:55, Roger Pau Monné wrote:
>>>>>> On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
>>>>>>> On 24/01/17 20:07, Stefano Stabellini wrote:
>>>>>>>> On Tue, 24 Jan 2017, Julien Grall wrote:
>>>>> For DT, I would have a fallback on mapping the root complex to DOM0 if we
>>>>> don't support it. So DOM0 could still use PCI.
>>>>>
>>>>> For ACPI, I am expecting all the platform ECAM compliant or require few
>>>>> quirks. So I would mandate the support of the root complex in Xen in order to
>>>>> get PCI supported.
>>>>
>>>> Sound good. Ack.
>>>
>>> I am currently rewriting the design document to take into account all the
>>> comments and follow the path to have the host bridge in Xen and DOM0 will
>>> get an emulated one.
>>>
>>> I began to look at scanning and configuring PCI devices in Xen. Looking at
>>> the PCI firmware specification, the firmware is not required to configure
>>> the BAR register other than for boot and console devices. This means an
>>> Operating System (or the hypervisor in our case) may have to configure some
>>> devices.
>>>
>>> In order to configure the BAR register, Xen would need to know where are the
>>> PCI resources. On ACPI they can be found in ASL, however Xen is not able to
>>> parse it. In the case of Device Tree with can retrieve the PCI resources
>>> using the property "ranges".
>>>
>>> I can see a couple of solutions:
>>> 	1# Rely on DOM0 to do the PCI configuration. This means that DOM0 should
>>> see all the PCI devices and therefore will not be possible to hide from DOM0
>>> if we know at boot a device will be used by a guest (i.e something similar
>>> to pciback.hide but directly handled in Xen).
>>
>> .. this as for SR-IOV devices you need the drivers to kick the hardware
>> to generate the new bus addresses. And those (along with the BAR regions) are
>> not visible in ACPI (they are constructued dynamically).
>
> Yes indeed. In truth, SR-IOV is a much bigger problem than the BARs. In
> reality, the BARs are always setup by the firmware (all cases I have
> seen), but SR-IOV definitely need the Linux driver to poke the device.

The truth is you don't know whether the firmware will configure the 
BARs, as the specification does not require it.

For instance, looking at U-Boot I don't see any PCI BAR initialization. 
And it makes sense: otherwise, why would Linux bother to rescan and 
configure all the PCI devices at boot?

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-08 19:12                 ` Konrad Rzeszutek Wilk
  2017-03-08 19:55                   ` Stefano Stabellini
@ 2017-03-09  2:59                   ` Roger Pau Monné
  2017-03-09 11:17                     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 82+ messages in thread
From: Roger Pau Monné @ 2017-03-09  2:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, alistair.francis, Punit Agrawal,
	Campbell Sean, xen-devel, manish.jaggi, Shanker Donthineni

On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > Hi,
> > 
> > On 02/02/17 23:06, Stefano Stabellini wrote:
> > > On Thu, 2 Feb 2017, Julien Grall wrote:
> > > > On 01/02/17 10:55, Roger Pau Monné wrote:
> > > > > On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > > > > On 24/01/17 20:07, Stefano Stabellini wrote:
> > > > > > > On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > For DT, I would have a fallback on mapping the root complex to DOM0 if we
> > > > don't support it. So DOM0 could still use PCI.
> > > > 
> > > > For ACPI, I am expecting all the platform ECAM compliant or require few
> > > > quirks. So I would mandate the support of the root complex in Xen in order to
> > > > get PCI supported.
> > > 
> > > Sound good. Ack.
> > 
> > I am currently rewriting the design document to take into account all the
> > comments and follow the path to have the host bridge in Xen and DOM0 will
> > get an emulated one.
> > 
> > I began to look at scanning and configuring PCI devices in Xen. Looking at
> > the PCI firmware specification, the firmware is not required to configure
> > the BAR register other than for boot and console devices. This means an
> > Operating System (or the hypervisor in our case) may have to configure some
> > devices.
> > 
> > In order to configure the BAR register, Xen would need to know where are the
> > PCI resources. On ACPI they can be found in ASL, however Xen is not able to
> > parse it. In the case of Device Tree with can retrieve the PCI resources
> > using the property "ranges".
> > 
> > I can see a couple of solutions:
> > 	1# Rely on DOM0 to do the PCI configuration. This means that DOM0 should
> > see all the PCI devices and therefore will not be possible to hide from DOM0
> > if we know at boot a device will be used by a guest (i.e something similar
> > to pciback.hide but directly handled in Xen).
> 
> .. this as for SR-IOV devices you need the drivers to kick the hardware
> to generate the new bus addresses. And those (along with the BAR regions) are
> not visible in ACPI (they are constructued dynamically).

There's already code in Xen [0] to find out the size of the BARs of SR-IOV
devices, but I'm not sure what the intended usage of that is: does it need to
happen _after_ the driver in Dom0 has done whatever magic is needed for this
to work?

Roger.

[0] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/drivers/passthrough/pci.c;h=beddd4270161b9b00b792124a770bbafe398939a;hb=HEAD#l639


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-09  2:59                   ` Roger Pau Monné
@ 2017-03-09 11:17                     ` Konrad Rzeszutek Wilk
  2017-03-09 13:26                       ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-03-09 11:17 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, alistair.francis, Punit Agrawal,
	Campbell Sean, xen-devel, manish.jaggi, Shanker Donthineni

On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monné wrote:
> On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > > Hi,
> > > 
> > > On 02/02/17 23:06, Stefano Stabellini wrote:
> > > > On Thu, 2 Feb 2017, Julien Grall wrote:
> > > > > On 01/02/17 10:55, Roger Pau Monné wrote:
> > > > > > On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > > > > > On 24/01/17 20:07, Stefano Stabellini wrote:
> > > > > > > > On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > > For DT, I would have a fallback on mapping the root complex to DOM0 if we
> > > > > don't support it. So DOM0 could still use PCI.
> > > > > 
> > > > > For ACPI, I am expecting all the platform ECAM compliant or require few
> > > > > quirks. So I would mandate the support of the root complex in Xen in order to
> > > > > get PCI supported.
> > > > 
> > > > Sound good. Ack.
> > > 
> > > I am currently rewriting the design document to take into account all the
> > > comments and follow the path to have the host bridge in Xen and DOM0 will
> > > get an emulated one.
> > > 
> > > I began to look at scanning and configuring PCI devices in Xen. Looking at
> > > the PCI firmware specification, the firmware is not required to configure
> > > the BAR register other than for boot and console devices. This means an
> > > Operating System (or the hypervisor in our case) may have to configure some
> > > devices.
> > > 
> > > In order to configure the BAR register, Xen would need to know where are the
> > > PCI resources. On ACPI they can be found in ASL, however Xen is not able to
> > > parse it. In the case of Device Tree with can retrieve the PCI resources
> > > using the property "ranges".
> > > 
> > > I can see a couple of solutions:
> > > 	1# Rely on DOM0 to do the PCI configuration. This means that DOM0 should
> > > see all the PCI devices and therefore will not be possible to hide from DOM0
> > > if we know at boot a device will be used by a guest (i.e something similar
> > > to pciback.hide but directly handled in Xen).
> > 
> > .. this as for SR-IOV devices you need the drivers to kick the hardware
> > to generate the new bus addresses. And those (along with the BAR regions) are
> > not visible in ACPI (they are constructued dynamically).
> 
> There's already code in Xen [0] to find out the size of the BARs of SR-IOV
> devices, but I'm not sure what's the intended usage of that, does it need to
> happen _after_ the driver in Dom0 has done whatever magic for this to work?

Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
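
For reference, the dom0 side boils down to something like the sketch
below. This is paraphrased from the existing Linux code rather than
copied, so take the details (field names, flags) as approximate:

/*
 * Roughly what dom0 does when reporting a newly created VF to Xen
 * (cf. drivers/xen/pci.c). Needs <linux/pci.h>,
 * <xen/interface/physdev.h> and <asm/xen/hypercall.h>.
 */
static int report_device_to_xen(struct pci_dev *pdev)
{
    struct physdev_pci_device_add add = {
        .seg   = pci_domain_nr(pdev->bus),
        .bus   = pdev->bus->number,
        .devfn = pdev->devfn,
    };

#ifdef CONFIG_PCI_IOV
    if (pdev->is_virtfn) {
        /* Tell Xen this is a VF and which PF it hangs off. */
        add.flags        = XEN_PCI_DEV_VIRTFN;
        add.physfn.bus   = pdev->physfn->bus->number;
        add.physfn.devfn = pdev->physfn->devfn;
    }
#endif

    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
}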
> 
> Roger.
> 
> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/drivers/passthrough/pci.c;h=beddd4270161b9b00b792124a770bbafe398939a;hb=HEAD#l639


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-09 11:17                     ` Konrad Rzeszutek Wilk
@ 2017-03-09 13:26                       ` Julien Grall
  2017-03-10  0:29                         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-03-09 13:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Roger Pau Monné
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, Campbell Sean,
	xen-devel, manish.jaggi, Shanker Donthineni

Hi Konrad,

On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monné wrote:
>> On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> On 02/02/17 23:06, Stefano Stabellini wrote:
>>>>> On Thu, 2 Feb 2017, Julien Grall wrote:
>>>>>> On 01/02/17 10:55, Roger Pau Monné wrote:
>>>>>>> On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
>>>>>>>> On 24/01/17 20:07, Stefano Stabellini wrote:
>>>>>>>>> On Tue, 24 Jan 2017, Julien Grall wrote:
>>>>>> For DT, I would have a fallback on mapping the root complex to DOM0 if we
>>>>>> don't support it. So DOM0 could still use PCI.
>>>>>>
>>>>>> For ACPI, I am expecting all the platform ECAM compliant or require few
>>>>>> quirks. So I would mandate the support of the root complex in Xen in order to
>>>>>> get PCI supported.
>>>>>
>>>>> Sound good. Ack.
>>>>
>>>> I am currently rewriting the design document to take into account all the
>>>> comments and follow the path to have the host bridge in Xen and DOM0 will
>>>> get an emulated one.
>>>>
>>>> I began to look at scanning and configuring PCI devices in Xen. Looking at
>>>> the PCI firmware specification, the firmware is not required to configure
>>>> the BAR register other than for boot and console devices. This means an
>>>> Operating System (or the hypervisor in our case) may have to configure some
>>>> devices.
>>>>
>>>> In order to configure the BAR register, Xen would need to know where are the
>>>> PCI resources. On ACPI they can be found in ASL, however Xen is not able to
>>>> parse it. In the case of Device Tree with can retrieve the PCI resources
>>>> using the property "ranges".
>>>>
>>>> I can see a couple of solutions:
>>>> 	1# Rely on DOM0 to do the PCI configuration. This means that DOM0 should
>>>> see all the PCI devices and therefore will not be possible to hide from DOM0
>>>> if we know at boot a device will be used by a guest (i.e something similar
>>>> to pciback.hide but directly handled in Xen).
>>>
>>> .. this as for SR-IOV devices you need the drivers to kick the hardware
>>> to generate the new bus addresses. And those (along with the BAR regions) are
>>> not visible in ACPI (they are constructued dynamically).
>>
>> There's already code in Xen [0] to find out the size of the BARs of SR-IOV
>> devices, but I'm not sure what's the intended usage of that, does it need to
>> happen _after_ the driver in Dom0 has done whatever magic for this to work?
>
> Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
> the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c

We are thinking of not using the PHYSDEVOP_pci_device_add hypercall for 
ARM and doing the PCI scanning in Xen instead.

If I understand correctly what you said, only the PCI driver will be 
able to kick the SR-IOV device, and Xen would not be able to detect the 
device until it has been fully configured. So it would mean that we have 
to keep PHYSDEVOP_pci_device_add around to know when Xen can use the 
device.

Am I correct?

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-09 13:26                       ` Julien Grall
@ 2017-03-10  0:29                         ` Konrad Rzeszutek Wilk
  2017-03-10  3:23                           ` Roger Pau Monné
  0 siblings, 1 reply; 82+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-03-10  0:29 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Punit Agrawal, alistair.francis, Campbell Sean,
	xen-devel, manish.jaggi, Shanker Donthineni, Roger Pau Monné

On Thu, Mar 09, 2017 at 01:26:45PM +0000, Julien Grall wrote:
> Hi Konrad,
> 
> On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> > On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monné wrote:
> > > On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > > > > Hi,
> > > > > 
> > > > > On 02/02/17 23:06, Stefano Stabellini wrote:
> > > > > > On Thu, 2 Feb 2017, Julien Grall wrote:
> > > > > > > On 01/02/17 10:55, Roger Pau Monné wrote:
> > > > > > > > On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > > > > > > > On 24/01/17 20:07, Stefano Stabellini wrote:
> > > > > > > > > > On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > > > > For DT, I would have a fallback on mapping the root complex to DOM0 if we
> > > > > > > don't support it. So DOM0 could still use PCI.
> > > > > > > 
> > > > > > > For ACPI, I am expecting all the platform ECAM compliant or require few
> > > > > > > quirks. So I would mandate the support of the root complex in Xen in order to
> > > > > > > get PCI supported.
> > > > > > 
> > > > > > Sound good. Ack.
> > > > > 
> > > > > I am currently rewriting the design document to take into account all the
> > > > > comments and follow the path to have the host bridge in Xen and DOM0 will
> > > > > get an emulated one.
> > > > > 
> > > > > I began to look at scanning and configuring PCI devices in Xen. Looking at
> > > > > the PCI firmware specification, the firmware is not required to configure
> > > > > the BAR register other than for boot and console devices. This means an
> > > > > Operating System (or the hypervisor in our case) may have to configure some
> > > > > devices.
> > > > > 
> > > > > In order to configure the BAR register, Xen would need to know where are the
> > > > > PCI resources. On ACPI they can be found in ASL, however Xen is not able to
> > > > > parse it. In the case of Device Tree with can retrieve the PCI resources
> > > > > using the property "ranges".
> > > > > 
> > > > > I can see a couple of solutions:
> > > > > 	1# Rely on DOM0 to do the PCI configuration. This means that DOM0 should
> > > > > see all the PCI devices and therefore will not be possible to hide from DOM0
> > > > > if we know at boot a device will be used by a guest (i.e something similar
> > > > > to pciback.hide but directly handled in Xen).
> > > > 
> > > > .. this as for SR-IOV devices you need the drivers to kick the hardware
> > > > to generate the new bus addresses. And those (along with the BAR regions) are
> > > > not visible in ACPI (they are constructued dynamically).
> > > 
> > > There's already code in Xen [0] to find out the size of the BARs of SR-IOV
> > > devices, but I'm not sure what's the intended usage of that, does it need to
> > > happen _after_ the driver in Dom0 has done whatever magic for this to work?
> > 
> > Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
> > the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
> 
> We are thinking to not use PHYSDEVOP_pci_device_add hypercall for ARM and do
> the PCI scanning in Xen.
> 
> If I understand correctly what you said, only the PCI driver will be able to
> kick SR-IOV device and Xen would not be able to detect the device until it
> has been fully configured. So it would mean that we have to keep
> PHYSDEVOP_pci_device_add around to know when Xen can use the device.
> 
> Am I correct?

Yes. Unless the PCI drivers come up with some other way to tell the
OS that, hey, there is this new PCI device with this BDF.

Or can the underlying bus on ARM send some 'new device' information?

> 
> Cheers,
> 
> -- 
> Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-10  0:29                         ` Konrad Rzeszutek Wilk
@ 2017-03-10  3:23                           ` Roger Pau Monné
  2017-03-10 15:28                             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 82+ messages in thread
From: Roger Pau Monné @ 2017-03-10  3:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, alistair.francis, Punit Agrawal,
	Campbell Sean, xen-devel, manish.jaggi, Shanker Donthineni

On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Mar 09, 2017 at 01:26:45PM +0000, Julien Grall wrote:
> > Hi Konrad,
> > 
> > On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> > > On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monné wrote:
> > > > On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > > > > .. this as for SR-IOV devices you need the drivers to kick the hardware
> > > > > to generate the new bus addresses. And those (along with the BAR regions) are
> > > > > not visible in ACPI (they are constructued dynamically).
> > > > 
> > > > There's already code in Xen [0] to find out the size of the BARs of SR-IOV
> > > > devices, but I'm not sure what's the intended usage of that, does it need to
> > > > happen _after_ the driver in Dom0 has done whatever magic for this to work?
> > > 
> > > Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
> > > the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
> > 
> > We are thinking to not use PHYSDEVOP_pci_device_add hypercall for ARM and do
> > the PCI scanning in Xen.
> > 
> > If I understand correctly what you said, only the PCI driver will be able to
> > kick SR-IOV device and Xen would not be able to detect the device until it
> > has been fully configured. So it would mean that we have to keep
> > PHYSDEVOP_pci_device_add around to know when Xen can use the device.
> > 
> > Am I correct?
> 
> Yes. Unless the PCI drivers come up with some other way to tell the
> OS that oh, hey, there is this new PCI device with this BDF.
> 
> Or the underlaying bus on ARM can send some 'new device' information?

Hm, is this something standard across all the SR-IOV implementations, or does
each vendor have their own sauce?

Would it be possible to do this SR-IOV initialization inside of Xen, or does
that require ACPI information?

Roger.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-10  3:23                           ` Roger Pau Monné
@ 2017-03-10 15:28                             ` Konrad Rzeszutek Wilk
  2017-03-15 12:07                               ` Roger Pau Monné
  0 siblings, 1 reply; 82+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-03-10 15:28 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, alistair.francis, Punit Agrawal,
	Campbell Sean, xen-devel, manish.jaggi, Shanker Donthineni

On Fri, Mar 10, 2017 at 12:23:18PM +0900, Roger Pau Monné wrote:
> On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Thu, Mar 09, 2017 at 01:26:45PM +0000, Julien Grall wrote:
> > > Hi Konrad,
> > > 
> > > On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> > > > On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monné wrote:
> > > > > On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > > > > > .. this as for SR-IOV devices you need the drivers to kick the hardware
> > > > > > to generate the new bus addresses. And those (along with the BAR regions) are
> > > > > > not visible in ACPI (they are constructued dynamically).
> > > > > 
> > > > > There's already code in Xen [0] to find out the size of the BARs of SR-IOV
> > > > > devices, but I'm not sure what's the intended usage of that, does it need to
> > > > > happen _after_ the driver in Dom0 has done whatever magic for this to work?
> > > > 
> > > > Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
> > > > the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
> > > 
> > > We are thinking to not use PHYSDEVOP_pci_device_add hypercall for ARM and do
> > > the PCI scanning in Xen.
> > > 
> > > If I understand correctly what you said, only the PCI driver will be able to
> > > kick SR-IOV device and Xen would not be able to detect the device until it
> > > has been fully configured. So it would mean that we have to keep
> > > PHYSDEVOP_pci_device_add around to know when Xen can use the device.
> > > 
> > > Am I correct?
> > 
> > Yes. Unless the PCI drivers come up with some other way to tell the
> > OS that oh, hey, there is this new PCI device with this BDF.
> > 
> > Or the underlaying bus on ARM can send some 'new device' information?
> 
> Hm, is this something standard between all the SR-IOV implementations, or each
> vendors have their own sauce?

Gosh, all of them have their own sauce. The only thing that is the same
is that suddenly behind the PF device there are PCI devices that are responding
to 0xcfc requests. Magic!
> 
> Would it be possible to do this SR-IOV initialization inside of Xen, or that

Not possible inside of Xen, unless you slurp up DPDK or the Linux
network drivers into it.

> requires ACPI information?

No ACPI information. On x86 there is no hardware notification system. It is
only by the goodwill of the PF driver kicking the Linux OS into 'scanning' the
new PCI addresses and 'discovering' the new devices.


> 
> Roger.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-10 15:28                             ` Konrad Rzeszutek Wilk
@ 2017-03-15 12:07                               ` Roger Pau Monné
  2017-03-15 12:42                                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 82+ messages in thread
From: Roger Pau Monné @ 2017-03-15 12:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, alistair.francis, Punit Agrawal,
	Campbell Sean, xen-devel, manish.jaggi, Shanker Donthineni

On Fri, Mar 10, 2017 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 10, 2017 at 12:23:18PM +0900, Roger Pau Monné wrote:
> > On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Thu, Mar 09, 2017 at 01:26:45PM +0000, Julien Grall wrote:
> > > > Hi Konrad,
> > > > 
> > > > On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> > > > > On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monné wrote:
> > > > > > On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > > > > > > .. this as for SR-IOV devices you need the drivers to kick the hardware
> > > > > > > to generate the new bus addresses. And those (along with the BAR regions) are
> > > > > > > not visible in ACPI (they are constructued dynamically).
> > > > > > 
> > > > > > There's already code in Xen [0] to find out the size of the BARs of SR-IOV
> > > > > > devices, but I'm not sure what's the intended usage of that, does it need to
> > > > > > happen _after_ the driver in Dom0 has done whatever magic for this to work?
> > > > > 
> > > > > Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
> > > > > the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
> > > > 
> > > > We are thinking to not use PHYSDEVOP_pci_device_add hypercall for ARM and do
> > > > the PCI scanning in Xen.
> > > > 
> > > > If I understand correctly what you said, only the PCI driver will be able to
> > > > kick SR-IOV device and Xen would not be able to detect the device until it
> > > > has been fully configured. So it would mean that we have to keep
> > > > PHYSDEVOP_pci_device_add around to know when Xen can use the device.
> > > > 
> > > > Am I correct?
> > > 
> > > Yes. Unless the PCI drivers come up with some other way to tell the
> > > OS that oh, hey, there is this new PCI device with this BDF.
> > > 
> > > Or the underlaying bus on ARM can send some 'new device' information?
> > 
> > Hm, is this something standard between all the SR-IOV implementations, or each
> > vendors have their own sauce?
> 
> Gosh, all of them have their own sauce. The only thing that is the same
> is that suddenly behind the PF device there are PCI devies that are responding
> to 0xcfc requests. MAgic!

I'm reading the PCI SR-IOV 1.1 spec, and I think we don't need to wait for the
device driver in Dom0 in order to get the information about the VF devices.
What Xen cares about is the position of the BARs (so that they can be mapped
into Dom0 at boot) and the PCI SBDF of each PF/VF, so that Xen can trap
accesses to them.

AFAICT both of these can be obtained without any driver-specific code, since
it's all described in the PCI SR-IOV spec (but maybe I'm missing something).
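
Concretely, everything needed should be in the PF's SR-IOV extended
capability. A sketch of pulling the VF routing IDs out of it (offsets as
in the spec; the config accessors are placeholder names, and note that
First VF Offset / VF Stride may change with ARI and the programmed NumVFs):

/* Sketch: derive the VF RIDs from the PF's SR-IOV capability alone. */
#include <stdint.h>
#include <stdio.h>

#define PCI_SRIOV_TOTAL_VF   0x0e
#define PCI_SRIOV_VF_OFFSET  0x14   /* First VF Offset */
#define PCI_SRIOV_VF_STRIDE  0x16

extern uint16_t cfg_read16(uint16_t seg, uint16_t bdf, unsigned int reg);
extern unsigned int sriov_cap_offset(uint16_t seg, uint16_t bdf);

/* bdf is (bus << 8 | devfn), i.e. the routing ID within the segment. */
static void enumerate_vfs(uint16_t seg, uint16_t pf_bdf)
{
    unsigned int pos = sriov_cap_offset(seg, pf_bdf);
    uint16_t total, offset, stride, n;

    if ( !pos )
        return;

    total  = cfg_read16(seg, pf_bdf, pos + PCI_SRIOV_TOTAL_VF);
    offset = cfg_read16(seg, pf_bdf, pos + PCI_SRIOV_VF_OFFSET);
    stride = cfg_read16(seg, pf_bdf, pos + PCI_SRIOV_VF_STRIDE);

    for ( n = 0; n < total; n++ )
    {
        uint16_t vf_bdf = pf_bdf + offset + n * stride;

        /* The VF BARs are not in the VFs themselves: they are the
         * "VF BARx" registers in the PF's capability (pos + 0x24 + 4*bar),
         * sized the usual way; VF n's BAR lives at base + n * size. */
        printf("%04x:%02x:%02x.%x\n", seg, vf_bdf >> 8,
               (vf_bdf >> 3) & 0x1f, vf_bdf & 7);
    }
}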

Roger.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-15 12:07                               ` Roger Pau Monné
@ 2017-03-15 12:42                                 ` Konrad Rzeszutek Wilk
  2017-03-15 12:56                                   ` Roger Pau Monné
  0 siblings, 1 reply; 82+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-03-15 12:42 UTC (permalink / raw)
  To: Roger Pau Monné, venu.busireddy
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, alistair.francis, Punit Agrawal,
	Campbell Sean, xen-devel, manish.jaggi, Shanker Donthineni

On Wed, Mar 15, 2017 at 12:07:28PM +0000, Roger Pau Monné wrote:
> On Fri, Mar 10, 2017 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Mar 10, 2017 at 12:23:18PM +0900, Roger Pau Monné wrote:
> > > On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Thu, Mar 09, 2017 at 01:26:45PM +0000, Julien Grall wrote:
> > > > > Hi Konrad,
> > > > > 
> > > > > On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> > > > > > On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monné wrote:
> > > > > > > On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > > On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > > > > > > > .. this as for SR-IOV devices you need the drivers to kick the hardware
> > > > > > > > to generate the new bus addresses. And those (along with the BAR regions) are
> > > > > > > > not visible in ACPI (they are constructued dynamically).
> > > > > > > 
> > > > > > > There's already code in Xen [0] to find out the size of the BARs of SR-IOV
> > > > > > > devices, but I'm not sure what's the intended usage of that, does it need to
> > > > > > > happen _after_ the driver in Dom0 has done whatever magic for this to work?
> > > > > > 
> > > > > > Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
> > > > > > the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
> > > > > 
> > > > > We are thinking to not use PHYSDEVOP_pci_device_add hypercall for ARM and do
> > > > > the PCI scanning in Xen.
> > > > > 
> > > > > If I understand correctly what you said, only the PCI driver will be able to
> > > > > kick SR-IOV device and Xen would not be able to detect the device until it
> > > > > has been fully configured. So it would mean that we have to keep
> > > > > PHYSDEVOP_pci_device_add around to know when Xen can use the device.
> > > > > 
> > > > > Am I correct?
> > > > 
> > > > Yes. Unless the PCI drivers come up with some other way to tell the
> > > > OS that oh, hey, there is this new PCI device with this BDF.
> > > > 
> > > > Or the underlaying bus on ARM can send some 'new device' information?
> > > 
> > > Hm, is this something standard between all the SR-IOV implementations, or each
> > > vendors have their own sauce?
> > 
> > Gosh, all of them have their own sauce. The only thing that is the same
> > is that suddenly behind the PF device there are PCI devies that are responding
> > to 0xcfc requests. MAgic!
> 
> I'm reading the PCI SR-IOV 1.1 spec, and I think we don't need to wait for the
> device driver in Dom0 in order to get the information of the VF devices, what
> Xen cares about is the position of the BARs (so that they can be mapped into
> Dom0 at boot), and the PCI SBDF of each PF/VF, so that Xen can trap accesses to
> it.
> 
> AFAICT both of this can be obtained without any driver-specific code, since
> it's all contained in the PCI SR-IOV spec (but maybe I'm missing something).

CC-ing Venu,

Roger, could you point out which of the chapters has this?


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-15 12:42                                 ` Konrad Rzeszutek Wilk
@ 2017-03-15 12:56                                   ` Roger Pau Monné
  2017-03-15 15:11                                     ` Venu Busireddy
  0 siblings, 1 reply; 82+ messages in thread
From: Roger Pau Monné @ 2017-03-15 12:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, alistair.francis, venu.busireddy,
	Campbell Sean, xen-devel, manish.jaggi, Shanker Donthineni,
	Punit Agrawal

On Wed, Mar 15, 2017 at 08:42:04AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 15, 2017 at 12:07:28PM +0000, Roger Pau Monné wrote:
> > On Fri, Mar 10, 2017 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Mar 10, 2017 at 12:23:18PM +0900, Roger Pau Monné wrote:
> > > > On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > On Thu, Mar 09, 2017 at 01:26:45PM +0000, Julien Grall wrote:
> > > > > > Hi Konrad,
> > > > > > 
> > > > > > On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> > > > > > > On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monné wrote:
> > > > > > > > On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > > > On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > > > > > > > > .. this as for SR-IOV devices you need the drivers to kick the hardware
> > > > > > > > > to generate the new bus addresses. And those (along with the BAR regions) are
> > > > > > > > > not visible in ACPI (they are constructued dynamically).
> > > > > > > > 
> > > > > > > > There's already code in Xen [0] to find out the size of the BARs of SR-IOV
> > > > > > > > devices, but I'm not sure what's the intended usage of that, does it need to
> > > > > > > > happen _after_ the driver in Dom0 has done whatever magic for this to work?
> > > > > > > 
> > > > > > > Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
> > > > > > > the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
> > > > > > 
> > > > > > We are thinking to not use PHYSDEVOP_pci_device_add hypercall for ARM and do
> > > > > > the PCI scanning in Xen.
> > > > > > 
> > > > > > If I understand correctly what you said, only the PCI driver will be able to
> > > > > > kick SR-IOV device and Xen would not be able to detect the device until it
> > > > > > has been fully configured. So it would mean that we have to keep
> > > > > > PHYSDEVOP_pci_device_add around to know when Xen can use the device.
> > > > > > 
> > > > > > Am I correct?
> > > > > 
> > > > > Yes. Unless the PCI drivers come up with some other way to tell the
> > > > > OS that oh, hey, there is this new PCI device with this BDF.
> > > > > 
> > > > > Or the underlaying bus on ARM can send some 'new device' information?
> > > > 
> > > > Hm, is this something standard between all the SR-IOV implementations, or each
> > > > vendors have their own sauce?
> > > 
> > > Gosh, all of them have their own sauce. The only thing that is the same
> > > is that suddenly behind the PF device there are PCI devies that are responding
> > > to 0xcfc requests. MAgic!
> > 
> > I'm reading the PCI SR-IOV 1.1 spec, and I think we don't need to wait for the
> > device driver in Dom0 in order to get the information of the VF devices, what
> > Xen cares about is the position of the BARs (so that they can be mapped into
> > Dom0 at boot), and the PCI SBDF of each PF/VF, so that Xen can trap accesses to
> > it.
> > 
> > AFAICT both of this can be obtained without any driver-specific code, since
> > it's all contained in the PCI SR-IOV spec (but maybe I'm missing something).
> 
> CC-ing Venu,
> 
> Roger, could you point out which of the chapters has this?

This would be chapter 2 ("Initialization and Resource Allocation"), and then
there's an "IMPLEMENTATION NOTE" on page 45 that shows how the PFs/VFs are
matched to function numbers (I have the following copy, which is the latest
revision: "Single Root I/O Virtualization and Sharing Specification Revision
1.1" from January 20, 2010 [0]).

The document is quite complex, but it is a standard that all SR-IOV devices
should follow, so AFAICT Xen should be able to get all the information it
needs from the PCI config space in order to detect the PF/VF BARs and the BDF
device addresses.

Roger.

[0] https://members.pcisig.com/wg/PCI-SIG/document/download/8238


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-15 12:56                                   ` Roger Pau Monné
@ 2017-03-15 15:11                                     ` Venu Busireddy
  2017-03-15 16:38                                       ` Roger Pau Monn?
  0 siblings, 1 reply; 82+ messages in thread
From: Venu Busireddy @ 2017-03-15 15:11 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, Punit Agrawal, Campbell Sean, xen-devel,
	alistair.francis, manish.jaggi, Shanker Donthineni

On Wed, Mar 15, 2017 at 12:56:50PM +0000, Roger Pau Monné wrote:
> On Wed, Mar 15, 2017 at 08:42:04AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 15, 2017 at 12:07:28PM +0000, Roger Pau Monn? wrote:
> > > On Fri, Mar 10, 2017 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Fri, Mar 10, 2017 at 12:23:18PM +0900, Roger Pau Monn? wrote:
> > > > > On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > On Thu, Mar 09, 2017 at 01:26:45PM +0000, Julien Grall wrote:
> > > > > > > Hi Konrad,
> > > > > > > 
> > > > > > > On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> > > > > > > > On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monn? wrote:
> > > > > > > > > On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > > > > On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > > > > > > > > > .. this as for SR-IOV devices you need the drivers to kick the hardware
> > > > > > > > > > to generate the new bus addresses. And those (along with the BAR regions) are
> > > > > > > > > > not visible in ACPI (they are constructued dynamically).
> > > > > > > > > 
> > > > > > > > > There's already code in Xen [0] to find out the size of the BARs of SR-IOV
> > > > > > > > > devices, but I'm not sure what's the intended usage of that, does it need to
> > > > > > > > > happen _after_ the driver in Dom0 has done whatever magic for this to work?
> > > > > > > > 
> > > > > > > > Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
> > > > > > > > the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
> > > > > > > 
> > > > > > > We are thinking to not use PHYSDEVOP_pci_device_add hypercall for ARM and do
> > > > > > > the PCI scanning in Xen.
> > > > > > > 
> > > > > > > If I understand correctly what you said, only the PCI driver will be able to
> > > > > > > kick SR-IOV device and Xen would not be able to detect the device until it
> > > > > > > has been fully configured. So it would mean that we have to keep
> > > > > > > PHYSDEVOP_pci_device_add around to know when Xen can use the device.
> > > > > > > 
> > > > > > > Am I correct?
> > > > > > 
> > > > > > Yes. Unless the PCI drivers come up with some other way to tell the
> > > > > > OS that oh, hey, there is this new PCI device with this BDF.
> > > > > > 
> > > > > > Or the underlaying bus on ARM can send some 'new device' information?
> > > > > 
> > > > > Hm, is this something standard between all the SR-IOV implementations, or each
> > > > > vendors have their own sauce?
> > > > 
> > > > Gosh, all of them have their own sauce. The only thing that is the same
> > > > is that suddenly behind the PF device there are PCI devies that are responding
> > > > to 0xcfc requests. MAgic!
> > > 
> > > I'm reading the PCI SR-IOV 1.1 spec, and I think we don't need to wait for the
> > > device driver in Dom0 in order to get the information of the VF devices, what
> > > Xen cares about is the position of the BARs (so that they can be mapped into
> > > Dom0 at boot), and the PCI SBDF of each PF/VF, so that Xen can trap accesses to
> > > it.
> > > 
> > > AFAICT both of this can be obtained without any driver-specific code, since
> > > it's all contained in the PCI SR-IOV spec (but maybe I'm missing something).
> > 
> > CC-ing Venu,
> > 
> > Roger, could you point out which of the chapters has this?
> 
> This would be chapter 2 ("Initialization and Resource Allocation"), and then
> there's a "IMPLEMENTATION NOTE" that shows how the PF/VF are matched to
> function numbers in page 45 (I have the following copy, which is the latest
> revision: "Single Root I/O Virtualization and Sharing Specification Revision
> 1.1" from January 20 2010 [0]).
> 
> The document is quite complex, but it is a standard that all SR-IOV devices
> should follow so AFAICT Xen should be able to get all the information that it
> needs from the PCI config space in order to detect the PF/VF BARs and the BDF
> device addresses.
> 
> Roger.
> 
> [0] https://members.pcisig.com/wg/PCI-SIG/document/download/8238

I do not have access to this document, so I have to rely on the Rev 1.0
document, but I don't think this aspect of the spec has changed much.

In any case, I am afraid I am not seeing the overall picture, but I
would like to comment on the last part of this discussion. Indeed, the
configuration space (including the SR-IOV extended capability) contains
all the information, but only the information necessary for the OS to
"enumerate" the device (PF as well as VFs). The bus and device number
(SBDF) assignment, and programming of the BARs, are all done during that
enumeration. In this discussion, which entity is doing the enumeration?
Xen, or Dom0?

If Xen needs to map the BAR positions into Dom0, then Xen must enumerate
the device, program the BARs, and then map the BAR positions to Dom0. If
Xen waits until Dom0 enumerated the device, then the BAR positions are
already within the Dom0's memory space! No further mapping is needed,
right? What am I missing?

Venu



* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-15 15:11                                     ` Venu Busireddy
@ 2017-03-15 16:38                                       ` Roger Pau Monn?
  2017-03-15 16:54                                         ` Venu Busireddy
  2017-05-03 12:53                                         ` Julien Grall
  0 siblings, 2 replies; 82+ messages in thread
From: Roger Pau Monné @ 2017-03-15 16:38 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, Punit Agrawal, Campbell Sean, xen-devel,
	alistair.francis, manish.jaggi, Shanker Donthineni

On Wed, Mar 15, 2017 at 10:11:35AM -0500, Venu Busireddy wrote:
> On Wed, Mar 15, 2017 at 12:56:50PM +0000, Roger Pau Monn? wrote:
> > On Wed, Mar 15, 2017 at 08:42:04AM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Mar 15, 2017 at 12:07:28PM +0000, Roger Pau Monn? wrote:
> > > > On Fri, Mar 10, 2017 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > On Fri, Mar 10, 2017 at 12:23:18PM +0900, Roger Pau Monn? wrote:
> > > > > > On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > On Thu, Mar 09, 2017 at 01:26:45PM +0000, Julien Grall wrote:
> > > > > > > > Hi Konrad,
> > > > > > > > 
> > > > > > > > On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> > > > > > > > > On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monn? wrote:
> > > > > > > > > > On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > > > > > On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > > > > > > > > > > .. this as for SR-IOV devices you need the drivers to kick the hardware
> > > > > > > > > > > to generate the new bus addresses. And those (along with the BAR regions) are
> > > > > > > > > > > not visible in ACPI (they are constructued dynamically).
> > > > > > > > > > 
> > > > > > > > > > There's already code in Xen [0] to find out the size of the BARs of SR-IOV
> > > > > > > > > > devices, but I'm not sure what's the intended usage of that, does it need to
> > > > > > > > > > happen _after_ the driver in Dom0 has done whatever magic for this to work?
> > > > > > > > > 
> > > > > > > > > Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
> > > > > > > > > the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
> > > > > > > > 
> > > > > > > > We are thinking to not use PHYSDEVOP_pci_device_add hypercall for ARM and do
> > > > > > > > the PCI scanning in Xen.
> > > > > > > > 
> > > > > > > > If I understand correctly what you said, only the PCI driver will be able to
> > > > > > > > kick SR-IOV device and Xen would not be able to detect the device until it
> > > > > > > > has been fully configured. So it would mean that we have to keep
> > > > > > > > PHYSDEVOP_pci_device_add around to know when Xen can use the device.
> > > > > > > > 
> > > > > > > > Am I correct?
> > > > > > > 
> > > > > > > Yes. Unless the PCI drivers come up with some other way to tell the
> > > > > > > OS that oh, hey, there is this new PCI device with this BDF.
> > > > > > > 
> > > > > > > Or the underlaying bus on ARM can send some 'new device' information?
> > > > > > 
> > > > > > Hm, is this something standard between all the SR-IOV implementations, or each
> > > > > > vendors have their own sauce?
> > > > > 
> > > > > Gosh, all of them have their own sauce. The only thing that is the same
> > > > > is that suddenly behind the PF device there are PCI devies that are responding
> > > > > to 0xcfc requests. MAgic!
> > > > 
> > > > I'm reading the PCI SR-IOV 1.1 spec, and I think we don't need to wait for the
> > > > device driver in Dom0 in order to get the information of the VF devices, what
> > > > Xen cares about is the position of the BARs (so that they can be mapped into
> > > > Dom0 at boot), and the PCI SBDF of each PF/VF, so that Xen can trap accesses to
> > > > it.
> > > > 
> > > > AFAICT both of this can be obtained without any driver-specific code, since
> > > > it's all contained in the PCI SR-IOV spec (but maybe I'm missing something).
> > > 
> > > CC-ing Venu,
> > > 
> > > Roger, could you point out which of the chapters has this?
> > 
> > This would be chapter 2 ("Initialization and Resource Allocation"), and then
> > there's a "IMPLEMENTATION NOTE" that shows how the PF/VF are matched to
> > function numbers in page 45 (I have the following copy, which is the latest
> > revision: "Single Root I/O Virtualization and Sharing Specification Revision
> > 1.1" from January 20 2010 [0]).
> > 
> > The document is quite complex, but it is a standard that all SR-IOV devices
> > should follow so AFAICT Xen should be able to get all the information that it
> > needs from the PCI config space in order to detect the PF/VF BARs and the BDF
> > device addresses.
> > 
> > Roger.
> > 
> > [0] https://members.pcisig.com/wg/PCI-SIG/document/download/8238
> 
> I do not have access to this document, so I have to rely on Rev 1.0
> document, but I don't think this aspect of the spec changed much.
> 
> In any case, I am afraid I am not seeing the overall picture, but I
> would like to comment on the last part of this discussion. Indeed, the
> configuration space (including the SR-IOV extended capability) contains
> all the information, but only the information necessary for the OS to
> "enumerate" the device (PF as well as VFs). The bus and device number
> (SBDF) assignment, and programming of the BARs, are all done during that
> enumeration. In this discussion, which entity is doing the enumeration?
> Xen, or Dom0?

Xen needs to let Dom0 manage the device, but at the same time it needs to
correctly map the device BARs into the Dom0 physmap. I think the easiest
solution is to let Dom0 manage the device, and Xen should set up a trap to
detect Dom0 setting the VF Enable bit (bit 0 in SR-IOV Control (08h)), at
which point Xen will size the VF BARs (and map them into Dom0) and also
enumerate the VF devices.
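
Very roughly (all names below are invented for the sketch, and any settling
delay the spec requires after setting VF Enable is ignored), the trap could
look like:

#include <stdint.h>

#define SRIOV_CTRL      0x08
#define SRIOV_CTRL_VFE  0x01    /* VF Enable */
#define SRIOV_NUM_VF    0x10

/* Stand-ins for Xen's config space accessors and SR-IOV helpers. */
extern uint16_t cfg_read16(uint16_t rid, unsigned int reg);
extern void cfg_write16(uint16_t rid, unsigned int reg, uint16_t val);
extern void sriov_size_vf_bars(uint16_t pf_rid, unsigned int cap);
extern void sriov_map_vf_bars_to_dom0(uint16_t pf_rid, unsigned int cap);
extern void sriov_add_vfs(uint16_t pf_rid, unsigned int cap, unsigned int nr);

/* Called when Dom0 writes the SR-IOV Control register of a PF, where
 * 'cap' is the offset of the PF's SR-IOV extended capability. */
static void sriov_ctrl_write(uint16_t pf_rid, unsigned int cap, uint16_t val)
{
    uint16_t old = cfg_read16(pf_rid, cap + SRIOV_CTRL);

    cfg_write16(pf_rid, cap + SRIOV_CTRL, val);    /* forward to hardware */

    /* VF Enable went from 0 to 1: the VFs now exist on the bus. */
    if ( !(old & SRIOV_CTRL_VFE) && (val & SRIOV_CTRL_VFE) )
    {
        uint16_t nr = cfg_read16(pf_rid, cap + SRIOV_NUM_VF);

        sriov_size_vf_bars(pf_rid, cap);         /* sizing cycles on VF BARs */
        sriov_map_vf_bars_to_dom0(pf_rid, cap);  /* stage-2 mappings for Dom0 */
        sriov_add_vfs(pf_rid, cap, nr);          /* register VF SBDFs in Xen */
    }
}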

Roger.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-15 16:38                                       ` Roger Pau Monn?
@ 2017-03-15 16:54                                         ` Venu Busireddy
  2017-03-15 17:00                                           ` Roger Pau Monn?
  2017-05-03 12:53                                         ` Julien Grall
  1 sibling, 1 reply; 82+ messages in thread
From: Venu Busireddy @ 2017-03-15 16:54 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, Punit Agrawal, Campbell Sean, xen-devel,
	alistair.francis, manish.jaggi, Shanker Donthineni

On Wed, Mar 15, 2017 at 04:38:39PM +0000, Roger Pau Monné wrote:
> On Wed, Mar 15, 2017 at 10:11:35AM -0500, Venu Busireddy wrote:
> > On Wed, Mar 15, 2017 at 12:56:50PM +0000, Roger Pau Monn? wrote:
> > > On Wed, Mar 15, 2017 at 08:42:04AM -0400, Konrad Rzeszutek Wilk wrote:
> > > > On Wed, Mar 15, 2017 at 12:07:28PM +0000, Roger Pau Monn? wrote:
> > > > > On Fri, Mar 10, 2017 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > On Fri, Mar 10, 2017 at 12:23:18PM +0900, Roger Pau Monn? wrote:
> > > > > > > On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > > On Thu, Mar 09, 2017 at 01:26:45PM +0000, Julien Grall wrote:
> > > > > > > > > Hi Konrad,
> > > > > > > > > 
> > > > > > > > > On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> > > > > > > > > > On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monn? wrote:
> > > > > > > > > > > On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > > > > > > On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > > > > > > > > > > > .. this as for SR-IOV devices you need the drivers to kick the hardware
> > > > > > > > > > > > to generate the new bus addresses. And those (along with the BAR regions) are
> > > > > > > > > > > > not visible in ACPI (they are constructued dynamically).
> > > > > > > > > > > 
> > > > > > > > > > > There's already code in Xen [0] to find out the size of the BARs of SR-IOV
> > > > > > > > > > > devices, but I'm not sure what's the intended usage of that, does it need to
> > > > > > > > > > > happen _after_ the driver in Dom0 has done whatever magic for this to work?
> > > > > > > > > > 
> > > > > > > > > > Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
> > > > > > > > > > the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
> > > > > > > > > 
> > > > > > > > > We are thinking to not use PHYSDEVOP_pci_device_add hypercall for ARM and do
> > > > > > > > > the PCI scanning in Xen.
> > > > > > > > > 
> > > > > > > > > If I understand correctly what you said, only the PCI driver will be able to
> > > > > > > > > kick SR-IOV device and Xen would not be able to detect the device until it
> > > > > > > > > has been fully configured. So it would mean that we have to keep
> > > > > > > > > PHYSDEVOP_pci_device_add around to know when Xen can use the device.
> > > > > > > > > 
> > > > > > > > > Am I correct?
> > > > > > > > 
> > > > > > > > Yes. Unless the PCI drivers come up with some other way to tell the
> > > > > > > > OS that oh, hey, there is this new PCI device with this BDF.
> > > > > > > > 
> > > > > > > > Or the underlaying bus on ARM can send some 'new device' information?
> > > > > > > 
> > > > > > > Hm, is this something standard between all the SR-IOV implementations, or each
> > > > > > > vendors have their own sauce?
> > > > > > 
> > > > > > Gosh, all of them have their own sauce. The only thing that is the same
> > > > > > is that suddenly behind the PF device there are PCI devies that are responding
> > > > > > to 0xcfc requests. MAgic!
> > > > > 
> > > > > I'm reading the PCI SR-IOV 1.1 spec, and I think we don't need to wait for the
> > > > > device driver in Dom0 in order to get the information of the VF devices, what
> > > > > Xen cares about is the position of the BARs (so that they can be mapped into
> > > > > Dom0 at boot), and the PCI SBDF of each PF/VF, so that Xen can trap accesses to
> > > > > it.
> > > > > 
> > > > > AFAICT both of this can be obtained without any driver-specific code, since
> > > > > it's all contained in the PCI SR-IOV spec (but maybe I'm missing something).
> > > > 
> > > > CC-ing Venu,
> > > > 
> > > > Roger, could you point out which of the chapters has this?
> > > 
> > > This would be chapter 2 ("Initialization and Resource Allocation"), and then
> > > there's a "IMPLEMENTATION NOTE" that shows how the PF/VF are matched to
> > > function numbers in page 45 (I have the following copy, which is the latest
> > > revision: "Single Root I/O Virtualization and Sharing Specification Revision
> > > 1.1" from January 20 2010 [0]).
> > > 
> > > The document is quite complex, but it is a standard that all SR-IOV devices
> > > should follow so AFAICT Xen should be able to get all the information that it
> > > needs from the PCI config space in order to detect the PF/VF BARs and the BDF
> > > device addresses.
> > > 
> > > Roger.
> > > 
> > > [0] https://members.pcisig.com/wg/PCI-SIG/document/download/8238
> > 
> > I do not have access to this document, so I have to rely on Rev 1.0
> > document, but I don't think this aspect of the spec changed much.
> > 
> > In any case, I am afraid I am not seeing the overall picture, but I
> > would like to comment on the last part of this discussion. Indeed, the
> > configuration space (including the SR-IOV extended capability) contains
> > all the information, but only the information necessary for the OS to
> > "enumerate" the device (PF as well as VFs). The bus and device number
> > (SBDF) assignment, and programming of the BARs, are all done during that
> > enumeration. In this discussion, which entity is doing the enumeration?
> > Xen, or Dom0?
> 
> Xen needs to let Dom0 manage the device, but at the same time it needs to
> correctly map the device BARs into Dom0 physmap. I think the easiest solution
> is to let Dom0 manage the device, and Xen should setup a trap to detect Dom0
> setting the VF Enable bit (bit 0 in SR-IOV Control (08h)), at which point Xen
> will size the VF BARs (and map them into Dom0) and also enumerate the VF
> devices.

There was a second part in my earlier email. Copied below:

"If Xen waits until Dom0 enumerated the device, then the BAR positions
are already within the Dom0's memory space! No further mapping is needed,
right?"

As I asked, if Dom0 enumerates the device and programs the BARs, the BAR
regions are already in Dom0's physical memory! What further mapping is
needed? What am I missing?

Venu




* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-15 16:54                                         ` Venu Busireddy
@ 2017-03-15 17:00                                           ` Roger Pau Monn?
  2017-05-03 12:38                                             ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Roger Pau Monné @ 2017-03-15 17:00 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, Julien Grall, Punit Agrawal, Campbell Sean, xen-devel,
	alistair.francis, manish.jaggi, Shanker Donthineni

On Wed, Mar 15, 2017 at 11:54:07AM -0500, Venu Busireddy wrote:
> On Wed, Mar 15, 2017 at 04:38:39PM +0000, Roger Pau Monn? wrote:
> > On Wed, Mar 15, 2017 at 10:11:35AM -0500, Venu Busireddy wrote:
> > > On Wed, Mar 15, 2017 at 12:56:50PM +0000, Roger Pau Monn? wrote:
> > > > On Wed, Mar 15, 2017 at 08:42:04AM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > On Wed, Mar 15, 2017 at 12:07:28PM +0000, Roger Pau Monn? wrote:
> > > > > > On Fri, Mar 10, 2017 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > On Fri, Mar 10, 2017 at 12:23:18PM +0900, Roger Pau Monn? wrote:
> > > > > > > > On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > > > On Thu, Mar 09, 2017 at 01:26:45PM +0000, Julien Grall wrote:
> > > > > > > > > > Hi Konrad,
> > > > > > > > > > 
> > > > > > > > > > On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
> > > > > > > > > > > On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monn? wrote:
> > > > > > > > > > > > On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > > > > > > > On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
> > > > > > > > > > > > > .. this as for SR-IOV devices you need the drivers to kick the hardware
> > > > > > > > > > > > > to generate the new bus addresses. And those (along with the BAR regions) are
> > > > > > > > > > > > > not visible in ACPI (they are constructued dynamically).
> > > > > > > > > > > > 
> > > > > > > > > > > > There's already code in Xen [0] to find out the size of the BARs of SR-IOV
> > > > > > > > > > > > devices, but I'm not sure what's the intended usage of that, does it need to
> > > > > > > > > > > > happen _after_ the driver in Dom0 has done whatever magic for this to work?
> > > > > > > > > > > 
> > > > > > > > > > > Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
> > > > > > > > > > > the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
> > > > > > > > > > 
> > > > > > > > > > We are thinking to not use PHYSDEVOP_pci_device_add hypercall for ARM and do
> > > > > > > > > > the PCI scanning in Xen.
> > > > > > > > > > 
> > > > > > > > > > If I understand correctly what you said, only the PCI driver will be able to
> > > > > > > > > > kick SR-IOV device and Xen would not be able to detect the device until it
> > > > > > > > > > has been fully configured. So it would mean that we have to keep
> > > > > > > > > > PHYSDEVOP_pci_device_add around to know when Xen can use the device.
> > > > > > > > > > 
> > > > > > > > > > Am I correct?
> > > > > > > > > 
> > > > > > > > > Yes. Unless the PCI drivers come up with some other way to tell the
> > > > > > > > > OS that oh, hey, there is this new PCI device with this BDF.
> > > > > > > > > 
> > > > > > > > > Or the underlaying bus on ARM can send some 'new device' information?
> > > > > > > > 
> > > > > > > > Hm, is this something standard between all the SR-IOV implementations, or each
> > > > > > > > vendors have their own sauce?
> > > > > > > 
> > > > > > > Gosh, all of them have their own sauce. The only thing that is the same
> > > > > > > is that suddenly behind the PF device there are PCI devies that are responding
> > > > > > > to 0xcfc requests. MAgic!
> > > > > > 
> > > > > > I'm reading the PCI SR-IOV 1.1 spec, and I think we don't need to wait for the
> > > > > > device driver in Dom0 in order to get the information of the VF devices, what
> > > > > > Xen cares about is the position of the BARs (so that they can be mapped into
> > > > > > Dom0 at boot), and the PCI SBDF of each PF/VF, so that Xen can trap accesses to
> > > > > > it.
> > > > > > 
> > > > > > AFAICT both of this can be obtained without any driver-specific code, since
> > > > > > it's all contained in the PCI SR-IOV spec (but maybe I'm missing something).
> > > > > 
> > > > > CC-ing Venu,
> > > > > 
> > > > > Roger, could you point out which of the chapters has this?
> > > > 
> > > > This would be chapter 2 ("Initialization and Resource Allocation"), and then
> > > > there's a "IMPLEMENTATION NOTE" that shows how the PF/VF are matched to
> > > > function numbers in page 45 (I have the following copy, which is the latest
> > > > revision: "Single Root I/O Virtualization and Sharing Specification Revision
> > > > 1.1" from January 20 2010 [0]).
> > > > 
> > > > The document is quite complex, but it is a standard that all SR-IOV devices
> > > > should follow so AFAICT Xen should be able to get all the information that it
> > > > needs from the PCI config space in order to detect the PF/VF BARs and the BDF
> > > > device addresses.
> > > > 
> > > > Roger.
> > > > 
> > > > [0] https://members.pcisig.com/wg/PCI-SIG/document/download/8238
> > > 
> > > I do not have access to this document, so I have to rely on Rev 1.0
> > > document, but I don't think this aspect of the spec changed much.
> > > 
> > > In any case, I am afraid I am not seeing the overall picture, but I
> > > would like to comment on the last part of this discussion. Indeed, the
> > > configuration space (including the SR-IOV extended capability) contains
> > > all the information, but only the information necessary for the OS to
> > > "enumerate" the device (PF as well as VFs). The bus and device number
> > > (SBDF) assignment, and programming of the BARs, are all done during that
> > > enumeration. In this discussion, which entity is doing the enumeration?
> > > Xen, or Dom0?
> > 
> > Xen needs to let Dom0 manage the device, but at the same time it needs to
> > correctly map the device BARs into Dom0 physmap. I think the easiest solution
> > is to let Dom0 manage the device, and Xen should setup a trap to detect Dom0
> > setting the VF Enable bit (bit 0 in SR-IOV Control (08h)), at which point Xen
> > will size the VF BARs (and map them into Dom0) and also enumerate the VF
> > devices.
> 
> There was a second part in my earlier email. Copied below:
> 
> "If Xen waits until Dom0 enumerated the device, then the BAR positions
> are already within the Dom0's memory space! No further mapping is needed,
> right?"
> 
> As I asked, if Dom0 enumerates the device and programs the BARs, the BAR
> regions are already in Dom0's physical memory! What further mapping is
> needed? What am I missing?

No, that's not true. Xen is the one that performs the mapping into Dom0 physmap
on PVH (and ARM), so unless Xen has mapped those BARs into Dom0, Dom0 doesn't
have access to the BARs at all. Hence I think Xen should detect Dom0 setting
the VF Enable bit, properly size the VF BARs and map them into Dom0.

Roger.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-15 17:00                                           ` Roger Pau Monn?
@ 2017-05-03 12:38                                             ` Julien Grall
  0 siblings, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-05-03 12:38 UTC (permalink / raw)
  To: Roger Pau Monné, Venu Busireddy
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, alistair.francis, Punit Agrawal, Campbell Sean,
	xen-devel, manish.jaggi, Shanker Donthineni

Hi Roger,

Sorry for the late answer.

On 15/03/17 17:00, Roger Pau Monné wrote:
> On Wed, Mar 15, 2017 at 11:54:07AM -0500, Venu Busireddy wrote:
>> On Wed, Mar 15, 2017 at 04:38:39PM +0000, Roger Pau Monn? wrote:
>>> On Wed, Mar 15, 2017 at 10:11:35AM -0500, Venu Busireddy wrote:
>>>> On Wed, Mar 15, 2017 at 12:56:50PM +0000, Roger Pau Monn? wrote:
>>>>> On Wed, Mar 15, 2017 at 08:42:04AM -0400, Konrad Rzeszutek Wilk wrote:
>>>>>> On Wed, Mar 15, 2017 at 12:07:28PM +0000, Roger Pau Monn? wrote:
>>>>>>> On Fri, Mar 10, 2017 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
>>>>>>>> On Fri, Mar 10, 2017 at 12:23:18PM +0900, Roger Pau Monn? wrote:
>>>>>>>>> On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk wrote:
>>>>>>>>>> On Thu, Mar 09, 2017 at 01:26:45PM +0000, Julien Grall wrote:
>>>>>>>>>>> Hi Konrad,
>>>>>>>>>>>
>>>>>>>>>>> On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
>>>>>>>>>>>> On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monn? wrote:
>>>>>>>>>>>>> On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
>>>>>>>>>>>>>> On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
>>>>>>>>>>>>>> .. this as for SR-IOV devices you need the drivers to kick the hardware
>>>>>>>>>>>>>> to generate the new bus addresses. And those (along with the BAR regions) are
>>>>>>>>>>>>>> not visible in ACPI (they are constructued dynamically).
>>>>>>>>>>>>>
>>>>>>>>>>>>> There's already code in Xen [0] to find out the size of the BARs of SR-IOV
>>>>>>>>>>>>> devices, but I'm not sure what's the intended usage of that, does it need to
>>>>>>>>>>>>> happen _after_ the driver in Dom0 has done whatever magic for this to work?
>>>>>>>>>>>>
>>>>>>>>>>>> Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
>>>>>>>>>>>> the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
>>>>>>>>>>>
>>>>>>>>>>> We are thinking to not use PHYSDEVOP_pci_device_add hypercall for ARM and do
>>>>>>>>>>> the PCI scanning in Xen.
>>>>>>>>>>>
>>>>>>>>>>> If I understand correctly what you said, only the PCI driver will be able to
>>>>>>>>>>> kick SR-IOV device and Xen would not be able to detect the device until it
>>>>>>>>>>> has been fully configured. So it would mean that we have to keep
>>>>>>>>>>> PHYSDEVOP_pci_device_add around to know when Xen can use the device.
>>>>>>>>>>>
>>>>>>>>>>> Am I correct?
>>>>>>>>>>
>>>>>>>>>> Yes. Unless the PCI drivers come up with some other way to tell the
>>>>>>>>>> OS that oh, hey, there is this new PCI device with this BDF.
>>>>>>>>>>
>>>>>>>>>> Or the underlaying bus on ARM can send some 'new device' information?
>>>>>>>>>
>>>>>>>>> Hm, is this something standard between all the SR-IOV implementations, or each
>>>>>>>>> vendors have their own sauce?
>>>>>>>>
>>>>>>>> Gosh, all of them have their own sauce. The only thing that is the same
>>>>>>>> is that suddenly behind the PF device there are PCI devies that are responding
>>>>>>>> to 0xcfc requests. MAgic!
>>>>>>>
>>>>>>> I'm reading the PCI SR-IOV 1.1 spec, and I think we don't need to wait for the
>>>>>>> device driver in Dom0 in order to get the information of the VF devices, what
>>>>>>> Xen cares about is the position of the BARs (so that they can be mapped into
>>>>>>> Dom0 at boot), and the PCI SBDF of each PF/VF, so that Xen can trap accesses to
>>>>>>> it.
>>>>>>>
>>>>>>> AFAICT both of this can be obtained without any driver-specific code, since
>>>>>>> it's all contained in the PCI SR-IOV spec (but maybe I'm missing something).
>>>>>>
>>>>>> CC-ing Venu,
>>>>>>
>>>>>> Roger, could you point out which of the chapters has this?
>>>>>
>>>>> This would be chapter 2 ("Initialization and Resource Allocation"), and then
>>>>> there's a "IMPLEMENTATION NOTE" that shows how the PF/VF are matched to
>>>>> function numbers in page 45 (I have the following copy, which is the latest
>>>>> revision: "Single Root I/O Virtualization and Sharing Specification Revision
>>>>> 1.1" from January 20 2010 [0]).
>>>>>
>>>>> The document is quite complex, but it is a standard that all SR-IOV devices
>>>>> should follow so AFAICT Xen should be able to get all the information that it
>>>>> needs from the PCI config space in order to detect the PF/VF BARs and the BDF
>>>>> device addresses.
>>>>>
>>>>> Roger.
>>>>>
>>>>> [0] https://members.pcisig.com/wg/PCI-SIG/document/download/8238
>>>>
>>>> I do not have access to this document, so I have to rely on Rev 1.0
>>>> document, but I don't think this aspect of the spec changed much.
>>>>
>>>> In any case, I am afraid I am not seeing the overall picture, but I
>>>> would like to comment on the last part of this discussion. Indeed, the
>>>> configuration space (including the SR-IOV extended capability) contains
>>>> all the information, but only the information necessary for the OS to
>>>> "enumerate" the device (PF as well as VFs). The bus and device number
>>>> (SBDF) assignment, and programming of the BARs, are all done during that
>>>> enumeration. In this discussion, which entity is doing the enumeration?
>>>> Xen, or Dom0?
>>>
>>> Xen needs to let Dom0 manage the device, but at the same time it needs to
>>> correctly map the device BARs into Dom0 physmap. I think the easiest solution
>>> is to let Dom0 manage the device, and Xen should setup a trap to detect Dom0
>>> setting the VF Enable bit (bit 0 in SR-IOV Control (08h)), at which point Xen
>>> will size the VF BARs (and map them into Dom0) and also enumerate the VF
>>> devices.
>>
>> There was a second part in my earlier email. Copied below:
>>
>> "If Xen waits until Dom0 enumerated the device, then the BAR positions
>> are already within the Dom0's memory space! No further mapping is needed,
>> right?"
>>
>> As I asked, if Dom0 enumerates the device and programs the BARs, the BAR
>> regions are already in Dom0's physical memory! What further mapping is
>> needed? What am I missing?
>
> No, that's not true. Xen is the one that performs the mapping into Dom0 physmap
> on PVH (and ARM), so unless Xen has mapped those BARs into Dom0, Dom0 doesn't
> have access to the BARs at all. Hence I think Xen should detect Dom0 setting
> the VF Enable bit, properly size the VF BARs and map them into Dom0.

This is not accurate for ARM. In the case of Device Tree, the PCI memory 
space is mapped to DOM0 when building it; in the ACPI case, those regions 
will be mapped on fault.

So Venu is right for ARM: all the BAR regions are already mapped into 
DOM0's physical memory.
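
For the ACPI case, the "map on fault" path amounts to something like the
sketch below (the helpers are illustrative stand-ins, not the actual Xen
code):

#include <stdint.h>
#include <stdbool.h>

typedef uint64_t paddr_t;                  /* stand-in for Xen's paddr_t */
#define PAGE_MASK (~(paddr_t)0xfff)        /* 4K pages assumed */

struct domain;
extern bool is_hardware_domain(const struct domain *d);
extern bool is_host_mmio(paddr_t addr);                        /* stand-in */
extern int identity_map_one_page(struct domain *d, paddr_t addr); /* stand-in */

/* On a stage-2 data abort taken by DOM0, map the faulting host MMIO page
 * 1:1 instead of having mapped every PCI window at domain build time. */
static bool map_dom0_mmio_on_fault(struct domain *d, paddr_t gpa)
{
    if ( !is_hardware_domain(d) || !is_host_mmio(gpa) )
        return false;               /* let the normal abort handling run */

    return identity_map_one_page(d, gpa & PAGE_MASK) == 0;
}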

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2017-03-15 16:38                                       ` Roger Pau Monn?
  2017-03-15 16:54                                         ` Venu Busireddy
@ 2017-05-03 12:53                                         ` Julien Grall
  1 sibling, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-05-03 12:53 UTC (permalink / raw)
  To: Roger Pau Monné, Venu Busireddy
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Stefano Stabellini, Wei Chen, Steve Capper, Andrew Cooper,
	Jiandi An, alistair.francis, Punit Agrawal, Campbell Sean,
	xen-devel, manish.jaggi, Shanker Donthineni

Hi Roger,

On 15/03/17 16:38, Roger Pau Monné wrote:
> On Wed, Mar 15, 2017 at 10:11:35AM -0500, Venu Busireddy wrote:
>> On Wed, Mar 15, 2017 at 12:56:50PM +0000, Roger Pau Monn? wrote:
>>> On Wed, Mar 15, 2017 at 08:42:04AM -0400, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Mar 15, 2017 at 12:07:28PM +0000, Roger Pau Monn? wrote:
>>>>> On Fri, Mar 10, 2017 at 10:28:43AM -0500, Konrad Rzeszutek Wilk wrote:
>>>>>> On Fri, Mar 10, 2017 at 12:23:18PM +0900, Roger Pau Monn? wrote:
>>>>>>> On Thu, Mar 09, 2017 at 07:29:34PM -0500, Konrad Rzeszutek Wilk wrote:
>>>>>>>> On Thu, Mar 09, 2017 at 01:26:45PM +0000, Julien Grall wrote:
>>>>>>>>> Hi Konrad,
>>>>>>>>>
>>>>>>>>> On 09/03/17 11:17, Konrad Rzeszutek Wilk wrote:
>>>>>>>>>> On Thu, Mar 09, 2017 at 11:59:51AM +0900, Roger Pau Monn? wrote:
>>>>>>>>>>> On Wed, Mar 08, 2017 at 02:12:09PM -0500, Konrad Rzeszutek Wilk wrote:
>>>>>>>>>>>> On Wed, Mar 08, 2017 at 07:06:23PM +0000, Julien Grall wrote:
>>>>>>>>>>>> .. this as for SR-IOV devices you need the drivers to kick the hardware
>>>>>>>>>>>> to generate the new bus addresses. And those (along with the BAR regions) are
>>>>>>>>>>>> not visible in ACPI (they are constructued dynamically).
>>>>>>>>>>>
>>>>>>>>>>> There's already code in Xen [0] to find out the size of the BARs of SR-IOV
>>>>>>>>>>> devices, but I'm not sure what's the intended usage of that, does it need to
>>>>>>>>>>> happen _after_ the driver in Dom0 has done whatever magic for this to work?
>>>>>>>>>>
>>>>>>>>>> Yes. This is called via the PHYSDEVOP_pci_device_add hypercall when
>>>>>>>>>> the device driver in dom0 has finished "creating" the VF. See drivers/xen/pci.c
>>>>>>>>>
>>>>>>>>> We are thinking to not use PHYSDEVOP_pci_device_add hypercall for ARM and do
>>>>>>>>> the PCI scanning in Xen.
>>>>>>>>>
>>>>>>>>> If I understand correctly what you said, only the PCI driver will be able to
>>>>>>>>> kick SR-IOV device and Xen would not be able to detect the device until it
>>>>>>>>> has been fully configured. So it would mean that we have to keep
>>>>>>>>> PHYSDEVOP_pci_device_add around to know when Xen can use the device.
>>>>>>>>>
>>>>>>>>> Am I correct?
>>>>>>>>
>>>>>>>> Yes. Unless the PCI drivers come up with some other way to tell the
>>>>>>>> OS that oh, hey, there is this new PCI device with this BDF.
>>>>>>>>
>>>>>>>> Or the underlaying bus on ARM can send some 'new device' information?
>>>>>>>
>>>>>>> Hm, is this something standard between all the SR-IOV implementations, or each
>>>>>>> vendors have their own sauce?
>>>>>>
>>>>>> Gosh, all of them have their own sauce. The only thing that is the same
>>>>>> is that suddenly behind the PF device there are PCI devies that are responding
>>>>>> to 0xcfc requests. MAgic!
>>>>>
>>>>> I'm reading the PCI SR-IOV 1.1 spec, and I think we don't need to wait for the
>>>>> device driver in Dom0 in order to get the information of the VF devices, what
>>>>> Xen cares about is the position of the BARs (so that they can be mapped into
>>>>> Dom0 at boot), and the PCI SBDF of each PF/VF, so that Xen can trap accesses to
>>>>> it.
>>>>>
>>>>> AFAICT both of this can be obtained without any driver-specific code, since
>>>>> it's all contained in the PCI SR-IOV spec (but maybe I'm missing something).
>>>>
>>>> CC-ing Venu,
>>>>
>>>> Roger, could you point out which of the chapters has this?
>>>
>>> This would be chapter 2 ("Initialization and Resource Allocation"), and then
>>> there's a "IMPLEMENTATION NOTE" that shows how the PF/VF are matched to
>>> function numbers in page 45 (I have the following copy, which is the latest
>>> revision: "Single Root I/O Virtualization and Sharing Specification Revision
>>> 1.1" from January 20 2010 [0]).
>>>
>>> The document is quite complex, but it is a standard that all SR-IOV devices
>>> should follow so AFAICT Xen should be able to get all the information that it
>>> needs from the PCI config space in order to detect the PF/VF BARs and the BDF
>>> device addresses.
>>>
>>> Roger.
>>>
>>> [0] https://members.pcisig.com/wg/PCI-SIG/document/download/8238
>>
>> I do not have access to this document, so I have to rely on Rev 1.0
>> document, but I don't think this aspect of the spec changed much.
>>
>> In any case, I am afraid I am not seeing the overall picture, but I
>> would like to comment on the last part of this discussion. Indeed, the
>> configuration space (including the SR-IOV extended capability) contains
>> all the information, but only the information necessary for the OS to
>> "enumerate" the device (PF as well as VFs). The bus and device number
>> (SBDF) assignment, and programming of the BARs, are all done during that
>> enumeration. In this discussion, which entity is doing the enumeration?
>> Xen, or Dom0?
>
> Xen needs to let Dom0 manage the device, but at the same time it needs to
> correctly map the device BARs into Dom0 physmap. I think the easiest solution
> is to let Dom0 manage the device, and Xen should setup a trap to detect Dom0
> setting the VF Enable bit (bit 0 in SR-IOV Control (08h)), at which point Xen
> will size the VF BARs (and map them into Dom0) and also enumerate the VF
> devices.

Why not use the existing hypercall? This would avoid duplicating the 
enumeration of VF devices in Xen, as DOM0 will do exactly the same.

I thought a bit more about the PHYSDEVOP_pci_device_add hypercall. I 
think it would be better to keep it around for ARM because we may 
still want to keep DOM0 in the loop.

For instance, some PCI host bridges might be very complex to implement 
in Xen because of dependencies on other components (clocks, power 
domains, MSI controller)... So it may make sense to keep all those 
host bridges in Linux and have Xen's config space accesses go through 
Linux.

For "simple" host bridges (the distinction has to be defined), Xen will 
support them directly and will be able to drive them.

A hybrid approach would allow us to support all the host bridges 
without having to port every host bridge driver to Xen (and there are 
quite a lot on ARM).
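
To make the split a bit more concrete, it could boil down to something
like the sketch below (all names are invented for the example): bridges
Xen has a native driver for provide their own config space accessors,
anything else gets forwarded to DOM0.

#include <stdint.h>

struct pci_host_bridge;

/* Accessors provided by a "simple" host bridge driver (e.g. plain ECAM)
 * that Xen can drive natively. */
struct pci_host_bridge_ops {
    uint32_t (*read)(struct pci_host_bridge *b, uint16_t bdf,
                     unsigned int reg, unsigned int size);
    void (*write)(struct pci_host_bridge *b, uint16_t bdf,
                  unsigned int reg, unsigned int size, uint32_t val);
};

struct pci_host_bridge {
    uint16_t segment;
    const struct pci_host_bridge_ops *ops;   /* NULL => "complex" bridge */
};

/* Forwarding path for "complex" bridges whose clocks/power domains/MSI
 * plumbing live in Linux: DOM0 performs the access on Xen's behalf. */
extern uint32_t dom0_cfg_read(struct pci_host_bridge *b, uint16_t bdf,
                              unsigned int reg, unsigned int size);

uint32_t pci_cfg_read(struct pci_host_bridge *b, uint16_t bdf,
                      unsigned int reg, unsigned int size)
{
    if ( b->ops )
        return b->ops->read(b, bdf, reg, size);    /* Xen drives the bridge */

    return dom0_cfg_read(b, bdf, reg, size);       /* bounce to DOM0 */
}

The policy question is then only how a given bridge ends up with ops set
(matched from its compatible string or ACPI description) versus left NULL
and handled via DOM0.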

Cheers,

-- 
Julien Grall


* Re: [early RFC] ARM PCI Passthrough design document
  2016-12-29 14:04 [early RFC] ARM PCI Passthrough design document Julien Grall
                   ` (4 preceding siblings ...)
  2017-01-19  5:09 ` Manish Jaggi
@ 2017-05-19  6:38 ` Goel, Sameer
  2017-05-19 16:48   ` Julien Grall
  5 siblings, 1 reply; 82+ messages in thread
From: Goel, Sameer @ 2017-05-19  6:38 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Campbell Sean, Jiandi An, Punit Agrawal, Steve Capper,
	alistair.francis, manish.jaggi, Shanker Donthineni,
	Roger Pau Monné



On 12/29/2016 7:04 AM, Julien Grall wrote:

> 
> ### Finding the StreamID and DeviceID
> 
> The static table IORT (see [5]) will provide information that will help to
> deduce the StreamID and DeviceID from a given RID.
>

Parsing the IORT table will also need some information about the PCI segment
in order to find the required SMMU. Should we consider making the API similar
to Linux's? This will mandate pulling in parts of fw_spec, which will make the
bookkeeping for SMMUs easier.

Also, for arm64, will we be reusing the current definition of struct
pci_device (SBDF specifically)?

- Sameer 
 
> ## Device Tree
> 
> ### Host bridges
> 
> Each Device Tree node associated to a host bridge will have at least the
> following properties (see bindings in [8]):
>     - device_type: will always be "pci".
>     - compatible: a string indicating which driver to instantiate
> 
> The node may also contain optional properties such as:
>     - linux,pci-domain: assign a fix segment number
>     - bus-range: indicate the range of bus numbers supported
> 
> When the property linux,pci-domain is not present, the operating system would
> have to allocate the segment number for each host bridges. Because the
> algorithm to allocate the segment is not specified, it is necessary for
> DOM0 and Xen to agree on the number before any PCI is been added.
> 
> ### Finding the StreamID and DeviceID
> 
> ### StreamID
> 
> The first binding existing (see [9]) for SMMU didn't have a way to describe the
> relationship between RID and StreamID, it was assumed that StreamID == RequesterID.
> This bindins has now been deprecated in favor of a generic binding (see [10])
> which will use the property "iommu-map" to describe the relationship between
> an RID, the associated IOMMU and the StreamID.
> 
> ### DeviceID
> 
> The relationship between the RID and the DeviceID can be found using the
> property "msi-map" (see [11]).
> 
> # Discovering PCI devices
> 
> Whilst PCI devices are currently available in DOM0, the hypervisor does not
> have any knowledge of them. The first step of supporting PCI passthrough is
> to make Xen aware of the PCI devices.
> 
> Xen will require access to the PCI configuration space to retrieve information
> for the PCI devices or access it on behalf of the guest via the emulated
> host bridge.
> 
> ## Discovering and register hostbridge
> 
> Both ACPI and Device Tree do not provide enough information to fully
> instantiate an host bridge driver. In the case of ACPI, some data may come
> from ASL, whilst for Device Tree the segment number is not available.
> 
> So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> with all the relevant informations. This will be done via a new hypercall
> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> 
> struct physdev_pci_host_bridge_add
> {
>     /* IN */
>     uint16_t seg;
>     /* Range of bus supported by the host bridge */
>     uint8_t  bus_start;
>     uint8_t  bus_nr;
>     uint32_t res0;  /* Padding */
>     /* Information about the configuration space region */
>     uint64_t cfg_base;
>     uint64_t cfg_size;
> }
> 
> DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
> bridge available on the platform. When Xen is receiving the hypercall, the
> the driver associated to the host bridge will be instantiated.
> 
> XXX: Shall we limit DOM0 the access to the configuration space from that
> moment?
> 
> ## Discovering and register PCI
> 
> Similarly to x86, PCI devices will be discovered by DOM0 and register
> using the hypercalls PHYSDEVOP_pci_add_device or PHYSDEVOP_manage_pci_add_ext.
> 
> By default all the PCI devices will be assigned to DOM0. So Xen would have
> to configure the SMMU and Interrupt Controller to allow DOM0 to use the PCI
> devices. As mentioned earlier, those subsystems will require the StreamID
> and DeviceID. Both can be deduced from the RID.
> 
> XXX: How to hide PCI devices from DOM0?
> 
> # Glossary
> 
> ECAM: Enhanced Configuration Mechanism
> SBDF: Segment Bus Device Function. The segment is a software concept.
> MSI: Message Signaled Interrupt
> SPI: Shared Peripheral Interrupt
> LPI: Locality-specific Peripheral Interrupt
> ITS: Interrupt Translation Service
> 
> # Bibliography
> 
> [1] PCI firmware specification, rev 3.2
> [2] https://www.spinics.net/lists/linux-pci/msg56715.html
> [3] https://www.spinics.net/lists/linux-pci/msg56723.html
> [4] https://www.spinics.net/lists/linux-pci/msg56728.html
> [5] http://infocenter.arm.com/help/topic/com.arm.doc.den0049b/DEN0049B_IO_Remapping_Table.pdf
> [6] https://www.spinics.net/lists/kvm/msg140116.html
> [7] http://www.firmware.org/1275/bindings/pci/pci2_1.pdf
> [8] Documents/devicetree/bindings/pci
> [9] Documents/devicetree/bindings/iommu/arm,smmu.txt
> [10] Document/devicetree/bindings/pci/pci-iommu.txt
> [11] Documents/devicetree/bindings/pci/pci-msi.txt
> 
> 

-- 
 Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: [early RFC] ARM PCI Passthrough design document
  2017-05-19  6:38 ` Goel, Sameer
@ 2017-05-19 16:48   ` Julien Grall
  0 siblings, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-05-19 16:48 UTC (permalink / raw)
  To: Goel, Sameer, xen-devel, Stefano Stabellini
  Cc: Edgar Iglesias (edgar.iglesias@xilinx.com),
	Wei Chen, Campbell Sean, Jiandi An, Punit Agrawal, Steve Capper,
	alistair.francis, manish.jaggi, Shanker Donthineni,
	Roger Pau Monné

Hello Sameer,

On 19/05/17 07:38, Goel, Sameer wrote:
>
>
> On 12/29/2016 7:04 AM, Julien Grall wrote:
>
>>
>> ### Finding the StreamID and DeviceID
>>
>> The static table IORT (see [5]) will provide information that will help to
>> deduce the StreamID and DeviceID from a given RID.
>>
>
> Parsing the IORT table will also need some information about the PCI segment
> in order to find the required SMMU. Should we consider making the API similar
> to Linux's? This will mandate pulling in parts of fw_spec, which will make the
> bookkeeping for SMMUs easier.

I haven't looked closely at the code. I would say we need to pull what 
makes sense.

I would recommend you send an RFC of your proposal so that we can 
discuss the various approaches.

>
> Also, for arm64, will we be reusing the current definition of struct pci_device (SBDF specifically)?

Do you mean pci_dev in Xen? Or something else?

Cheers,

-- 
Julien Grall

