* RFC: PCI devices passthrough on Arm design proposal
From: Rahul Singh @ 2020-07-16 17:10 UTC (permalink / raw)
  To: xen-devel; +Cc: nd, Stefano Stabellini, Roger Pau Monné, Julien Grall

Hello All,

Following up on the discussion on PCI passthrough support on Arm that we had at the Xen Summit, we are submitting a Request For Comment and a design proposal for PCI passthrough support on Arm. Feel free to give your feedback.

The following describes the high-level design proposal for PCI passthrough support and how the different modules within the system interact with each other to assign a particular PCI device to a guest.

# Title:

PCI devices passthrough on Arm design proposal

# Problem statement:

On Arm there is currently no support for assigning a PCI device to a guest. PCI device passthrough allows guests to have full access to selected PCI devices: the devices appear and behave as if they were physically attached to the guest operating system, while remaining fully isolated from the rest of the system.

A goal of this work is to also support Dom0less configurations, so the PCI backend/frontend drivers used on x86 shall not be used on Arm. Instead, the existing vPCI concept from x86 is reused and a virtual PCI bus is implemented through I/O emulation, such that only assigned devices are visible to the guest and the guest can use its standard PCI drivers.

Only Dom0 and Xen will have access to the real PCI bus; a guest will have direct access to the assigned device itself. The device IOMEM will be mapped into the guest and its interrupts will be redirected to the guest. The SMMU has to be configured correctly for DMA transactions to work.

## Current state: Draft version

# Proposer(s): Rahul Singh, Bertrand Marquis

# Proposal:

This section describes the different subsystems needed to support PCI device passthrough and how these subsystems interact with each other to assign a device to a guest.

# PCI Terminology:

Host bridge: the host bridge allows the PCI devices to talk to the rest of the system.
ECAM: the Enhanced Configuration Access Mechanism is the mechanism defined by PCIe to access the configuration space through memory-mapped I/O. The space available per function is 4KB.
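
As an illustration of how ECAM works (a minimal sketch, not code from this proposal): the configuration space of each function lives at a fixed offset inside the host bridge's ECAM window, derived from its bus, device and function numbers, with 4KB per function.

#include <stdint.h>

/*
 * Minimal ECAM illustration. ecam_base is the address at which the host
 * bridge's "reg" region has been mapped; the function name and parameters
 * are illustrative only.
 */
static inline volatile void *ecam_conf_addr(uintptr_t ecam_base,
                                            uint8_t bus, uint8_t devfn,
                                            uint16_t reg)
{
    /* bus: bits 27-20, device/function: bits 19-12, register: bits 11-0 */
    return (volatile void *)(ecam_base + ((uintptr_t)bus << 20) +
                                         ((uintptr_t)devfn << 12) +
                                         (reg & 0xfff));
}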

# Discovering PCI Host Bridge in XEN:

In order to support PCI passthrough, Xen should be aware of all the PCI host bridges available on the system and should be able to access the PCI configuration space. Only ECAM configuration access is supported as of now. During boot, Xen will read the "reg" property of the PCI host bridge device tree node and will map the ECAM space into Xen memory using the "ioremap_nocache()" function.

If there is more than one segment on the system, Xen will read the "linux,pci-domain" property from the device tree node and configure the host bridge segment number accordingly. All the PCI host bridge device tree nodes should have the "linux,pci-domain" property so that there are no conflicts. During hardware domain boot, Linux will use the same "linux,pci-domain" property and assign the same domain number to the host bridge.
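
To make the above concrete, here is an illustrative sketch (not existing Xen code) of how the segment number could be taken from "linux,pci-domain" at boot, assuming Xen's dt_property_read_u32() device tree helper; the pci_host_bridge structure is an assumption of this example.

#include <xen/device_tree.h>
#include <xen/errno.h>
#include <xen/list.h>

/* Illustrative representation of a discovered host bridge. */
struct pci_host_bridge {
    struct list_head node;      /* linked in a global bridge list */
    uint16_t segment;           /* from "linux,pci-domain" */
    void __iomem *ecam_base;    /* ECAM "reg" region mapped at boot */
    paddr_t ecam_size;
};

static int __init bridge_set_segment(struct pci_host_bridge *bridge,
                                     const struct dt_device_node *node)
{
    u32 segment;

    /* Without the property, Dom0 and Xen could disagree on the numbering. */
    if ( !dt_property_read_u32(node, "linux,pci-domain", &segment) )
        return -ENODEV;

    bridge->segment = segment;
    return 0;
}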

When Dom0 tries to access the PCI config space of a device, Xen will find the corresponding host bridge based on the segment number and access the config space assigned to that bridge.
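
The per-access lookup described above could then be as simple as the following sketch (reusing the illustrative pci_host_bridge structure from the previous example; the list name is also an assumption):

/* Illustrative list of host bridges discovered at boot. */
static LIST_HEAD(pci_host_bridges);

static struct pci_host_bridge *find_host_bridge(uint16_t segment)
{
    struct pci_host_bridge *bridge;

    list_for_each_entry ( bridge, &pci_host_bridges, node )
        if ( bridge->segment == segment )
            return bridge;

    return NULL;    /* no ECAM host bridge registered for this segment */
}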

Limitations:
* Only PCI ECAM configuration space access is supported.
* Device tree binding is supported as of now; ACPI is not supported.
* The PCI host bridge access code needs to be ported to Xen to access the configuration space (the generic one works, but lots of platforms will require some specific code or quirks).

# Discovering PCI devices:

PCI/PCIe enumeration is the process of detecting the devices connected to a host. It is the responsibility of the hardware domain or the boot firmware to do the PCI enumeration and to configure the BARs, PCI capabilities, and MSI/MSI-X settings.

Doing the configuration part of PCI/PCIe enumeration in Xen is not feasible, as it would require a lot of code inside Xen, which in turn would require a lot of maintenance. In addition, many platforms require quirks in that part of the PCI code, which would greatly increase Xen's complexity. Once the hardware domain has enumerated a device, it will report it to Xen via the hypercall below.

#define PHYSDEVOP_pci_device_add        25
struct physdev_pci_device_add {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
    uint32_t flags;
    struct {
        uint8_t bus;
        uint8_t devfn;
    } physfn;
    /*
     * Optional parameters array.
     * First element ([0]) is PXM domain associated with the device
     * (if XEN_PCI_DEV_PXM is set).
     */
    uint32_t optarr[XEN_FLEX_ARRAY_DIM];
};

As the hypercall argument carries the PCI segment number, Xen will find the host bridge corresponding to this segment number and access the PCI config space through it. At this stage the host bridge is fully initialized, so there will be no problem accessing the config space.

Xen will add the PCI devices to the linked list maintained in Xen using the function pci_add_device(). Xen will then be aware of all the PCI devices on the system, and all the devices will be assigned to the hardware domain.
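
For illustration, this is roughly how the hardware domain kernel could report a device at 0000:03:00.0 using the structure above (a sketch based on the Linux Xen interface headers, not code from this series; error handling and the optional PXM/virtfn fields are omitted):

#include <linux/pci.h>                /* PCI_DEVFN() */
#include <xen/interface/physdev.h>    /* struct physdev_pci_device_add */
#include <asm/xen/hypercall.h>        /* HYPERVISOR_physdev_op() */

static int report_device_to_xen(void)
{
    struct physdev_pci_device_add add = {
        .seg   = 0,
        .bus   = 0x03,
        .devfn = PCI_DEVFN(0, 0),     /* device 0, function 0 */
        .flags = 0,
    };

    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
}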

Limitations:
* When PCI devices are added to Xen, the MSI capability is not initialized inside Xen and is not supported as of now.
* The ACS capability is disabled on Arm as of now, because devices are not accessible after enabling it.
* A Dom0less implementation will require Xen to be able to discover the PCI devices itself (without depending on Dom0 to declare them to Xen).

# Enable the existing x86 virtual PCI support for ARM:

The existing vPCI support available for x86 is adapted for Arm. When a device is added to Xen via the hypercall "PHYSDEVOP_pci_device_add", vPCI handlers for config space accesses are attached to the PCI device so that it can be emulated.

An MMIO trap handler for the PCI ECAM space is registered in Xen so that when a guest tries to access the PCI config space, Xen traps the access and emulates the read/write using vPCI instead of forwarding it to the real PCI hardware.
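
Below is a sketch of that trap path, following the Arm MMIO handler interface in Xen as we understand it; the GUEST_VPCI_* constants and the vpci_handle_read/write helpers are placeholders for this example, not existing code.

#include <xen/sched.h>
#include <asm/mmio.h>

/* Illustrative guest physical placement of the emulated ECAM window. */
#define GUEST_VPCI_ECAM_BASE  0x10000000UL    /* example value only */
#define GUEST_VPCI_ECAM_SIZE  0x10000000UL    /* 256MB covers 256 buses */

static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
                          register_t *r, void *priv)
{
    /* Offset into the emulated ECAM window -> (bus, devfn, register). */
    paddr_t off = info->gpa - GUEST_VPCI_ECAM_BASE;

    /* Placeholder for the vPCI dispatch; access size is 1 << dabt.size. */
    return vpci_handle_read(v->domain, (off >> 20) & 0xff,
                            (off >> 12) & 0xff, off & 0xfff,
                            1U << info->dabt.size, r);
}

static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
                           register_t r, void *priv)
{
    paddr_t off = info->gpa - GUEST_VPCI_ECAM_BASE;

    return vpci_handle_write(v->domain, (off >> 20) & 0xff,
                             (off >> 12) & 0xff, off & 0xfff,
                             1U << info->dabt.size, r);
}

static const struct mmio_handler_ops vpci_mmio_handler = {
    .read  = vpci_mmio_read,
    .write = vpci_mmio_write,
};

/* Called once at domain construction time for guests with a vPCI node. */
static void vpci_register_ecam_trap(struct domain *d)
{
    register_mmio_handler(d, &vpci_mmio_handler, GUEST_VPCI_ECAM_BASE,
                          GUEST_VPCI_ECAM_SIZE, NULL);
}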

Limitations:
* No handler is registered for the MSI configuration.
* Only legacy interrupts are supported and tested as of now; MSI is not implemented or tested.

# Assign the device to the guest:

Assigning a PCI device from the hardware domain to a guest is done using the guest config option below. When the xl tool creates the domain, the PCI devices will be assigned to the guest vPCI bus.
	pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]

The guest will only be able to access the assigned devices and see the bridges. The guest will not be able to access or see the devices that are not assigned to it.
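
For example, assuming the device appears to the hardware domain at SBDF 0000:03:00.0, the guest configuration would contain:
	pci = [ "0000:03:00.0" ]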

Limitation:
* As of now, all the bridges on the PCI bus are seen by the guest on the vPCI bus.

# Emulated PCI device tree node in libxl:

libxl creates a virtual PCI device tree node in the guest device tree to enable the guest OS to discover the virtual PCI bus during guest boot. We introduced the new guest config option [vpci="pci_ecam"]. When this config option is set in a guest configuration, a PCI device tree node will be created in the guest device tree.

A new area has been reserved in the Arm guest physical map, at which the vPCI bus is declared in the device tree ("reg" and "ranges" properties of the node). A trap handler for PCI ECAM accesses from the guest has been registered at the defined address and redirects requests to the vPCI driver in Xen.

Limitation:
* Only one PCI device tree node is supported as of now.

# BAR value and IOMEM mapping:

The Linux guest will do the PCI enumeration based on the ECAM area and IOMEM ranges reserved in the vPCI device tree node. Once a PCI device is assigned to the guest, Xen will map the guest PCI IOMEM region to the real physical IOMEM region, and only for the assigned devices.

As of now we have not modified the existing vPCI code to map the guest PCI IOMEM region to the real physical IOMEM region; instead, we used the existing guest "iomem" config option to map the region.
For example:
    Guest reserved IOMEM region:  0x04020000
    Real physical IOMEM region:   0x50000000
    IOMEM size:                   128MB
    iomem config will be:         iomem = ["0x50000,0x8000@0x4020"]
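
(For reference, the "iomem" option takes 4KB page frame numbers, so the values above are obtained as: 0x50000000 >> 12 = 0x50000, 128MB / 4KB = 0x8000 pages, and 0x04020000 >> 12 = 0x4020, giving the "0x50000,0x8000@0x4020" string.)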

There is no need to map the ECAM space, as Xen already has access to it: Xen will trap ECAM accesses from the guest and perform the read/write on the vPCI bus.

IOMEM accesses will not be trapped; the guest will directly access the IOMEM region of the assigned device via the stage-2 translation.
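
A sketch of what the automatic stage-2 mapping could look like once the "iomem" workaround is dropped, assuming Xen's Arm map_regions_p2mt() interface; the hard-coded values match the example above and would come from the device BARs in the real code:

#include <xen/sched.h>
#include <asm/p2m.h>

static int map_assigned_bar(struct domain *d)
{
    paddr_t gaddr = 0x04020000UL;              /* guest view of the BAR */
    paddr_t maddr = 0x50000000UL;              /* real BAR value */
    unsigned long nr = (128UL << 20) >> PAGE_SHIFT;

    /* Direct, uncached device mapping in the guest's stage-2 tables. */
    return map_regions_p2mt(d, _gfn(gaddr >> PAGE_SHIFT), nr,
                            _mfn(maddr >> PAGE_SHIFT), p2m_mmio_direct_dev);
}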

In the same way, we mapped the assigned device's IRQs to the guest using the config option below.
	irqs= [ NUMBER, NUMBER, ...]

Limitations:
* The "iomem" and "irqs" guest config options should not be needed: the IOMEM regions and IRQs should be mapped at the time the device is assigned to the guest via the "pci" guest config option, when xl creates the domain.
* Emulated BAR values on the vPCI bus should reflect the IOMEM mapped address.
* The x86 mapping code should be ported to Arm so that the stage-2 translation is updated when the guest modifies the BAR register values (to map the address requested by the guest for a specific IOMEM region to the address actually contained in the real BAR register of the corresponding device).

# SMMU configuration for guest:

When assigning a PCI device to a guest, the SMMU configuration should be updated to remove the device's access to hardware domain memory and to grant it access to guest memory, with the proper address translation, so that the device can do DMA operations from and to guest memory only.
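
Conceptually the reassignment boils down to the two steps sketched below; both helper names are hypothetical stand-ins for Xen's IOMMU/SMMU driver operations, shown only to illustrate the intended flow:

#include <xen/pci.h>
#include <xen/sched.h>

static int reassign_to_guest(struct pci_dev *pdev, struct domain *guest)
{
    int rc;

    /* Stop translating the device's DMA through the hardware domain. */
    rc = smmu_deassign_device(hardware_domain, pdev);    /* hypothetical */
    if ( rc )
        return rc;

    /* Attach the device to the guest's DMA address space instead. */
    return smmu_assign_device(guest, pdev);              /* hypothetical */
}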

# MSI/MSI-X support:
Not implemented or tested as of now.

# ITS support:
Not implemented or tested as of now.

Regards,
Rahul




* Re: RFC: PCI devices passthrough on Arm design proposal
From: Stefano Stabellini @ 2020-07-16 20:51 UTC (permalink / raw)
  To: Rahul Singh
  Cc: xen-devel, nd, Stefano Stabellini, Roger Pau Monné, Julien Grall


On Thu, 16 Jul 2020, Rahul Singh wrote:
> Hello All,
> 
> Following up on discussion on PCI Passthrough support on ARM that we had at the XEN summit, we are submitting a Review For Comment and a design proposal for PCI passthrough support on ARM. Feel free to give your feedback.
> 
> The followings describe the high-level design proposal of the PCI passthrough support and how the different modules within the system interacts with each other to assign a particular PCI device to the guest.

I think the proposal is good and I only have a couple of thoughts to
share below.


> # Title:
> 
> PCI devices passthrough on Arm design proposal
> 
> # Problem statement:
> 
> On ARM there in no support to assign a PCI device to a guest. PCI device passthrough capability allows guests to have full access to some PCI devices. PCI device passthrough allows PCI devices to appear and behave as if they were physically attached to the guest operating system and provide full isolation of the PCI devices.
> 
> Goal of this work is to also support Dom0Less configuration so the PCI backend/frontend drivers used on x86 shall not be used on Arm. It will use the existing VPCI concept from X86 and implement the virtual PCI bus through IO emulation​ such that only assigned devices are visible​ to the guest and guest can use the standard PCI driver.
> 
> Only Dom0 and Xen will have access to the real PCI bus,​ guest will have a direct access to the assigned device itself​. IOMEM memory will be mapped to the guest ​and interrupt will be redirected to the guest. SMMU has to be configured correctly to have DMA transaction.
> 
> ## Current state: Draft version
> 
> # Proposer(s): Rahul Singh, Bertrand Marquis
> 
> # Proposal:
> 
> This section will describe the different subsystem to support the PCI device passthrough and how these subsystems interact with each other to assign a device to the guest.
> 
> # PCI Terminology:
> 
> Host Bridge: Host bridge allows the PCI devices to talk to the rest of the computer.  
> ECAM: ECAM (Enhanced Configuration Access Mechanism) is a mechanism developed to allow PCIe to access configuration space. The space available per function is 4KB.
> 
> # Discovering PCI Host Bridge in XEN:
> 
> In order to support the PCI passthrough XEN should be aware of all the PCI host bridges available on the system and should be able to access the PCI configuration space. ECAM configuration access is supported as of now. XEN during boot will read the PCI device tree node “reg” property and will map the ECAM space to the XEN memory using the “ioremap_nocache ()” function.
> 
> If there are more than one segment on the system, XEN will read the “linux, pci-domain” property from the device tree node and configure  the host bridge segment number accordingly. All the PCI device tree nodes should have the “linux,pci-domain” property so that there will be no conflicts. During hardware domain boot Linux will also use the same “linux,pci-domain” property and assign the domain number to the host bridge.
> 
> When Dom0 tries to access the PCI config space of the device, XEN will find the corresponding host bridge based on segment number and access the corresponding config space assigned to that bridge.
> 
> Limitation:
> * Only PCI ECAM configuration space access is supported.
> * Device tree binding is supported as of now, ACPI is not supported.
> * Need to port the PCI host bridge access code to XEN to access the configuration space (generic one works but lots of platforms will required  some specific code or quirks).
>
> # Discovering PCI devices:
> 
> PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure the BAR, PCI capabilities, and MSI/MSI-X configuration.
> 
> PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.
> 
> #define PHYSDEVOP_pci_device_add        25
> struct physdev_pci_device_add {
>     uint16_t seg;
>     uint8_t bus;
>     uint8_t devfn;
>     uint32_t flags;
>     struct {
>     	uint8_t bus;
>     	uint8_t devfn;
>     } physfn;
>     /*
>     * Optional parameters array.
>     * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
>     */
>     uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>     };
> 
> As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.
> 
> XEN will add the PCI devices in the linked list maintain in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the device will be added to the hardware domain.
> 
> Limitations:
> * When PCI devices are added to XEN, MSI capability is not initialized inside XEN and not supported as of now.
> * ACS capability is disable for ARM as of now as after enabling it devices are not accessible.
> * Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
 
I think it is fine to assume that for dom0less the "firmware" has taken
care of setting up the BARs correctly. Starting with that assumption, it
looks like it should be "easy" to walk the PCI topology in Xen when/if
there is no dom0?


> # Enable the existing x86 virtual PCI support for ARM:
> 
> The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.
> 
> A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
> 
> Limitation:
> * No handler is register for the MSI configuration.
> * Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.  
> 
> # Assign the device to the guest:
> 
> Assign the PCI device from the hardware domain to the guest is done using the below guest config option. When xl tool create the domain, PCI devices will be assigned to the guest VPCI bus.
> 	pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
> 
> Guest will be only able to access the assigned devices and see the bridges. Guest will not be able to access or see the devices that are no assigned to him.
> 
> Limitation:
> * As of now all the bridges in the PCI bus are seen by the guest on the VPCI bus.

We need to come up with something similar for dom0less too. It could be
exactly the same thing (a list of BDFs as strings as a device tree
property) or something else if we can come up with a better idea.


> # Emulated PCI device tree node in libxl:
> 
> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
> 
> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
> 
> Limitation:
> * Only one PCI device tree node is supported as of now.

I think vpci="pci_ecam" should be optional: if pci=[ "PCI_SPEC_STRING",
...] is specified, then vpci="pci_ecam" is implied.

vpci="pci_ecam" is only useful one day in the future when we want to be
able to emulate other non-ecam host bridges. For now we could even skip
it.


> BAR value and IOMEM mapping:
> 
> Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI	device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
> 
> As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
> For example:
> 	Guest reserved IOMEM region:  0x04020000
>     	Real physical IOMEM region:0x50000000
>     	IOMEM size:128MB
>     	iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]
> 
> There is no need to map the ECAM space as XEN already have access to the ECAM space and XEN will trap ECAM accesses from the guest and will perform read/write on the VPCI bus.
> 
> IOMEM access will not be trapped and the guest will directly access the IOMEM region of the assigned device via stage-2 translation.
> 
> In the same, we mapped the assigned devices IRQ to the guest using below config options.
> 	irqs= [ NUMBER, NUMBER, ...]
> 
> Limitation:
> * Need to avoid the “iomem” and “irq” guest config options and map the IOMEM region and IRQ at the same time when device is assigned to the guest using the “pci” guest config options when xl creates the domain.
> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped address.
> * X86 mapping code should be ported on Arm so that the stage-2 translation is adapted when the guest is doing a modification of the BAR registers values (to map the address requested by the guest for a specific IOMEM to the address actually contained in the real BAR register of the corresponding device).
> 
> # SMMU configuration for guest:
> 
> When assigning PCI devices to a guest, the SMMU configuration should be updated to remove access to the hardware domain memory and add
> configuration to have access to the guest memory with the proper address translation so that the device can do DMA operations from and to the guest memory only.
> 
> # MSI/MSI-X support:
> Not implement and tested as of now.
> 
> # ITS support:
> Not implement and tested as of now.


* Re: RFC: PCI devices passthrough on Arm design proposal
From: Bertrand Marquis @ 2020-07-17  6:53 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, nd, Rahul Singh, Julien Grall, Roger Pau Monné



> On 16 Jul 2020, at 22:51, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> On Thu, 16 Jul 2020, Rahul Singh wrote:
>> Hello All,
>> 
>> Following up on discussion on PCI Passthrough support on ARM that we had at the XEN summit, we are submitting a Review For Comment and a design proposal for PCI passthrough support on ARM. Feel free to give your feedback.
>> 
>> The followings describe the high-level design proposal of the PCI passthrough support and how the different modules within the system interacts with each other to assign a particular PCI device to the guest.
> 
> I think the proposal is good and I only have a couple of thoughts to
> share below.
> 
> 
>> # Title:
>> 
>> PCI devices passthrough on Arm design proposal
>> 
>> # Problem statement:
>> 
>> On ARM there in no support to assign a PCI device to a guest. PCI device passthrough capability allows guests to have full access to some PCI devices. PCI device passthrough allows PCI devices to appear and behave as if they were physically attached to the guest operating system and provide full isolation of the PCI devices.
>> 
>> Goal of this work is to also support Dom0Less configuration so the PCI backend/frontend drivers used on x86 shall not be used on Arm. It will use the existing VPCI concept from X86 and implement the virtual PCI bus through IO emulation​ such that only assigned devices are visible​ to the guest and guest can use the standard PCI driver.
>> 
>> Only Dom0 and Xen will have access to the real PCI bus,​ guest will have a direct access to the assigned device itself​. IOMEM memory will be mapped to the guest ​and interrupt will be redirected to the guest. SMMU has to be configured correctly to have DMA transaction.
>> 
>> ## Current state: Draft version
>> 
>> # Proposer(s): Rahul Singh, Bertrand Marquis
>> 
>> # Proposal:
>> 
>> This section will describe the different subsystem to support the PCI device passthrough and how these subsystems interact with each other to assign a device to the guest.
>> 
>> # PCI Terminology:
>> 
>> Host Bridge: Host bridge allows the PCI devices to talk to the rest of the computer.  
>> ECAM: ECAM (Enhanced Configuration Access Mechanism) is a mechanism developed to allow PCIe to access configuration space. The space available per function is 4KB.
>> 
>> # Discovering PCI Host Bridge in XEN:
>> 
>> In order to support the PCI passthrough XEN should be aware of all the PCI host bridges available on the system and should be able to access the PCI configuration space. ECAM configuration access is supported as of now. XEN during boot will read the PCI device tree node “reg” property and will map the ECAM space to the XEN memory using the “ioremap_nocache ()” function.
>> 
>> If there are more than one segment on the system, XEN will read the “linux, pci-domain” property from the device tree node and configure  the host bridge segment number accordingly. All the PCI device tree nodes should have the “linux,pci-domain” property so that there will be no conflicts. During hardware domain boot Linux will also use the same “linux,pci-domain” property and assign the domain number to the host bridge.
>> 
>> When Dom0 tries to access the PCI config space of the device, XEN will find the corresponding host bridge based on segment number and access the corresponding config space assigned to that bridge.
>> 
>> Limitation:
>> * Only PCI ECAM configuration space access is supported.
>> * Device tree binding is supported as of now, ACPI is not supported.
>> * Need to port the PCI host bridge access code to XEN to access the configuration space (generic one works but lots of platforms will required some specific code or quirks).
>> 
>> # Discovering PCI devices:
>> 
>> PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure the BAR, PCI capabilities, and MSI/MSI-X configuration.
>> 
>> PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.
>> 
>> #define PHYSDEVOP_pci_device_add        25
>> struct physdev_pci_device_add {
>>    uint16_t seg;
>>    uint8_t bus;
>>    uint8_t devfn;
>>    uint32_t flags;
>>    struct {
>>    	uint8_t bus;
>>    	uint8_t devfn;
>>    } physfn;
>>    /*
>>    * Optional parameters array.
>>    * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
>>    */
>>    uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>    };
>> 
>> As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.
>> 
>> XEN will add the PCI devices in the linked list maintain in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the device will be added to the hardware domain.
>> 
>> Limitations:
>> * When PCI devices are added to XEN, MSI capability is not initialized inside XEN and not supported as of now.
>> * ACS capability is disable for ARM as of now as after enabling it devices are not accessible.
>> * Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
> 
> I think it is fine to assume that for dom0less the "firmware" has taken
> care of setting up the BARs correctly. Starting with that assumption, it
> looks like it should be "easy" to walk the PCI topology in Xen when/if
> there is no dom0?

Yes, as we discussed during the design session, we currently think that this is the way to go.
We are for now relying on Dom0 to get the list of PCI devices, but this is definitely the strategy we would like to use to support Dom0less configurations.
If this works well, I even think we could get rid of the hypercall altogether.

> 
> 
>> # Enable the existing x86 virtual PCI support for ARM:
>> 
>> The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.
>> 
>> A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
>> 
>> Limitation:
>> * No handler is register for the MSI configuration.
>> * Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.  
>> 
>> # Assign the device to the guest:
>> 
>> Assign the PCI device from the hardware domain to the guest is done using the below guest config option. When xl tool create the domain, PCI devices will be assigned to the guest VPCI bus.
>> 	pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
>> 
>> Guest will be only able to access the assigned devices and see the bridges. Guest will not be able to access or see the devices that are no assigned to him.
>> 
>> Limitation:
>> * As of now all the bridges in the PCI bus are seen by the guest on the VPCI bus.
> 
> We need to come up with something similar for dom0less too. It could be
> exactly the same thing (a list of BDFs as strings as a device tree
> property) or something else if we can come up with a better idea.

Fully agree.
Maybe a tree topology could allow more possibilities (like giving BAR values) in the future.
> 
> 
>> # Emulated PCI device tree node in libxl:
>> 
>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>> 
>> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
>> 
>> Limitation:
>> * Only one PCI device tree node is supported as of now.
> 
> I think vpci="pci_ecam" should be optional: if pci=[ "PCI_SPEC_STRING",
> ...] is specififed, then vpci="pci_ecam" is implied.
> 
> vpci="pci_ecam" is only useful one day in the future when we want to be
> able to emulate other non-ecam host bridges. For now we could even skip
> it.

This would create a problem if xl is used to add a PCI device as we need the PCI node to be in the DTB when the guest is created.
I agree this is not needed but removing it might create more complexity in the code.

Bertrand

> 
> 
>> BAR value and IOMEM mapping:
>> 
>> Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI	device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
>> 
>> As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
>> For example:
>> 	Guest reserved IOMEM region:  0x04020000
>>    	Real physical IOMEM region:0x50000000
>>    	IOMEM size:128MB
>>    	iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]
>> 
>> There is no need to map the ECAM space as XEN already have access to the ECAM space and XEN will trap ECAM accesses from the guest and will perform read/write on the VPCI bus.
>> 
>> IOMEM access will not be trapped and the guest will directly access the IOMEM region of the assigned device via stage-2 translation.
>> 
>> In the same, we mapped the assigned devices IRQ to the guest using below config options.
>> 	irqs= [ NUMBER, NUMBER, ...]
>> 
>> Limitation:
>> * Need to avoid the “iomem” and “irq” guest config options and map the IOMEM region and IRQ at the same time when device is assigned to the guest using the “pci” guest config options when xl creates the domain.
>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped address.
>> * X86 mapping code should be ported on Arm so that the stage-2 translation is adapted when the guest is doing a modification of the BAR registers values (to map the address requested by the guest for a specific IOMEM to the address actually contained in the real BAR register of the corresponding device).
>> 
>> # SMMU configuration for guest:
>> 
>> When assigning PCI devices to a guest, the SMMU configuration should be updated to remove access to the hardware domain memory and add
>> configuration to have access to the guest memory with the proper address translation so that the device can do DMA operations from and to the guest memory only.
>> 
>> # MSI/MSI-X support:
>> Not implement and tested as of now.
>> 
>> # ITS support:
>> Not implement and tested as of now.



* Re: RFC: PCI devices passthrough on Arm design proposal
From: Oleksandr Andrushchenko @ 2020-07-17  7:41 UTC (permalink / raw)
  To: Bertrand Marquis, Stefano Stabellini
  Cc: xen-devel, nd, Rahul Singh, Roger Pau Monné, Julien Grall


On 7/17/20 9:53 AM, Bertrand Marquis wrote:
>
>> On 16 Jul 2020, at 22:51, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>
>> On Thu, 16 Jul 2020, Rahul Singh wrote:
>>> Hello All,
>>>
>>> Following up on discussion on PCI Passthrough support on ARM that we had at the XEN summit, we are submitting a Review For Comment and a design proposal for PCI passthrough support on ARM. Feel free to give your feedback.
>>>
>>> The followings describe the high-level design proposal of the PCI passthrough support and how the different modules within the system interacts with each other to assign a particular PCI device to the guest.
>> I think the proposal is good and I only have a couple of thoughts to
>> share below.
>>
>>
>>> # Title:
>>>
>>> PCI devices passthrough on Arm design proposal
>>>
>>> # Problem statement:
>>>
>>> On ARM there in no support to assign a PCI device to a guest. PCI device passthrough capability allows guests to have full access to some PCI devices. PCI device passthrough allows PCI devices to appear and behave as if they were physically attached to the guest operating system and provide full isolation of the PCI devices.
>>>
>>> Goal of this work is to also support Dom0Less configuration so the PCI backend/frontend drivers used on x86 shall not be used on Arm. It will use the existing VPCI concept from X86 and implement the virtual PCI bus through IO emulation​ such that only assigned devices are visible​ to the guest and guest can use the standard PCI driver.
>>>
>>> Only Dom0 and Xen will have access to the real PCI bus,

So, in this case how is the access serialization going to work? I mean, what happens if both Xen and Dom0 are about to access the bus at the same time? There was a discussion on this before [1] and IMO it was not decided how to deal with that.

>>> ​ guest will have a direct access to the assigned device itself​. IOMEM memory will be mapped to the guest ​and interrupt will be redirected to the guest. SMMU has to be configured correctly to have DMA transaction.
>>>
>>> ## Current state: Draft version
>>>
>>> # Proposer(s): Rahul Singh, Bertrand Marquis
>>>
>>> # Proposal:
>>>
>>> This section will describe the different subsystem to support the PCI device passthrough and how these subsystems interact with each other to assign a device to the guest.
>>>
>>> # PCI Terminology:
>>>
>>> Host Bridge: Host bridge allows the PCI devices to talk to the rest of the computer.
>>> ECAM: ECAM (Enhanced Configuration Access Mechanism) is a mechanism developed to allow PCIe to access configuration space. The space available per function is 4KB.
>>>
>>> # Discovering PCI Host Bridge in XEN:
>>>
>>> In order to support the PCI passthrough XEN should be aware of all the PCI host bridges available on the system and should be able to access the PCI configuration space. ECAM configuration access is supported as of now. XEN during boot will read the PCI device tree node “reg” property and will map the ECAM space to the XEN memory using the “ioremap_nocache ()” function.
>>>
>>> If there are more than one segment on the system, XEN will read the “linux, pci-domain” property from the device tree node and configure  the host bridge segment number accordingly. All the PCI device tree nodes should have the “linux,pci-domain” property so that there will be no conflicts. During hardware domain boot Linux will also use the same “linux,pci-domain” property and assign the domain number to the host bridge.
>>>
>>> When Dom0 tries to access the PCI config space of the device, XEN will find the corresponding host bridge based on segment number and access the corresponding config space assigned to that bridge.
>>>
>>> Limitation:
>>> * Only PCI ECAM configuration space access is supported.

This is really the limitation which we have to think of now, as there is a lot of HW without ECAM support, and not providing a way to use PCI(e) on those boards would render them useless wrt PCI. I don't suggest having real code for that yet, but I would suggest we design the interfaces for it from day 0. At the same time I do understand that supporting non-ECAM bridges is a pain.

>>> * Device tree binding is supported as of now, ACPI is not supported.
>>> * Need to port the PCI host bridge access code to XEN to access the configuration space (generic one works but lots of platforms will required some specific code or quirks).
>>>
>>> # Discovering PCI devices:
>>>
>>> PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure
Great, so we assume here that the bootloader can do the enumeration and configuration...
>>>   the BAR, PCI capabilities, and MSI/MSI-X configuration.
>>>
>>> PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.
>>>
>>> #define PHYSDEVOP_pci_device_add        25
>>> struct physdev_pci_device_add {
>>>     uint16_t seg;
>>>     uint8_t bus;
>>>     uint8_t devfn;
>>>     uint32_t flags;
>>>     struct {
>>>     	uint8_t bus;
>>>     	uint8_t devfn;
>>>     } physfn;
>>>     /*
>>>     * Optional parameters array.
>>>     * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
>>>     */
>>>     uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>>     };
>>>
>>> As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.
>>>
>>> XEN will add the PCI devices in the linked list maintain in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the device will be added to the hardware domain.
>>>
>>> Limitations:
>>> * When PCI devices are added to XEN, MSI capability is not initialized inside XEN and not supported as of now.
>>> * ACS capability is disable for ARM as of now as after enabling it devices are not accessible.
>>> * Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
>> I think it is fine to assume that for dom0less the "firmware" has taken
>> care of setting up the BARs correctly. Starting with that assumption, it
>> looks like it should be "easy" to walk the PCI topology in Xen when/if
>> there is no dom0?
> Yes as we discussed during the design session, we currently think that it is the way to go.
> We are for now relying on Dom0 to get the list of PCI devices but this is definitely the strategy we would like to use to have Dom0 support.
> If this is working well, I even think we could get rid of the hypercall all together.
...and is this the same way of configuring things if the enumeration happens in the bootloader?

I do support the idea of moving away from PHYSDEVOP_pci_device_add: the driver domain would just signal Xen that the enumeration is done, and Xen would traverse the bus at that point.

Please also note that there are actually 3 possible cases wrt where the enumeration and configuration happen: boot firmware, Dom0, or Xen. So it seems we are going to have different approaches for the first two (see my comment above on the hypercall use in Dom0). Walking the bus ourselves in Xen therefore seems to be good for all the use-cases above.

>
>
>>
>>> # Enable the existing x86 virtual PCI support for ARM:
>>>
>>> The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.
>>>
>>> A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
Just to make it clear: Dom0 still accesses the bus directly, without emulation, right?
>>>
>>> Limitation:
>>> * No handler is register for the MSI configuration.
>>> * Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.
>>>
>>> # Assign the device to the guest:
>>>
>>> Assign the PCI device from the hardware domain to the guest is done using the below guest config option. When xl tool create the domain, PCI devices will be assigned to the guest VPCI bus.
>>> 	pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
>>>
>>> Guest will be only able to access the assigned devices and see the bridges. Guest will not be able to access or see the devices that are no assigned to him.
Does this mean that we do not need to configure the bridges as those are exposed to the guest implicitly?
>>>
>>> Limitation:
>>> * As of now all the bridges in the PCI bus are seen by the guest on the VPCI bus.

So, what happens if a guest tries to access a bridge that has no assigned PCI device behind it? E.g. we pass through PCIe_dev0, which is behind Bridge0, and the guest also sees Bridge1 and tries to access devices behind it during the enumeration. Could you please clarify?

>> We need to come up with something similar for dom0less too. It could be
>> exactly the same thing (a list of BDFs as strings as a device tree
>> property) or something else if we can come up with a better idea.
> Fully agree.
> Maybe a tree topology could allow more possibilities (like giving BAR values) in the future.
>>
>>> # Emulated PCI device tree node in libxl:
>>>
>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>
>>> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
>>>
>>> Limitation:
>>> * Only one PCI device tree node is supported as of now.
>> I think vpci="pci_ecam" should be optional: if pci=[ "PCI_SPEC_STRING",
>> ...] is specififed, then vpci="pci_ecam" is implied.
>>
>> vpci="pci_ecam" is only useful one day in the future when we want to be
>> able to emulate other non-ecam host bridges. For now we could even skip
>> it.
> This would create a problem if xl is used to add a PCI device as we need the PCI node to be in the DTB when the guest is created.
> I agree this is not needed but removing it might create more complexity in the code.

I would suggest we have it from day 0, as there is plenty of HW available which is not ECAM. Having the vpci option allows other bridges to be supported.

>
> Bertrand
>
>>
>>> BAR value and IOMEM mapping:
>>>
>>> Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI	device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
>>>
>>> As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
>>> For example:
>>> 	Guest reserved IOMEM region:  0x04020000
>>>     	Real physical IOMEM region:0x50000000
>>>     	IOMEM size:128MB
>>>     	iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]
>>>
>>> There is no need to map the ECAM space as XEN already have access to the ECAM space and XEN will trap ECAM accesses from the guest and will perform read/write on the VPCI bus.
>>>
>>> IOMEM access will not be trapped and the guest will directly access the IOMEM region of the assigned device via stage-2 translation.
>>>
>>> In the same, we mapped the assigned devices IRQ to the guest using below config options.
>>> 	irqs= [ NUMBER, NUMBER, ...]
>>>
>>> Limitation:
>>> * Need to avoid the “iomem” and “irq” guest config options and map the IOMEM region and IRQ at the same time when device is assigned to the guest using the “pci” guest config options when xl creates the domain.
>>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped address.
>>> * X86 mapping code should be ported on Arm so that the stage-2 translation is adapted when the guest is doing a modification of the BAR registers values (to map the address requested by the guest for a specific IOMEM to the address actually contained in the real BAR register of the corresponding device).
>>>
>>> # SMMU configuration for guest:
>>>
>>> When assigning PCI devices to a guest, the SMMU configuration should be updated to remove access to the hardware domain memory

So, as the hardware domain still has access to the PCI configuration space, we can potentially have a situation where Dom0 accesses the device. AFAIU, with pci front/back, before assigning the device to the guest we unbind it from the real driver and bind it to the backend. Are we going to do something similar here?


Thank you,

Oleksandr

>>>   and add
>>> configuration to have access to the guest memory with the proper address translation so that the device can do DMA operations from and to the guest memory only.
>>>
>>> # MSI/MSI-X support:
>>> Not implement and tested as of now.
>>>
>>> # ITS support:
>>> Not implement and tested as of now.
[1] https://lists.xen.org/archives/html/xen-devel/2017-05/msg02674.html


* Re: RFC: PCI devices passthrough on Arm design proposal
From: Jan Beulich @ 2020-07-17  8:10 UTC (permalink / raw)
  To: Rahul Singh
  Cc: xen-devel, nd, Julien Grall, Stefano Stabellini, Roger Pau Monné

On 16.07.2020 19:10, Rahul Singh wrote:
> # Discovering PCI devices:
> 
> PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure the BAR, PCI capabilities, and MSI/MSI-X configuration.
> 
> PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.
> 
> #define PHYSDEVOP_pci_device_add        25
> struct physdev_pci_device_add {
>     uint16_t seg;
>     uint8_t bus;
>     uint8_t devfn;
>     uint32_t flags;
>     struct {
>     	uint8_t bus;
>     	uint8_t devfn;
>     } physfn;
>     /*
>     * Optional parameters array.
>     * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
>     */
>     uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>     };
> 
> As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.
> 
> XEN will add the PCI devices in the linked list maintain in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the device will be added to the hardware domain.

Have you had any thoughts about Dom0 re-arranging the bus numbering?
This is, afaict, a still open issue on x86 as well.

> Limitations:
> * When PCI devices are added to XEN, MSI capability is not initialized inside XEN and not supported as of now.

I think this is a pretty severe limitation, as modern devices tend to
not support pin based interrupts anymore.

> # Emulated PCI device tree node in libxl:
> 
> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.

I support Stefano's suggestion for this to be an optional thing, i.e.
there to be no need for it when there are PCI devices assigned to the
guest anyway. I also wonder about the pci_ prefix here - isn't
vpci="ecam" as unambiguous?

> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
> 
> Limitation:
> * Only one PCI device tree node is supported as of now.
> 
> BAR value and IOMEM mapping:
> 
> Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI	device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
> 
> As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
> For example:
> 	Guest reserved IOMEM region:  0x04020000
>     	Real physical IOMEM region:0x50000000
>     	IOMEM size:128MB
>     	iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]

This surely is planned to go away before the code hits upstream? The
ranges really should be read out of the BARs, as I see the
"limitations" section further down suggests, but it's not clear
whether "limitations" are items that you plan to take care of before
submitting your code for review.

Jan



* Re: RFC: PCI devices passthrough on Arm design proposal
From: Oleksandr Andrushchenko @ 2020-07-17  8:47 UTC (permalink / raw)
  To: Jan Beulich, Rahul Singh
  Cc: xen-devel, Roger Pau Monné, nd, Stefano Stabellini, Julien Grall


On 7/17/20 11:10 AM, Jan Beulich wrote:
> On 16.07.2020 19:10, Rahul Singh wrote:
>> # Discovering PCI devices:
>>
>> PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure the BAR, PCI capabilities, and MSI/MSI-X configuration.
>>
>> PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.
>>
>> #define PHYSDEVOP_pci_device_add        25
>> struct physdev_pci_device_add {
>>      uint16_t seg;
>>      uint8_t bus;
>>      uint8_t devfn;
>>      uint32_t flags;
>>      struct {
>>      	uint8_t bus;
>>      	uint8_t devfn;
>>      } physfn;
>>      /*
>>      * Optional parameters array.
>>      * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
>>      */
>>      uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>      };
>>
>> As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.
>>
>> XEN will add the PCI devices in the linked list maintain in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the device will be added to the hardware domain.
> Have you had any thoughts about Dom0 re-arranging the bus numbering?
> This is, afaict, a still open issue on x86 as well.

This can get even trickier, as we may have PCI enumerated at boot time by the firmware, and then Dom0 may perform the enumeration differently. So Xen needs to be aware of what is going to be used as the source of the enumeration data and be ready to rebuild its internal structures in order to be aligned with that entity: e.g. compare the Dom0 and Dom0less use-cases.

>
>> Limitations:
>> * When PCI devices are added to XEN, MSI capability is not initialized inside XEN and not supported as of now.
> I think this is a pretty severe limitation, as modern devices tend to
> not support pin based interrupts anymore.
>
>> # Emulated PCI device tree node in libxl:
>>
>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
> I support Stefano's suggestion for this to be an optional thing, i.e.
> there to be no need for it when there are PCI devices assigned to the
> guest anyway. I also wonder about the pci_ prefix here - isn't
> vpci="ecam" as unambiguous?
>
>> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
>>
>> Limitation:
>> * Only one PCI device tree node is supported as of now.
>>
>> BAR value and IOMEM mapping:
>>
>> Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI	device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
>>
>> As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
>> For example:
>> 	Guest reserved IOMEM region:  0x04020000
>>      	Real physical IOMEM region:0x50000000
>>      	IOMEM size:128MB
>>      	iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]
> This surely is planned to go away before the code hits upstream? The
> ranges really should be read out of the BARs, as I see the
> "limitations" section further down suggests, but it's not clear
> whether "limitations" are items that you plan to take care of before
> submitting your code for review.
>
> Jan
>


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-16 17:10   ` RFC: PCI devices passthrough on Arm design proposal Rahul Singh
  2020-07-16 20:51     ` Stefano Stabellini
  2020-07-17  8:10     ` Jan Beulich
@ 2020-07-17 11:16     ` Roger Pau Monné
  2020-07-17 13:22       ` Bertrand Marquis
  2 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2020-07-17 11:16 UTC (permalink / raw)
  To: Rahul Singh; +Cc: xen-devel, nd, Stefano Stabellini, Julien Grall

I've wrapped the email to 80 columns in order to make it easier to
reply.

Thanks for doing this, I think the design is good, I have some
questions below so that I understand the full picture.

On Thu, Jul 16, 2020 at 05:10:05PM +0000, Rahul Singh wrote:
> Hello All,
> 
> Following up on discussion on PCI Passthrough support on ARM that we
> had at the XEN summit, we are submitting a Review For Comment and a
> design proposal for PCI passthrough support on ARM. Feel free to
> give your feedback.
> 
> The followings describe the high-level design proposal of the PCI
> passthrough support and how the different modules within the system
> interacts with each other to assign a particular PCI device to the
> guest.
> 
> # Title:
> 
> PCI devices passthrough on Arm design proposal
> 
> # Problem statement:
> 
> On ARM there in no support to assign a PCI device to a guest. PCI
> device passthrough capability allows guests to have full access to
> some PCI devices. PCI device passthrough allows PCI devices to
> appear and behave as if they were physically attached to the guest
> operating system and provide full isolation of the PCI devices.
> 
> Goal of this work is to also support Dom0Less configuration so the
> PCI backend/frontend drivers used on x86 shall not be used on Arm.
> It will use the existing VPCI concept from X86 and implement the
> virtual PCI bus through IO emulation such that only assigned devices
> are visible to the guest and guest can use the standard PCI
> driver.
> 
> Only Dom0 and Xen will have access to the real PCI bus, guest will
> have a direct access to the assigned device itself. IOMEM memory
> will be mapped to the guest and interrupt will be redirected to the
> guest. SMMU has to be configured correctly to have DMA
> transaction.
> 
> ## Current state: Draft version
> 
> # Proposer(s): Rahul Singh, Bertrand Marquis
> 
> # Proposal:
> 
> This section will describe the different subsystem to support the
> PCI device passthrough and how these subsystems interact with each
> other to assign a device to the guest.
> 
> # PCI Terminology:
> 
> Host Bridge: Host bridge allows the PCI devices to talk to the rest
> of the computer.  ECAM: ECAM (Enhanced Configuration Access
> Mechanism) is a mechanism developed to allow PCIe to access
> configuration space. The space available per function is 4KB.
> 
> # Discovering PCI Host Bridge in XEN:
> 
> In order to support the PCI passthrough XEN should be aware of all
> the PCI host bridges available on the system and should be able to
> access the PCI configuration space. ECAM configuration access is
> supported as of now. XEN during boot will read the PCI device tree
> node “reg” property and will map the ECAM space to the XEN memory
> using the “ioremap_nocache ()” function.

What about ACPI? I think you should also mention the MMCFG table,
which should contain the information about the ECAM region(s) (or at
least that's how it works on x86). Just realized that you don't
support ACPI ATM, so you can ignore this comment.

> 
> If there are more than one segment on the system, XEN will read the
> “linux, pci-domain” property from the device tree node and configure
> the host bridge segment number accordingly. All the PCI device tree
> nodes should have the “linux,pci-domain” property so that there will
> be no conflicts. During hardware domain boot Linux will also use the
> same “linux,pci-domain” property and assign the domain number to the
> host bridge.

So it's my understanding that the PCI domain (or segment) is just an
abstract concept to differentiate all the Root Complexes present on
the system, but the host bridge itself is not aware of the segment
assigned to it in any way.

I'm not sure Xen and the hardware domain having matching segments is a
requirement, if you use vPCI you can match the segment (from Xen's
PoV) by just checking from which ECAM region the access has been
performed.
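
To illustrate that (a minimal standalone sketch, not Xen code; the
window table and the names are assumed for the example):

#include <stddef.h>
#include <stdint.h>

/* One entry per host bridge Xen knows about. */
struct ecam_window {
    uint64_t base;      /* start of the bridge's ECAM window (guest physical) */
    uint64_t size;      /* size of the window                                 */
    uint16_t segment;   /* segment Xen assigned to that bridge                */
};

/* Find the bridge whose ECAM window contains a trapped access address. */
static const struct ecam_window *window_for_access(const struct ecam_window *w,
                                                   size_t count, uint64_t gpa)
{
    for ( size_t i = 0; i < count; i++ )
        if ( gpa >= w[i].base && gpa - w[i].base < w[i].size )
            return &w[i];
    return NULL;        /* access outside any emulated ECAM region */
}

With something like that the segment (from Xen's PoV) falls out of the
window that was hit, independently of the numbering Dom0 uses.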

The only reason to require matching segment values between Xen and the
hardware domain is to allow using hypercalls against the PCI devices,
ie: to be able to use hypercalls to assign a device to a domain from
the hardware domain.

I have 0 understanding of DT or its spec, but why does this have a
'linux,' prefix? The segment number is part of the PCI spec, and not
something specific to Linux IMO.

> 
> When Dom0 tries to access the PCI config space of the device, XEN
> will find the corresponding host bridge based on segment number and
> access the corresponding config space assigned to that bridge.
> 
> Limitation:
> * Only PCI ECAM configuration space access is supported.
> * Device tree binding is supported as of now, ACPI is not supported.
> * Need to port the PCI host bridge access code to XEN to access the
>   configuration space (generic one works but lots of platforms will
>   required  some specific code or quirks).
> 
> # Discovering PCI devices:
> 
> PCI-PCIe enumeration is a process of detecting devices connected to
> its host. It is the responsibility of the hardware domain or boot
> firmware to do the PCI enumeration and configure the BAR, PCI
> capabilities, and MSI/MSI-X configuration.
> 
> PCI-PCIe enumeration in XEN is not feasible for the configuration
> part as it would require a lot of code inside Xen which would
> require a lot of maintenance. Added to this many platforms require
> some quirks in that part of the PCI code which would greatly improve
> Xen complexity. Once hardware domain enumerates the device then it
> will communicate to XEN via the below hypercall.
> 
> #define PHYSDEVOP_pci_device_add        25 struct
> physdev_pci_device_add {
>     uint16_t seg;
>     uint8_t bus;
>     uint8_t devfn;
>     uint32_t flags;
>     struct {
>         uint8_t bus;
>         uint8_t devfn;
>     } physfn;
>     /*
>      * Optional parameters array.
>      * First element ([0]) is PXM domain associated with the device (if
>      * XEN_PCI_DEV_PXM is set)
>      */
>     uint32_t optarr[XEN_FLEX_ARRAY_DIM];
> };
> 
> As the hypercall argument has the PCI segment number, XEN will
> access the PCI config space based on this segment number and find
> the host-bridge corresponding to this segment number. At this stage
> host bridge is fully initialized so there will be no issue to access
> the config space.
> 
> XEN will add the PCI devices in the linked list maintain in XEN
> using the function pci_add_device(). XEN will be aware of all the
> PCI devices on the system and all the device will be added to the
> hardware domain.
> 
> Limitations:
> * When PCI devices are added to XEN, MSI capability is
>   not initialized inside XEN and not supported as of now.

I assume you will mask such capability and will prevent the guest (or
hardware domain) from interacting with it?

> * ACS capability is disable for ARM as of now as after enabling it
>   devices are not accessible.
> * Dom0Less implementation will require to have the capacity inside Xen
>   to discover the PCI devices (without depending on Dom0 to declare them
>   to Xen).

I assume the firmware will properly initialize the host bridge and
configure the resources for each device, so that Xen just has to walk
the PCI space and find the devices.

TBH that would be my preferred method, because then you can get rid of
the hypercall.

Is there any way for Xen to know whether the host bridge is properly
set up and thus the PCI bus can be scanned?

That way Arm could do something similar to x86, where Xen will scan
the bus and discover devices, but you could still provide the
hypercall in case the bus cannot be scanned by Xen (because it hasn't
been setup).
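
As a rough illustration of such a scan (a standalone sketch only; it
assumes the ECAM window is already mapped and the helper names are not
from the proposal):

#include <stdint.h>

/* Byte offset of (bus, dev, fn, reg) inside an ECAM window. */
static inline uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn,
                                   uint16_t reg)
{
    return ((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
           ((uint64_t)fn << 12) | reg;
}

/* Walk one bus and report every function that answers with a valid
 * vendor ID (0xffff means nothing is there). */
static void scan_bus(const volatile uint32_t *ecam, uint8_t bus,
                     void (*found)(uint8_t bus, uint8_t dev, uint8_t fn))
{
    for ( unsigned int dev = 0; dev < 32; dev++ )
        for ( unsigned int fn = 0; fn < 8; fn++ )
        {
            uint32_t id = ecam[ecam_offset(bus, dev, fn, 0) / 4];

            if ( (id & 0xffff) != 0xffff )
                found(bus, dev, fn);
        }
}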

> 
> # Enable the existing x86 virtual PCI support for ARM:
> 
> The existing VPCI support available for X86 is adapted for Arm. When
> the device is added to XEN via the hyper call
> “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access
> is added to the PCI device to emulate the PCI devices.
> 
> A MMIO trap handler for the PCI ECAM space is registered in XEN so
> that when guest is trying to access the PCI config space, XEN will
> trap the access and emulate read/write using the VPCI and not the
> real PCI hardware.
> 
> Limitation:
> * No handler is register for the MSI configuration.

But you need to mask MSI/MSI-X capabilities in the config space in
order to prevent access from domains? (and by mask I mean remove from
the list of capabilities and prevent reads/writes to that
configuration space).
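
A minimal sketch of that kind of masking, working on a cached copy of
the config space (the helper shape is an assumption, not the vPCI
interface):

#include <stdint.h>

#define PCI_CAP_ID_MSI   0x05
#define PCI_CAP_ID_MSIX  0x11

/* Given a capability offset, return the first capability at or after it
 * that should stay visible to the guest; MSI and MSI-X are skipped so
 * they never appear in the emulated capability list. */
static uint8_t next_visible_cap(const uint8_t cfg[256], uint8_t pos)
{
    while ( pos >= 0x40 && pos <= 0xfc )    /* valid capability range */
    {
        uint8_t id   = cfg[pos];            /* capability ID          */
        uint8_t next = cfg[pos + 1];        /* next capability offset */

        if ( id != PCI_CAP_ID_MSI && id != PCI_CAP_ID_MSIX )
            return pos;                     /* expose this one        */
        pos = next;                         /* hide it, skip ahead    */
    }
    return 0;                               /* end of list            */
}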

Note this is already implemented for x86, and I've tried to add arch_
hooks for arch specific stuff so that it could be reused by Arm. But
maybe this would require a different design document?

> * Only legacy interrupt is supported and tested as of now, MSI is not
>   implemented and tested.
> 
> # Assign the device to the guest:
> 
> Assign the PCI device from the hardware domain to the guest is done
> using the below guest config option. When xl tool create the domain,
> PCI devices will be assigned to the guest VPCI bus.
>
> pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
> 
> Guest will be only able to access the assigned devices and see the
> bridges. Guest will not be able to access or see the devices that
> are no assigned to him.
> 
> Limitation:
> * As of now all the bridges in the PCI bus are seen by
>   the guest on the VPCI bus.

I don't think you need all of them, just the ones that are higher up
in the hierarchy of the device you are trying to pass through?

Which kind of access do guest have to PCI bridges config space?

This should be limited to read-only accesses in order to be safe.

Emulating a PCI bridge in Xen using vPCI shouldn't be that
complicated, so you could likely replace the real bridges with
emulated ones. Or even provide a fake topology to the guest using an
emulated bridge.
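
For the read-only case the emulation can be as small as this kind of
sketch (illustrative only, not the vPCI handlers):

#include <stdint.h>

/* A bridge exposed to the guest: reads are served from a cached copy of
 * its config header, writes are simply dropped. */
struct vbridge {
    uint32_t cfg[64];                     /* cached 256-byte config space */
};

static uint32_t vbridge_read(const struct vbridge *b, unsigned int reg)
{
    return reg < 256 ? b->cfg[reg / 4] : 0xffffffff;
}

static void vbridge_write(struct vbridge *b, unsigned int reg, uint32_t val)
{
    (void)b; (void)reg; (void)val;        /* read-only: ignore guest writes */
}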

> 
> # Emulated PCI device tree node in libxl:
> 
> Libxl is creating a virtual PCI device tree node in the device tree
> to enable the guest OS to discover the virtual PCI during guest
> boot. We introduced the new config option [vpci="pci_ecam"] for
> guests. When this config option is enabled in a guest configuration,
> a PCI device tree node will be created in the guest device tree.
> 
> A new area has been reserved in the arm guest physical map at which
> the VPCI bus is declared in the device tree (reg and ranges
> parameters of the node). A trap handler for the PCI ECAM access from
> guest has been registered at the defined address and redirects
> requests to the VPCI driver in Xen.

Can't you deduce the requirement of such DT node based on the presence
of a 'pci=' option in the same config file?

Also I wouldn't discard that in the future you might want to use
different emulators for different devices, so it might be helpful to
introduce something like:

pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt', '0a:00.0,backend=qemu', ... ]

For the time being Arm will require backend=vpci for all the passed
through devices, but I wouldn't rule out this changing in the future.

> Limitation:
> * Only one PCI device tree node is supported as of now.
> 
> BAR value and IOMEM mapping:
> 
> Linux guest will do the PCI enumeration based on the area reserved
> for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI
> device is assigned to the guest, XEN will map the guest PCI IOMEM
> region to the real physical IOMEM region only for the assigned
> devices.

PCI IOMEM == BARs? Or are you referring to the ECAM access window?

> As of now we have not modified the existing VPCI code to map the
> guest PCI IOMEM region to the real physical IOMEM region. We used
> the existing guest “iomem” config option to map the region.  For
> example: Guest reserved IOMEM region:  0x04020000 Real physical
> IOMEM region:0x50000000 IOMEM size:128MB iomem config will be:
> iomem = ["0x50000,0x8000@0x4020"]
> 
> There is no need to map the ECAM space as XEN already have access to
> the ECAM space and XEN will trap ECAM accesses from the guest and
> will perform read/write on the VPCI bus.
> 
> IOMEM access will not be trapped and the guest will directly access
> the IOMEM region of the assigned device via stage-2 translation.
> 
> In the same, we mapped the assigned devices IRQ to the guest using
> below config options.  irqs= [ NUMBER, NUMBER, ...]

Are you providing this for the hardware domain also? Or are irqs
fetched from the DT in that case?

> Limitation:
> * Need to avoid the “iomem” and “irq” guest config
>   options and map the IOMEM region and IRQ at the same time when
>   device is assigned to the guest using the “pci” guest config options
>   when xl creates the domain.
> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped
>   address.

It was my understanding that you would identity map the BAR into the
domU stage-2 translation, and that changes by the guest won't be
allowed.

> * X86 mapping code should be ported on Arm so that the stage-2
>   translation is adapted when the guest is doing a modification of the
>   BAR registers values (to map the address requested by the guest for
>   a specific IOMEM to the address actually contained in the real BAR
>   register of the corresponding device).

I think the above means that you want to allow the guest to change the
position of the BAR in the stage-2 translation _without_ allowing it
to change the position of the BAR in the physical memory map, is that
correct?
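
Roughly, the model in question would then be (a sketch only, where
map_region()/unmap_region() stand in for the stage-2 primitives and are
not real Xen functions):

#include <stdint.h>

void map_region(uint64_t gpa, uint64_t hpa, uint64_t size);   /* assumed */
void unmap_region(uint64_t gpa, uint64_t size);               /* assumed */

struct vbar {
    uint64_t host_addr;   /* real BAR value, never changed by the guest */
    uint64_t guest_addr;  /* where the guest currently sees the BAR     */
    uint64_t size;        /* BAR size, a power of two                   */
};

/* Guest write to the emulated BAR: only the stage-2 mapping moves, the
 * physical BAR keeps the address programmed by firmware/Dom0. */
static void vbar_write(struct vbar *bar, uint64_t new_guest_addr)
{
    unmap_region(bar->guest_addr, bar->size);
    bar->guest_addr = new_guest_addr & ~(bar->size - 1);
    map_region(bar->guest_addr, bar->host_addr, bar->size);
}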

Thanks, Roger.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17  7:41         ` Oleksandr Andrushchenko
@ 2020-07-17 11:26           ` Julien Grall
  2020-07-17 11:41             ` Oleksandr Andrushchenko
  2020-07-17 12:46           ` Rahul Singh
  2020-07-17 13:12           ` Bertrand Marquis
  2 siblings, 1 reply; 62+ messages in thread
From: Julien Grall @ 2020-07-17 11:26 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, Bertrand Marquis, Stefano Stabellini
  Cc: xen-devel, nd, Rahul Singh, Roger Pau Monné, Julien Grall



On 17/07/2020 08:41, Oleksandr Andrushchenko wrote:
>>> We need to come up with something similar for dom0less too. It could be
>>> exactly the same thing (a list of BDFs as strings as a device tree
>>> property) or something else if we can come up with a better idea.
>> Fully agree.
>> Maybe a tree topology could allow more possibilities (like giving BAR values) in the future.
>>>
>>>> # Emulated PCI device tree node in libxl:
>>>>
>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>
>>>> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
>>>>
>>>> Limitation:
>>>> * Only one PCI device tree node is supported as of now.
>>> I think vpci="pci_ecam" should be optional: if pci=[ "PCI_SPEC_STRING",
>>> ...] is specififed, then vpci="pci_ecam" is implied.
>>>
>>> vpci="pci_ecam" is only useful one day in the future when we want to be
>>> able to emulate other non-ecam host bridges. For now we could even skip
>>> it.
>> This would create a problem if xl is used to add a PCI device as we need the PCI node to be in the DTB when the guest is created.
>> I agree this is not needed but removing it might create more complexity in the code.
> 
> I would suggest we have it from day 0 as there are plenty of HW available which is not ECAM.
> 
> Having vpci allows other bridges to be supported

So I can understand why you would want to have a driver for a non-ECAM 
host PCI controller. However, why would you want to emulate a non-ECAM 
PCI controller to a guest?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 11:26           ` Julien Grall
@ 2020-07-17 11:41             ` Oleksandr Andrushchenko
  2020-07-17 13:21               ` Bertrand Marquis
  0 siblings, 1 reply; 62+ messages in thread
From: Oleksandr Andrushchenko @ 2020-07-17 11:41 UTC (permalink / raw)
  To: Julien Grall, Bertrand Marquis, Stefano Stabellini
  Cc: xen-devel, nd, Rahul Singh, Julien Grall, Roger Pau Monné


On 7/17/20 2:26 PM, Julien Grall wrote:
>
>
> On 17/07/2020 08:41, Oleksandr Andrushchenko wrote:
>>>> We need to come up with something similar for dom0less too. It could be
>>>> exactly the same thing (a list of BDFs as strings as a device tree
>>>> property) or something else if we can come up with a better idea.
>>> Fully agree.
>>> Maybe a tree topology could allow more possibilities (like giving BAR values) in the future.
>>>>
>>>>> # Emulated PCI device tree node in libxl:
>>>>>
>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>>
>>>>> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
>>>>>
>>>>> Limitation:
>>>>> * Only one PCI device tree node is supported as of now.
>>>> I think vpci="pci_ecam" should be optional: if pci=[ "PCI_SPEC_STRING",
>>>> ...] is specififed, then vpci="pci_ecam" is implied.
>>>>
>>>> vpci="pci_ecam" is only useful one day in the future when we want to be
>>>> able to emulate other non-ecam host bridges. For now we could even skip
>>>> it.
>>> This would create a problem if xl is used to add a PCI device as we need the PCI node to be in the DTB when the guest is created.
>>> I agree this is not needed but removing it might create more complexity in the code.
>>
>> I would suggest we have it from day 0 as there are plenty of HW available which is not ECAM.
>>
>> Having vpci allows other bridges to be supported
>
> So I can understand why you would want to have a driver for non-ECAM host PCI controller. However, why would you want to emulate a non-ECAM PCI controller to a guest?
Indeed. No need to emulate non-ECAM
>
> Cheers,
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17  7:41         ` Oleksandr Andrushchenko
  2020-07-17 11:26           ` Julien Grall
@ 2020-07-17 12:46           ` Rahul Singh
  2020-07-17 12:55             ` Jan Beulich
  2020-07-17 13:12           ` Bertrand Marquis
  2 siblings, 1 reply; 62+ messages in thread
From: Rahul Singh @ 2020-07-17 12:46 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Stefano Stabellini, Roger Pau Monné,
	Bertrand Marquis, xen-devel, nd, Julien Grall

[-- Attachment #1: Type: text/plain, Size: 15510 bytes --]

Sorry for the formatting issue in my previous mail. Replying again so that no comment history is missed.

On 17 Jul 2020, at 8:41 am, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@epam.com<mailto:Oleksandr_Andrushchenko@epam.com>> wrote:


On 7/17/20 9:53 AM, Bertrand Marquis wrote:

On 16 Jul 2020, at 22:51, Stefano Stabellini <sstabellini@kernel.org<mailto:sstabellini@kernel.org>> wrote:

On Thu, 16 Jul 2020, Rahul Singh wrote:
Hello All,

Following up on discussion on PCI Passthrough support on ARM that we had at the XEN summit, we are submitting a Review For Comment and a design proposal for PCI passthrough support on ARM. Feel free to give your feedback.

The followings describe the high-level design proposal of the PCI passthrough support and how the different modules within the system interacts with each other to assign a particular PCI device to the guest.
I think the proposal is good and I only have a couple of thoughts to
share below.


# Title:

PCI devices passthrough on Arm design proposal

# Problem statement:

On ARM there in no support to assign a PCI device to a guest. PCI device passthrough capability allows guests to have full access to some PCI devices. PCI device passthrough allows PCI devices to appear and behave as if they were physically attached to the guest operating system and provide full isolation of the PCI devices.

Goal of this work is to also support Dom0Less configuration so the PCI backend/frontend drivers used on x86 shall not be used on Arm. It will use the existing VPCI concept from X86 and implement the virtual PCI bus through IO emulation​ such that only assigned devices are visible​ to the guest and guest can use the standard PCI driver.

Only Dom0 and Xen will have access to the real PCI bus,

So, in this case how is the access serialization going to work?

I mean that if both Xen and Dom0 are about to access the bus at the same time?

There was a discussion on the same before [1] and IMO it was not decided on

how to deal with that.

DOM0 also accesses the real PCI hardware via the MMIO config space trap in XEN. We will take care of locking config space accesses in XEN.
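
As a minimal sketch of that serialization (plain C11, not the actual Xen locking code; ecam_reg is just a pointer into the mapped window):

#include <stdatomic.h>
#include <stdint.h>

/* One lock per host bridge: every config space access done on behalf of
 * Xen or of a domain takes it around the actual ECAM read/write. */
static atomic_flag cfg_lock = ATOMIC_FLAG_INIT;

static uint32_t locked_cfg_read(const volatile uint32_t *ecam_reg)
{
    uint32_t val;

    while ( atomic_flag_test_and_set_explicit(&cfg_lock, memory_order_acquire) )
        ;                                   /* spin until the lock is free */
    val = *ecam_reg;                        /* the actual ECAM read        */
    atomic_flag_clear_explicit(&cfg_lock, memory_order_release);
    return val;
}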

​ guest will have a direct access to the assigned device itself​. IOMEM memory will be mapped to the guest ​and interrupt will be redirected to the guest. SMMU has to be configured correctly to have DMA transaction.

## Current state: Draft version

# Proposer(s): Rahul Singh, Bertrand Marquis

# Proposal:

This section will describe the different subsystem to support the PCI device passthrough and how these subsystems interact with each other to assign a device to the guest.

# PCI Terminology:

Host Bridge: Host bridge allows the PCI devices to talk to the rest of the computer.
ECAM: ECAM (Enhanced Configuration Access Mechanism) is a mechanism developed to allow PCIe to access configuration space. The space available per function is 4KB.

# Discovering PCI Host Bridge in XEN:

In order to support the PCI passthrough XEN should be aware of all the PCI host bridges available on the system and should be able to access the PCI configuration space. ECAM configuration access is supported as of now. XEN during boot will read the PCI device tree node “reg” property and will map the ECAM space to the XEN memory using the “ioremap_nocache ()” function.

If there are more than one segment on the system, XEN will read the “linux, pci-domain” property from the device tree node and configure  the host bridge segment number accordingly. All the PCI device tree nodes should have the “linux,pci-domain” property so that there will be no conflicts. During hardware domain boot Linux will also use the same “linux,pci-domain” property and assign the domain number to the host bridge.

When Dom0 tries to access the PCI config space of the device, XEN will find the corresponding host bridge based on segment number and access the corresponding config space assigned to that bridge.

Limitation:
* Only PCI ECAM configuration space access is supported.

This is really the limitation which we have to think of now as there are lots of

HW w/o ECAM support and not providing a way to use PCI(e) on those boards

would render them useless wrt PCI. I don't suggest to have some real code for

that, but I would suggest we design some interfaces from day 0.

At the same time I do understand that supporting non-ECAM bridges is a pain

Adding any type of host bridge is supported: we put the ECAM-specific code in an identified source file so that other types can be implemented. As of now we have implemented the ECAM support, and we are currently implementing support for the N1SDP, which requires specific quirks that will be done in a separate source file.


* Device tree binding is supported as of now, ACPI is not supported.
* Need to port the PCI host bridge access code to XEN to access the configuration space (generic one works but lots of platforms will required some specific code or quirks).

# Discovering PCI devices:

PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure
Great, so we assume here that the bootloader can do the enumeration and configuration...
 the BAR, PCI capabilities, and MSI/MSI-X configuration.

PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.

#define PHYSDEVOP_pci_device_add        25
struct physdev_pci_device_add {
   uint16_t seg;
   uint8_t bus;
   uint8_t devfn;
   uint32_t flags;
   struct {
    uint8_t bus;
    uint8_t devfn;
   } physfn;
   /*
   * Optional parameters array.
   * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
   */
   uint32_t optarr[XEN_FLEX_ARRAY_DIM];
   };

As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.

XEN will add the PCI devices in the linked list maintain in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the device will be added to the hardware domain.

Limitations:
* When PCI devices are added to XEN, MSI capability is not initialized inside XEN and not supported as of now.
* ACS capability is disable for ARM as of now as after enabling it devices are not accessible.
* Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
I think it is fine to assume that for dom0less the "firmware" has taken
care of setting up the BARs correctly. Starting with that assumption, it
looks like it should be "easy" to walk the PCI topology in Xen when/if
there is no dom0?
Yes as we discussed during the design session, we currently think that it is the way to go.
We are for now relying on Dom0 to get the list of PCI devices but this is definitely the strategy we would like to use to have Dom0 support.
If this is working well, I even think we could get rid of the hypercall all together.
...and this is the same way of configuring if enumeration happens in the bootloader?

I do support the idea we go away from PHYSDEVOP_pci_device_add, but driver domain

just signals Xen that the enumeration is done and Xen can traverse the bus by that time.

Please also note, that there are actually 3 cases possible wrt where the enumeration and

configuration happens: boot firmware, Dom0, Xen. So, it seems we

are going to have different approaches for the first two (see my comment above on

the hypercall use in Dom0). So, walking the bus ourselves in Xen seems to be good for all

the use-cases above


In that case we may have to implement a new hypercall to inform XEN that enumeration is complete so that it can then scan the devices. We could tell Xen, using a Xen command line parameter, to delay its enumeration until this hypercall is called. This way, when this is not required because the firmware did the enumeration, we can properly support Dom0Less.
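
A sketch of that flow (the command line option, the hypercall name and the helpers are all hypothetical, to show the intent only):

#include <stdbool.h>

void scan_all_host_bridges(void);      /* assumed: walks every ECAM window  */

static bool delay_pci_scan;            /* set from a hypothetical           */
                                       /* "pci=delay-scan" boot parameter   */

void pci_init(void)
{
    if ( !delay_pci_scan )
        scan_all_host_bridges();       /* firmware already enumerated       */
}

long do_pci_enumeration_done(void)     /* hypothetical new hypercall        */
{
    if ( delay_pci_scan )
        scan_all_host_bridges();       /* hardware domain is done, scan now */
    return 0;
}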




# Enable the existing x86 virtual PCI support for ARM:

The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.

A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
Just to make it clear: Dom0 still access the bus directly w/o emulation, right?

No. Once Xen has done its PCI enumeration (either at boot or after a hypercall from the hardware domain), only Xen will access the physical PCI bus; everybody else will go through VPCI.


Limitation:
* No handler is register for the MSI configuration.
* Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.

# Assign the device to the guest:

Assign the PCI device from the hardware domain to the guest is done using the below guest config option. When xl tool create the domain, PCI devices will be assigned to the guest VPCI bus.
pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]

Guest will be only able to access the assigned devices and see the bridges. Guest will not be able to access or see the devices that are no assigned to him.
Does this mean that we do not need to configure the bridges as those are exposed to the guest implicitly?

Limitation:
* As of now all the bridges in the PCI bus are seen by the guest on the VPCI bus.

So, what happens if a guest tries to access the bridge that doesn't have the assigned

PCI device? E.g. we pass PCIe_dev0 which is behind Bridge0 and the guest also sees

Bridge1 and tries to access devices behind it during the enumeration.

Could you please clarify?

The bridges are only accessible read-only and cannot be modified. Even though a guest would see the bridge, the VPCI will only show the assigned devices behind it. If no device behind that bridge is assigned to the guest, the guest will see an empty bus behind that bridge.


We need to come up with something similar for dom0less too. It could be
exactly the same thing (a list of BDFs as strings as a device tree
property) or something else if we can come up with a better idea.
Fully agree.
Maybe a tree topology could allow more possibilities (like giving BAR values) in the future.

# Emulated PCI device tree node in libxl:

Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.

A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.

Limitation:
* Only one PCI device tree node is supported as of now.
I think vpci="pci_ecam" should be optional: if pci=[ "PCI_SPEC_STRING",
...] is specififed, then vpci="pci_ecam" is implied.

vpci="pci_ecam" is only useful one day in the future when we want to be
able to emulate other non-ecam host bridges. For now we could even skip
it.
This would create a problem if xl is used to add a PCI device as we need the PCI node to be in the DTB when the guest is created.
I agree this is not needed but removing it might create more complexity in the code.

I would suggest we have it from day 0 as there are plenty of HW available which is not ECAM.

Having vpci allows other bridges to be supported

Yes we agree.



Bertrand


BAR value and IOMEM mapping:

Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.

As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
For example:
Guest reserved IOMEM region:  0x04020000
    Real physical IOMEM region:0x50000000
    IOMEM size:128MB
    iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]

There is no need to map the ECAM space as XEN already have access to the ECAM space and XEN will trap ECAM accesses from the guest and will perform read/write on the VPCI bus.

IOMEM access will not be trapped and the guest will directly access the IOMEM region of the assigned device via stage-2 translation.

In the same, we mapped the assigned devices IRQ to the guest using below config options.
irqs= [ NUMBER, NUMBER, ...]

Limitation:
* Need to avoid the “iomem” and “irq” guest config options and map the IOMEM region and IRQ at the same time when device is assigned to the guest using the “pci” guest config options when xl creates the domain.
* Emulated BAR values on the VPCI bus should reflect the IOMEM mapped address.
* X86 mapping code should be ported on Arm so that the stage-2 translation is adapted when the guest is doing a modification of the BAR registers values (to map the address requested by the guest for a specific IOMEM to the address actually contained in the real BAR register of the corresponding device).

# SMMU configuration for guest:

When assigning PCI devices to a guest, the SMMU configuration should be updated to remove access to the hardware domain memory

So, as the hardware domain still has access to the PCI configuration space, we

can potentially have a condition when Dom0 accesses the device. AFAIU, if we have

pci front/back then before assigning the device to the guest we unbind it from the

real driver and bind to the back. Are we going to do something similar here?

Yes, we have to unbind the driver from the hardware domain before assigning the device to the guest. Also, as soon as Xen has done its PCI enumeration (either at boot or after a hypercall from the hardware domain), only Xen will access the physical PCI bus; everybody else will go through VPCI.

- Rahul



Thank you,

Oleksandr

 and add
configuration to have access to the guest memory with the proper address translation so that the device can do DMA operations from and to the guest memory only.

# MSI/MSI-X support:
Not implement and tested as of now.

# ITS support:
Not implement and tested as of now.
[1] https://lists.xen.org/archives/html/xen-devel/2017-05/msg02674.html


[-- Attachment #2: Type: text/html, Size: 99107 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 12:46           ` Rahul Singh
@ 2020-07-17 12:55             ` Jan Beulich
  0 siblings, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2020-07-17 12:55 UTC (permalink / raw)
  To: Rahul Singh
  Cc: Stefano Stabellini, Julien Grall, Oleksandr Andrushchenko,
	Bertrand Marquis, xen-devel, nd, Roger Pau Monné

On 17.07.2020 14:46, Rahul Singh wrote:
> Sorry for previous mail formatting issue. Replying again so that any comment history should not missed.

I'm sorry, but from a plain text view I cannot determine which parts
were your replies (in fact all nesting of prior replies is lost).
Please can you arrange for suitable reply quoting in your mail
client, using plain text mails? (Leaving all prior text in place
below, for you to see what it is that at least some people got to
see.)

Jan

> On 17 Jul 2020, at 8:41 am, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@epam.com<mailto:Oleksandr_Andrushchenko@epam.com>> wrote:
> 
> 
> On 7/17/20 9:53 AM, Bertrand Marquis wrote:
> 
> On 16 Jul 2020, at 22:51, Stefano Stabellini <sstabellini@kernel.org<mailto:sstabellini@kernel.org>> wrote:
> 
> On Thu, 16 Jul 2020, Rahul Singh wrote:
> Hello All,
> 
> Following up on discussion on PCI Passthrough support on ARM that we had at the XEN summit, we are submitting a Review For Comment and a design proposal for PCI passthrough support on ARM. Feel free to give your feedback.
> 
> The followings describe the high-level design proposal of the PCI passthrough support and how the different modules within the system interacts with each other to assign a particular PCI device to the guest.
> I think the proposal is good and I only have a couple of thoughts to
> share below.
> 
> 
> # Title:
> 
> PCI devices passthrough on Arm design proposal
> 
> # Problem statement:
> 
> On ARM there in no support to assign a PCI device to a guest. PCI device passthrough capability allows guests to have full access to some PCI devices. PCI device passthrough allows PCI devices to appear and behave as if they were physically attached to the guest operating system and provide full isolation of the PCI devices.
> 
> Goal of this work is to also support Dom0Less configuration so the PCI backend/frontend drivers used on x86 shall not be used on Arm. It will use the existing VPCI concept from X86 and implement the virtual PCI bus through IO emulation​ such that only assigned devices are visible​ to the guest and guest can use the standard PCI driver.
> 
> Only Dom0 and Xen will have access to the real PCI bus,
> 
> So, in this case how is the access serialization going to work?
> 
> I mean that if both Xen and Dom0 are about to access the bus at the same time?
> 
> There was a discussion on the same before [1] and IMO it was not decided on
> 
> how to deal with that.
> 
> DOM0 also access the real PCI hardware via MMIO config space trap in XEN. We will take care of access the config space lock in XEN.
> 
> ​ guest will have a direct access to the assigned device itself​. IOMEM memory will be mapped to the guest ​and interrupt will be redirected to the guest. SMMU has to be configured correctly to have DMA transaction.
> 
> ## Current state: Draft version
> 
> # Proposer(s): Rahul Singh, Bertrand Marquis
> 
> # Proposal:
> 
> This section will describe the different subsystem to support the PCI device passthrough and how these subsystems interact with each other to assign a device to the guest.
> 
> # PCI Terminology:
> 
> Host Bridge: Host bridge allows the PCI devices to talk to the rest of the computer.
> ECAM: ECAM (Enhanced Configuration Access Mechanism) is a mechanism developed to allow PCIe to access configuration space. The space available per function is 4KB.
> 
> # Discovering PCI Host Bridge in XEN:
> 
> In order to support the PCI passthrough XEN should be aware of all the PCI host bridges available on the system and should be able to access the PCI configuration space. ECAM configuration access is supported as of now. XEN during boot will read the PCI device tree node “reg” property and will map the ECAM space to the XEN memory using the “ioremap_nocache ()” function.
> 
> If there are more than one segment on the system, XEN will read the “linux, pci-domain” property from the device tree node and configure  the host bridge segment number accordingly. All the PCI device tree nodes should have the “linux,pci-domain” property so that there will be no conflicts. During hardware domain boot Linux will also use the same “linux,pci-domain” property and assign the domain number to the host bridge.
> 
> When Dom0 tries to access the PCI config space of the device, XEN will find the corresponding host bridge based on segment number and access the corresponding config space assigned to that bridge.
> 
> Limitation:
> * Only PCI ECAM configuration space access is supported.
> 
> This is really the limitation which we have to think of now as there are lots of
> 
> HW w/o ECAM support and not providing a way to use PCI(e) on those boards
> 
> would render them useless wrt PCI. I don't suggest to have some real code for
> 
> that, but I would suggest we design some interfaces from day 0.
> 
> At the same time I do understand that supporting non-ECAM bridges is a pain
> 
> Adding any type of host bridge is supported, we did put the ECAM specific code in an identifed source file so that other types can be implemented. As of now we have implemented the ECAM support and we are implementing right now support for N1SDP which requires specific quirks which will be done in a separate source file.
> 
> 
> * Device tree binding is supported as of now, ACPI is not supported.
> * Need to port the PCI host bridge access code to XEN to access the configuration space (generic one works but lots of platforms will required some specific code or quirks).
> 
> # Discovering PCI devices:
> 
> PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure
> Great, so we assume here that the bootloader can do the enumeration and configuration...
>  the BAR, PCI capabilities, and MSI/MSI-X configuration.
> 
> PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.
> 
> #define PHYSDEVOP_pci_device_add        25
> struct physdev_pci_device_add {
>    uint16_t seg;
>    uint8_t bus;
>    uint8_t devfn;
>    uint32_t flags;
>    struct {
>     uint8_t bus;
>     uint8_t devfn;
>    } physfn;
>    /*
>    * Optional parameters array.
>    * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
>    */
>    uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>    };
> 
> As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.
> 
> XEN will add the PCI devices in the linked list maintain in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the device will be added to the hardware domain.
> 
> Limitations:
> * When PCI devices are added to XEN, MSI capability is not initialized inside XEN and not supported as of now.
> * ACS capability is disable for ARM as of now as after enabling it devices are not accessible.
> * Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
> I think it is fine to assume that for dom0less the "firmware" has taken
> care of setting up the BARs correctly. Starting with that assumption, it
> looks like it should be "easy" to walk the PCI topology in Xen when/if
> there is no dom0?
> Yes as we discussed during the design session, we currently think that it is the way to go.
> We are for now relying on Dom0 to get the list of PCI devices but this is definitely the strategy we would like to use to have Dom0 support.
> If this is working well, I even think we could get rid of the hypercall all together.
> ...and this is the same way of configuring if enumeration happens in the bootloader?
> 
> I do support the idea we go away from PHYSDEVOP_pci_device_add, but driver domain
> 
> just signals Xen that the enumeration is done and Xen can traverse the bus by that time.
> 
> Please also note, that there are actually 3 cases possible wrt where the enumeration and
> 
> configuration happens: boot firmware, Dom0, Xen. So, it seems we
> 
> are going to have different approaches for the first two (see my comment above on
> 
> the hypercall use in Dom0). So, walking the bus ourselves in Xen seems to be good for all
> 
> the use-cases above
> 
> 
> In that case we may have to implement a new hypercall to inform XEN that enumeration is complete and now scan the devices. We could tell Xen to delay its enumeration until this hypercall is called using a xen command line parameter. This way when this is not required because the firmware did the enumeration, we can properly support Dom0Less.
> 
> 
> 
> 
> # Enable the existing x86 virtual PCI support for ARM:
> 
> The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.
> 
> A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
> Just to make it clear: Dom0 still access the bus directly w/o emulation, right?
> 
> No.Once Xen has done his PCI enumeration (either on boot or after an hypercall from the hardware domain), only Xen will access the physical PCI bus, everybody else will go through VPCI.
> 
> 
> Limitation:
> * No handler is register for the MSI configuration.
> * Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.
> 
> # Assign the device to the guest:
> 
> Assign the PCI device from the hardware domain to the guest is done using the below guest config option. When xl tool create the domain, PCI devices will be assigned to the guest VPCI bus.
> pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
> 
> Guest will be only able to access the assigned devices and see the bridges. Guest will not be able to access or see the devices that are no assigned to him.
> Does this mean that we do not need to configure the bridges as those are exposed to the guest implicitly?
> 
> Limitation:
> * As of now all the bridges in the PCI bus are seen by the guest on the VPCI bus.
> 
> So, what happens if a guest tries to access the bridge that doesn't have the assigned
> 
> PCI device? E.g. we pass PCIe_dev0 which is behind Bridge0 and the guest also sees
> 
> Bridge1 and tries to access devices behind it during the enumeration.
> 
> Could you please clarify?
> 
> The bridges are only accessible in read-only and cannot be modified. Even though a guest would see the bridge, the VPCI will only show the assigned devices behind it. If there is no device behind that bridge assigned to the guest, the guest will see an empty bus behind that bridge.
> 
> 
> We need to come up with something similar for dom0less too. It could be
> exactly the same thing (a list of BDFs as strings as a device tree
> property) or something else if we can come up with a better idea.
> Fully agree.
> Maybe a tree topology could allow more possibilities (like giving BAR values) in the future.
> 
> # Emulated PCI device tree node in libxl:
> 
> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
> 
> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
> 
> Limitation:
> * Only one PCI device tree node is supported as of now.
> I think vpci="pci_ecam" should be optional: if pci=[ "PCI_SPEC_STRING",
> ...] is specififed, then vpci="pci_ecam" is implied.
> 
> vpci="pci_ecam" is only useful one day in the future when we want to be
> able to emulate other non-ecam host bridges. For now we could even skip
> it.
> This would create a problem if xl is used to add a PCI device as we need the PCI node to be in the DTB when the guest is created.
> I agree this is not needed but removing it might create more complexity in the code.
> 
> I would suggest we have it from day 0 as there are plenty of HW available which is not ECAM.
> 
> Having vpci allows other bridges to be supported
> 
> Yes we agree.
> 
> 
> 
> Bertrand
> 
> 
> BAR value and IOMEM mapping:
> 
> Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
> 
> As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
> For example:
> Guest reserved IOMEM region:  0x04020000
>     Real physical IOMEM region:0x50000000
>     IOMEM size:128MB
>     iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]
> 
> There is no need to map the ECAM space as XEN already have access to the ECAM space and XEN will trap ECAM accesses from the guest and will perform read/write on the VPCI bus.
> 
> IOMEM access will not be trapped and the guest will directly access the IOMEM region of the assigned device via stage-2 translation.
> 
> In the same, we mapped the assigned devices IRQ to the guest using below config options.
> irqs= [ NUMBER, NUMBER, ...]
> 
> Limitation:
> * Need to avoid the “iomem” and “irq” guest config options and map the IOMEM region and IRQ at the same time when device is assigned to the guest using the “pci” guest config options when xl creates the domain.
> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped address.
> * X86 mapping code should be ported on Arm so that the stage-2 translation is adapted when the guest is doing a modification of the BAR registers values (to map the address requested by the guest for a specific IOMEM to the address actually contained in the real BAR register of the corresponding device).
> 
> # SMMU configuration for guest:
> 
> When assigning PCI devices to a guest, the SMMU configuration should be updated to remove access to the hardware domain memory
> 
> So, as the hardware domain still has access to the PCI configuration space, we
> 
> can potentially have a condition when Dom0 accesses the device. AFAIU, if we have
> 
> pci front/back then before assigning the device to the guest we unbind it from the
> 
> real driver and bind to the back. Are we going to do something similar here?
> 
> Yes we have to unbind the driver from the hardware domain before assigning the device to the guest. Also as soon as Xen has done his PCI enumeration (either on boot or after an hypercall from the hardware domain), only Xen will access the physical PCI bus, everybody else will go through VPCI.
> 
> - Rahul
> 
> 
> 
> Thank you,
> 
> Oleksandr
> 
>  and add
> configuration to have access to the guest memory with the proper address translation so that the device can do DMA operations from and to the guest memory only.
> 
> # MSI/MSI-X support:
> Not implement and tested as of now.
> 
> # ITS support:
> Not implement and tested as of now.
> [1] https://lists.xen.org/archives/html/xen-devel/2017-05/msg02674.html
> 



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17  7:41         ` Oleksandr Andrushchenko
  2020-07-17 11:26           ` Julien Grall
  2020-07-17 12:46           ` Rahul Singh
@ 2020-07-17 13:12           ` Bertrand Marquis
  2 siblings, 0 replies; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 13:12 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Stefano Stabellini, Julien Grall, Rahul Singh, xen-devel, nd,
	Roger Pau Monné

Hi,

I will reply for Rahul until he gets his mail client fixed.

> On 17 Jul 2020, at 09:41, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@epam.com> wrote:
> 
> 
> On 7/17/20 9:53 AM, Bertrand Marquis wrote:
>> 
>>> On 16 Jul 2020, at 22:51, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>> 
>>> On Thu, 16 Jul 2020, Rahul Singh wrote:
>>>> Hello All,
>>>> 
>>>> Following up on discussion on PCI Passthrough support on ARM that we had at the XEN summit, we are submitting a Review For Comment and a design proposal for PCI passthrough support on ARM. Feel free to give your feedback.
>>>> 
>>>> The followings describe the high-level design proposal of the PCI passthrough support and how the different modules within the system interacts with each other to assign a particular PCI device to the guest.
>>> I think the proposal is good and I only have a couple of thoughts to
>>> share below.
>>> 
>>> 
>>>> # Title:
>>>> 
>>>> PCI devices passthrough on Arm design proposal
>>>> 
>>>> # Problem statement:
>>>> 
>>>> On ARM there in no support to assign a PCI device to a guest. PCI device passthrough capability allows guests to have full access to some PCI devices. PCI device passthrough allows PCI devices to appear and behave as if they were physically attached to the guest operating system and provide full isolation of the PCI devices.
>>>> 
>>>> Goal of this work is to also support Dom0Less configuration so the PCI backend/frontend drivers used on x86 shall not be used on Arm. It will use the existing VPCI concept from X86 and implement the virtual PCI bus through IO emulation​ such that only assigned devices are visible​ to the guest and guest can use the standard PCI driver.
>>>> 
>>>> Only Dom0 and Xen will have access to the real PCI bus,
> 
> So, in this case how is the access serialization going to work?
> 
> I mean that if both Xen and Dom0 are about to access the bus at the same time?
> 
> There was a discussion on the same before [1] and IMO it was not decided on
> 
> how to deal with that.

DOM0 also accesses the real PCI hardware via the MMIO config space trap in XEN. We will take care of locking access to the config space in XEN.

> 
>>>> ​ guest will have a direct access to the assigned device itself​. IOMEM memory will be mapped to the guest ​and interrupt will be redirected to the guest. SMMU has to be configured correctly to have DMA transaction.
>>>> 
>>>> ## Current state: Draft version
>>>> 
>>>> # Proposer(s): Rahul Singh, Bertrand Marquis
>>>> 
>>>> # Proposal:
>>>> 
>>>> This section will describe the different subsystem to support the PCI device passthrough and how these subsystems interact with each other to assign a device to the guest.
>>>> 
>>>> # PCI Terminology:
>>>> 
>>>> Host Bridge: Host bridge allows the PCI devices to talk to the rest of the computer.
>>>> ECAM: ECAM (Enhanced Configuration Access Mechanism) is a mechanism developed to allow PCIe to access configuration space. The space available per function is 4KB.
>>>> 
>>>> # Discovering PCI Host Bridge in XEN:
>>>> 
>>>> In order to support the PCI passthrough XEN should be aware of all the PCI host bridges available on the system and should be able to access the PCI configuration space. ECAM configuration access is supported as of now. XEN during boot will read the PCI device tree node “reg” property and will map the ECAM space to the XEN memory using the “ioremap_nocache ()” function.
>>>> 
>>>> If there are more than one segment on the system, XEN will read the “linux, pci-domain” property from the device tree node and configure the host bridge segment number accordingly. All the PCI device tree nodes should have the “linux,pci-domain” property so that there will be no conflicts. During hardware domain boot Linux will also use the same “linux,pci-domain” property and assign the domain number to the host bridge.
>>>> 
>>>> When Dom0 tries to access the PCI config space of the device, XEN will find the corresponding host bridge based on segment number and access the corresponding config space assigned to that bridge.
>>>> 
>>>> Limitation:
>>>> * Only PCI ECAM configuration space access is supported.
> 
> This is really the limitation which we have to think of now as there are lots of
> 
> HW w/o ECAM support and not providing a way to use PCI(e) on those boards
> 
> would render them useless wrt PCI. I don't suggest to have some real code for
> 
> that, but I would suggest we design some interfaces from day 0.
> 
> At the same time I do understand that supporting non-ECAM bridges is a pain

Adding any type of host bridge is supported: we put the ECAM-specific code in an identified source file so that other types can be implemented. As of now we have implemented the ECAM support, and we are currently implementing support for the N1SDP, which requires specific quirks that will be done in a separate source file.
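
To picture that split, something along these lines (the structure and names are illustrative, not the actual interface):

#include <stdint.h>

/* Per-access-method operations: one generic ECAM implementation plus,
 * where needed, a platform specific one (e.g. for the N1SDP quirks). */
struct pci_config_ops {
    uint32_t (*read)(void *priv, uint32_t sbdf, uint16_t reg,
                     unsigned int size);
    void     (*write)(void *priv, uint32_t sbdf, uint16_t reg,
                      unsigned int size, uint32_t val);
};

struct host_bridge {
    uint16_t segment;                  /* segment assigned to this bridge  */
    void *priv;                        /* e.g. the mapped ECAM window      */
    const struct pci_config_ops *ops;  /* generic ECAM or a platform quirk */
};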

> 
>>>> * Device tree binding is supported as of now, ACPI is not supported.
>>>> * Need to port the PCI host bridge access code to XEN to access the configuration space (generic one works but lots of platforms will required some specific code or quirks).
>>>> 
>>>> # Discovering PCI devices:
>>>> 
>>>> PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure
> Great, so we assume here that the bootloader can do the enumeration and configuration...

In the case where nothing before Xen does it, we will need the hardware domain to do the enumeration and possibly a hypercall to signal to Xen when it can detect the PCI devices.

>>>>  the BAR, PCI capabilities, and MSI/MSI-X configuration.
>>>> 
>>>> PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.
>>>> 
>>>> #define PHYSDEVOP_pci_device_add        25
>>>> struct physdev_pci_device_add {
>>>>    uint16_t seg;
>>>>    uint8_t bus;
>>>>    uint8_t devfn;
>>>>    uint32_t flags;
>>>>    struct {
>>>>    	uint8_t bus;
>>>>    	uint8_t devfn;
>>>>    } physfn;
>>>>    /*
>>>>    * Optional parameters array.
>>>>    * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
>>>>    */
>>>>    uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>>>    };
>>>> 
>>>> As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.
>>>> 
>>>> XEN will add the PCI devices in the linked list maintain in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the device will be added to the hardware domain.
>>>> 
>>>> Limitations:
>>>> * When PCI devices are added to XEN, MSI capability is not initialized inside XEN and not supported as of now.
>>>> * ACS capability is disable for ARM as of now as after enabling it devices are not accessible.
>>>> * Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
>>> I think it is fine to assume that for dom0less the "firmware" has taken
>>> care of setting up the BARs correctly. Starting with that assumption, it
>>> looks like it should be "easy" to walk the PCI topology in Xen when/if
>>> there is no dom0?
>> Yes as we discussed during the design session, we currently think that it is the way to go.
>> We are for now relying on Dom0 to get the list of PCI devices but this is definitely the strategy we would like to use to have Dom0 support.
>> If this is working well, I even think we could get rid of the hypercall all together.
> ...and this is the same way of configuring if enumeration happens in the bootloader?
> 
> I do support the idea we go away from PHYSDEVOP_pci_device_add, but driver domain
> 
> just signals Xen that the enumeration is done and Xen can traverse the bus by that time.
> 
> Please also note, that there are actually 3 cases possible wrt where the enumeration and
> 
> configuration happens: boot firmware, Dom0, Xen. So, it seems we
> 
> are going to have different approaches for the first two (see my comment above on
> 
> the hypercall use in Dom0). So, walking the bus ourselves in Xen seems to be good for all
> 
> the use-cases above

In that case we may have to implement a new hypercall to inform Xen that enumeration is complete so that it can then scan the devices. We could tell Xen to delay its enumeration until this hypercall is issued by using a Xen command line parameter. This way, when this is not required because the firmware already did the enumeration, we can properly support Dom0Less.
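
Purely as a sketch of that idea (nothing below exists today; the command line option, the hypercall and the helpers are all hypothetical):

    /* Hypothetical "pci-scan=dom0" command line option: defer the bus walk
     * until the hardware domain signals that enumeration is complete. */
    static bool __read_mostly pci_scan_deferred;

    void pci_init_host_bridges(void)
    {
        if ( pci_scan_deferred )
            return; /* firmware did not enumerate; wait for Dom0's signal */
        pci_scan_all_segments(); /* walk each ECAM segment, pci_add_device() */
    }

    /* Handler for the new "enumeration done" hypercall issued by Dom0. */
    long do_pci_enumeration_done(void)
    {
        pci_scan_all_segments();
        return 0;
    }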

> 
>> 
>> 
>>> 
>>>> # Enable the existing x86 virtual PCI support for ARM:
>>>> 
>>>> The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.
>>>> 
>>>> A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
> Just to make it clear: Dom0 still access the bus directly w/o emulation, right?

Only if Dom0 does the enumeration. If it does not, or once it has finished the enumeration, everybody will go through VPCI and only Xen will access the real bus.
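
For reference, a minimal sketch of how the trap could be wired up on Arm, assuming register_mmio_handler() is used; the guest ECAM window constants and the handler callbacks named below are assumptions for this example:

    static const struct mmio_handler_ops vpci_mmio_handler = {
        .read  = vpci_ecam_mmio_read,   /* decode SBDF + register, call vPCI */
        .write = vpci_ecam_mmio_write,
    };

    /* Trap the guest physical ECAM window and forward accesses to vPCI. */
    register_mmio_handler(d, &vpci_mmio_handler,
                          GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL);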

>>>> 
>>>> Limitation:
>>>> * No handler is register for the MSI configuration.
>>>> * Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.
>>>> 
>>>> # Assign the device to the guest:
>>>> 
>>>> Assign the PCI device from the hardware domain to the guest is done using the below guest config option. When xl tool create the domain, PCI devices will be assigned to the guest VPCI bus.
>>>> 	pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
>>>> 
>>>> Guest will be only able to access the assigned devices and see the bridges. Guest will not be able to access or see the devices that are no assigned to him.
> Does this mean that we do not need to configure the bridges as those are exposed to the guest implicitly?

Yes
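
As a usage illustration of the configuration described above (the BDF values are made up for the example):

    # guest configuration fragment
    vpci = "pci_ecam"
    pci  = [ "0000:01:00.0", "0000:03:00.0" ]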

>>>> 
>>>> Limitation:
>>>> * As of now all the bridges in the PCI bus are seen by the guest on the VPCI bus.
> 
> So, what happens if a guest tries to access the bridge that doesn't have the assigned
> 
> PCI device? E.g. we pass PCIe_dev0 which is behind Bridge0 and the guest also sees
> 
> Bridge1 and tries to access devices behind it during the enumeration.
> 
> Could you please clarify?

The bridges are only accessible read-only and cannot be modified. Even though a guest sees a bridge, VPCI will only show the assigned devices behind it. If no device behind that bridge is assigned to the guest, the guest will see an empty bus behind that bridge.
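
As a rough sketch of what read-only means here (the handler signatures mirror the existing vPCI ones, but the wiring shown is an assumption for illustration):

    /* Bridge config space: forward reads to the hardware, discard writes. */
    static uint32_t bridge_reg_read(const struct pci_dev *pdev,
                                    unsigned int reg, void *data)
    {
        return pci_conf_read32(pdev->sbdf, reg);
    }

    static void bridge_reg_write(const struct pci_dev *pdev, unsigned int reg,
                                 uint32_t val, void *data)
    {
        /* Read-only from the guest's point of view: silently ignore. */
    }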

> 
>>> We need to come up with something similar for dom0less too. It could be
>>> exactly the same thing (a list of BDFs as strings as a device tree
>>> property) or something else if we can come up with a better idea.
>> Fully agree.
>> Maybe a tree topology could allow more possibilities (like giving BAR values) in the future.
>>> 
>>>> # Emulated PCI device tree node in libxl:
>>>> 
>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>> 
>>>> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
>>>> 
>>>> Limitation:
>>>> * Only one PCI device tree node is supported as of now.
>>> I think vpci="pci_ecam" should be optional: if pci=[ "PCI_SPEC_STRING",
>>> ...] is specififed, then vpci="pci_ecam" is implied.
>>> 
>>> vpci="pci_ecam" is only useful one day in the future when we want to be
>>> able to emulate other non-ecam host bridges. For now we could even skip
>>> it.
>> This would create a problem if xl is used to add a PCI device as we need the PCI node to be in the DTB when the guest is created.
>> I agree this is not needed but removing it might create more complexity in the code.
> 
> I would suggest we have it from day 0 as there are plenty of HW available which is not ECAM.
> 
> Having vpci allows other bridges to be supported

Yes we agree.

> 
>> 
>> Bertrand
>> 
>>> 
>>>> BAR value and IOMEM mapping:
>>>> 
>>>> Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI	device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
>>>> 
>>>> As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
>>>> For example:
>>>> 	Guest reserved IOMEM region:  0x04020000
>>>>    	Real physical IOMEM region:0x50000000
>>>>    	IOMEM size:128MB
>>>>    	iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]
>>>> 
>>>> There is no need to map the ECAM space as XEN already have access to the ECAM space and XEN will trap ECAM accesses from the guest and will perform read/write on the VPCI bus.
>>>> 
>>>> IOMEM access will not be trapped and the guest will directly access the IOMEM region of the assigned device via stage-2 translation.
>>>> 
>>>> In the same, we mapped the assigned devices IRQ to the guest using below config options.
>>>> 	irqs= [ NUMBER, NUMBER, ...]
>>>> 
>>>> Limitation:
>>>> * Need to avoid the “iomem” and “irq” guest config options and map the IOMEM region and IRQ at the same time when device is assigned to the guest using the “pci” guest config options when xl creates the domain.
>>>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped address.
>>>> * X86 mapping code should be ported on Arm so that the stage-2 translation is adapted when the guest is doing a modification of the BAR registers values (to map the address requested by the guest for a specific IOMEM to the address actually contained in the real BAR register of the corresponding device).
>>>> 
>>>> # SMMU configuration for guest:
>>>> 
>>>> When assigning PCI devices to a guest, the SMMU configuration should be updated to remove access to the hardware domain memory
> 
> So, as the hardware domain still has access to the PCI configuration space, we
> 
> can potentially have a condition when Dom0 accesses the device. AFAIU, if we have
> 
> pci front/back then before assigning the device to the guest we unbind it from the
> 
> real driver and bind to the back. Are we going to do something similar here?

Yes, we have to unbind the driver in the hardware domain before assigning the device to the guest. Also, as soon as Xen has done its PCI enumeration (either at boot or after a hypercall from the hardware domain), only Xen will access the physical PCI bus; everybody else will go through VPCI.
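
To summarise the intended flow in pseudo-C (the helper names below are assumptions used for illustration, not existing Xen functions):

    /* Assign an already-enumerated PCI device to a guest. */
    int assign_pci_device_to_guest(struct domain *d, pci_sbdf_t sbdf)
    {
        struct pci_dev *pdev = pci_get_pdev(sbdf.seg, sbdf.bus, sbdf.devfn);
        int rc;

        /* The device is expected to be unbound from its hardware domain
         * driver before this point. */
        rc = smmu_reassign_device(hardware_domain, d, pdev);
        if ( rc )
            return rc;

        rc = map_device_iomem_to_guest(d, pdev);   /* stage-2 map the BARs */
        if ( rc )
            return rc;

        return route_device_irq_to_guest(d, pdev); /* legacy SPI routing */
    }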

Bertrand (and Rahul)

> 
> 
> Thank you,
> 
> Oleksandr
> 
>>>>  and add
>>>> configuration to have access to the guest memory with the proper address translation so that the device can do DMA operations from and to the guest memory only.
>>>> 
>>>> # MSI/MSI-X support:
>>>> Not implement and tested as of now.
>>>> 
>>>> # ITS support:
>>>> Not implement and tested as of now.
> [1] https://lists.xen.org/archives/html/xen-devel/2017-05/msg02674.html



* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17  8:10     ` Jan Beulich
  2020-07-17  8:47       ` Oleksandr Andrushchenko
@ 2020-07-17 13:14       ` Bertrand Marquis
  2020-07-17 13:19         ` Jan Beulich
  1 sibling, 1 reply; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 13:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Rahul Singh, Julien Grall, Stefano Stabellini, xen-devel, nd,
	Roger Pau Monné



> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
> 
> On 16.07.2020 19:10, Rahul Singh wrote:
>> # Discovering PCI devices:
>> 
>> PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure the BAR, PCI capabilities, and MSI/MSI-X configuration.
>> 
>> PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.
>> 
>> #define PHYSDEVOP_pci_device_add        25
>> struct physdev_pci_device_add {
>>    uint16_t seg;
>>    uint8_t bus;
>>    uint8_t devfn;
>>    uint32_t flags;
>>    struct {
>>    	uint8_t bus;
>>    	uint8_t devfn;
>>    } physfn;
>>    /*
>>    * Optional parameters array.
>>    * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
>>    */
>>    uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>    };
>> 
>> As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.
>> 
>> XEN will add the PCI devices in the linked list maintain in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the device will be added to the hardware domain.
> 
> Have you had any thoughts about Dom0 re-arranging the bus numbering?
> This is, afaict, a still open issue on x86 as well.

No, that’s not something we looked into. But in theory, if this is done by Linux before Xen’s enumeration, it will work. If a domain tries to do this later, we will have to see whether we can somehow support it in VPCI, but that is something we have not considered so far.

> 
>> Limitations:
>> * When PCI devices are added to XEN, MSI capability is not initialized inside XEN and not supported as of now.
> 
> I think this is a pretty severe limitation, as modern devices tend to
> not support pin based interrupts anymore.

Sorry, this is not what we meant. We will add MSI support, but as of now it is not implemented or designed. “Limitations” means currently not supported; we will work on it in a second step.

> 
>> # Emulated PCI device tree node in libxl:
>> 
>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
> 
> I support Stefano's suggestion for this to be an optional thing, i.e.
> there to be no need for it when there are PCI devices assigned to the
> guest anyway. I also wonder about the pci_ prefix here - isn't
> vpci="ecam" as unambiguous?

This could be a problem, as we need to know upfront that this is required for a guest so that PCI devices can be assigned later using xl.
Regarding the naming, I agree. We will remove the pci_ prefix here. 
> 
>> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
>> 
>> Limitation:
>> * Only one PCI device tree node is supported as of now.
>> 
>> BAR value and IOMEM mapping:
>> 
>> Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI	device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
>> 
>> As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
>> For example:
>> 	Guest reserved IOMEM region:  0x04020000
>>    	Real physical IOMEM region:0x50000000
>>    	IOMEM size:128MB
>>    	iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]
> 
> This surely is planned to go away before the code hits upstream? The
> ranges really should be read out of the BARs, as I see the
> "limitations" section further down suggests, but it's not clear
> whether "limitations" are items that you plan to take care of before
> submitting your code for review.

Definitely yes. As said before, the limitations apply to the RFC we will submit, but we will work on them and remove those limitations before submitting the final code for review.

Bertrand and Rahul

> 
> Jan
> 



* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 13:14       ` Bertrand Marquis
@ 2020-07-17 13:19         ` Jan Beulich
  2020-07-17 13:59           ` Bertrand Marquis
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2020-07-17 13:19 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Julien Grall, Stefano Stabellini, xen-devel, nd,
	Roger Pau Monné

On 17.07.2020 15:14, Bertrand Marquis wrote:
>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
>> On 16.07.2020 19:10, Rahul Singh wrote:
>>> # Emulated PCI device tree node in libxl:
>>>
>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>
>> I support Stefano's suggestion for this to be an optional thing, i.e.
>> there to be no need for it when there are PCI devices assigned to the
>> guest anyway. I also wonder about the pci_ prefix here - isn't
>> vpci="ecam" as unambiguous?
> 
> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl. 

I'm afraid I don't understand: When there are no PCI device that get
handed to a guest when it gets created, but it is supposed to be able
to have some assigned while already running, then we agree the option
is needed (afaict). When PCI devices get handed to the guest while it
gets constructed, where's the problem to infer this option from the
presence of PCI devices in the guest configuration?

Jan



* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 11:41             ` Oleksandr Andrushchenko
@ 2020-07-17 13:21               ` Bertrand Marquis
  0 siblings, 0 replies; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 13:21 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Stefano Stabellini, Julien Grall, Julien Grall, Rahul Singh,
	xen-devel, nd, Roger Pau Monné



> On 17 Jul 2020, at 13:41, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@epam.com> wrote:
> 
> 
> On 7/17/20 2:26 PM, Julien Grall wrote:
>> 
>> 
>> On 17/07/2020 08:41, Oleksandr Andrushchenko wrote:
>>>>> We need to come up with something similar for dom0less too. It could be
>>>>> exactly the same thing (a list of BDFs as strings as a device tree
>>>>> property) or something else if we can come up with a better idea.
>>>> Fully agree.
>>>> Maybe a tree topology could allow more possibilities (like giving BAR values) in the future.
>>>>> 
>>>>>> # Emulated PCI device tree node in libxl:
>>>>>> 
>>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>>> 
>>>>>> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
>>>>>> 
>>>>>> Limitation:
>>>>>> * Only one PCI device tree node is supported as of now.
>>>>> I think vpci="pci_ecam" should be optional: if pci=[ "PCI_SPEC_STRING",
>>>>> ...] is specififed, then vpci="pci_ecam" is implied.
>>>>> 
>>>>> vpci="pci_ecam" is only useful one day in the future when we want to be
>>>>> able to emulate other non-ecam host bridges. For now we could even skip
>>>>> it.
>>>> This would create a problem if xl is used to add a PCI device as we need the PCI node to be in the DTB when the guest is created.
>>>> I agree this is not needed but removing it might create more complexity in the code.
>>> 
>>> I would suggest we have it from day 0 as there are plenty of HW available which is not ECAM.
>>> 
>>> Having vpci allows other bridges to be supported
>> 
>> So I can understand why you would want to have a driver for non-ECAM host PCI controller. However, why would you want to emulate a non-ECAM PCI controller to a guest?
> Indeed. No need to emulate non-ECAM

If someone wants to implement something else than ECAM in the future, there will be nothing preventing it from being done.
But indeed I do not really see a need for that.

Cheers
Bertrand

>> 
>> Cheers,




* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 11:16     ` Roger Pau Monné
@ 2020-07-17 13:22       ` Bertrand Marquis
  2020-07-17 13:29         ` Julien Grall
  2020-07-17 14:31         ` Roger Pau Monné
  0 siblings, 2 replies; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 13:22 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, nd, Rahul Singh, Stefano Stabellini, Julien Grall



> On 17 Jul 2020, at 13:16, Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> I've wrapped the email to 80 columns in order to make it easier to
> reply.
> 
> Thanks for doing this, I think the design is good, I have some
> questions below so that I understand the full picture.
> 
> On Thu, Jul 16, 2020 at 05:10:05PM +0000, Rahul Singh wrote:
>> Hello All,
>> 
>> Following up on discussion on PCI Passthrough support on ARM that we
>> had at the XEN summit, we are submitting a Review For Comment and a
>> design proposal for PCI passthrough support on ARM. Feel free to
>> give your feedback.
>> 
>> The followings describe the high-level design proposal of the PCI
>> passthrough support and how the different modules within the system
>> interacts with each other to assign a particular PCI device to the
>> guest.
>> 
>> # Title:
>> 
>> PCI devices passthrough on Arm design proposal
>> 
>> # Problem statement:
>> 
>> On ARM there in no support to assign a PCI device to a guest. PCI
>> device passthrough capability allows guests to have full access to
>> some PCI devices. PCI device passthrough allows PCI devices to
>> appear and behave as if they were physically attached to the guest
>> operating system and provide full isolation of the PCI devices.
>> 
>> Goal of this work is to also support Dom0Less configuration so the
>> PCI backend/frontend drivers used on x86 shall not be used on Arm.
>> It will use the existing VPCI concept from X86 and implement the
>> virtual PCI bus through IO emulation such that only assigned devices
>> are visible to the guest and guest can use the standard PCI
>> driver.
>> 
>> Only Dom0 and Xen will have access to the real PCI bus, guest will
>> have a direct access to the assigned device itself. IOMEM memory
>> will be mapped to the guest and interrupt will be redirected to the
>> guest. SMMU has to be configured correctly to have DMA
>> transaction.
>> 
>> ## Current state: Draft version
>> 
>> # Proposer(s): Rahul Singh, Bertrand Marquis
>> 
>> # Proposal:
>> 
>> This section will describe the different subsystem to support the
>> PCI device passthrough and how these subsystems interact with each
>> other to assign a device to the guest.
>> 
>> # PCI Terminology:
>> 
>> Host Bridge: Host bridge allows the PCI devices to talk to the rest
>> of the computer.  ECAM: ECAM (Enhanced Configuration Access
>> Mechanism) is a mechanism developed to allow PCIe to access
>> configuration space. The space available per function is 4KB.
>> 
>> # Discovering PCI Host Bridge in XEN:
>> 
>> In order to support the PCI passthrough XEN should be aware of all
>> the PCI host bridges available on the system and should be able to
>> access the PCI configuration space. ECAM configuration access is
>> supported as of now. XEN during boot will read the PCI device tree
>> node “reg” property and will map the ECAM space to the XEN memory
>> using the “ioremap_nocache ()” function.
> 
> What about ACPI? I think you should also mention the MMCFG table,
> which should contain the information about the ECAM region(s) (or at
> least that's how it works on x86). Just realized that you don't
> support ACPI ATM, so you can ignore this comment.

Yes, for now we did not consider ACPI support.

> 
>> 
>> If there are more than one segment on the system, XEN will read the
>> “linux, pci-domain” property from the device tree node and configure
>> the host bridge segment number accordingly. All the PCI device tree
>> nodes should have the “linux,pci-domain” property so that there will
>> be no conflicts. During hardware domain boot Linux will also use the
>> same “linux,pci-domain” property and assign the domain number to the
>> host bridge.
> 
> So it's my understanding that the PCI domain (or segment) is just an
> abstract concept to differentiate all the Root Complex present on
> the system, but the host bridge itself it's not aware of the segment
> assigned to it in any way.
> 
> I'm not sure Xen and the hardware domain having matching segments is a
> requirement, if you use vPCI you can match the segment (from Xen's
> PoV) by just checking from which ECAM region the access has been
> performed.
> 
> The only reason to require matching segment values between Xen and the
> hardware domain is to allow using hypercalls against the PCI devices,
> ie: to be able to use hypercalls to assign a device to a domain from
> the hardware domain.
> 
> I have 0 understanding of DT or it's spec, but why does this have a
> 'linux,' prefix? The segment number is part of the PCI spec, and not
> something specific to Linux IMO.

It is correct that this is only needed for the hypercall, when Dom0 is
doing the full enumeration and communicating the devices to Xen.
In all other cases the segment can be deduced from the address of the access.
Regarding the DT entry, this is not coming from us; it is already
defined this way in existing DTBs, and we just reuse the existing entry.

> 
>> 
>> When Dom0 tries to access the PCI config space of the device, XEN
>> will find the corresponding host bridge based on segment number and
>> access the corresponding config space assigned to that bridge.
>> 
>> Limitation:
>> * Only PCI ECAM configuration space access is supported.
>> * Device tree binding is supported as of now, ACPI is not supported.
>> * Need to port the PCI host bridge access code to XEN to access the
>>  configuration space (generic one works but lots of platforms will
>>  required  some specific code or quirks).
>> 
>> # Discovering PCI devices:
>> 
>> PCI-PCIe enumeration is a process of detecting devices connected to
>> its host. It is the responsibility of the hardware domain or boot
>> firmware to do the PCI enumeration and configure the BAR, PCI
>> capabilities, and MSI/MSI-X configuration.
>> 
>> PCI-PCIe enumeration in XEN is not feasible for the configuration
>> part as it would require a lot of code inside Xen which would
>> require a lot of maintenance. Added to this many platforms require
>> some quirks in that part of the PCI code which would greatly improve
>> Xen complexity. Once hardware domain enumerates the device then it
>> will communicate to XEN via the below hypercall.
>> 
>> #define PHYSDEVOP_pci_device_add        25 struct
>> physdev_pci_device_add {
>>    uint16_t seg;
>>    uint8_t bus;
>>    uint8_t devfn;
>>    uint32_t flags;
>>    struct {
>>        uint8_t bus;
>>        uint8_t devfn;
>>    } physfn;
>>    /*
>>     * Optional parameters array.
>>     * First element ([0]) is PXM domain associated with the device (if
>>     * XEN_PCI_DEV_PXM is set)
>>     */
>>    uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>> };
>> 
>> As the hypercall argument has the PCI segment number, XEN will
>> access the PCI config space based on this segment number and find
>> the host-bridge corresponding to this segment number. At this stage
>> host bridge is fully initialized so there will be no issue to access
>> the config space.
>> 
>> XEN will add the PCI devices in the linked list maintain in XEN
>> using the function pci_add_device(). XEN will be aware of all the
>> PCI devices on the system and all the device will be added to the
>> hardware domain.
>> 
>> Limitations:
>> * When PCI devices are added to XEN, MSI capability is
>>  not initialized inside XEN and not supported as of now.
> 
> I assume you will mask such capability and will prevent the guest (or
> hardware domain) from interacting with it?

No, we will actually implement that part, but later. It is not supported in
the RFC that we will submit.

> 
>> * ACS capability is disable for ARM as of now as after enabling it
>>  devices are not accessible.
>> * Dom0Less implementation will require to have the capacity inside Xen
>>  to discover the PCI devices (without depending on Dom0 to declare them
>>  to Xen).
> 
> I assume the firmware will properly initialize the host bridge and
> configure the resources for each device, so that Xen just has to walk
> the PCI space and find the devices.
> 
> TBH that would be my preferred method, because then you can get rid of
> the hypercall.
> 
> Is there anyway for Xen to know whether the host bridge is properly
> setup and thus the PCI bus can be scanned?
> 
> That way Arm could do something similar to x86, where Xen will scan
> the bus and discover devices, but you could still provide the
> hypercall in case the bus cannot be scanned by Xen (because it hasn't
> been setup).

The idea is definitely to rely by default on the firmware doing this properly.
I am not sure whether a proper enumeration can be detected reliably in all
cases, so it would make sense to rely on Dom0 enumeration when a Xen
command line argument is passed, as explained in one of Rahul’s mails.

> 
>> 
>> # Enable the existing x86 virtual PCI support for ARM:
>> 
>> The existing VPCI support available for X86 is adapted for Arm. When
>> the device is added to XEN via the hyper call
>> “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access
>> is added to the PCI device to emulate the PCI devices.
>> 
>> A MMIO trap handler for the PCI ECAM space is registered in XEN so
>> that when guest is trying to access the PCI config space, XEN will
>> trap the access and emulate read/write using the VPCI and not the
>> real PCI hardware.
>> 
>> Limitation:
>> * No handler is register for the MSI configuration.
> 
> But you need to mask MSI/MSI-X capabilities in the config space in
> order to prevent access from domains? (and by mask I mean remove from
> the list of capabilities and prevent reads/writes to that
> configuration space).
> 
> Note this is already implemented for x86, and I've tried to add arch_
> hooks for arch specific stuff so that it could be reused by Arm. But
> maybe this would require a different design document?

As said, we will handle MSI support in a separate document/step.

> 
>> * Only legacy interrupt is supported and tested as of now, MSI is not
>>  implemented and tested.
>> 
>> # Assign the device to the guest:
>> 
>> Assign the PCI device from the hardware domain to the guest is done
>> using the below guest config option. When xl tool create the domain,
>> PCI devices will be assigned to the guest VPCI bus.
>> 
>> pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
>> 
>> Guest will be only able to access the assigned devices and see the
>> bridges. Guest will not be able to access or see the devices that
>> are no assigned to him.
>> 
>> Limitation:
>> * As of now all the bridges in the PCI bus are seen by
>>  the guest on the VPCI bus.
> 
> I don't think you need all of them, just the ones that are higher up
> on the hierarchy of the device you are trying to passthrough?
> 
> Which kind of access do guest have to PCI bridges config space?

For now the bridges are read-only; no specific access is required by guests.

> 
> This should be limited to read-only accesses in order to be safe.
> 
> Emulating a PCI bridge in Xen using vPCI shouldn't be that
> complicated, so you could likely replace the real bridges with
> emulated ones. Or even provide a fake topology to the guest using an
> emulated bridge.

Just showing all bridges and keeping the hardware topology is the simplest
solution for now. But maybe showing a different topology and only fake
bridges could make sense and be implemented in the future.

> 
>> 
>> # Emulated PCI device tree node in libxl:
>> 
>> Libxl is creating a virtual PCI device tree node in the device tree
>> to enable the guest OS to discover the virtual PCI during guest
>> boot. We introduced the new config option [vpci="pci_ecam"] for
>> guests. When this config option is enabled in a guest configuration,
>> a PCI device tree node will be created in the guest device tree.
>> 
>> A new area has been reserved in the arm guest physical map at which
>> the VPCI bus is declared in the device tree (reg and ranges
>> parameters of the node). A trap handler for the PCI ECAM access from
>> guest has been registered at the defined address and redirects
>> requests to the VPCI driver in Xen.
> 
> Can't you deduce the requirement of such DT node based on the presence
> of a 'pci=' option in the same config file?
> 
> Also I wouldn't discard that in the future you might want to use
> different emulators for different devices, so it might be helpful to
> introduce something like:
> 
> pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt', '0a:00.0,backend=qemu', ... ]
> 
> For the time being Arm will require backend=vpci for all the passed
> through devices, but I wouldn't rule out this changing in the future.

We need it for the case where no device is declared in the config file and the user
wants to add devices using xl later. In this case we must have the DT node for it
to work. 

Regarding possible backends, this could be added in the future if required.

> 
>> Limitation:
>> * Only one PCI device tree node is supported as of now.
>> 
>> BAR value and IOMEM mapping:
>> 
>> Linux guest will do the PCI enumeration based on the area reserved
>> for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI
>> device is assigned to the guest, XEN will map the guest PCI IOMEM
>> region to the real physical IOMEM region only for the assigned
>> devices.
> 
> PCI IOMEM == BARs? Or are you referring to the ECAM access window?

Here by PCI IOMEM we mean the IOMEM regions referred to by the BARs
of the PCI device.

> 
>> As of now we have not modified the existing VPCI code to map the
>> guest PCI IOMEM region to the real physical IOMEM region. We used
>> the existing guest “iomem” config option to map the region.  For
>> example: Guest reserved IOMEM region:  0x04020000 Real physical
>> IOMEM region:0x50000000 IOMEM size:128MB iomem config will be:
>> iomem = ["0x50000,0x8000@0x4020"]
>> 
>> There is no need to map the ECAM space as XEN already have access to
>> the ECAM space and XEN will trap ECAM accesses from the guest and
>> will perform read/write on the VPCI bus.
>> 
>> IOMEM access will not be trapped and the guest will directly access
>> the IOMEM region of the assigned device via stage-2 translation.
>> 
>> In the same, we mapped the assigned devices IRQ to the guest using
>> below config options.  irqs= [ NUMBER, NUMBER, ...]
> 
> Are you providing this for the hardware domain also? Or are irqs
> fetched from the DT in that case?

This will only be used temporarily, until we have proper support to do this
automatically when a device is assigned. Right now our implementation requires
the user to explicitly redirect the interrupts required by the assigned PCI
devices, but in the final version this entry will not be needed.

Dom0 relies on the entries declared in the DT.
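
For completeness, this is what the temporary configuration looks like in practice, reusing the iomem example from the proposal and a made-up SPI number for the legacy interrupt:

    # temporary workaround until "pci=" maps IOMEM and IRQs itself
    iomem = [ "0x50000,0x8000@0x4020" ]
    irqs  = [ 169 ]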

> 
>> Limitation:
>> * Need to avoid the “iomem” and “irq” guest config
>>  options and map the IOMEM region and IRQ at the same time when
>>  device is assigned to the guest using the “pci” guest config options
>>  when xl creates the domain.
>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped
>>  address.
> 
> It was my understanding that you would identity map the BAR into the
> domU stage-2 translation, and that changes by the guest won't be
> allowed.

In fact this is not possible: we have to remap at a different address because
the guest physical memory map is fixed by Xen on Arm, so we must follow that
layout. Otherwise this would only work if the BARs pointed to an otherwise
unused address, and on Juno, for example, they conflict with the guest RAM
address range.
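
A minimal sketch of that remapping, assuming the existing Arm map_mmio_regions() helper is used and with illustrative function and parameter names:

    /* Map the device IOMEM (host physical BAR contents) at the fixed guest
     * physical PCI window instead of identity-mapping it. */
    static int vpci_map_bar(struct domain *d, paddr_t host_addr,
                            paddr_t size, paddr_t guest_addr)
    {
        return map_mmio_regions(d, gaddr_to_gfn(guest_addr),
                                PFN_UP(size), maddr_to_mfn(host_addr));
    }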

> 
>> * X86 mapping code should be ported on Arm so that the stage-2
>>  translation is adapted when the guest is doing a modification of the
>>  BAR registers values (to map the address requested by the guest for
>>  a specific IOMEM to the address actually contained in the real BAR
>>  register of the corresponding device).
> 
> I think the above means that you want to allow the guest to change the
> position of the BAR in the stage-2 translation _without_ allowing it
> to change the position of the BAR in the physical memory map, is that
> correct?

Yes, this is correct. This is not very complex and makes it easier to use
unmodified guests, as VPCI would then behave like a hardware PCI host bridge.

Bertrand

> 
> Thanks, Roger.



* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17  8:47       ` Oleksandr Andrushchenko
@ 2020-07-17 13:28         ` Rahul Singh
  0 siblings, 0 replies; 62+ messages in thread
From: Rahul Singh @ 2020-07-17 13:28 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Stefano Stabellini, Julien Grall, Jan Beulich, xen-devel, nd,
	Roger Pau Monné



> On 17 Jul 2020, at 9:47 am, Oleksandr Andrushchenko <andr2000@gmail.com> wrote:
> 
> 
> On 7/17/20 11:10 AM, Jan Beulich wrote:
>> On 16.07.2020 19:10, Rahul Singh wrote:
>>> # Discovering PCI devices:
>>> 
>>> PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure the BAR, PCI capabilities, and MSI/MSI-X configuration.
>>> 
>>> PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.
>>> 
>>> #define PHYSDEVOP_pci_device_add        25
>>> struct physdev_pci_device_add {
>>>     uint16_t seg;
>>>     uint8_t bus;
>>>     uint8_t devfn;
>>>     uint32_t flags;
>>>     struct {
>>>     	uint8_t bus;
>>>     	uint8_t devfn;
>>>     } physfn;
>>>     /*
>>>     * Optional parameters array.
>>>     * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
>>>     */
>>>     uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>>     };
>>> 
>>> As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.
>>> 
>>> XEN will add the PCI devices in the linked list maintain in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the device will be added to the hardware domain.
>> Have you had any thoughts about Dom0 re-arranging the bus numbering?
>> This is, afaict, a still open issue on x86 as well.
> 
> This can get even trickier as we may have PCI enumerated at boot time
> 
> by the firmware and then Dom0 may perform the enumeration differently.
> 
> So, Xen needs to be aware of what is going to be used as the source of the
> 
> enumeration data and be ready to re-build its internal structures in order
> 
> to be aligned with that entity: e.g. compare Dom0 and Dom0less use-cases
> 

The idea is that as soon as Xen has done its enumeration (either at boot or after Dom0 signals it), no domain will be able to modify the physical PCI bus anymore.
- Rahul
>> 
>>> Limitations:
>>> * When PCI devices are added to XEN, MSI capability is not initialized inside XEN and not supported as of now.
>> I think this is a pretty severe limitation, as modern devices tend to
>> not support pin based interrupts anymore.
>> 
>>> # Emulated PCI device tree node in libxl:
>>> 
>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>> I support Stefano's suggestion for this to be an optional thing, i.e.
>> there to be no need for it when there are PCI devices assigned to the
>> guest anyway. I also wonder about the pci_ prefix here - isn't
>> vpci="ecam" as unambiguous?
>> 
>>> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
>>> 
>>> Limitation:
>>> * Only one PCI device tree node is supported as of now.
>>> 
>>> BAR value and IOMEM mapping:
>>> 
>>> Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI	device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
>>> 
>>> As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
>>> For example:
>>> 	Guest reserved IOMEM region:  0x04020000
>>>     	Real physical IOMEM region:0x50000000
>>>     	IOMEM size:128MB
>>>     	iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]
>> This surely is planned to go away before the code hits upstream? The
>> ranges really should be read out of the BARs, as I see the
>> "limitations" section further down suggests, but it's not clear
>> whether "limitations" are items that you plan to take care of before
>> submitting your code for review.
>> 
>> Jan



* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 13:22       ` Bertrand Marquis
@ 2020-07-17 13:29         ` Julien Grall
  2020-07-17 13:44           ` Bertrand Marquis
  2020-07-17 14:31         ` Roger Pau Monné
  1 sibling, 1 reply; 62+ messages in thread
From: Julien Grall @ 2020-07-17 13:29 UTC (permalink / raw)
  To: Bertrand Marquis, Roger Pau Monné
  Cc: xen-devel, nd, Rahul Singh, Stefano Stabellini, Julien Grall



On 17/07/2020 14:22, Bertrand Marquis wrote:
>>> # Emulated PCI device tree node in libxl:
>>>
>>> Libxl is creating a virtual PCI device tree node in the device tree
>>> to enable the guest OS to discover the virtual PCI during guest
>>> boot. We introduced the new config option [vpci="pci_ecam"] for
>>> guests. When this config option is enabled in a guest configuration,
>>> a PCI device tree node will be created in the guest device tree.
>>>
>>> A new area has been reserved in the arm guest physical map at which
>>> the VPCI bus is declared in the device tree (reg and ranges
>>> parameters of the node). A trap handler for the PCI ECAM access from
>>> guest has been registered at the defined address and redirects
>>> requests to the VPCI driver in Xen.
>>
>> Can't you deduce the requirement of such DT node based on the presence
>> of a 'pci=' option in the same config file?
>>
>> Also I wouldn't discard that in the future you might want to use
>> different emulators for different devices, so it might be helpful to
>> introduce something like:
>>
>> pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt', '0a:00.0,backend=qemu', ... ]

I like this idea :).

>>
>> For the time being Arm will require backend=vpci for all the passed
>> through devices, but I wouldn't rule out this changing in the future.
> 
> We need it for the case where no device is declared in the config file and the user
> wants to add devices using xl later. In this case we must have the DT node for it
> to work.

Are you suggesting that you plan to implement PCI hotplug?

Cheers,

-- 
Julien Grall



* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 13:29         ` Julien Grall
@ 2020-07-17 13:44           ` Bertrand Marquis
  2020-07-17 13:49             ` Julien Grall
  0 siblings, 1 reply; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 13:44 UTC (permalink / raw)
  To: Julien Grall
  Cc: Rahul Singh, Roger Pau Monné,
	Stefano Stabellini, xen-devel, nd, Julien Grall



> On 17 Jul 2020, at 15:29, Julien Grall <julien@xen.org> wrote:
> 
> 
> 
> On 17/07/2020 14:22, Bertrand Marquis wrote:
>>>> # Emulated PCI device tree node in libxl:
>>>> 
>>>> Libxl is creating a virtual PCI device tree node in the device tree
>>>> to enable the guest OS to discover the virtual PCI during guest
>>>> boot. We introduced the new config option [vpci="pci_ecam"] for
>>>> guests. When this config option is enabled in a guest configuration,
>>>> a PCI device tree node will be created in the guest device tree.
>>>> 
>>>> A new area has been reserved in the arm guest physical map at which
>>>> the VPCI bus is declared in the device tree (reg and ranges
>>>> parameters of the node). A trap handler for the PCI ECAM access from
>>>> guest has been registered at the defined address and redirects
>>>> requests to the VPCI driver in Xen.
>>> 
>>> Can't you deduce the requirement of such DT node based on the presence
>>> of a 'pci=' option in the same config file?
>>> 
>>> Also I wouldn't discard that in the future you might want to use
>>> different emulators for different devices, so it might be helpful to
>>> introduce something like:
>>> 
>>> pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt', '0a:00.0,backend=qemu', ... ]
> 
> I like this idea :).
> 
>>> 
>>> For the time being Arm will require backend=vpci for all the passed
>>> through devices, but I wouldn't rule out this changing in the future.
>> We need it for the case where no device is declared in the config file and the user
>> wants to add devices using xl later. In this case we must have the DT node for it
>> to work.
> 
> Are you suggesting that you plan to implement PCI hotplug?

No, this is not in the current plan, but we should not prevent it from being supported some day :-)

Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall




* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 13:44           ` Bertrand Marquis
@ 2020-07-17 13:49             ` Julien Grall
  2020-07-17 14:01               ` Bertrand Marquis
  0 siblings, 1 reply; 62+ messages in thread
From: Julien Grall @ 2020-07-17 13:49 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Roger Pau Monné,
	Stefano Stabellini, xen-devel, nd, Julien Grall



On 17/07/2020 14:44, Bertrand Marquis wrote:
> 
> 
>> On 17 Jul 2020, at 15:29, Julien Grall <julien@xen.org> wrote:
>>
>>
>>
>> On 17/07/2020 14:22, Bertrand Marquis wrote:
>>>>> # Emulated PCI device tree node in libxl:
>>>>>
>>>>> Libxl is creating a virtual PCI device tree node in the device tree
>>>>> to enable the guest OS to discover the virtual PCI during guest
>>>>> boot. We introduced the new config option [vpci="pci_ecam"] for
>>>>> guests. When this config option is enabled in a guest configuration,
>>>>> a PCI device tree node will be created in the guest device tree.
>>>>>
>>>>> A new area has been reserved in the arm guest physical map at which
>>>>> the VPCI bus is declared in the device tree (reg and ranges
>>>>> parameters of the node). A trap handler for the PCI ECAM access from
>>>>> guest has been registered at the defined address and redirects
>>>>> requests to the VPCI driver in Xen.
>>>>
>>>> Can't you deduce the requirement of such DT node based on the presence
>>>> of a 'pci=' option in the same config file?
>>>>
>>>> Also I wouldn't discard that in the future you might want to use
>>>> different emulators for different devices, so it might be helpful to
>>>> introduce something like:
>>>>
>>>> pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt', '0a:00.0,backend=qemu', ... ]
>>
>> I like this idea :).
>>
>>>>
>>>> For the time being Arm will require backend=vpci for all the passed
>>>> through devices, but I wouldn't rule out this changing in the future.
>>> We need it for the case where no device is declared in the config file and the user
>>> wants to add devices using xl later. In this case we must have the DT node for it
>>> to work.
>>
>> Are you suggesting that you plan to implement PCI hotplug?
> 
> No this is not in the current plan but we should not prevent this to be supported some day :-)

I agree that we don't want to prevent extension. But I fail to see why 
this would be an issue if we don't introduce the option "vcpi" today.

Cheers,

-- 
Julien Grall



* Re: PCI devices passthrough on Arm design proposal
       [not found]   ` <8ac91a1b-e6b3-0f2b-0f23-d7aff100936d@xen.org>
@ 2020-07-17 13:50     ` Julien Grall
  2020-07-17 13:59       ` Jan Beulich
  2020-07-17 14:47       ` Bertrand Marquis
  0 siblings, 2 replies; 62+ messages in thread
From: Julien Grall @ 2020-07-17 13:50 UTC (permalink / raw)
  To: Rahul Singh, xen-devel
  Cc: nd, Stefano Stabellini, Roger Pau Monné, Julien Grall

(Resending to the correct ML)

On 17/07/2020 14:23, Julien Grall wrote:
> 
> 
> On 16/07/2020 18:02, Rahul Singh wrote:
>> Hello All,
> 
> Hi,
> 
>> Following up on discussion on PCI Passthrough support on ARM that we 
>> had at the XEN summit, we are submitting a Review For Comment and a 
>> design proposal for PCI passthrough support on ARM. Feel free to give 
>> your feedback.
>>
>> The followings describe the high-level design proposal of the PCI 
>> passthrough support and how the different modules within the system 
>> interacts with each other to assign a particular PCI device to the guest.
> 
> There was an attempt a few years ago to get a design document for PCI 
> passthrough (see [1]). I would suggest to have a look at the thread as I 
> think it would help to have an overview of all the components (e.g MSI 
> controllers...) even if they will not be implemented at the beginning.
> 
>>
>> # Title:
>>
>> PCI devices passthrough on Arm design proposal
>>
>> # Problem statement:
>>
>> On ARM there in no support to assign a PCI device to a guest. PCI 
>> device passthrough capability allows guests to have full access to 
>> some PCI devices. PCI device passthrough allows PCI devices to appear 
>> and behave as if they were physically attached to the guest operating 
>> system and provide full isolation of the PCI devices.
>>
>> Goal of this work is to also support Dom0Less configuration so the PCI 
>> backend/frontend drivers used on x86 shall not be used on Arm. It will 
>> use the existing VPCI concept from X86 and implement the virtual PCI 
>> bus through IO emulation​ such that only assigned devices are visible​ 
>> to the guest and guest can use the standard PCI driver.
>>
>> Only Dom0 and Xen will have access to the real PCI bus,​ guest will 
>> have a direct access to the assigned device itself​. IOMEM memory will 
>> be mapped to the guest ​and interrupt will be redirected to the guest. 
>> SMMU has to be configured correctly to have DMA transaction.
>>
>> ## Current state: Draft version
>>
>> # Proposer(s): Rahul Singh, Bertrand Marquis
>>
>> # Proposal:
>>
>> This section will describe the different subsystem to support the PCI 
>> device passthrough and how these subsystems interact with each other 
>> to assign a device to the guest.
>>
>> # PCI Terminology:
>>
>> Host Bridge: Host bridge allows the PCI devices to talk to the rest of 
>> the computer.
>> ECAM: ECAM (Enhanced Configuration Access Mechanism) is a mechanism 
>> developed to allow PCIe to access configuration space. The space 
>> available per function is 4KB.
>>
>> # Discovering PCI Host Bridge in XEN:
>>
>> In order to support the PCI passthrough XEN should be aware of all the 
>> PCI host bridges available on the system and should be able to access 
>> the PCI configuration space. ECAM configuration access is supported as 
>> of now. XEN during boot will read the PCI device tree node “reg” 
>> property and will map the ECAM space to the XEN memory using the 
>> “ioremap_nocache ()” function.
>>
>> If there are more than one segment on the system, XEN will read the 
>> “linux, pci-domain” property from the device tree node and configure  
>> the host bridge segment number accordingly. All the PCI device tree 
>> nodes should have the “linux,pci-domain” property so that there will 
>> be no conflicts. During hardware domain boot Linux will also use the 
>> same “linux,pci-domain” property and assign the domain number to the 
>> host bridge.
> 
> AFAICT, "linux,pci-domain" is not a mandatory option and mostly tie to 
> Linux. What would happen with other OS?
> 
> But I would rather avoid trying to mandate a user to modifying his/her 
> device-tree in order to support PCI passthrough. It would be better to 
> consider Xen to assign the number if it is not present.
> 
>>
>> When Dom0 tries to access the PCI config space of the device, XEN will 
>> find the corresponding host bridge based on segment number and access 
>> the corresponding config space assigned to that bridge.
>>
>> Limitation:
>> * Only PCI ECAM configuration space access is supported.
>> * Device tree binding is supported as of now, ACPI is not supported.
> 
> We want to differentiate the high-level design from the actual 
> implementation. While you may not yet implement ACPI, we still need to 
> keep it in mind to avoid incompatibilities in long term.
> 
>> * Need to port the PCI host bridge access code to XEN to access the 
>> configuration space (generic one works but lots of platforms will 
>> required  some specific code or quirks).
>>
>> # Discovering PCI devices:
>>
>> PCI-PCIe enumeration is a process of detecting devices connected to 
>> its host. It is the responsibility of the hardware domain or boot 
>> firmware to do the PCI enumeration and configure the BAR, PCI 
>> capabilities, and MSI/MSI-X configuration.
>>
>> PCI-PCIe enumeration in XEN is not feasible for the configuration part 
>> as it would require a lot of code inside Xen which would require a lot 
>> of maintenance. Added to this many platforms require some quirks in 
>> that part of the PCI code which would greatly improve Xen complexity. 
>> Once hardware domain enumerates the device then it will communicate to 
>> XEN via the below hypercall.
>>
>> #define PHYSDEVOP_pci_device_add        25
>> struct physdev_pci_device_add {
>>      uint16_t seg;
>>      uint8_t bus;
>>      uint8_t devfn;
>>      uint32_t flags;
>>      struct {
>>          uint8_t bus;
>>          uint8_t devfn;
>>      } physfn;
>>      /*
>>      * Optional parameters array.
>>      * First element ([0]) is PXM domain associated with the device 
>> (if * XEN_PCI_DEV_PXM is set)
>>      */
>>      uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>      };
>>
>> As the hypercall argument has the PCI segment number, XEN will access 
>> the PCI config space based on this segment number and find the 
>> host-bridge corresponding to this segment number. At this stage host 
>> bridge is fully initialized so there will be no issue to access the 
>> config space.
>>
>> XEN will add the PCI devices to the linked list maintained in XEN using
>> the function pci_add_device(). XEN will be aware of all the PCI
>> devices on the system and all the devices will be added to the hardware
>> domain.
> I understand this is what x86 does. However, may I ask why we would want
> it for Arm?
> 
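For reference, a minimal sketch of what the Xen-side handling of this hypercall could look like on Arm; find_host_bridge_by_segment() is a hypothetical lookup over the bridges discovered at boot, while pci_add_device() is the existing function named above:

static int handle_pci_device_add(const struct physdev_pci_device_add *add)
{
    struct pci_dev_info info = {
        .is_extfn = !!(add->flags & XEN_PCI_DEV_EXTFN),
    };

    /* The segment selects the host bridge whose ECAM window Xen mapped. */
    if ( !find_host_bridge_by_segment(add->seg) )
        return -ENODEV;

    /* Reuse the existing bookkeeping shared with x86. */
    return pci_add_device(add->seg, add->bus, add->devfn, &info, NUMA_NO_NODE);
}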
>>
>> Limitations:
>> * When PCI devices are added to XEN, MSI capability is not initialized 
>> inside XEN and not supported as of now.
>> * ACS capability is disabled for ARM as of now because after enabling
>> it devices are not accessible.
> 
> I am not sure to understand this. Can you expand?
> 
>> * Dom0Less implementation will require Xen to have the capability to
>> discover the PCI devices (without depending on Dom0 to declare them
>> to Xen).
>>
>> # Enable the existing x86 virtual PCI support for ARM:
>>
>> The existing VPCI support available for X86 is adapted for Arm. When
>> the device is added to XEN via the hypercall
>> “PHYSDEVOP_pci_device_add”, a VPCI handler for the config space access
>> is added to the PCI device to emulate the PCI device.
>>
>> An MMIO trap handler for the PCI ECAM space is registered in XEN so
>> that when a guest tries to access the PCI config space, XEN will
>> trap the access and emulate read/write using VPCI and not the real
>> PCI hardware.
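
For illustration, the decode the trap handler has to perform before handing the access to vPCI follows directly from the ECAM layout (4KB of configuration space per function), roughly:

static void ecam_decode(paddr_t offset, unsigned int *bus,
                        unsigned int *devfn, unsigned int *reg)
{
    *bus   = (offset >> 20) & 0xff;  /* 1MB of ECAM per bus          */
    *devfn = (offset >> 12) & 0xff;  /* 4KB per device/function pair */
    *reg   = offset & 0xfff;         /* register offset within it    */
}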
>>
>> Limitation:
>> * No handler is registered for the MSI configuration.
>> * Only legacy interrupts are supported and tested as of now, MSI is not
>> implemented or tested.
> 
> IIRC, legacy interrupt may be shared between two PCI devices. How do you 
> plan to handle this on Arm?
> 
>>
>> # Assign the device to the guest:
>>
>> Assigning a PCI device from the hardware domain to the guest is done
>> using the below guest config option. When the xl tool creates the
>> domain, PCI devices will be assigned to the guest VPCI bus.
> 
> Above, you suggest that device will be assigned to the hardware domain 
> at boot. I am assuming this also means that all the interrupts/MMIOs 
> will be routed/mapped, is that correct?
> 
> If so, can you provide a rough sketch how assign/deassign will work?
> 
>>     pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
>>
>> The guest will only be able to access the assigned devices and see the
>> bridges. The guest will not be able to access or see the devices that
>> are not assigned to it.
>>
>> Limitation:
>> * As of now all the bridges in the PCI bus are seen by the guest on 
>> the VPCI bus.
> 
> Why do you want to expose all the bridges to a guest? Does this mean 
> that the BDF should always match between the host and the guest?
> 
>>
>> # Emulated PCI device tree node in libxl:
>>
>> Libxl is creating a virtual PCI device tree node in the device tree to 
>> enable the guest OS to discover the virtual PCI during guest boot. We 
>> introduced the new config option [vpci="pci_ecam"] for guests. When 
>> this config option is enabled in a guest configuration, a PCI device 
>> tree node will be created in the guest device tree.
>>
>> A new area has been reserved in the arm guest physical map at which 
>> the VPCI bus is declared in the device tree (reg and ranges parameters 
>> of the node). A trap handler for the PCI ECAM access from guest has 
>> been registered at the defined address and redirects requests to the 
>> VPCI driver in Xen.
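
A rough sketch of how libxl could emit such a node with the libfdt calls it already uses; the unit address and the "pci-host-ecam-generic" compatible are illustrative assumptions, and the "reg"/"ranges" encoding is left out:

static int make_vpci_node(void *fdt)
{
    int res;

    res = fdt_begin_node(fdt, "pcie@10000000");
    if ( res )
        return res;

    res = fdt_property_string(fdt, "compatible", "pci-host-ecam-generic");
    if ( res )
        return res;

    res = fdt_property_string(fdt, "device_type", "pci");
    if ( res )
        return res;

    /*
     * "reg" (the trapped ECAM window) and "ranges" (the reserved MMIO
     * windows) would be filled in from the areas reserved in the Arm
     * guest physical map.
     */
    return fdt_end_node(fdt);
}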
>>
>> Limitation:
>> * Only one PCI device tree node is supported as of now.
>>
>> BAR value and IOMEM mapping:
>>
>> The Linux guest will do the PCI enumeration based on the area reserved
>> for ECAM and IOMEM ranges in the VPCI device tree node. Once a PCI
>> device is assigned to the guest, XEN will map the guest PCI IOMEM region
>> to the real physical IOMEM region only for the assigned devices.
>>
>> As of now we have not modified the existing VPCI code to map the guest 
>> PCI IOMEM region to the real physical IOMEM region. We used the 
>> existing guest “iomem” config option to map the region.
>> For example:
>>     Guest reserved IOMEM region: 0x04020000
>>     Real physical IOMEM region:  0x50000000
>>     IOMEM size:                  128MB
>>     iomem config will be:        iomem = ["0x50000,0x8000@0x4020"]
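
For reference, the iomem string is page-granular (4KB frames), so the example values decode as follows:

    0x50000000 >> 12 = 0x50000   (first machine frame of the region)
    128MB = 0x8000000 bytes, 0x8000000 >> 12 = 0x8000 pages
    0x04020000 >> 12 = 0x4020    (first guest frame of the region)
    => iomem = ["<machine frame>,<nr pages>@<guest frame>"]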
>>
>> There is no need to map the ECAM space as XEN already have access to 
>> the ECAM space and XEN will trap ECAM accesses from the guest and will 
>> perform read/write on the VPCI bus.
>>
>> IOMEM access will not be trapped and the guest will directly access 
>> the IOMEM region of the assigned device via stage-2 translation.
>>
>> In the same way, we map the assigned device's IRQs to the guest using
>> the below config option.
>>     irqs= [ NUMBER, NUMBER, ...]
>>
>> Limitation:
>> * Need to avoid the “iomem” and “irq” guest config options and map the 
>> IOMEM region and IRQ at the same time when device is assigned to the 
>> guest using the “pci” guest config options when xl creates the domain.
>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped 
>> address.
>> * X86 mapping code should be ported on Arm so that the stage-2 
>> translation is adapted when the guest is doing a modification of the 
>> BAR registers values (to map the address requested by the guest for a 
>> specific IOMEM to the address actually contained in the real BAR 
>> register of the corresponding device).
>>
>> # SMMU configuration for guest:
>>
>> When assigning PCI devices to a guest, the SMMU configuration should 
>> be updated to remove access to the hardware domain memory and add
>> configuration to have access to the guest memory with the proper 
>> address translation so that the device can do DMA operations from and 
>> to the guest memory only.
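
A very rough sketch of that reassignment step; iommu_deassign_pci_device() and iommu_assign_pci_device() are hypothetical wrappers around the existing IOMMU framework, used here only to show the ordering:

static int reassign_to_guest(struct domain *hwdom, struct domain *guest,
                             u16 seg, u8 bus, u8 devfn)
{
    /* Detach the device from the hardware domain's SMMU context first... */
    int rc = iommu_deassign_pci_device(hwdom, seg, bus, devfn);

    if ( rc )
        return rc;

    /* ...then attach it to the guest's, so DMA only hits the guest P2M. */
    return iommu_assign_pci_device(guest, seg, bus, devfn);
}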
> 
> There are a few more questions to answer here:
>     - When a guest is destroyed, who will be the owner of the PCI 
> devices? Depending on the answer, how do you make sure the device is 
> quiescent?
>     - Is there any memory access that can bypass the IOMMU (e.g.
> doorbell)?
> 
> Cheers,
> 
> [1] 
> https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg02520.html
> 

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI devices passthrough on Arm design proposal
  2020-07-17 13:50     ` Julien Grall
@ 2020-07-17 13:59       ` Jan Beulich
  2020-07-17 14:12         ` Julien Grall
  2020-07-17 14:47       ` Bertrand Marquis
  1 sibling, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2020-07-17 13:59 UTC (permalink / raw)
  To: Julien Grall
  Cc: Rahul Singh, Roger Pau Monné,
	Stefano Stabellini, xen-devel, nd, Julien Grall

On 17.07.2020 15:50, Julien Grall wrote:
> (Resending to the correct ML)
> On 17/07/2020 14:23, Julien Grall wrote:
>> On 16/07/2020 18:02, Rahul Singh wrote:
>>> # Discovering PCI devices:
>>>
>>> PCI-PCIe enumeration is a process of detecting devices connected to 
>>> its host. It is the responsibility of the hardware domain or boot 
>>> firmware to do the PCI enumeration and configure the BAR, PCI 
>>> capabilities, and MSI/MSI-X configuration.
>>>
>>> PCI-PCIe enumeration in XEN is not feasible for the configuration part
>>> as it would require a lot of code inside Xen which would require a lot
>>> of maintenance. Added to this, many platforms require some quirks in
>>> that part of the PCI code which would greatly increase Xen's complexity.
>>> Once the hardware domain enumerates the devices, it will communicate
>>> them to XEN via the below hypercall.
>>>
>>> #define PHYSDEVOP_pci_device_add        25
>>> struct physdev_pci_device_add {
>>>      uint16_t seg;
>>>      uint8_t bus;
>>>      uint8_t devfn;
>>>      uint32_t flags;
>>>      struct {
>>>          uint8_t bus;
>>>          uint8_t devfn;
>>>      } physfn;
>>>      /*
>>>      * Optional parameters array.
>>>      * First element ([0]) is PXM domain associated with the device 
>>> (if * XEN_PCI_DEV_PXM is set)
>>>      */
>>>      uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>>      };
>>>
>>> As the hypercall argument has the PCI segment number, XEN will access 
>>> the PCI config space based on this segment number and find the 
>>> host-bridge corresponding to this segment number. At this stage host 
>>> bridge is fully initialized so there will be no issue to access the 
>>> config space.
>>>
>>> XEN will add the PCI devices to the linked list maintained in XEN using
>>> the function pci_add_device(). XEN will be aware of all the PCI
>>> devices on the system and all the devices will be added to the hardware
>>> domain.
>> I understand this is what x86 does. However, may I ask why we would want
>> it for Arm?

Isn't it the normal thing to follow what there is, and instead provide
reasons why a different approach is to be followed? Personally I'd much
prefer if we didn't have two fundamentally different PCI implementations
in the tree. Perhaps some of what Arm wants or needs can actually
benefit x86 as well, but this requires sufficiently much code sharing
then.

Jan


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 13:19         ` Jan Beulich
@ 2020-07-17 13:59           ` Bertrand Marquis
  2020-07-17 14:06             ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 13:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Rahul Singh, Julien Grall, Stefano Stabellini, xen-devel, nd,
	Roger Pau Monné



> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
> 
> On 17.07.2020 15:14, Bertrand Marquis wrote:
>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
>>> On 16.07.2020 19:10, Rahul Singh wrote:
>>>> # Emulated PCI device tree node in libxl:
>>>> 
>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>> 
>>> I support Stefano's suggestion for this to be an optional thing, i.e.
>>> there to be no need for it when there are PCI devices assigned to the
>>> guest anyway. I also wonder about the pci_ prefix here - isn't
>>> vpci="ecam" as unambiguous?
>> 
>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl. 
> 
> I'm afraid I don't understand: When there are no PCI device that get
> handed to a guest when it gets created, but it is supposed to be able
> to have some assigned while already running, then we agree the option
> is needed (afaict). When PCI devices get handed to the guest while it
> gets constructed, where's the problem to infer this option from the
> presence of PCI devices in the guest configuration?

If the user wants to use xl pci-attach to attach a device to a guest at runtime, this guest must have a VPCI bus (even with no devices).
If we do not have the vpci parameter in the configuration this use case will not work anymore.

@julien: in fact this can be considered as hotplug from the guest point of view and we do support this :-)
We definitely will not support hardware PCI hotplug (i.e. adding a new device on the PCI bus at runtime).

Bertrand

> 
> Jan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 13:49             ` Julien Grall
@ 2020-07-17 14:01               ` Bertrand Marquis
  0 siblings, 0 replies; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 14:01 UTC (permalink / raw)
  To: Julien Grall
  Cc: Rahul Singh, Roger Pau Monné,
	Stefano Stabellini, xen-devel, nd, Julien Grall



> On 17 Jul 2020, at 15:49, Julien Grall <julien@xen.org> wrote:
> 
> 
> 
> On 17/07/2020 14:44, Bertrand Marquis wrote:
>>> On 17 Jul 2020, at 15:29, Julien Grall <julien@xen.org> wrote:
>>> 
>>> 
>>> 
>>> On 17/07/2020 14:22, Bertrand Marquis wrote:
>>>>>> # Emulated PCI device tree node in libxl:
>>>>>> 
>>>>>> Libxl is creating a virtual PCI device tree node in the device tree
>>>>>> to enable the guest OS to discover the virtual PCI during guest
>>>>>> boot. We introduced the new config option [vpci="pci_ecam"] for
>>>>>> guests. When this config option is enabled in a guest configuration,
>>>>>> a PCI device tree node will be created in the guest device tree.
>>>>>> 
>>>>>> A new area has been reserved in the arm guest physical map at which
>>>>>> the VPCI bus is declared in the device tree (reg and ranges
>>>>>> parameters of the node). A trap handler for the PCI ECAM access from
>>>>>> guest has been registered at the defined address and redirects
>>>>>> requests to the VPCI driver in Xen.
>>>>> 
>>>>> Can't you deduce the requirement of such DT node based on the presence
>>>>> of a 'pci=' option in the same config file?
>>>>> 
>>>>> Also I wouldn't discard that in the future you might want to use
>>>>> different emulators for different devices, so it might be helpful to
>>>>> introduce something like:
>>>>> 
>>>>> pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt', '0a:00.0,backend=qemu', ... ]
>>> 
>>> I like this idea :).
>>> 
>>>>> 
>>>>> For the time being Arm will require backend=vpci for all the passed
>>>>> through devices, but I wouldn't rule out this changing in the future.
>>>> We need it for the case where no device is declared in the config file and the user
>>>> wants to add devices using xl later. In this case we must have the DT node for it
>>>> to work.
>>> 
>>> Are you suggesting that you plan to implement PCI hotplug?
>> No this is not in the current plan but we should not prevent this from being supported some day :-)
> 
> I agree that we don't want to prevent extensions. But I fail to see why this would be an issue if we don't introduce the option "vpci" today.

I answered that in parallel while replying to Jan.
This is needed to start the guest with no PCI device assigned and assign them later using xl pci-attach.

Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 13:59           ` Bertrand Marquis
@ 2020-07-17 14:06             ` Jan Beulich
  2020-07-17 14:34               ` Bertrand Marquis
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2020-07-17 14:06 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Julien Grall, Stefano Stabellini, xen-devel, nd,
	Roger Pau Monné

On 17.07.2020 15:59, Bertrand Marquis wrote:
> 
> 
>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 17.07.2020 15:14, Bertrand Marquis wrote:
>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
>>>> On 16.07.2020 19:10, Rahul Singh wrote:
>>>>> # Emulated PCI device tree node in libxl:
>>>>>
>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>
>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
>>>> there to be no need for it when there are PCI devices assigned to the
>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
>>>> vpci="ecam" as unambiguous?
>>>
>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl. 
>>
>> I'm afraid I don't understand: When there are no PCI device that get
>> handed to a guest when it gets created, but it is supposed to be able
>> to have some assigned while already running, then we agree the option
>> is needed (afaict). When PCI devices get handed to the guest while it
>> gets constructed, where's the problem to infer this option from the
>> presence of PCI devices in the guest configuration?
> 
> If the user wants to use xl pci-attach to attach in runtime a device to a guest, this guest must have a VPCI bus (even with no devices).
> If we do not have the vpci parameter in the configuration this use case will not work anymore.

That's what everyone looks to agree with. Yet why is the parameter needed
when there _are_ PCI devices anyway? That's the "optional" that Stefano
was suggesting, aiui.

Jan


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI devices passthrough on Arm design proposal
  2020-07-17 13:59       ` Jan Beulich
@ 2020-07-17 14:12         ` Julien Grall
  2020-07-17 14:23           ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Julien Grall @ 2020-07-17 14:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Rahul Singh, Roger Pau Monné,
	Stefano Stabellini, xen-devel, nd, Julien Grall

Hi,

On 17/07/2020 14:59, Jan Beulich wrote:
> On 17.07.2020 15:50, Julien Grall wrote:
>> (Resending to the correct ML)
>> On 17/07/2020 14:23, Julien Grall wrote:
>>> On 16/07/2020 18:02, Rahul Singh wrote:
>>>> # Discovering PCI devices:
>>>>
>>>> PCI-PCIe enumeration is a process of detecting devices connected to
>>>> its host. It is the responsibility of the hardware domain or boot
>>>> firmware to do the PCI enumeration and configure the BAR, PCI
>>>> capabilities, and MSI/MSI-X configuration.
>>>>
>>>> PCI-PCIe enumeration in XEN is not feasible for the configuration part
>>>> as it would require a lot of code inside Xen which would require a lot
>>>> of maintenance. Added to this many platforms require some quirks in
>>>> that part of the PCI code which would greatly improve Xen complexity.
>>>> Once hardware domain enumerates the device then it will communicate to
>>>> XEN via the below hypercall.
>>>>
>>>> #define PHYSDEVOP_pci_device_add        25
>>>> struct physdev_pci_device_add {
>>>>       uint16_t seg;
>>>>       uint8_t bus;
>>>>       uint8_t devfn;
>>>>       uint32_t flags;
>>>>       struct {
>>>>           uint8_t bus;
>>>>           uint8_t devfn;
>>>>       } physfn;
>>>>       /*
>>>>       * Optional parameters array.
>>>>       * First element ([0]) is PXM domain associated with the device
>>>> (if * XEN_PCI_DEV_PXM is set)
>>>>       */
>>>>       uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>>>       };
>>>>
>>>> As the hypercall argument has the PCI segment number, XEN will access
>>>> the PCI config space based on this segment number and find the
>>>> host-bridge corresponding to this segment number. At this stage host
>>>> bridge is fully initialized so there will be no issue to access the
>>>> config space.
>>>>
>>>> XEN will add the PCI devices in the linked list maintain in XEN using
>>>> the function pci_add_device(). XEN will be aware of all the PCI
>>>> devices on the system and all the device will be added to the hardware
>>>> domain.
>>> I understand this what x86 does. However, may I ask why we would want it
>>> for Arm?
> 
> Isn't it the normal thing to follow what there is, and instead provide
> reasons why a different approach is to be followed?

Not all the decisions on x86 have been great and this is the opportunity
to make things better rather than blindly follow. For instance, platform
devices are not assigned (back) to dom0 by default. Thanks to this
decision, we were not affected by XSA-306.

> Personally I'd much
> prefer if we didn't have two fundamentally different PCI implementations
> in the tree. Perhaps some of what Arm wants or needs can actually
> benefit x86 as well, but this requires sufficiently much code sharing
> then.

Well, it would be nice to have similar implementations. But at the same 
time, we have different constraints. For instance, dom0 may disappear in 
the future on Arm.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI devices passthrough on Arm design proposal
  2020-07-17 14:12         ` Julien Grall
@ 2020-07-17 14:23           ` Jan Beulich
  0 siblings, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2020-07-17 14:23 UTC (permalink / raw)
  To: Julien Grall
  Cc: Rahul Singh, Roger Pau Monné,
	Stefano Stabellini, xen-devel, nd, Julien Grall

On 17.07.2020 16:12, Julien Grall wrote:
> On 17/07/2020 14:59, Jan Beulich wrote:
>> Personally I'd much
>> prefer if we didn't have two fundamentally different PCI implementations
>> in the tree. Perhaps some of what Arm wants or needs can actually
>> benefit x86 as well, but this requires sufficiently much code sharing
>> then.
> 
> Well, it would be nice to have similar implementations. But at the same 
> time, we have different constraint. For instance, dom0 may disappear in 
> the future on Arm.

And becoming independent of Dom0 in this regard would be a benefit to
x86 as well, I think, irrespective of whether dom0less is to become a
thing there, too.

Jan


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 13:22       ` Bertrand Marquis
  2020-07-17 13:29         ` Julien Grall
@ 2020-07-17 14:31         ` Roger Pau Monné
  2020-07-17 15:21           ` Bertrand Marquis
  1 sibling, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2020-07-17 14:31 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: xen-devel, nd, Rahul Singh, Stefano Stabellini, Julien Grall

On Fri, Jul 17, 2020 at 01:22:19PM +0000, Bertrand Marquis wrote:
> 
> 
> > On 17 Jul 2020, at 13:16, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > 
> > I've wrapped the email to 80 columns in order to make it easier to
> > reply.
> > 
> > Thanks for doing this, I think the design is good, I have some
> > questions below so that I understand the full picture.
> > 
> > On Thu, Jul 16, 2020 at 05:10:05PM +0000, Rahul Singh wrote:
> >> Hello All,
> >> 
> >> Following up on discussion on PCI Passthrough support on ARM that we
> >> had at the XEN summit, we are submitting a Review For Comment and a
> >> design proposal for PCI passthrough support on ARM. Feel free to
> >> give your feedback.
> >> 
> >> The followings describe the high-level design proposal of the PCI
> >> passthrough support and how the different modules within the system
> >> interacts with each other to assign a particular PCI device to the
> >> guest.
> >> 
> >> # Title:
> >> 
> >> PCI devices passthrough on Arm design proposal
> >> 
> >> # Problem statement:
> >> 
> >> On ARM there is no support to assign a PCI device to a guest. PCI
> >> device passthrough capability allows guests to have full access to
> >> some PCI devices. PCI device passthrough allows PCI devices to
> >> appear and behave as if they were physically attached to the guest
> >> operating system and provide full isolation of the PCI devices.
> >> 
> >> Goal of this work is to also support Dom0Less configuration so the
> >> PCI backend/frontend drivers used on x86 shall not be used on Arm.
> >> It will use the existing VPCI concept from X86 and implement the
> >> virtual PCI bus through IO emulation such that only assigned devices
> >> are visible to the guest and guest can use the standard PCI
> >> driver.
> >> 
> >> Only Dom0 and Xen will have access to the real PCI bus, guest will
> >> have a direct access to the assigned device itself. IOMEM memory
> >> will be mapped to the guest and interrupt will be redirected to the
> >> guest. SMMU has to be configured correctly to have DMA
> >> transaction.
> >> 
> >> ## Current state: Draft version
> >> 
> >> # Proposer(s): Rahul Singh, Bertrand Marquis
> >> 
> >> # Proposal:
> >> 
> >> This section will describe the different subsystem to support the
> >> PCI device passthrough and how these subsystems interact with each
> >> other to assign a device to the guest.
> >> 
> >> # PCI Terminology:
> >> 
> >> Host Bridge: Host bridge allows the PCI devices to talk to the rest
> >> of the computer.  ECAM: ECAM (Enhanced Configuration Access
> >> Mechanism) is a mechanism developed to allow PCIe to access
> >> configuration space. The space available per function is 4KB.
> >> 
> >> # Discovering PCI Host Bridge in XEN:
> >> 
> >> In order to support the PCI passthrough XEN should be aware of all
> >> the PCI host bridges available on the system and should be able to
> >> access the PCI configuration space. ECAM configuration access is
> >> supported as of now. XEN during boot will read the PCI device tree
> >> node “reg” property and will map the ECAM space to the XEN memory
> >> using the “ioremap_nocache ()” function.
> > 
> > What about ACPI? I think you should also mention the MMCFG table,
> > which should contain the information about the ECAM region(s) (or at
> > least that's how it works on x86). Just realized that you don't
> > support ACPI ATM, so you can ignore this comment.
> 
> Yes for now we did not consider ACPI support.

I have 0 knowledge of ACPI on Arm, but I would assume it's also using
the MCFG table in order to report ECAM regions to the OSPM. This is a
static table that's very simple to parse, and it contains the ECAM
IOMEM area and the segment assigned to that ECAM region.

This is better than DT because ACPI already assigns segment numbers to
each ECAM region.

Even if not currently supported in the code implemented so far,
describing the plan for its implementation here seems fine IMO, as
it's going to be slightly different from what you need to do when
using DT.
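
For the record, a rough sketch of that ACPI counterpart, iterating the MCFG allocation entries (standard ACPICA structures); register_ecam_bridge() is the same hypothetical bookkeeping helper assumed for the device tree case:

static int __init scan_mcfg(struct acpi_table_header *header)
{
    struct acpi_table_mcfg *mcfg = (struct acpi_table_mcfg *)header;
    struct acpi_mcfg_allocation *alloc = (void *)(mcfg + 1);
    unsigned int i, nr = (header->length - sizeof(*mcfg)) / sizeof(*alloc);

    for ( i = 0; i < nr; i++ )
    {
        /* 1MB of ECAM per bus in the [start_bus, end_bus] range. */
        paddr_t size = (paddr_t)(alloc[i].end_bus_number -
                                 alloc[i].start_bus_number + 1) << 20;

        /* Unlike DT, the segment number comes directly from the table. */
        register_ecam_bridge(alloc[i].pci_segment, alloc[i].address, size,
                             NULL /* ECAM not mapped yet at this point */);
    }

    return 0;
}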

> > 
> >> 
> >> If there are more than one segment on the system, XEN will read the
> >> “linux, pci-domain” property from the device tree node and configure
> >> the host bridge segment number accordingly. All the PCI device tree
> >> nodes should have the “linux,pci-domain” property so that there will
> >> be no conflicts. During hardware domain boot Linux will also use the
> >> same “linux,pci-domain” property and assign the domain number to the
> >> host bridge.
> > 
> > So it's my understanding that the PCI domain (or segment) is just an
> > abstract concept to differentiate all the Root Complexes present on
> > the system, but the host bridge itself is not aware of the segment
> > assigned to it in any way.
> > 
> > I'm not sure Xen and the hardware domain having matching segments is a
> > requirement, if you use vPCI you can match the segment (from Xen's
> > PoV) by just checking from which ECAM region the access has been
> > performed.
> > 
> > The only reason to require matching segment values between Xen and the
> > hardware domain is to allow using hypercalls against the PCI devices,
> > ie: to be able to use hypercalls to assign a device to a domain from
> > the hardware domain.
> > 
> > I have 0 understanding of DT or it's spec, but why does this have a
> > 'linux,' prefix? The segment number is part of the PCI spec, and not
> > something specific to Linux IMO.
> 
> It is correct that this is only needed for the hypercall when Dom0 is
> doing the full enumeration and communicating the devices to Xen.
> In all other cases this can be deduced from the address of the access.

You also need the SBDF nomenclature in order to assign devices to
guests from the control domain, so at least there needs to be some
consensus between the hardware domain and Xen on the segment numbering in
that regard.

Same applies to dom0less mode, there needs to be some consensus about
the segment numbers used, so Xen can identify the devices assigned to
each guests without confusion.

> Regarding the DT entry, this is not coming from us and this is already
> defined this way in existing DTBs, we just reuse the existing entry. 

Is it possible to standardize the property and drop the linux prefix?

> > 
> >> 
> >> When Dom0 tries to access the PCI config space of the device, XEN
> >> will find the corresponding host bridge based on segment number and
> >> access the corresponding config space assigned to that bridge.
> >> 
> >> Limitation:
> >> * Only PCI ECAM configuration space access is supported.
> >> * Device tree binding is supported as of now, ACPI is not supported.
> >> * Need to port the PCI host bridge access code to XEN to access the
> >>  configuration space (the generic one works but lots of platforms will
> >>  require some specific code or quirks).
> >> 
> >> # Discovering PCI devices:
> >> 
> >> PCI-PCIe enumeration is a process of detecting devices connected to
> >> its host. It is the responsibility of the hardware domain or boot
> >> firmware to do the PCI enumeration and configure the BAR, PCI
> >> capabilities, and MSI/MSI-X configuration.
> >> 
> >> PCI-PCIe enumeration in XEN is not feasible for the configuration
> >> part as it would require a lot of code inside Xen which would
> >> require a lot of maintenance. Added to this, many platforms require
> >> some quirks in that part of the PCI code which would greatly increase
> >> Xen's complexity. Once the hardware domain enumerates the devices, it
> >> will communicate them to XEN via the below hypercall.
> >> 
> >> #define PHYSDEVOP_pci_device_add        25 struct
> >> physdev_pci_device_add {
> >>    uint16_t seg;
> >>    uint8_t bus;
> >>    uint8_t devfn;
> >>    uint32_t flags;
> >>    struct {
> >>        uint8_t bus;
> >>        uint8_t devfn;
> >>    } physfn;
> >>    /*
> >>     * Optional parameters array.
> >>     * First element ([0]) is PXM domain associated with the device (if
> >>     * XEN_PCI_DEV_PXM is set)
> >>     */
> >>    uint32_t optarr[XEN_FLEX_ARRAY_DIM];
> >> };
> >> 
> >> As the hypercall argument has the PCI segment number, XEN will
> >> access the PCI config space based on this segment number and find
> >> the host-bridge corresponding to this segment number. At this stage
> >> host bridge is fully initialized so there will be no issue to access
> >> the config space.
> >> 
> >> XEN will add the PCI devices to the linked list maintained in XEN
> >> using the function pci_add_device(). XEN will be aware of all the
> >> PCI devices on the system and all the devices will be added to the
> >> hardware domain.
> >> 
> >> Limitations:
> >> * When PCI devices are added to XEN, MSI capability is
> >>  not initialized inside XEN and not supported as of now.
> > 
> > I assume you will mask such capability and will prevent the guest (or
> > hardware domain) from interacting with it?
> 
> No we will actually implement that part but later. This is not supported in
> the RFC that we will submit. 

OK, might be nice to note this somewhere, even if it's not implemented
right now. It might also be relevant to start thinking about which
capabilities you have to expose to guests, and how you will make those
safe. This could even be in a separate document, but ideally a design
document (or set of documents) should try to cover all the
implementation that will be done in order to support a feature.

> > 
> >> * ACS capability is disabled for ARM as of now because after enabling
> >>  it devices are not accessible.
> >> * Dom0Less implementation will require to have the capacity inside Xen
> >>  to discover the PCI devices (without depending on Dom0 to declare them
> >>  to Xen).
> > 
> > I assume the firmware will properly initialize the host bridge and
> > configure the resources for each device, so that Xen just has to walk
> > the PCI space and find the devices.
> > 
> > TBH that would be my preferred method, because then you can get rid of
> > the hypercall.
> > 
> > Is there anyway for Xen to know whether the host bridge is properly
> > setup and thus the PCI bus can be scanned?
> > 
> > That way Arm could do something similar to x86, where Xen will scan
> > the bus and discover devices, but you could still provide the
> > hypercall in case the bus cannot be scanned by Xen (because it hasn't
> > been setup).
> 
> The idea is definitely to rely by default on the firmware doing this properly.
> I am not sure whether a proper enumeration could be detected reliably in all
> cases, so it would make sense to rely on Dom0 enumeration when a Xen
> command line argument is passed, as explained in one of Rahul’s mails.

I assume Linux somehow knows when it needs to initialize the PCI root
complex before attempting to access the bus. Would it be possible to
add this logic to Xen so it can figure out on its own whether it's
safe to scan the PCI bus or whether it needs to wait for the hardware
domain to report the devices present?

> > 
> >> 
> >> # Enable the existing x86 virtual PCI support for ARM:
> >> 
> >> The existing VPCI support available for X86 is adapted for Arm. When
> >> the device is added to XEN via the hyper call
> >> “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access
> >> is added to the PCI device to emulate the PCI devices.
> >> 
> >> A MMIO trap handler for the PCI ECAM space is registered in XEN so
> >> that when guest is trying to access the PCI config space, XEN will
> >> trap the access and emulate read/write using the VPCI and not the
> >> real PCI hardware.
> >> 
> >> Limitation:
> >> * No handler is registered for the MSI configuration.
> > 
> > But you need to mask MSI/MSI-X capabilities in the config space in
> > order to prevent access from domains? (and by mask I mean remove from
> > the list of capabilities and prevent reads/writes to that
> > configuration space).
> > 
> > Note this is already implemented for x86, and I've tried to add arch_
> > hooks for arch specific stuff so that it could be reused by Arm. But
> > maybe this would require a different design document?
> 
> as said, we will handle MSI support in a separate document/step.
> 
> > 
> >> * Only legacy interrupt is supported and tested as of now, MSI is not
> >>  implemented and tested.
> >> 
> >> # Assign the device to the guest:
> >> 
> >> Assigning a PCI device from the hardware domain to the guest is done
> >> using the below guest config option. When the xl tool creates the domain,
> >> PCI devices will be assigned to the guest VPCI bus.
> >> 
> >> pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
> >> 
> >> The guest will only be able to access the assigned devices and see the
> >> bridges. The guest will not be able to access or see the devices that
> >> are not assigned to it.
> >> 
> >> Limitation:
> >> * As of now all the bridges in the PCI bus are seen by
> >>  the guest on the VPCI bus.
> > 
> > I don't think you need all of them, just the ones that are higher up
> > on the hierarchy of the device you are trying to passthrough?
> > 
> > Which kind of access do guest have to PCI bridges config space?
> 
> For now the bridges are read only, no specific access is required by guests. 
> 
> > 
> > This should be limited to read-only accesses in order to be safe.
> > 
> > Emulating a PCI bridge in Xen using vPCI shouldn't be that
> > complicated, so you could likely replace the real bridges with
> > emulated ones. Or even provide a fake topology to the guest using an
> > emulated bridge.
> 
> Just showing all bridges and keeping the hardware topology is the simplest
> solution for now. But maybe showing a different topology and only fake
> bridges could make sense and be implemented in the future.

Ack. I've also heard rumors of Xen on Arm people being very interested
in VirtIO support, in which case you might expose both fully emulated
VirtIO devices and PCI passthrough devices on the PCI bus, so it would
be good to spend some time thinking how those will fit together.

Will you allocate a separate segment unused by hardware to expose the
fully emulated PCI devices (VirtIO)?

Will OSes support having several segments?

If not you likely need to have emulated bridges so that you can adjust
the bridge window accordingly to fit the passthrough and the emulated
MMIO space, and likely be able to expose passthrough devices using a
different topology than the host one.

> > 
> >> 
> >> # Emulated PCI device tree node in libxl:
> >> 
> >> Libxl is creating a virtual PCI device tree node in the device tree
> >> to enable the guest OS to discover the virtual PCI during guest
> >> boot. We introduced the new config option [vpci="pci_ecam"] for
> >> guests. When this config option is enabled in a guest configuration,
> >> a PCI device tree node will be created in the guest device tree.
> >> 
> >> A new area has been reserved in the arm guest physical map at which
> >> the VPCI bus is declared in the device tree (reg and ranges
> >> parameters of the node). A trap handler for the PCI ECAM access from
> >> guest has been registered at the defined address and redirects
> >> requests to the VPCI driver in Xen.
> > 
> > Can't you deduce the requirement of such DT node based on the presence
> > of a 'pci=' option in the same config file?
> > 
> > Also I wouldn't discard that in the future you might want to use
> > different emulators for different devices, so it might be helpful to
> > introduce something like:
> > 
> > pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt', '0a:00.0,backend=qemu', ... ]
> > 
> > For the time being Arm will require backend=vpci for all the passed
> > through devices, but I wouldn't rule out this changing in the future.
> 
> We need it for the case where no device is declared in the config file and the user
> wants to add devices using xl later. In this case we must have the DT node for it
> to work. 

There's a passthrough xl.cfg option for that already, so that if you
don't want to add any PCI passthrough devices at creation time but
rather hotplug them you can set:

passthrough=enabled

And it should setup the domain to be prepared to support hot
passthrough, including the IOMMU [0].

> Regarding possible backends, this could be added in the future if required. 
> 
> > 
> >> Limitation:
> >> * Only one PCI device tree node is supported as of now.
> >> 
> >> BAR value and IOMEM mapping:
> >> 
> >> Linux guest will do the PCI enumeration based on the area reserved
> >> for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI
> >> device is assigned to the guest, XEN will map the guest PCI IOMEM
> >> region to the real physical IOMEM region only for the assigned
> >> devices.
> > 
> > PCI IOMEM == BARs? Or are you referring to the ECAM access window?
> 
> Here by PCI IOMEM we mean the IOMEM spaces referred to by the BARs
> of the PCI device

OK, might be worth to use PCI BARs explicitly rather than PCI IOMEM as
I think that's likely to be confused with the config space IOMEM.

> > 
> >> As of now we have not modified the existing VPCI code to map the
> >> guest PCI IOMEM region to the real physical IOMEM region. We used
> >> the existing guest “iomem” config option to map the region.
> >> For example:
> >>     Guest reserved IOMEM region: 0x04020000
> >>     Real physical IOMEM region:  0x50000000
> >>     IOMEM size:                  128MB
> >>     iomem config will be:        iomem = ["0x50000,0x8000@0x4020"]
> >> 
> >> There is no need to map the ECAM space as XEN already have access to
> >> the ECAM space and XEN will trap ECAM accesses from the guest and
> >> will perform read/write on the VPCI bus.
> >> 
> >> IOMEM access will not be trapped and the guest will directly access
> >> the IOMEM region of the assigned device via stage-2 translation.
> >> 
> >> In the same way, we map the assigned device's IRQs to the guest using
> >> the below config option:  irqs= [ NUMBER, NUMBER, ...]
> > 
> > Are you providing this for the hardware domain also? Or are irqs
> > fetched from the DT in that case?
> 
> This will only be used temporarily until we have proper support to do this
> automatically when a device is assigned. Right now our implementation
> requires the user to explicitly redirect the interrupts required by the
> assigned PCI devices, but in the final version this entry will not be needed.
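
A minimal sketch of that automatic step, using the existing Arm helper route_irq_to_guest(); the identity vIRQ mapping and the device name are simplifying assumptions:

static int route_assigned_legacy_irq(struct domain *d, unsigned int irq)
{
    /* Route the device's SPI to the guest, identity-mapped for simplicity. */
    return route_irq_to_guest(d, irq, irq, "pci-passthrough");
}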

Right, I'm not sure whether this should be marked somehow as **
WORKAROUND ** or ** TEMPORARY ** in the document, since it's not
supposed to be part of the final implementation.

> Dom0 relies on the entries declared in the DT.
> 
> > 
> >> Limitation:
> >> * Need to avoid the “iomem” and “irq” guest config
> >>  options and map the IOMEM region and IRQ at the same time when
> >>  device is assigned to the guest using the “pci” guest config options
> >>  when xl creates the domain.
> >> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped
> >>  address.
> > 
> > It was my understanding that you would identity map the BAR into the
> > domU stage-2 translation, and that changes by the guest won't be
> > allowed.
> 
> In fact this is not possible and we have to remap at a different address,
> because the guest physical memory map is fixed by Xen on Arm and we must
> follow the same layout. Otherwise this would only work if the BARs point
> to an unused address, and on Juno for example they conflict with the guest
> RAM addresses.
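
A minimal sketch of the stage-2 side of that remapping, assuming the guest-visible BAR address (from the fixed Arm guest map) and the real BAR address are already known; map_mmio_regions() is the existing Xen helper for adding MMIO mappings:

static int map_assigned_bar(struct domain *d, paddr_t guest_bar,
                            paddr_t host_bar, paddr_t size)
{
    /* Map the real BAR MMIO behind the guest-visible (emulated) BAR value. */
    return map_mmio_regions(d, gaddr_to_gfn(guest_bar), PFN_UP(size),
                            maddr_to_mfn(host_bar));
}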

This was not clear from my reading of the document, could you please
clarify on the next version that the guest physical memory map is
always the same, and that BARs from PCI devices cannot be identity
mapped to the stage-2 translation and instead are relocated somewhere
else?

I'm then confused about what you do with bridge windows, do you also
trap and adjust them to report a different IOMEM region?

Above you mentioned that read-only access was given to bridge
registers, but I guess some are also emulated in order to report
matching IOMEM regions?

Roger.

[0] https://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html#Other-Options


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 14:06             ` Jan Beulich
@ 2020-07-17 14:34               ` Bertrand Marquis
  2020-07-17 14:41                 ` Roger Pau Monné
  2020-07-20 23:23                 ` Stefano Stabellini
  0 siblings, 2 replies; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 14:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Rahul Singh, Julien Grall, Stefano Stabellini, xen-devel, nd,
	Roger Pau Monné



> On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
> 
> On 17.07.2020 15:59, Bertrand Marquis wrote:
>> 
>> 
>>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
>>> 
>>> On 17.07.2020 15:14, Bertrand Marquis wrote:
>>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
>>>>> On 16.07.2020 19:10, Rahul Singh wrote:
>>>>>> # Emulated PCI device tree node in libxl:
>>>>>> 
>>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>> 
>>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
>>>>> there to be no need for it when there are PCI devices assigned to the
>>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
>>>>> vpci="ecam" as unambiguous?
>>>> 
>>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl. 
>>> 
>>> I'm afraid I don't understand: When there are no PCI device that get
>>> handed to a guest when it gets created, but it is supposed to be able
>>> to have some assigned while already running, then we agree the option
>>> is needed (afaict). When PCI devices get handed to the guest while it
>>> gets constructed, where's the problem to infer this option from the
>>> presence of PCI devices in the guest configuration?
>> 
>> If the user wants to use xl pci-attach to attach in runtime a device to a guest, this guest must have a VPCI bus (even with no devices).
>> If we do not have the vpci parameter in the configuration this use case will not work anymore.
> 
> That's what everyone looks to agree with. Yet why is the parameter needed
> when there _are_ PCI devices anyway? That's the "optional" that Stefano
> was suggesting, aiui.

I agree, in this case the parameter could be optional and only required if no PCI device is assigned directly in the guest configuration.

Bertrand

> 
> Jan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 14:34               ` Bertrand Marquis
@ 2020-07-17 14:41                 ` Roger Pau Monné
  2020-07-17 14:49                   ` Bertrand Marquis
  2020-07-20 23:23                 ` Stefano Stabellini
  1 sibling, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2020-07-17 14:41 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Stefano Stabellini, Jan Beulich, xen-devel, nd,
	Julien Grall

On Fri, Jul 17, 2020 at 02:34:55PM +0000, Bertrand Marquis wrote:
> 
> 
> > On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
> > 
> > On 17.07.2020 15:59, Bertrand Marquis wrote:
> >> 
> >> 
> >>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
> >>> 
> >>> On 17.07.2020 15:14, Bertrand Marquis wrote:
> >>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
> >>>>> On 16.07.2020 19:10, Rahul Singh wrote:
> >>>>>> # Emulated PCI device tree node in libxl:
> >>>>>> 
> >>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
> >>>>> 
> >>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
> >>>>> there to be no need for it when there are PCI devices assigned to the
> >>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
> >>>>> vpci="ecam" as unambiguous?
> >>>> 
> >>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl. 
> >>> 
> >>> I'm afraid I don't understand: When there are no PCI device that get
> >>> handed to a guest when it gets created, but it is supposed to be able
> >>> to have some assigned while already running, then we agree the option
> >>> is needed (afaict). When PCI devices get handed to the guest while it
> >>> gets constructed, where's the problem to infer this option from the
> >>> presence of PCI devices in the guest configuration?
> >> 
> >> If the user wants to use xl pci-attach to attach in runtime a device to a guest, this guest must have a VPCI bus (even with no devices).
> >> If we do not have the vpci parameter in the configuration this use case will not work anymore.
> > 
> > That's what everyone looks to agree with. Yet why is the parameter needed
> > when there _are_ PCI devices anyway? That's the "optional" that Stefano
> > was suggesting, aiui.
> 
> I agree in this case the parameter could be optional and only required if not PCI device is assigned directly in the guest configuration.

Where will the ECAM region(s) appear on the guest physmap?

Are you going to re-use the same locations as on the physical
hardware, or will they appear somewhere else?

Roger.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI devices passthrough on Arm design proposal
  2020-07-17 13:50     ` Julien Grall
  2020-07-17 13:59       ` Jan Beulich
@ 2020-07-17 14:47       ` Bertrand Marquis
  2020-07-17 15:26         ` Julien Grall
  1 sibling, 1 reply; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 14:47 UTC (permalink / raw)
  To: Julien Grall
  Cc: Rahul Singh, Roger Pau Monné,
	Stefano Stabellini, xen-devel, nd, Julien Grall



> On 17 Jul 2020, at 15:50, Julien Grall <julien@xen.org> wrote:
> 
> (Resending to the correct ML)
> 
> On 17/07/2020 14:23, Julien Grall wrote:
>> On 16/07/2020 18:02, Rahul Singh wrote:
>>> Hello All,
>> Hi,
>>> Following up on discussion on PCI Passthrough support on ARM that we had at the XEN summit, we are submitting a Review For Comment and a design proposal for PCI passthrough support on ARM. Feel free to give your feedback.
>>> 
>>> The followings describe the high-level design proposal of the PCI passthrough support and how the different modules within the system interacts with each other to assign a particular PCI device to the guest.
>> There was an attempt a few years ago to get a design document for PCI passthrough (see [1]). I would suggest to have a look at the thread as I think it would help to have an overview of all the components (e.g MSI controllers...) even if they will not be implemented at the beginning.

Thanks for the pointer. This design is a first draft that we will improve and complete along the way.

>>> 
>>> # Title:
>>> 
>>> PCI devices passthrough on Arm design proposal
>>> 
>>> # Problem statement:
>>> 
>>> On ARM there is no support to assign a PCI device to a guest. PCI device passthrough capability allows guests to have full access to some PCI devices. PCI device passthrough allows PCI devices to appear and behave as if they were physically attached to the guest operating system and provide full isolation of the PCI devices.
>>> 
>>> Goal of this work is to also support Dom0Less configuration so the PCI backend/frontend drivers used on x86 shall not be used on Arm. It will use the existing VPCI concept from X86 and implement the virtual PCI bus through IO emulation​ such that only assigned devices are visible​ to the guest and guest can use the standard PCI driver.
>>> 
>>> Only Dom0 and Xen will have access to the real PCI bus,​ guest will have a direct access to the assigned device itself​. IOMEM memory will be mapped to the guest ​and interrupt will be redirected to the guest. SMMU has to be configured correctly to have DMA transaction.
>>> 
>>> ## Current state: Draft version
>>> 
>>> # Proposer(s): Rahul Singh, Bertrand Marquis
>>> 
>>> # Proposal:
>>> 
>>> This section will describe the different subsystem to support the PCI device passthrough and how these subsystems interact with each other to assign a device to the guest.
>>> 
>>> # PCI Terminology:
>>> 
>>> Host Bridge: Host bridge allows the PCI devices to talk to the rest of the computer.
>>> ECAM: ECAM (Enhanced Configuration Access Mechanism) is a mechanism developed to allow PCIe to access configuration space. The space available per function is 4KB.
>>> 
>>> # Discovering PCI Host Bridge in XEN:
>>> 
>>> In order to support the PCI passthrough XEN should be aware of all the PCI host bridges available on the system and should be able to access the PCI configuration space. ECAM configuration access is supported as of now. XEN during boot will read the PCI device tree node “reg” property and will map the ECAM space to the XEN memory using the “ioremap_nocache ()” function.
>>> 
>>> If there are more than one segment on the system, XEN will read the “linux, pci-domain” property from the device tree node and configure  the host bridge segment number accordingly. All the PCI device tree nodes should have the “linux,pci-domain” property so that there will be no conflicts. During hardware domain boot Linux will also use the same “linux,pci-domain” property and assign the domain number to the host bridge.
>> AFAICT, "linux,pci-domain" is not a mandatory option and mostly tie to Linux. What would happen with other OS?
>> But I would rather avoid trying to mandate a user to modifying his/her device-tree in order to support PCI passthrough. It would be better to consider Xen to assign the number if it is not present.

So you would suggest here that if this entry is not present in the configuration, we just assign a value inside Xen? How should this information be passed to the guest?
This number is required for the current hypercall to declare devices to Xen, so those could end up being different.

>>> 
>>> When Dom0 tries to access the PCI config space of the device, XEN will find the corresponding host bridge based on segment number and access the corresponding config space assigned to that bridge.
>>> 
>>> Limitation:
>>> * Only PCI ECAM configuration space access is supported.
>>> * Device tree binding is supported as of now, ACPI is not supported.
>> We want to differentiate the high-level design from the actual implementation. While you may not yet implement ACPI, we still need to keep it in mind to avoid incompatibilities in long term.

For sure we do not want to make anything which would not be possible to implement with ACPI. 
I hope the community will help us during review to find those possible problems if we do not see them. 

>>> * Need to port the PCI host bridge access code to XEN to access the configuration space (generic one works but lots of platforms will required  some specific code or quirks).
>>> 
>>> # Discovering PCI devices:
>>> 
>>> PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure the BAR, PCI capabilities, and MSI/MSI-X configuration.
>>> 
>>> PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.
>>> 
>>> #define PHYSDEVOP_pci_device_add        25
>>> struct physdev_pci_device_add {
>>>      uint16_t seg;
>>>      uint8_t bus;
>>>      uint8_t devfn;
>>>      uint32_t flags;
>>>      struct {
>>>          uint8_t bus;
>>>          uint8_t devfn;
>>>      } physfn;
>>>      /*
>>>      * Optional parameters array.
>>>      * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
>>>      */
>>>      uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>>      };
>>> 
>>> As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.
>>> 
>>> XEN will add the PCI devices to the linked list maintained in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the devices will be added to the hardware domain.
>> I understand this what x86 does. However, may I ask why we would want it for Arm?

We wanted to be as close as possible to the x86 implementation and design.
But if you have another idea here we are fully open to discussing it.

>>> 
>>> Limitations:
>>> * When PCI devices are added to XEN, MSI capability is not initialized inside XEN and not supported as of now.
>>> * ACS capability is disabled for ARM as of now because after enabling it devices are not accessible.
>> I am not sure to understand this. Can you expand?

As a temporary workaround we turned that feature off in the code for now but we will fix that later.

>>> * Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
>>> 
>>> # Enable the existing x86 virtual PCI support for ARM:
>>> 
>>> The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.
>>> 
>>> A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
>>> 
>>> Limitation:
>>> * No handler is registered for the MSI configuration.
>>> * Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.
>> IIRC, legacy interrupt may be shared between two PCI devices. How do you plan to handle this on Arm?

We plan to fix this by adding proper support for MSI in the long term. 
For the use case where MSI is not supported or not wanted we might have to find a way to forward the hardware interrupt to several guests to emulate some kind of shared interrupt.

>>> 
>>> # Assign the device to the guest:
>>> 
>>> Assign the PCI device from the hardware domain to the guest is done using the below guest config option. When xl tool create the domain, PCI devices will be assigned to the guest VPCI bus.
>> Above, you suggest that device will be assigned to the hardware domain at boot. I am assuming this also means that all the interrupts/MMIOs will be routed/mapped, is that correct?
>> If so, can you provide a rough sketch how assign/deassign will work?

Yes, this is correct. We will improve the design and add a more detailed description of that in the next version.
To make it short, we first remove the resources from the hardware domain and then assign them to the guest the device has been assigned to. There are still some parts of this that we are still investigating.
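To give an idea of what we have in mind, a very rough outline of the assign path (pseudo-C; all helper names below are placeholders for illustration, not existing Xen functions):

    /* Purely illustrative pseudo-code of the assign flow; the helpers
     * named here do not exist as such in Xen. */
    static int assign_pci_device(struct domain *hwdom, struct domain *d,
                                 pci_sbdf_t sbdf)
    {
        /* 1. Remove the BAR mappings and IRQ routing from the hardware
         *    domain so it can no longer reach the device. */
        revoke_device_resources(hwdom, sbdf);

        /* 2. Update the SMMU so the device's DMA is translated through
         *    the guest stage-2 tables only. */
        reassign_device_iommu(hwdom, d, sbdf);

        /* 3. Map the BARs into the guest physmap and route the legacy
         *    interrupt (later: MSI) to the guest vGIC. */
        map_device_resources(d, sbdf);

        /* 4. Register the device on the guest VPCI bus so config space
         *    accesses get emulated. */
        return add_device_to_vpci(d, sbdf);
    }

The deassign path would be the reverse, with the open question being how to make sure the device is quiescent before handing it back.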

>>>     pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
>>> 
>>> Guest will be only able to access the assigned devices and see the bridges. Guest will not be able to access or see the devices that are no assigned to him.
>>> 
>>> Limitation:
>>> * As of now all the bridges in the PCI bus are seen by the guest on the VPCI bus.
>> Why do you want to expose all the bridges to a guest? Does this mean that the BDF should always match between the host and the guest?

That’s not really something we wanted, but it was the easiest way to go.
As said in a previous mail, we could build a VPCI bus with a completely different topology, but I am not sure what advantages this would have.
Do you see a reason to do this?

>>> 
>>> # Emulated PCI device tree node in libxl:
>>> 
>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>> 
>>> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
>>> 
>>> Limitation:
>>> * Only one PCI device tree node is supported as of now.
>>> 
>>> BAR value and IOMEM mapping:
>>> 
>>> Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI    device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
>>> 
>>> As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
>>> For example:
>>>     Guest reserved IOMEM region:  0x04020000
>>>          Real physical IOMEM region:0x50000000
>>>          IOMEM size:128MB
>>>          iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]
>>> 
>>> There is no need to map the ECAM space as XEN already have access to the ECAM space and XEN will trap ECAM accesses from the guest and will perform read/write on the VPCI bus.
>>> 
>>> IOMEM access will not be trapped and the guest will directly access the IOMEM region of the assigned device via stage-2 translation.
>>> 
>>> In the same, we mapped the assigned devices IRQ to the guest using below config options.
>>>     irqs= [ NUMBER, NUMBER, ...]
>>> 
>>> Limitation:
>>> * Need to avoid the “iomem” and “irq” guest config options and map the IOMEM region and IRQ at the same time when device is assigned to the guest using the “pci” guest config options when xl creates the domain.
>>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped address.
>>> * X86 mapping code should be ported on Arm so that the stage-2 translation is adapted when the guest is doing a modification of the BAR registers values (to map the address requested by the guest for a specific IOMEM to the address actually contained in the real BAR register of the corresponding device).
>>> 
>>> # SMMU configuration for guest:
>>> 
>>> When assigning PCI devices to a guest, the SMMU configuration should be updated to remove access to the hardware domain memory and add
>>> configuration to have access to the guest memory with the proper address translation so that the device can do DMA operations from and to the guest memory only.
>> There are a few more questions to answer here:
>>    - When a guest is destroyed, who will be the owner of the PCI devices? Depending on the answer, how do you make sure the device is quiescent?

I would say the hardware domain if there is one, otherwise nobody.
On the quiescent part, this is definitely something for which I have no answer for now, and any suggestion is more than welcome.

>>    - Is there any memory access that can bypassed the IOMMU (e.g doorbell)?

This is still something to be investigated as part of the MSI implementation.
If you have any ideas here, feel free to tell us.

We are submitting all this, as requested during the Xen Summit, to get some first feedback, but this is a huge work package and there are still lots of areas that we have to dig into :-)

Cheers
Bertrand

>> Cheers,
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg02520.html
> 
> -- 
> Julien Grall
> 



* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 14:41                 ` Roger Pau Monné
@ 2020-07-17 14:49                   ` Bertrand Marquis
  2020-07-17 15:05                     ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 14:49 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Rahul Singh, Stefano Stabellini, Jan Beulich, xen-devel, nd,
	Julien Grall



> On 17 Jul 2020, at 16:41, Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> On Fri, Jul 17, 2020 at 02:34:55PM +0000, Bertrand Marquis wrote:
>> 
>> 
>>> On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
>>> 
>>> On 17.07.2020 15:59, Bertrand Marquis wrote:
>>>> 
>>>> 
>>>>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
>>>>> 
>>>>> On 17.07.2020 15:14, Bertrand Marquis wrote:
>>>>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>> On 16.07.2020 19:10, Rahul Singh wrote:
>>>>>>>> # Emulated PCI device tree node in libxl:
>>>>>>>> 
>>>>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>>>> 
>>>>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
>>>>>>> there to be no need for it when there are PCI devices assigned to the
>>>>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
>>>>>>> vpci="ecam" as unambiguous?
>>>>>> 
>>>>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl. 
>>>>> 
>>>>> I'm afraid I don't understand: When there are no PCI device that get
>>>>> handed to a guest when it gets created, but it is supposed to be able
>>>>> to have some assigned while already running, then we agree the option
>>>>> is needed (afaict). When PCI devices get handed to the guest while it
>>>>> gets constructed, where's the problem to infer this option from the
>>>>> presence of PCI devices in the guest configuration?
>>>> 
>>>> If the user wants to use xl pci-attach to attach in runtime a device to a guest, this guest must have a VPCI bus (even with no devices).
>>>> If we do not have the vpci parameter in the configuration this use case will not work anymore.
>>> 
>>> That's what everyone looks to agree with. Yet why is the parameter needed
>>> when there _are_ PCI devices anyway? That's the "optional" that Stefano
>>> was suggesting, aiui.
>> 
>> I agree in this case the parameter could be optional and only required if not PCI device is assigned directly in the guest configuration.
> 
> Where will the ECAM region(s) appear on the guest physmap?
> 
> Are you going to re-use the same locations as on the physical
> hardware, or will they appear somewhere else?

We will add some new definitions for the ECAM regions to the guest physmap declared in Xen (include/asm-arm/config.h).
So they will appear at a different address than on the hardware.
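Something along these lines, purely as an illustration (the names and addresses below are placeholders, not the final layout):

    /* Illustrative placeholders for the guest VPCI regions; the actual
     * names, base addresses and sizes are still to be defined. */
    #define GUEST_VPCI_ECAM_BASE   xen_mk_ullong(0x10000000)
    #define GUEST_VPCI_ECAM_SIZE   xen_mk_ullong(0x10000000)

    #define GUEST_VPCI_MEM_BASE    xen_mk_ullong(0x23000000)
    #define GUEST_VPCI_MEM_SIZE    xen_mk_ullong(0x10000000)

The ECAM range would be backed by the trap handler, and the IOMEM range by stage-2 mappings of the assigned devices' BARs.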

Bertrand

> 
> Roger.



* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 14:49                   ` Bertrand Marquis
@ 2020-07-17 15:05                     ` Roger Pau Monné
  2020-07-17 15:23                       ` Bertrand Marquis
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2020-07-17 15:05 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Stefano Stabellini, Jan Beulich, xen-devel, nd,
	Julien Grall

On Fri, Jul 17, 2020 at 02:49:20PM +0000, Bertrand Marquis wrote:
> 
> 
> > On 17 Jul 2020, at 16:41, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > 
> > On Fri, Jul 17, 2020 at 02:34:55PM +0000, Bertrand Marquis wrote:
> >> 
> >> 
> >>> On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
> >>> 
> >>> On 17.07.2020 15:59, Bertrand Marquis wrote:
> >>>> 
> >>>> 
> >>>>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
> >>>>> 
> >>>>> On 17.07.2020 15:14, Bertrand Marquis wrote:
> >>>>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>> On 16.07.2020 19:10, Rahul Singh wrote:
> >>>>>>>> # Emulated PCI device tree node in libxl:
> >>>>>>>> 
> >>>>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
> >>>>>>> 
> >>>>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
> >>>>>>> there to be no need for it when there are PCI devices assigned to the
> >>>>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
> >>>>>>> vpci="ecam" as unambiguous?
> >>>>>> 
> >>>>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl. 
> >>>>> 
> >>>>> I'm afraid I don't understand: When there are no PCI device that get
> >>>>> handed to a guest when it gets created, but it is supposed to be able
> >>>>> to have some assigned while already running, then we agree the option
> >>>>> is needed (afaict). When PCI devices get handed to the guest while it
> >>>>> gets constructed, where's the problem to infer this option from the
> >>>>> presence of PCI devices in the guest configuration?
> >>>> 
> >>>> If the user wants to use xl pci-attach to attach in runtime a device to a guest, this guest must have a VPCI bus (even with no devices).
> >>>> If we do not have the vpci parameter in the configuration this use case will not work anymore.
> >>> 
> >>> That's what everyone looks to agree with. Yet why is the parameter needed
> >>> when there _are_ PCI devices anyway? That's the "optional" that Stefano
> >>> was suggesting, aiui.
> >> 
> >> I agree in this case the parameter could be optional and only required if not PCI device is assigned directly in the guest configuration.
> > 
> > Where will the ECAM region(s) appear on the guest physmap?
> > 
> > Are you going to re-use the same locations as on the physical
> > hardware, or will they appear somewhere else?
> 
> We will add some new definitions for the ECAM regions in the guest physmap declared in xen (include/asm-arm/config.h)

I think I'm confused, but that file doesn't contain anything related
to the guest physmap; that's the Xen virtual memory layout on Arm,
AFAICT?

Does this somehow relate to the physical memory map exposed to guests
on Arm?

Roger.



* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 14:31         ` Roger Pau Monné
@ 2020-07-17 15:21           ` Bertrand Marquis
  2020-07-17 15:55             ` Roger Pau Monné
  2020-07-20 23:24             ` Stefano Stabellini
  0 siblings, 2 replies; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 15:21 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, nd, Rahul Singh, Stefano Stabellini, Julien Grall



> On 17 Jul 2020, at 16:31, Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> On Fri, Jul 17, 2020 at 01:22:19PM +0000, Bertrand Marquis wrote:
>> 
>> 
>>> On 17 Jul 2020, at 13:16, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>> 
>>> I've wrapped the email to 80 columns in order to make it easier to
>>> reply.
>>> 
>>> Thanks for doing this, I think the design is good, I have some
>>> questions below so that I understand the full picture.
>>> 
>>> On Thu, Jul 16, 2020 at 05:10:05PM +0000, Rahul Singh wrote:
>>>> Hello All,
>>>> 
>>>> Following up on discussion on PCI Passthrough support on ARM that we
>>>> had at the XEN summit, we are submitting a Review For Comment and a
>>>> design proposal for PCI passthrough support on ARM. Feel free to
>>>> give your feedback.
>>>> 
>>>> The followings describe the high-level design proposal of the PCI
>>>> passthrough support and how the different modules within the system
>>>> interacts with each other to assign a particular PCI device to the
>>>> guest.
>>>> 
>>>> # Title:
>>>> 
>>>> PCI devices passthrough on Arm design proposal
>>>> 
>>>> # Problem statement:
>>>> 
>>>> On ARM there in no support to assign a PCI device to a guest. PCI
>>>> device passthrough capability allows guests to have full access to
>>>> some PCI devices. PCI device passthrough allows PCI devices to
>>>> appear and behave as if they were physically attached to the guest
>>>> operating system and provide full isolation of the PCI devices.
>>>> 
>>>> Goal of this work is to also support Dom0Less configuration so the
>>>> PCI backend/frontend drivers used on x86 shall not be used on Arm.
>>>> It will use the existing VPCI concept from X86 and implement the
>>>> virtual PCI bus through IO emulation such that only assigned devices
>>>> are visible to the guest and guest can use the standard PCI
>>>> driver.
>>>> 
>>>> Only Dom0 and Xen will have access to the real PCI bus, guest will
>>>> have a direct access to the assigned device itself. IOMEM memory
>>>> will be mapped to the guest and interrupt will be redirected to the
>>>> guest. SMMU has to be configured correctly to have DMA
>>>> transaction.
>>>> 
>>>> ## Current state: Draft version
>>>> 
>>>> # Proposer(s): Rahul Singh, Bertrand Marquis
>>>> 
>>>> # Proposal:
>>>> 
>>>> This section will describe the different subsystem to support the
>>>> PCI device passthrough and how these subsystems interact with each
>>>> other to assign a device to the guest.
>>>> 
>>>> # PCI Terminology:
>>>> 
>>>> Host Bridge: Host bridge allows the PCI devices to talk to the rest
>>>> of the computer.  ECAM: ECAM (Enhanced Configuration Access
>>>> Mechanism) is a mechanism developed to allow PCIe to access
>>>> configuration space. The space available per function is 4KB.
>>>> 
>>>> # Discovering PCI Host Bridge in XEN:
>>>> 
>>>> In order to support the PCI passthrough XEN should be aware of all
>>>> the PCI host bridges available on the system and should be able to
>>>> access the PCI configuration space. ECAM configuration access is
>>>> supported as of now. XEN during boot will read the PCI device tree
>>>> node “reg” property and will map the ECAM space to the XEN memory
>>>> using the “ioremap_nocache ()” function.
>>> 
>>> What about ACPI? I think you should also mention the MMCFG table,
>>> which should contain the information about the ECAM region(s) (or at
>>> least that's how it works on x86). Just realized that you don't
>>> support ACPI ATM, so you can ignore this comment.
>> 
>> Yes for now we did not consider ACPI support.
> 
> I have 0 knowledge of ACPI on Arm, but I would assume it's also using
> the MCFG table in order to report ECAM regions to the OSPM. This is a
> static table that's very simple to parse, and it contains the ECAM
> IOMEM area and the segment assigned to that ECAM region.
> 
> This is better than DT because ACPI already assigns segment numbers to
> each ECAM region.
> 
> Even if not currently supported in the code implemented so far
> describing the plan for it's implementation here seems fine IMO, as
> it's going to be slightly different from what you need to do when
> using DT.

That should be fairly straightforward, I agree, and it makes sense to spend
some time making sure the design would allow adding ACPI support.
I will note that down so it can be added in the next design version.
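For reference, the ACPI side should mostly boil down to walking the static MCFG table. A sketch of what that could look like, using the ACPICA table definitions already in Xen (nothing of this is implemented yet, so treat it as a rough idea only):

    /* Sketch only: find each ECAM window and its segment in the MCFG
     * table; registering the window with the host bridge code is left
     * as a comment since that interface does not exist yet. */
    #include <xen/acpi.h>

    static int __init parse_mcfg(struct acpi_table_header *header)
    {
        struct acpi_table_mcfg *mcfg = (struct acpi_table_mcfg *)header;
        struct acpi_mcfg_allocation *alloc = (void *)(mcfg + 1);
        unsigned int i, nr = (header->length - sizeof(*mcfg)) / sizeof(*alloc);

        for ( i = 0; i < nr; i++ )
        {
            /* alloc[i].address is the ECAM base, alloc[i].pci_segment the
             * segment, start_bus_number/end_bus_number the bus range.
             * Here Xen would register the ECAM window for that segment. */
        }

        return 0;
    }

    /* Called from boot code: acpi_table_parse(ACPI_SIG_MCFG, parse_mcfg); */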

> 
>>> 
>>>> 
>>>> If there are more than one segment on the system, XEN will read the
>>>> “linux, pci-domain” property from the device tree node and configure
>>>> the host bridge segment number accordingly. All the PCI device tree
>>>> nodes should have the “linux,pci-domain” property so that there will
>>>> be no conflicts. During hardware domain boot Linux will also use the
>>>> same “linux,pci-domain” property and assign the domain number to the
>>>> host bridge.
>>> 
>>> So it's my understanding that the PCI domain (or segment) is just an
>>> abstract concept to differentiate all the Root Complex present on
>>> the system, but the host bridge itself it's not aware of the segment
>>> assigned to it in any way.
>>> 
>>> I'm not sure Xen and the hardware domain having matching segments is a
>>> requirement, if you use vPCI you can match the segment (from Xen's
>>> PoV) by just checking from which ECAM region the access has been
>>> performed.
>>> 
>>> The only reason to require matching segment values between Xen and the
>>> hardware domain is to allow using hypercalls against the PCI devices,
>>> ie: to be able to use hypercalls to assign a device to a domain from
>>> the hardware domain.
>>> 
>>> I have 0 understanding of DT or it's spec, but why does this have a
>>> 'linux,' prefix? The segment number is part of the PCI spec, and not
>>> something specific to Linux IMO.
>> 
>> This is exact that this is only needed for the hypercall when Dom0 is
>> doing the full enumeration and communicating the devices to Xen. 
>> On all other cases this can be deduced from the address of the access.
> 
> You also need the SBDF nomenclature in order to assign deices to
> guests from the control domain, so at least there needs to be some
> consensus from the hardware domain and Xen on the segment numbering in
> that regard.
> 
> Same applies to dom0less mode, there needs to be some consensus about
> the segment numbers used, so Xen can identify the devices assigned to
> each guests without confusion.

Agree.

> 
>> Regarding the DT entry, this is not coming from us and this is already
>> defined this way in existing DTBs, we just reuse the existing entry. 
> 
> Is it possible to standardize the property and drop the linux prefix?

Honestly, I do not know. This was there in the DT examples we checked, so
we planned to use that. But it might be possible to standardize this.
@stefano: you are the device tree expert :-) any idea on this?

> 
>>> 
>>>> 
>>>> When Dom0 tries to access the PCI config space of the device, XEN
>>>> will find the corresponding host bridge based on segment number and
>>>> access the corresponding config space assigned to that bridge.
>>>> 
>>>> Limitation:
>>>> * Only PCI ECAM configuration space access is supported.
>>>> * Device tree binding is supported as of now, ACPI is not supported.
>>>> * Need to port the PCI host bridge access code to XEN to access the
>>>> configuration space (generic one works but lots of platforms will
>>>> required  some specific code or quirks).
>>>> 
>>>> # Discovering PCI devices:
>>>> 
>>>> PCI-PCIe enumeration is a process of detecting devices connected to
>>>> its host. It is the responsibility of the hardware domain or boot
>>>> firmware to do the PCI enumeration and configure the BAR, PCI
>>>> capabilities, and MSI/MSI-X configuration.
>>>> 
>>>> PCI-PCIe enumeration in XEN is not feasible for the configuration
>>>> part as it would require a lot of code inside Xen which would
>>>> require a lot of maintenance. Added to this many platforms require
>>>> some quirks in that part of the PCI code which would greatly improve
>>>> Xen complexity. Once hardware domain enumerates the device then it
>>>> will communicate to XEN via the below hypercall.
>>>> 
>>>> #define PHYSDEVOP_pci_device_add        25 struct
>>>> physdev_pci_device_add {
>>>>   uint16_t seg;
>>>>   uint8_t bus;
>>>>   uint8_t devfn;
>>>>   uint32_t flags;
>>>>   struct {
>>>>       uint8_t bus;
>>>>       uint8_t devfn;
>>>>   } physfn;
>>>>   /*
>>>>    * Optional parameters array.
>>>>    * First element ([0]) is PXM domain associated with the device (if
>>>>    * XEN_PCI_DEV_PXM is set)
>>>>    */
>>>>   uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>>> };
>>>> 
>>>> As the hypercall argument has the PCI segment number, XEN will
>>>> access the PCI config space based on this segment number and find
>>>> the host-bridge corresponding to this segment number. At this stage
>>>> host bridge is fully initialized so there will be no issue to access
>>>> the config space.
>>>> 
>>>> XEN will add the PCI devices in the linked list maintain in XEN
>>>> using the function pci_add_device(). XEN will be aware of all the
>>>> PCI devices on the system and all the device will be added to the
>>>> hardware domain.
>>>> 
>>>> Limitations:
>>>> * When PCI devices are added to XEN, MSI capability is
>>>> not initialized inside XEN and not supported as of now.
>>> 
>>> I assume you will mask such capability and will prevent the guest (or
>>> hardware domain) from interacting with it?
>> 
>> No we will actually implement that part but later. This is not supported in
>> the RFC that we will submit. 
> 
> OK, might be nice to note this somewhere, even if it's not implemented
> right now. It might also be relevant to start thinking about which
> capabilities you have to expose to guests, and how you will make those
> safe. This could even be in a separate document, but ideally a design
> document (or set of documents) should try to cover all the
> implementation that will be done in order to support a feature.

I added that to the list of points we need to clarify in the next design version.

> 
>>> 
>>>> * ACS capability is disable for ARM as of now as after enabling it
>>>> devices are not accessible.
>>>> * Dom0Less implementation will require to have the capacity inside Xen
>>>> to discover the PCI devices (without depending on Dom0 to declare them
>>>> to Xen).
>>> 
>>> I assume the firmware will properly initialize the host bridge and
>>> configure the resources for each device, so that Xen just has to walk
>>> the PCI space and find the devices.
>>> 
>>> TBH that would be my preferred method, because then you can get rid of
>>> the hypercall.
>>> 
>>> Is there anyway for Xen to know whether the host bridge is properly
>>> setup and thus the PCI bus can be scanned?
>>> 
>>> That way Arm could do something similar to x86, where Xen will scan
>>> the bus and discover devices, but you could still provide the
>>> hypercall in case the bus cannot be scanned by Xen (because it hasn't
>>> been setup).
>> 
>> That is definitely the idea to rely by default on a firmware doing this properly.
>> I am not sure wether a proper enumeration could be detected properly in all
>> cases so it would make sens to rely on Dom0 enumeration when a Xen
>> command line argument is passed as explained in one of Rahul’s mails.
> 
> I assume Linux somehow knows when it needs to initialize the PCI root
> complex before attempting to access the bus. Would it be possible to
> add this logic to Xen so it can figure out on it's own whether it's
> safe to scan the PCI bus or whether it needs to wait for the hardware
> domain to report the devices present?

That might be possible to do, but it will in any case require a command line argument
to force Xen to let the hardware domain do the initialization in case Xen's
detection does not work properly.
In the case where there is a Dom0, I would rather expect that we let it do the
initialization all the time, unless the user states through a command line argument
that the current configuration is correct and shall be used.
In the Dom0Less case it must be correct, as nobody else will do it anyway.

I fear that some kind of logic to detect whether the initialization is correct will have
more exceptions than working cases, and we might end up with lots of platform-specific
quirks. But I might be wrong; does anybody have an idea here?
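For what it's worth, the command line knob itself would be trivial to add on the Xen side; a sketch, with a purely made-up option name:

    /* Sketch only: a hypothetical boolean command line option telling Xen
     * that the firmware already initialized the host bridges, so Xen may
     * scan the PCI bus itself instead of waiting for Dom0. */
    #include <xen/types.h>
    #include <xen/param.h>

    static bool pci_scan_firmware_init;
    boolean_param("pci-scan", pci_scan_firmware_init);

The real question is what the default should be and how reliable any automatic detection can be made, as discussed above.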

> 
>>> 
>>>> 
>>>> # Enable the existing x86 virtual PCI support for ARM:
>>>> 
>>>> The existing VPCI support available for X86 is adapted for Arm. When
>>>> the device is added to XEN via the hyper call
>>>> “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access
>>>> is added to the PCI device to emulate the PCI devices.
>>>> 
>>>> A MMIO trap handler for the PCI ECAM space is registered in XEN so
>>>> that when guest is trying to access the PCI config space, XEN will
>>>> trap the access and emulate read/write using the VPCI and not the
>>>> real PCI hardware.
>>>> 
>>>> Limitation:
>>>> * No handler is register for the MSI configuration.
>>> 
>>> But you need to mask MSI/MSI-X capabilities in the config space in
>>> order to prevent access from domains? (and by mask I mean remove from
>>> the list of capabilities and prevent reads/writes to that
>>> configuration space).
>>> 
>>> Note this is already implemented for x86, and I've tried to add arch_
>>> hooks for arch specific stuff so that it could be reused by Arm. But
>>> maybe this would require a different design document?
>> 
>> as said, we will handle MSI support in a separate document/step.
>> 
>>> 
>>>> * Only legacy interrupt is supported and tested as of now, MSI is not
>>>> implemented and tested.
>>>> 
>>>> # Assign the device to the guest:
>>>> 
>>>> Assign the PCI device from the hardware domain to the guest is done
>>>> using the below guest config option. When xl tool create the domain,
>>>> PCI devices will be assigned to the guest VPCI bus.
>>>> 
>>>> pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
>>>> 
>>>> Guest will be only able to access the assigned devices and see the
>>>> bridges. Guest will not be able to access or see the devices that
>>>> are no assigned to him.
>>>> 
>>>> Limitation:
>>>> * As of now all the bridges in the PCI bus are seen by
>>>> the guest on the VPCI bus.
>>> 
>>> I don't think you need all of them, just the ones that are higher up
>>> on the hierarchy of the device you are trying to passthrough?
>>> 
>>> Which kind of access do guest have to PCI bridges config space?
>> 
>> For now the bridges are read only, no specific access is required by guests. 
>> 
>>> 
>>> This should be limited to read-only accesses in order to be safe.
>>> 
>>> Emulating a PCI bridge in Xen using vPCI shouldn't be that
>>> complicated, so you could likely replace the real bridges with
>>> emulated ones. Or even provide a fake topology to the guest using an
>>> emulated bridge.
>> 
>> Just showing all bridges and keeping the hardware topology is the simplest
>> solution for now. But maybe showing a different topology and only fake
>> bridges could make sense and be implemented in the future.
> 
> Ack. I've also heard rumors of Xen on Arm people being very interested
> in VirtIO support, in which case you might expose both fully emulated
> VirtIO devices and PCI passthrough devices on the PCI bus, so it would
> be good to spend some time thinking how those will fit together.
> 
> Will you allocate a separate segment unused by hardware to expose the
> fully emulated PCI devices (VirtIO)?
> 
> Will OSes support having several segments?
> 
> If not you likely need to have emulated bridges so that you can adjust
> the bridge window accordingly to fit the passthrough and the emulated
> MMIO space, and likely be able to expose passthrough devices using a
> different topology than the host one.

Honestly, this is not something we considered. I was rather thinking that
this use case would be handled by creating another VPCI bus dedicated
to those kinds of devices instead of mixing physical and virtual devices.


> 
>>> 
>>>> 
>>>> # Emulated PCI device tree node in libxl:
>>>> 
>>>> Libxl is creating a virtual PCI device tree node in the device tree
>>>> to enable the guest OS to discover the virtual PCI during guest
>>>> boot. We introduced the new config option [vpci="pci_ecam"] for
>>>> guests. When this config option is enabled in a guest configuration,
>>>> a PCI device tree node will be created in the guest device tree.
>>>> 
>>>> A new area has been reserved in the arm guest physical map at which
>>>> the VPCI bus is declared in the device tree (reg and ranges
>>>> parameters of the node). A trap handler for the PCI ECAM access from
>>>> guest has been registered at the defined address and redirects
>>>> requests to the VPCI driver in Xen.
>>> 
>>> Can't you deduce the requirement of such DT node based on the presence
>>> of a 'pci=' option in the same config file?
>>> 
>>> Also I wouldn't discard that in the future you might want to use
>>> different emulators for different devices, so it might be helpful to
>>> introduce something like:
>>> 
>>> pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt', '0a:00.0,backend=qemu', ... ]
>>> 
>>> For the time being Arm will require backend=vpci for all the passed
>>> through devices, but I wouldn't rule out this changing in the future.
>> 
>> We need it for the case where no device is declared in the config file and the user
>> wants to add devices using xl later. In this case we must have the DT node for it
>> to work. 
> 
> There's a passthrough xl.cfg option for that already, so that if you
> don't want to add any PCI passthrough devices at creation time but
> rather hotplug them you can set:
> 
> passthrough=enabled
> 
> And it should setup the domain to be prepared to support hot
> passthrough, including the IOMMU [0].

Isn’t this option covering more than PCI passthrough?

Lots of Arm platforms do not have a PCI bus at all, so for those
creating a VPCI bus would be pointless. But you might still need to
activate this option to pass through devices which are not on the PCI bus.

> 
>> Regarding possibles backend this could be added in the future if required. 
>> 
>>> 
>>>> Limitation:
>>>> * Only one PCI device tree node is supported as of now.
>>>> 
>>>> BAR value and IOMEM mapping:
>>>> 
>>>> Linux guest will do the PCI enumeration based on the area reserved
>>>> for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI
>>>> device is assigned to the guest, XEN will map the guest PCI IOMEM
>>>> region to the real physical IOMEM region only for the assigned
>>>> devices.
>>> 
>>> PCI IOMEM == BARs? Or are you referring to the ECAM access window?
>> 
>> Here by PCI IOMEM we mean the IOMEM spaces referred to by the BARs
>> of the PCI device
> 
> OK, might be worth to use PCI BARs explicitly rather than PCI IOMEM as
> I think that's likely to be confused with the config space IOMEM.

Good point, we will rephrase that in the next design version.

> 
>>> 
>>>> As of now we have not modified the existing VPCI code to map the
>>>> guest PCI IOMEM region to the real physical IOMEM region. We used
>>>> the existing guest “iomem” config option to map the region.  For
>>>> example: Guest reserved IOMEM region:  0x04020000 Real physical
>>>> IOMEM region:0x50000000 IOMEM size:128MB iomem config will be:
>>>> iomem = ["0x50000,0x8000@0x4020"]
>>>> 
>>>> There is no need to map the ECAM space as XEN already have access to
>>>> the ECAM space and XEN will trap ECAM accesses from the guest and
>>>> will perform read/write on the VPCI bus.
>>>> 
>>>> IOMEM access will not be trapped and the guest will directly access
>>>> the IOMEM region of the assigned device via stage-2 translation.
>>>> 
>>>> In the same, we mapped the assigned devices IRQ to the guest using
>>>> below config options.  irqs= [ NUMBER, NUMBER, ...]
>>> 
>>> Are you providing this for the hardware domain also? Or are irqs
>>> fetched from the DT in that case?
>> 
>> This will only be used temporarily until we have proper support to do this
>> automatically when a device is assigned. Right now our current implementation
>> status requires the user to explicitely redirect the interrupts required by the PCI
>> devices assigned but in the final version this entry will not be needed.
> 
> Right, I'm not sure whether this should be marked somehow as **
> WORKAROUND ** or ** TEMPORARY ** in the document, since it's not
> supposed to be part of the final implementation.

We added those to make clear that this is how the implementation in the RFC
we will push works. They will disappear over time and across versions.
We will add a big TEMPORARY in the next version.

> 
>> Dom0 relies on the entries declared in the DT.
>> 
>>> 
>>>> Limitation:
>>>> * Need to avoid the “iomem” and “irq” guest config
>>>> options and map the IOMEM region and IRQ at the same time when
>>>> device is assigned to the guest using the “pci” guest config options
>>>> when xl creates the domain.
>>>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped
>>>> address.
>>> 
>>> It was my understanding that you would identity map the BAR into the
>>> domU stage-2 translation, and that changes by the guest won't be
>>> allowed.
>> 
>> In fact this is not possible to do and we have to remap at a different address
>> because the guest physical mapping is fixed by Xen on Arm so we must follow
>> the same design otherwise this would only work if the BARs are pointing to an
>> address unused and on Juno this is for example conflicting with the guest
>> RAM address.
> 
> This was not clear from my reading of the document, could you please
> clarify on the next version that the guest physical memory map is
> always the same, and that BARs from PCI devices cannot be identity
> mapped to the stage-2 translation and instead are relocated somewhere
> else?

We will.

> 
> I'm then confused about what you do with bridge windows, do you also
> trap and adjust them to report a different IOMEM region?

Yes, this is what we will have to do so that the regions reflect the VPCI mappings
and not the hardware ones.
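For the stage-2 side of that, the mapping itself can reuse the existing Arm p2m helper; a sketch using the illustrative addresses from the iomem example earlier in the proposal (the real code additionally has to keep the emulated BARs and bridge windows consistent with whatever guest address is chosen):

    /* Sketch: map a 128MB device BAR living at 0x50000000 on the host
     * into the guest physmap at 0x4020000 (values taken from the iomem
     * example above, purely for illustration). */
    #include <xen/mm.h>
    #include <asm/mm.h>
    #include <asm/p2m.h>

    static int map_assigned_bar(struct domain *d)
    {
        paddr_t gaddr = 0x4020000;    /* guest address of the BAR */
        paddr_t maddr = 0x50000000;   /* host address of the BAR */
        unsigned long nr = (128UL << 20) >> PAGE_SHIFT;

        return map_regions_p2mt(d, gaddr_to_gfn(gaddr), nr,
                                maddr_to_mfn(maddr), p2m_mmio_direct_dev);
    }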

> 
> Above you mentioned that read-only access was given to bridge
> registers, but I guess some are also emulated in order to report
> matching IOMEM regions?

Yes, that’s correct. We will clarify this in the next version.

Bertrand

> 
> Roger.
> 
> [0] https://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html#Other-Options



* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 15:05                     ` Roger Pau Monné
@ 2020-07-17 15:23                       ` Bertrand Marquis
  2020-07-17 15:30                         ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 15:23 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Rahul Singh, Stefano Stabellini, Jan Beulich, xen-devel, nd,
	Julien Grall



> On 17 Jul 2020, at 17:05, Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> On Fri, Jul 17, 2020 at 02:49:20PM +0000, Bertrand Marquis wrote:
>> 
>> 
>>> On 17 Jul 2020, at 16:41, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>> 
>>> On Fri, Jul 17, 2020 at 02:34:55PM +0000, Bertrand Marquis wrote:
>>>> 
>>>> 
>>>>> On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
>>>>> 
>>>>> On 17.07.2020 15:59, Bertrand Marquis wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>> 
>>>>>>> On 17.07.2020 15:14, Bertrand Marquis wrote:
>>>>>>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>> On 16.07.2020 19:10, Rahul Singh wrote:
>>>>>>>>>> # Emulated PCI device tree node in libxl:
>>>>>>>>>> 
>>>>>>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>>>>>> 
>>>>>>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
>>>>>>>>> there to be no need for it when there are PCI devices assigned to the
>>>>>>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
>>>>>>>>> vpci="ecam" as unambiguous?
>>>>>>>> 
>>>>>>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl. 
>>>>>>> 
>>>>>>> I'm afraid I don't understand: When there are no PCI device that get
>>>>>>> handed to a guest when it gets created, but it is supposed to be able
>>>>>>> to have some assigned while already running, then we agree the option
>>>>>>> is needed (afaict). When PCI devices get handed to the guest while it
>>>>>>> gets constructed, where's the problem to infer this option from the
>>>>>>> presence of PCI devices in the guest configuration?
>>>>>> 
>>>>>> If the user wants to use xl pci-attach to attach in runtime a device to a guest, this guest must have a VPCI bus (even with no devices).
>>>>>> If we do not have the vpci parameter in the configuration this use case will not work anymore.
>>>>> 
>>>>> That's what everyone looks to agree with. Yet why is the parameter needed
>>>>> when there _are_ PCI devices anyway? That's the "optional" that Stefano
>>>>> was suggesting, aiui.
>>>> 
>>>> I agree in this case the parameter could be optional and only required if not PCI device is assigned directly in the guest configuration.
>>> 
>>> Where will the ECAM region(s) appear on the guest physmap?
>>> 
>>> Are you going to re-use the same locations as on the physical
>>> hardware, or will they appear somewhere else?
>> 
>> We will add some new definitions for the ECAM regions in the guest physmap declared in xen (include/asm-arm/config.h)
> 
> I think I'm confused, but that file doesn't contain anything related
> to the guest physmap, that's the Xen virtual memory layout on Arm
> AFAICT?
> 
> Does this somehow relate to the physical memory map exposed to guests
> on Arm?

Yes, it does.
We will add new definitions there related to VPCI to reserve some areas for the VPCI ECAM and IOMEM regions.

Bertrand

> 
> Roger.



* Re: PCI devices passthrough on Arm design proposal
  2020-07-17 14:47       ` Bertrand Marquis
@ 2020-07-17 15:26         ` Julien Grall
  2020-07-17 15:47           ` Bertrand Marquis
  0 siblings, 1 reply; 62+ messages in thread
From: Julien Grall @ 2020-07-17 15:26 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Roger Pau Monné,
	Stefano Stabellini, xen-devel, nd, Julien Grall



On 17/07/2020 15:47, Bertrand Marquis wrote:
>>>> # Title:
>>>>
>>>> PCI devices passthrough on Arm design proposal
>>>>
>>>> # Problem statement:
>>>>
>>>> On ARM there in no support to assign a PCI device to a guest. PCI device passthrough capability allows guests to have full access to some PCI devices. PCI device passthrough allows PCI devices to appear and behave as if they were physically attached to the guest operating system and provide full isolation of the PCI devices.
>>>>
>>>> Goal of this work is to also support Dom0Less configuration so the PCI backend/frontend drivers used on x86 shall not be used on Arm. It will use the existing VPCI concept from X86 and implement the virtual PCI bus through IO emulation​ such that only assigned devices are visible​ to the guest and guest can use the standard PCI driver.
>>>>
>>>> Only Dom0 and Xen will have access to the real PCI bus,​ guest will have a direct access to the assigned device itself​. IOMEM memory will be mapped to the guest ​and interrupt will be redirected to the guest. SMMU has to be configured correctly to have DMA transaction.
>>>>
>>>> ## Current state: Draft version
>>>>
>>>> # Proposer(s): Rahul Singh, Bertrand Marquis
>>>>
>>>> # Proposal:
>>>>
>>>> This section will describe the different subsystem to support the PCI device passthrough and how these subsystems interact with each other to assign a device to the guest.
>>>>
>>>> # PCI Terminology:
>>>>
>>>> Host Bridge: Host bridge allows the PCI devices to talk to the rest of the computer.
>>>> ECAM: ECAM (Enhanced Configuration Access Mechanism) is a mechanism developed to allow PCIe to access configuration space. The space available per function is 4KB.
>>>>
>>>> # Discovering PCI Host Bridge in XEN:
>>>>
>>>> In order to support the PCI passthrough XEN should be aware of all the PCI host bridges available on the system and should be able to access the PCI configuration space. ECAM configuration access is supported as of now. XEN during boot will read the PCI device tree node “reg” property and will map the ECAM space to the XEN memory using the “ioremap_nocache ()” function.
>>>>
>>>> If there are more than one segment on the system, XEN will read the “linux, pci-domain” property from the device tree node and configure  the host bridge segment number accordingly. All the PCI device tree nodes should have the “linux,pci-domain” property so that there will be no conflicts. During hardware domain boot Linux will also use the same “linux,pci-domain” property and assign the domain number to the host bridge.
>>> AFAICT, "linux,pci-domain" is not a mandatory option and mostly tie to Linux. What would happen with other OS?
>>> But I would rather avoid trying to mandate a user to modifying his/her device-tree in order to support PCI passthrough. It would be better to consider Xen to assign the number if it is not present.
> 
> so you would suggest here that if this entry is not present in the configuration, we just assign a value inside xen ? How should this information be passed to the guest ?
> This number is required for the current hypercall to declare devices to xen so those could end up being different.

I am guessing you mean passing it to the hardware domain? If so, Xen is
already rewriting the device-tree for the hardware domain, so it would
be easy to add more properties.
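As an illustration of the rewriting point (a sketch only, using Xen's bundled libfdt; whether to reuse the "linux,pci-domain" name at all is exactly what is being discussed):

    /* Sketch: while Xen builds the hardware domain's copy of the host
     * bridge node, it could emit a segment number it chose itself. */
    #include <xen/types.h>
    #include <xen/libfdt/libfdt.h>

    static int set_pci_segment(void *fdt, uint32_t segment)
    {
        /* The property name is reused here only for illustration. */
        return fdt_property_cell(fdt, "linux,pci-domain", segment);
    }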

Now the question is whether other OSes are using "linux,pci-domain". I
would suggest having a look at a *BSD to see how they deal with PCI
controllers.

> 
>>>>
>>>> When Dom0 tries to access the PCI config space of the device, XEN will find the corresponding host bridge based on segment number and access the corresponding config space assigned to that bridge.
>>>>
>>>> Limitation:
>>>> * Only PCI ECAM configuration space access is supported.
>>>> * Device tree binding is supported as of now, ACPI is not supported.
>>> We want to differentiate the high-level design from the actual implementation. While you may not yet implement ACPI, we still need to keep it in mind to avoid incompatibilities in long term.
> 
> For sure we do not want to make anything which would not be possible to implement with ACPI.
> I hope the community will help us during review to find those possible problems if we do not see them.

Have a look at the design document I pointed out in my previous answer.
It should already contain a lot of information for ACPI :).

> 
>>>> * Need to port the PCI host bridge access code to XEN to access the configuration space (generic one works but lots of platforms will required  some specific code or quirks).
>>>>
>>>> # Discovering PCI devices:
>>>>
>>>> PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure the BAR, PCI capabilities, and MSI/MSI-X configuration.
>>>>
>>>> PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.
>>>>
>>>> #define PHYSDEVOP_pci_device_add        25
>>>> struct physdev_pci_device_add {
>>>>       uint16_t seg;
>>>>       uint8_t bus;
>>>>       uint8_t devfn;
>>>>       uint32_t flags;
>>>>       struct {
>>>>           uint8_t bus;
>>>>           uint8_t devfn;
>>>>       } physfn;
>>>>       /*
>>>>       * Optional parameters array.
>>>>       * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
>>>>       */
>>>>       uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>>>       };
>>>>
>>>> As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.
>>>>
>>>> XEN will add the PCI devices in the linked list maintain in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the device will be added to the hardware domain.
>>> I understand this what x86 does. However, may I ask why we would want it for Arm?
> 
> We wanted to be as near as possible from x86 implementation and design.
> But if you have an other idea here we are fully open to discuss it.

In the case of platform device passthrough, we leave the device
unassigned when it is not used by a guest. This makes sure the device can't
do any harm if somehow it wasn't reset correctly.

I would prefer to consider the same approach for PCI devices if there is
no plan to use them in dom0. Although, we still need to figure out how PCI
devices will be reset.

>>>> * Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
>>>>
>>>> # Enable the existing x86 virtual PCI support for ARM:
>>>>
>>>> The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.
>>>>
>>>> A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
>>>>
>>>> Limitation:
>>>> * No handler is register for the MSI configuration.
>>>> * Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.
>>> IIRC, legacy interrupt may be shared between two PCI devices. How do you plan to handle this on Arm?
> 
> We plan to fix this by adding proper support for MSI in the long term.
> For the use case where MSI is not supported or not wanted we might have to find a way to forward the hardware interrupt to several guests to emulate some kind of shared interrupt.

Sharing interrupts is a bit of a pain because you can't take advantage of
the direct EOI in HW and have to be careful if one guest doesn't EOI in a
timely manner.

This is something I would rather avoid unless there is a real use case 
for it.

> 
>>>>
>>>> # Assign the device to the guest:
>>>>
>>>> Assign the PCI device from the hardware domain to the guest is done using the below guest config option. When xl tool create the domain, PCI devices will be assigned to the guest VPCI bus.
>>> Above, you suggest that device will be assigned to the hardware domain at boot. I am assuming this also means that all the interrupts/MMIOs will be routed/mapped, is that correct?
>>> If so, can you provide a rough sketch how assign/deassign will work?
> 
> Yes this is correct. We will improve the design and add a more detailed description on that in the next version.
> To make it short we remove the resources from the hardware domain first and assign them to the guest the device has been assigned to. There are still some parts in there where we are still in investigation mode on that part.

Hmmm... Does this mean you modified the code to allow an interrupt to be
removed while the domain is still running?

> 
>>>>      pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
>>>>
>>>> Guest will be only able to access the assigned devices and see the bridges. Guest will not be able to access or see the devices that are no assigned to him.
>>>>
>>>> Limitation:
>>>> * As of now all the bridges in the PCI bus are seen by the guest on the VPCI bus.
>>> Why do you want to expose all the bridges to a guest? Does this mean that the BDF should always match between the host and the guest?
> 
> That’s not really something that we wanted but this was the easiest way to go.
> As said in a previous mail we could build a VPCI bus with a completely different topology but I am not sure of the advantages this would have.
> Do you see some reason to do this ?

Yes :):
   1) If a platform has two host controllers (IIRC Thunder-X has them),
then you would need to expose two host controllers to your guest. I
think this is undesirable if your guest is only using a couple of PCI
devices on each host controller.
   2) In the case of migration (live or not), you may want to use a
different PCI card on the target platform, so your BDF and bridges may
be different.

Therefore I think a virtual topology can be beneficial.

> 
>>>>
>>>> # Emulated PCI device tree node in libxl:
>>>>
>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>
>>>> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
>>>>
>>>> Limitation:
>>>> * Only one PCI device tree node is supported as of now.
>>>>
>>>> BAR value and IOMEM mapping:
>>>>
>>>> Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI    device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
>>>>
>>>> As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
>>>> For example:
>>>>      Guest reserved IOMEM region:  0x04020000
>>>>           Real physical IOMEM region:0x50000000
>>>>           IOMEM size:128MB
>>>>           iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]
>>>>
>>>> There is no need to map the ECAM space as XEN already have access to the ECAM space and XEN will trap ECAM accesses from the guest and will perform read/write on the VPCI bus.
>>>>
>>>> IOMEM access will not be trapped and the guest will directly access the IOMEM region of the assigned device via stage-2 translation.
>>>>
>>>> In the same, we mapped the assigned devices IRQ to the guest using below config options.
>>>>      irqs= [ NUMBER, NUMBER, ...]
>>>>
>>>> Limitation:
>>>> * Need to avoid the “iomem” and “irq” guest config options and map the IOMEM region and IRQ at the same time when device is assigned to the guest using the “pci” guest config options when xl creates the domain.
>>>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped address.
>>>> * X86 mapping code should be ported on Arm so that the stage-2 translation is adapted when the guest is doing a modification of the BAR registers values (to map the address requested by the guest for a specific IOMEM to the address actually contained in the real BAR register of the corresponding device).
>>>>
>>>> # SMMU configuration for guest:
>>>>
>>>> When assigning PCI devices to a guest, the SMMU configuration should be updated to remove access to the hardware domain memory and add
>>>> configuration to have access to the guest memory with the proper address translation so that the device can do DMA operations from and to the guest memory only.
>>> There are a few more questions to answer here:
>>>     - When a guest is destroyed, who will be the owner of the PCI devices? Depending on the answer, how do you make sure the device is quiescent?
> 
> I would say the hardware domain if there is one otherwise nobody.

This is risky, in particular if your device is not quiescent (e.g.
because the reset failed). This would mean your device may be able to
overwrite part of Dom0's memory.

> On the quiescent part this is definitely something for which I have no answer for now and any suggestion is more then welcome.

Usually you will have to reset a device, but I am not sure this can 
always work properly. Hence, I think assigning the PCI devices to nobody 
would be more sensible. Note this is what XSA-306 aimed to do on x86 
(not yet implemented on Arm).

> 
>>>     - Is there any memory access that can bypassed the IOMMU (e.g doorbell)?
> 
> This is still something to be investigated as part of the MSI implementation.
> If you have any idea here, feel free to tell us.

My memory is a bit fuzzy here. I am sure that the doorbell can bypass
the IOMMU on some platforms, but I also vaguely remember that accesses to
the PCI host controller memory window may also bypass the IOMMU. A good
read might be [2].

IIRC, I came to the conclusion that we may want to use the host memory
map in the guest when using PCI passthrough. But maybe not on all
platforms.

Cheers,

>>> [1] https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg02520.html

[2] https://www.spinics.net/lists/kvm/msg140116.html
>>
>> -- 
>> Julien Grall
>>
> 

-- 
Julien Grall



* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 15:23                       ` Bertrand Marquis
@ 2020-07-17 15:30                         ` Roger Pau Monné
  2020-07-17 15:51                           ` Bertrand Marquis
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2020-07-17 15:30 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Stefano Stabellini, Jan Beulich, xen-devel, nd,
	Julien Grall

On Fri, Jul 17, 2020 at 03:23:57PM +0000, Bertrand Marquis wrote:
> 
> 
> > On 17 Jul 2020, at 17:05, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > 
> > On Fri, Jul 17, 2020 at 02:49:20PM +0000, Bertrand Marquis wrote:
> >> 
> >> 
> >>> On 17 Jul 2020, at 16:41, Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>> 
> >>> On Fri, Jul 17, 2020 at 02:34:55PM +0000, Bertrand Marquis wrote:
> >>>> 
> >>>> 
> >>>>> On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
> >>>>> 
> >>>>> On 17.07.2020 15:59, Bertrand Marquis wrote:
> >>>>>> 
> >>>>>> 
> >>>>>>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>> 
> >>>>>>> On 17.07.2020 15:14, Bertrand Marquis wrote:
> >>>>>>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>> On 16.07.2020 19:10, Rahul Singh wrote:
> >>>>>>>>>> # Emulated PCI device tree node in libxl:
> >>>>>>>>>> 
> >>>>>>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
> >>>>>>>>> 
> >>>>>>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
> >>>>>>>>> there to be no need for it when there are PCI devices assigned to the
> >>>>>>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
> >>>>>>>>> vpci="ecam" as unambiguous?
> >>>>>>>> 
> >>>>>>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl. 
> >>>>>>> 
> >>>>>>> I'm afraid I don't understand: When there are no PCI device that get
> >>>>>>> handed to a guest when it gets created, but it is supposed to be able
> >>>>>>> to have some assigned while already running, then we agree the option
> >>>>>>> is needed (afaict). When PCI devices get handed to the guest while it
> >>>>>>> gets constructed, where's the problem to infer this option from the
> >>>>>>> presence of PCI devices in the guest configuration?
> >>>>>> 
> >>>>>> If the user wants to use xl pci-attach to attach in runtime a device to a guest, this guest must have a VPCI bus (even with no devices).
> >>>>>> If we do not have the vpci parameter in the configuration this use case will not work anymore.
> >>>>> 
> >>>>> That's what everyone looks to agree with. Yet why is the parameter needed
> >>>>> when there _are_ PCI devices anyway? That's the "optional" that Stefano
> >>>>> was suggesting, aiui.
> >>>> 
> >>>> I agree in this case the parameter could be optional and only required if not PCI device is assigned directly in the guest configuration.
> >>> 
> >>> Where will the ECAM region(s) appear on the guest physmap?
> >>> 
> >>> Are you going to re-use the same locations as on the physical
> >>> hardware, or will they appear somewhere else?
> >> 
> >> We will add some new definitions for the ECAM regions in the guest physmap declared in xen (include/asm-arm/config.h)
> > 
> > I think I'm confused, but that file doesn't contain anything related
> > to the guest physmap, that's the Xen virtual memory layout on Arm
> > AFAICT?
> > 
> > Does this somehow relate to the physical memory map exposed to guests
> > on Arm?
> 
> Yes it does.
> We will add new definitions there related to VPCI to reserve some areas for the VPCI ECAM and the IOMEM areas.

Yes, that's completely fine and is what's done on x86, but again I
feel like I'm lost here, this is the Xen virtual memory map, how does
this relate to the guest physical memory map?

Roger.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI devices passthrough on Arm design proposal
  2020-07-17 15:26         ` Julien Grall
@ 2020-07-17 15:47           ` Bertrand Marquis
  2020-07-17 16:05             ` Roger Pau Monné
  2020-07-18 11:08             ` Julien Grall
  0 siblings, 2 replies; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 15:47 UTC (permalink / raw)
  To: Julien Grall
  Cc: Rahul Singh, Roger Pau Monné,
	Stefano Stabellini, xen-devel, nd, Julien Grall



> On 17 Jul 2020, at 17:26, Julien Grall <julien@xen.org> wrote:
> 
> 
> 
> On 17/07/2020 15:47, Bertrand Marquis wrote:
>>>>> # Title:
>>>>> 
>>>>> PCI devices passthrough on Arm design proposal
>>>>> 
>>>>> # Problem statement:
>>>>> 
>>>>> On ARM there in no support to assign a PCI device to a guest. PCI device passthrough capability allows guests to have full access to some PCI devices. PCI device passthrough allows PCI devices to appear and behave as if they were physically attached to the guest operating system and provide full isolation of the PCI devices.
>>>>> 
>>>>> Goal of this work is to also support Dom0Less configuration so the PCI backend/frontend drivers used on x86 shall not be used on Arm. It will use the existing VPCI concept from X86 and implement the virtual PCI bus through IO emulation​ such that only assigned devices are visible​ to the guest and guest can use the standard PCI driver.
>>>>> 
>>>>> Only Dom0 and Xen will have access to the real PCI bus,​ guest will have a direct access to the assigned device itself​. IOMEM memory will be mapped to the guest ​and interrupt will be redirected to the guest. SMMU has to be configured correctly to have DMA transaction.
>>>>> 
>>>>> ## Current state: Draft version
>>>>> 
>>>>> # Proposer(s): Rahul Singh, Bertrand Marquis
>>>>> 
>>>>> # Proposal:
>>>>> 
>>>>> This section will describe the different subsystem to support the PCI device passthrough and how these subsystems interact with each other to assign a device to the guest.
>>>>> 
>>>>> # PCI Terminology:
>>>>> 
>>>>> Host Bridge: Host bridge allows the PCI devices to talk to the rest of the computer.
>>>>> ECAM: ECAM (Enhanced Configuration Access Mechanism) is a mechanism developed to allow PCIe to access configuration space. The space available per function is 4KB.
>>>>> 
>>>>> # Discovering PCI Host Bridge in XEN:
>>>>> 
>>>>> In order to support the PCI passthrough XEN should be aware of all the PCI host bridges available on the system and should be able to access the PCI configuration space. ECAM configuration access is supported as of now. XEN during boot will read the PCI device tree node “reg” property and will map the ECAM space to the XEN memory using the “ioremap_nocache ()” function.
>>>>> 
>>>>> If there are more than one segment on the system, XEN will read the “linux, pci-domain” property from the device tree node and configure  the host bridge segment number accordingly. All the PCI device tree nodes should have the “linux,pci-domain” property so that there will be no conflicts. During hardware domain boot Linux will also use the same “linux,pci-domain” property and assign the domain number to the host bridge.
>>>> AFAICT, "linux,pci-domain" is not a mandatory option and mostly tie to Linux. What would happen with other OS?
>>>> But I would rather avoid trying to mandate a user to modifying his/her device-tree in order to support PCI passthrough. It would be better to consider Xen to assign the number if it is not present.
>> so you would suggest here that if this entry is not present in the configuration, we just assign a value inside xen ? How should this information be passed to the guest ?
>> This number is required for the current hypercall to declare devices to xen so those could end up being different.
> 
> I am guessing you mean passing to the hardware domain? If so, Xen is already rewriting the device-tree for the hardware domain. So it would be easy to add more property.

True this can be done :-)
We will add this to the design.

> 
> Now the question is whether other OSes are using "linux,pci-domain". I would suggest to have a look at a *BSD to see how they deal with PCI controllers.

Good idea, we will check how BSD is using the hypercall to declare PCI devices and what value is used there for the domain id.

> 
>>>>> 
>>>>> When Dom0 tries to access the PCI config space of the device, XEN will find the corresponding host bridge based on segment number and access the corresponding config space assigned to that bridge.
>>>>> 
>>>>> Limitation:
>>>>> * Only PCI ECAM configuration space access is supported.
>>>>> * Device tree binding is supported as of now, ACPI is not supported.
>>>> We want to differentiate the high-level design from the actual implementation. While you may not yet implement ACPI, we still need to keep it in mind to avoid incompatibilities in long term.
>> For sure we do not want to make anything which would not be possible to implement with ACPI.
>> I hope the community will help us during review to find those possible problems if we do not see them.
> 
> Have a look at the design document I pointed out in my previous answer. It should contain a lot of information already for ACPI :).

Thanks for the pointer, we will go through that

> 
>>>>> * Need to port the PCI host bridge access code to XEN to access the configuration space (generic one works but lots of platforms will required  some specific code or quirks).
>>>>> 
>>>>> # Discovering PCI devices:
>>>>> 
>>>>> PCI-PCIe enumeration is a process of detecting devices connected to its host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and configure the BAR, PCI capabilities, and MSI/MSI-X configuration.
>>>>> 
>>>>> PCI-PCIe enumeration in XEN is not feasible for the configuration part as it would require a lot of code inside Xen which would require a lot of maintenance. Added to this many platforms require some quirks in that part of the PCI code which would greatly improve Xen complexity. Once hardware domain enumerates the device then it will communicate to XEN via the below hypercall.
>>>>> 
>>>>> #define PHYSDEVOP_pci_device_add        25
>>>>> struct physdev_pci_device_add {
>>>>>      uint16_t seg;
>>>>>      uint8_t bus;
>>>>>      uint8_t devfn;
>>>>>      uint32_t flags;
>>>>>      struct {
>>>>>          uint8_t bus;
>>>>>          uint8_t devfn;
>>>>>      } physfn;
>>>>>      /*
>>>>>      * Optional parameters array.
>>>>>      * First element ([0]) is PXM domain associated with the device (if * XEN_PCI_DEV_PXM is set)
>>>>>      */
>>>>>      uint32_t optarr[XEN_FLEX_ARRAY_DIM];
>>>>>      };
>>>>> 
>>>>> As the hypercall argument has the PCI segment number, XEN will access the PCI config space based on this segment number and find the host-bridge corresponding to this segment number. At this stage host bridge is fully initialized so there will be no issue to access the config space.
>>>>> 
>>>>> XEN will add the PCI devices in the linked list maintain in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system and all the device will be added to the hardware domain.
>>>> I understand this what x86 does. However, may I ask why we would want it for Arm?
>> We wanted to be as near as possible from x86 implementation and design.
>> But if you have an other idea here we are fully open to discuss it.
> 
> In the case of platform device passthrough, we are leaving the device unassigned when not using by a guest. This makes sure the device can't do any harm if somehow it wasn't reset correctly.
> 
> I would prefer to consider the same approach for PCI devices if there is no plan to use it in dom0. Although, we need to figure out how PCI devices will be reset.

We definitely cannot rely on a guest to reset the device properly if it is killed, and I doubt there is a “standard” way to reset a PCI device that works all the time.
So I agree that leaving it unassigned is better and more secure.
We will modify our design accordingly.

> 
>>>>> * Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
>>>>> 
>>>>> # Enable the existing x86 virtual PCI support for ARM:
>>>>> 
>>>>> The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.
>>>>> 
>>>>> A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
>>>>> 
>>>>> Limitation:
>>>>> * No handler is register for the MSI configuration.
>>>>> * Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.
>>>> IIRC, legacy interrupt may be shared between two PCI devices. How do you plan to handle this on Arm?
>> We plan to fix this by adding proper support for MSI in the long term.
>> For the use case where MSI is not supported or not wanted we might have to find a way to forward the hardware interrupt to several guests to emulate some kind of shared interrupt.
> 
> Sharing interrupts is a bit of a pain because you couldn't take advantage of the direct EOI in HW and have to be careful if one guest doesn't EOI in a timely manner.
> 
> This is something I would rather avoid unless there is a real use case for it.

I would expect that most recent hardware will support MSI, so this should not be needed.
When MSI is not used, the only solution would be to enforce that devices assigned to different guests use different legacy interrupts. As a PCI bus only has the four INTA-INTD lines, this would limit the number of domains able to use PCI devices on a bus to 4 (assuming the enumeration can be modified to assign the interrupts accordingly).
If we all agree that this is an acceptable limitation then we would not need the “interrupt sharing”.

> 
>>>>> 
>>>>> # Assign the device to the guest:
>>>>> 
>>>>> Assign the PCI device from the hardware domain to the guest is done using the below guest config option. When xl tool create the domain, PCI devices will be assigned to the guest VPCI bus.
>>>> Above, you suggest that device will be assigned to the hardware domain at boot. I am assuming this also means that all the interrupts/MMIOs will be routed/mapped, is that correct?
>>>> If so, can you provide a rough sketch how assign/deassign will work?
>> Yes this is correct. We will improve the design and add a more detailed description on that in the next version.
>> To make it short we remove the resources from the hardware domain first and assign them to the guest the device has been assigned to. There are still some parts in there where we are still in investigation mode on that part.
> 
> Hmmm... Does this mean you modified the code to allow a interrupt to be removed while the domain is still running?

For now we are not doing this automatically: the interrupt is explicitly assigned to the guest in the guest configuration.
So we have not modified the code for that so far; this part of the implementation is still relying on workarounds right now.
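
To make the current workaround concrete, a guest configuration for an assigned device currently looks roughly like the following (the device, addresses and SPI number are only an example and depend on the platform; the iomem/irqs lines are exactly what the "pci" option should end up doing automatically):

    vpci = "pci_ecam"
    pci = [ "0000:03:00.0" ]
    # temporary workarounds until the "pci" option maps these itself:
    iomem = [ "0x50000,0x8000@0x4020" ]  # device MMIO remapped into the guest
    irqs = [ 112 ]                       # legacy SPI routed to the guest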

> 
>>>>>     pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
>>>>> 
>>>>> Guest will be only able to access the assigned devices and see the bridges. Guest will not be able to access or see the devices that are no assigned to him.
>>>>> 
>>>>> Limitation:
>>>>> * As of now all the bridges in the PCI bus are seen by the guest on the VPCI bus.
>>>> Why do you want to expose all the bridges to a guest? Does this mean that the BDF should always match between the host and the guest?
>> That’s not really something that we wanted but this was the easiest way to go.
>> As said in a previous mail we could build a VPCI bus with a completely different topology but I am not sure of the advantages this would have.
>> Do you see some reason to do this ?
> 
> Yes :):
>  1) If a platform has two host controllers (IIRC Thunder-X has it) then you would need to expose two host controllers to your guest. I think this is undesirable if your guest is only using a couple of PCI devices on each host controllers.
>  2) In the case of migration (live or not), you may want to use a difference PCI card on the target platform. So your BDF and bridges may be different.
> 
> Therefore I think the virtual topology can be beneficial.

I would definitely see a big advantage in having only one VPCI bus per guest and putting all assigned devices on it, independently of which physical host controller each device sits behind.
But this will probably make the computation of the VPCI BAR values a bit more complex, as we might end up without enough space in the guest physical map for them, which could make the implementation a lot more complex.
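
As a purely illustrative sketch of the kind of computation involved (none of these names exist today, and GUEST_VPCI_MEM_BASE/SIZE stand for an assumed reservation in the guest physical map), placing the emulated BAR values inside a single reserved window could look like:

    /* Toy allocator: place emulated BAR values inside a reserved guest window. */
    static paddr_t next_free = GUEST_VPCI_MEM_BASE;

    static paddr_t alloc_guest_bar(paddr_t size)
    {
        paddr_t addr = ROUNDUP(next_free, size);   /* BARs are size-aligned */

        if ( addr + size > GUEST_VPCI_MEM_BASE + GUEST_VPCI_MEM_SIZE )
            return INVALID_PADDR;                  /* window exhausted */

        next_free = addr + size;
        return addr;
    }

Running out of that window is exactly the case we would have to handle, or avoid by sizing the window generously.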

> 
>>>>> 
>>>>> # Emulated PCI device tree node in libxl:
>>>>> 
>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>> 
>>>>> A new area has been reserved in the arm guest physical map at which the VPCI bus is declared in the device tree (reg and ranges parameters of the node). A trap handler for the PCI ECAM access from guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
>>>>> 
>>>>> Limitation:
>>>>> * Only one PCI device tree node is supported as of now.
>>>>> 
>>>>> BAR value and IOMEM mapping:
>>>>> 
>>>>> Linux guest will do the PCI enumeration based on the area reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once PCI    device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
>>>>> 
>>>>> As of now we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. We used the existing guest “iomem” config option to map the region.
>>>>> For example:
>>>>>     Guest reserved IOMEM region:  0x04020000
>>>>>          Real physical IOMEM region:0x50000000
>>>>>          IOMEM size:128MB
>>>>>          iomem config will be:   iomem = ["0x50000,0x8000@0x4020"]
>>>>> 
>>>>> There is no need to map the ECAM space as XEN already have access to the ECAM space and XEN will trap ECAM accesses from the guest and will perform read/write on the VPCI bus.
>>>>> 
>>>>> IOMEM access will not be trapped and the guest will directly access the IOMEM region of the assigned device via stage-2 translation.
>>>>> 
>>>>> In the same, we mapped the assigned devices IRQ to the guest using below config options.
>>>>>     irqs= [ NUMBER, NUMBER, ...]
>>>>> 
>>>>> Limitation:
>>>>> * Need to avoid the “iomem” and “irq” guest config options and map the IOMEM region and IRQ at the same time when device is assigned to the guest using the “pci” guest config options when xl creates the domain.
>>>>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped address.
>>>>> * X86 mapping code should be ported on Arm so that the stage-2 translation is adapted when the guest is doing a modification of the BAR registers values (to map the address requested by the guest for a specific IOMEM to the address actually contained in the real BAR register of the corresponding device).
>>>>> 
>>>>> # SMMU configuration for guest:
>>>>> 
>>>>> When assigning PCI devices to a guest, the SMMU configuration should be updated to remove access to the hardware domain memory and add
>>>>> configuration to have access to the guest memory with the proper address translation so that the device can do DMA operations from and to the guest memory only.
>>>> There are a few more questions to answer here:
>>>>    - When a guest is destroyed, who will be the owner of the PCI devices? Depending on the answer, how do you make sure the device is quiescent?
>> I would say the hardware domain if there is one otherwise nobody.
> 
> This is risky, in particular if your device is not quiescent (e.g because the reset failed). This would mean your device may be able to rewrite part of Dom0.

Agreed. We should not reassign the device to Dom0 and should always leave it unassigned.
We will modify the design accordingly.

> 
>> On the quiescent part this is definitely something for which I have no answer for now and any suggestion is more then welcome.
> 
> Usually you will have to reset a device, but I am not sure this can always work properly. Hence, I think assigning the PCI devices to nobody would be more sensible. Note this is what XSA-306 aimed to do on x86 (not yet implemented on Arm).

Ack

> 
>>>>    - Is there any memory access that can bypassed the IOMMU (e.g doorbell)?
>> This is still something to be investigated as part of the MSI implementation.
>> If you have any idea here, feel free to tell us.
> 
> My memory is a bit fuzzy here. I am sure that the doorbell can bypass the IOMMU on some platform, but I also vaguely remember that accesses to the PCI host controller memory window may also bypass the IOMMU. A good reading might be [2].
> 
> IIRC, I came to the conclusion that we may want to use the host memory map in the guest when using the PCI passthrough. But maybe not on all the platforms.

Definitely, a lot of this would be easier if we could use a 1:1 mapping.
We will keep that in mind when we start to investigate the MSI part.

Cheers
Bertrand

> 
> Cheers,
> 
>>>> [1] https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg02520.html
> 
> [2] https://www.spinics.net/lists/kvm/msg140116.html
>>> 
>>> -- 
>>> Julien Grall
>>> 
> 
> -- 
> Julien Grall


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 15:30                         ` Roger Pau Monné
@ 2020-07-17 15:51                           ` Bertrand Marquis
  2020-07-17 16:08                             ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-17 15:51 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Rahul Singh, Stefano Stabellini, Jan Beulich, xen-devel, nd,
	Julien Grall



> On 17 Jul 2020, at 17:30, Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> On Fri, Jul 17, 2020 at 03:23:57PM +0000, Bertrand Marquis wrote:
>> 
>> 
>>> On 17 Jul 2020, at 17:05, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>> 
>>> On Fri, Jul 17, 2020 at 02:49:20PM +0000, Bertrand Marquis wrote:
>>>> 
>>>> 
>>>>> On 17 Jul 2020, at 16:41, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>> 
>>>>> On Fri, Jul 17, 2020 at 02:34:55PM +0000, Bertrand Marquis wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>> 
>>>>>>> On 17.07.2020 15:59, Bertrand Marquis wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>> 
>>>>>>>>> On 17.07.2020 15:14, Bertrand Marquis wrote:
>>>>>>>>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>> On 16.07.2020 19:10, Rahul Singh wrote:
>>>>>>>>>>>> # Emulated PCI device tree node in libxl:
>>>>>>>>>>>> 
>>>>>>>>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>>>>>>>> 
>>>>>>>>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
>>>>>>>>>>> there to be no need for it when there are PCI devices assigned to the
>>>>>>>>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
>>>>>>>>>>> vpci="ecam" as unambiguous?
>>>>>>>>>> 
>>>>>>>>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl. 
>>>>>>>>> 
>>>>>>>>> I'm afraid I don't understand: When there are no PCI device that get
>>>>>>>>> handed to a guest when it gets created, but it is supposed to be able
>>>>>>>>> to have some assigned while already running, then we agree the option
>>>>>>>>> is needed (afaict). When PCI devices get handed to the guest while it
>>>>>>>>> gets constructed, where's the problem to infer this option from the
>>>>>>>>> presence of PCI devices in the guest configuration?
>>>>>>>> 
>>>>>>>> If the user wants to use xl pci-attach to attach in runtime a device to a guest, this guest must have a VPCI bus (even with no devices).
>>>>>>>> If we do not have the vpci parameter in the configuration this use case will not work anymore.
>>>>>>> 
>>>>>>> That's what everyone looks to agree with. Yet why is the parameter needed
>>>>>>> when there _are_ PCI devices anyway? That's the "optional" that Stefano
>>>>>>> was suggesting, aiui.
>>>>>> 
>>>>>> I agree in this case the parameter could be optional and only required if not PCI device is assigned directly in the guest configuration.
>>>>> 
>>>>> Where will the ECAM region(s) appear on the guest physmap?
>>>>> 
>>>>> Are you going to re-use the same locations as on the physical
>>>>> hardware, or will they appear somewhere else?
>>>> 
>>>> We will add some new definitions for the ECAM regions in the guest physmap declared in xen (include/asm-arm/config.h)
>>> 
>>> I think I'm confused, but that file doesn't contain anything related
>>> to the guest physmap, that's the Xen virtual memory layout on Arm
>>> AFAICT?
>>> 
>>> Does this somehow relate to the physical memory map exposed to guests
>>> on Arm?
>> 
>> Yes it does.
>> We will add new definitions there related to VPCI to reserve some areas for the VPCI ECAM and the IOMEM areas.
> 
> Yes, that's completely fine and is what's done on x86, but again I
> feel like I'm lost here, this is the Xen virtual memory map, how does
> this relate to the guest physical memory map?

Sorry my bad, we will add values in include/public/arch-arm.h, wrong header :-)
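
For illustration only (the names and values below are placeholders, not a final layout), the kind of additions we have in mind there would be along the lines of:

    /* Hypothetical guest physmap reservations for the emulated PCI bus. */
    #define GUEST_VPCI_ECAM_BASE   xen_mk_ullong(0x10000000)
    #define GUEST_VPCI_ECAM_SIZE   xen_mk_ullong(0x10000000)
    #define GUEST_VPCI_MEM_BASE    xen_mk_ullong(0x23000000)
    #define GUEST_VPCI_MEM_SIZE    xen_mk_ullong(0x10000000)

The actual addresses and sizes still have to be chosen so that they do not clash with the rest of the guest memory map.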

Bertrand


> 
> Roger.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 15:21           ` Bertrand Marquis
@ 2020-07-17 15:55             ` Roger Pau Monné
  2020-07-18  9:49               ` Bertrand Marquis
  2020-07-20 23:24             ` Stefano Stabellini
  1 sibling, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2020-07-17 15:55 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: xen-devel, nd, Rahul Singh, Stefano Stabellini, Julien Grall

On Fri, Jul 17, 2020 at 03:21:57PM +0000, Bertrand Marquis wrote:
> > On 17 Jul 2020, at 16:31, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > On Fri, Jul 17, 2020 at 01:22:19PM +0000, Bertrand Marquis wrote:
> >>> On 17 Jul 2020, at 13:16, Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>> * ACS capability is disable for ARM as of now as after enabling it
> >>>> devices are not accessible.
> >>>> * Dom0Less implementation will require to have the capacity inside Xen
> >>>> to discover the PCI devices (without depending on Dom0 to declare them
> >>>> to Xen).
> >>> 
> >>> I assume the firmware will properly initialize the host bridge and
> >>> configure the resources for each device, so that Xen just has to walk
> >>> the PCI space and find the devices.
> >>> 
> >>> TBH that would be my preferred method, because then you can get rid of
> >>> the hypercall.
> >>> 
> >>> Is there anyway for Xen to know whether the host bridge is properly
> >>> setup and thus the PCI bus can be scanned?
> >>> 
> >>> That way Arm could do something similar to x86, where Xen will scan
> >>> the bus and discover devices, but you could still provide the
> >>> hypercall in case the bus cannot be scanned by Xen (because it hasn't
> >>> been setup).
> >> 
> >> That is definitely the idea to rely by default on a firmware doing this properly.
> >> I am not sure wether a proper enumeration could be detected properly in all
> >> cases so it would make sens to rely on Dom0 enumeration when a Xen
> >> command line argument is passed as explained in one of Rahul’s mails.
> > 
> > I assume Linux somehow knows when it needs to initialize the PCI root
> > complex before attempting to access the bus. Would it be possible to
> > add this logic to Xen so it can figure out on it's own whether it's
> > safe to scan the PCI bus or whether it needs to wait for the hardware
> > domain to report the devices present?
> 
> That might be possible to do but will anyway require a command line argument
> to be able to force xen to let the hardware domain do the initialization anyway in
> case Xen detection does not work properly.
> In the case where there is a Dom0 i would more expect that we let it do the initialization
> all the time unless the user is telling using a command line argument that the current one
> is correct and shall be used.

FTR, on x86 we let dom0 enumerate and probe the PCI devices as it
sees fit, but vPCI traps have already been set up for all the detected
devices, and vPCI already supports letting dom0 size the BARs, or even
change their position (theoretically; I haven't seen a dom0 change the
position of the BARs yet).

So on Arm you could also let dom0 do all of this, the question is
whether vPCI traps could be set earlier (when dom0 is created) if the
PCI bus has been initialized and can be scanned.
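
Just to make that concrete, I would expect "setting the traps" on Arm
to boil down to something like the sketch below at domain build time,
reusing the existing register_mmio_handler() infrastructure with vPCI
read/write handlers modelled on the x86 ones (the vpci_mmio_* handlers
don't exist on Arm yet, and GUEST_VPCI_ECAM_* stand for an assumed
guest physmap reservation):

    /* Illustrative only: trap ECAM accesses and forward them to vPCI. */
    static const struct mmio_handler_ops vpci_mmio_handler = {
        .read  = vpci_mmio_read,
        .write = vpci_mmio_write,
    };

    void domain_vpci_init(struct domain *d)
    {
        register_mmio_handler(d, &vpci_mmio_handler,
                              GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE,
                              NULL);
    }

For dom0 the trap would have to cover the host bridge's real ECAM
range instead, which is only possible if the bridge is already usable
at domain build time.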

I have no idea however how bare metal Linux on Arm figures out the
state of the PCI bus, or whether it's something that's passed in the
DT or signaled somehow by the firmware/bootloader.

> >>> This should be limited to read-only accesses in order to be safe.
> >>> 
> >>> Emulating a PCI bridge in Xen using vPCI shouldn't be that
> >>> complicated, so you could likely replace the real bridges with
> >>> emulated ones. Or even provide a fake topology to the guest using an
> >>> emulated bridge.
> >> 
> >> Just showing all bridges and keeping the hardware topology is the simplest
> >> solution for now. But maybe showing a different topology and only fake
> >> bridges could make sense and be implemented in the future.
> > 
> > Ack. I've also heard rumors of Xen on Arm people being very interested
> > in VirtIO support, in which case you might expose both fully emulated
> > VirtIO devices and PCI passthrough devices on the PCI bus, so it would
> > be good to spend some time thinking how those will fit together.
> > 
> > Will you allocate a separate segment unused by hardware to expose the
> > fully emulated PCI devices (VirtIO)?
> > 
> > Will OSes support having several segments?
> > 
> > If not you likely need to have emulated bridges so that you can adjust
> > the bridge window accordingly to fit the passthrough and the emulated
> > MMIO space, and likely be able to expose passthrough devices using a
> > different topology than the host one.
> 
> Honestly this is not something we considered. I was more thinking that
> this use case would be handled by creating an other VPCI bus dedicated
> to those kind of devices instead of mixing physical and virtual devices.

I'm just mentioning it; describing your plans for when guests might
also have fully emulated devices on the PCI bus would be relevant, I
think.

Anyway, I don't think it's something mandatory here, as from a guest
PoV how we expose PCI devices shouldn't matter that much, as long as
it's done in a spec compliant way.

So you can start with this approach if it's easier, I just wanted to
make sure you have in mind that at some point Arm guests might also
require fully emulated PCI devices so that you don't paint yourselves
in a corner.

> > 
> >>> 
> >>>> 
> >>>> # Emulated PCI device tree node in libxl:
> >>>> 
> >>>> Libxl is creating a virtual PCI device tree node in the device tree
> >>>> to enable the guest OS to discover the virtual PCI during guest
> >>>> boot. We introduced the new config option [vpci="pci_ecam"] for
> >>>> guests. When this config option is enabled in a guest configuration,
> >>>> a PCI device tree node will be created in the guest device tree.
> >>>> 
> >>>> A new area has been reserved in the arm guest physical map at which
> >>>> the VPCI bus is declared in the device tree (reg and ranges
> >>>> parameters of the node). A trap handler for the PCI ECAM access from
> >>>> guest has been registered at the defined address and redirects
> >>>> requests to the VPCI driver in Xen.
> >>> 
> >>> Can't you deduce the requirement of such DT node based on the presence
> >>> of a 'pci=' option in the same config file?
> >>> 
> >>> Also I wouldn't discard that in the future you might want to use
> >>> different emulators for different devices, so it might be helpful to
> >>> introduce something like:
> >>> 
> >>> pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt', '0a:00.0,backend=qemu', ... ]
> >>> 
> >>> For the time being Arm will require backend=vpci for all the passed
> >>> through devices, but I wouldn't rule out this changing in the future.
> >> 
> >> We need it for the case where no device is declared in the config file and the user
> >> wants to add devices using xl later. In this case we must have the DT node for it
> >> to work. 
> > 
> > There's a passthrough xl.cfg option for that already, so that if you
> > don't want to add any PCI passthrough devices at creation time but
> > rather hotplug them you can set:
> > 
> > passthrough=enabled
> > 
> > And it should setup the domain to be prepared to support hot
> > passthrough, including the IOMMU [0].
> 
> Isn’t this option covering more then PCI passthrough ?
> 
> Lots of Arm platform do not have a PCI bus at all, so for those
> creating a VPCI bus would be pointless. But you might need to
> activate this to pass devices which are not on the PCI bus.

Well, you can check whether the host has PCI support and decide
whether to attach a virtual PCI bus to the guest or not?

Setting passthrough=enabled should prepare the guest to handle
passthrough, in whatever form is supported by the host IMO.
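
I.e. (just a sketch, with a placeholder domain id) a guest with no PCI
devices in its configuration could still be made ready for hot-attach
with something like:

    # in the guest configuration:
    passthrough = "enabled"

    # later, from the control domain:
    xl pci-attach <domid> 08:00.0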

> >>>> Limitation:
> >>>> * Need to avoid the “iomem” and “irq” guest config
> >>>> options and map the IOMEM region and IRQ at the same time when
> >>>> device is assigned to the guest using the “pci” guest config options
> >>>> when xl creates the domain.
> >>>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped
> >>>> address.
> >>> 
> >>> It was my understanding that you would identity map the BAR into the
> >>> domU stage-2 translation, and that changes by the guest won't be
> >>> allowed.
> >> 
> >> In fact this is not possible to do and we have to remap at a different address
> >> because the guest physical mapping is fixed by Xen on Arm so we must follow
> >> the same design otherwise this would only work if the BARs are pointing to an
> >> address unused and on Juno this is for example conflicting with the guest
> >> RAM address.
> > 
> > This was not clear from my reading of the document, could you please
> > clarify on the next version that the guest physical memory map is
> > always the same, and that BARs from PCI devices cannot be identity
> > mapped to the stage-2 translation and instead are relocated somewhere
> > else?
> 
> We will.
> 
> > 
> > I'm then confused about what you do with bridge windows, do you also
> > trap and adjust them to report a different IOMEM region?
> 
> Yes this is what we will have to do so that the regions reflect the VPCI mappings
> and not the hardware one.
> 
> > 
> > Above you mentioned that read-only access was given to bridge
> > registers, but I guess some are also emulated in order to report
> > matching IOMEM regions?
> 
> yes that’s exact. We will clear this in the next version.

If you have to go this route for domUs, it might be easier to just
fake a PCI host bridge and place all the devices there, even with
different SBDF addresses. Having to replicate all the bridges on the
physical PCI bus and fix up their MMIO windows seems much more
complicated than just faking/emulating a single bridge?

Roger.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI devices passthrough on Arm design proposal
  2020-07-17 15:47           ` Bertrand Marquis
@ 2020-07-17 16:05             ` Roger Pau Monné
  2020-07-18  9:55               ` Bertrand Marquis
  2020-07-18 11:32               ` Julien Grall
  2020-07-18 11:08             ` Julien Grall
  1 sibling, 2 replies; 62+ messages in thread
From: Roger Pau Monné @ 2020-07-17 16:05 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Julien Grall, Stefano Stabellini, xen-devel, nd,
	Julien Grall

On Fri, Jul 17, 2020 at 03:47:25PM +0000, Bertrand Marquis wrote:
> > On 17 Jul 2020, at 17:26, Julien Grall <julien@xen.org> wrote:
> > On 17/07/2020 15:47, Bertrand Marquis wrote:
> >>>>> * Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
> >>>>> 
> >>>>> # Enable the existing x86 virtual PCI support for ARM:
> >>>>> 
> >>>>> The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.
> >>>>> 
> >>>>> A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
> >>>>> 
> >>>>> Limitation:
> >>>>> * No handler is register for the MSI configuration.
> >>>>> * Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.
> >>>> IIRC, legacy interrupt may be shared between two PCI devices. How do you plan to handle this on Arm?
> >> We plan to fix this by adding proper support for MSI in the long term.
> >> For the use case where MSI is not supported or not wanted we might have to find a way to forward the hardware interrupt to several guests to emulate some kind of shared interrupt.
> > 
> > Sharing interrupts is a bit of a pain because you couldn't take advantage of the direct EOI in HW and have to be careful if one guest doesn't EOI in a timely manner.
> > 
> > This is something I would rather avoid unless there is a real use case for it.
> 
> I would expect that most recent hardware will support MSI and this
> will not be needed.

Well, PCI Express mandates MSI support, so while this is just a spec,
I would expect most (if not all) devices to support MSI (or MSI-X), as
Arm platforms haven't implemented legacy PCI anyway.

> When MSI is not used, the only solution would be to enforce that
> devices assigned to different guest are using different interrupts
> which would limit the number of domains being able to use PCI
> devices on a bus to 4 (if the enumeration can be modified correctly
> to assign the interrupts properly).
> 
> If we all agree that this is an acceptable limitation then we would
> not need the “interrupt sharing”.

It might be easier to start by just supporting devices that have MSI
(or MSI-X) and then move to legacy interrupts if required?

You should have most of the pieces you require already implemented
since that's what x86 uses, and hence could reuse almost all of it?

IIRC Julien even said that Arm was likely to require far fewer traps
than x86 for accesses to MSI and MSI-X, since you could allow untrusted
guests to write directly to the registers as there's another piece of
hardware that would already translate the interrupts?

I think it's fine to use this workaround while you don't have MSI
support in order to start testing and upstreaming stuff, but maybe
that shouldn't be committed?

Roger.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 15:51                           ` Bertrand Marquis
@ 2020-07-17 16:08                             ` Roger Pau Monné
  2020-07-17 16:18                               ` Julien Grall
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2020-07-17 16:08 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Stefano Stabellini, Jan Beulich, xen-devel, nd,
	Julien Grall

On Fri, Jul 17, 2020 at 03:51:47PM +0000, Bertrand Marquis wrote:
> 
> 
> > On 17 Jul 2020, at 17:30, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > 
> > On Fri, Jul 17, 2020 at 03:23:57PM +0000, Bertrand Marquis wrote:
> >> 
> >> 
> >>> On 17 Jul 2020, at 17:05, Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>> 
> >>> On Fri, Jul 17, 2020 at 02:49:20PM +0000, Bertrand Marquis wrote:
> >>>> 
> >>>> 
> >>>>> On 17 Jul 2020, at 16:41, Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>>> 
> >>>>> On Fri, Jul 17, 2020 at 02:34:55PM +0000, Bertrand Marquis wrote:
> >>>>>> 
> >>>>>> 
> >>>>>>> On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>> 
> >>>>>>> On 17.07.2020 15:59, Bertrand Marquis wrote:
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>> 
> >>>>>>>>> On 17.07.2020 15:14, Bertrand Marquis wrote:
> >>>>>>>>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>>>> On 16.07.2020 19:10, Rahul Singh wrote:
> >>>>>>>>>>>> # Emulated PCI device tree node in libxl:
> >>>>>>>>>>>> 
> >>>>>>>>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
> >>>>>>>>>>> 
> >>>>>>>>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
> >>>>>>>>>>> there to be no need for it when there are PCI devices assigned to the
> >>>>>>>>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
> >>>>>>>>>>> vpci="ecam" as unambiguous?
> >>>>>>>>>> 
> >>>>>>>>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl. 
> >>>>>>>>> 
> >>>>>>>>> I'm afraid I don't understand: When there are no PCI device that get
> >>>>>>>>> handed to a guest when it gets created, but it is supposed to be able
> >>>>>>>>> to have some assigned while already running, then we agree the option
> >>>>>>>>> is needed (afaict). When PCI devices get handed to the guest while it
> >>>>>>>>> gets constructed, where's the problem to infer this option from the
> >>>>>>>>> presence of PCI devices in the guest configuration?
> >>>>>>>> 
> >>>>>>>> If the user wants to use xl pci-attach to attach in runtime a device to a guest, this guest must have a VPCI bus (even with no devices).
> >>>>>>>> If we do not have the vpci parameter in the configuration this use case will not work anymore.
> >>>>>>> 
> >>>>>>> That's what everyone looks to agree with. Yet why is the parameter needed
> >>>>>>> when there _are_ PCI devices anyway? That's the "optional" that Stefano
> >>>>>>> was suggesting, aiui.
> >>>>>> 
> >>>>>> I agree in this case the parameter could be optional and only required if not PCI device is assigned directly in the guest configuration.
> >>>>> 
> >>>>> Where will the ECAM region(s) appear on the guest physmap?
> >>>>> 
> >>>>> Are you going to re-use the same locations as on the physical
> >>>>> hardware, or will they appear somewhere else?
> >>>> 
> >>>> We will add some new definitions for the ECAM regions in the guest physmap declared in xen (include/asm-arm/config.h)
> >>> 
> >>> I think I'm confused, but that file doesn't contain anything related
> >>> to the guest physmap, that's the Xen virtual memory layout on Arm
> >>> AFAICT?
> >>> 
> >>> Does this somehow relate to the physical memory map exposed to guests
> >>> on Arm?
> >> 
> >> Yes it does.
> >> We will add new definitions there related to VPCI to reserve some areas for the VPCI ECAM and the IOMEM areas.
> > 
> > Yes, that's completely fine and is what's done on x86, but again I
> > feel like I'm lost here, this is the Xen virtual memory map, how does
> > this relate to the guest physical memory map?
> 
> Sorry my bad, we will add values in include/public/arch-arm.h, wrong header :-)

Oh right, now I see it :).

Do you really need to specify the ECAM and MMIO regions there?

Wouldn't it be enough to specify the ECAM regions on the DT or the
ACPI MCFG table and get the MMIO regions directly from the BARs of the
devices?

Roger.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 16:08                             ` Roger Pau Monné
@ 2020-07-17 16:18                               ` Julien Grall
  2020-07-17 19:17                                 ` Oleksandr
  2020-07-20  8:47                                 ` Roger Pau Monné
  0 siblings, 2 replies; 62+ messages in thread
From: Julien Grall @ 2020-07-17 16:18 UTC (permalink / raw)
  To: Roger Pau Monné, Bertrand Marquis
  Cc: Rahul Singh, Stefano Stabellini, Jan Beulich, xen-devel, nd,
	Julien Grall



On 17/07/2020 17:08, Roger Pau Monné wrote:
> On Fri, Jul 17, 2020 at 03:51:47PM +0000, Bertrand Marquis wrote:
>>
>>
>>> On 17 Jul 2020, at 17:30, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>
>>> On Fri, Jul 17, 2020 at 03:23:57PM +0000, Bertrand Marquis wrote:
>>>>
>>>>
>>>>> On 17 Jul 2020, at 17:05, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>>
>>>>> On Fri, Jul 17, 2020 at 02:49:20PM +0000, Bertrand Marquis wrote:
>>>>>>
>>>>>>
>>>>>>> On 17 Jul 2020, at 16:41, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>>>>
>>>>>>> On Fri, Jul 17, 2020 at 02:34:55PM +0000, Bertrand Marquis wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>
>>>>>>>>> On 17.07.2020 15:59, Bertrand Marquis wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 17.07.2020 15:14, Bertrand Marquis wrote:
>>>>>>>>>>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>>>> On 16.07.2020 19:10, Rahul Singh wrote:
>>>>>>>>>>>>>> # Emulated PCI device tree node in libxl:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
>>>>>>>>>>>>> there to be no need for it when there are PCI devices assigned to the
>>>>>>>>>>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
>>>>>>>>>>>>> vpci="ecam" as unambiguous?
>>>>>>>>>>>>
>>>>>>>>>>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl.
>>>>>>>>>>>
>>>>>>>>>>> I'm afraid I don't understand: When there are no PCI device that get
>>>>>>>>>>> handed to a guest when it gets created, but it is supposed to be able
>>>>>>>>>>> to have some assigned while already running, then we agree the option
>>>>>>>>>>> is needed (afaict). When PCI devices get handed to the guest while it
>>>>>>>>>>> gets constructed, where's the problem to infer this option from the
>>>>>>>>>>> presence of PCI devices in the guest configuration?
>>>>>>>>>>
>>>>>>>>>> If the user wants to use xl pci-attach to attach in runtime a device to a guest, this guest must have a VPCI bus (even with no devices).
>>>>>>>>>> If we do not have the vpci parameter in the configuration this use case will not work anymore.
>>>>>>>>>
>>>>>>>>> That's what everyone looks to agree with. Yet why is the parameter needed
>>>>>>>>> when there _are_ PCI devices anyway? That's the "optional" that Stefano
>>>>>>>>> was suggesting, aiui.
>>>>>>>>
>>>>>>>> I agree in this case the parameter could be optional and only required if not PCI device is assigned directly in the guest configuration.
>>>>>>>
>>>>>>> Where will the ECAM region(s) appear on the guest physmap?
>>>>>>>
>>>>>>> Are you going to re-use the same locations as on the physical
>>>>>>> hardware, or will they appear somewhere else?
>>>>>>
>>>>>> We will add some new definitions for the ECAM regions in the guest physmap declared in xen (include/asm-arm/config.h)
>>>>>
>>>>> I think I'm confused, but that file doesn't contain anything related
>>>>> to the guest physmap, that's the Xen virtual memory layout on Arm
>>>>> AFAICT?
>>>>>
>>>>> Does this somehow relate to the physical memory map exposed to guests
>>>>> on Arm?
>>>>
>>>> Yes it does.
>>>> We will add new definitions there related to VPCI to reserve some areas for the VPCI ECAM and the IOMEM areas.
>>>
>>> Yes, that's completely fine and is what's done on x86, but again I
>>> feel like I'm lost here, this is the Xen virtual memory map, how does
>>> this relate to the guest physical memory map?
>>
>> Sorry my bad, we will add values in include/public/arch-arm.h, wrong header :-)
> 
> Oh right, now I see it :).
> 
> Do you really need to specify the ECAM and MMIO regions there?

You need to define those values somewhere :). The layout is only shared 
between the tools and the hypervisor. I think it would be better if they 
are defined at the same place as the rest of the layout, so it is easier 
to rework the layout.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 16:18                               ` Julien Grall
@ 2020-07-17 19:17                                 ` Oleksandr
  2020-07-18  9:58                                   ` Bertrand Marquis
  2020-07-18 11:24                                   ` Julien Grall
  2020-07-20  8:47                                 ` Roger Pau Monné
  1 sibling, 2 replies; 62+ messages in thread
From: Oleksandr @ 2020-07-17 19:17 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: nd, Stefano Stabellini, Julien Grall, Julien Grall, Jan Beulich,
	xen-devel, Rahul Singh, Roger Pau Monné


On 17.07.20 19:18, Julien Grall wrote:

Hello Bertrand

[two threads with the same name are shown in my mail client, so not 
completely sure I am asking in the correct one]

>
>
> On 17/07/2020 17:08, Roger Pau Monné wrote:
>> On Fri, Jul 17, 2020 at 03:51:47PM +0000, Bertrand Marquis wrote:
>>>
>>>
>>>> On 17 Jul 2020, at 17:30, Roger Pau Monné <roger.pau@citrix.com> 
>>>> wrote:
>>>>
>>>> On Fri, Jul 17, 2020 at 03:23:57PM +0000, Bertrand Marquis wrote:
>>>>>
>>>>>
>>>>>> On 17 Jul 2020, at 17:05, Roger Pau Monné <roger.pau@citrix.com> 
>>>>>> wrote:
>>>>>>
>>>>>> On Fri, Jul 17, 2020 at 02:49:20PM +0000, Bertrand Marquis wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On 17 Jul 2020, at 16:41, Roger Pau Monné 
>>>>>>>> <roger.pau@citrix.com> wrote:
>>>>>>>>
>>>>>>>> On Fri, Jul 17, 2020 at 02:34:55PM +0000, Bertrand Marquis wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 17.07.2020 15:59, Bertrand Marquis wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 17.07.2020 15:14, Bertrand Marquis wrote:
>>>>>>>>>>>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> On 16.07.2020 19:10, Rahul Singh wrote:
>>>>>>>>>>>>>>> # Emulated PCI device tree node in libxl:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Libxl is creating a virtual PCI device tree node in the 
>>>>>>>>>>>>>>> device tree to enable the guest OS to discover the 
>>>>>>>>>>>>>>> virtual PCI during guest boot. We introduced the new 
>>>>>>>>>>>>>>> config option [vpci="pci_ecam"] for guests. When this 
>>>>>>>>>>>>>>> config option is enabled in a guest configuration, a PCI 
>>>>>>>>>>>>>>> device tree node will be created in the guest device tree.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I support Stefano's suggestion for this to be an optional 
>>>>>>>>>>>>>> thing, i.e.
>>>>>>>>>>>>>> there to be no need for it when there are PCI devices 
>>>>>>>>>>>>>> assigned to the
>>>>>>>>>>>>>> guest anyway. I also wonder about the pci_ prefix here - 
>>>>>>>>>>>>>> isn't
>>>>>>>>>>>>>> vpci="ecam" as unambiguous?
>>>>>>>>>>>>>
>>>>>>>>>>>>> This could be a problem as we need to know that this is 
>>>>>>>>>>>>> required for a guest upfront so that PCI devices can be 
>>>>>>>>>>>>> assigned after using xl.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm afraid I don't understand: When there are no PCI device 
>>>>>>>>>>>> that get
>>>>>>>>>>>> handed to a guest when it gets created, but it is supposed 
>>>>>>>>>>>> to be able
>>>>>>>>>>>> to have some assigned while already running, then we agree 
>>>>>>>>>>>> the option
>>>>>>>>>>>> is needed (afaict). When PCI devices get handed to the 
>>>>>>>>>>>> guest while it
>>>>>>>>>>>> gets constructed, where's the problem to infer this option 
>>>>>>>>>>>> from the
>>>>>>>>>>>> presence of PCI devices in the guest configuration?
>>>>>>>>>>>
>>>>>>>>>>> If the user wants to use xl pci-attach to attach in runtime 
>>>>>>>>>>> a device to a guest, this guest must have a VPCI bus (even 
>>>>>>>>>>> with no devices).
>>>>>>>>>>> If we do not have the vpci parameter in the configuration 
>>>>>>>>>>> this use case will not work anymore.
>>>>>>>>>>
>>>>>>>>>> That's what everyone looks to agree with. Yet why is the 
>>>>>>>>>> parameter needed
>>>>>>>>>> when there _are_ PCI devices anyway? That's the "optional" 
>>>>>>>>>> that Stefano
>>>>>>>>>> was suggesting, aiui.
>>>>>>>>>
>>>>>>>>> I agree in this case the parameter could be optional and only 
>>>>>>>>> required if not PCI device is assigned directly in the guest 
>>>>>>>>> configuration.
>>>>>>>>
>>>>>>>> Where will the ECAM region(s) appear on the guest physmap?
>>>>>>>>
>>>>>>>> Are you going to re-use the same locations as on the physical
>>>>>>>> hardware, or will they appear somewhere else?
>>>>>>>
>>>>>>> We will add some new definitions for the ECAM regions in the 
>>>>>>> guest physmap declared in xen (include/asm-arm/config.h)
>>>>>>
>>>>>> I think I'm confused, but that file doesn't contain anything related
>>>>>> to the guest physmap, that's the Xen virtual memory layout on Arm
>>>>>> AFAICT?
>>>>>>
>>>>>> Does this somehow relate to the physical memory map exposed to 
>>>>>> guests
>>>>>> on Arm?
>>>>>
>>>>> Yes it does.
>>>>> We will add new definitions there related to VPCI to reserve some 
>>>>> areas for the VPCI ECAM and the IOMEM areas.
>>>>
>>>> Yes, that's completely fine and is what's done on x86, but again I
>>>> feel like I'm lost here, this is the Xen virtual memory map, how does
>>>> this relate to the guest physical memory map?
>>>
>>> Sorry my bad, we will add values in include/public/arch-arm.h, wrong 
>>> header :-)
>>
>> Oh right, now I see it :).
>>
>> Do you really need to specify the ECAM and MMIO regions there?
>
> You need to define those values somewhere :). The layout is only 
> shared between the tools and the hypervisor. I think it would be 
> better if they are defined at the same place as the rest of the 
> layout, so it is easier to rework the layout.
>
> Cheers,


I would like to clarify which IOMMU driver changes should be done to
support PCI passthrough properly.

The design document mentions the SMMU, but Xen also supports the
IPMMU-VMSA (under tech preview now). It would be really nice if the
required support were extended to that kind of IOMMU as well.

Could you clarify what should be implemented in the Xen IOMMU driver in
order to support the PCI passthrough feature on Arm? Should the IOMMU
H/W be "PCI-aware" for that purpose?


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 15:55             ` Roger Pau Monné
@ 2020-07-18  9:49               ` Bertrand Marquis
  2020-07-20  8:45                 ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-18  9:49 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, nd, Rahul Singh, Stefano Stabellini, Julien Grall



> On 17 Jul 2020, at 17:55, Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> On Fri, Jul 17, 2020 at 03:21:57PM +0000, Bertrand Marquis wrote:
>>> On 17 Jul 2020, at 16:31, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>> On Fri, Jul 17, 2020 at 01:22:19PM +0000, Bertrand Marquis wrote:
>>>>> On 17 Jul 2020, at 13:16, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>>> * ACS capability is disable for ARM as of now as after enabling it
>>>>>> devices are not accessible.
>>>>>> * Dom0Less implementation will require to have the capacity inside Xen
>>>>>> to discover the PCI devices (without depending on Dom0 to declare them
>>>>>> to Xen).
>>>>> 
>>>>> I assume the firmware will properly initialize the host bridge and
>>>>> configure the resources for each device, so that Xen just has to walk
>>>>> the PCI space and find the devices.
>>>>> 
>>>>> TBH that would be my preferred method, because then you can get rid of
>>>>> the hypercall.
>>>>> 
>>>>> Is there anyway for Xen to know whether the host bridge is properly
>>>>> setup and thus the PCI bus can be scanned?
>>>>> 
>>>>> That way Arm could do something similar to x86, where Xen will scan
>>>>> the bus and discover devices, but you could still provide the
>>>>> hypercall in case the bus cannot be scanned by Xen (because it hasn't
>>>>> been setup).
>>>> 
>>>> That is definitely the idea to rely by default on a firmware doing this properly.
>>>> I am not sure wether a proper enumeration could be detected properly in all
>>>> cases so it would make sens to rely on Dom0 enumeration when a Xen
>>>> command line argument is passed as explained in one of Rahul’s mails.
>>> 
>>> I assume Linux somehow knows when it needs to initialize the PCI root
>>> complex before attempting to access the bus. Would it be possible to
>>> add this logic to Xen so it can figure out on it's own whether it's
>>> safe to scan the PCI bus or whether it needs to wait for the hardware
>>> domain to report the devices present?
>> 
>> That might be possible to do but will anyway require a command line argument
>> to be able to force xen to let the hardware domain do the initialization anyway in
>> case Xen detection does not work properly.
>> In the case where there is a Dom0 i would more expect that we let it do the initialization
>> all the time unless the user is telling using a command line argument that the current one
>> is correct and shall be used.
> 
> FRT, on x86 we let dom0 enumerate and probe the PCI devices as it
> feels like, but vPCI traps have already been set to all the detected
> devices, and vPCI already supports letting dom0 size the BARs, or even
> change it's position (theoretically, I haven't seen a dom0 change the
> position of the BARs yet).
> 
> So on Arm you could also let dom0 do all of this, the question is
> whether vPCI traps could be set earlier (when dom0 is created) if the
> PCI bus has been initialized and can be scanned.
> 
> I have no idea however how bare metal Linux on Arm figures out the
> state of the PCI bus, or if it's something that's passed on the DT, or
> signaled somehow from the firmware/bootloader.

This is definitely something we will check, and we will also try to keep the same
behaviour as x86 unless that turns out not to be possible. I do not see why we could
not set the vPCI traps earlier and simply relay the writes to the hardware while
detecting whether the BARs are changed.
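
To illustrate what "relaying the write while detecting BAR changes" could look
like, here is a minimal C sketch. Everything below (vpci_bar_state,
hw_pci_write32, the mask handling) is an invented stand-in rather than Xen's
actual vPCI code:

  #include <stdint.h>
  #include <stdbool.h>

  /* Stand-in for whatever primitive performs the real config-space write. */
  void hw_pci_write32(unsigned int sbdf, unsigned int reg, uint32_t val);

  struct vpci_bar_state {
      uint32_t value;    /* last value written by the guest */
      bool changed;      /* set once the guest moves the BAR */
  };

  /* Hypothetical trap handler for a 32-bit write to a memory BAR register. */
  static void bar_write_relay(unsigned int sbdf, unsigned int reg,
                              uint32_t val, struct vpci_bar_state *bar)
  {
      /* Bits 3:0 of a memory BAR are flags, so compare only the base address. */
      if ( (val & 0xfffffff0U) != (bar->value & 0xfffffff0U) )
          bar->changed = true;
      bar->value = val;

      /* Relay the access to the real device. */
      hw_pci_write32(sbdf, reg, val);
  }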

> 
>>>>> This should be limited to read-only accesses in order to be safe.
>>>>> 
>>>>> Emulating a PCI bridge in Xen using vPCI shouldn't be that
>>>>> complicated, so you could likely replace the real bridges with
>>>>> emulated ones. Or even provide a fake topology to the guest using an
>>>>> emulated bridge.
>>>> 
>>>> Just showing all bridges and keeping the hardware topology is the simplest
>>>> solution for now. But maybe showing a different topology and only fake
>>>> bridges could make sense and be implemented in the future.
>>> 
>>> Ack. I've also heard rumors of Xen on Arm people being very interested
>>> in VirtIO support, in which case you might expose both fully emulated
>>> VirtIO devices and PCI passthrough devices on the PCI bus, so it would
>>> be good to spend some time thinking how those will fit together.
>>> 
>>> Will you allocate a separate segment unused by hardware to expose the
>>> fully emulated PCI devices (VirtIO)?
>>> 
>>> Will OSes support having several segments?
>>> 
>>> If not you likely need to have emulated bridges so that you can adjust
>>> the bridge window accordingly to fit the passthrough and the emulated
>>> MMIO space, and likely be able to expose passthrough devices using a
>>> different topology than the host one.
>> 
>> Honestly this is not something we considered. I was more thinking that
>> this use case would be handled by creating an other VPCI bus dedicated
>> to those kind of devices instead of mixing physical and virtual devices.
> 
> Just mentioning it and your plans when guests might also have fully
> emulated devices on the PCI bus would be relevant I think.

We will add this.

> 
> Anyway, I don't think it's something mandatory here, as from a guest
> PoV how we expose PCI devices shouldn't matter that much, as long as
> it's done in a spec compliant way.
> 
> So you can start with this approach if it's easier, I just wanted to
> make sure you have in mind that at some point Arm guests might also
> require fully emulated PCI devices so that you don't paint yourselves
> in a corner.

That is definitely not something we had thought of, and thanks for the remark;
we will keep it in mind.

> 
>>> 
>>>>> 
>>>>>> 
>>>>>> # Emulated PCI device tree node in libxl:
>>>>>> 
>>>>>> Libxl is creating a virtual PCI device tree node in the device tree
>>>>>> to enable the guest OS to discover the virtual PCI during guest
>>>>>> boot. We introduced the new config option [vpci="pci_ecam"] for
>>>>>> guests. When this config option is enabled in a guest configuration,
>>>>>> a PCI device tree node will be created in the guest device tree.
>>>>>> 
>>>>>> A new area has been reserved in the arm guest physical map at which
>>>>>> the VPCI bus is declared in the device tree (reg and ranges
>>>>>> parameters of the node). A trap handler for the PCI ECAM access from
>>>>>> guest has been registered at the defined address and redirects
>>>>>> requests to the VPCI driver in Xen.
>>>>> 
>>>>> Can't you deduce the requirement of such DT node based on the presence
>>>>> of a 'pci=' option in the same config file?
>>>>> 
>>>>> Also I wouldn't discard that in the future you might want to use
>>>>> different emulators for different devices, so it might be helpful to
>>>>> introduce something like:
>>>>> 
>>>>> pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt', '0a:00.0,backend=qemu', ... ]
>>>>> 
>>>>> For the time being Arm will require backend=vpci for all the passed
>>>>> through devices, but I wouldn't rule out this changing in the future.
>>>> 
>>>> We need it for the case where no device is declared in the config file and the user
>>>> wants to add devices using xl later. In this case we must have the DT node for it
>>>> to work. 
>>> 
>>> There's a passthrough xl.cfg option for that already, so that if you
>>> don't want to add any PCI passthrough devices at creation time but
>>> rather hotplug them you can set:
>>> 
>>> passthrough=enabled
>>> 
>>> And it should setup the domain to be prepared to support hot
>>> passthrough, including the IOMMU [0].
>> 
>> Isn’t this option covering more then PCI passthrough ?
>> 
>> Lots of Arm platform do not have a PCI bus at all, so for those
>> creating a VPCI bus would be pointless. But you might need to
>> activate this to pass devices which are not on the PCI bus.
> 
> Well, you can check whether the host has PCI support and decide
> whether to attach a virtual PCI bus to the guest or not?
> 
> Setting passthrough=enabled should prepare the guest to handle
> passthrough, in whatever form is supported by the host IMO.

True, we could just say that we create a PCI bus if the host has one and
passthrough is activated.
But with the virtual device point you raised, we might even need one in a guest
running on hardware without PCI support :-)

> 
>>>>>> Limitation:
>>>>>> * Need to avoid the “iomem” and “irq” guest config
>>>>>> options and map the IOMEM region and IRQ at the same time when
>>>>>> device is assigned to the guest using the “pci” guest config options
>>>>>> when xl creates the domain.
>>>>>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped
>>>>>> address.
>>>>> 
>>>>> It was my understanding that you would identity map the BAR into the
>>>>> domU stage-2 translation, and that changes by the guest won't be
>>>>> allowed.
>>>> 
>>>> In fact this is not possible to do and we have to remap at a different address
>>>> because the guest physical mapping is fixed by Xen on Arm so we must follow
>>>> the same design otherwise this would only work if the BARs are pointing to an
>>>> address unused and on Juno this is for example conflicting with the guest
>>>> RAM address.
>>> 
>>> This was not clear from my reading of the document, could you please
>>> clarify on the next version that the guest physical memory map is
>>> always the same, and that BARs from PCI devices cannot be identity
>>> mapped to the stage-2 translation and instead are relocated somewhere
>>> else?
>> 
>> We will.
>> 
>>> 
>>> I'm then confused about what you do with bridge windows, do you also
>>> trap and adjust them to report a different IOMEM region?
>> 
>> Yes this is what we will have to do so that the regions reflect the VPCI mappings
>> and not the hardware one.
>> 
>>> 
>>> Above you mentioned that read-only access was given to bridge
>>> registers, but I guess some are also emulated in order to report
>>> matching IOMEM regions?
>> 
>> yes that’s exact. We will clear this in the next version.
> 
> If you have to go this route for domUs, it might be easier to just
> fake a PCI host bridge and place all the devices there even with
> different SBDF addresses. Having to replicate all the bridges on the
> physical PCI bus and fixing up it's MMIO windows seems much more
> complicated than just faking/emulating a single bridge?

That's definitely something we have to dig into more. The whole problem
of PCI enumeration and BAR value assignment in Xen might be pushed to
either Dom0 or the firmware, but we might in fact find ourselves with exactly
the same problem on the VPCI bus.

Bertrand

> 
> Roger.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI devices passthrough on Arm design proposal
  2020-07-17 16:05             ` Roger Pau Monné
@ 2020-07-18  9:55               ` Bertrand Marquis
  2020-07-18 11:14                 ` Julien Grall
  2020-07-18 11:32               ` Julien Grall
  1 sibling, 1 reply; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-18  9:55 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Rahul Singh, Julien Grall, Stefano Stabellini, xen-devel, nd,
	Julien Grall



> On 17 Jul 2020, at 18:05, Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> On Fri, Jul 17, 2020 at 03:47:25PM +0000, Bertrand Marquis wrote:
>>> On 17 Jul 2020, at 17:26, Julien Grall <julien@xen.org> wrote:
>>> On 17/07/2020 15:47, Bertrand Marquis wrote:
>>>>>>> * Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
>>>>>>> 
>>>>>>> # Enable the existing x86 virtual PCI support for ARM:
>>>>>>> 
>>>>>>> The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.
>>>>>>> 
>>>>>>> A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
>>>>>>> 
>>>>>>> Limitation:
>>>>>>> * No handler is register for the MSI configuration.
>>>>>>> * Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.
>>>>>> IIRC, legacy interrupt may be shared between two PCI devices. How do you plan to handle this on Arm?
>>>> We plan to fix this by adding proper support for MSI in the long term.
>>>> For the use case where MSI is not supported or not wanted we might have to find a way to forward the hardware interrupt to several guests to emulate some kind of shared interrupt.
>>> 
>>> Sharing interrupts are a bit pain because you couldn't take advantage of the direct EOI in HW and have to be careful if one guest doesn't EOI in timely maneer.
>>> 
>>> This is something I would rather avoid unless there is a real use case for it.
>> 
>> I would expect that most recent hardware will support MSI and this
>> will not be needed.
> 
> Well, PCI Express mandates MSI support, so while this is just a spec,
> I would expect most (if not all) devices to support MSI (or MSI-X), as
> Arm platforms haven't implemented legacy PCI anyway.

Yes, that's our assumption too. But we have to start somewhere, so MSI is
planned, but as a future step. I would think that supporting non-MSI interrupts,
if not impossible, will be a lot more complex due to the interrupt sharing.
I do think that not supporting non-MSI interrupts should be OK on Arm.

> 
>> When MSI is not used, the only solution would be to enforce that
>> devices assigned to different guest are using different interrupts
>> which would limit the number of domains being able to use PCI
>> devices on a bus to 4 (if the enumeration can be modified correctly
>> to assign the interrupts properly).
>> 
>> If we all agree that this is an acceptable limitation then we would
>> not need the “interrupt sharing”.
> 
> I might be easier to start by just supporting devices that have MSI
> (or MSI-X) and then move to legacy interrupts if required?

MSI support also requires some support in the interrupt controller part
on Arm, so there is some work to achieve that.

> 
> You should have most of the pieces you require already implemented
> since that's what x86 uses, and hence could reuse almost all of it?

Inside PCI probably but the GIC part will require some work.

> 
> IIRC Julien even said that Arm was likely to require much less traps
> than x86 for accesses to MSI and MSI-X since you could allow untrusted
> guests to write directly to the registers as there's another piece of
> hardware that would already translate the interrupts?

Yes this is definitely the case. The ITS part of the GIC interrupt controller
will help a lot and reduce the number of traps.

> 
> I think it's fine to use this workaround while you don't have MSI
> support in order to start testing and upstreaming stuff, but maybe
> that shouldn't be committed?

It was definitely not our plan to commit the code without MSI.
But as requested during the Xen Summit, we are trying to publish some code
for an RFC and a design early to get comments from the community, and we
are trying to do that with something working, even if only partially and with
lots of limitations.

Bertrand


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 19:17                                 ` Oleksandr
@ 2020-07-18  9:58                                   ` Bertrand Marquis
  2020-07-18 11:24                                   ` Julien Grall
  1 sibling, 0 replies; 62+ messages in thread
From: Bertrand Marquis @ 2020-07-18  9:58 UTC (permalink / raw)
  To: Oleksandr
  Cc: nd, Stefano Stabellini, Julien Grall, Julien Grall, Jan Beulich,
	xen-devel, Rahul Singh, Roger Pau Monné



> On 17 Jul 2020, at 21:17, Oleksandr <olekstysh@gmail.com> wrote:
> 
> 
> On 17.07.20 19:18, Julien Grall wrote:
> 
> Hello Bertrand
> 
> [two threads with the same name are shown in my mail client, so not completely sure I am asking in the correct one]
> 
>> 
>> 
>> On 17/07/2020 17:08, Roger Pau Monné wrote:
>>> On Fri, Jul 17, 2020 at 03:51:47PM +0000, Bertrand Marquis wrote:
>>>> 
>>>> 
>>>>> On 17 Jul 2020, at 17:30, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>> 
>>>>> On Fri, Jul 17, 2020 at 03:23:57PM +0000, Bertrand Marquis wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 17 Jul 2020, at 17:05, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>>>> 
>>>>>>> On Fri, Jul 17, 2020 at 02:49:20PM +0000, Bertrand Marquis wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 17 Jul 2020, at 16:41, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>>>>>> 
>>>>>>>>> On Fri, Jul 17, 2020 at 02:34:55PM +0000, Bertrand Marquis wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 17.07.2020 15:59, Bertrand Marquis wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 17.07.2020 15:14, Bertrand Marquis wrote:
>>>>>>>>>>>>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>>>>>> On 16.07.2020 19:10, Rahul Singh wrote:
>>>>>>>>>>>>>>>> # Emulated PCI device tree node in libxl:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
>>>>>>>>>>>>>>> there to be no need for it when there are PCI devices assigned to the
>>>>>>>>>>>>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
>>>>>>>>>>>>>>> vpci="ecam" as unambiguous?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm afraid I don't understand: When there are no PCI device that get
>>>>>>>>>>>>> handed to a guest when it gets created, but it is supposed to be able
>>>>>>>>>>>>> to have some assigned while already running, then we agree the option
>>>>>>>>>>>>> is needed (afaict). When PCI devices get handed to the guest while it
>>>>>>>>>>>>> gets constructed, where's the problem to infer this option from the
>>>>>>>>>>>>> presence of PCI devices in the guest configuration?
>>>>>>>>>>>> 
>>>>>>>>>>>> If the user wants to use xl pci-attach to attach in runtime a device to a guest, this guest must have a VPCI bus (even with no devices).
>>>>>>>>>>>> If we do not have the vpci parameter in the configuration this use case will not work anymore.
>>>>>>>>>>> 
>>>>>>>>>>> That's what everyone looks to agree with. Yet why is the parameter needed
>>>>>>>>>>> when there _are_ PCI devices anyway? That's the "optional" that Stefano
>>>>>>>>>>> was suggesting, aiui.
>>>>>>>>>> 
>>>>>>>>>> I agree in this case the parameter could be optional and only required if not PCI device is assigned directly in the guest configuration.
>>>>>>>>> 
>>>>>>>>> Where will the ECAM region(s) appear on the guest physmap?
>>>>>>>>> 
>>>>>>>>> Are you going to re-use the same locations as on the physical
>>>>>>>>> hardware, or will they appear somewhere else?
>>>>>>>> 
>>>>>>>> We will add some new definitions for the ECAM regions in the guest physmap declared in xen (include/asm-arm/config.h)
>>>>>>> 
>>>>>>> I think I'm confused, but that file doesn't contain anything related
>>>>>>> to the guest physmap, that's the Xen virtual memory layout on Arm
>>>>>>> AFAICT?
>>>>>>> 
>>>>>>> Does this somehow relate to the physical memory map exposed to guests
>>>>>>> on Arm?
>>>>>> 
>>>>>> Yes it does.
>>>>>> We will add new definitions there related to VPCI to reserve some areas for the VPCI ECAM and the IOMEM areas.
>>>>> 
>>>>> Yes, that's completely fine and is what's done on x86, but again I
>>>>> feel like I'm lost here, this is the Xen virtual memory map, how does
>>>>> this relate to the guest physical memory map?
>>>> 
>>>> Sorry my bad, we will add values in include/public/arch-arm.h, wrong header :-)
>>> 
>>> Oh right, now I see it :).
>>> 
>>> Do you really need to specify the ECAM and MMIO regions there?
>> 
>> You need to define those values somewhere :). The layout is only shared between the tools and the hypervisor. I think it would be better if they are defined at the same place as the rest of the layout, so it is easier to rework the layout.
>> 
>> Cheers,
> 
> 
> I would like to clarify regarding an IOMMU driver changes which should be done to support PCI pass-through properly.
> 
> Design document mentions about SMMU, but Xen also supports IPMMU-VMSA (under tech preview now). It would be really nice if the required support is extended to that kind of IOMMU as well.

We will try to make the code as generic as possible. For now the SMMU is the only hardware we have (and it is a standard Arm one), so we will start with that.
But we welcome others to improve it and add support for other hardware.

> 
> May I clarify what should be implemented in the Xen driver in order to support PCI pass-through feature on Arm? Should the IOMMU H/W be "PCI-aware" for that purpose?

We are not yet at the SMMU implementation part, but it should be our next step.
Feel free to explain to us what would be required so that we can take that into account.

Regards
Bertrand


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI devices passthrough on Arm design proposal
  2020-07-17 15:47           ` Bertrand Marquis
  2020-07-17 16:05             ` Roger Pau Monné
@ 2020-07-18 11:08             ` Julien Grall
  2020-07-20 11:26               ` Rahul Singh
  1 sibling, 1 reply; 62+ messages in thread
From: Julien Grall @ 2020-07-18 11:08 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Roger Pau Monné,
	Stefano Stabellini, xen-devel, nd, Julien Grall

Hi,

On 17/07/2020 16:47, Bertrand Marquis wrote:
>> On 17 Jul 2020, at 17:26, Julien Grall <julien@xen.org> wrote:
>> On 17/07/2020 15:47, Bertrand Marquis wrote:
>>>>>>      pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
>>>>>>
>>>>>> Guest will be only able to access the assigned devices and see the bridges. Guest will not be able to access or see the devices that are no assigned to him.
>>>>>>
>>>>>> Limitation:
>>>>>> * As of now all the bridges in the PCI bus are seen by the guest on the VPCI bus.
>>>>> Why do you want to expose all the bridges to a guest? Does this mean that the BDF should always match between the host and the guest?
>>> That’s not really something that we wanted but this was the easiest way to go.
>>> As said in a previous mail we could build a VPCI bus with a completely different topology but I am not sure of the advantages this would have.
>>> Do you see some reason to do this ?
>>
>> Yes :):
>>   1) If a platform has two host controllers (IIRC Thunder-X has it) then you would need to expose two host controllers to your guest. I think this is undesirable if your guest is only using a couple of PCI devices on each host controllers.
>>   2) In the case of migration (live or not), you may want to use a difference PCI card on the target platform. So your BDF and bridges may be different.
>>
>> Therefore I think the virtual topology can be beneficial.
> 
> I would see a big advantage definitely to have only one VPCI bus per guest and put all devices in their independently of the hardware domain the device is on.
> But this will probably make the VPCI BARs value computation a bit more complex as we might end up with no space on the guest physical map for it.
> This might make the implementation a lot more complex.

I am not sure I understand your argument about the space... You should
be able to find out the size of each BAR, so you can size the MMIO
window correctly. This shouldn't add a lot of complexity.
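
For illustration, a minimal sketch of sizing such a window from known BAR
sizes (assuming power-of-two BAR sizes and ignoring the prefetchable vs
non-prefetchable split):

  #include <stdint.h>

  /* Each BAR must be placed at an address aligned to its own size. */
  static uint64_t mmio_window_size(const uint64_t *bar_size, unsigned int nr)
  {
      uint64_t total = 0;

      for ( unsigned int i = 0; i < nr; i++ )
      {
          total = (total + bar_size[i] - 1) & ~(bar_size[i] - 1);
          total += bar_size[i];
      }

      return total;
  }

In practice the BARs would also be sorted by decreasing size to minimise
padding, but the principle is the same.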

I am not asking for any implementation of this, but we need to make sure
the design can easily be extended to other use cases. In the server case,
we will likely want to expose a single vPCI bus to the guest.

>>
>>>>>     - Is there any memory access that can bypassed the IOMMU (e.g doorbell)?
>>> This is still something to be investigated as part of the MSI implementation.
>>> If you have any idea here, feel free to tell us.
>>
>> My memory is a bit fuzzy here. I am sure that the doorbell can bypass the IOMMU on some platform, but I also vaguely remember that accesses to the PCI host controller memory window may also bypass the IOMMU. A good reading might be [2].
>>
>> IIRC, I came to the conclusion that we may want to use the host memory map in the guest when using the PCI passthrough. But maybe not on all the platforms.
> 
> Definitely a lot of this would be easier if could use 1:1 mapping.
> We will keep that in mind when we will start to investigate on the MSI part.

Hmmm... Maybe I wasn't clear enough, but the problem does not only
happen with MSI doorbells. It also happens with P2P transactions.

Again, I am not asking to implement it at the beginning. However, it 
would be good to outline the potential limitations of the approach in 
your design.

Cheers,


-- 
Julien Grall


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI devices passthrough on Arm design proposal
  2020-07-18  9:55               ` Bertrand Marquis
@ 2020-07-18 11:14                 ` Julien Grall
  2020-07-20 11:32                   ` Rahul Singh
  0 siblings, 1 reply; 62+ messages in thread
From: Julien Grall @ 2020-07-18 11:14 UTC (permalink / raw)
  To: Bertrand Marquis, Roger Pau Monné
  Cc: xen-devel, nd, Rahul Singh, Stefano Stabellini, Julien Grall



On 18/07/2020 10:55, Bertrand Marquis wrote:
> 
> 
>> On 17 Jul 2020, at 18:05, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>
>> On Fri, Jul 17, 2020 at 03:47:25PM +0000, Bertrand Marquis wrote:
>>>> On 17 Jul 2020, at 17:26, Julien Grall <julien@xen.org> wrote:
>>>> On 17/07/2020 15:47, Bertrand Marquis wrote:
>>>>>>>> * Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
>>>>>>>>
>>>>>>>> # Enable the existing x86 virtual PCI support for ARM:
>>>>>>>>
>>>>>>>> The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.
>>>>>>>>
>>>>>>>> A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
>>>>>>>>
>>>>>>>> Limitation:
>>>>>>>> * No handler is register for the MSI configuration.
>>>>>>>> * Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.
>>>>>>> IIRC, legacy interrupt may be shared between two PCI devices. How do you plan to handle this on Arm?
>>>>> We plan to fix this by adding proper support for MSI in the long term.
>>>>> For the use case where MSI is not supported or not wanted we might have to find a way to forward the hardware interrupt to several guests to emulate some kind of shared interrupt.
>>>>
>>>> Sharing interrupts are a bit pain because you couldn't take advantage of the direct EOI in HW and have to be careful if one guest doesn't EOI in timely maneer.
>>>>
>>>> This is something I would rather avoid unless there is a real use case for it.
>>>
>>> I would expect that most recent hardware will support MSI and this
>>> will not be needed.
>>
>> Well, PCI Express mandates MSI support, so while this is just a spec,
>> I would expect most (if not all) devices to support MSI (or MSI-X), as
>> Arm platforms haven't implemented legacy PCI anyway.
> 
> Yes that’s our assumption to. But we have to start somewhere so MSI is
> planned but in a future step. I would think that supporting non MSI if not
> impossible will be a lot more complex due to the interrupt sharing.
> I do think that not supporting non MSI should be ok on Arm.
> 
>>
>>> When MSI is not used, the only solution would be to enforce that
>>> devices assigned to different guest are using different interrupts
>>> which would limit the number of domains being able to use PCI
>>> devices on a bus to 4 (if the enumeration can be modified correctly
>>> to assign the interrupts properly).
>>>
>>> If we all agree that this is an acceptable limitation then we would
>>> not need the “interrupt sharing”.
>>
>> I might be easier to start by just supporting devices that have MSI
>> (or MSI-X) and then move to legacy interrupts if required?
> 
> MSI support requires also some support in the interrupt controller part
> on arm. So there is some work to achieve that.
> 
>>
>> You should have most of the pieces you require already implemented
>> since that's what x86 uses, and hence could reuse almost all of it?
> 
> Inside PCI probably but the GIC part will require some work.

We already have an ITS implementation in Xen. This is required in order
to use PCI devices in DOM0 on Thunder-X (legacy interrupts are not
supported there).

It hasn't been exposed to guests yet because we didn't fully investigate
the security aspects of the implementation. However, for a tech preview
this should be sufficient.


-- 
Julien Grall



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 19:17                                 ` Oleksandr
  2020-07-18  9:58                                   ` Bertrand Marquis
@ 2020-07-18 11:24                                   ` Julien Grall
  2020-07-20 11:27                                     ` Oleksandr
  1 sibling, 1 reply; 62+ messages in thread
From: Julien Grall @ 2020-07-18 11:24 UTC (permalink / raw)
  To: Oleksandr, Bertrand Marquis
  Cc: Rahul Singh, Julien Grall, Stefano Stabellini, Jan Beulich,
	xen-devel, nd, Roger Pau Monné



On 17/07/2020 20:17, Oleksandr wrote:
> I would like to clarify regarding an IOMMU driver changes which should 
> be done to support PCI pass-through properly.
> 
> Design document mentions about SMMU, but Xen also supports IPMMU-VMSA 
> (under tech preview now). It would be really nice if the required 
> support is extended to that kind of IOMMU as well.
> 
> May I clarify what should be implemented in the Xen driver in order to 
> support PCI pass-through feature on Arm? 

I would expect callbacks to:
     - add a PCI device
     - remove a PCI device
     - assign a PCI device
     - deassign a PCI device

AFAICT, they already exist, so it is a matter of plumbing. It would then
be up to the driver to configure the IOMMU correctly.
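
Roughly, the shape of that plumbing would be something like the sketch
below; the field names and prototypes are illustrative rather than copied
from xen/include/xen/iommu.h, so the real structure may differ:

  #include <stdint.h>

  struct domain;
  struct device;

  /* Illustrative per-IOMMU callbacks for PCI device life-cycle events. */
  struct iommu_pci_ops {
      int (*add_device)(uint8_t devfn, struct device *dev);
      int (*remove_device)(uint8_t devfn, struct device *dev);
      int (*assign_device)(struct domain *d, uint8_t devfn,
                           struct device *dev, uint32_t flag);
      int (*reassign_device)(struct domain *from, struct domain *to,
                             uint8_t devfn, struct device *dev);
  };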

> Should the IOMMU H/W be 
> "PCI-aware" for that purpose?

The only requirement is that your PCI devices are behind an IOMMU :).
Other than that, the IOMMU can mostly be configured the same way as you
would for non-PCI devices. The main difference would be how you
find the master ID.

I am aware that on some platforms the master ID may be shared between
multiple PCI devices. In that case, we would need a way to
assign all the devices to the same guest (maybe using groups?).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI devices passthrough on Arm design proposal
  2020-07-17 16:05             ` Roger Pau Monné
  2020-07-18  9:55               ` Bertrand Marquis
@ 2020-07-18 11:32               ` Julien Grall
  1 sibling, 0 replies; 62+ messages in thread
From: Julien Grall @ 2020-07-18 11:32 UTC (permalink / raw)
  To: Roger Pau Monné, Bertrand Marquis
  Cc: xen-devel, nd, Rahul Singh, Stefano Stabellini, Julien Grall

Hi,

On 17/07/2020 17:05, Roger Pau Monné wrote:
> IIRC Julien even said that Arm was likely to require much less traps
> than x86 for accesses to MSI and MSI-X since you could allow untrusted
> guests to write directly to the registers as there's another piece of
> hardware that would already translate the interrupts?

This is correct in the case of the ITS. This is because the hardware
will tag the message with the DeviceID. So there is no way to spoof it.

However, this may not be the case of other MSI controllers. For 
instance, in the case of the GICv2m, I think we will need to trap and 
sanitize the MSI message (see [1]).

[1] https://www.linaro.org/blog/kvm-pciemsi-passthrough-armarm64/

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-18  9:49               ` Bertrand Marquis
@ 2020-07-20  8:45                 ` Roger Pau Monné
  0 siblings, 0 replies; 62+ messages in thread
From: Roger Pau Monné @ 2020-07-20  8:45 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: xen-devel, nd, Rahul Singh, Stefano Stabellini, Julien Grall

On Sat, Jul 18, 2020 at 09:49:43AM +0000, Bertrand Marquis wrote:
> 
> 
> > On 17 Jul 2020, at 17:55, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > 
> > On Fri, Jul 17, 2020 at 03:21:57PM +0000, Bertrand Marquis wrote:
> >>> On 17 Jul 2020, at 16:31, Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>> On Fri, Jul 17, 2020 at 01:22:19PM +0000, Bertrand Marquis wrote:
> >>>>> On 17 Jul 2020, at 13:16, Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>>>> # Emulated PCI device tree node in libxl:
> >>>>>> 
> >>>>>> Libxl is creating a virtual PCI device tree node in the device tree
> >>>>>> to enable the guest OS to discover the virtual PCI during guest
> >>>>>> boot. We introduced the new config option [vpci="pci_ecam"] for
> >>>>>> guests. When this config option is enabled in a guest configuration,
> >>>>>> a PCI device tree node will be created in the guest device tree.
> >>>>>> 
> >>>>>> A new area has been reserved in the arm guest physical map at which
> >>>>>> the VPCI bus is declared in the device tree (reg and ranges
> >>>>>> parameters of the node). A trap handler for the PCI ECAM access from
> >>>>>> guest has been registered at the defined address and redirects
> >>>>>> requests to the VPCI driver in Xen.
> >>>>> 
> >>>>> Can't you deduce the requirement of such DT node based on the presence
> >>>>> of a 'pci=' option in the same config file?
> >>>>> 
> >>>>> Also I wouldn't discard that in the future you might want to use
> >>>>> different emulators for different devices, so it might be helpful to
> >>>>> introduce something like:
> >>>>> 
> >>>>> pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt', '0a:00.0,backend=qemu', ... ]
> >>>>> 
> >>>>> For the time being Arm will require backend=vpci for all the passed
> >>>>> through devices, but I wouldn't rule out this changing in the future.
> >>>> 
> >>>> We need it for the case where no device is declared in the config file and the user
> >>>> wants to add devices using xl later. In this case we must have the DT node for it
> >>>> to work. 
> >>> 
> >>> There's a passthrough xl.cfg option for that already, so that if you
> >>> don't want to add any PCI passthrough devices at creation time but
> >>> rather hotplug them you can set:
> >>> 
> >>> passthrough=enabled
> >>> 
> >>> And it should setup the domain to be prepared to support hot
> >>> passthrough, including the IOMMU [0].
> >> 
> >> Isn’t this option covering more then PCI passthrough ?
> >> 
> >> Lots of Arm platform do not have a PCI bus at all, so for those
> >> creating a VPCI bus would be pointless. But you might need to
> >> activate this to pass devices which are not on the PCI bus.
> > 
> > Well, you can check whether the host has PCI support and decide
> > whether to attach a virtual PCI bus to the guest or not?
> > 
> > Setting passthrough=enabled should prepare the guest to handle
> > passthrough, in whatever form is supported by the host IMO.
> 
> True, we could just say that we create a PCI bus if the host has one and
> passthrough is activated.
> But with virtual device point, we might even need one on guest without
> PCI support on the hardware :-)

Sure, but at that point you might want to consider unconditionally
adding an emulated PCI bus to guests anyway.

You will always have time to add new options to xl, but I would start
by trying to make use of the existing ones.

Are you planning to add the logic in Xen to enable hot-plug of devices
right away?

If the implementation hasn't been considered yet I wouldn't mind
leaving all this for later and just focusing on non-hotplug
passthrough using pci = [ ... ] for the time being.

> > 
> >>>>>> Limitation:
> >>>>>> * Need to avoid the “iomem” and “irq” guest config
> >>>>>> options and map the IOMEM region and IRQ at the same time when
> >>>>>> device is assigned to the guest using the “pci” guest config options
> >>>>>> when xl creates the domain.
> >>>>>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped
> >>>>>> address.
> >>>>> 
> >>>>> It was my understanding that you would identity map the BAR into the
> >>>>> domU stage-2 translation, and that changes by the guest won't be
> >>>>> allowed.
> >>>> 
> >>>> In fact this is not possible to do and we have to remap at a different address
> >>>> because the guest physical mapping is fixed by Xen on Arm so we must follow
> >>>> the same design otherwise this would only work if the BARs are pointing to an
> >>>> address unused and on Juno this is for example conflicting with the guest
> >>>> RAM address.
> >>> 
> >>> This was not clear from my reading of the document, could you please
> >>> clarify on the next version that the guest physical memory map is
> >>> always the same, and that BARs from PCI devices cannot be identity
> >>> mapped to the stage-2 translation and instead are relocated somewhere
> >>> else?
> >> 
> >> We will.
> >> 
> >>> 
> >>> I'm then confused about what you do with bridge windows, do you also
> >>> trap and adjust them to report a different IOMEM region?
> >> 
> >> Yes this is what we will have to do so that the regions reflect the VPCI mappings
> >> and not the hardware one.
> >> 
> >>> 
> >>> Above you mentioned that read-only access was given to bridge
> >>> registers, but I guess some are also emulated in order to report
> >>> matching IOMEM regions?
> >> 
> >> yes that’s exact. We will clear this in the next version.
> > 
> > If you have to go this route for domUs, it might be easier to just
> > fake a PCI host bridge and place all the devices there even with
> > different SBDF addresses. Having to replicate all the bridges on the
> > physical PCI bus and fixing up it's MMIO windows seems much more
> > complicated than just faking/emulating a single bridge?
> 
> That’s definitely something we have to dig more on. The whole problematic
> of PCI enumeration and BAR value assignation in Xen might be pushed to
> either Dom0 or the firmware but we might in fact find ourself with exactly the
> same problem on the VPCI bus.

Not really, in order for Xen to do passthrough to a guest it must know
the SBDF of a device, the resources it's using and the memory map of
the guest, or else passthrough can't be done.

At that point Xen has the whole picture and can decide where the
resources of the device should appear on the stage-2 translation, and
hence the IOMEM windows required on the bridge(s).
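
As a purely illustrative sketch of that flow (the helpers below are
invented; the real Xen primitives for stage-2 mappings would differ):

  #include <stdint.h>

  struct domain;
  typedef uint64_t paddr_t;

  /* Hypothetical helpers: allocate space in the guest's vPCI MMIO window
   * and map host MMIO into the guest's stage-2 translation. */
  paddr_t vpci_mmio_alloc(struct domain *d, uint64_t size);
  int guest_map_mmio(struct domain *d, paddr_t gaddr, paddr_t maddr,
                     uint64_t size);

  /* Pick a guest address for a passed-through BAR and map it there. */
  static int expose_bar_to_guest(struct domain *d, paddr_t host_addr,
                                 uint64_t size, paddr_t *guest_addr)
  {
      *guest_addr = vpci_mmio_alloc(d, size);
      if ( !*guest_addr )
          return -1;      /* no space left in the window */

      return guest_map_mmio(d, *guest_addr, host_addr, size);
  }

The emulated bridge window(s) then only need to cover the range handed out
by the allocator, not the host bridge windows.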

What I'm trying to say is that I'm not convinced that exposing all the
host PCI bridges with adjusted IOMEM windows is easier than just
completely faking (and emulating) a PCI bridge inside of Xen.

Roger.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 16:18                               ` Julien Grall
  2020-07-17 19:17                                 ` Oleksandr
@ 2020-07-20  8:47                                 ` Roger Pau Monné
  2020-07-20  9:24                                   ` Julien Grall
  1 sibling, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2020-07-20  8:47 UTC (permalink / raw)
  To: Julien Grall
  Cc: Rahul Singh, Bertrand Marquis, Stefano Stabellini, Jan Beulich,
	xen-devel, nd, Julien Grall

On Fri, Jul 17, 2020 at 05:18:46PM +0100, Julien Grall wrote:
> 
> 
> On 17/07/2020 17:08, Roger Pau Monné wrote:
> > On Fri, Jul 17, 2020 at 03:51:47PM +0000, Bertrand Marquis wrote:
> > > 
> > > 
> > > > On 17 Jul 2020, at 17:30, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > 
> > > > On Fri, Jul 17, 2020 at 03:23:57PM +0000, Bertrand Marquis wrote:
> > > > > 
> > > > > 
> > > > > > On 17 Jul 2020, at 17:05, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > > > 
> > > > > > On Fri, Jul 17, 2020 at 02:49:20PM +0000, Bertrand Marquis wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > > On 17 Jul 2020, at 16:41, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > > > > > 
> > > > > > > > On Fri, Jul 17, 2020 at 02:34:55PM +0000, Bertrand Marquis wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > On 17.07.2020 15:59, Bertrand Marquis wrote:
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > > On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > On 17.07.2020 15:14, Bertrand Marquis wrote:
> > > > > > > > > > > > > > On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
> > > > > > > > > > > > > > On 16.07.2020 19:10, Rahul Singh wrote:
> > > > > > > > > > > > > > > # Emulated PCI device tree node in libxl:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I support Stefano's suggestion for this to be an optional thing, i.e.
> > > > > > > > > > > > > > there to be no need for it when there are PCI devices assigned to the
> > > > > > > > > > > > > > guest anyway. I also wonder about the pci_ prefix here - isn't
> > > > > > > > > > > > > > vpci="ecam" as unambiguous?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl.
> > > > > > > > > > > > 
> > > > > > > > > > > > I'm afraid I don't understand: When there are no PCI device that get
> > > > > > > > > > > > handed to a guest when it gets created, but it is supposed to be able
> > > > > > > > > > > > to have some assigned while already running, then we agree the option
> > > > > > > > > > > > is needed (afaict). When PCI devices get handed to the guest while it
> > > > > > > > > > > > gets constructed, where's the problem to infer this option from the
> > > > > > > > > > > > presence of PCI devices in the guest configuration?
> > > > > > > > > > > 
> > > > > > > > > > > If the user wants to use xl pci-attach to attach in runtime a device to a guest, this guest must have a VPCI bus (even with no devices).
> > > > > > > > > > > If we do not have the vpci parameter in the configuration this use case will not work anymore.
> > > > > > > > > > 
> > > > > > > > > > That's what everyone looks to agree with. Yet why is the parameter needed
> > > > > > > > > > when there _are_ PCI devices anyway? That's the "optional" that Stefano
> > > > > > > > > > was suggesting, aiui.
> > > > > > > > > 
> > > > > > > > > I agree in this case the parameter could be optional and only required if not PCI device is assigned directly in the guest configuration.
> > > > > > > > 
> > > > > > > > Where will the ECAM region(s) appear on the guest physmap?
> > > > > > > > 
> > > > > > > > Are you going to re-use the same locations as on the physical
> > > > > > > > hardware, or will they appear somewhere else?
> > > > > > > 
> > > > > > > We will add some new definitions for the ECAM regions in the guest physmap declared in xen (include/asm-arm/config.h)
> > > > > > 
> > > > > > I think I'm confused, but that file doesn't contain anything related
> > > > > > to the guest physmap, that's the Xen virtual memory layout on Arm
> > > > > > AFAICT?
> > > > > > 
> > > > > > Does this somehow relate to the physical memory map exposed to guests
> > > > > > on Arm?
> > > > > 
> > > > > Yes it does.
> > > > > We will add new definitions there related to VPCI to reserve some areas for the VPCI ECAM and the IOMEM areas.
> > > > 
> > > > Yes, that's completely fine and is what's done on x86, but again I
> > > > feel like I'm lost here, this is the Xen virtual memory map, how does
> > > > this relate to the guest physical memory map?
> > > 
> > > Sorry my bad, we will add values in include/public/arch-arm.h, wrong header :-)
> > 
> > Oh right, now I see it :).
> > 
> > Do you really need to specify the ECAM and MMIO regions there?
> 
> You need to define those values somewhere :). The layout is only shared
> between the tools and the hypervisor. I think it would be better if they are
> defined at the same place as the rest of the layout, so it is easier to
> rework the layout.

OK, that's certainly a different approach from what x86 uses, where
the guest memory layout is not defined in the public headers.

On x86 my plan would be to add an hypercall that would set the
position of the ECAM region in the guest physmap, and that would be
called by the toolstack during domain construction.
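
Something along those lines, purely hypothetical since no such interface
exists today; the name and layout below are invented for illustration only:

  #include <stdint.h>

  /* Hypothetical toolstack-to-Xen interface fixing the guest ECAM location. */
  struct xen_domctl_set_vpci_ecam {
      uint64_t gpa;     /* guest physical base address of the ECAM window */
      uint64_t size;    /* size of the window covering the emulated segment */
  };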

Roger.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-20  8:47                                 ` Roger Pau Monné
@ 2020-07-20  9:24                                   ` Julien Grall
  0 siblings, 0 replies; 62+ messages in thread
From: Julien Grall @ 2020-07-20  9:24 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Rahul Singh, Bertrand Marquis, Stefano Stabellini, Jan Beulich,
	xen-devel, nd, Julien Grall

Hi Roger,

On 20/07/2020 09:47, Roger Pau Monné wrote:
> On Fri, Jul 17, 2020 at 05:18:46PM +0100, Julien Grall wrote:
>>> Do you really need to specify the ECAM and MMIO regions there?
>>
>> You need to define those values somewhere :). The layout is only shared
>> between the tools and the hypervisor. I think it would be better if they are
>> defined at the same place as the rest of the layout, so it is easier to
>> rework the layout.
> 
> OK, that's certainly a different approach from what x86 uses, where
> the guest memory layout is not defined in the public headers.

It is mostly a convenience as some addresses are used by both the 
hypervisor and tools. A guest should use the firmware tables (ACPI/DT) 
to detect the MMIO regions.

> 
> On x86 my plan would be to add an hypercall that would set the
> position of the ECAM region in the guest physmap, and that would be
> called by the toolstack during domain construction.

It would be possible to use the same approach on Arm so the hypervisor
doesn't need hardcoded values for the ECAM.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI devices passthrough on Arm design proposal
  2020-07-18 11:08             ` Julien Grall
@ 2020-07-20 11:26               ` Rahul Singh
  0 siblings, 0 replies; 62+ messages in thread
From: Rahul Singh @ 2020-07-20 11:26 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, xen-devel,
	nd, Roger Pau Monné



> On 18 Jul 2020, at 12:08 pm, Julien Grall <julien@xen.org> wrote:
> 
> Hi,
> 
> On 17/07/2020 16:47, Bertrand Marquis wrote:
>>> On 17 Jul 2020, at 17:26, Julien Grall <julien@xen.org> wrote:
>>> On 17/07/2020 15:47, Bertrand Marquis wrote:
>>>>>>>     pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
>>>>>>> 
>>>>>>> Guest will be only able to access the assigned devices and see the bridges. Guest will not be able to access or see the devices that are no assigned to him.
>>>>>>> 
>>>>>>> Limitation:
>>>>>>> * As of now all the bridges in the PCI bus are seen by the guest on the VPCI bus.
>>>>>> Why do you want to expose all the bridges to a guest? Does this mean that the BDF should always match between the host and the guest?
>>>> That’s not really something that we wanted but this was the easiest way to go.
>>>> As said in a previous mail we could build a VPCI bus with a completely different topology but I am not sure of the advantages this would have.
>>>> Do you see some reason to do this ?
>>> 
>>> Yes :):
>>>  1) If a platform has two host controllers (IIRC Thunder-X has it) then you would need to expose two host controllers to your guest. I think this is undesirable if your guest is only using a couple of PCI devices on each host controllers.
>>>  2) In the case of migration (live or not), you may want to use a difference PCI card on the target platform. So your BDF and bridges may be different.
>>> 
>>> Therefore I think the virtual topology can be beneficial.
>> I would see a big advantage definitely to have only one VPCI bus per guest and put all devices in their independently of the hardware domain the device is on.
>> But this will probably make the VPCI BARs value computation a bit more complex as we might end up with no space on the guest physical map for it.
>> This might make the implementation a lot more complex.
> 
> I am not sure to understand your argument about the space... You should be able to find out the size of each BARs, so you can size the MMIO window correctly. This shouldn't add a lot of complexity.
> 
> I am not asking any implementation for this, but we need to make sure the design can easily be extended for other use cases. In the case of server, we will likely want to expose a single vPCI to the guest.

This is something we still have to work out: how to implement the virtual topology for the guest.

> 
>>> 
>>>>>>    - Is there any memory access that can bypassed the IOMMU (e.g doorbell)?
>>>> This is still something to be investigated as part of the MSI implementation.
>>>> If you have any idea here, feel free to tell us.
>>> 
>>> My memory is a bit fuzzy here. I am sure that the doorbell can bypass the IOMMU on some platform, but I also vaguely remember that accesses to the PCI host controller memory window may also bypass the IOMMU. A good reading might be [2].
>>> 
>>> IIRC, I came to the conclusion that we may want to use the host memory map in the guest when using the PCI passthrough. But maybe not on all the platforms.
>> Definitely a lot of this would be easier if could use 1:1 mapping.
>> We will keep that in mind when we will start to investigate on the MSI part.
> 
> Hmmm... Maybe I wasn't clear enough but the problem is not only happening with MSIs doorbells. It is also with the P2P transactions.
> 
> Again, I am not asking to implement it at the beginning. However, it would be good to outline the potential limitations of the approach in your design.

As Bertrand mentioned, once we start investigating MSI support we will keep this in mind.
> 
> Cheers,
> 
> 
> -- 
> Julien Grall


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-18 11:24                                   ` Julien Grall
@ 2020-07-20 11:27                                     ` Oleksandr
  0 siblings, 0 replies; 62+ messages in thread
From: Oleksandr @ 2020-07-20 11:27 UTC (permalink / raw)
  To: Julien Grall
  Cc: nd, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Jan Beulich, xen-devel, Rahul Singh, Roger Pau Monné


On 18.07.20 14:24, Julien Grall wrote:

Hello Julien

>
>
> On 17/07/2020 20:17, Oleksandr wrote:
>> I would like to clarify regarding an IOMMU driver changes which 
>> should be done to support PCI pass-through properly.
>>
>> Design document mentions about SMMU, but Xen also supports IPMMU-VMSA 
>> (under tech preview now). It would be really nice if the required 
>> support is extended to that kind of IOMMU as well.
>>
>> May I clarify what should be implemented in the Xen driver in order 
>> to support PCI pass-through feature on Arm? 
>
> I would expect callbacks to:
>     - add a PCI device
>     - remove a PCI device
>     - assign a PCI device
>     - deassign a PCI device
>
> AFAICT, they are already existing. So it is a matter of plumbing. This 
> would then be up to the driver to configure the IOMMU correctly.


Got it.

>
>> Should the IOMMU H/W be "PCI-aware" for that purpose?
>
> The only requirement is that your PCI devices are behind an IOMMU :). 
> Other than that the IOMMU can mostly be configured the same way as you 
> would do for the non-PCI devices. 

That's good.


> The main difference would be how you find the master ID.
>
> I am aware that on some platforms, the masterID may be shared between 
> multiple PCI devices. In that case, we would need to have a way to 
> assign all the devices to the same guest (maybe using group?).

Or just prevent these devices from being assigned to different guests?
When assigning a device to a newly created guest, check whether the masterID
is already in use by any existing guest and deny the operation in that
case.
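
A minimal sketch of that check, with invented names (the real code would
walk Xen's own list of assigned PCI devices):

  #include <stdint.h>
  #include <stddef.h>

  struct domain;

  /* Illustrative record of an already-assigned PCI device. */
  struct assigned_dev {
      uint32_t master_id;      /* stream/requester ID seen by the IOMMU */
      struct domain *owner;    /* guest the device is assigned to */
  };

  /* Return 0 if 'master_id' is free for domain 'd', or -1 if another
   * guest already owns a device with the same (shared) master ID. */
  static int check_master_id(const struct domain *d, uint32_t master_id,
                             const struct assigned_dev *devs, size_t n)
  {
      for ( size_t i = 0; i < n; i++ )
          if ( devs[i].master_id == master_id && devs[i].owner != d )
              return -1;

      return 0;
  }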


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI devices passthrough on Arm design proposal
  2020-07-18 11:14                 ` Julien Grall
@ 2020-07-20 11:32                   ` Rahul Singh
  0 siblings, 0 replies; 62+ messages in thread
From: Rahul Singh @ 2020-07-20 11:32 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, xen-devel,
	nd, Roger Pau Monné



> On 18 Jul 2020, at 12:14 pm, Julien Grall <julien@xen.org> wrote:
> 
> 
> 
> On 18/07/2020 10:55, Bertrand Marquis wrote:
>>> On 17 Jul 2020, at 18:05, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>> 
>>> On Fri, Jul 17, 2020 at 03:47:25PM +0000, Bertrand Marquis wrote:
>>>>> On 17 Jul 2020, at 17:26, Julien Grall <julien@xen.org> wrote:
>>>>> On 17/07/2020 15:47, Bertrand Marquis wrote:
>>>>>>>>> * Dom0Less implementation will require to have the capacity inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
>>>>>>>>> 
>>>>>>>>> # Enable the existing x86 virtual PCI support for ARM:
>>>>>>>>> 
>>>>>>>>> The existing VPCI support available for X86 is adapted for Arm. When the device is added to XEN via the hyper call “PHYSDEVOP_pci_device_add”, VPCI handler for the config space access is added to the PCI device to emulate the PCI devices.
>>>>>>>>> 
>>>>>>>>> A MMIO trap handler for the PCI ECAM space is registered in XEN so that when guest is trying to access the PCI config space, XEN will trap the access and emulate read/write using the VPCI and not the real PCI hardware.
>>>>>>>>> 
>>>>>>>>> Limitation:
>>>>>>>>> * No handler is register for the MSI configuration.
>>>>>>>>> * Only legacy interrupt is supported and tested as of now, MSI is not implemented and tested.
>>>>>>>> IIRC, legacy interrupt may be shared between two PCI devices. How do you plan to handle this on Arm?
>>>>>> We plan to fix this by adding proper support for MSI in the long term.
>>>>>> For the use case where MSI is not supported or not wanted we might have to find a way to forward the hardware interrupt to several guests to emulate some kind of shared interrupt.
>>>>> 
>>>>> Sharing interrupts are a bit pain because you couldn't take advantage of the direct EOI in HW and have to be careful if one guest doesn't EOI in timely maneer.
>>>>> 
>>>>> This is something I would rather avoid unless there is a real use case for it.
>>>> 
>>>> I would expect that most recent hardware will support MSI and this
>>>> will not be needed.
>>> 
>>> Well, PCI Express mandates MSI support, so while this is just a spec,
>>> I would expect most (if not all) devices to support MSI (or MSI-X), as
>>> Arm platforms haven't implemented legacy PCI anyway.
>> Yes that’s our assumption to. But we have to start somewhere so MSI is
>> planned but in a future step. I would think that supporting non MSI if not
>> impossible will be a lot more complex due to the interrupt sharing.
>> I do think that not supporting non MSI should be ok on Arm.
>>> 
>>>> When MSI is not used, the only solution would be to enforce that
>>>> devices assigned to different guest are using different interrupts
>>>> which would limit the number of domains being able to use PCI
>>>> devices on a bus to 4 (if the enumeration can be modified correctly
>>>> to assign the interrupts properly).
>>>> 
>>>> If we all agree that this is an acceptable limitation then we would
>>>> not need the “interrupt sharing”.
>>> 
>>> I might be easier to start by just supporting devices that have MSI
>>> (or MSI-X) and then move to legacy interrupts if required?
>> MSI support requires also some support in the interrupt controller part
>> on arm. So there is some work to achieve that.
>>> 
>>> You should have most of the pieces you require already implemented
>>> since that's what x86 uses, and hence could reuse almost all of it?
>> Inside PCI probably but the GIC part will require some work.
> 
> We already have an ITS implementation in Xen. This is required in order to use PCI devices in DOM0 on thunder-x (there is no legacy interrupts supported).
> 
> It wasn't yet exposed to the guest because we didn't fully investigate the security aspect of the implementation. However, for a tech preview this should be sufficient.
> 

OK, we will have a look at the ITS implementation once we start working on the MSI support. Thanks for the pointer.
> 
> -- 
> Julien Grall
> 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 14:34               ` Bertrand Marquis
  2020-07-17 14:41                 ` Roger Pau Monné
@ 2020-07-20 23:23                 ` Stefano Stabellini
  2020-07-21  9:54                   ` Rahul Singh
  1 sibling, 1 reply; 62+ messages in thread
From: Stefano Stabellini @ 2020-07-20 23:23 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Roger Pau Monné,
	Stefano Stabellini, Jan Beulich, xen-devel, nd, Julien Grall

On Fri, 17 Jul 2020, Bertrand Marquis wrote:
> > On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
> > 
> > On 17.07.2020 15:59, Bertrand Marquis wrote:
> >> 
> >> 
> >>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
> >>> 
> >>> On 17.07.2020 15:14, Bertrand Marquis wrote:
> >>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
> >>>>> On 16.07.2020 19:10, Rahul Singh wrote:
> >>>>>> # Emulated PCI device tree node in libxl:
> >>>>>> 
> >>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
> >>>>> 
> >>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
> >>>>> there to be no need for it when there are PCI devices assigned to the
> >>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
> >>>>> vpci="ecam" as unambiguous?
> >>>> 
> >>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned after using xl.> >>> 
> >>> I'm afraid I don't understand: When there are no PCI device that get
> >>> handed to a guest when it gets created, but it is supposed to be able
> >>> to have some assigned while already running, then we agree the option
> >>> is needed (afaict). When PCI devices get handed to the guest while it
> >>> gets constructed, where's the problem to infer this option from the
> >>> presence of PCI devices in the guest configuration?
> >> 
> >> If the user wants to use xl pci-attach to attach a device to a guest at runtime, this guest must have a VPCI bus (even with no devices).
> >> If we do not have the vpci parameter in the configuration, this use case will not work anymore.
> > 
> > That's what everyone looks to agree with. Yet why is the parameter needed
> > when there _are_ PCI devices anyway? That's the "optional" that Stefano
> > was suggesting, aiui.
> 
> I agree that in this case the parameter could be optional and only required if no PCI device is assigned directly in the guest configuration.

Great!

Moreover, we might also be able to get rid of the vpci parameter in
cases where there are no devices assigned at boot time but we still want to
create a vpci host bridge in domU anyway. In those cases we could use
the following:

  pci = [];

otherwise, the following is worse but might be easier to implement in xl:

  pci = [""];
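
Purely as an illustration of how the two cases could look in a guest
configuration (the device BDF, paths and the empty-list form are
assumptions based on this discussion, not an existing xl interface):

  name = "domu-pci"
  kernel = "/path/to/Image"
  memory = 1024
  vcpus = 2
  # Device assigned at boot: the virtual PCI host bridge can be
  # inferred from the presence of the device, no extra option needed.
  pci = [ "0000:03:00.0" ]
  # Hot-plug-only case: ask for an empty virtual PCI host bridge so
  # that "xl pci-attach" can be used later.
  # pci = []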


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-17 15:21           ` Bertrand Marquis
  2020-07-17 15:55             ` Roger Pau Monné
@ 2020-07-20 23:24             ` Stefano Stabellini
  2020-07-21  1:39               ` Rob Herring
  1 sibling, 1 reply; 62+ messages in thread
From: Stefano Stabellini @ 2020-07-20 23:24 UTC (permalink / raw)
  To: Bertrand Marquis, robh
  Cc: Rahul Singh, Roger Pau Monné,
	Stefano Stabellini, xen-devel, nd, Julien Grall

+ Rob Herring

On Fri, 17 Jul 2020, Bertrand Marquis wrote:
> >> Regarding the DT entry, this is not coming from us and this is already
> >> defined this way in existing DTBs, we just reuse the existing entry. 
> > 
> > Is it possible to standardize the property and drop the linux prefix?
> 
> Honestly I do not know. This was there in the DT examples we checked, so
> we planned to use that. But it might be possible to standardize this.

We could certainly start a discussion about it. It looks like
linux,pci-domain is used beyond purely the Linux kernel. I think that it
is worth getting Rob's advice on this.


Rob, for context we are trying to get Linux and Xen to agree on a
numbering scheme to identify PCI host bridges correctly. We already have
an existing hypercall from the old x86 days that passes a segment number
to Xen as a parameter, see drivers/xen/pci.c:xen_add_device.
(xen_add_device assumes that a Linux domain and a PCI segment are the
same thing which I understand is not the case.) 
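
For reference, a heavily condensed sketch of that path (simplified from
drivers/xen/pci.c; the helper name is made up, the SR-IOV/optarr handling
and error reporting are dropped, and the field names are as I remember them
from Xen's public/physdev.h, so please double-check against the headers):

  #include <linux/pci.h>
  #include <xen/interface/physdev.h>
  #include <asm/xen/hypercall.h>

  /* Report a PCI device, including its segment number, to Xen. */
  static int report_pci_device_to_xen(struct pci_dev *pci_dev)
  {
          struct physdev_pci_device_add add = {
                  /* The Linux "domain" is passed as the Xen "segment". */
                  .seg   = pci_domain_nr(pci_dev->bus),
                  .bus   = pci_dev->bus->number,
                  .devfn = pci_dev->devfn,
          };

          return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
  }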


There is an existing device tree property called "linux,pci-domain"
which would solve the problem (ignoring the difference in the definition
of domain and segment) but it is clearly marked as a Linux-specific
property. Is there anything more "standard" that we can use?

I can find PCI domains being mentioned a few times in the Device Tree
PCI specification but can't find any associated IDs, and I couldn't find
segments at all.

What's your take on this? In general, what's your suggestion on getting
Xen and Linux (and other OSes which could be used as dom0 one day like
Zephyr) to agree on a simple numbering scheme to identify PCI host
bridges?

Should we just use "linux,pci-domain" as-is because it is already the de
facto standard? It looks like the property appears in both QEMU and
U-Boot already.
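
For reference, a device tree fragment for a host bridge carrying this
property might look as follows (addresses, sizes and the domain number are
made up, loosely following the generic ECAM host bridge binding):

  pcie@40000000 {
          compatible = "pci-host-ecam-generic";
          device_type = "pci";
          #address-cells = <3>;
          #size-cells = <2>;
          reg = <0x0 0x40000000 0x0 0x10000000>;   /* ECAM window */
          bus-range = <0x00 0xff>;
          linux,pci-domain = <1>;                  /* the segment/domain ID in question */
          ranges = <0x02000000 0x0 0x50000000
                    0x0 0x50000000 0x0 0x10000000>;
  };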


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-20 23:24             ` Stefano Stabellini
@ 2020-07-21  1:39               ` Rob Herring
  2020-07-21 19:35                 ` Stefano Stabellini
  0 siblings, 1 reply; 62+ messages in thread
From: Rob Herring @ 2020-07-21  1:39 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Rahul Singh, Julien Grall, Bertrand Marquis, xen-devel, nd,
	Roger Pau Monné

On Mon, Jul 20, 2020 at 5:24 PM Stefano Stabellini
<sstabellini@kernel.org> wrote:
>
> + Rob Herring
>
> On Fri, 17 Jul 2020, Bertrand Marquis wrote:
> > >> Regarding the DT entry, this is not coming from us and this is already
> > >> defined this way in existing DTBs, we just reuse the existing entry.
> > >
> > > Is it possible to standardize the property and drop the linux prefix?
> >
> > Honestly I do not know. This was there in the DT examples we checked, so
> > we planned to use that. But it might be possible to standardize this.
>
> We could certainly start a discussion about it. It looks like
> linux,pci-domain is used beyond purely the Linux kernel. I think that it
> is worth getting Rob's advice on this.
>
>
> Rob, for context we are trying to get Linux and Xen to agree on a
> numbering scheme to identify PCI host bridges correctly. We already have
> an existing hypercall from the old x86 days that passes a segment number
> to Xen as a parameter, see drivers/xen/pci.c:xen_add_device.
> (xen_add_device assumes that a Linux domain and a PCI segment are the
> same thing which I understand is not the case.)
>
>
> There is an existing device tree property called "linux,pci-domain"
> which would solve the problem (ignoring the difference in the definition
> of domain and segment) but it is clearly marked as a Linux-specific
> property. Is there anything more "standard" that we can use?
>
> I can find PCI domains being mentioned a few times in the Device Tree
> PCI specification but can't find any associated IDs, and I couldn't find
> segments at all.
>
> What's your take on this? In general, what's your suggestion on getting
> Xen and Linux (and other OSes which could be used as dom0 one day like
> Zephyr) to agree on a simple numbering scheme to identify PCI host
> bridges?
>
> Should we just use "linux,pci-domain" as-is because it is already the de
> facto standard? It looks like the property appears in both QEMU and
> U-Boot already.

Sounds good to me. We could drop the 'linux' part, but based on other
places where that has happened, it just means we end up supporting both
strings forever.

Rob


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-20 23:23                 ` Stefano Stabellini
@ 2020-07-21  9:54                   ` Rahul Singh
  0 siblings, 0 replies; 62+ messages in thread
From: Rahul Singh @ 2020-07-21  9:54 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Roger Pau Monné,
	Bertrand Marquis, Jan Beulich, xen-devel, nd, Julien Grall



> On 21 Jul 2020, at 12:23 am, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> On Fri, 17 Jul 2020, Bertrand Marquis wrote:
>>> On 17 Jul 2020, at 16:06, Jan Beulich <jbeulich@suse.com> wrote:
>>> 
>>> On 17.07.2020 15:59, Bertrand Marquis wrote:
>>>> 
>>>> 
>>>>> On 17 Jul 2020, at 15:19, Jan Beulich <jbeulich@suse.com> wrote:
>>>>> 
>>>>> On 17.07.2020 15:14, Bertrand Marquis wrote:
>>>>>>> On 17 Jul 2020, at 10:10, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>> On 16.07.2020 19:10, Rahul Singh wrote:
>>>>>>>> # Emulated PCI device tree node in libxl:
>>>>>>>> 
>>>>>>>> Libxl is creating a virtual PCI device tree node in the device tree to enable the guest OS to discover the virtual PCI during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
>>>>>>> 
>>>>>>> I support Stefano's suggestion for this to be an optional thing, i.e.
>>>>>>> there to be no need for it when there are PCI devices assigned to the
>>>>>>> guest anyway. I also wonder about the pci_ prefix here - isn't
>>>>>>> vpci="ecam" as unambiguous?
>>>>>> 
>>>>>> This could be a problem as we need to know that this is required for a guest upfront so that PCI devices can be assigned later using xl.
>>>>> I'm afraid I don't understand: When there are no PCI device that get
>>>>> handed to a guest when it gets created, but it is supposed to be able
>>>>> to have some assigned while already running, then we agree the option
>>>>> is needed (afaict). When PCI devices get handed to the guest while it
>>>>> gets constructed, where's the problem to infer this option from the
>>>>> presence of PCI devices in the guest configuration?
>>>> 
>>>> If the user wants to use xl pci-attach to attach a device to a guest at runtime, this guest must have a VPCI bus (even with no devices).
>>>> If we do not have the vpci parameter in the configuration, this use case will not work anymore.
>>> 
>>> That's what everyone looks to agree with. Yet why is the parameter needed
>>> when there _are_ PCI devices anyway? That's the "optional" that Stefano
>>> was suggesting, aiui.
>> 
>> I agree that in this case the parameter could be optional and only required if no PCI device is assigned directly in the guest configuration.
> 
> Great!
> 
> Moreover, we might also be able to get rid of the vpci parameter in
> cases where there are no devices assigned at boot time but we still want to
> create a vpci host bridge in domU anyway. In those cases we could use
> the following:
> 
>  pci = [];
> 
> otherwise, the following is worse but might be easier to implement in xl:
> 
>  pci = [""];

pci = []; is a great idea to avoid a new config option for creating a device tree node when there is no device assigned. We will check this and update the design spec accordingly.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: RFC: PCI devices passthrough on Arm design proposal
  2020-07-21  1:39               ` Rob Herring
@ 2020-07-21 19:35                 ` Stefano Stabellini
  0 siblings, 0 replies; 62+ messages in thread
From: Stefano Stabellini @ 2020-07-21 19:35 UTC (permalink / raw)
  To: Rob Herring
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Rahul Singh,
	xen-devel, nd, Roger Pau Monné

On Mon, 20 Jul 2020, Rob Herring wrote:
> On Mon, Jul 20, 2020 at 5:24 PM Stefano Stabellini
> <sstabellini@kernel.org> wrote:
> >
> > + Rob Herring
> >
> > On Fri, 17 Jul 2020, Bertrand Marquis wrote:
> > > >> Regarding the DT entry, this is not coming from us and this is already
> > > >> defined this way in existing DTBs, we just reuse the existing entry.
> > > >
> > > > Is it possible to standardize the property and drop the linux prefix?
> > >
> > > Honestly I do not know. This was there in the DT examples we checked, so
> > > we planned to use that. But it might be possible to standardize this.
> >
> > We could certainly start a discussion about it. It looks like
> > linux,pci-domain is used beyond purely the Linux kernel. I think that it
> > is worth getting Rob's advice on this.
> >
> >
> > Rob, for context we are trying to get Linux and Xen to agree on a
> > numbering scheme to identify PCI host bridges correctly. We already have
> > an existing hypercall from the old x86 days that passes a segment number
> > to Xen as a parameter, see drivers/xen/pci.c:xen_add_device.
> > (xen_add_device assumes that a Linux domain and a PCI segment are the
> > same thing which I understand is not the case.)
> >
> >
> > There is an existing device tree property called "linux,pci-domain"
> > which would solve the problem (ignoring the difference in the definition
> > of domain and segment) but it is clearly marked as a Linux-specific
> > property. Is there anything more "standard" that we can use?
> >
> > I can find PCI domains being mentioned a few times in the Device Tree
> > PCI specification but can't find any associated IDs, and I couldn't find
> > segments at all.
> >
> > What's your take on this? In general, what's your suggestion on getting
> > Xen and Linux (and other OSes which could be used as dom0 one day like
> > Zephyr) to agree on a simple numbering scheme to identify PCI host
> > bridges?
> >
> > Should we just use "linux,pci-domain" as-is because it is already the de
> > facto standard? It looks like the property appears in both QEMU and
> > U-Boot already.
> 
> Sounds good to me. We could drop the 'linux' part, but based on other
> places where that has happened, it just means we end up supporting both
> strings forever.

OK, thank you!


^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2020-07-21 19:36 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <3F6E40FB-79C5-4AE8-81CA-E16CA37BB298@arm.com>
     [not found] ` <BD475825-10F6-4538-8294-931E370A602C@arm.com>
2020-07-16 17:10   ` RFC: PCI devices passthrough on Arm design proposal Rahul Singh
2020-07-16 20:51     ` Stefano Stabellini
2020-07-17  6:53       ` Bertrand Marquis
2020-07-17  7:41         ` Oleksandr Andrushchenko
2020-07-17 11:26           ` Julien Grall
2020-07-17 11:41             ` Oleksandr Andrushchenko
2020-07-17 13:21               ` Bertrand Marquis
2020-07-17 12:46           ` Rahul Singh
2020-07-17 12:55             ` Jan Beulich
2020-07-17 13:12           ` Bertrand Marquis
2020-07-17  8:10     ` Jan Beulich
2020-07-17  8:47       ` Oleksandr Andrushchenko
2020-07-17 13:28         ` Rahul Singh
2020-07-17 13:14       ` Bertrand Marquis
2020-07-17 13:19         ` Jan Beulich
2020-07-17 13:59           ` Bertrand Marquis
2020-07-17 14:06             ` Jan Beulich
2020-07-17 14:34               ` Bertrand Marquis
2020-07-17 14:41                 ` Roger Pau Monné
2020-07-17 14:49                   ` Bertrand Marquis
2020-07-17 15:05                     ` Roger Pau Monné
2020-07-17 15:23                       ` Bertrand Marquis
2020-07-17 15:30                         ` Roger Pau Monné
2020-07-17 15:51                           ` Bertrand Marquis
2020-07-17 16:08                             ` Roger Pau Monné
2020-07-17 16:18                               ` Julien Grall
2020-07-17 19:17                                 ` Oleksandr
2020-07-18  9:58                                   ` Bertrand Marquis
2020-07-18 11:24                                   ` Julien Grall
2020-07-20 11:27                                     ` Oleksandr
2020-07-20  8:47                                 ` Roger Pau Monné
2020-07-20  9:24                                   ` Julien Grall
2020-07-20 23:23                 ` Stefano Stabellini
2020-07-21  9:54                   ` Rahul Singh
2020-07-17 11:16     ` Roger Pau Monné
2020-07-17 13:22       ` Bertrand Marquis
2020-07-17 13:29         ` Julien Grall
2020-07-17 13:44           ` Bertrand Marquis
2020-07-17 13:49             ` Julien Grall
2020-07-17 14:01               ` Bertrand Marquis
2020-07-17 14:31         ` Roger Pau Monné
2020-07-17 15:21           ` Bertrand Marquis
2020-07-17 15:55             ` Roger Pau Monné
2020-07-18  9:49               ` Bertrand Marquis
2020-07-20  8:45                 ` Roger Pau Monné
2020-07-20 23:24             ` Stefano Stabellini
2020-07-21  1:39               ` Rob Herring
2020-07-21 19:35                 ` Stefano Stabellini
     [not found]   ` <8ac91a1b-e6b3-0f2b-0f23-d7aff100936d@xen.org>
2020-07-17 13:50     ` Julien Grall
2020-07-17 13:59       ` Jan Beulich
2020-07-17 14:12         ` Julien Grall
2020-07-17 14:23           ` Jan Beulich
2020-07-17 14:47       ` Bertrand Marquis
2020-07-17 15:26         ` Julien Grall
2020-07-17 15:47           ` Bertrand Marquis
2020-07-17 16:05             ` Roger Pau Monné
2020-07-18  9:55               ` Bertrand Marquis
2020-07-18 11:14                 ` Julien Grall
2020-07-20 11:32                   ` Rahul Singh
2020-07-18 11:32               ` Julien Grall
2020-07-18 11:08             ` Julien Grall
2020-07-20 11:26               ` Rahul Singh
