All of lore.kernel.org
 help / color / mirror / Atom feed
* Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
@ 2022-10-26 13:17 Rahul Singh
  2022-10-26 13:36 ` Julien Grall
  2022-10-27  9:01 ` Ayan Kumar Halder
  0 siblings, 2 replies; 33+ messages in thread
From: Rahul Singh @ 2022-10-26 13:17 UTC (permalink / raw)
  To: Xen developer discussion
  Cc: Julien Grall, Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Oleksandr Tyshchenko, Oleksandr Andrushchenko, Volodymyr Babchuk,
	Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi All,

At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
how to add the IOMMU binding for guest OS.

Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
configuration is called nesting.

Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
Stage 2 translation but there is no support for Stage 1 translation for guests. We will add support for guests to configure
the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU hardware and exposes the virtual SMMU to the guest.
Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
XEN will trap the access and configure the hardware accordingly.

Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
guests can configure the IOMMU correctly. The solution that I am suggesting is as below:

For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
IOMMU node phandle.

For domU guests, when passthrough the device to the guest as per [2],  add the below property in the partial device tree
node that is required to describe the generic device tree binding for IOMMUs and their master(s)

"iommus = < &magic_phandle 0xvMasterID>  
	• magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will be documented so that the user can set that in partial DT node (0xfdea).  
	• vMasterID will be the virtual master ID that the user will provide.

The partial device tree will look like this:
/dts-v1/;
 
/ {
    /* #*cells are here to keep DTC happy */
    #address-cells = <2>;
    #size-cells = <2>;
 
    aliases {
        net = &mac0;
    };
 
    passthrough {
        compatible = "simple-bus";
        ranges;
        #address-cells = <2>;
        #size-cells = <2>;
        mac0: ethernet@10000000 {
            compatible = "calxeda,hb-xgmac";
            reg = <0 0x10000000 0 0x1000>;
            interrupts = <0 80 4  0 81 4  0 82 4>;
           iommus = <0xfdea 0x01>;
        };
    };
};
 
In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device this
the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
the same master ID but behind a different IOMMU.
 
iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” , “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]

	• PMASTER_ID is the physical master ID of the device from the physical DT.
	• VMASTER_ID is the virtual master Id that the user will configure in the partial device tree.
	• IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected. 
 
Example: Let's say the user wants to assign the below physical device in DT to the guest.
 
iommu@4f000000 {
                compatible = "arm,smmu-v3";
             	interrupts = <0x00 0xe4 0xf04>;
                interrupt-parent = <0x01>;
                #iommu-cells = <0x01>;
                interrupt-names = "combined";
                reg = <0x00 0x4f000000 0x00 0x40000>;
                phandle = <0xfdeb>;
                name = "iommu";
};
 
test@10000000 {
	compatible = "viommu-test”;
	iommus = <0xfdeb 0x10>;
	interrupts = <0x00 0xff 0x04>;
	reg = <0x00 0x10000000 0x00 0x1000>;
	name = "viommu-test";
};
 
The partial Device tree node will be like this:
 
/ {
    /* #*cells are here to keep DTC happy */
    #address-cells = <2>;
    #size-cells = <2>;
 
    passthrough {
        compatible = "simple-bus";
        ranges;
        #address-cells = <2>;
        #size-cells = <2>;

	test@10000000 {
            	compatible = "viommu-test";
            	reg = <0 0x10000000 0 0x1000>;
            	interrupts = <0 80 4  0 81 4  0 82 4>;
            	iommus = <0xfdea 0x01>;
        };
    };
};
 
 iommu_devid_map = [ “0x10@0x01,0x4f000000”]
	• 0x10 is the real physical master id from the physical DT.
	• 0x01 is the virtual master Id that the user defines as a partial device tree.
	• 0x4f000000 is the base address of the IOMMU device.

[1] https://www.kernel.org/doc/Documentation/devicetree/bindings/iommu/iommu.txt
[2] https://xenbits.xen.org/docs/unstable/misc/arm/passthrough.txt

Regards,
Rahul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-26 13:17 Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices Rahul Singh
@ 2022-10-26 13:36 ` Julien Grall
  2022-10-26 14:33   ` Rahul Singh
  2022-10-27  9:01 ` Ayan Kumar Halder
  1 sibling, 1 reply; 33+ messages in thread
From: Julien Grall @ 2022-10-26 13:36 UTC (permalink / raw)
  To: Rahul Singh, Xen developer discussion
  Cc: Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Oleksandr Tyshchenko, Oleksandr Andrushchenko, Volodymyr Babchuk,
	Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross



On 26/10/2022 14:17, Rahul Singh wrote:
> Hi All,

Hi Rahul,

> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
> how to add the IOMMU binding for guest OS.
> 
> Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
> intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
> configuration is called nesting.
> 
> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
> Stage 2 translation but there is no support for Stage 1 translation for guests. We will add support for guests to configure
> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU hardware and exposes the virtual SMMU to the guest.
> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
> XEN will trap the access and configure the hardware accordingly.
> 
> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
> 
> For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
> IOMMU node phandle.
Below, you said that each IOMMUs may have a different ID space. So 
shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the 
user to specify the mapping?

> 
> For domU guests, when passthrough the device to the guest as per [2],  add the below property in the partial device tree
> node that is required to describe the generic device tree binding for IOMMUs and their master(s)
> 
> "iommus = < &magic_phandle 0xvMasterID>
> 	• magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will be documented so that the user can set that in partial DT node (0xfdea).

Does this mean only one IOMMU will be supported in the guest?

> 	• vMasterID will be the virtual master ID that the user will provide.
> 
> The partial device tree will look like this:
> /dts-v1/;
>   
> / {
>      /* #*cells are here to keep DTC happy */
>      #address-cells = <2>;
>      #size-cells = <2>;
>   
>      aliases {
>          net = &mac0;
>      };
>   
>      passthrough {
>          compatible = "simple-bus";
>          ranges;
>          #address-cells = <2>;
>          #size-cells = <2>;
>          mac0: ethernet@10000000 {
>              compatible = "calxeda,hb-xgmac";
>              reg = <0 0x10000000 0 0x1000>;
>              interrupts = <0 80 4  0 81 4  0 82 4>;
>             iommus = <0xfdea 0x01>;
>          };
>      };
> };
>   
> In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device this
> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
> the same master ID but behind a different IOMMU.

In xl.cfg, we already pass the device-tree node path to passthrough. So 
Xen should already have all the information about the IOMMU and 
Master-ID. So it doesn't seem necessary for Device-Tree.

For ACPI, I would have expected the information to be found in the IOREQ.

So can you add more context why this is necessary for everyone?

>   
> iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” , “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
> 
> 	• PMASTER_ID is the physical master ID of the device from the physical DT.
> 	• VMASTER_ID is the virtual master Id that the user will configure in the partial device tree.
> 	• IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.

Below you give an example for Platform device. How would that fit in the 
context of PCI passthrough?

>   
> Example: Let's say the user wants to assign the below physical device in DT to the guest.
>   
> iommu@4f000000 {
>                  compatible = "arm,smmu-v3";
>               	interrupts = <0x00 0xe4 0xf04>;
>                  interrupt-parent = <0x01>;
>                  #iommu-cells = <0x01>;
>                  interrupt-names = "combined";
>                  reg = <0x00 0x4f000000 0x00 0x40000>;
>                  phandle = <0xfdeb>;
>                  name = "iommu";
> };

So I guess this node will be written by Xen. How will you the case where 
there are extra property to added (e.g. dma-coherent)?

>   
> test@10000000 {
> 	compatible = "viommu-test”;
> 	iommus = <0xfdeb 0x10>;

I am a bit confused. Here you use 0xfdeb for the phandle but below...

> 	interrupts = <0x00 0xff 0x04>;
> 	reg = <0x00 0x10000000 0x00 0x1000>;
> 	name = "viommu-test";
> };
>   
> The partial Device tree node will be like this:
>   
> / {
>      /* #*cells are here to keep DTC happy */
>      #address-cells = <2>;
>      #size-cells = <2>;
>   
>      passthrough {
>          compatible = "simple-bus";
>          ranges;
>          #address-cells = <2>;
>          #size-cells = <2>;
> 
> 	test@10000000 {
>              	compatible = "viommu-test";
>              	reg = <0 0x10000000 0 0x1000>;
>              	interrupts = <0 80 4  0 81 4  0 82 4>;
>              	iommus = <0xfdea 0x01>;

... you use 0xfdea. Does this mean 'xl' will rewrite the phandle?

>          };
>      };
> };
>   
>   iommu_devid_map = [ “0x10@0x01,0x4f000000”]
> 	• 0x10 is the real physical master id from the physical DT.
> 	• 0x01 is the virtual master Id that the user defines as a partial device tree.
> 	• 0x4f000000 is the base address of the IOMMU device.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-26 13:36 ` Julien Grall
@ 2022-10-26 14:33   ` Rahul Singh
  2022-10-26 17:17     ` Michal Orzel
  2022-10-26 19:48     ` Julien Grall
  0 siblings, 2 replies; 33+ messages in thread
From: Rahul Singh @ 2022-10-26 14:33 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen developer discussion, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Julien,

> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xen.org> wrote:
> 
> 
> 
> On 26/10/2022 14:17, Rahul Singh wrote:
>> Hi All,
> 
> Hi Rahul,
> 
>> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
>> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
>> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
>> how to add the IOMMU binding for guest OS.
>> Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
>> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
>> intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
>> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
>> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
>> configuration is called nesting.
>> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
>> Stage 2 translation but there is no support for Stage 1 translation for guests. We will add support for guests to configure
>> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU hardware and exposes the virtual SMMU to the guest.
>> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
>> XEN will trap the access and configure the hardware accordingly.
>> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
>> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
>> For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
>> IOMMU node phandle.
> Below, you said that each IOMMUs may have a different ID space. So shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the user to specify the mapping?

Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This also helps in the ACPI case
where we don’t need to modify the tables to delete the pIOMMU entries and create one vIOMMU.
In this case, no need to replace the phandle as Xen create the vIOMMU with the same pIOMMU
phandle and same base address.

For domU guests one vIOMMU per guest will be created.

> 
>> For domU guests, when passthrough the device to the guest as per [2],  add the below property in the partial device tree
>> node that is required to describe the generic device tree binding for IOMMUs and their master(s)
>> "iommus = < &magic_phandle 0xvMasterID>
>> 	• magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will be documented so that the user can set that in partial DT node (0xfdea).
> 
> Does this mean only one IOMMU will be supported in the guest?

Yes.

> 
>> 	• vMasterID will be the virtual master ID that the user will provide.
>> The partial device tree will look like this:
>> /dts-v1/;
>>  / {
>>     /* #*cells are here to keep DTC happy */
>>     #address-cells = <2>;
>>     #size-cells = <2>;
>>       aliases {
>>         net = &mac0;
>>     };
>>       passthrough {
>>         compatible = "simple-bus";
>>         ranges;
>>         #address-cells = <2>;
>>         #size-cells = <2>;
>>         mac0: ethernet@10000000 {
>>             compatible = "calxeda,hb-xgmac";
>>             reg = <0 0x10000000 0 0x1000>;
>>             interrupts = <0 80 4  0 81 4  0 82 4>;
>>            iommus = <0xfdea 0x01>;
>>         };
>>     };
>> };
>>  In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device this
>> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
>> the same master ID but behind a different IOMMU.
> 
> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen should already have all the information about the IOMMU and Master-ID. So it doesn't seem necessary for Device-Tree.
> 
> For ACPI, I would have expected the information to be found in the IOREQ.
> 
> So can you add more context why this is necessary for everyone?

We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
the corresponding pIOMMU. As I mention we need vMaster-ID in case a system could have 2 identical Master-ID but
each one connected to a different SMMU and assigned to the guest.

> 
>>  iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” , “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
>> 	• PMASTER_ID is the physical master ID of the device from the physical DT.
>> 	• VMASTER_ID is the virtual master Id that the user will configure in the partial device tree.
>> 	• IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.
> 
> Below you give an example for Platform device. How would that fit in the context of PCI passthrough?

In PCI passthrough case, xl will create the "iommu-map" property in vpci host bridge node with phandle to vIOMMU node.
vSMMUv3 node will be created in xl.

> 
>>  Example: Let's say the user wants to assign the below physical device in DT to the guest.
>>  iommu@4f000000 {
>>                 compatible = "arm,smmu-v3";
>>              	interrupts = <0x00 0xe4 0xf04>;
>>                 interrupt-parent = <0x01>;
>>                 #iommu-cells = <0x01>;
>>                 interrupt-names = "combined";
>>                 reg = <0x00 0x4f000000 0x00 0x40000>;
>>                 phandle = <0xfdeb>;
>>                 name = "iommu";
>> };
> 
> So I guess this node will be written by Xen. How will you the case where there are extra property to added (e.g. dma-coherent)?

In this example this is physical IOMMU node. vIOMMU node wil be created by xl during guest creation.
> 
>>  test@10000000 {
>> 	compatible = "viommu-test”;
>> 	iommus = <0xfdeb 0x10>;
> 
> I am a bit confused. Here you use 0xfdeb for the phandle but below...

Here 0xfdeb is the physical IOMMU node phandle...
> 
>> 	interrupts = <0x00 0xff 0x04>;
>> 	reg = <0x00 0x10000000 0x00 0x1000>;
>> 	name = "viommu-test";
>> };
>>  The partial Device tree node will be like this:
>>  / {
>>     /* #*cells are here to keep DTC happy */
>>     #address-cells = <2>;
>>     #size-cells = <2>;
>>       passthrough {
>>         compatible = "simple-bus";
>>         ranges;
>>         #address-cells = <2>;
>>         #size-cells = <2>;
>> 	test@10000000 {
>>             	compatible = "viommu-test";
>>             	reg = <0 0x10000000 0 0x1000>;
>>             	interrupts = <0 80 4  0 81 4  0 82 4>;
>>             	iommus = <0xfdea 0x01>;
> 
> ... you use 0xfdea. Does this mean 'xl' will rewrite the phandle?

but here user has to set the “iommus” property with magic phanle as explained earlier. 0xfdea is magic phandle. 
 
Regards,
Rahul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-26 14:33   ` Rahul Singh
@ 2022-10-26 17:17     ` Michal Orzel
  2022-10-26 18:23       ` Oleksandr Tyshchenko
  2022-10-27 16:12       ` Rahul Singh
  2022-10-26 19:48     ` Julien Grall
  1 sibling, 2 replies; 33+ messages in thread
From: Michal Orzel @ 2022-10-26 17:17 UTC (permalink / raw)
  To: Rahul Singh, Julien Grall
  Cc: Xen developer discussion, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Rahul,

On 26/10/2022 16:33, Rahul Singh wrote:
> 
> 
> Hi Julien,
> 
>> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xen.org> wrote:
>>
>>
>>
>> On 26/10/2022 14:17, Rahul Singh wrote:
>>> Hi All,
>>
>> Hi Rahul,
>>
>>> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
>>> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
>>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
>>> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
>>> how to add the IOMMU binding for guest OS.
>>> Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
>>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
>>> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
>>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
>>> intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
>>> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
>>> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
>>> configuration is called nesting.
>>> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
>>> Stage 2 translation but there is no support for Stage 1 translation for guests. We will add support for guests to configure
>>> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU hardware and exposes the virtual SMMU to the guest.
>>> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
>>> XEN will trap the access and configure the hardware accordingly.
>>> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
>>> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
>>> For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
>>> IOMMU node phandle.
>> Below, you said that each IOMMUs may have a different ID space. So shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the user to specify the mapping?
> 
> Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This also helps in the ACPI case
> where we don’t need to modify the tables to delete the pIOMMU entries and create one vIOMMU.
> In this case, no need to replace the phandle as Xen create the vIOMMU with the same pIOMMU
> phandle and same base address.
> 
> For domU guests one vIOMMU per guest will be created.
> 
>>
>>> For domU guests, when passthrough the device to the guest as per [2],  add the below property in the partial device tree
>>> node that is required to describe the generic device tree binding for IOMMUs and their master(s)
>>> "iommus = < &magic_phandle 0xvMasterID>
>>>      • magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will be documented so that the user can set that in partial DT node (0xfdea).
>>
>> Does this mean only one IOMMU will be supported in the guest?
> 
> Yes.
> 
>>
>>>      • vMasterID will be the virtual master ID that the user will provide.
>>> The partial device tree will look like this:
>>> /dts-v1/;
>>>  / {
>>>     /* #*cells are here to keep DTC happy */
>>>     #address-cells = <2>;
>>>     #size-cells = <2>;
>>>       aliases {
>>>         net = &mac0;
>>>     };
>>>       passthrough {
>>>         compatible = "simple-bus";
>>>         ranges;
>>>         #address-cells = <2>;
>>>         #size-cells = <2>;
>>>         mac0: ethernet@10000000 {
>>>             compatible = "calxeda,hb-xgmac";
>>>             reg = <0 0x10000000 0 0x1000>;
>>>             interrupts = <0 80 4  0 81 4  0 82 4>;
>>>            iommus = <0xfdea 0x01>;
>>>         };
>>>     };
>>> };
>>>  In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device this
>>> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
>>> the same master ID but behind a different IOMMU.
>>
>> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen should already have all the information about the IOMMU and Master-ID. So it doesn't seem necessary for Device-Tree.
>>
>> For ACPI, I would have expected the information to be found in the IOREQ.
>>
>> So can you add more context why this is necessary for everyone?
> 
> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
> The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
> vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
> the corresponding pIOMMU. As I mention we need vMaster-ID in case a system could have 2 identical Master-ID but
> each one connected to a different SMMU and assigned to the guest.

I think the proposed solution would work and I would just like to clear some issues.

Please correct me if I'm wrong:

In the xl config file we already need to specify dtdev to point to the device path in host dtb.
In the partial device tree we specify the vMasterId as well as magic phandle.
Isn't it that we already have all the information necessary without the need for iommu_devid_map?
For me it looks like the partial dtb provides vMasterID and dtdev provides pMasterID as well as physical phandle to SMMU.

Having said that, I can also understand that specifying everything in one place using iommu_devid_map can be easier
and reduces the need for device tree parsing.

Apart from that, what is the reason of exposing only one vSMMU to guest instead of one vSMMU per pSMMU?
In the latter solution, the whole issue with handling devices with the same stream ID but belonging to different SMMUs
would be gone. It would also result in a more natural way of the device tree look. Normally a guest would see
e.g. both SMMUs and exposing only one can be misleading.

> 
>>
>>>  iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” , “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
>>>      • PMASTER_ID is the physical master ID of the device from the physical DT.
>>>      • VMASTER_ID is the virtual master Id that the user will configure in the partial device tree.
>>>      • IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.
>>
>> Below you give an example for Platform device. How would that fit in the context of PCI passthrough?
> 
> In PCI passthrough case, xl will create the "iommu-map" property in vpci host bridge node with phandle to vIOMMU node.
> vSMMUv3 node will be created in xl.
> 
>>
>>>  Example: Let's say the user wants to assign the below physical device in DT to the guest.
>>>  iommu@4f000000 {
>>>                 compatible = "arm,smmu-v3";
>>>                      interrupts = <0x00 0xe4 0xf04>;
>>>                 interrupt-parent = <0x01>;
>>>                 #iommu-cells = <0x01>;
>>>                 interrupt-names = "combined";
>>>                 reg = <0x00 0x4f000000 0x00 0x40000>;
>>>                 phandle = <0xfdeb>;
>>>                 name = "iommu";
>>> };
>>
>> So I guess this node will be written by Xen. How will you the case where there are extra property to added (e.g. dma-coherent)?
> 
> In this example this is physical IOMMU node. vIOMMU node wil be created by xl during guest creation.
>>
>>>  test@10000000 {
>>>      compatible = "viommu-test”;
>>>      iommus = <0xfdeb 0x10>;
>>
>> I am a bit confused. Here you use 0xfdeb for the phandle but below...
> 
> Here 0xfdeb is the physical IOMMU node phandle...
>>
>>>      interrupts = <0x00 0xff 0x04>;
>>>      reg = <0x00 0x10000000 0x00 0x1000>;
>>>      name = "viommu-test";
>>> };
>>>  The partial Device tree node will be like this:
>>>  / {
>>>     /* #*cells are here to keep DTC happy */
>>>     #address-cells = <2>;
>>>     #size-cells = <2>;
>>>       passthrough {
>>>         compatible = "simple-bus";
>>>         ranges;
>>>         #address-cells = <2>;
>>>         #size-cells = <2>;
>>>      test@10000000 {
>>>              compatible = "viommu-test";
>>>              reg = <0 0x10000000 0 0x1000>;
>>>              interrupts = <0 80 4  0 81 4  0 82 4>;
>>>              iommus = <0xfdea 0x01>;
>>
>> ... you use 0xfdea. Does this mean 'xl' will rewrite the phandle?
> 
> but here user has to set the “iommus” property with magic phanle as explained earlier. 0xfdea is magic phandle.
> 
> Regards,
> Rahul

~Michal



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-26 17:17     ` Michal Orzel
@ 2022-10-26 18:23       ` Oleksandr Tyshchenko
  2022-10-27 16:49         ` Rahul Singh
  2022-10-27 16:12       ` Rahul Singh
  1 sibling, 1 reply; 33+ messages in thread
From: Oleksandr Tyshchenko @ 2022-10-26 18:23 UTC (permalink / raw)
  To: Michal Orzel, Rahul Singh
  Cc: Julien Grall, Xen developer discussion, Stefano Stabellini,
	Bertrand Marquis, Michal Orzel, Oleksandr Tyshchenko,
	Oleksandr Andrushchenko, Volodymyr Babchuk, Jan Beulich,
	Roger Pau Monné,
	Andrew Cooper, Juergen Gross

[-- Attachment #1: Type: text/plain, Size: 10851 bytes --]

On Wed, Oct 26, 2022 at 8:18 PM Michal Orzel <michal.orzel@amd.com> wrote:

> Hi Rahul,
>


Hello all

[sorry for the possible format issues]


>
> On 26/10/2022 16:33, Rahul Singh wrote:
> >
> >
> > Hi Julien,
> >
> >> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xen.org> wrote:
> >>
> >>
> >>
> >> On 26/10/2022 14:17, Rahul Singh wrote:
> >>> Hi All,
> >>
> >> Hi Rahul,
> >>
> >>> At Arm, we started to implement the POC to support 2 levels of page
> tables/nested translation in SMMUv3.
> >>> To support nested translation for guest OS Xen needs to expose the
> virtual IOMMU. If we passthrough the
> >>> device to the guest that is behind an IOMMU and virtual IOMMU is
> enabled for the guest there is a need to
> >>> add IOMMU binding for the device in the passthrough node as per [1].
> This email is to get an agreement on
> >>> how to add the IOMMU binding for guest OS.
> >>> Before I will explain how to add the IOMMU binding let me give a brief
> overview of how we will add support for virtual
> >>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3
> Nested translation support. SMMUv3 hardware
> >>> supports two stages of translation. Each stage of translation can be
> independently enabled. An incoming address is logically
> >>> translated from VA to IPA in stage 1, then the IPA is input to stage 2
> which translates the IPA to the output PA. Stage 1 is
> >>> intended to be used by a software entity( Guest OS) to provide
> isolation or translation to buffers within the entity, for example,
> >>> DMA isolation within an OS. Stage 2 is intended to be available in
> systems supporting the Virtualization Extensions and is
> >>> intended to virtualize device DMA to guest VM address spaces. When
> both stage 1 and stage 2 are enabled, the translation
> >>> configuration is called nesting.
> >>> Stage 1 translation support is required to provide isolation between
> different devices within the guest OS. XEN already supports
> >>> Stage 2 translation but there is no support for Stage 1 translation
> for guests. We will add support for guests to configure
> >>> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU
> hardware and exposes the virtual SMMU to the guest.
> >>> Guest can use the native SMMU driver to configure the stage 1
> translation. When the guest configures the SMMU for Stage 1,
> >>> XEN will trap the access and configure the hardware accordingly.
> >>> Now back to the question of how we can add the IOMMU binding between
> the virtual IOMMU and the master devices so that
> >>> guests can configure the IOMMU correctly. The solution that I am
> suggesting is as below:
> >>> For dom0, while handling the DT node(handle_node()) Xen will replace
> the phandle in the "iommus" property with the virtual
> >>> IOMMU node phandle.
> >> Below, you said that each IOMMUs may have a different ID space. So
> shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the
> user to specify the mapping?
> >
> > Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This
> also helps in the ACPI case
> > where we don’t need to modify the tables to delete the pIOMMU entries
> and create one vIOMMU.
> > In this case, no need to replace the phandle as Xen create the vIOMMU
> with the same pIOMMU
> > phandle and same base address.
> >
> > For domU guests one vIOMMU per guest will be created.
> >
> >>
> >>> For domU guests, when passthrough the device to the guest as per [2],
> add the below property in the partial device tree
> >>> node that is required to describe the generic device tree binding for
> IOMMUs and their master(s)
> >>> "iommus = < &magic_phandle 0xvMasterID>
> >>>      • magic_phandle will be the phandle ( vIOMMU phandle in xl)  that
> will be documented so that the user can set that in partial DT node
> (0xfdea).
> >>
> >> Does this mean only one IOMMU will be supported in the guest?
> >
> > Yes.
> >
> >>
> >>>      • vMasterID will be the virtual master ID that the user will
> provide.
> >>> The partial device tree will look like this:
> >>> /dts-v1/;
> >>>  / {
> >>>     /* #*cells are here to keep DTC happy */
> >>>     #address-cells = <2>;
> >>>     #size-cells = <2>;
> >>>       aliases {
> >>>         net = &mac0;
> >>>     };
> >>>       passthrough {
> >>>         compatible = "simple-bus";
> >>>         ranges;
> >>>         #address-cells = <2>;
> >>>         #size-cells = <2>;
> >>>         mac0: ethernet@10000000 {
> >>>             compatible = "calxeda,hb-xgmac";
> >>>             reg = <0 0x10000000 0 0x1000>;
> >>>             interrupts = <0 80 4  0 81 4  0 82 4>;
> >>>            iommus = <0xfdea 0x01>;
> >>>         };
> >>>     };
> >>> };
> >>>  In xl.cfg we need to define a new option to inform Xen about
> vMasterId to pMasterId mapping and to which IOMMU device this
> >>> the master device is connected so that Xen can configure the right
> IOMMU. This is required if the system has devices that have
> >>> the same master ID but behind a different IOMMU.
> >>
> >> In xl.cfg, we already pass the device-tree node path to passthrough. So
> Xen should already have all the information about the IOMMU and Master-ID.
> So it doesn't seem necessary for Device-Tree.
> >>
> >> For ACPI, I would have expected the information to be found in the
> IOREQ.
> >>
> >> So can you add more context why this is necessary for everyone?
> >
> > We have information for IOMMU and Master-ID but we don’t have
> information for linking vMaster-ID to pMaster-ID.
> > The device tree node will be used to assign the device to the guest and
> configure the Stage-2 translation. Guest will use the
> > vMaster-ID to configure the vIOMMU during boot. Xen needs information to
> link vMaster-ID to pMaster-ID to configure
> > the corresponding pIOMMU. As I mention we need vMaster-ID in case a
> system could have 2 identical Master-ID but
> > each one connected to a different SMMU and assigned to the guest.
>
> I think the proposed solution would work and I would just like to clear
> some issues.
>
> Please correct me if I'm wrong:
>
> In the xl config file we already need to specify dtdev to point to the
> device path in host dtb.
> In the partial device tree we specify the vMasterId as well as magic
> phandle.
> Isn't it that we already have all the information necessary without the
> need for iommu_devid_map?
> For me it looks like the partial dtb provides vMasterID and dtdev provides
> pMasterID as well as physical phandle to SMMU.
>
> Having said that, I can also understand that specifying everything in one
> place using iommu_devid_map can be easier
> and reduces the need for device tree parsing.
>
> Apart from that, what is the reason of exposing only one vSMMU to guest
> instead of one vSMMU per pSMMU?
> In the latter solution, the whole issue with handling devices with the
> same stream ID but belonging to different SMMUs
> would be gone. It would also result in a more natural way of the device
> tree look. Normally a guest would see
> e.g. both SMMUs and exposing only one can be misleading.
>

I also have the same question. From earlier answers as I understand it is
going to be identity vSMMU <-> pSMMU mappings for Dom0, so why diverge for
DomU?

Also I am thinking how this solution would work for IPMMU-VMSA Gen3(Gen4),
which also supports two stages of translation, so the nested translation
could be possible in general, although there might be some pitfalls
(yes, I understand that code to emulate access to control registers would
be different in comparison with SMMUv3, but some other code could be
common).




>
> >>
> >>>  iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” ,
> “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
> >>>      • PMASTER_ID is the physical master ID of the device from the
> physical DT.
> >>>      • VMASTER_ID is the virtual master Id that the user will
> configure in the partial device tree.
> >>>      • IOMMU_BASE_ADDRESS is the base address of the physical IOMMU
> device to which this device is connected.



If iommu_devid_map is a way to go, I have a question, would this
configuration cover the following cases?
1. Device has several stream IDs
2. Several devices share the stream ID (or several stream IDs)




> >>
> >> Below you give an example for Platform device. How would that fit in
> the context of PCI passthrough?
> >
> > In PCI passthrough case, xl will create the "iommu-map" property in vpci
> host bridge node with phandle to vIOMMU node.
> > vSMMUv3 node will be created in xl.
> >
> >>
> >>>  Example: Let's say the user wants to assign the below physical device
> in DT to the guest.
> >>>  iommu@4f000000 {
> >>>                 compatible = "arm,smmu-v3";
> >>>                      interrupts = <0x00 0xe4 0xf04>;
> >>>                 interrupt-parent = <0x01>;
> >>>                 #iommu-cells = <0x01>;
> >>>                 interrupt-names = "combined";
> >>>                 reg = <0x00 0x4f000000 0x00 0x40000>;
> >>>                 phandle = <0xfdeb>;
> >>>                 name = "iommu";
> >>> };
> >>
> >> So I guess this node will be written by Xen. How will you the case
> where there are extra property to added (e.g. dma-coherent)?
> >
> > In this example this is physical IOMMU node. vIOMMU node wil be created
> by xl during guest creation.
> >>
> >>>  test@10000000 {
> >>>      compatible = "viommu-test”;
> >>>      iommus = <0xfdeb 0x10>;
> >>
> >> I am a bit confused. Here you use 0xfdeb for the phandle but below...
> >
> > Here 0xfdeb is the physical IOMMU node phandle...
> >>
> >>>      interrupts = <0x00 0xff 0x04>;
> >>>      reg = <0x00 0x10000000 0x00 0x1000>;
> >>>      name = "viommu-test";
> >>> };
> >>>  The partial Device tree node will be like this:
> >>>  / {
> >>>     /* #*cells are here to keep DTC happy */
> >>>     #address-cells = <2>;
> >>>     #size-cells = <2>;
> >>>       passthrough {
> >>>         compatible = "simple-bus";
> >>>         ranges;
> >>>         #address-cells = <2>;
> >>>         #size-cells = <2>;
> >>>      test@10000000 {
> >>>              compatible = "viommu-test";
> >>>              reg = <0 0x10000000 0 0x1000>;
> >>>              interrupts = <0 80 4  0 81 4  0 82 4>;
> >>>              iommus = <0xfdea 0x01>;
> >>
> >> ... you use 0xfdea. Does this mean 'xl' will rewrite the phandle?
> >
> > but here user has to set the “iommus” property with magic phanle as
> explained earlier. 0xfdea is magic phandle.
> >
> > Regards,
> > Rahul
>
> ~Michal
>
>
>

-- 
Regards,

Oleksandr Tyshchenko

[-- Attachment #2: Type: text/html, Size: 14132 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-26 14:33   ` Rahul Singh
  2022-10-26 17:17     ` Michal Orzel
@ 2022-10-26 19:48     ` Julien Grall
  2022-10-27 16:08       ` Rahul Singh
  1 sibling, 1 reply; 33+ messages in thread
From: Julien Grall @ 2022-10-26 19:48 UTC (permalink / raw)
  To: Rahul Singh
  Cc: Xen developer discussion, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross



On 26/10/2022 15:33, Rahul Singh wrote:
> Hi Julien,

Hi Rahul,

>> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xen.org> wrote:
>>
>>
>>
>> On 26/10/2022 14:17, Rahul Singh wrote:
>>> Hi All,
>>
>> Hi Rahul,
>>
>>> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
>>> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
>>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
>>> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
>>> how to add the IOMMU binding for guest OS.
>>> Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
>>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
>>> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
>>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
>>> intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
>>> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
>>> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
>>> configuration is called nesting.
>>> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
>>> Stage 2 translation but there is no support for Stage 1 translation for guests. We will add support for guests to configure
>>> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU hardware and exposes the virtual SMMU to the guest.
>>> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
>>> XEN will trap the access and configure the hardware accordingly.
>>> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
>>> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
>>> For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
>>> IOMMU node phandle.
>> Below, you said that each IOMMUs may have a different ID space. So shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the user to specify the mapping?
> 
> Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This also helps in the ACPI case
> where we don’t need to modify the tables to delete the pIOMMU entries and create one vIOMMU.
> In this case, no need to replace the phandle as Xen create the vIOMMU with the same pIOMMU
> phandle and same base address.
> 
> For domU guests one vIOMMU per guest will be created.

IIRC, the SMMUv3 is using a ring like the GICv3 ITS. I think we need to 
be open here because this may end up to be tricky to security support it 
(we have N guest ring that can write to M host ring).

> 
>>
>>> For domU guests, when passthrough the device to the guest as per [2],  add the below property in the partial device tree
>>> node that is required to describe the generic device tree binding for IOMMUs and their master(s)
>>> "iommus = < &magic_phandle 0xvMasterID>
>>> 	• magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will be documented so that the user can set that in partial DT node (0xfdea).
>>
>> Does this mean only one IOMMU will be supported in the guest?
> 
> Yes.
> 
>>
>>> 	• vMasterID will be the virtual master ID that the user will provide.
>>> The partial device tree will look like this:
>>> /dts-v1/;
>>>   / {
>>>      /* #*cells are here to keep DTC happy */
>>>      #address-cells = <2>;
>>>      #size-cells = <2>;
>>>        aliases {
>>>          net = &mac0;
>>>      };
>>>        passthrough {
>>>          compatible = "simple-bus";
>>>          ranges;
>>>          #address-cells = <2>;
>>>          #size-cells = <2>;
>>>          mac0: ethernet@10000000 {
>>>              compatible = "calxeda,hb-xgmac";
>>>              reg = <0 0x10000000 0 0x1000>;
>>>              interrupts = <0 80 4  0 81 4  0 82 4>;
>>>             iommus = <0xfdea 0x01>;
>>>          };
>>>      };
>>> };
>>>   In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device this
>>> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
>>> the same master ID but behind a different IOMMU.
>>
>> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen should already have all the information about the IOMMU and Master-ID. So it doesn't seem necessary for Device-Tree.
>>
>> For ACPI, I would have expected the information to be found in the IOREQ.
>>
>> So can you add more context why this is necessary for everyone?
> 
> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.

I am confused. Below, you are making the virtual master ID optional. So 
shouldn't this be mandatory if you really need the mapping with the 
virtual ID?

> The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
> vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
> the corresponding pIOMMU. As I mention we need vMaster-ID in case a system could have 2 identical Master-ID but
> each one connected to a different SMMU and assigned to the guest.

I am afraid I still don't understand why this is a requirement. Libxl 
could have enough knowledge (which will be necessarry for the PCI case) 
to know the IOMMU and pMasterID associated with a device.

So libxl could allocate the vMasterID, tell Xen the corresponding 
mapping and update the device-tree.

IOW, it doesn't seem to be necessary to involve the user in the process 
here.

> 
>>
>>>   iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” , “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
>>> 	• PMASTER_ID is the physical master ID of the device from the physical DT.
>>> 	• VMASTER_ID is the virtual master Id that the user will configure in the partial device tree.
>>> 	• IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.
>>
>> Below you give an example for Platform device. How would that fit in the context of PCI passthrough?
> 
> In PCI passthrough case, xl will create the "iommu-map" property in vpci host bridge node with phandle to vIOMMU node.
> vSMMUv3 node will be created in xl.

This means that libxl will need to know the associated pMasterID to a 
PCI device. So, I don't understand why you can't do the same for 
platform devices.

> 
>>
>>>   Example: Let's say the user wants to assign the below physical device in DT to the guest.
>>>   iommu@4f000000 {
>>>                  compatible = "arm,smmu-v3";
>>>               	interrupts = <0x00 0xe4 0xf04>;
>>>                  interrupt-parent = <0x01>;
>>>                  #iommu-cells = <0x01>;
>>>                  interrupt-names = "combined";
>>>                  reg = <0x00 0x4f000000 0x00 0x40000>;
>>>                  phandle = <0xfdeb>;
>>>                  name = "iommu";
>>> };
>>
>> So I guess this node will be written by Xen. How will you the case where there are extra property to added (e.g. dma-coherent)?
> 
> In this example this is physical IOMMU node. vIOMMU node wil be created by xl during guest creation.

Ok. I think it would be better to use very different phandle in your 
example so it doesn't look like a mistake.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-26 13:17 Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices Rahul Singh
  2022-10-26 13:36 ` Julien Grall
@ 2022-10-27  9:01 ` Ayan Kumar Halder
  2022-10-27  9:41   ` Ayan Kumar Halder
  1 sibling, 1 reply; 33+ messages in thread
From: Ayan Kumar Halder @ 2022-10-27  9:01 UTC (permalink / raw)
  To: Rahul Singh, Xen developer discussion
  Cc: Julien Grall, Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Oleksandr Tyshchenko, Oleksandr Andrushchenko, Volodymyr Babchuk,
	Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross


On 26/10/2022 14:17, Rahul Singh wrote:
> CAUTION: This message has originated from an External Source. Please use proper judgment and caution when opening attachments, clicking links, or responding to this email.
>
>
> Hi All,

Hi Rahul,

I have a very basic question.

>
> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
> how to add the IOMMU binding for guest OS.
>
> Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
> intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
> configuration is called nesting.
>
> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports

Doesn't this imply that there is support for Stage 1 translation for 
guests ? Otherwise, how will the guest provide isolation between 
different devices or dma-masters ?

- Ayan

> Stage 2 translation but there is no support for Stage 1 translation for guests.

> We will add support for guests to configure
> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU hardware and exposes the virtual SMMU to the guest.
> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
> XEN will trap the access and configure the hardware accordingly.
>
> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
>
> For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
> IOMMU node phandle.
>
> For domU guests, when passthrough the device to the guest as per [2],  add the below property in the partial device tree
> node that is required to describe the generic device tree binding for IOMMUs and their master(s)
>
> "iommus = < &magic_phandle 0xvMasterID>
>          • magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will be documented so that the user can set that in partial DT node (0xfdea).
>          • vMasterID will be the virtual master ID that the user will provide.
>
> The partial device tree will look like this:
> /dts-v1/;
>
> / {
>      /* #*cells are here to keep DTC happy */
>      #address-cells = <2>;
>      #size-cells = <2>;
>
>      aliases {
>          net = &mac0;
>      };
>
>      passthrough {
>          compatible = "simple-bus";
>          ranges;
>          #address-cells = <2>;
>          #size-cells = <2>;
>          mac0: ethernet@10000000 {
>              compatible = "calxeda,hb-xgmac";
>              reg = <0 0x10000000 0 0x1000>;
>              interrupts = <0 80 4  0 81 4  0 82 4>;
>             iommus = <0xfdea 0x01>;
>          };
>      };
> };
>
> In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device this
> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
> the same master ID but behind a different IOMMU.
>
> iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” , “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
>
>          • PMASTER_ID is the physical master ID of the device from the physical DT.
>          • VMASTER_ID is the virtual master Id that the user will configure in the partial device tree.
>          • IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.
>
> Example: Let's say the user wants to assign the below physical device in DT to the guest.
>
> iommu@4f000000 {
>                  compatible = "arm,smmu-v3";
>                  interrupts = <0x00 0xe4 0xf04>;
>                  interrupt-parent = <0x01>;
>                  #iommu-cells = <0x01>;
>                  interrupt-names = "combined";
>                  reg = <0x00 0x4f000000 0x00 0x40000>;
>                  phandle = <0xfdeb>;
>                  name = "iommu";
> };
>
> test@10000000 {
>          compatible = "viommu-test”;
>          iommus = <0xfdeb 0x10>;
>          interrupts = <0x00 0xff 0x04>;
>          reg = <0x00 0x10000000 0x00 0x1000>;
>          name = "viommu-test";
> };
>
> The partial Device tree node will be like this:
>
> / {
>      /* #*cells are here to keep DTC happy */
>      #address-cells = <2>;
>      #size-cells = <2>;
>
>      passthrough {
>          compatible = "simple-bus";
>          ranges;
>          #address-cells = <2>;
>          #size-cells = <2>;
>
>          test@10000000 {
>                  compatible = "viommu-test";
>                  reg = <0 0x10000000 0 0x1000>;
>                  interrupts = <0 80 4  0 81 4  0 82 4>;
>                  iommus = <0xfdea 0x01>;
>          };
>      };
> };
>
>   iommu_devid_map = [ “0x10@0x01,0x4f000000”]
>          • 0x10 is the real physical master id from the physical DT.
>          • 0x01 is the virtual master Id that the user defines as a partial device tree.
>          • 0x4f000000 is the base address of the IOMMU device.
>
> [1] https://www.kernel.org/doc/Documentation/devicetree/bindings/iommu/iommu.txt
> [2] https://xenbits.xen.org/docs/unstable/misc/arm/passthrough.txt
>
> Regards,
> Rahul


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-27  9:01 ` Ayan Kumar Halder
@ 2022-10-27  9:41   ` Ayan Kumar Halder
  0 siblings, 0 replies; 33+ messages in thread
From: Ayan Kumar Halder @ 2022-10-27  9:41 UTC (permalink / raw)
  To: Rahul Singh, Xen developer discussion


On 27/10/2022 10:01, Ayan Kumar Halder wrote:
>
> On 26/10/2022 14:17, Rahul Singh wrote:
>> CAUTION: This message has originated from an External Source. Please 
>> use proper judgment and caution when opening attachments, clicking 
>> links, or responding to this email.
>>
>>
>> Hi All,
>
> Hi Rahul,
>
> I have a very basic question.
>
>>
>> At Arm, we started to implement the POC to support 2 levels of page 
>> tables/nested translation in SMMUv3.
>> To support nested translation for guest OS Xen needs to expose the 
>> virtual IOMMU. If we passthrough the
>> device to the guest that is behind an IOMMU and virtual IOMMU is 
>> enabled for the guest there is a need to
>> add IOMMU binding for the device in the passthrough node as per [1]. 
>> This email is to get an agreement on
>> how to add the IOMMU binding for guest OS.
>>
>> Before I will explain how to add the IOMMU binding let me give a 
>> brief overview of how we will add support for virtual
>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 
>> Nested translation support. SMMUv3 hardware
>> supports two stages of translation. Each stage of translation can be 
>> independently enabled. An incoming address is logically
>> translated from VA to IPA in stage 1, then the IPA is input to stage 
>> 2 which translates the IPA to the output PA. Stage 1 is
>> intended to be used by a software entity( Guest OS) to provide 
>> isolation or translation to buffers within the entity, for example,
>> DMA isolation within an OS. Stage 2 is intended to be available in 
>> systems supporting the Virtualization Extensions and is
>> intended to virtualize device DMA to guest VM address spaces. When 
>> both stage 1 and stage 2 are enabled, the translation
>> configuration is called nesting.
>>
>> Stage 1 translation support is required to provide isolation between 
>> different devices within the guest OS. XEN already supports
>
> Doesn't this imply that there is support for Stage 1 translation for 
> guests ? Otherwise, how will the guest provide isolation between 
> different devices or dma-masters ?

Michal explained to me offline. This query is answered. Sorry for the noise.

- Ayan

>
> - Ayan
>
>> Stage 2 translation but there is no support for Stage 1 translation 
>> for guests.
>
>> We will add support for guests to configure
>> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU 
>> hardware and exposes the virtual SMMU to the guest.
>> Guest can use the native SMMU driver to configure the stage 1 
>> translation. When the guest configures the SMMU for Stage 1,
>> XEN will trap the access and configure the hardware accordingly.
>>
>> Now back to the question of how we can add the IOMMU binding between 
>> the virtual IOMMU and the master devices so that
>> guests can configure the IOMMU correctly. The solution that I am 
>> suggesting is as below:
>>
>> For dom0, while handling the DT node(handle_node()) Xen will replace 
>> the phandle in the "iommus" property with the virtual
>> IOMMU node phandle.
>>
>> For domU guests, when passthrough the device to the guest as per 
>> [2],  add the below property in the partial device tree
>> node that is required to describe the generic device tree binding for 
>> IOMMUs and their master(s)
>>
>> "iommus = < &magic_phandle 0xvMasterID>
>>          • magic_phandle will be the phandle ( vIOMMU phandle in xl)  
>> that will be documented so that the user can set that in partial DT 
>> node (0xfdea).
>>          • vMasterID will be the virtual master ID that the user will 
>> provide.
>>
>> The partial device tree will look like this:
>> /dts-v1/;
>>
>> / {
>>      /* #*cells are here to keep DTC happy */
>>      #address-cells = <2>;
>>      #size-cells = <2>;
>>
>>      aliases {
>>          net = &mac0;
>>      };
>>
>>      passthrough {
>>          compatible = "simple-bus";
>>          ranges;
>>          #address-cells = <2>;
>>          #size-cells = <2>;
>>          mac0: ethernet@10000000 {
>>              compatible = "calxeda,hb-xgmac";
>>              reg = <0 0x10000000 0 0x1000>;
>>              interrupts = <0 80 4  0 81 4  0 82 4>;
>>             iommus = <0xfdea 0x01>;
>>          };
>>      };
>> };
>>
>> In xl.cfg we need to define a new option to inform Xen about 
>> vMasterId to pMasterId mapping and to which IOMMU device this
>> the master device is connected so that Xen can configure the right 
>> IOMMU. This is required if the system has devices that have
>> the same master ID but behind a different IOMMU.
>>
>> iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” , 
>> “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
>>
>>          • PMASTER_ID is the physical master ID of the device from 
>> the physical DT.
>>          • VMASTER_ID is the virtual master Id that the user will 
>> configure in the partial device tree.
>>          • IOMMU_BASE_ADDRESS is the base address of the physical 
>> IOMMU device to which this device is connected.
>>
>> Example: Let's say the user wants to assign the below physical device 
>> in DT to the guest.
>>
>> iommu@4f000000 {
>>                  compatible = "arm,smmu-v3";
>>                  interrupts = <0x00 0xe4 0xf04>;
>>                  interrupt-parent = <0x01>;
>>                  #iommu-cells = <0x01>;
>>                  interrupt-names = "combined";
>>                  reg = <0x00 0x4f000000 0x00 0x40000>;
>>                  phandle = <0xfdeb>;
>>                  name = "iommu";
>> };
>>
>> test@10000000 {
>>          compatible = "viommu-test”;
>>          iommus = <0xfdeb 0x10>;
>>          interrupts = <0x00 0xff 0x04>;
>>          reg = <0x00 0x10000000 0x00 0x1000>;
>>          name = "viommu-test";
>> };
>>
>> The partial Device tree node will be like this:
>>
>> / {
>>      /* #*cells are here to keep DTC happy */
>>      #address-cells = <2>;
>>      #size-cells = <2>;
>>
>>      passthrough {
>>          compatible = "simple-bus";
>>          ranges;
>>          #address-cells = <2>;
>>          #size-cells = <2>;
>>
>>          test@10000000 {
>>                  compatible = "viommu-test";
>>                  reg = <0 0x10000000 0 0x1000>;
>>                  interrupts = <0 80 4  0 81 4  0 82 4>;
>>                  iommus = <0xfdea 0x01>;
>>          };
>>      };
>> };
>>
>>   iommu_devid_map = [ “0x10@0x01,0x4f000000”]
>>          • 0x10 is the real physical master id from the physical DT.
>>          • 0x01 is the virtual master Id that the user defines as a 
>> partial device tree.
>>          • 0x4f000000 is the base address of the IOMMU device.
>>
>> [1] 
>> https://www.kernel.org/doc/Documentation/devicetree/bindings/iommu/iommu.txt
>> [2] https://xenbits.xen.org/docs/unstable/misc/arm/passthrough.txt
>>
>> Regards,
>> Rahul
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-26 19:48     ` Julien Grall
@ 2022-10-27 16:08       ` Rahul Singh
  2022-10-27 16:33         ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Rahul Singh @ 2022-10-27 16:08 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen developer discussion, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Julien,


> On 26 Oct 2022, at 8:48 pm, Julien Grall <julien@xen.org> wrote:
> 
> 
> 
> On 26/10/2022 15:33, Rahul Singh wrote:
>> Hi Julien,
> 
> Hi Rahul,
> 
>>> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xen.org> wrote:
>>> 
>>> 
>>> 
>>> On 26/10/2022 14:17, Rahul Singh wrote:
>>>> Hi All,
>>> 
>>> Hi Rahul,
>>> 
>>>> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
>>>> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
>>>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
>>>> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
>>>> how to add the IOMMU binding for guest OS.
>>>> Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
>>>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
>>>> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
>>>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
>>>> intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
>>>> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
>>>> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
>>>> configuration is called nesting.
>>>> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
>>>> Stage 2 translation but there is no support for Stage 1 translation for guests. We will add support for guests to configure
>>>> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU hardware and exposes the virtual SMMU to the guest.
>>>> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
>>>> XEN will trap the access and configure the hardware accordingly.
>>>> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
>>>> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
>>>> For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
>>>> IOMMU node phandle.
>>> Below, you said that each IOMMUs may have a different ID space. So shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the user to specify the mapping?
>> Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This also helps in the ACPI case
>> where we don’t need to modify the tables to delete the pIOMMU entries and create one vIOMMU.
>> In this case, no need to replace the phandle as Xen create the vIOMMU with the same pIOMMU
>> phandle and same base address.
>> For domU guests one vIOMMU per guest will be created.
> 
> IIRC, the SMMUv3 is using a ring like the GICv3 ITS. I think we need to be open here because this may end up to be tricky to security support it (we have N guest ring that can write to M host ring).

If xl want to creates the one vIOMMU per pIOMMU for domU then xl needs to know the below information:
 -  Find the number of holes in guest memory same as the number of vIOMMU that needs the creation to create the vIOMMU DT nodes. (Think about a big system that has 50+ IOMMUs)
    Yes, we will create vIOMMU for only those devices that are assigned to guests but still we need to find the hole in guest memory.
 -  Find the pIOMMU attached to the assigned device and create mapping b/w vIOMMU -> pIOMMU to register the MMIO handler.
    Either we need to modify the current hyerpcall or need to implement a new hypercall to find this information. 

Because of the above reason I thought of creating one vIOMMU for domU. Yes you are right this may end up to be tricky to security support
but as per my understanding one vIOMMU  per domU guest is easy to implement and simple to handle as compared to one vIOMMU per pIOMMU


> 
>>> 
>>>> For domU guests, when passthrough the device to the guest as per [2],  add the below property in the partial device tree
>>>> node that is required to describe the generic device tree binding for IOMMUs and their master(s)
>>>> "iommus = < &magic_phandle 0xvMasterID>
>>>> 	• magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will be documented so that the user can set that in partial DT node (0xfdea).
>>> 
>>> Does this mean only one IOMMU will be supported in the guest?
>> Yes.
>>> 
>>>> 	• vMasterID will be the virtual master ID that the user will provide.
>>>> The partial device tree will look like this:
>>>> /dts-v1/;
>>>>  / {
>>>>     /* #*cells are here to keep DTC happy */
>>>>     #address-cells = <2>;
>>>>     #size-cells = <2>;
>>>>       aliases {
>>>>         net = &mac0;
>>>>     };
>>>>       passthrough {
>>>>         compatible = "simple-bus";
>>>>         ranges;
>>>>         #address-cells = <2>;
>>>>         #size-cells = <2>;
>>>>         mac0: ethernet@10000000 {
>>>>             compatible = "calxeda,hb-xgmac";
>>>>             reg = <0 0x10000000 0 0x1000>;
>>>>             interrupts = <0 80 4  0 81 4  0 82 4>;
>>>>            iommus = <0xfdea 0x01>;
>>>>         };
>>>>     };
>>>> };
>>>>  In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device this
>>>> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
>>>> the same master ID but behind a different IOMMU.
>>> 
>>> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen should already have all the information about the IOMMU and Master-ID. So it doesn't seem necessary for Device-Tree.
>>> 
>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>> 
>>> So can you add more context why this is necessary for everyone?
>> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
> 
> I am confused. Below, you are making the virtual master ID optional. So shouldn't this be mandatory if you really need the mapping with the virtual ID?

vMasterID is optional if user knows pMasterID is unique on the system. But if pMasterId is not unique then user needs to provide the vMasterID.

> 
>> The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
>> vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
>> the corresponding pIOMMU. As I mention we need vMaster-ID in case a system could have 2 identical Master-ID but
>> each one connected to a different SMMU and assigned to the guest.
> 
> I am afraid I still don't understand why this is a requirement. Libxl could have enough knowledge (which will be necessarry for the PCI case) to know the IOMMU and pMasterID associated with a device.
> 
> So libxl could allocate the vMasterID, tell Xen the corresponding mapping and update the device-tree.
> 
> IOW, it doesn't seem to be necessary to involve the user in the process here.

Yes, libxl could allocate the vMasterID but there is no way we can find the link b/w vMasterID created to pMasterID from dtdev.

What I understand from the code is that there is no link between the passthrough node and dtdev config option. The passthrough
node is directly copied to guest DT without any modification. Dtdev is used to add and assign the device to IOMMU. 

Let's take an example if the user wants to assign two devices to the guest via passthrough node.

/dts-v1/;

/ {
   /* #*cells are here to keep DTC happy */
   #address-cells = <2>;
   #size-cells = <2>;

   aliases {
       net = &mac0;
   };

   passthrough {
       compatible = "simple-bus";
       ranges;
       #address-cells = <2>;
       #size-cells = <2>;

       mac0: ethernet@10000000 {
           compatible = "calxeda,hb-xgmac";
           reg = <0 0x10000000 0 0x1000>;
           interrupts = <0 80 4  0 81 4  0 82 4>;
       };

     mac1: ethernet@20000000 {
           compatible = “r8169";
           reg = <0 0x10000000 0 0x1000>;
           interrupts = <0 80 4  0 81 4  0 82 4>;
       };

   };
};

dtdev = [ "/soc/ethernet@10000000”, “/soc/ethernet@f2000000” ]

There is no link which dtdev entry belongs to which node. Therefor there is no way to link the vMasterID created to pMasterID.

> 
>>> 
>>>>  iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” , “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
>>>> 	• PMASTER_ID is the physical master ID of the device from the physical DT.
>>>> 	• VMASTER_ID is the virtual master Id that the user will configure in the partial device tree.
>>>> 	• IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.
>>> 
>>> Below you give an example for Platform device. How would that fit in the context of PCI passthrough?
>> In PCI passthrough case, xl will create the "iommu-map" property in vpci host bridge node with phandle to vIOMMU node.
>> vSMMUv3 node will be created in xl.
> 
> This means that libxl will need to know the associated pMasterID to a PCI device. So, I don't understand why you can't do the same for platform devices.

For the PCI passthrough case, we don’t need to provide the MasterID to create "iommu-map” property as for 
PCI device MasterID is RID ( BDF ). For non-PCI devices, MasterID is required to create “iommus” property.

Also, VPCI will create the virtual SBDF when we assigned PCI devices to the guest. Xen can easily find the
physical SBDF and pIOMMU from virtual SBDF as Xen has all the information for PCI devices.

> 
>>> 
>>>>  Example: Let's say the user wants to assign the below physical device in DT to the guest.
>>>>  iommu@4f000000 {
>>>>                 compatible = "arm,smmu-v3";
>>>>              	interrupts = <0x00 0xe4 0xf04>;
>>>>                 interrupt-parent = <0x01>;
>>>>                 #iommu-cells = <0x01>;
>>>>                 interrupt-names = "combined";
>>>>                 reg = <0x00 0x4f000000 0x00 0x40000>;
>>>>                 phandle = <0xfdeb>;
>>>>                 name = "iommu";
>>>> };
>>> 
>>> So I guess this node will be written by Xen. How will you the case where there are extra property to added (e.g. dma-coherent)?
>> In this example this is physical IOMMU node. vIOMMU node wil be created by xl during guest creation.
> 
> Ok. I think it would be better to use very different phandle in your example so it doesn't look like a mistake.
> 

Ack

Regards,
Rahul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-26 17:17     ` Michal Orzel
  2022-10-26 18:23       ` Oleksandr Tyshchenko
@ 2022-10-27 16:12       ` Rahul Singh
  1 sibling, 0 replies; 33+ messages in thread
From: Rahul Singh @ 2022-10-27 16:12 UTC (permalink / raw)
  To: Michal Orzel
  Cc: Julien Grall, Xen developer discussion, Stefano Stabellini,
	Bertrand Marquis, Michal Orzel, Oleksandr Tyshchenko,
	Oleksandr Andrushchenko, Volodymyr Babchuk, Jan Beulich,
	Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Michal,

> On 26 Oct 2022, at 6:17 pm, Michal Orzel <michal.orzel@amd.com> wrote:
> 
> Hi Rahul,
> 
> On 26/10/2022 16:33, Rahul Singh wrote:
>> 
>> 
>> Hi Julien,
>> 
>>> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xen.org> wrote:
>>> 
>>> 
>>> 
>>> On 26/10/2022 14:17, Rahul Singh wrote:
>>>> Hi All,
>>> 
>>> Hi Rahul,
>>> 
>>>> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
>>>> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
>>>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
>>>> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
>>>> how to add the IOMMU binding for guest OS.
>>>> Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
>>>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
>>>> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
>>>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
>>>> intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
>>>> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
>>>> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
>>>> configuration is called nesting.
>>>> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
>>>> Stage 2 translation but there is no support for Stage 1 translation for guests. We will add support for guests to configure
>>>> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU hardware and exposes the virtual SMMU to the guest.
>>>> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
>>>> XEN will trap the access and configure the hardware accordingly.
>>>> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
>>>> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
>>>> For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
>>>> IOMMU node phandle.
>>> Below, you said that each IOMMUs may have a different ID space. So shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the user to specify the mapping?
>> 
>> Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This also helps in the ACPI case
>> where we don’t need to modify the tables to delete the pIOMMU entries and create one vIOMMU.
>> In this case, no need to replace the phandle as Xen create the vIOMMU with the same pIOMMU
>> phandle and same base address.
>> 
>> For domU guests one vIOMMU per guest will be created.
>> 
>>> 
>>>> For domU guests, when passthrough the device to the guest as per [2],  add the below property in the partial device tree
>>>> node that is required to describe the generic device tree binding for IOMMUs and their master(s)
>>>> "iommus = < &magic_phandle 0xvMasterID>
>>>>     • magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will be documented so that the user can set that in partial DT node (0xfdea).
>>> 
>>> Does this mean only one IOMMU will be supported in the guest?
>> 
>> Yes.
>> 
>>> 
>>>>     • vMasterID will be the virtual master ID that the user will provide.
>>>> The partial device tree will look like this:
>>>> /dts-v1/;
>>>> / {
>>>>    /* #*cells are here to keep DTC happy */
>>>>    #address-cells = <2>;
>>>>    #size-cells = <2>;
>>>>      aliases {
>>>>        net = &mac0;
>>>>    };
>>>>      passthrough {
>>>>        compatible = "simple-bus";
>>>>        ranges;
>>>>        #address-cells = <2>;
>>>>        #size-cells = <2>;
>>>>        mac0: ethernet@10000000 {
>>>>            compatible = "calxeda,hb-xgmac";
>>>>            reg = <0 0x10000000 0 0x1000>;
>>>>            interrupts = <0 80 4  0 81 4  0 82 4>;
>>>>           iommus = <0xfdea 0x01>;
>>>>        };
>>>>    };
>>>> };
>>>> In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device this
>>>> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
>>>> the same master ID but behind a different IOMMU.
>>> 
>>> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen should already have all the information about the IOMMU and Master-ID. So it doesn't seem necessary for Device-Tree.
>>> 
>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>> 
>>> So can you add more context why this is necessary for everyone?
>> 
>> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
>> The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
>> vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
>> the corresponding pIOMMU. As I mention we need vMaster-ID in case a system could have 2 identical Master-ID but
>> each one connected to a different SMMU and assigned to the guest.
> 
> I think the proposed solution would work and I would just like to clear some issues.
> 
> Please correct me if I'm wrong:
> 
> In the xl config file we already need to specify dtdev to point to the device path in host dtb.
> In the partial device tree we specify the vMasterId as well as magic phandle.
> Isn't it that we already have all the information necessary without the need for iommu_devid_map?
> For me it looks like the partial dtb provides vMasterID and dtdev provides pMasterID as well as physical phandle to SMMU.
> 
> Having said that, I can also understand that specifying everything in one place using iommu_devid_map can be easier
> and reduces the need for device tree parsing.
> 
> Apart from that, what is the reason of exposing only one vSMMU to guest instead of one vSMMU per pSMMU?
> In the latter solution, the whole issue with handling devices with the same stream ID but belonging to different SMMUs
> would be gone. It would also result in a more natural way of the device tree look. Normally a guest would see
> e.g. both SMMUs and exposing only one can be misleading.

Please see the other email that I replied to Julien to know the answer to the above question.

Regards,
Rahul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-27 16:08       ` Rahul Singh
@ 2022-10-27 16:33         ` Julien Grall
  2022-10-27 17:18           ` Michal Orzel
  2022-10-28 12:54           ` Rahul Singh
  0 siblings, 2 replies; 33+ messages in thread
From: Julien Grall @ 2022-10-27 16:33 UTC (permalink / raw)
  To: Rahul Singh
  Cc: Xen developer discussion, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

On 27/10/2022 17:08, Rahul Singh wrote:
> Hi Julien,

Hi Rahul,

>> On 26 Oct 2022, at 8:48 pm, Julien Grall <julien@xen.org> wrote:
>>
>>
>>
>> On 26/10/2022 15:33, Rahul Singh wrote:
>>> Hi Julien,
>>
>> Hi Rahul,
>>
>>>> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xen.org> wrote:
>>>>
>>>>
>>>>
>>>> On 26/10/2022 14:17, Rahul Singh wrote:
>>>>> Hi All,
>>>>
>>>> Hi Rahul,
>>>>
>>>>> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
>>>>> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
>>>>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
>>>>> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
>>>>> how to add the IOMMU binding for guest OS.
>>>>> Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
>>>>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
>>>>> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
>>>>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
>>>>> intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
>>>>> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
>>>>> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
>>>>> configuration is called nesting.
>>>>> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
>>>>> Stage 2 translation but there is no support for Stage 1 translation for guests. We will add support for guests to configure
>>>>> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU hardware and exposes the virtual SMMU to the guest.
>>>>> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
>>>>> XEN will trap the access and configure the hardware accordingly.
>>>>> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
>>>>> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
>>>>> For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
>>>>> IOMMU node phandle.
>>>> Below, you said that each IOMMUs may have a different ID space. So shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the user to specify the mapping?
>>> Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This also helps in the ACPI case
>>> where we don’t need to modify the tables to delete the pIOMMU entries and create one vIOMMU.
>>> In this case, no need to replace the phandle as Xen create the vIOMMU with the same pIOMMU
>>> phandle and same base address.
>>> For domU guests one vIOMMU per guest will be created.
>>
>> IIRC, the SMMUv3 is using a ring like the GICv3 ITS. I think we need to be open here because this may end up to be tricky to security support it (we have N guest ring that can write to M host ring).
> 
> If xl want to creates the one vIOMMU per pIOMMU for domU then xl needs to know the below information:
>   -  Find the number of holes in guest memory same as the number of vIOMMU that needs the creation to create the vIOMMU DT nodes. (Think about a big system that has 50+ IOMMUs)
>      Yes, we will create vIOMMU for only those devices that are assigned to guests but still we need to find the hole in guest memory.

I agree this is a problem with the one vIOMMU per pIOMMU.

>   -  Find the pIOMMU attached to the assigned device and create mapping b/w vIOMMU -> pIOMMU to register the MMIO handler.
>      Either we need to modify the current hyerpcall or need to implement a new hypercall to find this information.

Adding hypercalls are is not a big problem.

> 
> Because of the above reason I thought of creating one vIOMMU for domU. Yes you are right this may end up to be tricky to security support
> but as per my understanding one vIOMMU  per domU guest is easy to implement and simple to handle as compared to one vIOMMU per pIOMMU

I am not sure about this. My gut feeling is the code in Xen will end up 
to be tricky (there more that Xen doesn't support preemption). So I 
think we will trade-off complexity in Xen over simplicity in libxl.

That said, I haven't looked deeper in the code. So I may be wrong. I 
will need to see the code to confirm.

>>>>> For domU guests, when passthrough the device to the guest as per [2],  add the below property in the partial device tree
>>>>> node that is required to describe the generic device tree binding for IOMMUs and their master(s)
>>>>> "iommus = < &magic_phandle 0xvMasterID>
>>>>> 	• magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will be documented so that the user can set that in partial DT node (0xfdea).
>>>>
>>>> Does this mean only one IOMMU will be supported in the guest?
>>> Yes.
>>>>
>>>>> 	• vMasterID will be the virtual master ID that the user will provide.
>>>>> The partial device tree will look like this:
>>>>> /dts-v1/;
>>>>>   / {
>>>>>      /* #*cells are here to keep DTC happy */
>>>>>      #address-cells = <2>;
>>>>>      #size-cells = <2>;
>>>>>        aliases {
>>>>>          net = &mac0;
>>>>>      };
>>>>>        passthrough {
>>>>>          compatible = "simple-bus";
>>>>>          ranges;
>>>>>          #address-cells = <2>;
>>>>>          #size-cells = <2>;
>>>>>          mac0: ethernet@10000000 {
>>>>>              compatible = "calxeda,hb-xgmac";
>>>>>              reg = <0 0x10000000 0 0x1000>;
>>>>>              interrupts = <0 80 4  0 81 4  0 82 4>;
>>>>>             iommus = <0xfdea 0x01>;
>>>>>          };
>>>>>      };
>>>>> };
>>>>>   In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device this
>>>>> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
>>>>> the same master ID but behind a different IOMMU.
>>>>
>>>> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen should already have all the information about the IOMMU and Master-ID. So it doesn't seem necessary for Device-Tree.
>>>>
>>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>>>
>>>> So can you add more context why this is necessary for everyone?
>>> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
>>
>> I am confused. Below, you are making the virtual master ID optional. So shouldn't this be mandatory if you really need the mapping with the virtual ID?
> 
> vMasterID is optional if user knows pMasterID is unique on the system. But if pMasterId is not unique then user needs to provide the vMasterID.

So the expectation is the user will be able to know that the pMasterID 
is uniq. This may be easy with a couple of SMMUs, but if you have 50+ 
(as suggested above). This will become a pain on larger system.

IHMO, it would be much better if we can detect that in libxl (see below).

> 
>>
>>> The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
>>> vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
>>> the corresponding pIOMMU. As I mention we need vMaster-ID in case a system could have 2 identical Master-ID but
>>> each one connected to a different SMMU and assigned to the guest.
>>
>> I am afraid I still don't understand why this is a requirement. Libxl could have enough knowledge (which will be necessarry for the PCI case) to know the IOMMU and pMasterID associated with a device.
>>
>> So libxl could allocate the vMasterID, tell Xen the corresponding mapping and update the device-tree.
>>
>> IOW, it doesn't seem to be necessary to involve the user in the process here.
> 
> Yes, libxl could allocate the vMasterID but there is no way we can find the link b/w vMasterID created to pMasterID from dtdev.
> 
> What I understand from the code is that there is no link between the passthrough node and dtdev config option. The passthrough
> node is directly copied to guest DT without any modification. Dtdev is used to add and assign the device to IOMMU.
> 
> Let's take an example if the user wants to assign two devices to the guest via passthrough node.
> 
> /dts-v1/;
> 
> / {
>     /* #*cells are here to keep DTC happy */
>     #address-cells = <2>;
>     #size-cells = <2>;
> 
>     aliases {
>         net = &mac0;
>     };
> 
>     passthrough {
>         compatible = "simple-bus";
>         ranges;
>         #address-cells = <2>;
>         #size-cells = <2>;
> 
>         mac0: ethernet@10000000 {
>             compatible = "calxeda,hb-xgmac";
>             reg = <0 0x10000000 0 0x1000>;
>             interrupts = <0 80 4  0 81 4  0 82 4>;
>         };
> 
>       mac1: ethernet@20000000 {
>             compatible = “r8169";
>             reg = <0 0x10000000 0 0x1000>;
>             interrupts = <0 80 4  0 81 4  0 82 4>;
>         };
> 
>     };
> };
> 
> dtdev = [ "/soc/ethernet@10000000”, “/soc/ethernet@f2000000” ]
> 
> There is no link which dtdev entry belongs to which node. Therefor there is no way to link the vMasterID created to pMasterID.

I agree there is no link today. But we could add a property in the 
partial device-tree to mention which physical device is associated.

With that, I think all, the complexity is moved to libxl and it will be 
easier for the user to use vIOMMU.

[...]

>>>>>   iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” , “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
>>>>> 	• PMASTER_ID is the physical master ID of the device from the physical DT.
>>>>> 	• VMASTER_ID is the virtual master Id that the user will configure in the partial device tree.
>>>>> 	• IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.
>>>>
>>>> Below you give an example for Platform device. How would that fit in the context of PCI passthrough?
>>> In PCI passthrough case, xl will create the "iommu-map" property in vpci host bridge node with phandle to vIOMMU node.
>>> vSMMUv3 node will be created in xl.
>>
>> This means that libxl will need to know the associated pMasterID to a PCI device. So, I don't understand why you can't do the same for platform devices.
> 
> For the PCI passthrough case, we don’t need to provide the MasterID to create "iommu-map” property as for
> PCI device MasterID is RID ( BDF ). For non-PCI devices, MasterID is required to create “iommus” property.

Are you talking about the physical MasterID or virtual one? If physical 
MasterID then I don't think this is always the RID (see [1]). But for 
the virtual Master ID we could make this association.

This still means that in some way the toolstack need to let Xen know (or 
the other way around) the mapping between the pMasterID and vMasterID.

[1] Documentation/devicetree/bindings/pci/pci-iommu.txt.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-26 18:23       ` Oleksandr Tyshchenko
@ 2022-10-27 16:49         ` Rahul Singh
  2022-10-28 15:26           ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 33+ messages in thread
From: Rahul Singh @ 2022-10-27 16:49 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Michal Orzel, Julien Grall, Xen developer discussion,
	Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Oleksandr Tyshchenko, Oleksandr Andrushchenko, Volodymyr Babchuk,
	Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Oleksandr,

> On 26 Oct 2022, at 7:23 pm, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> 
> 
> 
> On Wed, Oct 26, 2022 at 8:18 PM Michal Orzel <michal.orzel@amd.com> wrote:
> Hi Rahul,
>  
> 
> Hello all
> 
> [sorry for the possible format issues]
>  
> 
> On 26/10/2022 16:33, Rahul Singh wrote:
> > 
> > 
> > Hi Julien,
> > 
> >> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xen.org> wrote:
> >>
> >>
> >>
> >> On 26/10/2022 14:17, Rahul Singh wrote:
> >>> Hi All,
> >>
> >> Hi Rahul,
> >>
> >>> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
> >>> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
> >>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
> >>> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
> >>> how to add the IOMMU binding for guest OS.
> >>> Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
> >>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
> >>> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
> >>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
> >>> intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
> >>> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
> >>> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
> >>> configuration is called nesting.
> >>> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
> >>> Stage 2 translation but there is no support for Stage 1 translation for guests. We will add support for guests to configure
> >>> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU hardware and exposes the virtual SMMU to the guest.
> >>> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
> >>> XEN will trap the access and configure the hardware accordingly.
> >>> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
> >>> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
> >>> For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
> >>> IOMMU node phandle.
> >> Below, you said that each IOMMUs may have a different ID space. So shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the user to specify the mapping?
> > 
> > Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This also helps in the ACPI case
> > where we don’t need to modify the tables to delete the pIOMMU entries and create one vIOMMU.
> > In this case, no need to replace the phandle as Xen create the vIOMMU with the same pIOMMU
> > phandle and same base address.
> > 
> > For domU guests one vIOMMU per guest will be created.
> > 
> >>
> >>> For domU guests, when passthrough the device to the guest as per [2],  add the below property in the partial device tree
> >>> node that is required to describe the generic device tree binding for IOMMUs and their master(s)
> >>> "iommus = < &magic_phandle 0xvMasterID>
> >>>      • magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will be documented so that the user can set that in partial DT node (0xfdea).
> >>
> >> Does this mean only one IOMMU will be supported in the guest?
> > 
> > Yes.
> > 
> >>
> >>>      • vMasterID will be the virtual master ID that the user will provide.
> >>> The partial device tree will look like this:
> >>> /dts-v1/;
> >>>  / {
> >>>     /* #*cells are here to keep DTC happy */
> >>>     #address-cells = <2>;
> >>>     #size-cells = <2>;
> >>>       aliases {
> >>>         net = &mac0;
> >>>     };
> >>>       passthrough {
> >>>         compatible = "simple-bus";
> >>>         ranges;
> >>>         #address-cells = <2>;
> >>>         #size-cells = <2>;
> >>>         mac0: ethernet@10000000 {
> >>>             compatible = "calxeda,hb-xgmac";
> >>>             reg = <0 0x10000000 0 0x1000>;
> >>>             interrupts = <0 80 4  0 81 4  0 82 4>;
> >>>            iommus = <0xfdea 0x01>;
> >>>         };
> >>>     };
> >>> };
> >>>  In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device this
> >>> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
> >>> the same master ID but behind a different IOMMU.
> >>
> >> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen should already have all the information about the IOMMU and Master-ID. So it doesn't seem necessary for Device-Tree.
> >>
> >> For ACPI, I would have expected the information to be found in the IOREQ.
> >>
> >> So can you add more context why this is necessary for everyone?
> > 
> > We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
> > The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
> > vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
> > the corresponding pIOMMU. As I mention we need vMaster-ID in case a system could have 2 identical Master-ID but
> > each one connected to a different SMMU and assigned to the guest.
> 
> I think the proposed solution would work and I would just like to clear some issues.
> 
> Please correct me if I'm wrong:
> 
> In the xl config file we already need to specify dtdev to point to the device path in host dtb.
> In the partial device tree we specify the vMasterId as well as magic phandle.
> Isn't it that we already have all the information necessary without the need for iommu_devid_map?
> For me it looks like the partial dtb provides vMasterID and dtdev provides pMasterID as well as physical phandle to SMMU.
> 
> Having said that, I can also understand that specifying everything in one place using iommu_devid_map can be easier
> and reduces the need for device tree parsing.
> 
> Apart from that, what is the reason of exposing only one vSMMU to guest instead of one vSMMU per pSMMU?
> In the latter solution, the whole issue with handling devices with the same stream ID but belonging to different SMMUs
> would be gone. It would also result in a more natural way of the device tree look. Normally a guest would see
> e.g. both SMMUs and exposing only one can be misleading.
> 
> I also have the same question. From earlier answers as I understand it is going to be identity vSMMU <-> pSMMU mappings for Dom0, so why diverge for DomU?
> 
> Also I am thinking how this solution would work for IPMMU-VMSA Gen3(Gen4), which also supports two stages of translation, so the nested translation could be possible in general, although there might be some pitfalls
> (yes, I understand that code to emulate access to control registers would be different in comparison with SMMUv3, but some other code could be common).  

Yes we will try to make code common so that other vIOMMU can be implemented easily. 
>    
> 
> 
>  
> 
> >>
> >>>  iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” , “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
> >>>      • PMASTER_ID is the physical master ID of the device from the physical DT.
> >>>      • VMASTER_ID is the virtual master Id that the user will configure in the partial device tree.
> >>>      • IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.
>  
> 
> If iommu_devid_map is a way to go, I have a question, would this configuration cover the following cases?
> 1. Device has several stream IDs

Yes in that case user needs to create the mapping for each streamIDs. For example if device has streamId 0x10 , 0x20 and 0x30.
iommu_devid_map will be:

iommu_devid_map = ["0x10@0x01,0x40000000”, "0x20@0x02,0x40000000”,"0x30@0x03,0x40000000”]

Here 0x40000000 is physical IOMMU base address.

> 2. Several devices share the stream ID (or several stream IDs)

Let take an example of two devices :

Device 1: 0x10
Device 2: 0x10

Iommu_devid_map = [“0x10@0x1,0x40000000”,"0x10@0x2,0x40000000”]

Xen will create the data structure that include vStreamID, pMasterID and IOMMU base address. 
With the help of three tuples we will be able to find the right physical IOMMU. 


Regards,
Rahul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-27 16:33         ` Julien Grall
@ 2022-10-27 17:18           ` Michal Orzel
  2022-10-28 12:54           ` Rahul Singh
  1 sibling, 0 replies; 33+ messages in thread
From: Michal Orzel @ 2022-10-27 17:18 UTC (permalink / raw)
  To: Julien Grall, Rahul Singh
  Cc: Xen developer discussion, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Rahul,

On 27/10/2022 18:33, Julien Grall wrote:
> 
> 
> On 27/10/2022 17:08, Rahul Singh wrote:
>> Hi Julien,
> 
> Hi Rahul,
> 
>>> On 26 Oct 2022, at 8:48 pm, Julien Grall <julien@xen.org> wrote:
>>>
>>>
>>>
>>> On 26/10/2022 15:33, Rahul Singh wrote:
>>>> Hi Julien,
>>>
>>> Hi Rahul,
>>>
>>>>> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xen.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 26/10/2022 14:17, Rahul Singh wrote:
>>>>>> Hi All,
>>>>>
>>>>> Hi Rahul,
>>>>>
>>>>>> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
>>>>>> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
>>>>>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
>>>>>> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
>>>>>> how to add the IOMMU binding for guest OS.
>>>>>> Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
>>>>>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
>>>>>> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
>>>>>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
>>>>>> intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
>>>>>> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
>>>>>> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
>>>>>> configuration is called nesting.
>>>>>> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
>>>>>> Stage 2 translation but there is no support for Stage 1 translation for guests. We will add support for guests to configure
>>>>>> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU hardware and exposes the virtual SMMU to the guest.
>>>>>> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
>>>>>> XEN will trap the access and configure the hardware accordingly.
>>>>>> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
>>>>>> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
>>>>>> For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
>>>>>> IOMMU node phandle.
>>>>> Below, you said that each IOMMUs may have a different ID space. So shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the user to specify the mapping?
>>>> Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This also helps in the ACPI case
>>>> where we don’t need to modify the tables to delete the pIOMMU entries and create one vIOMMU.
>>>> In this case, no need to replace the phandle as Xen create the vIOMMU with the same pIOMMU
>>>> phandle and same base address.
>>>> For domU guests one vIOMMU per guest will be created.
>>>
>>> IIRC, the SMMUv3 is using a ring like the GICv3 ITS. I think we need to be open here because this may end up to be tricky to security support it (we have N guest ring that can write to M host ring).
>>
>> If xl want to creates the one vIOMMU per pIOMMU for domU then xl needs to know the below information:
>>   -  Find the number of holes in guest memory same as the number of vIOMMU that needs the creation to create the vIOMMU DT nodes. (Think about a big system that has 50+ IOMMUs)
>>      Yes, we will create vIOMMU for only those devices that are assigned to guests but still we need to find the hole in guest memory.
> 
> I agree this is a problem with the one vIOMMU per pIOMMU.
> 
>>   -  Find the pIOMMU attached to the assigned device and create mapping b/w vIOMMU -> pIOMMU to register the MMIO handler.
>>      Either we need to modify the current hyerpcall or need to implement a new hypercall to find this information.
> 
> Adding hypercalls are is not a big problem.
> 
>>
>> Because of the above reason I thought of creating one vIOMMU for domU. Yes you are right this may end up to be tricky to security support
>> but as per my understanding one vIOMMU  per domU guest is easy to implement and simple to handle as compared to one vIOMMU per pIOMMU
> 
> I am not sure about this. My gut feeling is the code in Xen will end up
> to be tricky (there more that Xen doesn't support preemption). So I
> think we will trade-off complexity in Xen over simplicity in libxl.
> 
> That said, I haven't looked deeper in the code. So I may be wrong. I
> will need to see the code to confirm.
> 
>>>>>> For domU guests, when passthrough the device to the guest as per [2],  add the below property in the partial device tree
>>>>>> node that is required to describe the generic device tree binding for IOMMUs and their master(s)
>>>>>> "iommus = < &magic_phandle 0xvMasterID>
>>>>>>   • magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will be documented so that the user can set that in partial DT node (0xfdea).
>>>>>
>>>>> Does this mean only one IOMMU will be supported in the guest?
>>>> Yes.
>>>>>
>>>>>>   • vMasterID will be the virtual master ID that the user will provide.
>>>>>> The partial device tree will look like this:
>>>>>> /dts-v1/;
>>>>>>   / {
>>>>>>      /* #*cells are here to keep DTC happy */
>>>>>>      #address-cells = <2>;
>>>>>>      #size-cells = <2>;
>>>>>>        aliases {
>>>>>>          net = &mac0;
>>>>>>      };
>>>>>>        passthrough {
>>>>>>          compatible = "simple-bus";
>>>>>>          ranges;
>>>>>>          #address-cells = <2>;
>>>>>>          #size-cells = <2>;
>>>>>>          mac0: ethernet@10000000 {
>>>>>>              compatible = "calxeda,hb-xgmac";
>>>>>>              reg = <0 0x10000000 0 0x1000>;
>>>>>>              interrupts = <0 80 4  0 81 4  0 82 4>;
>>>>>>             iommus = <0xfdea 0x01>;
>>>>>>          };
>>>>>>      };
>>>>>> };
>>>>>>   In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device this
>>>>>> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
>>>>>> the same master ID but behind a different IOMMU.
>>>>>
>>>>> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen should already have all the information about the IOMMU and Master-ID. So it doesn't seem necessary for Device-Tree.
>>>>>
>>>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>>>>
>>>>> So can you add more context why this is necessary for everyone?
>>>> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
>>>
>>> I am confused. Below, you are making the virtual master ID optional. So shouldn't this be mandatory if you really need the mapping with the virtual ID?
>>
>> vMasterID is optional if user knows pMasterID is unique on the system. But if pMasterId is not unique then user needs to provide the vMasterID.
> 
> So the expectation is the user will be able to know that the pMasterID
> is uniq. This may be easy with a couple of SMMUs, but if you have 50+
> (as suggested above). This will become a pain on larger system.
> 
> IHMO, it would be much better if we can detect that in libxl (see below).
> 
>>
>>>
>>>> The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
>>>> vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
>>>> the corresponding pIOMMU. As I mention we need vMaster-ID in case a system could have 2 identical Master-ID but
>>>> each one connected to a different SMMU and assigned to the guest.
>>>
>>> I am afraid I still don't understand why this is a requirement. Libxl could have enough knowledge (which will be necessarry for the PCI case) to know the IOMMU and pMasterID associated with a device.
>>>
>>> So libxl could allocate the vMasterID, tell Xen the corresponding mapping and update the device-tree.
>>>
>>> IOW, it doesn't seem to be necessary to involve the user in the process here.
>>
>> Yes, libxl could allocate the vMasterID but there is no way we can find the link b/w vMasterID created to pMasterID from dtdev.
>>
>> What I understand from the code is that there is no link between the passthrough node and dtdev config option. The passthrough
>> node is directly copied to guest DT without any modification. Dtdev is used to add and assign the device to IOMMU.
>>
>> Let's take an example if the user wants to assign two devices to the guest via passthrough node.
>>
>> /dts-v1/;
>>
>> / {
>>     /* #*cells are here to keep DTC happy */
>>     #address-cells = <2>;
>>     #size-cells = <2>;
>>
>>     aliases {
>>         net = &mac0;
>>     };
>>
>>     passthrough {
>>         compatible = "simple-bus";
>>         ranges;
>>         #address-cells = <2>;
>>         #size-cells = <2>;
>>
>>         mac0: ethernet@10000000 {
>>             compatible = "calxeda,hb-xgmac";
>>             reg = <0 0x10000000 0 0x1000>;
>>             interrupts = <0 80 4  0 81 4  0 82 4>;
>>         };
>>
>>       mac1: ethernet@20000000 {
>>             compatible = “r8169";
>>             reg = <0 0x10000000 0 0x1000>;
>>             interrupts = <0 80 4  0 81 4  0 82 4>;
>>         };
>>
>>     };
>> };
>>
>> dtdev = [ "/soc/ethernet@10000000”, “/soc/ethernet@f2000000” ]
>>
>> There is no link which dtdev entry belongs to which node. Therefor there is no way to link the vMasterID created to pMasterID.
> 
> I agree there is no link today. But we could add a property in the
> partial device-tree to mention which physical device is associated.
+1

And we already have this property in partial device trees for dom0less domUs:
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/arm/passthrough.txt;h=219d1cca571b01bc8f0afbbe64435299547fed75;hb=HEAD#l104

FWIK, the solution proposed in this thread was chosen due to the fact that at the moment we do not parse the partial device tree in libxl.
But if this is a way to go (to reduce the complexity in Xen), then it will allow us to drop the need for both specifying vMasterID and iommu_devid_map.

> 
> With that, I think all, the complexity is moved to libxl and it will be
> easier for the user to use vIOMMU.
> 
> [...]
> 
>>>>>>   iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” , “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
>>>>>>   • PMASTER_ID is the physical master ID of the device from the physical DT.
>>>>>>   • VMASTER_ID is the virtual master Id that the user will configure in the partial device tree.
>>>>>>   • IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.
>>>>>
>>>>> Below you give an example for Platform device. How would that fit in the context of PCI passthrough?
>>>> In PCI passthrough case, xl will create the "iommu-map" property in vpci host bridge node with phandle to vIOMMU node.
>>>> vSMMUv3 node will be created in xl.
>>>
>>> This means that libxl will need to know the associated pMasterID to a PCI device. So, I don't understand why you can't do the same for platform devices.
>>
>> For the PCI passthrough case, we don’t need to provide the MasterID to create "iommu-map” property as for
>> PCI device MasterID is RID ( BDF ). For non-PCI devices, MasterID is required to create “iommus” property.
> 
> Are you talking about the physical MasterID or virtual one? If physical
> MasterID then I don't think this is always the RID (see [1]). But for
> the virtual Master ID we could make this association.
> 
> This still means that in some way the toolstack need to let Xen know (or
> the other way around) the mapping between the pMasterID and vMasterID.
> 
> [1] Documentation/devicetree/bindings/pci/pci-iommu.txt.
> 
> Cheers,
> 
> --
> Julien Grall
> 

~Michal


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-27 16:33         ` Julien Grall
  2022-10-27 17:18           ` Michal Orzel
@ 2022-10-28 12:54           ` Rahul Singh
  2022-10-28 13:06             ` Julien Grall
  1 sibling, 1 reply; 33+ messages in thread
From: Rahul Singh @ 2022-10-28 12:54 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen developer discussion, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Julien,

> On 27 Oct 2022, at 5:33 pm, Julien Grall <julien@xen.org> wrote:
> 
> On 27/10/2022 17:08, Rahul Singh wrote:
>> Hi Julien,
> 
> Hi Rahul,
> 
>>> On 26 Oct 2022, at 8:48 pm, Julien Grall <julien@xen.org> wrote:
>>> 
>>> 
>>> 
>>> On 26/10/2022 15:33, Rahul Singh wrote:
>>>> Hi Julien,
>>> 
>>> Hi Rahul,
>>> 
>>>>> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xen.org> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 26/10/2022 14:17, Rahul Singh wrote:
>>>>>> Hi All,
>>>>> 
>>>>> Hi Rahul,
>>>>> 
>>>>>> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
>>>>>> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
>>>>>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
>>>>>> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
>>>>>> how to add the IOMMU binding for guest OS.
>>>>>> Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
>>>>>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
>>>>>> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
>>>>>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
>>>>>> intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
>>>>>> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
>>>>>> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
>>>>>> configuration is called nesting.
>>>>>> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
>>>>>> Stage 2 translation but there is no support for Stage 1 translation for guests. We will add support for guests to configure
>>>>>> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU hardware and exposes the virtual SMMU to the guest.
>>>>>> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
>>>>>> XEN will trap the access and configure the hardware accordingly.
>>>>>> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
>>>>>> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
>>>>>> For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
>>>>>> IOMMU node phandle.
>>>>> Below, you said that each IOMMUs may have a different ID space. So shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the user to specify the mapping?
>>>> Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This also helps in the ACPI case
>>>> where we don’t need to modify the tables to delete the pIOMMU entries and create one vIOMMU.
>>>> In this case, no need to replace the phandle as Xen create the vIOMMU with the same pIOMMU
>>>> phandle and same base address.
>>>> For domU guests one vIOMMU per guest will be created.
>>> 
>>> IIRC, the SMMUv3 is using a ring like the GICv3 ITS. I think we need to be open here because this may end up to be tricky to security support it (we have N guest ring that can write to M host ring).
>> If xl want to creates the one vIOMMU per pIOMMU for domU then xl needs to know the below information:
>>  -  Find the number of holes in guest memory same as the number of vIOMMU that needs the creation to create the vIOMMU DT nodes. (Think about a big system that has 50+ IOMMUs)
>>     Yes, we will create vIOMMU for only those devices that are assigned to guests but still we need to find the hole in guest memory.
> 
> I agree this is a problem with the one vIOMMU per pIOMMU.
> 
>>  -  Find the pIOMMU attached to the assigned device and create mapping b/w vIOMMU -> pIOMMU to register the MMIO handler.
>>     Either we need to modify the current hyerpcall or need to implement a new hypercall to find this information.
> 
> Adding hypercalls are is not a big problem.
> 
>> Because of the above reason I thought of creating one vIOMMU for domU. Yes you are right this may end up to be tricky to security support
>> but as per my understanding one vIOMMU  per domU guest is easy to implement and simple to handle as compared to one vIOMMU per pIOMMU
> 
> I am not sure about this. My gut feeling is the code in Xen will end up to be tricky (there more that Xen doesn't support preemption). So I think we will trade-off complexity in Xen over simplicity in libxl.
> 
> That said, I haven't looked deeper in the code. So I may be wrong. I will need to see the code to confirm.


I have implemented the code based on one vIOMMU per domU guest and will share the code for review.
We can make a decision at the time of review about which approach is better.

> 
>>>>>> For domU guests, when passthrough the device to the guest as per [2],  add the below property in the partial device tree
>>>>>> node that is required to describe the generic device tree binding for IOMMUs and their master(s)
>>>>>> "iommus = < &magic_phandle 0xvMasterID>
>>>>>> 	• magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will be documented so that the user can set that in partial DT node (0xfdea).
>>>>> 
>>>>> Does this mean only one IOMMU will be supported in the guest?
>>>> Yes.
>>>>> 
>>>>>> 	• vMasterID will be the virtual master ID that the user will provide.
>>>>>> The partial device tree will look like this:
>>>>>> /dts-v1/;
>>>>>>  / {
>>>>>>     /* #*cells are here to keep DTC happy */
>>>>>>     #address-cells = <2>;
>>>>>>     #size-cells = <2>;
>>>>>>       aliases {
>>>>>>         net = &mac0;
>>>>>>     };
>>>>>>       passthrough {
>>>>>>         compatible = "simple-bus";
>>>>>>         ranges;
>>>>>>         #address-cells = <2>;
>>>>>>         #size-cells = <2>;
>>>>>>         mac0: ethernet@10000000 {
>>>>>>             compatible = "calxeda,hb-xgmac";
>>>>>>             reg = <0 0x10000000 0 0x1000>;
>>>>>>             interrupts = <0 80 4  0 81 4  0 82 4>;
>>>>>>            iommus = <0xfdea 0x01>;
>>>>>>         };
>>>>>>     };
>>>>>> };
>>>>>>  In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device this
>>>>>> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
>>>>>> the same master ID but behind a different IOMMU.
>>>>> 
>>>>> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen should already have all the information about the IOMMU and Master-ID. So it doesn't seem necessary for Device-Tree.
>>>>> 
>>>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>>>> 
>>>>> So can you add more context why this is necessary for everyone?
>>>> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
>>> 
>>> I am confused. Below, you are making the virtual master ID optional. So shouldn't this be mandatory if you really need the mapping with the virtual ID?
>> vMasterID is optional if user knows pMasterID is unique on the system. But if pMasterId is not unique then user needs to provide the vMasterID.
> 
> So the expectation is the user will be able to know that the pMasterID is uniq. This may be easy with a couple of SMMUs, but if you have 50+ (as suggested above). This will become a pain on larger system.
> 
> IHMO, it would be much better if we can detect that in libxl (see below).

We can make the vMasterID compulsory to avoid complexity in libxl to solve this.

> 
>>> 
>>>> The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
>>>> vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
>>>> the corresponding pIOMMU. As I mention we need vMaster-ID in case a system could have 2 identical Master-ID but
>>>> each one connected to a different SMMU and assigned to the guest.
>>> 
>>> I am afraid I still don't understand why this is a requirement. Libxl could have enough knowledge (which will be necessarry for the PCI case) to know the IOMMU and pMasterID associated with a device.
>>> 
>>> So libxl could allocate the vMasterID, tell Xen the corresponding mapping and update the device-tree.
>>> 
>>> IOW, it doesn't seem to be necessary to involve the user in the process here.
>> Yes, libxl could allocate the vMasterID but there is no way we can find the link b/w vMasterID created to pMasterID from dtdev.
>> What I understand from the code is that there is no link between the passthrough node and dtdev config option. The passthrough
>> node is directly copied to guest DT without any modification. Dtdev is used to add and assign the device to IOMMU.
>> Let's take an example if the user wants to assign two devices to the guest via passthrough node.
>> /dts-v1/;
>> / {
>>    /* #*cells are here to keep DTC happy */
>>    #address-cells = <2>;
>>    #size-cells = <2>;
>>    aliases {
>>        net = &mac0;
>>    };
>>    passthrough {
>>        compatible = "simple-bus";
>>        ranges;
>>        #address-cells = <2>;
>>        #size-cells = <2>;
>>        mac0: ethernet@10000000 {
>>            compatible = "calxeda,hb-xgmac";
>>            reg = <0 0x10000000 0 0x1000>;
>>            interrupts = <0 80 4  0 81 4  0 82 4>;
>>        };
>>      mac1: ethernet@20000000 {
>>            compatible = “r8169";
>>            reg = <0 0x10000000 0 0x1000>;
>>            interrupts = <0 80 4  0 81 4  0 82 4>;
>>        };
>>    };
>> };
>> dtdev = [ "/soc/ethernet@10000000”, “/soc/ethernet@f2000000” ]
>> There is no link which dtdev entry belongs to which node. Therefor there is no way to link the vMasterID created to pMasterID.
> 
> I agree there is no link today. But we could add a property in the partial device-tree to mention which physical device is associated.
> 
> With that, I think all, the complexity is moved to libxl and it will be easier for the user to use vIOMMU.
> 
> [...]

As of now libxl directly coping the partial DT to guest DT without any modification. If we have to go to this route libxl has to modify
the partial DT in libxl to include “iommus” or "iommu-map”. Is that okay to modify the partial DT in libxl ?

> 
>>>>>>  iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” , “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
>>>>>> 	• PMASTER_ID is the physical master ID of the device from the physical DT.
>>>>>> 	• VMASTER_ID is the virtual master Id that the user will configure in the partial device tree.
>>>>>> 	• IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.
>>>>> 
>>>>> Below you give an example for Platform device. How would that fit in the context of PCI passthrough?
>>>> In PCI passthrough case, xl will create the "iommu-map" property in vpci host bridge node with phandle to vIOMMU node.
>>>> vSMMUv3 node will be created in xl.
>>> 
>>> This means that libxl will need to know the associated pMasterID to a PCI device. So, I don't understand why you can't do the same for platform devices.
>> For the PCI passthrough case, we don’t need to provide the MasterID to create "iommu-map” property as for
>> PCI device MasterID is RID ( BDF ). For non-PCI devices, MasterID is required to create “iommus” property.
> 
> Are you talking about the physical MasterID or virtual one? If physical MasterID then I don't think this is always the RID (see [1]). But for the virtual Master ID we could make this association.
> 
> This still means that in some way the toolstack need to let Xen know (or the other way around) the mapping between the pMasterID and vMasterID.

Yes, I agree if RID is not the BDF then we need to let Xen know the mapping between the pMasterID and vMasterID. 

Regards,
Rahul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-28 12:54           ` Rahul Singh
@ 2022-10-28 13:06             ` Julien Grall
  2022-10-28 13:13               ` Bertrand Marquis
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2022-10-28 13:06 UTC (permalink / raw)
  To: Rahul Singh
  Cc: Xen developer discussion, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Rahul,

On 28/10/2022 13:54, Rahul Singh wrote:
>>>>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>>>>>
>>>>>> So can you add more context why this is necessary for everyone?
>>>>> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
>>>>
>>>> I am confused. Below, you are making the virtual master ID optional. So shouldn't this be mandatory if you really need the mapping with the virtual ID?
>>> vMasterID is optional if user knows pMasterID is unique on the system. But if pMasterId is not unique then user needs to provide the vMasterID.
>>
>> So the expectation is the user will be able to know that the pMasterID is uniq. This may be easy with a couple of SMMUs, but if you have 50+ (as suggested above). This will become a pain on larger system.
>>
>> IHMO, it would be much better if we can detect that in libxl (see below).
> 
> We can make the vMasterID compulsory to avoid complexity in libxl to solve this

In general, complexity in libxl is not too much of problem.

> 
>>
>>>>
>>>>> The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
>>>>> vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
>>>>> the corresponding pIOMMU. As I mention we need vMaster-ID in case a system could have 2 identical Master-ID but
>>>>> each one connected to a different SMMU and assigned to the guest.
>>>>
>>>> I am afraid I still don't understand why this is a requirement. Libxl could have enough knowledge (which will be necessarry for the PCI case) to know the IOMMU and pMasterID associated with a device.
>>>>
>>>> So libxl could allocate the vMasterID, tell Xen the corresponding mapping and update the device-tree.
>>>>
>>>> IOW, it doesn't seem to be necessary to involve the user in the process here.
>>> Yes, libxl could allocate the vMasterID but there is no way we can find the link b/w vMasterID created to pMasterID from dtdev.
>>> What I understand from the code is that there is no link between the passthrough node and dtdev config option. The passthrough
>>> node is directly copied to guest DT without any modification. Dtdev is used to add and assign the device to IOMMU.
>>> Let's take an example if the user wants to assign two devices to the guest via passthrough node.
>>> /dts-v1/;
>>> / {
>>>     /* #*cells are here to keep DTC happy */
>>>     #address-cells = <2>;
>>>     #size-cells = <2>;
>>>     aliases {
>>>         net = &mac0;
>>>     };
>>>     passthrough {
>>>         compatible = "simple-bus";
>>>         ranges;
>>>         #address-cells = <2>;
>>>         #size-cells = <2>;
>>>         mac0: ethernet@10000000 {
>>>             compatible = "calxeda,hb-xgmac";
>>>             reg = <0 0x10000000 0 0x1000>;
>>>             interrupts = <0 80 4  0 81 4  0 82 4>;
>>>         };
>>>       mac1: ethernet@20000000 {
>>>             compatible = “r8169";
>>>             reg = <0 0x10000000 0 0x1000>;
>>>             interrupts = <0 80 4  0 81 4  0 82 4>;
>>>         };
>>>     };
>>> };
>>> dtdev = [ "/soc/ethernet@10000000”, “/soc/ethernet@f2000000” ]
>>> There is no link which dtdev entry belongs to which node. Therefor there is no way to link the vMasterID created to pMasterID.
>>
>> I agree there is no link today. But we could add a property in the partial device-tree to mention which physical device is associated.
>>
>> With that, I think all, the complexity is moved to libxl and it will be easier for the user to use vIOMMU.
>>
>> [...]
> 
> As of now libxl directly coping the partial DT to guest DT without any modification. If we have to go to this route libxl has to modify
> the partial DT in libxl to include “iommus” or "iommu-map”. Is that okay to modify the partial DT in libxl 

I am not aware of any issue to modify the partial device-tree. In fact, 
I am strongly in favor of libxl to modify it if it greatly improve the 
user experience.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-28 13:06             ` Julien Grall
@ 2022-10-28 13:13               ` Bertrand Marquis
  2022-10-28 13:27                 ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Bertrand Marquis @ 2022-10-28 13:13 UTC (permalink / raw)
  To: Julien Grall
  Cc: Rahul Singh, Xen developer discussion, Stefano Stabellini,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Julien,

> On 28 Oct 2022, at 14:06, Julien Grall <julien@xen.org> wrote:
> 
> Hi Rahul,
> 
> On 28/10/2022 13:54, Rahul Singh wrote:
>>>>>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>>>>>> 
>>>>>>> So can you add more context why this is necessary for everyone?
>>>>>> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
>>>>> 
>>>>> I am confused. Below, you are making the virtual master ID optional. So shouldn't this be mandatory if you really need the mapping with the virtual ID?
>>>> vMasterID is optional if user knows pMasterID is unique on the system. But if pMasterId is not unique then user needs to provide the vMasterID.
>>> 
>>> So the expectation is the user will be able to know that the pMasterID is uniq. This may be easy with a couple of SMMUs, but if you have 50+ (as suggested above). This will become a pain on larger system.
>>> 
>>> IHMO, it would be much better if we can detect that in libxl (see below).
>> We can make the vMasterID compulsory to avoid complexity in libxl to solve this
> 
> In general, complexity in libxl is not too much of problem.

I am a bit unsure about this strategy.
Currently xl has one configuration file where you put all Xen parameters. The device tree is only needed by some guests to have a description of the system they run on.
If we change the model and say that Xen configuration parameters are both in the configuration and the device tree, we somehow enforce to have a device tree even though some guests do not need it at all (for example Zephyr).
I think we need to discuss that and make sure we stay coherent because right now the user will have to do things on the configuration and one thing in the device tree.

Cheers
Bertrand

> 
>>> 
>>>>> 
>>>>>> The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
>>>>>> vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
>>>>>> the corresponding pIOMMU. As I mention we need vMaster-ID in case a system could have 2 identical Master-ID but
>>>>>> each one connected to a different SMMU and assigned to the guest.
>>>>> 
>>>>> I am afraid I still don't understand why this is a requirement. Libxl could have enough knowledge (which will be necessarry for the PCI case) to know the IOMMU and pMasterID associated with a device.
>>>>> 
>>>>> So libxl could allocate the vMasterID, tell Xen the corresponding mapping and update the device-tree.
>>>>> 
>>>>> IOW, it doesn't seem to be necessary to involve the user in the process here.
>>>> Yes, libxl could allocate the vMasterID but there is no way we can find the link b/w vMasterID created to pMasterID from dtdev.
>>>> What I understand from the code is that there is no link between the passthrough node and dtdev config option. The passthrough
>>>> node is directly copied to guest DT without any modification. Dtdev is used to add and assign the device to IOMMU.
>>>> Let's take an example if the user wants to assign two devices to the guest via passthrough node.
>>>> /dts-v1/;
>>>> / {
>>>>    /* #*cells are here to keep DTC happy */
>>>>    #address-cells = <2>;
>>>>    #size-cells = <2>;
>>>>    aliases {
>>>>        net = &mac0;
>>>>    };
>>>>    passthrough {
>>>>        compatible = "simple-bus";
>>>>        ranges;
>>>>        #address-cells = <2>;
>>>>        #size-cells = <2>;
>>>>        mac0: ethernet@10000000 {
>>>>            compatible = "calxeda,hb-xgmac";
>>>>            reg = <0 0x10000000 0 0x1000>;
>>>>            interrupts = <0 80 4  0 81 4  0 82 4>;
>>>>        };
>>>>      mac1: ethernet@20000000 {
>>>>            compatible = “r8169";
>>>>            reg = <0 0x10000000 0 0x1000>;
>>>>            interrupts = <0 80 4  0 81 4  0 82 4>;
>>>>        };
>>>>    };
>>>> };
>>>> dtdev = [ "/soc/ethernet@10000000”, “/soc/ethernet@f2000000” ]
>>>> There is no link which dtdev entry belongs to which node. Therefor there is no way to link the vMasterID created to pMasterID.
>>> 
>>> I agree there is no link today. But we could add a property in the partial device-tree to mention which physical device is associated.
>>> 
>>> With that, I think all, the complexity is moved to libxl and it will be easier for the user to use vIOMMU.
>>> 
>>> [...]
>> As of now libxl directly coping the partial DT to guest DT without any modification. If we have to go to this route libxl has to modify
>> the partial DT in libxl to include “iommus” or "iommu-map”. Is that okay to modify the partial DT in libxl 
> 
> I am not aware of any issue to modify the partial device-tree. In fact, I am strongly in favor of libxl to modify it if it greatly improve the user experience.
> 
> Cheers,
> 
> -- 
> Julien Grall
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-28 13:13               ` Bertrand Marquis
@ 2022-10-28 13:27                 ` Julien Grall
  2022-10-28 14:37                   ` Bertrand Marquis
  2022-10-30 14:23                   ` Stefano Stabellini
  0 siblings, 2 replies; 33+ messages in thread
From: Julien Grall @ 2022-10-28 13:27 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Xen developer discussion, Stefano Stabellini,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross



On 28/10/2022 14:13, Bertrand Marquis wrote:
> Hi Julien,

Hi Bertrand,

>> On 28 Oct 2022, at 14:06, Julien Grall <julien@xen.org> wrote:
>>
>> Hi Rahul,
>>
>> On 28/10/2022 13:54, Rahul Singh wrote:
>>>>>>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>>>>>>>
>>>>>>>> So can you add more context why this is necessary for everyone?
>>>>>>> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
>>>>>>
>>>>>> I am confused. Below, you are making the virtual master ID optional. So shouldn't this be mandatory if you really need the mapping with the virtual ID?
>>>>> vMasterID is optional if user knows pMasterID is unique on the system. But if pMasterId is not unique then user needs to provide the vMasterID.
>>>>
>>>> So the expectation is the user will be able to know that the pMasterID is uniq. This may be easy with a couple of SMMUs, but if you have 50+ (as suggested above). This will become a pain on larger system.
>>>>
>>>> IHMO, it would be much better if we can detect that in libxl (see below).
>>> We can make the vMasterID compulsory to avoid complexity in libxl to solve this
>>
>> In general, complexity in libxl is not too much of problem.
> 
> I am a bit unsure about this strategy.
> Currently xl has one configuration file where you put all Xen parameters. The device tree is only needed by some guests to have a description of the system they run on.
> If we change the model and say that Xen configuration parameters are both in the configuration and the device tree, we somehow enforce to have a device tree even though some guests do not need it at all (for example Zephyr).

I think my approach was misunderstood because there is no change in the 
existing model.

What I am suggesting is to not introduce iommu_devid_map but instead let 
libxl allocate the virtual Master-ID and create the mapping with the 
physical Master-ID.

Libxl would then update the property "iommus" in the device-tree with 
the allocated virtual Master-ID.

Each node in the partial device-tree would need to have a property
to refer to the physical device just so we know how to update the 
"iommus". The list of device passthrough will still be specified in the 
configuration file. IOW, the partial device-tree is not directly 
involved in the configuration of the guest.

So far, I don't see a particular issue with this approach because the 
vMaster ID algorithm allocation should be generic. But please let me 
know if you think there are bits I am missing.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-28 13:27                 ` Julien Grall
@ 2022-10-28 14:37                   ` Bertrand Marquis
  2022-10-28 15:01                     ` Julien Grall
  2022-10-30 14:23                   ` Stefano Stabellini
  1 sibling, 1 reply; 33+ messages in thread
From: Bertrand Marquis @ 2022-10-28 14:37 UTC (permalink / raw)
  To: Julien Grall
  Cc: Rahul Singh, Xen developer discussion, Stefano Stabellini,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Julien,

> On 28 Oct 2022, at 14:27, Julien Grall <julien@xen.org> wrote:
> 
> 
> 
> On 28/10/2022 14:13, Bertrand Marquis wrote:
>> Hi Julien,
> 
> Hi Bertrand,
> 
>>> On 28 Oct 2022, at 14:06, Julien Grall <julien@xen.org> wrote:
>>> 
>>> Hi Rahul,
>>> 
>>> On 28/10/2022 13:54, Rahul Singh wrote:
>>>>>>>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>>>>>>>> 
>>>>>>>>> So can you add more context why this is necessary for everyone?
>>>>>>>> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
>>>>>>> 
>>>>>>> I am confused. Below, you are making the virtual master ID optional. So shouldn't this be mandatory if you really need the mapping with the virtual ID?
>>>>>> vMasterID is optional if user knows pMasterID is unique on the system. But if pMasterId is not unique then user needs to provide the vMasterID.
>>>>> 
>>>>> So the expectation is the user will be able to know that the pMasterID is uniq. This may be easy with a couple of SMMUs, but if you have 50+ (as suggested above). This will become a pain on larger system.
>>>>> 
>>>>> IHMO, it would be much better if we can detect that in libxl (see below).
>>>> We can make the vMasterID compulsory to avoid complexity in libxl to solve this
>>> 
>>> In general, complexity in libxl is not too much of problem.
>> I am a bit unsure about this strategy.
>> Currently xl has one configuration file where you put all Xen parameters. The device tree is only needed by some guests to have a description of the system they run on.
>> If we change the model and say that Xen configuration parameters are both in the configuration and the device tree, we somehow enforce to have a device tree even though some guests do not need it at all (for example Zephyr).
> 
> I think my approach was misunderstood because there is no change in the existing model.
> 
> What I am suggesting is to not introduce iommu_devid_map but instead let libxl allocate the virtual Master-ID and create the mapping with the physical Master-ID.
> 
> Libxl would then update the property "iommus" in the device-tree with the allocated virtual Master-ID.

Ok I understand now.

> 
> Each node in the partial device-tree would need to have a property
> to refer to the physical device just so we know how to update the "iommus". The list of device passthrough will still be specified in the configuration file. IOW, the partial device-tree is not directly involved in the configuration of the guest.

But we will generate it. How would something like Zephyr guest work ? Zephyr is not using the device tree we pass, it has an embedded one.

> 
> So far, I don't see a particular issue with this approach because the vMaster ID algorithm allocation should be generic. But please let me know if you think there are bits I am missing.

I am a bit afraid of things that are “automatic”.

For everything else we let the user in control (IPA for mapping, virtual interrupt number) and in this case we switch to a model where we automatically generated a vMaster ID.
With this model, guest not using the device tree will have to guess the vMaster ID or somehow know how the tools are generating it to use the right one.

Cheers
Bertrand


> 
> Cheers,
> 
> -- 
> Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-28 14:37                   ` Bertrand Marquis
@ 2022-10-28 15:01                     ` Julien Grall
  2022-10-28 15:45                       ` Bertrand Marquis
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2022-10-28 15:01 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Rahul Singh, Xen developer discussion, Stefano Stabellini,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross



On 28/10/2022 15:37, Bertrand Marquis wrote:
> Hi Julien,

Hi Bertrand,

>> On 28 Oct 2022, at 14:27, Julien Grall <julien@xen.org> wrote:
>>
>>
>>
>> On 28/10/2022 14:13, Bertrand Marquis wrote:
>>> Hi Julien,
>>
>> Hi Bertrand,
>>
>>>> On 28 Oct 2022, at 14:06, Julien Grall <julien@xen.org> wrote:
>>>>
>>>> Hi Rahul,
>>>>
>>>> On 28/10/2022 13:54, Rahul Singh wrote:
>>>>>>>>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>>>>>>>>>
>>>>>>>>>> So can you add more context why this is necessary for everyone?
>>>>>>>>> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
>>>>>>>>
>>>>>>>> I am confused. Below, you are making the virtual master ID optional. So shouldn't this be mandatory if you really need the mapping with the virtual ID?
>>>>>>> vMasterID is optional if user knows pMasterID is unique on the system. But if pMasterId is not unique then user needs to provide the vMasterID.
>>>>>>
>>>>>> So the expectation is the user will be able to know that the pMasterID is uniq. This may be easy with a couple of SMMUs, but if you have 50+ (as suggested above). This will become a pain on larger system.
>>>>>>
>>>>>> IHMO, it would be much better if we can detect that in libxl (see below).
>>>>> We can make the vMasterID compulsory to avoid complexity in libxl to solve this
>>>>
>>>> In general, complexity in libxl is not too much of problem.
>>> I am a bit unsure about this strategy.
>>> Currently xl has one configuration file where you put all Xen parameters. The device tree is only needed by some guests to have a description of the system they run on.
>>> If we change the model and say that Xen configuration parameters are both in the configuration and the device tree, we somehow enforce to have a device tree even though some guests do not need it at all (for example Zephyr).
>>
>> I think my approach was misunderstood because there is no change in the existing model.
>>
>> What I am suggesting is to not introduce iommu_devid_map but instead let libxl allocate the virtual Master-ID and create the mapping with the physical Master-ID.
>>
>> Libxl would then update the property "iommus" in the device-tree with the allocated virtual Master-ID.
> 
> Ok I understand now.
> 
>>
>> Each node in the partial device-tree would need to have a property
>> to refer to the physical device just so we know how to update the "iommus". The list of device passthrough will still be specified in the configuration file. IOW, the partial device-tree is not directly involved in the configuration of the guest.
> 
> But we will generate it. How would something like Zephyr guest work ? Zephyr is not using the device tree we pass, it has an embedded one.

In general, guest that don't use the device-tree/ACPI table to detect 
the layout are already in a bad situation because we don't guarantee 
that the layout (memory, interrupt...) will be stable across Xen 
version. Although, there are a implicit agreement that the layout will 
not change for minor release (i.e. 4.14.x).

But see below for some suggestions how this could be handled.

> 
>>
>> So far, I don't see a particular issue with this approach because the vMaster ID algorithm allocation should be generic. But please let me know if you think there are bits I am missing.
> 
> I am a bit afraid of things that are “automatic”.
> 
> For everything else we let the user in control (IPA for mapping, virtual interrupt number) and in this case we switch to a model where we automatically generated a vMaster ID.

We only let the user control where the device is mapped. But this is 
quite fragile... I think this should be generated at runtime.

> With this model, guest not using the device tree will have to guess the vMaster ID or somehow know how the tools are generating it to use the right one.

To be honest, this is already the case today because the layout exposed 
to the guest is technically not fixed. Yes, so far, we haven't changed 
it too much. But sooner or later, this is going to bite because we made 
clear that the layout is not stable.

Now, if those projects are willing to rebuild for each version, then we 
could use the following approach:
   1) Write the xl.cfg
   2) Ask libxl to generate the device-tree
   3) Build Zephyr
   4) Create the domain

The expectation is for a given Xen version (and compatible), libxl will 
always generate the same Device-Tree.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-27 16:49         ` Rahul Singh
@ 2022-10-28 15:26           ` Oleksandr Tyshchenko
  0 siblings, 0 replies; 33+ messages in thread
From: Oleksandr Tyshchenko @ 2022-10-28 15:26 UTC (permalink / raw)
  To: Rahul Singh
  Cc: Michal Orzel, Julien Grall, Xen developer discussion,
	Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Oleksandr Tyshchenko, Oleksandr Andrushchenko, Volodymyr Babchuk,
	Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

[-- Attachment #1: Type: text/plain, Size: 11255 bytes --]

On Thu, Oct 27, 2022 at 7:49 PM Rahul Singh <Rahul.Singh@arm.com> wrote:

> Hi Oleksandr,
>

Hello Rahul

[sorry for the possible format issues]


>
> > On 26 Oct 2022, at 7:23 pm, Oleksandr Tyshchenko <olekstysh@gmail.com>
> wrote:
> >
> >
> >
> > On Wed, Oct 26, 2022 at 8:18 PM Michal Orzel <michal.orzel@amd.com>
> wrote:
> > Hi Rahul,
> >
> >
> > Hello all
> >
> > [sorry for the possible format issues]
> >
> >
> > On 26/10/2022 16:33, Rahul Singh wrote:
> > >
> > >
> > > Hi Julien,
> > >
> > >> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xen.org> wrote:
> > >>
> > >>
> > >>
> > >> On 26/10/2022 14:17, Rahul Singh wrote:
> > >>> Hi All,
> > >>
> > >> Hi Rahul,
> > >>
> > >>> At Arm, we started to implement the POC to support 2 levels of page
> tables/nested translation in SMMUv3.
> > >>> To support nested translation for guest OS Xen needs to expose the
> virtual IOMMU. If we passthrough the
> > >>> device to the guest that is behind an IOMMU and virtual IOMMU is
> enabled for the guest there is a need to
> > >>> add IOMMU binding for the device in the passthrough node as per [1].
> This email is to get an agreement on
> > >>> how to add the IOMMU binding for guest OS.
> > >>> Before I will explain how to add the IOMMU binding let me give a
> brief overview of how we will add support for virtual
> > >>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3
> Nested translation support. SMMUv3 hardware
> > >>> supports two stages of translation. Each stage of translation can be
> independently enabled. An incoming address is logically
> > >>> translated from VA to IPA in stage 1, then the IPA is input to stage
> 2 which translates the IPA to the output PA. Stage 1 is
> > >>> intended to be used by a software entity( Guest OS) to provide
> isolation or translation to buffers within the entity, for example,
> > >>> DMA isolation within an OS. Stage 2 is intended to be available in
> systems supporting the Virtualization Extensions and is
> > >>> intended to virtualize device DMA to guest VM address spaces. When
> both stage 1 and stage 2 are enabled, the translation
> > >>> configuration is called nesting.
> > >>> Stage 1 translation support is required to provide isolation between
> different devices within the guest OS. XEN already supports
> > >>> Stage 2 translation but there is no support for Stage 1 translation
> for guests. We will add support for guests to configure
> > >>> the Stage 1 transition via virtual IOMMU. XEN will emulate the SMMU
> hardware and exposes the virtual SMMU to the guest.
> > >>> Guest can use the native SMMU driver to configure the stage 1
> translation. When the guest configures the SMMU for Stage 1,
> > >>> XEN will trap the access and configure the hardware accordingly.
> > >>> Now back to the question of how we can add the IOMMU binding between
> the virtual IOMMU and the master devices so that
> > >>> guests can configure the IOMMU correctly. The solution that I am
> suggesting is as below:
> > >>> For dom0, while handling the DT node(handle_node()) Xen will replace
> the phandle in the "iommus" property with the virtual
> > >>> IOMMU node phandle.
> > >> Below, you said that each IOMMUs may have a different ID space. So
> shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the
> user to specify the mapping?
> > >
> > > Yes you are right we need to create one vIOMMU per pIOMMU for dom0.
> This also helps in the ACPI case
> > > where we don’t need to modify the tables to delete the pIOMMU entries
> and create one vIOMMU.
> > > In this case, no need to replace the phandle as Xen create the vIOMMU
> with the same pIOMMU
> > > phandle and same base address.
> > >
> > > For domU guests one vIOMMU per guest will be created.
> > >
> > >>
> > >>> For domU guests, when passthrough the device to the guest as per
> [2],  add the below property in the partial device tree
> > >>> node that is required to describe the generic device tree binding
> for IOMMUs and their master(s)
> > >>> "iommus = < &magic_phandle 0xvMasterID>
> > >>>      • magic_phandle will be the phandle ( vIOMMU phandle in xl)
> that will be documented so that the user can set that in partial DT node
> (0xfdea).
> > >>
> > >> Does this mean only one IOMMU will be supported in the guest?
> > >
> > > Yes.
> > >
> > >>
> > >>>      • vMasterID will be the virtual master ID that the user will
> provide.
> > >>> The partial device tree will look like this:
> > >>> /dts-v1/;
> > >>>  / {
> > >>>     /* #*cells are here to keep DTC happy */
> > >>>     #address-cells = <2>;
> > >>>     #size-cells = <2>;
> > >>>       aliases {
> > >>>         net = &mac0;
> > >>>     };
> > >>>       passthrough {
> > >>>         compatible = "simple-bus";
> > >>>         ranges;
> > >>>         #address-cells = <2>;
> > >>>         #size-cells = <2>;
> > >>>         mac0: ethernet@10000000 {
> > >>>             compatible = "calxeda,hb-xgmac";
> > >>>             reg = <0 0x10000000 0 0x1000>;
> > >>>             interrupts = <0 80 4  0 81 4  0 82 4>;
> > >>>            iommus = <0xfdea 0x01>;
> > >>>         };
> > >>>     };
> > >>> };
> > >>>  In xl.cfg we need to define a new option to inform Xen about
> vMasterId to pMasterId mapping and to which IOMMU device this
> > >>> the master device is connected so that Xen can configure the right
> IOMMU. This is required if the system has devices that have
> > >>> the same master ID but behind a different IOMMU.
> > >>
> > >> In xl.cfg, we already pass the device-tree node path to passthrough.
> So Xen should already have all the information about the IOMMU and
> Master-ID. So it doesn't seem necessary for Device-Tree.
> > >>
> > >> For ACPI, I would have expected the information to be found in the
> IOREQ.
> > >>
> > >> So can you add more context why this is necessary for everyone?
> > >
> > > We have information for IOMMU and Master-ID but we don’t have
> information for linking vMaster-ID to pMaster-ID.
> > > The device tree node will be used to assign the device to the guest
> and configure the Stage-2 translation. Guest will use the
> > > vMaster-ID to configure the vIOMMU during boot. Xen needs information
> to link vMaster-ID to pMaster-ID to configure
> > > the corresponding pIOMMU. As I mention we need vMaster-ID in case a
> system could have 2 identical Master-ID but
> > > each one connected to a different SMMU and assigned to the guest.
> >
> > I think the proposed solution would work and I would just like to clear
> some issues.
> >
> > Please correct me if I'm wrong:
> >
> > In the xl config file we already need to specify dtdev to point to the
> device path in host dtb.
> > In the partial device tree we specify the vMasterId as well as magic
> phandle.
> > Isn't it that we already have all the information necessary without the
> need for iommu_devid_map?
> > For me it looks like the partial dtb provides vMasterID and dtdev
> provides pMasterID as well as physical phandle to SMMU.
> >
> > Having said that, I can also understand that specifying everything in
> one place using iommu_devid_map can be easier
> > and reduces the need for device tree parsing.
> >
> > Apart from that, what is the reason of exposing only one vSMMU to guest
> instead of one vSMMU per pSMMU?
> > In the latter solution, the whole issue with handling devices with the
> same stream ID but belonging to different SMMUs
> > would be gone. It would also result in a more natural way of the device
> tree look. Normally a guest would see
> > e.g. both SMMUs and exposing only one can be misleading.
> >
> > I also have the same question. From earlier answers as I understand it
> is going to be identity vSMMU <-> pSMMU mappings for Dom0, so why diverge
> for DomU?
> >
> > Also I am thinking how this solution would work for IPMMU-VMSA
> Gen3(Gen4), which also supports two stages of translation, so the nested
> translation could be possible in general, although there might be some
> pitfalls
> > (yes, I understand that code to emulate access to control registers
> would be different in comparison with SMMUv3, but some other code could be
> common).
>
> Yes we will try to make code common so that other vIOMMU can be
> implemented easily.
> >
> >
> >
> >
> >
> > >>
> > >>>  iommu_devid_map = [ “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS” ,
> “PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS”]
> > >>>      • PMASTER_ID is the physical master ID of the device from the
> physical DT.
> > >>>      • VMASTER_ID is the virtual master Id that the user will
> configure in the partial device tree.
> > >>>      • IOMMU_BASE_ADDRESS is the base address of the physical IOMMU
> device to which this device is connected.
> >
> >
> > If iommu_devid_map is a way to go, I have a question, would this
> configuration cover the following cases?
> > 1. Device has several stream IDs
>
> Yes in that case user needs to create the mapping for each streamIDs. For
> example if device has streamId 0x10 , 0x20 and 0x30.
> iommu_devid_map will be:
>
> iommu_devid_map = ["0x10@0x01,0x40000000”, "0x20@0x02
> ,0x40000000”,"0x30@0x03,0x40000000”]
>
> Here 0x40000000 is physical IOMMU base address.
>
> > 2. Several devices share the stream ID (or several stream IDs)
>
> Let take an example of two devices :
>
> Device 1: 0x10
> Device 2: 0x10
>
> Iommu_devid_map = [“0x10@0x1,0x40000000”,"0x10@0x2,0x40000000”]
>
> Xen will create the data structure that include vStreamID, pMasterID and
> IOMMU base address.
> With the help of three tuples we will be able to find the right physical
> IOMMU.



Thanks for the clarification, I see that iommu_devid_map is able to
describe various combinations, which is good. But, the user should be very
careful when filling in iommu_devid_map especially
if dealing with a system that has many iommus and devices with many stream
IDs, as it would be easy to make a mistake in that case.
As a real example, if I want to describe 5 DMA controllers assigned to the
guest where each has 16 uTLBs (this is an equivalent of stream IDs) I would
need to add 80 entries (quite lot) to iommu_devid_map with specifying
VMASTER_ID for each entry (as uTLBs are not unique across the system).

https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L1042
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L1084
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L1126
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L2450
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L2492


So I agree in general with what has been said earlier in that thread to
*better* avoid user interaction
and teach the toolstack to do this automatically. At the same time I
understand this might be quite difficult to implement, etc.



>
>
>
> Regards,
> Rahul



-- 
Regards,

Oleksandr Tyshchenko

[-- Attachment #2: Type: text/html, Size: 14677 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-28 15:01                     ` Julien Grall
@ 2022-10-28 15:45                       ` Bertrand Marquis
  2022-10-28 16:54                         ` Michal Orzel
  0 siblings, 1 reply; 33+ messages in thread
From: Bertrand Marquis @ 2022-10-28 15:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: Rahul Singh, Xen developer discussion, Stefano Stabellini,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Julien,

> On 28 Oct 2022, at 16:01, Julien Grall <julien@xen.org> wrote:
> 
> 
> 
> On 28/10/2022 15:37, Bertrand Marquis wrote:
>> Hi Julien,
> 
> Hi Bertrand,
> 
>>> On 28 Oct 2022, at 14:27, Julien Grall <julien@xen.org> wrote:
>>> 
>>> 
>>> 
>>> On 28/10/2022 14:13, Bertrand Marquis wrote:
>>>> Hi Julien,
>>> 
>>> Hi Bertrand,
>>> 
>>>>> On 28 Oct 2022, at 14:06, Julien Grall <julien@xen.org> wrote:
>>>>> 
>>>>> Hi Rahul,
>>>>> 
>>>>> On 28/10/2022 13:54, Rahul Singh wrote:
>>>>>>>>>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>>>>>>>>>> 
>>>>>>>>>>> So can you add more context why this is necessary for everyone?
>>>>>>>>>> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
>>>>>>>>> 
>>>>>>>>> I am confused. Below, you are making the virtual master ID optional. So shouldn't this be mandatory if you really need the mapping with the virtual ID?
>>>>>>>> vMasterID is optional if user knows pMasterID is unique on the system. But if pMasterId is not unique then user needs to provide the vMasterID.
>>>>>>> 
>>>>>>> So the expectation is the user will be able to know that the pMasterID is uniq. This may be easy with a couple of SMMUs, but if you have 50+ (as suggested above). This will become a pain on larger system.
>>>>>>> 
>>>>>>> IHMO, it would be much better if we can detect that in libxl (see below).
>>>>>> We can make the vMasterID compulsory to avoid complexity in libxl to solve this
>>>>> 
>>>>> In general, complexity in libxl is not too much of problem.
>>>> I am a bit unsure about this strategy.
>>>> Currently xl has one configuration file where you put all Xen parameters. The device tree is only needed by some guests to have a description of the system they run on.
>>>> If we change the model and say that Xen configuration parameters are both in the configuration and the device tree, we somehow enforce to have a device tree even though some guests do not need it at all (for example Zephyr).
>>> 
>>> I think my approach was misunderstood because there is no change in the existing model.
>>> 
>>> What I am suggesting is to not introduce iommu_devid_map but instead let libxl allocate the virtual Master-ID and create the mapping with the physical Master-ID.
>>> 
>>> Libxl would then update the property "iommus" in the device-tree with the allocated virtual Master-ID.
>> Ok I understand now.
>>> 
>>> Each node in the partial device-tree would need to have a property
>>> to refer to the physical device just so we know how to update the "iommus". The list of device passthrough will still be specified in the configuration file. IOW, the partial device-tree is not directly involved in the configuration of the guest.
>> But we will generate it. How would something like Zephyr guest work ? Zephyr is not using the device tree we pass, it has an embedded one.
> 
> In general, guest that don't use the device-tree/ACPI table to detect the layout are already in a bad situation because we don't guarantee that the layout (memory, interrupt...) will be stable across Xen version. Although, there are a implicit agreement that the layout will not change for minor release (i.e. 4.14.x).

Well right now we have no ACPI support.
But I still think that a non dtb guest is definitely a use case we need to keep in mind for embedded and safety as most proprietary RTOS are not using a device tree.

> 
> But see below for some suggestions how this could be handled.
> 
>>> 
>>> So far, I don't see a particular issue with this approach because the vMaster ID algorithm allocation should be generic. But please let me know if you think there are bits I am missing.
>> I am a bit afraid of things that are “automatic”.
>> For everything else we let the user in control (IPA for mapping, virtual interrupt number) and in this case we switch to a model where we automatically generated a vMaster ID.
> 
> We only let the user control where the device is mapped. But this is quite fragile... I think this should be generated at runtime.
> 
>> With this model, guest not using the device tree will have to guess the vMaster ID or somehow know how the tools are generating it to use the right one.
> 
> To be honest, this is already the case today because the layout exposed to the guest is technically not fixed. Yes, so far, we haven't changed it too much. But sooner or later, this is going to bite because we made clear that the layout is not stable.
> 
> Now, if those projects are willing to rebuild for each version, then we could use the following approach:
>  1) Write the xl.cfg
>  2) Ask libxl to generate the device-tree
>  3) Build Zephyr
>  4) Create the domain
> 
> The expectation is for a given Xen version (and compatible), libxl will always generate the same Device-Tree.

This is a good idea yes :-)

Cheers
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-28 15:45                       ` Bertrand Marquis
@ 2022-10-28 16:54                         ` Michal Orzel
  0 siblings, 0 replies; 33+ messages in thread
From: Michal Orzel @ 2022-10-28 16:54 UTC (permalink / raw)
  To: Bertrand Marquis, Julien Grall
  Cc: Rahul Singh, Xen developer discussion, Stefano Stabellini,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross


On 28/10/2022 17:45, Bertrand Marquis wrote:
> 
> 
> Hi Julien,
> 
>> On 28 Oct 2022, at 16:01, Julien Grall <julien@xen.org> wrote:
>>
>>
>>
>> On 28/10/2022 15:37, Bertrand Marquis wrote:
>>> Hi Julien,
>>
>> Hi Bertrand,
>>
>>>> On 28 Oct 2022, at 14:27, Julien Grall <julien@xen.org> wrote:
>>>>
>>>>
>>>>
>>>> On 28/10/2022 14:13, Bertrand Marquis wrote:
>>>>> Hi Julien,
>>>>
>>>> Hi Bertrand,
>>>>
>>>>>> On 28 Oct 2022, at 14:06, Julien Grall <julien@xen.org> wrote:
>>>>>>
>>>>>> Hi Rahul,
>>>>>>
>>>>>> On 28/10/2022 13:54, Rahul Singh wrote:
>>>>>>>>>>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>>>>>>>>>>>
>>>>>>>>>>>> So can you add more context why this is necessary for everyone?
>>>>>>>>>>> We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
>>>>>>>>>>
>>>>>>>>>> I am confused. Below, you are making the virtual master ID optional. So shouldn't this be mandatory if you really need the mapping with the virtual ID?
>>>>>>>>> vMasterID is optional if user knows pMasterID is unique on the system. But if pMasterId is not unique then user needs to provide the vMasterID.
>>>>>>>>
>>>>>>>> So the expectation is the user will be able to know that the pMasterID is uniq. This may be easy with a couple of SMMUs, but if you have 50+ (as suggested above). This will become a pain on larger system.
>>>>>>>>
>>>>>>>> IHMO, it would be much better if we can detect that in libxl (see below).
>>>>>>> We can make the vMasterID compulsory to avoid complexity in libxl to solve this
>>>>>>
>>>>>> In general, complexity in libxl is not too much of problem.
>>>>> I am a bit unsure about this strategy.
>>>>> Currently xl has one configuration file where you put all Xen parameters. The device tree is only needed by some guests to have a description of the system they run on.
>>>>> If we change the model and say that Xen configuration parameters are both in the configuration and the device tree, we somehow enforce to have a device tree even though some guests do not need it at all (for example Zephyr).
>>>>
>>>> I think my approach was misunderstood because there is no change in the existing model.
>>>>
>>>> What I am suggesting is to not introduce iommu_devid_map but instead let libxl allocate the virtual Master-ID and create the mapping with the physical Master-ID.
>>>>
>>>> Libxl would then update the property "iommus" in the device-tree with the allocated virtual Master-ID.
>>> Ok I understand now.
>>>>
>>>> Each node in the partial device-tree would need to have a property
>>>> to refer to the physical device just so we know how to update the "iommus". The list of device passthrough will still be specified in the configuration file. IOW, the partial device-tree is not directly involved in the configuration of the guest.
>>> But we will generate it. How would something like Zephyr guest work ? Zephyr is not using the device tree we pass, it has an embedded one.
>>
>> In general, guest that don't use the device-tree/ACPI table to detect the layout are already in a bad situation because we don't guarantee that the layout (memory, interrupt...) will be stable across Xen version. Although, there are a implicit agreement that the layout will not change for minor release (i.e. 4.14.x).
> 
> Well right now we have no ACPI support.
> But I still think that a non dtb guest is definitely a use case we need to keep in mind for embedded and safety as most proprietary RTOS are not using a device tree.
> 
>>
>> But see below for some suggestions how this could be handled.
>>
>>>>
>>>> So far, I don't see a particular issue with this approach because the vMaster ID algorithm allocation should be generic. But please let me know if you think there are bits I am missing.
>>> I am a bit afraid of things that are “automatic”.
>>> For everything else we let the user in control (IPA for mapping, virtual interrupt number) and in this case we switch to a model where we automatically generated a vMaster ID.
>>
>> We only let the user control where the device is mapped. But this is quite fragile... I think this should be generated at runtime.
>>
>>> With this model, guest not using the device tree will have to guess the vMaster ID or somehow know how the tools are generating it to use the right one.
>>
>> To be honest, this is already the case today because the layout exposed to the guest is technically not fixed. Yes, so far, we haven't changed it too much. But sooner or later, this is going to bite because we made clear that the layout is not stable.
>>
>> Now, if those projects are willing to rebuild for each version, then we could use the following approach:
>>  1) Write the xl.cfg
>>  2) Ask libxl to generate the device-tree
>>  3) Build Zephyr
>>  4) Create the domain
>>
>> The expectation is for a given Xen version (and compatible), libxl will always generate the same Device-Tree.
> 
> This is a good idea yes :-)

Zephyr still uses a device tree but in a static way - everything must be defined in a .dts before building it.
The steps mentioned by Julien are already followed by Zephyr when building it to run as a Xen VM.
You can take a look at the "Updating configuration" section at the bottom of the following site:
https://docs.zephyrproject.org/latest/boards/arm64/xenvm/doc/index.html

So, as we tend to use Zephyr as a de facto RTOS for Xen, it is already aware of possible changes to the layout.

> 
> Cheers
> Bertrand
> 
>>
>> Cheers,
>>
>> --
>> Julien Grall
> 

~Michal


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-28 13:27                 ` Julien Grall
  2022-10-28 14:37                   ` Bertrand Marquis
@ 2022-10-30 14:23                   ` Stefano Stabellini
  2022-10-30 19:57                     ` Julien Grall
  1 sibling, 1 reply; 33+ messages in thread
From: Stefano Stabellini @ 2022-10-30 14:23 UTC (permalink / raw)
  To: Julien Grall
  Cc: Bertrand Marquis, Rahul Singh, Xen developer discussion,
	Stefano Stabellini, Michal Orzel, Oleksandr Tyshchenko,
	Oleksandr Andrushchenko, Volodymyr Babchuk, Jan Beulich,
	Roger Pau Monné,
	Andrew Cooper, Juergen Gross

[-- Attachment #1: Type: text/plain, Size: 5379 bytes --]

On Fri, 28 Oct 2022, Julien Grall wrote:
> On 28/10/2022 14:13, Bertrand Marquis wrote:
> > > On 28 Oct 2022, at 14:06, Julien Grall <julien@xen.org> wrote:
> > > 
> > > Hi Rahul,
> > > 
> > > On 28/10/2022 13:54, Rahul Singh wrote:
> > > > > > > > > For ACPI, I would have expected the information to be found in
> > > > > > > > > the IOREQ.
> > > > > > > > > 
> > > > > > > > > So can you add more context why this is necessary for
> > > > > > > > > everyone?
> > > > > > > > We have information for IOMMU and Master-ID but we don’t have
> > > > > > > > information for linking vMaster-ID to pMaster-ID.
> > > > > > > 
> > > > > > > I am confused. Below, you are making the virtual master ID
> > > > > > > optional. So shouldn't this be mandatory if you really need the
> > > > > > > mapping with the virtual ID?
> > > > > > vMasterID is optional if user knows pMasterID is unique on the
> > > > > > system. But if pMasterId is not unique then user needs to provide
> > > > > > the vMasterID.
> > > > > 
> > > > > So the expectation is the user will be able to know that the pMasterID
> > > > > is uniq. This may be easy with a couple of SMMUs, but if you have 50+
> > > > > (as suggested above). This will become a pain on larger system.
> > > > > 
> > > > > IHMO, it would be much better if we can detect that in libxl (see
> > > > > below).
> > > > We can make the vMasterID compulsory to avoid complexity in libxl to
> > > > solve this
> > > 
> > > In general, complexity in libxl is not too much of problem.

I agree with this and also I agree with Julien's other statement:

"I am strongly in favor of libxl to modify it if it greatly improves the
user experience."

I am always in favor of reducing complexity for the user as they
typically can't deal with tricky details such as MasterIDs. In general,
I think we need more automation with our tooling.

However, it might not be as simple as adding support for automatically
generating IDs in libxl because we have 2 additional cases to support:
1) dom0less
2) statically built guests

For 1) we would need the same support also in Xen? Which means more
complexity in Xen.

2) are guests like Zephyr that consume a device tree at
build time instead of runtime. These guests are built specifically for a
given environment and it is not a problem to rebuild them for every Xen
release.

However I think it is going to be a problem if we have to run libxl to
get the device tree needed for the Zephyr build. That is because it
means that the Zephyr build system would have to learn how to compile
(or crosscompile) libxl in order to retrieve the data needed for its
input. Even for systems based on Yocto (Yocto already knows how to build
libxl) would cause issues because of internal dependencies this would
introduce.

So I think the automatic generation might be best done in another tool.

I think we need something like a script that takes a partial device tree
as input and provides a more detailed partial device tree as output with
the generated IDs.

If we did it that way, we could call the script from libxl, but also we
could call it separately from ImageBuilder for dom0less and Zephyr/Yocto
could also call it.

Basically we make it easier for everyone to use it. The only price to
pay is that it will be a bit less efficient for xl guests (one more
script to fork and exec) but I think is a good compromise.

Another advantage is that in fully static workflows we could call the
script ahead of time (e.g. from Lopper/ImageBuilder) and still have full
knowledge of the device tree of all the guests which is great from a
safety perspective.


> > I am a bit unsure about this strategy.
> > Currently xl has one configuration file where you put all Xen parameters.
> > The device tree is only needed by some guests to have a description of the
> > system they run on.
> > If we change the model and say that Xen configuration parameters are both in
> > the configuration and the device tree, we somehow enforce to have a device
> > tree even though some guests do not need it at all (for example Zephyr).
> 
> I think my approach was misunderstood because there is no change in the
> existing model.
> 
> What I am suggesting is to not introduce iommu_devid_map but instead let libxl
> allocate the virtual Master-ID and create the mapping with the physical
> Master-ID.
>
> Libxl would then update the property "iommus" in the device-tree with the
> allocated virtual Master-ID.
> 
> Each node in the partial device-tree would need to have a property
> to refer to the physical device just so we know how to update the "iommus".
> The list of device passthrough will still be specified in the configuration
> file. IOW, the partial device-tree is not directly involved in the
> configuration of the guest.
> 
> So far, I don't see a particular issue with this approach because the vMaster
> ID algorithm allocation should be generic. But please let me know if you think
> there are bits I am missing.
>
> For everything else we let the user in control (IPA for mapping, virtual interrupt number) and in this case we switch to a model where we
> automatically generated a vMaster ID.

I think this is a great idea, I only suggest that we move the automatic
generation out of libxl (a separate stand-alone script), in another
place that can be more easily reused by multiple projects and different
use-cases.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-30 14:23                   ` Stefano Stabellini
@ 2022-10-30 19:57                     ` Julien Grall
  2022-10-30 21:14                       ` Stefano Stabellini
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2022-10-30 19:57 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Bertrand Marquis, Rahul Singh, Xen developer discussion,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Stefano,

On 30/10/2022 14:23, Stefano Stabellini wrote:
> On Fri, 28 Oct 2022, Julien Grall wrote:
>> On 28/10/2022 14:13, Bertrand Marquis wrote:
>>>> On 28 Oct 2022, at 14:06, Julien Grall <julien@xen.org> wrote:
>>>>
>>>> Hi Rahul,
>>>>
>>>> On 28/10/2022 13:54, Rahul Singh wrote:
>>>>>>>>>> For ACPI, I would have expected the information to be found in
>>>>>>>>>> the IOREQ.
>>>>>>>>>>
>>>>>>>>>> So can you add more context why this is necessary for
>>>>>>>>>> everyone?
>>>>>>>>> We have information for IOMMU and Master-ID but we don’t have
>>>>>>>>> information for linking vMaster-ID to pMaster-ID.
>>>>>>>>
>>>>>>>> I am confused. Below, you are making the virtual master ID
>>>>>>>> optional. So shouldn't this be mandatory if you really need the
>>>>>>>> mapping with the virtual ID?
>>>>>>> vMasterID is optional if user knows pMasterID is unique on the
>>>>>>> system. But if pMasterId is not unique then user needs to provide
>>>>>>> the vMasterID.
>>>>>>
>>>>>> So the expectation is the user will be able to know that the pMasterID
>>>>>> is uniq. This may be easy with a couple of SMMUs, but if you have 50+
>>>>>> (as suggested above). This will become a pain on larger system.
>>>>>>
>>>>>> IHMO, it would be much better if we can detect that in libxl (see
>>>>>> below).
>>>>> We can make the vMasterID compulsory to avoid complexity in libxl to
>>>>> solve this
>>>>
>>>> In general, complexity in libxl is not too much of problem.
> 
> I agree with this and also I agree with Julien's other statement:
> 
> "I am strongly in favor of libxl to modify it if it greatly improves the
> user experience."
> 
> I am always in favor of reducing complexity for the user as they
> typically can't deal with tricky details such as MasterIDs. In general,
> I think we need more automation with our tooling.
> 
> However, it might not be as simple as adding support for automatically
> generating IDs in libxl because we have 2 additional cases to support:
> 1) dom0less
> 2) statically built guests
> 
> For 1) we would need the same support also in Xen? Which means more
> complexity in Xen.
Xen will need to parse the device-tree to find the mapping. So I am not 
entirely convinced there will be more complexity needed other than 
requiring a bitmap to know which vMasterID has been allocated.

That said, you would still need one to validate the input provided by 
the user. So overall maybe there will be no added complexity?

> 
> 2) are guests like Zephyr that consume a device tree at
> build time instead of runtime. These guests are built specifically for a
> given environment and it is not a problem to rebuild them for every Xen
> release.
> 
> However I think it is going to be a problem if we have to run libxl to
> get the device tree needed for the Zephyr build. That is because it
> means that the Zephyr build system would have to learn how to compile
> (or crosscompile) libxl in order to retrieve the data needed for its
> input. Even for systems based on Yocto (Yocto already knows how to build
> libxl) would cause issues because of internal dependencies this would
> introduce.

That would not be very different to how this works today for Zephyr. 
They need libxl to generate the guest DT.

That said, I agree this is a bit of a pain...

> 
> So I think the automatic generation might be best done in another tool.
It sounds like what you want is creating something similar to libacpi 
but for Device-Tree. That should work with some caveats.

> 
> I think we need something like a script that takes a partial device tree
> as input and provides a more detailed partial device tree as output with
> the generated IDs.

AFAICT, having the partial device-tree is not enough. You also need the 
real DT to figure out the pMaster-ID.

> 
> If we did it that way, we could call the script from libxl, but also we
> could call it separately from ImageBuilder for dom0less and Zephyr/Yocto
> could also call it.
> 
> Basically we make it easier for everyone to use it. The only price to
> pay is that it will be a bit less efficient for xl guests (one more
> script to fork and exec) but I think is a good compromise.

We would need an hypercall to retrieve the host Device-Tree. But that 
would not be too difficult to add.

[...]

> 
> I think this is a great idea, I only suggest that we move the automatic
> generation out of libxl (a separate stand-alone script), in another
> place that can be more easily reused by multiple projects and different
> use-cases.

If we use the concept of libacpi, we may not need a to have a 
stand-alone script. It could directly linked in libxl or any other tools.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-30 19:57                     ` Julien Grall
@ 2022-10-30 21:14                       ` Stefano Stabellini
  2022-10-31 13:26                         ` Bertrand Marquis
  0 siblings, 1 reply; 33+ messages in thread
From: Stefano Stabellini @ 2022-10-30 21:14 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Bertrand Marquis, Rahul Singh,
	Xen developer discussion, Michal Orzel, Oleksandr Tyshchenko,
	Oleksandr Andrushchenko, Volodymyr Babchuk, Jan Beulich,
	Roger Pau Monné,
	Andrew Cooper, Juergen Gross

[-- Attachment #1: Type: text/plain, Size: 6210 bytes --]

On Sun, 30 Oct 2022, Julien Grall wrote:
> Hi Stefano,
> 
> On 30/10/2022 14:23, Stefano Stabellini wrote:
> > On Fri, 28 Oct 2022, Julien Grall wrote:
> > > On 28/10/2022 14:13, Bertrand Marquis wrote:
> > > > > On 28 Oct 2022, at 14:06, Julien Grall <julien@xen.org> wrote:
> > > > > 
> > > > > Hi Rahul,
> > > > > 
> > > > > On 28/10/2022 13:54, Rahul Singh wrote:
> > > > > > > > > > > For ACPI, I would have expected the information to be
> > > > > > > > > > > found in
> > > > > > > > > > > the IOREQ.
> > > > > > > > > > > 
> > > > > > > > > > > So can you add more context why this is necessary for
> > > > > > > > > > > everyone?
> > > > > > > > > > We have information for IOMMU and Master-ID but we don’t
> > > > > > > > > > have
> > > > > > > > > > information for linking vMaster-ID to pMaster-ID.
> > > > > > > > > 
> > > > > > > > > I am confused. Below, you are making the virtual master ID
> > > > > > > > > optional. So shouldn't this be mandatory if you really need
> > > > > > > > > the
> > > > > > > > > mapping with the virtual ID?
> > > > > > > > vMasterID is optional if user knows pMasterID is unique on the
> > > > > > > > system. But if pMasterId is not unique then user needs to
> > > > > > > > provide
> > > > > > > > the vMasterID.
> > > > > > > 
> > > > > > > So the expectation is the user will be able to know that the
> > > > > > > pMasterID
> > > > > > > is uniq. This may be easy with a couple of SMMUs, but if you have
> > > > > > > 50+
> > > > > > > (as suggested above). This will become a pain on larger system.
> > > > > > > 
> > > > > > > IHMO, it would be much better if we can detect that in libxl (see
> > > > > > > below).
> > > > > > We can make the vMasterID compulsory to avoid complexity in libxl to
> > > > > > solve this
> > > > > 
> > > > > In general, complexity in libxl is not too much of problem.
> > 
> > I agree with this and also I agree with Julien's other statement:
> > 
> > "I am strongly in favor of libxl to modify it if it greatly improves the
> > user experience."
> > 
> > I am always in favor of reducing complexity for the user as they
> > typically can't deal with tricky details such as MasterIDs. In general,
> > I think we need more automation with our tooling.
> > 
> > However, it might not be as simple as adding support for automatically
> > generating IDs in libxl because we have 2 additional cases to support:
> > 1) dom0less
> > 2) statically built guests
> > 
> > For 1) we would need the same support also in Xen? Which means more
> > complexity in Xen.
> Xen will need to parse the device-tree to find the mapping. So I am not
> entirely convinced there will be more complexity needed other than requiring a
> bitmap to know which vMasterID has been allocated.
> 
> That said, you would still need one to validate the input provided by the
> user. So overall maybe there will be no added complexity?
> 
> > 
> > 2) are guests like Zephyr that consume a device tree at
> > build time instead of runtime. These guests are built specifically for a
> > given environment and it is not a problem to rebuild them for every Xen
> > release.
> > 
> > However I think it is going to be a problem if we have to run libxl to
> > get the device tree needed for the Zephyr build. That is because it
> > means that the Zephyr build system would have to learn how to compile
> > (or crosscompile) libxl in order to retrieve the data needed for its
> > input. Even for systems based on Yocto (Yocto already knows how to build
> > libxl) would cause issues because of internal dependencies this would
> > introduce.
> 
> That would not be very different to how this works today for Zephyr. They need
> libxl to generate the guest DT.
> 
> That said, I agree this is a bit of a pain...

Yeah..


> > So I think the automatic generation might be best done in another tool.
> It sounds like what you want is creating something similar to libacpi but for
> Device-Tree. That should work with some caveats.

Yes, something like that. We have a framework for reading, editing and
generating Device Tree: Lopper https://github.com/devicetree-org/lopper

It is mostly targeted at build time but it could also be invoked on
target at runtime.

 
> > I think we need something like a script that takes a partial device tree
> > as input and provides a more detailed partial device tree as output with
> > the generated IDs.
> 
> AFAICT, having the partial device-tree is not enough. You also need the real
> DT to figure out the pMaster-ID.
> 
> > 
> > If we did it that way, we could call the script from libxl, but also we
> > could call it separately from ImageBuilder for dom0less and Zephyr/Yocto
> > could also call it.
> > 
> > Basically we make it easier for everyone to use it. The only price to
> > pay is that it will be a bit less efficient for xl guests (one more
> > script to fork and exec) but I think is a good compromise.
> 
> We would need an hypercall to retrieve the host Device-Tree. But that would
> not be too difficult to add.

Good point


> > I think this is a great idea, I only suggest that we move the automatic
> > generation out of libxl (a separate stand-alone script), in another
> > place that can be more easily reused by multiple projects and different
> > use-cases.
> 
> If we use the concept of libacpi, we may not need a to have a stand-alone
> script. It could directly linked in libxl or any other tools.
 
I don't feel strongly whether it should be a library, a script or
something else. My only point is that it should be easy to use both at
build time (e.g. Yocto/Zephyr/ImageBuilder/Lopper) and runtime
(xl/libxl).

We have already a partial DTB generator as a Lopper "lop" (a Lopper
plugin). Probably using Lopper would be the easiest way to implement it,
and the "lop" could be under xen.git (it doesn't have to reside under
the lopper repository).

But if we wanted a library that would be OK too. The issue with libxl is
not much that it is a library but that it is complex to build and has
many dependencies (it can only be built from the top level ./configure
and make).

Ideally this would be something quick that can be easily invoked as the
first step of an external third-party build process.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-30 21:14                       ` Stefano Stabellini
@ 2022-10-31 13:26                         ` Bertrand Marquis
  2022-11-01 15:01                           ` Elliott Mitchell
  2022-11-10 23:01                           ` Stefano Stabellini
  0 siblings, 2 replies; 33+ messages in thread
From: Bertrand Marquis @ 2022-10-31 13:26 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Rahul Singh, Xen developer discussion,
	Michal Orzel, Oleksandr Tyshchenko, Oleksandr Andrushchenko,
	Volodymyr Babchuk, Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi All,

> On 30 Oct 2022, at 21:14, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> On Sun, 30 Oct 2022, Julien Grall wrote:
>> Hi Stefano,
>> 
>> On 30/10/2022 14:23, Stefano Stabellini wrote:
>>> On Fri, 28 Oct 2022, Julien Grall wrote:
>>>> On 28/10/2022 14:13, Bertrand Marquis wrote:
>>>>>> On 28 Oct 2022, at 14:06, Julien Grall <julien@xen.org> wrote:
>>>>>> 
>>>>>> Hi Rahul,
>>>>>> 
>>>>>> On 28/10/2022 13:54, Rahul Singh wrote:
>>>>>>>>>>>> For ACPI, I would have expected the information to be
>>>>>>>>>>>> found in
>>>>>>>>>>>> the IOREQ.
>>>>>>>>>>>> 
>>>>>>>>>>>> So can you add more context why this is necessary for
>>>>>>>>>>>> everyone?
>>>>>>>>>>> We have information for IOMMU and Master-ID but we don’t
>>>>>>>>>>> have
>>>>>>>>>>> information for linking vMaster-ID to pMaster-ID.
>>>>>>>>>> 
>>>>>>>>>> I am confused. Below, you are making the virtual master ID
>>>>>>>>>> optional. So shouldn't this be mandatory if you really need
>>>>>>>>>> the
>>>>>>>>>> mapping with the virtual ID?
>>>>>>>>> vMasterID is optional if user knows pMasterID is unique on the
>>>>>>>>> system. But if pMasterId is not unique then user needs to
>>>>>>>>> provide
>>>>>>>>> the vMasterID.
>>>>>>>> 
>>>>>>>> So the expectation is the user will be able to know that the
>>>>>>>> pMasterID
>>>>>>>> is uniq. This may be easy with a couple of SMMUs, but if you have
>>>>>>>> 50+
>>>>>>>> (as suggested above). This will become a pain on larger system.
>>>>>>>> 
>>>>>>>> IHMO, it would be much better if we can detect that in libxl (see
>>>>>>>> below).
>>>>>>> We can make the vMasterID compulsory to avoid complexity in libxl to
>>>>>>> solve this
>>>>>> 
>>>>>> In general, complexity in libxl is not too much of problem.
>>> 
>>> I agree with this and also I agree with Julien's other statement:
>>> 
>>> "I am strongly in favor of libxl to modify it if it greatly improves the
>>> user experience."
>>> 
>>> I am always in favor of reducing complexity for the user as they
>>> typically can't deal with tricky details such as MasterIDs. In general,
>>> I think we need more automation with our tooling.
>>> 
>>> However, it might not be as simple as adding support for automatically
>>> generating IDs in libxl because we have 2 additional cases to support:
>>> 1) dom0less
>>> 2) statically built guests
>>> 
>>> For 1) we would need the same support also in Xen? Which means more
>>> complexity in Xen.
>> Xen will need to parse the device-tree to find the mapping. So I am not
>> entirely convinced there will be more complexity needed other than requiring a
>> bitmap to know which vMasterID has been allocated.
>> 
>> That said, you would still need one to validate the input provided by the
>> user. So overall maybe there will be no added complexity?
>> 
>>> 
>>> 2) are guests like Zephyr that consume a device tree at
>>> build time instead of runtime. These guests are built specifically for a
>>> given environment and it is not a problem to rebuild them for every Xen
>>> release.
>>> 
>>> However I think it is going to be a problem if we have to run libxl to
>>> get the device tree needed for the Zephyr build. That is because it
>>> means that the Zephyr build system would have to learn how to compile
>>> (or crosscompile) libxl in order to retrieve the data needed for its
>>> input. Even for systems based on Yocto (Yocto already knows how to build
>>> libxl) would cause issues because of internal dependencies this would
>>> introduce.
>> 
>> That would not be very different to how this works today for Zephyr. They need
>> libxl to generate the guest DT.
>> 
>> That said, I agree this is a bit of a pain...
> 
> Yeah..
> 
> 
>>> So I think the automatic generation might be best done in another tool.
>> It sounds like what you want is creating something similar to libacpi but for
>> Device-Tree. That should work with some caveats.
> 
> Yes, something like that. We have a framework for reading, editing and
> generating Device Tree: Lopper https://github.com/devicetree-org/lopper
> 
> It is mostly targeted at build time but it could also be invoked on
> target at runtime.
> 
> 
>>> I think we need something like a script that takes a partial device tree
>>> as input and provides a more detailed partial device tree as output with
>>> the generated IDs.
>> 
>> AFAICT, having the partial device-tree is not enough. You also need the real
>> DT to figure out the pMaster-ID.
>> 
>>> 
>>> If we did it that way, we could call the script from libxl, but also we
>>> could call it separately from ImageBuilder for dom0less and Zephyr/Yocto
>>> could also call it.
>>> 
>>> Basically we make it easier for everyone to use it. The only price to
>>> pay is that it will be a bit less efficient for xl guests (one more
>>> script to fork and exec) but I think is a good compromise.
>> 
>> We would need an hypercall to retrieve the host Device-Tree. But that would
>> not be too difficult to add.
> 
> Good point
> 
> 
>>> I think this is a great idea, I only suggest that we move the automatic
>>> generation out of libxl (a separate stand-alone script), in another
>>> place that can be more easily reused by multiple projects and different
>>> use-cases.
>> 
>> If we use the concept of libacpi, we may not need a to have a stand-alone
>> script. It could directly linked in libxl or any other tools.
> 
> I don't feel strongly whether it should be a library, a script or
> something else. My only point is that it should be easy to use both at
> build time (e.g. Yocto/Zephyr/ImageBuilder/Lopper) and runtime
> (xl/libxl).
> 
> We have already a partial DTB generator as a Lopper "lop" (a Lopper
> plugin). Probably using Lopper would be the easiest way to implement it,
> and the "lop" could be under xen.git (it doesn't have to reside under
> the lopper repository).
> 
> But if we wanted a library that would be OK too. The issue with libxl is
> not much that it is a library but that it is complex to build and has
> many dependencies (it can only be built from the top level ./configure
> and make).
> 
> Ideally this would be something quick that can be easily invoked as the
> first step of an external third-party build process.

I think that we are making this problem a lot to complex and I am not sure
that all this complexity is required.

For now, we could make the assumption that a master ID is uniq and never
reused on a system. Linux is currently making this assumption to simplify
the code. We also found no hardware with the same master ID reused.

It would mean that the user would just need to keep the stream-id property
in the device tree, replace the link to the SMMU with a fake phandle. The
tools could then add the vIOMMU node and fix all phandle in the device tree
to properly point to it. In practice the user can simply copy the whole device
node with the stream-id properties and just replace the phandle by 0x0.

This will make the first implementation a lot simpler and prevent adding
hyper calls or to much magic in the tools for now.
This will also give us more time to check if we need more complex use
cases and how they could be configured.

What do you think ?

Cheers
Bertrand



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-31 13:26                         ` Bertrand Marquis
@ 2022-11-01 15:01                           ` Elliott Mitchell
  2022-11-01 16:30                             ` Bertrand Marquis
  2022-11-10 23:01                           ` Stefano Stabellini
  1 sibling, 1 reply; 33+ messages in thread
From: Elliott Mitchell @ 2022-11-01 15:01 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Stefano Stabellini, Julien Grall, Rahul Singh,
	Xen developer discussion, Michal Orzel, Oleksandr Tyshchenko,
	Oleksandr Andrushchenko, Volodymyr Babchuk, Jan Beulich,
	Roger Pau Monné,
	Andrew Cooper, Juergen Gross

On Mon, Oct 31, 2022 at 01:26:44PM +0000, Bertrand Marquis wrote:
> 
> > On 30 Oct 2022, at 21:14, Stefano Stabellini <sstabellini@kernel.org> wrote:
> > 
> > Ideally this would be something quick that can be easily invoked as the
> > first step of an external third-party build process.
> 
> I think that we are making this problem a lot to complex and I am not sure
> that all this complexity is required.

Speaking of complexity.  Is it just me or does a vIOMMU had an odd sort
of similarity with a Grant Table?

Both are about allowing foreign entities access to portions of the
current domain's memory.  Just in the case of a Grant Table the entity
happens to be another domain, whereas for a vIOMMU it is a hardware
device.

Perhaps some functionality could be shared between the two?  Perhaps
this is something for the designer of the next version of IOMMU to think
about?  (or perhaps I'm off the deep end and bringing in a silly idea)


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-11-01 15:01                           ` Elliott Mitchell
@ 2022-11-01 16:30                             ` Bertrand Marquis
  2022-11-01 20:25                               ` Elliott Mitchell
  0 siblings, 1 reply; 33+ messages in thread
From: Bertrand Marquis @ 2022-11-01 16:30 UTC (permalink / raw)
  To: Elliott Mitchell
  Cc: Stefano Stabellini, Julien Grall, Rahul Singh,
	Xen developer discussion, Michal Orzel, Oleksandr Tyshchenko,
	Oleksandr Andrushchenko, Volodymyr Babchuk, Jan Beulich,
	Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Elliot,

> On 1 Nov 2022, at 15:01, Elliott Mitchell <ehem+xen@m5p.com> wrote:
> 
> On Mon, Oct 31, 2022 at 01:26:44PM +0000, Bertrand Marquis wrote:
>> 
>>> On 30 Oct 2022, at 21:14, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>> 
>>> Ideally this would be something quick that can be easily invoked as the
>>> first step of an external third-party build process.
>> 
>> I think that we are making this problem a lot to complex and I am not sure
>> that all this complexity is required.
> 
> Speaking of complexity.  Is it just me or does a vIOMMU had an odd sort
> of similarity with a Grant Table?
> 
> Both are about allowing foreign entities access to portions of the
> current domain's memory.  Just in the case of a Grant Table the entity
> happens to be another domain, whereas for a vIOMMU it is a hardware
> device.
> 
> Perhaps some functionality could be shared between the two?  Perhaps
> this is something for the designer of the next version of IOMMU to think
> about?  (or perhaps I'm off the deep end and bringing in a silly idea)

I am not quite sure what you mean here.

The IOMMU is something not Xen specific. Linux is using it to restrict the area
of memory accessible to a device using its DMA engine. Here we just try to give
the same possibility when running on top Xen in a transparent way so that the
Linux (or an other guest) can continue to do the same even if it is running on
top of Xen.
In practice, the guest is not telling us what it does, we just get the pointer to the
first level of page table and we write it in the hardware which is doing the rest.
We need to have a vIOMMU because we need to make sure the guest is only
doing this for devices assigned to him and that it is not modifying the second
level of page tables which is used by Xen to make sure that only the memory
from the guest is accessible using the DMA engine. 

So I am not exactly seeing the common part with grant tables here.

Cheers
Bertrand



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-11-01 16:30                             ` Bertrand Marquis
@ 2022-11-01 20:25                               ` Elliott Mitchell
  2022-11-02  8:50                                 ` Bertrand Marquis
  0 siblings, 1 reply; 33+ messages in thread
From: Elliott Mitchell @ 2022-11-01 20:25 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Stefano Stabellini, Julien Grall, Rahul Singh,
	Xen developer discussion, Michal Orzel, Oleksandr Tyshchenko,
	Oleksandr Andrushchenko, Volodymyr Babchuk, Jan Beulich,
	Roger Pau Monné,
	Andrew Cooper, Juergen Gross

On Tue, Nov 01, 2022 at 04:30:31PM +0000, Bertrand Marquis wrote:
> 
> > On 1 Nov 2022, at 15:01, Elliott Mitchell <ehem+xen@m5p.com> wrote:
> > 
> > On Mon, Oct 31, 2022 at 01:26:44PM +0000, Bertrand Marquis wrote:
> >> 
> >>> On 30 Oct 2022, at 21:14, Stefano Stabellini <sstabellini@kernel.org> wrote:
> >>> 
> >>> Ideally this would be something quick that can be easily invoked as the
> >>> first step of an external third-party build process.
> >> 
> >> I think that we are making this problem a lot to complex and I am not sure
> >> that all this complexity is required.
> > 
> > Speaking of complexity.  Is it just me or does a vIOMMU had an odd sort
> > of similarity with a Grant Table?
> > 
> > Both are about allowing foreign entities access to portions of the
> > current domain's memory.  Just in the case of a Grant Table the entity
> > happens to be another domain, whereas for a vIOMMU it is a hardware
> > device.
> > 
> > Perhaps some functionality could be shared between the two?  Perhaps
> > this is something for the designer of the next version of IOMMU to think
> > about?  (or perhaps I'm off the deep end and bringing in a silly idea)
> 
> I am not quite sure what you mean here.
> 
> The IOMMU is something not Xen specific. Linux is using it to restrict the area
> of memory accessible to a device using its DMA engine. Here we just try to give
> the same possibility when running on top Xen in a transparent way so that the
> Linux (or an other guest) can continue to do the same even if it is running on
> top of Xen.
> In practice, the guest is not telling us what it does, we just get the pointer to the
> first level of page table and we write it in the hardware which is doing the rest.
> We need to have a vIOMMU because we need to make sure the guest is only
> doing this for devices assigned to him and that it is not modifying the second
> level of page tables which is used by Xen to make sure that only the memory
> from the guest is accessible using the DMA engine. 
> 
> So I am not exactly seeing the common part with grant tables here.

With Grant Tables, one domain is allocating pages and then allowing
another domain to read and potentially write to them.  What is being
given to Xen is the tuple of page address and other domain.

With the model presently being discussed you would have a vIOMMU for each
other domain.  The the pages access is being granted to are the pages
being entered into the vIOMMU page table.

Allocate a domain Id to each IOMMU domain and this very much seems quite
similar to Xen's grant tables.  I'm unsure the two can be unified, but
they appear to have many common aspects.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-11-01 20:25                               ` Elliott Mitchell
@ 2022-11-02  8:50                                 ` Bertrand Marquis
  2022-11-02  8:58                                   ` Juergen Gross
  2022-11-02 21:02                                   ` Elliott Mitchell
  0 siblings, 2 replies; 33+ messages in thread
From: Bertrand Marquis @ 2022-11-02  8:50 UTC (permalink / raw)
  To: Elliott Mitchell
  Cc: Stefano Stabellini, Julien Grall, Rahul Singh,
	Xen developer discussion, Michal Orzel, Oleksandr Tyshchenko,
	Oleksandr Andrushchenko, Volodymyr Babchuk, Jan Beulich,
	Roger Pau Monné,
	Andrew Cooper, Juergen Gross

Hi Elliott,

> On 1 Nov 2022, at 20:25, Elliott Mitchell <ehem+xen@m5p.com> wrote:
> 
> On Tue, Nov 01, 2022 at 04:30:31PM +0000, Bertrand Marquis wrote:
>> 
>>> On 1 Nov 2022, at 15:01, Elliott Mitchell <ehem+xen@m5p.com> wrote:
>>> 
>>> On Mon, Oct 31, 2022 at 01:26:44PM +0000, Bertrand Marquis wrote:
>>>> 
>>>>> On 30 Oct 2022, at 21:14, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>>>> 
>>>>> Ideally this would be something quick that can be easily invoked as the
>>>>> first step of an external third-party build process.
>>>> 
>>>> I think that we are making this problem a lot to complex and I am not sure
>>>> that all this complexity is required.
>>> 
>>> Speaking of complexity.  Is it just me or does a vIOMMU had an odd sort
>>> of similarity with a Grant Table?
>>> 
>>> Both are about allowing foreign entities access to portions of the
>>> current domain's memory.  Just in the case of a Grant Table the entity
>>> happens to be another domain, whereas for a vIOMMU it is a hardware
>>> device.
>>> 
>>> Perhaps some functionality could be shared between the two?  Perhaps
>>> this is something for the designer of the next version of IOMMU to think
>>> about?  (or perhaps I'm off the deep end and bringing in a silly idea)
>> 
>> I am not quite sure what you mean here.
>> 
>> The IOMMU is something not Xen specific. Linux is using it to restrict the area
>> of memory accessible to a device using its DMA engine. Here we just try to give
>> the same possibility when running on top Xen in a transparent way so that the
>> Linux (or an other guest) can continue to do the same even if it is running on
>> top of Xen.
>> In practice, the guest is not telling us what it does, we just get the pointer to the
>> first level of page table and we write it in the hardware which is doing the rest.
>> We need to have a vIOMMU because we need to make sure the guest is only
>> doing this for devices assigned to him and that it is not modifying the second
>> level of page tables which is used by Xen to make sure that only the memory
>> from the guest is accessible using the DMA engine. 
>> 
>> So I am not exactly seeing the common part with grant tables here.
> 
> With Grant Tables, one domain is allocating pages and then allowing
> another domain to read and potentially write to them.  What is being
> given to Xen is the tuple of page address and other domain.

With the IOMMU we do not get to that information, we only get the first level of
page table pointer and the hardware is doing the rest, protecting the access
using the second level of page tables handled by Xen.

> 
> With the model presently being discussed you would have a vIOMMU for each
> other domain.  The the pages access is being granted to are the pages
> being entered into the vIOMMU page table.

Which Xen does not check.

> 
> Allocate a domain Id to each IOMMU domain and this very much seems quite
> similar to Xen's grant tables.  I'm unsure the two can be unified, but
> they appear to have many common aspects.

From an high level point of view it might but from the guest point of view the
IOMMU is something used with or without Xen where grant tables are very
specific to Xen. I do not see anything that could be unified there.

Maybe I am missing something here that other could see though :-)

Cheers
Bertrand



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-11-02  8:50                                 ` Bertrand Marquis
@ 2022-11-02  8:58                                   ` Juergen Gross
  2022-11-02 21:02                                   ` Elliott Mitchell
  1 sibling, 0 replies; 33+ messages in thread
From: Juergen Gross @ 2022-11-02  8:58 UTC (permalink / raw)
  To: Bertrand Marquis, Elliott Mitchell
  Cc: Stefano Stabellini, Julien Grall, Rahul Singh,
	Xen developer discussion, Michal Orzel, Oleksandr Tyshchenko,
	Oleksandr Andrushchenko, Volodymyr Babchuk, Jan Beulich,
	Roger Pau Monné,
	Andrew Cooper


[-- Attachment #1.1.1: Type: text/plain, Size: 3683 bytes --]

On 02.11.22 09:50, Bertrand Marquis wrote:
> Hi Elliott,
> 
>> On 1 Nov 2022, at 20:25, Elliott Mitchell <ehem+xen@m5p.com> wrote:
>>
>> On Tue, Nov 01, 2022 at 04:30:31PM +0000, Bertrand Marquis wrote:
>>>
>>>> On 1 Nov 2022, at 15:01, Elliott Mitchell <ehem+xen@m5p.com> wrote:
>>>>
>>>> On Mon, Oct 31, 2022 at 01:26:44PM +0000, Bertrand Marquis wrote:
>>>>>
>>>>>> On 30 Oct 2022, at 21:14, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>>>>>
>>>>>> Ideally this would be something quick that can be easily invoked as the
>>>>>> first step of an external third-party build process.
>>>>>
>>>>> I think that we are making this problem a lot to complex and I am not sure
>>>>> that all this complexity is required.
>>>>
>>>> Speaking of complexity.  Is it just me or does a vIOMMU had an odd sort
>>>> of similarity with a Grant Table?
>>>>
>>>> Both are about allowing foreign entities access to portions of the
>>>> current domain's memory.  Just in the case of a Grant Table the entity
>>>> happens to be another domain, whereas for a vIOMMU it is a hardware
>>>> device.
>>>>
>>>> Perhaps some functionality could be shared between the two?  Perhaps
>>>> this is something for the designer of the next version of IOMMU to think
>>>> about?  (or perhaps I'm off the deep end and bringing in a silly idea)
>>>
>>> I am not quite sure what you mean here.
>>>
>>> The IOMMU is something not Xen specific. Linux is using it to restrict the area
>>> of memory accessible to a device using its DMA engine. Here we just try to give
>>> the same possibility when running on top Xen in a transparent way so that the
>>> Linux (or an other guest) can continue to do the same even if it is running on
>>> top of Xen.
>>> In practice, the guest is not telling us what it does, we just get the pointer to the
>>> first level of page table and we write it in the hardware which is doing the rest.
>>> We need to have a vIOMMU because we need to make sure the guest is only
>>> doing this for devices assigned to him and that it is not modifying the second
>>> level of page tables which is used by Xen to make sure that only the memory
>>> from the guest is accessible using the DMA engine.
>>>
>>> So I am not exactly seeing the common part with grant tables here.
>>
>> With Grant Tables, one domain is allocating pages and then allowing
>> another domain to read and potentially write to them.  What is being
>> given to Xen is the tuple of page address and other domain.
> 
> With the IOMMU we do not get to that information, we only get the first level of
> page table pointer and the hardware is doing the rest, protecting the access
> using the second level of page tables handled by Xen.
> 
>>
>> With the model presently being discussed you would have a vIOMMU for each
>> other domain.  The the pages access is being granted to are the pages
>> being entered into the vIOMMU page table.
> 
> Which Xen does not check.
> 
>>
>> Allocate a domain Id to each IOMMU domain and this very much seems quite
>> similar to Xen's grant tables.  I'm unsure the two can be unified, but
>> they appear to have many common aspects.
> 
>>From an high level point of view it might but from the guest point of view the
> IOMMU is something used with or without Xen where grant tables are very
> specific to Xen. I do not see anything that could be unified there.
> 
> Maybe I am missing something here that other could see though :-)

You might want to have a look at my "Grant table V3" design session at the
Xen Summit this year:

https://lists.xen.org/archives/html/xen-devel/2022-09/msg01429.html


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-11-02  8:50                                 ` Bertrand Marquis
  2022-11-02  8:58                                   ` Juergen Gross
@ 2022-11-02 21:02                                   ` Elliott Mitchell
  1 sibling, 0 replies; 33+ messages in thread
From: Elliott Mitchell @ 2022-11-02 21:02 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Stefano Stabellini, Julien Grall, Rahul Singh,
	Xen developer discussion, Michal Orzel, Oleksandr Tyshchenko,
	Oleksandr Andrushchenko, Volodymyr Babchuk, Jan Beulich,
	Roger Pau Monné,
	Andrew Cooper, Juergen Gross

On Wed, Nov 02, 2022 at 08:50:58AM +0000, Bertrand Marquis wrote:
> 
> > On 1 Nov 2022, at 20:25, Elliott Mitchell <ehem+xen@m5p.com> wrote:
> > 
> > Allocate a domain Id to each IOMMU domain and this very much seems quite
> > similar to Xen's grant tables.  I'm unsure the two can be unified, but
> > they appear to have many common aspects.
> 
> >From an high level point of view it might but from the guest point of view the
> IOMMU is something used with or without Xen where grant tables are very
> specific to Xen. I do not see anything that could be unified there.
> 
> Maybe I am missing something here that other could see though :-)

Imagine a SoC design which has a bunch of cores, memory and 48 IOMMUs.
On a particular board, the designer finds they only need 16 of the IOMMUs
for devices.

Since nothing needs to be done, the designer leaves IOMMUs 16-47 wired
together as loopback.  ie a write to IOMMU 16 will show up as a DMA write
on IOMMU 17 and vice versa, similar situation with 18/19 and all the
remaining IOMMUs.

Imagine running Xen on such a hypothetical board.  If there were less
than 16 DomUs and all I/O went through Dom0, the loopback pairs could do
a job very similar to the grant tables.  Adjustments would be needed to
make use of this, but it seems an interesting thought experiment.

This requires hardware support with the hardware setup in a particular
way, but doesn't really seem like that much of a stretch.  Virtualization
support has been increasing, so perhaps something akin to a
next-generation IOMMU would include the needed functionality.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
  2022-10-31 13:26                         ` Bertrand Marquis
  2022-11-01 15:01                           ` Elliott Mitchell
@ 2022-11-10 23:01                           ` Stefano Stabellini
  1 sibling, 0 replies; 33+ messages in thread
From: Stefano Stabellini @ 2022-11-10 23:01 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Stefano Stabellini, Julien Grall, Rahul Singh,
	Xen developer discussion, Michal Orzel, Oleksandr Tyshchenko,
	Oleksandr Andrushchenko, Volodymyr Babchuk, Jan Beulich,
	Roger Pau Monné,
	Andrew Cooper, Juergen Gross

[-- Attachment #1: Type: text/plain, Size: 7734 bytes --]

On Mon, 31 Oct 2022, Bertrand Marquis wrote:
> Hi All,
> 
> > On 30 Oct 2022, at 21:14, Stefano Stabellini <sstabellini@kernel.org> wrote:
> > 
> > On Sun, 30 Oct 2022, Julien Grall wrote:
> >> Hi Stefano,
> >> 
> >> On 30/10/2022 14:23, Stefano Stabellini wrote:
> >>> On Fri, 28 Oct 2022, Julien Grall wrote:
> >>>> On 28/10/2022 14:13, Bertrand Marquis wrote:
> >>>>>> On 28 Oct 2022, at 14:06, Julien Grall <julien@xen.org> wrote:
> >>>>>> 
> >>>>>> Hi Rahul,
> >>>>>> 
> >>>>>> On 28/10/2022 13:54, Rahul Singh wrote:
> >>>>>>>>>>>> For ACPI, I would have expected the information to be
> >>>>>>>>>>>> found in
> >>>>>>>>>>>> the IOREQ.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> So can you add more context why this is necessary for
> >>>>>>>>>>>> everyone?
> >>>>>>>>>>> We have information for IOMMU and Master-ID but we don’t
> >>>>>>>>>>> have
> >>>>>>>>>>> information for linking vMaster-ID to pMaster-ID.
> >>>>>>>>>> 
> >>>>>>>>>> I am confused. Below, you are making the virtual master ID
> >>>>>>>>>> optional. So shouldn't this be mandatory if you really need
> >>>>>>>>>> the
> >>>>>>>>>> mapping with the virtual ID?
> >>>>>>>>> vMasterID is optional if user knows pMasterID is unique on the
> >>>>>>>>> system. But if pMasterId is not unique then user needs to
> >>>>>>>>> provide
> >>>>>>>>> the vMasterID.
> >>>>>>>> 
> >>>>>>>> So the expectation is the user will be able to know that the
> >>>>>>>> pMasterID
> >>>>>>>> is uniq. This may be easy with a couple of SMMUs, but if you have
> >>>>>>>> 50+
> >>>>>>>> (as suggested above). This will become a pain on larger system.
> >>>>>>>> 
> >>>>>>>> IHMO, it would be much better if we can detect that in libxl (see
> >>>>>>>> below).
> >>>>>>> We can make the vMasterID compulsory to avoid complexity in libxl to
> >>>>>>> solve this
> >>>>>> 
> >>>>>> In general, complexity in libxl is not too much of problem.
> >>> 
> >>> I agree with this and also I agree with Julien's other statement:
> >>> 
> >>> "I am strongly in favor of libxl to modify it if it greatly improves the
> >>> user experience."
> >>> 
> >>> I am always in favor of reducing complexity for the user as they
> >>> typically can't deal with tricky details such as MasterIDs. In general,
> >>> I think we need more automation with our tooling.
> >>> 
> >>> However, it might not be as simple as adding support for automatically
> >>> generating IDs in libxl because we have 2 additional cases to support:
> >>> 1) dom0less
> >>> 2) statically built guests
> >>> 
> >>> For 1) we would need the same support also in Xen? Which means more
> >>> complexity in Xen.
> >> Xen will need to parse the device-tree to find the mapping. So I am not
> >> entirely convinced there will be more complexity needed other than requiring a
> >> bitmap to know which vMasterID has been allocated.
> >> 
> >> That said, you would still need one to validate the input provided by the
> >> user. So overall maybe there will be no added complexity?
> >> 
> >>> 
> >>> 2) are guests like Zephyr that consume a device tree at
> >>> build time instead of runtime. These guests are built specifically for a
> >>> given environment and it is not a problem to rebuild them for every Xen
> >>> release.
> >>> 
> >>> However I think it is going to be a problem if we have to run libxl to
> >>> get the device tree needed for the Zephyr build. That is because it
> >>> means that the Zephyr build system would have to learn how to compile
> >>> (or crosscompile) libxl in order to retrieve the data needed for its
> >>> input. Even for systems based on Yocto (Yocto already knows how to build
> >>> libxl) would cause issues because of internal dependencies this would
> >>> introduce.
> >> 
> >> That would not be very different to how this works today for Zephyr. They need
> >> libxl to generate the guest DT.
> >> 
> >> That said, I agree this is a bit of a pain...
> > 
> > Yeah..
> > 
> > 
> >>> So I think the automatic generation might be best done in another tool.
> >> It sounds like what you want is creating something similar to libacpi but for
> >> Device-Tree. That should work with some caveats.
> > 
> > Yes, something like that. We have a framework for reading, editing and
> > generating Device Tree: Lopper https://github.com/devicetree-org/lopper
> > 
> > It is mostly targeted at build time but it could also be invoked on
> > target at runtime.
> > 
> > 
> >>> I think we need something like a script that takes a partial device tree
> >>> as input and provides a more detailed partial device tree as output with
> >>> the generated IDs.
> >> 
> >> AFAICT, having the partial device-tree is not enough. You also need the real
> >> DT to figure out the pMaster-ID.
> >> 
> >>> 
> >>> If we did it that way, we could call the script from libxl, but also we
> >>> could call it separately from ImageBuilder for dom0less and Zephyr/Yocto
> >>> could also call it.
> >>> 
> >>> Basically we make it easier for everyone to use it. The only price to
> >>> pay is that it will be a bit less efficient for xl guests (one more
> >>> script to fork and exec) but I think is a good compromise.
> >> 
> >> We would need an hypercall to retrieve the host Device-Tree. But that would
> >> not be too difficult to add.
> > 
> > Good point
> > 
> > 
> >>> I think this is a great idea, I only suggest that we move the automatic
> >>> generation out of libxl (a separate stand-alone script), in another
> >>> place that can be more easily reused by multiple projects and different
> >>> use-cases.
> >> 
> >> If we use the concept of libacpi, we may not need a to have a stand-alone
> >> script. It could directly linked in libxl or any other tools.
> > 
> > I don't feel strongly whether it should be a library, a script or
> > something else. My only point is that it should be easy to use both at
> > build time (e.g. Yocto/Zephyr/ImageBuilder/Lopper) and runtime
> > (xl/libxl).
> > 
> > We have already a partial DTB generator as a Lopper "lop" (a Lopper
> > plugin). Probably using Lopper would be the easiest way to implement it,
> > and the "lop" could be under xen.git (it doesn't have to reside under
> > the lopper repository).
> > 
> > But if we wanted a library that would be OK too. The issue with libxl is
> > not much that it is a library but that it is complex to build and has
> > many dependencies (it can only be built from the top level ./configure
> > and make).
> > 
> > Ideally this would be something quick that can be easily invoked as the
> > first step of an external third-party build process.
> 
> I think that we are making this problem a lot to complex and I am not sure
> that all this complexity is required.
> 
> For now, we could make the assumption that a master ID is uniq and never
> reused on a system. Linux is currently making this assumption to simplify
> the code. We also found no hardware with the same master ID reused.
> 
> It would mean that the user would just need to keep the stream-id property
> in the device tree, replace the link to the SMMU with a fake phandle. The
> tools could then add the vIOMMU node and fix all phandle in the device tree
> to properly point to it. In practice the user can simply copy the whole device
> node with the stream-id properties and just replace the phandle by 0x0.
> 
> This will make the first implementation a lot simpler and prevent adding
> hyper calls or to much magic in the tools for now.
> This will also give us more time to check if we need more complex use
> cases and how they could be configured.
> 
> What do you think ?

I think it is a good idea. It will allow us to have something that works
and learn the details of the implementation. I think we'll be able to
come up with a better idea on how to solve it afterwards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-11-10 23:01 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-26 13:17 Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices Rahul Singh
2022-10-26 13:36 ` Julien Grall
2022-10-26 14:33   ` Rahul Singh
2022-10-26 17:17     ` Michal Orzel
2022-10-26 18:23       ` Oleksandr Tyshchenko
2022-10-27 16:49         ` Rahul Singh
2022-10-28 15:26           ` Oleksandr Tyshchenko
2022-10-27 16:12       ` Rahul Singh
2022-10-26 19:48     ` Julien Grall
2022-10-27 16:08       ` Rahul Singh
2022-10-27 16:33         ` Julien Grall
2022-10-27 17:18           ` Michal Orzel
2022-10-28 12:54           ` Rahul Singh
2022-10-28 13:06             ` Julien Grall
2022-10-28 13:13               ` Bertrand Marquis
2022-10-28 13:27                 ` Julien Grall
2022-10-28 14:37                   ` Bertrand Marquis
2022-10-28 15:01                     ` Julien Grall
2022-10-28 15:45                       ` Bertrand Marquis
2022-10-28 16:54                         ` Michal Orzel
2022-10-30 14:23                   ` Stefano Stabellini
2022-10-30 19:57                     ` Julien Grall
2022-10-30 21:14                       ` Stefano Stabellini
2022-10-31 13:26                         ` Bertrand Marquis
2022-11-01 15:01                           ` Elliott Mitchell
2022-11-01 16:30                             ` Bertrand Marquis
2022-11-01 20:25                               ` Elliott Mitchell
2022-11-02  8:50                                 ` Bertrand Marquis
2022-11-02  8:58                                   ` Juergen Gross
2022-11-02 21:02                                   ` Elliott Mitchell
2022-11-10 23:01                           ` Stefano Stabellini
2022-10-27  9:01 ` Ayan Kumar Halder
2022-10-27  9:41   ` Ayan Kumar Halder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.