All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option
@ 2024-01-23 18:15 Eric Auger
  2024-01-23 18:15 ` [PATCH 1/3] virtio-iommu: Add an option to define the input range width Eric Auger
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Eric Auger @ 2024-01-23 18:15 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, jean-philippe,
	alex.williamson, peter.maydell, zhenzhong.duan, peterx, yanghliu
  Cc: mst, clg, jasowang

In [1] and [2] we attempted to fix a case where a VFIO-PCI device
protected with a virtio-iommu is assigned to an x86 guest. On x86
the physical IOMMU may have an address width (gaw) of 39 or 48 bits
whereas the virtio-iommu exposes a 64b input address space by default.
Hence the guest may try to use the full 64b space and DMA MAP
failures may be encountered. To work around this issue we endeavoured
to pass usable host IOVA regions (excluding the out of range space) from
VFIO to the virtio-iommu device so that the virtio-iommu driver can
query those latter during the probe request and let the guest iommu
kernel subsystem carve them out.

However if there are several devices in the same iommu group,
only the reserved regions of the first one are taken into
account by the iommu subsystem of the guest. This generally
works on baremetal because devices are not going to
expose different reserved regions. However in our case, this
may prevent from taking into account the host iommu geometry.

So the simplest solution to this problem looks to introduce an
input address width option, aw-bits, which matches what is
done on the intel-iommu. By default, from now on it is set
to 39 bits with pc_q35 and 64b with arm virt. This replaces the
previous default value of 64b. So we need to introduce a compat
for pc_q35 machines older than 9.0 to behave similarly.

Outstanding series [2] remains useful to let resv regions beeing
communicated on time before the probe request.

[1] [PATCH v4 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
    https://lore.kernel.org/all/20231019134651.842175-1-eric.auger@redhat.com/
    - This is merged -

[2] [RFC 0/7] VIRTIO-IOMMU/VFIO: Fix host iommu geometry handling for hotplugged devices
    https://lore.kernel.org/all/20240117080414.316890-1-eric.auger@redhat.com/
    - This is pending for review on the ML -

This series can be found at:
https://github.com/eauger/qemu/tree/virtio-iommu-aw-bits-v1

Applied on top of [3]
[PATCH v2] virtio-iommu: Use qemu_real_host_page_mask as default page_size_mask
https://lore.kernel.org/all/20240117132039.332273-1-eric.auger@redhat.com/

Eric Auger (3):
  virtio-iommu: Add an option to define the input range width
  virtio-iommu: Trace domain range limits as unsigned int
  hw/pc: Set the default virtio-iommu aw-bits to 39 on pc_q35_9.0
    onwards

 include/hw/virtio/virtio-iommu.h |  1 +
 hw/arm/virt.c                    |  6 ++++++
 hw/i386/pc.c                     | 10 +++++++++-
 hw/virtio/virtio-iommu.c         |  4 +++-
 hw/virtio/trace-events           |  2 +-
 5 files changed, 20 insertions(+), 3 deletions(-)

-- 
2.41.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] virtio-iommu: Add an option to define the input range width
  2024-01-23 18:15 [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option Eric Auger
@ 2024-01-23 18:15 ` Eric Auger
  2024-01-23 23:51   ` Alex Williamson
  2024-01-23 18:15 ` [PATCH 2/3] virtio-iommu: Trace domain range limits as unsigned int Eric Auger
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Eric Auger @ 2024-01-23 18:15 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, jean-philippe,
	alex.williamson, peter.maydell, zhenzhong.duan, peterx, yanghliu
  Cc: mst, clg, jasowang

aw-bits is a new option that allows to set the bit width of
the input address range. This value will be used as a default for
the device config input_range.end. By default it is set to 64 bits
which is the current value.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/hw/virtio/virtio-iommu.h | 1 +
 hw/virtio/virtio-iommu.c         | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 781ebaea8f..5fbe4677c2 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -66,6 +66,7 @@ struct VirtIOIOMMU {
     bool boot_bypass;
     Notifier machine_done;
     bool granule_frozen;
+    uint8_t aw_bits;
 };
 
 #endif
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index ec2ba11d1d..e7f299e0c6 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -1314,7 +1314,8 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
      */
     s->config.bypass = s->boot_bypass;
     s->config.page_size_mask = qemu_real_host_page_mask();
-    s->config.input_range.end = UINT64_MAX;
+    s->config.input_range.end =
+        s->aw_bits == 64 ? UINT64_MAX : BIT_ULL(s->aw_bits) - 1;
     s->config.domain_range.end = UINT32_MAX;
     s->config.probe_size = VIOMMU_PROBE_SIZE;
 
@@ -1525,6 +1526,7 @@ static Property virtio_iommu_properties[] = {
     DEFINE_PROP_LINK("primary-bus", VirtIOIOMMU, primary_bus,
                      TYPE_PCI_BUS, PCIBus *),
     DEFINE_PROP_BOOL("boot-bypass", VirtIOIOMMU, boot_bypass, true),
+    DEFINE_PROP_UINT8("aw-bits", VirtIOIOMMU, aw_bits, 64),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] virtio-iommu: Trace domain range limits as unsigned int
  2024-01-23 18:15 [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option Eric Auger
  2024-01-23 18:15 ` [PATCH 1/3] virtio-iommu: Add an option to define the input range width Eric Auger
@ 2024-01-23 18:15 ` Eric Auger
  2024-01-23 18:15 ` [PATCH 3/3] hw/pc: Set the default virtio-iommu aw-bits to 39 on pc_q35_9.0 onwards Eric Auger
  2024-01-29 12:23 ` [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option Jean-Philippe Brucker
  3 siblings, 0 replies; 14+ messages in thread
From: Eric Auger @ 2024-01-23 18:15 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, jean-philippe,
	alex.williamson, peter.maydell, zhenzhong.duan, peterx, yanghliu
  Cc: mst, clg, jasowang

Use %u format to trace domain_range limits.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/virtio/trace-events | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 77905d1994..2350849fbd 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -111,7 +111,7 @@ virtio_iommu_device_reset(void) "reset!"
 virtio_iommu_system_reset(void) "system reset!"
 virtio_iommu_get_features(uint64_t features) "device supports features=0x%"PRIx64
 virtio_iommu_device_status(uint8_t status) "driver status = %d"
-virtio_iommu_get_config(uint64_t page_size_mask, uint64_t start, uint64_t end, uint32_t domain_start, uint32_t domain_end, uint32_t probe_size, uint8_t bypass) "page_size_mask=0x%"PRIx64" input range start=0x%"PRIx64" input range end=0x%"PRIx64" domain range start=%d domain range end=%d probe_size=0x%x bypass=0x%x"
+virtio_iommu_get_config(uint64_t page_size_mask, uint64_t start, uint64_t end, uint32_t domain_start, uint32_t domain_end, uint32_t probe_size, uint8_t bypass) "page_size_mask=0x%"PRIx64" input range start=0x%"PRIx64" input range end=0x%"PRIx64" domain range start=%u domain range end=%u probe_size=0x%x bypass=0x%x"
 virtio_iommu_set_config(uint8_t bypass) "bypass=0x%x"
 virtio_iommu_attach(uint32_t domain_id, uint32_t ep_id) "domain=%d endpoint=%d"
 virtio_iommu_detach(uint32_t domain_id, uint32_t ep_id) "domain=%d endpoint=%d"
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] hw/pc: Set the default virtio-iommu aw-bits to 39 on pc_q35_9.0 onwards
  2024-01-23 18:15 [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option Eric Auger
  2024-01-23 18:15 ` [PATCH 1/3] virtio-iommu: Add an option to define the input range width Eric Auger
  2024-01-23 18:15 ` [PATCH 2/3] virtio-iommu: Trace domain range limits as unsigned int Eric Auger
@ 2024-01-23 18:15 ` Eric Auger
  2024-01-29 12:23 ` [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option Jean-Philippe Brucker
  3 siblings, 0 replies; 14+ messages in thread
From: Eric Auger @ 2024-01-23 18:15 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, jean-philippe,
	alex.williamson, peter.maydell, zhenzhong.duan, peterx, yanghliu
  Cc: mst, clg, jasowang

Currently the default input range can extend to 64 bits. On x86,
when the virtio-iommu protects vfio devices, the physical iommu
may support only 39 bits. Let's set the default to 39, as done
for the intel-iommu. We introduce a new compat to make sure older
pc_q35 machine types still use 64 bits. On ARM we keep 64b as
the default input address width. Of course if aw-bits is set
from the command line, the default is overriden.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/virt.c            |  6 ++++++
 hw/i386/pc.c             | 10 +++++++++-
 hw/virtio/virtio-iommu.c |  2 +-
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 2793121cb4..a27fe13c73 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2715,10 +2715,16 @@ static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MD_PCI)) {
         virtio_md_pci_pre_plug(VIRTIO_MD_PCI(dev), MACHINE(hotplug_dev), errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
+        uint8_t aw_bits = object_property_get_uint(OBJECT(dev),
+                                                   "aw-bits", NULL);
         hwaddr db_start = 0, db_end = 0;
         QList *reserved_regions;
         char *resv_prop_str;
 
+        if (!aw_bits) {
+            qdev_prop_set_uint8(dev, "aw-bits", 64);
+        }
+
         if (vms->iommu != VIRT_IOMMU_NONE) {
             error_setg(errp, "virt machine does not support multiple IOMMUs");
             return;
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 496498df3a..a8dbc99946 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -78,7 +78,9 @@
     { "qemu64-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },\
     { "athlon-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },
 
-GlobalProperty pc_compat_8_2[] = {};
+GlobalProperty pc_compat_8_2[] = {
+    { TYPE_VIRTIO_IOMMU_PCI, "aw-bits", "64" },
+};
 const size_t pc_compat_8_2_len = G_N_ELEMENTS(pc_compat_8_2);
 
 GlobalProperty pc_compat_8_1[] = {};
@@ -1458,6 +1460,8 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MD_PCI)) {
         virtio_md_pci_pre_plug(VIRTIO_MD_PCI(dev), MACHINE(hotplug_dev), errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
+        uint8_t aw_bits = object_property_get_uint(OBJECT(dev),
+                                                   "aw-bits", NULL);
         /* Declare the APIC range as the reserved MSI region */
         char *resv_prop_str = g_strdup_printf("0xfee00000:0xfeefffff:%d",
                                               VIRTIO_IOMMU_RESV_MEM_T_MSI);
@@ -1466,6 +1470,10 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
         qlist_append_str(reserved_regions, resv_prop_str);
         qdev_prop_set_array(dev, "reserved-regions", reserved_regions);
 
+        if (!aw_bits) {
+            qdev_prop_set_uint8(dev, "aw-bits", 39);
+        }
+
         g_free(resv_prop_str);
     }
 
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index e7f299e0c6..a7c268a01a 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -1526,7 +1526,7 @@ static Property virtio_iommu_properties[] = {
     DEFINE_PROP_LINK("primary-bus", VirtIOIOMMU, primary_bus,
                      TYPE_PCI_BUS, PCIBus *),
     DEFINE_PROP_BOOL("boot-bypass", VirtIOIOMMU, boot_bypass, true),
-    DEFINE_PROP_UINT8("aw-bits", VirtIOIOMMU, aw_bits, 64),
+    DEFINE_PROP_UINT8("aw-bits", VirtIOIOMMU, aw_bits, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] virtio-iommu: Add an option to define the input range width
  2024-01-23 18:15 ` [PATCH 1/3] virtio-iommu: Add an option to define the input range width Eric Auger
@ 2024-01-23 23:51   ` Alex Williamson
  2024-01-24 13:14     ` Eric Auger
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2024-01-23 23:51 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, jean-philippe,
	peter.maydell, zhenzhong.duan, peterx, yanghliu, mst, clg,
	jasowang

On Tue, 23 Jan 2024 19:15:55 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> aw-bits is a new option that allows to set the bit width of
> the input address range. This value will be used as a default for
> the device config input_range.end. By default it is set to 64 bits
> which is the current value.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  include/hw/virtio/virtio-iommu.h | 1 +
>  hw/virtio/virtio-iommu.c         | 4 +++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
> index 781ebaea8f..5fbe4677c2 100644
> --- a/include/hw/virtio/virtio-iommu.h
> +++ b/include/hw/virtio/virtio-iommu.h
> @@ -66,6 +66,7 @@ struct VirtIOIOMMU {
>      bool boot_bypass;
>      Notifier machine_done;
>      bool granule_frozen;
> +    uint8_t aw_bits;
>  };
>  
>  #endif
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index ec2ba11d1d..e7f299e0c6 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -1314,7 +1314,8 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>       */
>      s->config.bypass = s->boot_bypass;
>      s->config.page_size_mask = qemu_real_host_page_mask();
> -    s->config.input_range.end = UINT64_MAX;
> +    s->config.input_range.end =
> +        s->aw_bits == 64 ? UINT64_MAX : BIT_ULL(s->aw_bits) - 1;

What happens when someone sets aw_bits = 1?  There are a whole bunch of
impractical values here ripe for annoying bug reports.  vtd_realize()
returns an Error for any values other than 39 or 48.  We might pick an
arbitrary lower bound (39?) or some other more creative solution here
to avoid those silly issues in our future.  Thanks,

Alex

>      s->config.domain_range.end = UINT32_MAX;
>      s->config.probe_size = VIOMMU_PROBE_SIZE;
>  
> @@ -1525,6 +1526,7 @@ static Property virtio_iommu_properties[] = {
>      DEFINE_PROP_LINK("primary-bus", VirtIOIOMMU, primary_bus,
>                       TYPE_PCI_BUS, PCIBus *),
>      DEFINE_PROP_BOOL("boot-bypass", VirtIOIOMMU, boot_bypass, true),
> +    DEFINE_PROP_UINT8("aw-bits", VirtIOIOMMU, aw_bits, 64),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] virtio-iommu: Add an option to define the input range width
  2024-01-23 23:51   ` Alex Williamson
@ 2024-01-24 13:14     ` Eric Auger
  2024-01-24 13:37       ` Alex Williamson
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Auger @ 2024-01-24 13:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger.pro, qemu-devel, qemu-arm, jean-philippe,
	peter.maydell, zhenzhong.duan, peterx, yanghliu, mst, clg,
	jasowang

Hi Alex,

On 1/24/24 00:51, Alex Williamson wrote:
> On Tue, 23 Jan 2024 19:15:55 +0100
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> aw-bits is a new option that allows to set the bit width of
>> the input address range. This value will be used as a default for
>> the device config input_range.end. By default it is set to 64 bits
>> which is the current value.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  include/hw/virtio/virtio-iommu.h | 1 +
>>  hw/virtio/virtio-iommu.c         | 4 +++-
>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
>> index 781ebaea8f..5fbe4677c2 100644
>> --- a/include/hw/virtio/virtio-iommu.h
>> +++ b/include/hw/virtio/virtio-iommu.h
>> @@ -66,6 +66,7 @@ struct VirtIOIOMMU {
>>      bool boot_bypass;
>>      Notifier machine_done;
>>      bool granule_frozen;
>> +    uint8_t aw_bits;
>>  };
>>  
>>  #endif
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index ec2ba11d1d..e7f299e0c6 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -1314,7 +1314,8 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>>       */
>>      s->config.bypass = s->boot_bypass;
>>      s->config.page_size_mask = qemu_real_host_page_mask();
>> -    s->config.input_range.end = UINT64_MAX;
>> +    s->config.input_range.end =
>> +        s->aw_bits == 64 ? UINT64_MAX : BIT_ULL(s->aw_bits) - 1;
> What happens when someone sets aw_bits = 1?  There are a whole bunch of
> impractical values here ripe for annoying bug reports.  vtd_realize()
> returns an Error for any values other than 39 or 48.  We might pick an
> arbitrary lower bound (39?) or some other more creative solution here
> to avoid those silly issues in our future.  Thanks,
You're right. I can check the input value. This needs to be dependent on
the machine though but this should be feasable.
Then I would allow 39 and 48 for q35 and 64 only on ARM.

Eric
>
> Alex
>
>>      s->config.domain_range.end = UINT32_MAX;
>>      s->config.probe_size = VIOMMU_PROBE_SIZE;
>>  
>> @@ -1525,6 +1526,7 @@ static Property virtio_iommu_properties[] = {
>>      DEFINE_PROP_LINK("primary-bus", VirtIOIOMMU, primary_bus,
>>                       TYPE_PCI_BUS, PCIBus *),
>>      DEFINE_PROP_BOOL("boot-bypass", VirtIOIOMMU, boot_bypass, true),
>> +    DEFINE_PROP_UINT8("aw-bits", VirtIOIOMMU, aw_bits, 64),
>>      DEFINE_PROP_END_OF_LIST(),
>>  };
>>  



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] virtio-iommu: Add an option to define the input range width
  2024-01-24 13:14     ` Eric Auger
@ 2024-01-24 13:37       ` Alex Williamson
  2024-01-24 13:57         ` Eric Auger
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2024-01-24 13:37 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, jean-philippe,
	peter.maydell, zhenzhong.duan, peterx, yanghliu, mst, clg,
	jasowang

On Wed, 24 Jan 2024 14:14:19 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> Hi Alex,
> 
> On 1/24/24 00:51, Alex Williamson wrote:
> > On Tue, 23 Jan 2024 19:15:55 +0100
> > Eric Auger <eric.auger@redhat.com> wrote:
> >  
> >> aw-bits is a new option that allows to set the bit width of
> >> the input address range. This value will be used as a default for
> >> the device config input_range.end. By default it is set to 64 bits
> >> which is the current value.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >> ---
> >>  include/hw/virtio/virtio-iommu.h | 1 +
> >>  hw/virtio/virtio-iommu.c         | 4 +++-
> >>  2 files changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
> >> index 781ebaea8f..5fbe4677c2 100644
> >> --- a/include/hw/virtio/virtio-iommu.h
> >> +++ b/include/hw/virtio/virtio-iommu.h
> >> @@ -66,6 +66,7 @@ struct VirtIOIOMMU {
> >>      bool boot_bypass;
> >>      Notifier machine_done;
> >>      bool granule_frozen;
> >> +    uint8_t aw_bits;
> >>  };
> >>  
> >>  #endif
> >> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> >> index ec2ba11d1d..e7f299e0c6 100644
> >> --- a/hw/virtio/virtio-iommu.c
> >> +++ b/hw/virtio/virtio-iommu.c
> >> @@ -1314,7 +1314,8 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
> >>       */
> >>      s->config.bypass = s->boot_bypass;
> >>      s->config.page_size_mask = qemu_real_host_page_mask();
> >> -    s->config.input_range.end = UINT64_MAX;
> >> +    s->config.input_range.end =
> >> +        s->aw_bits == 64 ? UINT64_MAX : BIT_ULL(s->aw_bits) - 1;  
> > What happens when someone sets aw_bits = 1?  There are a whole bunch of
> > impractical values here ripe for annoying bug reports.  vtd_realize()
> > returns an Error for any values other than 39 or 48.  We might pick an
> > arbitrary lower bound (39?) or some other more creative solution here
> > to avoid those silly issues in our future.  Thanks,  
> You're right. I can check the input value. This needs to be dependent on
> the machine though but this should be feasable.
> Then I would allow 39 and 48 for q35 and 64 only on ARM.

AFAIK AMD-Vi supports 64-bit address space.  Without querying the host
there's no way to place an accurate limit below 64-bit.  Thanks,

Alex

> >>      s->config.domain_range.end = UINT32_MAX;
> >>      s->config.probe_size = VIOMMU_PROBE_SIZE;
> >>  
> >> @@ -1525,6 +1526,7 @@ static Property virtio_iommu_properties[] = {
> >>      DEFINE_PROP_LINK("primary-bus", VirtIOIOMMU, primary_bus,
> >>                       TYPE_PCI_BUS, PCIBus *),
> >>      DEFINE_PROP_BOOL("boot-bypass", VirtIOIOMMU, boot_bypass, true),
> >> +    DEFINE_PROP_UINT8("aw-bits", VirtIOIOMMU, aw_bits, 64),
> >>      DEFINE_PROP_END_OF_LIST(),
> >>  };
> >>    
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] virtio-iommu: Add an option to define the input range width
  2024-01-24 13:37       ` Alex Williamson
@ 2024-01-24 13:57         ` Eric Auger
  2024-01-24 14:15           ` Alex Williamson
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Auger @ 2024-01-24 13:57 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger.pro, qemu-devel, qemu-arm, jean-philippe,
	peter.maydell, zhenzhong.duan, peterx, yanghliu, mst, clg,
	jasowang

Hi Alex,

On 1/24/24 14:37, Alex Williamson wrote:
> On Wed, 24 Jan 2024 14:14:19 +0100
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> Hi Alex,
>>
>> On 1/24/24 00:51, Alex Williamson wrote:
>>> On Tue, 23 Jan 2024 19:15:55 +0100
>>> Eric Auger <eric.auger@redhat.com> wrote:
>>>  
>>>> aw-bits is a new option that allows to set the bit width of
>>>> the input address range. This value will be used as a default for
>>>> the device config input_range.end. By default it is set to 64 bits
>>>> which is the current value.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> ---
>>>>  include/hw/virtio/virtio-iommu.h | 1 +
>>>>  hw/virtio/virtio-iommu.c         | 4 +++-
>>>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
>>>> index 781ebaea8f..5fbe4677c2 100644
>>>> --- a/include/hw/virtio/virtio-iommu.h
>>>> +++ b/include/hw/virtio/virtio-iommu.h
>>>> @@ -66,6 +66,7 @@ struct VirtIOIOMMU {
>>>>      bool boot_bypass;
>>>>      Notifier machine_done;
>>>>      bool granule_frozen;
>>>> +    uint8_t aw_bits;
>>>>  };
>>>>  
>>>>  #endif
>>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>>>> index ec2ba11d1d..e7f299e0c6 100644
>>>> --- a/hw/virtio/virtio-iommu.c
>>>> +++ b/hw/virtio/virtio-iommu.c
>>>> @@ -1314,7 +1314,8 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>>>>       */
>>>>      s->config.bypass = s->boot_bypass;
>>>>      s->config.page_size_mask = qemu_real_host_page_mask();
>>>> -    s->config.input_range.end = UINT64_MAX;
>>>> +    s->config.input_range.end =
>>>> +        s->aw_bits == 64 ? UINT64_MAX : BIT_ULL(s->aw_bits) - 1;  
>>> What happens when someone sets aw_bits = 1?  There are a whole bunch of
>>> impractical values here ripe for annoying bug reports.  vtd_realize()
>>> returns an Error for any values other than 39 or 48.  We might pick an
>>> arbitrary lower bound (39?) or some other more creative solution here
>>> to avoid those silly issues in our future.  Thanks,  
>> You're right. I can check the input value. This needs to be dependent on
>> the machine though but this should be feasable.
>> Then I would allow 39 and 48 for q35 and 64 only on ARM.
> AFAIK AMD-Vi supports 64-bit address space.  Without querying the host
> there's no way to place an accurate limit below 64-bit.  Thanks,

Hum this means I would need to look at
/sys/class/iommu/<iommu>/amd-iommu/ or /sys/class/iommu/dmar* to
discriminate between AMD IOMMU and INTEL IOMMU physical IOMMU. Would
that be acceptable?

Eric
>
> Alex
>
>>>>      s->config.domain_range.end = UINT32_MAX;
>>>>      s->config.probe_size = VIOMMU_PROBE_SIZE;
>>>>  
>>>> @@ -1525,6 +1526,7 @@ static Property virtio_iommu_properties[] = {
>>>>      DEFINE_PROP_LINK("primary-bus", VirtIOIOMMU, primary_bus,
>>>>                       TYPE_PCI_BUS, PCIBus *),
>>>>      DEFINE_PROP_BOOL("boot-bypass", VirtIOIOMMU, boot_bypass, true),
>>>> +    DEFINE_PROP_UINT8("aw-bits", VirtIOIOMMU, aw_bits, 64),
>>>>      DEFINE_PROP_END_OF_LIST(),
>>>>  };
>>>>    



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] virtio-iommu: Add an option to define the input range width
  2024-01-24 13:57         ` Eric Auger
@ 2024-01-24 14:15           ` Alex Williamson
  2024-01-24 15:09             ` Eric Auger
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2024-01-24 14:15 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, jean-philippe,
	peter.maydell, zhenzhong.duan, peterx, yanghliu, mst, clg,
	jasowang

On Wed, 24 Jan 2024 14:57:41 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> Hi Alex,
> 
> On 1/24/24 14:37, Alex Williamson wrote:
> > On Wed, 24 Jan 2024 14:14:19 +0100
> > Eric Auger <eric.auger@redhat.com> wrote:
> >  
> >> Hi Alex,
> >>
> >> On 1/24/24 00:51, Alex Williamson wrote:  
> >>> On Tue, 23 Jan 2024 19:15:55 +0100
> >>> Eric Auger <eric.auger@redhat.com> wrote:
> >>>    
> >>>> aw-bits is a new option that allows to set the bit width of
> >>>> the input address range. This value will be used as a default for
> >>>> the device config input_range.end. By default it is set to 64 bits
> >>>> which is the current value.
> >>>>
> >>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>> ---
> >>>>  include/hw/virtio/virtio-iommu.h | 1 +
> >>>>  hw/virtio/virtio-iommu.c         | 4 +++-
> >>>>  2 files changed, 4 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
> >>>> index 781ebaea8f..5fbe4677c2 100644
> >>>> --- a/include/hw/virtio/virtio-iommu.h
> >>>> +++ b/include/hw/virtio/virtio-iommu.h
> >>>> @@ -66,6 +66,7 @@ struct VirtIOIOMMU {
> >>>>      bool boot_bypass;
> >>>>      Notifier machine_done;
> >>>>      bool granule_frozen;
> >>>> +    uint8_t aw_bits;
> >>>>  };
> >>>>  
> >>>>  #endif
> >>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> >>>> index ec2ba11d1d..e7f299e0c6 100644
> >>>> --- a/hw/virtio/virtio-iommu.c
> >>>> +++ b/hw/virtio/virtio-iommu.c
> >>>> @@ -1314,7 +1314,8 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
> >>>>       */
> >>>>      s->config.bypass = s->boot_bypass;
> >>>>      s->config.page_size_mask = qemu_real_host_page_mask();
> >>>> -    s->config.input_range.end = UINT64_MAX;
> >>>> +    s->config.input_range.end =
> >>>> +        s->aw_bits == 64 ? UINT64_MAX : BIT_ULL(s->aw_bits) - 1;    
> >>> What happens when someone sets aw_bits = 1?  There are a whole bunch of
> >>> impractical values here ripe for annoying bug reports.  vtd_realize()
> >>> returns an Error for any values other than 39 or 48.  We might pick an
> >>> arbitrary lower bound (39?) or some other more creative solution here
> >>> to avoid those silly issues in our future.  Thanks,    
> >> You're right. I can check the input value. This needs to be dependent on
> >> the machine though but this should be feasable.
> >> Then I would allow 39 and 48 for q35 and 64 only on ARM.  
> > AFAIK AMD-Vi supports 64-bit address space.  Without querying the host
> > there's no way to place an accurate limit below 64-bit.  Thanks,  
> 
> Hum this means I would need to look at
> /sys/class/iommu/<iommu>/amd-iommu/ or /sys/class/iommu/dmar* to
> discriminate between AMD IOMMU and INTEL IOMMU physical IOMMU. Would
> that be acceptable?

I'm not necessarily suggesting that we look at the host, I'm mostly
just stating that enforcing 39/48 bits on non-ARM is incorrect for a
large portion of the non-ARM world too.  There might even be some
interesting use cases for a 32-bit IOVA space, so maybe just set
defaults tuned for compatibility and accept anything between 32- and
64-bits.  Thanks,

Alex

> >>>>      s->config.domain_range.end = UINT32_MAX;
> >>>>      s->config.probe_size = VIOMMU_PROBE_SIZE;
> >>>>  
> >>>> @@ -1525,6 +1526,7 @@ static Property virtio_iommu_properties[] = {
> >>>>      DEFINE_PROP_LINK("primary-bus", VirtIOIOMMU, primary_bus,
> >>>>                       TYPE_PCI_BUS, PCIBus *),
> >>>>      DEFINE_PROP_BOOL("boot-bypass", VirtIOIOMMU, boot_bypass, true),
> >>>> +    DEFINE_PROP_UINT8("aw-bits", VirtIOIOMMU, aw_bits, 64),
> >>>>      DEFINE_PROP_END_OF_LIST(),
> >>>>  };
> >>>>      
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] virtio-iommu: Add an option to define the input range width
  2024-01-24 14:15           ` Alex Williamson
@ 2024-01-24 15:09             ` Eric Auger
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Auger @ 2024-01-24 15:09 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger.pro, qemu-devel, qemu-arm, jean-philippe,
	peter.maydell, zhenzhong.duan, peterx, yanghliu, mst, clg,
	jasowang

Hi Alex,
On 1/24/24 15:15, Alex Williamson wrote:
> On Wed, 24 Jan 2024 14:57:41 +0100
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> Hi Alex,
>>
>> On 1/24/24 14:37, Alex Williamson wrote:
>>> On Wed, 24 Jan 2024 14:14:19 +0100
>>> Eric Auger <eric.auger@redhat.com> wrote:
>>>  
>>>> Hi Alex,
>>>>
>>>> On 1/24/24 00:51, Alex Williamson wrote:  
>>>>> On Tue, 23 Jan 2024 19:15:55 +0100
>>>>> Eric Auger <eric.auger@redhat.com> wrote:
>>>>>    
>>>>>> aw-bits is a new option that allows to set the bit width of
>>>>>> the input address range. This value will be used as a default for
>>>>>> the device config input_range.end. By default it is set to 64 bits
>>>>>> which is the current value.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>> ---
>>>>>>  include/hw/virtio/virtio-iommu.h | 1 +
>>>>>>  hw/virtio/virtio-iommu.c         | 4 +++-
>>>>>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
>>>>>> index 781ebaea8f..5fbe4677c2 100644
>>>>>> --- a/include/hw/virtio/virtio-iommu.h
>>>>>> +++ b/include/hw/virtio/virtio-iommu.h
>>>>>> @@ -66,6 +66,7 @@ struct VirtIOIOMMU {
>>>>>>      bool boot_bypass;
>>>>>>      Notifier machine_done;
>>>>>>      bool granule_frozen;
>>>>>> +    uint8_t aw_bits;
>>>>>>  };
>>>>>>  
>>>>>>  #endif
>>>>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>>>>>> index ec2ba11d1d..e7f299e0c6 100644
>>>>>> --- a/hw/virtio/virtio-iommu.c
>>>>>> +++ b/hw/virtio/virtio-iommu.c
>>>>>> @@ -1314,7 +1314,8 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>>>>>>       */
>>>>>>      s->config.bypass = s->boot_bypass;
>>>>>>      s->config.page_size_mask = qemu_real_host_page_mask();
>>>>>> -    s->config.input_range.end = UINT64_MAX;
>>>>>> +    s->config.input_range.end =
>>>>>> +        s->aw_bits == 64 ? UINT64_MAX : BIT_ULL(s->aw_bits) - 1;    
>>>>> What happens when someone sets aw_bits = 1?  There are a whole bunch of
>>>>> impractical values here ripe for annoying bug reports.  vtd_realize()
>>>>> returns an Error for any values other than 39 or 48.  We might pick an
>>>>> arbitrary lower bound (39?) or some other more creative solution here
>>>>> to avoid those silly issues in our future.  Thanks,    
>>>> You're right. I can check the input value. This needs to be dependent on
>>>> the machine though but this should be feasable.
>>>> Then I would allow 39 and 48 for q35 and 64 only on ARM.  
>>> AFAIK AMD-Vi supports 64-bit address space.  Without querying the host
>>> there's no way to place an accurate limit below 64-bit.  Thanks,  
>> Hum this means I would need to look at
>> /sys/class/iommu/<iommu>/amd-iommu/ or /sys/class/iommu/dmar* to
>> discriminate between AMD IOMMU and INTEL IOMMU physical IOMMU. Would
>> that be acceptable?
> I'm not necessarily suggesting that we look at the host, I'm mostly
> just stating that enforcing 39/48 bits on non-ARM is incorrect for a
> large portion of the non-ARM world too.  There might even be some
> interesting use cases for a 32-bit IOVA space, so maybe just set
> defaults tuned for compatibility and accept anything between 32- and
> 64-bits.  Thanks,
Yup that would be even simpler. Thank you for the clarification.

I will add that

Eric
>
> Alex
>
>>>>>>      s->config.domain_range.end = UINT32_MAX;
>>>>>>      s->config.probe_size = VIOMMU_PROBE_SIZE;
>>>>>>  
>>>>>> @@ -1525,6 +1526,7 @@ static Property virtio_iommu_properties[] = {
>>>>>>      DEFINE_PROP_LINK("primary-bus", VirtIOIOMMU, primary_bus,
>>>>>>                       TYPE_PCI_BUS, PCIBus *),
>>>>>>      DEFINE_PROP_BOOL("boot-bypass", VirtIOIOMMU, boot_bypass, true),
>>>>>> +    DEFINE_PROP_UINT8("aw-bits", VirtIOIOMMU, aw_bits, 64),
>>>>>>      DEFINE_PROP_END_OF_LIST(),
>>>>>>  };
>>>>>>      



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option
  2024-01-23 18:15 [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option Eric Auger
                   ` (2 preceding siblings ...)
  2024-01-23 18:15 ` [PATCH 3/3] hw/pc: Set the default virtio-iommu aw-bits to 39 on pc_q35_9.0 onwards Eric Auger
@ 2024-01-29 12:23 ` Jean-Philippe Brucker
  2024-01-29 14:07   ` Eric Auger
  3 siblings, 1 reply; 14+ messages in thread
From: Jean-Philippe Brucker @ 2024-01-29 12:23 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
	peter.maydell, zhenzhong.duan, peterx, yanghliu, mst, clg,
	jasowang

Hi Eric,

On Tue, Jan 23, 2024 at 07:15:54PM +0100, Eric Auger wrote:
> In [1] and [2] we attempted to fix a case where a VFIO-PCI device
> protected with a virtio-iommu is assigned to an x86 guest. On x86
> the physical IOMMU may have an address width (gaw) of 39 or 48 bits
> whereas the virtio-iommu exposes a 64b input address space by default.
> Hence the guest may try to use the full 64b space and DMA MAP
> failures may be encountered. To work around this issue we endeavoured
> to pass usable host IOVA regions (excluding the out of range space) from
> VFIO to the virtio-iommu device so that the virtio-iommu driver can
> query those latter during the probe request and let the guest iommu
> kernel subsystem carve them out.
> 
> However if there are several devices in the same iommu group,
> only the reserved regions of the first one are taken into
> account by the iommu subsystem of the guest. This generally
> works on baremetal because devices are not going to
> expose different reserved regions. However in our case, this
> may prevent from taking into account the host iommu geometry.
> 
> So the simplest solution to this problem looks to introduce an
> input address width option, aw-bits, which matches what is
> done on the intel-iommu. By default, from now on it is set
> to 39 bits with pc_q35 and 64b with arm virt.

Doesn't Arm have the same problem?  The TTB0 page tables limit what can be
mapped to 48-bit, or 52-bit when SMMU_IDR5.VAX==1 and granule is 64kB.
A Linux host driver could configure smaller VA sizes:
* SMMUv2 limits the VA to SMMU_IDR2.UBS (upstream bus size) which
  can go as low as 32-bit (I'm assuming we don't care about 32-bit hosts).
* SMMUv3 currently limits the VA to CONFIG_ARM64_VA_BITS, which
  could be as low as 36 bits (but realistically 39, since 36 depends on
  16kB pages and CONFIG_EXPERT).

But 64-bit definitely can't work for VFIO, and I suppose isn't useful for
virtual devices, so maybe 39 is also a reasonable default on Arm.

Thanks,
Jean

> This replaces the
> previous default value of 64b. So we need to introduce a compat
> for pc_q35 machines older than 9.0 to behave similarly.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option
  2024-01-29 12:23 ` [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option Jean-Philippe Brucker
@ 2024-01-29 14:07   ` Eric Auger
  2024-01-29 17:42     ` Jean-Philippe Brucker
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Auger @ 2024-01-29 14:07 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
	peter.maydell, zhenzhong.duan, peterx, yanghliu, mst, clg,
	jasowang

Hi Jean-Philippe,

On 1/29/24 13:23, Jean-Philippe Brucker wrote:
> Hi Eric,
>
> On Tue, Jan 23, 2024 at 07:15:54PM +0100, Eric Auger wrote:
>> In [1] and [2] we attempted to fix a case where a VFIO-PCI device
>> protected with a virtio-iommu is assigned to an x86 guest. On x86
>> the physical IOMMU may have an address width (gaw) of 39 or 48 bits
>> whereas the virtio-iommu exposes a 64b input address space by default.
>> Hence the guest may try to use the full 64b space and DMA MAP
>> failures may be encountered. To work around this issue we endeavoured
>> to pass usable host IOVA regions (excluding the out of range space) from
>> VFIO to the virtio-iommu device so that the virtio-iommu driver can
>> query those latter during the probe request and let the guest iommu
>> kernel subsystem carve them out.
>>
>> However if there are several devices in the same iommu group,
>> only the reserved regions of the first one are taken into
>> account by the iommu subsystem of the guest. This generally
>> works on baremetal because devices are not going to
>> expose different reserved regions. However in our case, this
>> may prevent from taking into account the host iommu geometry.
>>
>> So the simplest solution to this problem looks to introduce an
>> input address width option, aw-bits, which matches what is
>> done on the intel-iommu. By default, from now on it is set
>> to 39 bits with pc_q35 and 64b with arm virt.
> Doesn't Arm have the same problem?  The TTB0 page tables limit what can be
> mapped to 48-bit, or 52-bit when SMMU_IDR5.VAX==1 and granule is 64kB.
> A Linux host driver could configure smaller VA sizes:
> * SMMUv2 limits the VA to SMMU_IDR2.UBS (upstream bus size) which
>   can go as low as 32-bit (I'm assuming we don't care about 32-bit hosts).
Yes I think we can ignore that use case.
> * SMMUv3 currently limits the VA to CONFIG_ARM64_VA_BITS, which
>   could be as low as 36 bits (but realistically 39, since 36 depends on
>   16kB pages and CONFIG_EXPERT).
Further reading "3.4.1 Input address size and Virtual Address size" ooks
indeed SMMU_IDR5.VAX gives info on the physical SMMU actual
implementation max (which matches intel iommu gaw). I missed that. Now I
am confused about should we limit VAS to 39 to accomodate of the worst
case host SW configuration or shall we use 48 instead? If we set such a
low 39b value, won't it prevent some guests from properly working?
Thanks Eric
>
> But 64-bit definitely can't work for VFIO, and I suppose isn't useful for
> virtual devices, so maybe 39 is also a reasonable default on Arm.
>
> Thanks,
> Jean
>
>> This replaces the
>> previous default value of 64b. So we need to introduce a compat
>> for pc_q35 machines older than 9.0 to behave similarly.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option
  2024-01-29 14:07   ` Eric Auger
@ 2024-01-29 17:42     ` Jean-Philippe Brucker
  2024-01-29 17:56       ` Eric Auger
  0 siblings, 1 reply; 14+ messages in thread
From: Jean-Philippe Brucker @ 2024-01-29 17:42 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
	peter.maydell, zhenzhong.duan, peterx, yanghliu, mst, clg,
	jasowang

On Mon, Jan 29, 2024 at 03:07:41PM +0100, Eric Auger wrote:
> Hi Jean-Philippe,
> 
> On 1/29/24 13:23, Jean-Philippe Brucker wrote:
> > Hi Eric,
> >
> > On Tue, Jan 23, 2024 at 07:15:54PM +0100, Eric Auger wrote:
> >> In [1] and [2] we attempted to fix a case where a VFIO-PCI device
> >> protected with a virtio-iommu is assigned to an x86 guest. On x86
> >> the physical IOMMU may have an address width (gaw) of 39 or 48 bits
> >> whereas the virtio-iommu exposes a 64b input address space by default.
> >> Hence the guest may try to use the full 64b space and DMA MAP
> >> failures may be encountered. To work around this issue we endeavoured
> >> to pass usable host IOVA regions (excluding the out of range space) from
> >> VFIO to the virtio-iommu device so that the virtio-iommu driver can
> >> query those latter during the probe request and let the guest iommu
> >> kernel subsystem carve them out.
> >>
> >> However if there are several devices in the same iommu group,
> >> only the reserved regions of the first one are taken into
> >> account by the iommu subsystem of the guest. This generally
> >> works on baremetal because devices are not going to
> >> expose different reserved regions. However in our case, this
> >> may prevent from taking into account the host iommu geometry.
> >>
> >> So the simplest solution to this problem looks to introduce an
> >> input address width option, aw-bits, which matches what is
> >> done on the intel-iommu. By default, from now on it is set
> >> to 39 bits with pc_q35 and 64b with arm virt.
> > Doesn't Arm have the same problem?  The TTB0 page tables limit what can be
> > mapped to 48-bit, or 52-bit when SMMU_IDR5.VAX==1 and granule is 64kB.
> > A Linux host driver could configure smaller VA sizes:
> > * SMMUv2 limits the VA to SMMU_IDR2.UBS (upstream bus size) which
> >   can go as low as 32-bit (I'm assuming we don't care about 32-bit hosts).
> Yes I think we can ignore that use case.
> > * SMMUv3 currently limits the VA to CONFIG_ARM64_VA_BITS, which
> >   could be as low as 36 bits (but realistically 39, since 36 depends on
> >   16kB pages and CONFIG_EXPERT).
> Further reading "3.4.1 Input address size and Virtual Address size" ooks
> indeed SMMU_IDR5.VAX gives info on the physical SMMU actual
> implementation max (which matches intel iommu gaw). I missed that. Now I
> am confused about should we limit VAS to 39 to accomodate of the worst
> case host SW configuration or shall we use 48 instead?

I don't know what's best either. 48 should be fine if hosts normally
enable VA_BITS_48 (I see debian has it [1], not sure how to find the
others).

[1] https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/config/arm64/config?ref_type=heads#L18

> If we set such a low 39b value, won't it prevent some guests from
> properly working?

It's not that low, since it gives each endpoint a private 512GB address
space, but yes there might be special cases that reach the limit. Maybe
assign a multi-queue NIC to a 256-vCPU guest, and if you want per-vCPU DMA
pools, then with a 39-bit address space you only get 2GB per vCPU. With
48-bit you get 1TB which should be plenty.

52-bit private IOVA space doesn't seem useful, I doubt we'll ever need to
support that on the MAP/UNMAP interface.

So I guess 48-bit can be the default, and users with special setups can
override aw-bits.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option
  2024-01-29 17:42     ` Jean-Philippe Brucker
@ 2024-01-29 17:56       ` Eric Auger
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Auger @ 2024-01-29 17:56 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
	peter.maydell, zhenzhong.duan, peterx, yanghliu, mst, clg,
	jasowang

Hi Jean,

On 1/29/24 18:42, Jean-Philippe Brucker wrote:
> On Mon, Jan 29, 2024 at 03:07:41PM +0100, Eric Auger wrote:
>> Hi Jean-Philippe,
>>
>> On 1/29/24 13:23, Jean-Philippe Brucker wrote:
>>> Hi Eric,
>>>
>>> On Tue, Jan 23, 2024 at 07:15:54PM +0100, Eric Auger wrote:
>>>> In [1] and [2] we attempted to fix a case where a VFIO-PCI device
>>>> protected with a virtio-iommu is assigned to an x86 guest. On x86
>>>> the physical IOMMU may have an address width (gaw) of 39 or 48 bits
>>>> whereas the virtio-iommu exposes a 64b input address space by default.
>>>> Hence the guest may try to use the full 64b space and DMA MAP
>>>> failures may be encountered. To work around this issue we endeavoured
>>>> to pass usable host IOVA regions (excluding the out of range space) from
>>>> VFIO to the virtio-iommu device so that the virtio-iommu driver can
>>>> query those latter during the probe request and let the guest iommu
>>>> kernel subsystem carve them out.
>>>>
>>>> However if there are several devices in the same iommu group,
>>>> only the reserved regions of the first one are taken into
>>>> account by the iommu subsystem of the guest. This generally
>>>> works on baremetal because devices are not going to
>>>> expose different reserved regions. However in our case, this
>>>> may prevent from taking into account the host iommu geometry.
>>>>
>>>> So the simplest solution to this problem looks to introduce an
>>>> input address width option, aw-bits, which matches what is
>>>> done on the intel-iommu. By default, from now on it is set
>>>> to 39 bits with pc_q35 and 64b with arm virt.
>>> Doesn't Arm have the same problem?  The TTB0 page tables limit what can be
>>> mapped to 48-bit, or 52-bit when SMMU_IDR5.VAX==1 and granule is 64kB.
>>> A Linux host driver could configure smaller VA sizes:
>>> * SMMUv2 limits the VA to SMMU_IDR2.UBS (upstream bus size) which
>>>   can go as low as 32-bit (I'm assuming we don't care about 32-bit hosts).
>> Yes I think we can ignore that use case.
>>> * SMMUv3 currently limits the VA to CONFIG_ARM64_VA_BITS, which
>>>   could be as low as 36 bits (but realistically 39, since 36 depends on
>>>   16kB pages and CONFIG_EXPERT).
>> Further reading "3.4.1 Input address size and Virtual Address size" ooks
>> indeed SMMU_IDR5.VAX gives info on the physical SMMU actual
>> implementation max (which matches intel iommu gaw). I missed that. Now I
>> am confused about should we limit VAS to 39 to accomodate of the worst
>> case host SW configuration or shall we use 48 instead?
> I don't know what's best either. 48 should be fine if hosts normally
> enable VA_BITS_48 (I see debian has it [1], not sure how to find the
> others).
>
> [1] https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/config/arm64/config?ref_type=heads#L18
>
>> If we set such a low 39b value, won't it prevent some guests from
>> properly working?
> It's not that low, since it gives each endpoint a private 512GB address
> space, but yes there might be special cases that reach the limit. Maybe
> assign a multi-queue NIC to a 256-vCPU guest, and if you want per-vCPU DMA
> pools, then with a 39-bit address space you only get 2GB per vCPU. With
> 48-bit you get 1TB which should be plenty.
>
> 52-bit private IOVA space doesn't seem useful, I doubt we'll ever need to
> support that on the MAP/UNMAP interface.
>
> So I guess 48-bit can be the default, and users with special setups can
> override aw-bits.

Yes it looks safe also on my side. I will respin with 48b default then.

Thank you!

Eric
>
> Thanks,
> Jean
>



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-01-29 17:57 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-23 18:15 [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option Eric Auger
2024-01-23 18:15 ` [PATCH 1/3] virtio-iommu: Add an option to define the input range width Eric Auger
2024-01-23 23:51   ` Alex Williamson
2024-01-24 13:14     ` Eric Auger
2024-01-24 13:37       ` Alex Williamson
2024-01-24 13:57         ` Eric Auger
2024-01-24 14:15           ` Alex Williamson
2024-01-24 15:09             ` Eric Auger
2024-01-23 18:15 ` [PATCH 2/3] virtio-iommu: Trace domain range limits as unsigned int Eric Auger
2024-01-23 18:15 ` [PATCH 3/3] hw/pc: Set the default virtio-iommu aw-bits to 39 on pc_q35_9.0 onwards Eric Auger
2024-01-29 12:23 ` [PATCH 0/3] VIRTIO-IOMMU: Introduce an aw-bits option Jean-Philippe Brucker
2024-01-29 14:07   ` Eric Auger
2024-01-29 17:42     ` Jean-Philippe Brucker
2024-01-29 17:56       ` Eric Auger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.