* Re: [PATCH] iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE
2020-03-18 16:14 ` Auger Eric
@ 2020-03-18 17:06 ` Jean-Philippe Brucker
2020-03-18 17:13 ` Robin Murphy
2020-03-19 18:54 ` Alex Williamson
2 siblings, 0 replies; 8+ messages in thread
From: Jean-Philippe Brucker @ 2020-03-18 17:06 UTC (permalink / raw)
To: Auger Eric
Cc: iommu, virtualization, Alex Williamson, Bharat Bhushan, Robin Murphy
On Wed, Mar 18, 2020 at 05:14:05PM +0100, Auger Eric wrote:
> Hi,
>
> On 3/18/20 1:00 PM, Robin Murphy wrote:
> > On 2020-03-18 11:40 am, Jean-Philippe Brucker wrote:
> >> We don't currently support IOMMUs with a page granule larger than the
> >> system page size. The IOVA allocator has a BUG_ON() in this case, and
> >> VFIO has a WARN_ON().
>
> Adding Alex in CC in case he has time to jump in. At the moment I don't
> get why this WARN_ON() is here.
>
> This was introduced in
> c8dbca165bb090f926996a572ea2b5b577b34b70 vfio/iommu_type1: Avoid overflow
>
> >>
> >> It might be possible to remove these obstacles if necessary. If the host
> >> uses 64kB pages and the guest uses 4kB, then a device driver calling
> >> alloc_page() followed by dma_map_page() will create a 64kB mapping for a
> >> 4kB physical page, allowing the endpoint to access the neighbouring 60kB
> >> of memory. This problem could be worked around with bounce buffers.
> >
> > FWIW the fundamental issue is that callers of iommu_map() may expect to
> > be able to map two or more page-aligned regions directly adjacent to
> > each other for scatter-gather purposes (or ring buffer tricks), and
> > that's just not possible if the IOMMU granule is too big. Bounce
> > buffering would be a viable workaround for the streaming DMA API and
> > certain similar use-cases, but not in general (e.g. coherent DMA, VFIO,
> > GPUs, etc.)
> >
> > Robin.
> >
> >> For the moment, rather than triggering the IOVA BUG_ON() on mismatched
> >> page sizes, abort the virtio-iommu probe with an error message.
>
> I understand this is introduced as a temporary solution, but it sounds
> like an important limitation to me. For instance it will prevent
> running a Fedora guest with a virtio-iommu on a RHEL host.
Looks like you have another argument for nested translation :) We could
probably enable 64k-4k for VFIO, but I don't think we can check and fix
all uses of iommu_map() across the kernel, even if we disallow
IOMMU_DOMAIN_DMA for this case.
Thanks,
Jean
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE
2020-03-18 16:14 ` Auger Eric
2020-03-18 17:06 ` Jean-Philippe Brucker
@ 2020-03-18 17:13 ` Robin Murphy
2020-03-18 17:47 ` Jean-Philippe Brucker
2020-03-19 18:54 ` Alex Williamson
2 siblings, 1 reply; 8+ messages in thread
From: Robin Murphy @ 2020-03-18 17:13 UTC (permalink / raw)
To: Auger Eric, Jean-Philippe Brucker, iommu, virtualization
Cc: Alex Williamson, Bharat Bhushan
On 2020-03-18 4:14 pm, Auger Eric wrote:
> Hi,
>
> On 3/18/20 1:00 PM, Robin Murphy wrote:
>> On 2020-03-18 11:40 am, Jean-Philippe Brucker wrote:
>>> We don't currently support IOMMUs with a page granule larger than the
>>> system page size. The IOVA allocator has a BUG_ON() in this case, and
>>> VFIO has a WARN_ON().
>
> Adding Alex in CC in case he has time to jump in. At the moment I don't
> get why this WARN_ON() is here.
>
> This was introduced in
> c8dbca165bb090f926996a572ea2b5b577b34b70 vfio/iommu_type1: Avoid overflow
>
>>>
>>> It might be possible to remove these obstacles if necessary. If the host
>>> uses 64kB pages and the guest uses 4kB, then a device driver calling
>>> alloc_page() followed by dma_map_page() will create a 64kB mapping for a
>>> 4kB physical page, allowing the endpoint to access the neighbouring 60kB
>>> of memory. This problem could be worked around with bounce buffers.
>>
>> FWIW the fundamental issue is that callers of iommu_map() may expect to
>> be able to map two or more page-aligned regions directly adjacent to
>> each other for scatter-gather purposes (or ring buffer tricks), and
>> that's just not possible if the IOMMU granule is too big. Bounce
>> buffering would be a viable workaround for the streaming DMA API and
>> certain similar use-cases, but not in general (e.g. coherent DMA, VFIO,
>> GPUs, etc.)
>>
>> Robin.
>>
>>> For the moment, rather than triggering the IOVA BUG_ON() on mismatched
>>> page sizes, abort the virtio-iommu probe with an error message.
>
> I understand this is introduced as a temporary solution, but it sounds
> like an important limitation to me. For instance it will prevent
> running a Fedora guest with a virtio-iommu on a RHEL host.
As above, even if you bypassed all the warnings it wouldn't really work
properly anyway. In all cases that wouldn't be considered broken, the
underlying hardware IOMMUs should support the same set of granules as
the CPUs (or at least the smallest one), so is it actually appropriate
for RHEL to (presumably) expose a 64K granule in the first place, rather
than "works with anything" 4K? And/or more generally is there perhaps a
hole in the virtio-iommu spec WRT being able to negotiate page_size_mask
for a particular granule if multiple options are available?
Robin.
>
> Thanks
>
> Eric
>>>
>>> Reported-by: Bharat Bhushan <bbhushan2@marvell.com>
>>> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
>>> ---
>>> drivers/iommu/virtio-iommu.c | 9 +++++++++
>>> 1 file changed, 9 insertions(+)
>>>
>>> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
>>> index 6d4e3c2a2ddb..80d5d8f621ab 100644
>>> --- a/drivers/iommu/virtio-iommu.c
>>> +++ b/drivers/iommu/virtio-iommu.c
>>> @@ -998,6 +998,7 @@ static int viommu_probe(struct virtio_device *vdev)
>>>  	struct device *parent_dev = vdev->dev.parent;
>>>  	struct viommu_dev *viommu = NULL;
>>>  	struct device *dev = &vdev->dev;
>>> +	unsigned long viommu_page_size;
>>>  	u64 input_start = 0;
>>>  	u64 input_end = -1UL;
>>>  	int ret;
>>> @@ -1028,6 +1029,14 @@ static int viommu_probe(struct virtio_device *vdev)
>>>  		goto err_free_vqs;
>>>  	}
>>>  
>>> +	viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
>>> +	if (viommu_page_size > PAGE_SIZE) {
>>> +		dev_err(dev, "granule 0x%lx larger than system page size 0x%lx\n",
>>> +			viommu_page_size, PAGE_SIZE);
>>> +		ret = -EINVAL;
>>> +		goto err_free_vqs;
>>> +	}
>>> +
>>>  	viommu->map_flags = VIRTIO_IOMMU_MAP_F_READ |
>>>  			    VIRTIO_IOMMU_MAP_F_WRITE;
>>>  	viommu->last_domain = ~0U;
>>>
>>
>
* Re: [PATCH] iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE
2020-03-18 17:13 ` Robin Murphy
@ 2020-03-18 17:47 ` Jean-Philippe Brucker
0 siblings, 0 replies; 8+ messages in thread
From: Jean-Philippe Brucker @ 2020-03-18 17:47 UTC (permalink / raw)
To: Robin Murphy; +Cc: iommu, Bharat Bhushan, Alex Williamson, virtualization
On Wed, Mar 18, 2020 at 05:13:59PM +0000, Robin Murphy wrote:
> On 2020-03-18 4:14 pm, Auger Eric wrote:
> > Hi,
> >
> > On 3/18/20 1:00 PM, Robin Murphy wrote:
> > > On 2020-03-18 11:40 am, Jean-Philippe Brucker wrote:
> > > > We don't currently support IOMMUs with a page granule larger than the
> > > > system page size. The IOVA allocator has a BUG_ON() in this case, and
> > > > VFIO has a WARN_ON().
> >
> > Adding Alex in CC in case he has time to jump in. At the moment I don't
> > get why this WARN_ON() is here.
> >
> > This was introduced in
> > c8dbca165bb090f926996a572ea2b5b577b34b70 vfio/iommu_type1: Avoid overflow
> >
> > > >
> > > > It might be possible to remove these obstacles if necessary. If the host
> > > > uses 64kB pages and the guest uses 4kB, then a device driver calling
> > > > alloc_page() followed by dma_map_page() will create a 64kB mapping for a
> > > > 4kB physical page, allowing the endpoint to access the neighbouring 60kB
> > > > of memory. This problem could be worked around with bounce buffers.
> > >
> > > FWIW the fundamental issue is that callers of iommu_map() may expect to
> > > be able to map two or more page-aligned regions directly adjacent to
> > > each other for scatter-gather purposes (or ring buffer tricks), and
> > > that's just not possible if the IOMMU granule is too big. Bounce
> > > buffering would be a viable workaround for the streaming DMA API and
> > > certain similar use-cases, but not in general (e.g. coherent DMA, VFIO,
> > > GPUs, etc.)
> > >
> > > Robin.
> > >
> > > > For the moment, rather than triggering the IOVA BUG_ON() on mismatched
> > > > page sizes, abort the virtio-iommu probe with an error message.
> >
> > I understand this is introduced as a temporary solution, but it sounds
> > like an important limitation to me. For instance it will prevent
> > running a Fedora guest with a virtio-iommu on a RHEL host.
>
> As above, even if you bypassed all the warnings it wouldn't really work
> properly anyway. In all cases that wouldn't be considered broken, the
> underlying hardware IOMMUs should support the same set of granules as the
> CPUs (or at least the smallest one), so is it actually appropriate for RHEL
> to (presumably) expose a 64K granule in the first place, rather than "works
> with anything" 4K? And/or more generally is there perhaps a hole in the
> virtio-iommu spec WRT being able to negotiate page_size_mask for a
> particular granule if multiple options are available?
That could be added (by allowing config write). The larger problems are:
1) Supporting granularity smaller than the host's PAGE_SIZE in host VFIO.
At the moment it restricts the exposed page mask and rejects DMA_MAP
requests not aligned on that granularity.
2) Propagating this negotiation all the way to io-pgtable and the SMMU
driver in the host.
Thanks,
Jean
* Re: [PATCH] iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE
2020-03-18 16:14 ` Auger Eric
2020-03-18 17:06 ` Jean-Philippe Brucker
2020-03-18 17:13 ` Robin Murphy
@ 2020-03-19 18:54 ` Alex Williamson
2 siblings, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2020-03-19 18:54 UTC (permalink / raw)
To: Auger Eric
Cc: Jean-Philippe Brucker, virtualization, iommu, Bharat Bhushan,
Robin Murphy
On Wed, 18 Mar 2020 17:14:05 +0100
Auger Eric <eric.auger@redhat.com> wrote:
> Hi,
>
> On 3/18/20 1:00 PM, Robin Murphy wrote:
> > On 2020-03-18 11:40 am, Jean-Philippe Brucker wrote:
> >> We don't currently support IOMMUs with a page granule larger than the
> >> system page size. The IOVA allocator has a BUG_ON() in this case, and
> >> VFIO has a WARN_ON().
>
> Adding Alex in CC in case he has time to jump in. At the moment I don't
> get why this WARN_ON() is here.
>
> This was introduced in
> c8dbca165bb090f926996a572ea2b5b577b34b70 vfio/iommu_type1: Avoid overflow
I don't recall if I had something specific in mind when adding this
warning, but if PAGE_SIZE is smaller than the minimum IOMMU page size,
then we need multiple PAGE_SIZE pages per IOMMU TLB entry. Therefore
those pages must be contiguous. Aside from requiring only hugepage
backing, how could a user make sure that their virtual address buffer
is physically contiguous? I don't think we have any sanity checking
code that does this, thus the warning. Thanks,
Alex
> >> It might be possible to remove these obstacles if necessary. If the host
> >> uses 64kB pages and the guest uses 4kB, then a device driver calling
> >> alloc_page() followed by dma_map_page() will create a 64kB mapping for a
> >> 4kB physical page, allowing the endpoint to access the neighbouring 60kB
> >> of memory. This problem could be worked around with bounce buffers.
> >
> > FWIW the fundamental issue is that callers of iommu_map() may expect to
> > be able to map two or more page-aligned regions directly adjacent to
> > each other for scatter-gather purposes (or ring buffer tricks), and
> > that's just not possible if the IOMMU granule is too big. Bounce
> > buffering would be a viable workaround for the streaming DMA API and
> > certain similar use-cases, but not in general (e.g. coherent DMA, VFIO,
> > GPUs, etc.)
> >
> > Robin.
> >
> >> For the moment, rather than triggering the IOVA BUG_ON() on mismatched
> >> page sizes, abort the virtio-iommu probe with an error message.
>
> I understand this is introduced as a temporary solution, but it sounds
> like an important limitation to me. For instance it will prevent
> running a Fedora guest with a virtio-iommu on a RHEL host.
>
> Thanks
>
> Eric
> >>
> >> Reported-by: Bharat Bhushan <bbhushan2@marvell.com>
> >> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> >> ---
> >> drivers/iommu/virtio-iommu.c | 9 +++++++++
> >> 1 file changed, 9 insertions(+)
> >>
> >> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> >> index 6d4e3c2a2ddb..80d5d8f621ab 100644
> >> --- a/drivers/iommu/virtio-iommu.c
> >> +++ b/drivers/iommu/virtio-iommu.c
> >> @@ -998,6 +998,7 @@ static int viommu_probe(struct virtio_device *vdev)
> >>  	struct device *parent_dev = vdev->dev.parent;
> >>  	struct viommu_dev *viommu = NULL;
> >>  	struct device *dev = &vdev->dev;
> >> +	unsigned long viommu_page_size;
> >>  	u64 input_start = 0;
> >>  	u64 input_end = -1UL;
> >>  	int ret;
> >> @@ -1028,6 +1029,14 @@ static int viommu_probe(struct virtio_device *vdev)
> >>  		goto err_free_vqs;
> >>  	}
> >>  
> >> +	viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
> >> +	if (viommu_page_size > PAGE_SIZE) {
> >> +		dev_err(dev, "granule 0x%lx larger than system page size 0x%lx\n",
> >> +			viommu_page_size, PAGE_SIZE);
> >> +		ret = -EINVAL;
> >> +		goto err_free_vqs;
> >> +	}
> >> +
> >>  	viommu->map_flags = VIRTIO_IOMMU_MAP_F_READ |
> >>  			    VIRTIO_IOMMU_MAP_F_WRITE;
> >>  	viommu->last_domain = ~0U;
> >>
> >