* [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
@ 2015-11-17  7:46 Pavel Fedin
  2015-11-18 22:04 ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Fedin @ 2015-11-17  7:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: 'Alex Williamson'

On some architectures TARGET_PAGE_ALIGN() is not enough to get the right
alignment. For example on ARM TARGET_PAGE_BITS is 10 because some old CPUs
support 1K page size, while minimum SMMU page size is 4K.

This fixes problems like:

2015-11-17T07:37:42.892265Z qemu-system-aarch64: VFIO_MAP_DMA: -22
2015-11-17T07:37:42.892309Z qemu-system-aarch64: vfio_dma_map(0x223da230, 0x80002f0400, 0x10fc00, 0x7f89b40400) = -22 (Invalid
argument)
qemu: hardware error: vfio: DMA mapping failed, unable to continue

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
---
 hw/vfio/common.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ff5a89a..328140c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -326,7 +326,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
-    hwaddr iova, end;
+    hwaddr iova, end, iommu_page_size;
     Int128 llend;
     void *vaddr;
     int ret;
@@ -346,6 +346,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
     }
 
     iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    iommu_page_size = vfio_container_granularity(container);
+    iova = (iova + iommu_page_size - 1) & ~(iommu_page_size - 1);
     llend = int128_make64(section->offset_within_address_space);
     llend = int128_add(llend, section->size);
     llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
@@ -390,8 +392,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
         QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 
         memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
-        memory_region_iommu_replay(giommu->iommu, &giommu->n,
-                                   vfio_container_granularity(container),
+        memory_region_iommu_replay(giommu->iommu, &giommu->n, iommu_page_size,
                                    false);
 
         return;
-- 
1.9.5.msysgit.0
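As context for the added align-up line: it rounds the iova up with the usual power-of-two trick. A small sketch (the 4K SMMU granularity is an assumption matching the minimum page size named in the commit message) shows why the address from the error log passes TARGET_PAGE_ALIGN() on ARM yet is rejected by the kernel:

```python
# Sketch of the align-up arithmetic added by the patch. 1K is ARM's
# TARGET_PAGE_SIZE (TARGET_PAGE_BITS == 10); 4K is the assumed minimum
# SMMU page size.

def align_up(addr, size):
    """Round addr up to the next multiple of size (a power of two)."""
    return (addr + size - 1) & ~(size - 1)

iova = 0x80002F0400              # address from the VFIO_MAP_DMA failure above

# TARGET_PAGE_ALIGN() leaves the address 1K-aligned but not 4K-aligned,
# which is what the host kernel rejects with -22 (EINVAL):
assert align_up(iova, 1 << 10) == 0x80002F0400   # already 1K-aligned
assert align_up(iova, 1 << 12) == 0x80002F1000   # what the SMMU needs
```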

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-11-17  7:46 [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size Pavel Fedin
@ 2015-11-18 22:04 ` Alex Williamson
  2015-11-19 10:29   ` Pavel Fedin
  2015-11-24 15:34   ` Peter Maydell
  0 siblings, 2 replies; 20+ messages in thread
From: Alex Williamson @ 2015-11-18 22:04 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: qemu-devel

On Tue, 2015-11-17 at 10:46 +0300, Pavel Fedin wrote:
> On some architectures TARGET_PAGE_ALIGN() is not enough to get the right
> alignment. For example on ARM TARGET_PAGE_BITS is 10 because some old CPUs
> support 1K page size, while minimum SMMU page size is 4K.
> 
> This fixes problems like:
> 
> 2015-11-17T07:37:42.892265Z qemu-system-aarch64: VFIO_MAP_DMA: -22
> 2015-11-17T07:37:42.892309Z qemu-system-aarch64: vfio_dma_map(0x223da230, 0x80002f0400, 0x10fc00, 0x7f89b40400) = -22 (Invalid
> argument)
> qemu: hardware error: vfio: DMA mapping failed, unable to continue
> 
> Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
> ---
>  hw/vfio/common.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ff5a89a..328140c 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -326,7 +326,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>                                       MemoryRegionSection *section)
>  {
>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> -    hwaddr iova, end;
> +    hwaddr iova, end, iommu_page_size;
>      Int128 llend;
>      void *vaddr;
>      int ret;
> @@ -346,6 +346,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
>      }
>  
>      iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> +    iommu_page_size = vfio_container_granularity(container);
> +    iova = (iova + iommu_page_size - 1) & ~(iommu_page_size - 1);
>      llend = int128_make64(section->offset_within_address_space);
>      llend = int128_add(llend, section->size);
>      llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
> @@ -390,8 +392,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>  
>          memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> -        memory_region_iommu_replay(giommu->iommu, &giommu->n,
> -                                   vfio_container_granularity(container),
> +        memory_region_iommu_replay(giommu->iommu, &giommu->n, iommu_page_size,
>                                     false);
>  
>          return;

I don't understand how this is supposed to work, if we align to a larger
size than the processor's, then there are processor-size pages of RAM that
could be handed out as DMA targets for devices, but we can't map them
through the IOMMU.  Thus if the guest tries to use them, we get IOMMU
faults in the host and likely memory corruption in the guest because the
device can't read or write to the page it's supposed to.  This doesn't
seem like the right solution.  Thanks,

Alex


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-11-18 22:04 ` Alex Williamson
@ 2015-11-19 10:29   ` Pavel Fedin
  2015-11-19 23:33     ` Alex Williamson
  2015-12-09 10:09     ` Alex Bennée
  2015-11-24 15:34   ` Peter Maydell
  1 sibling, 2 replies; 20+ messages in thread
From: Pavel Fedin @ 2015-11-19 10:29 UTC (permalink / raw)
  To: 'Alex Williamson'; +Cc: 'Peter Maydell', qemu-devel

 Hello!

> > On some architectures TARGET_PAGE_ALIGN() is not enough to get the right
> > alignment. For example on ARM TARGET_PAGE_BITS is 10 because some old CPUs
> > support 1K page size, while minimum SMMU page size is 4K.
> >
> > This fixes problems like:
> >
> > 2015-11-17T07:37:42.892265Z qemu-system-aarch64: VFIO_MAP_DMA: -22
> > 2015-11-17T07:37:42.892309Z qemu-system-aarch64: vfio_dma_map(0x223da230, 0x80002f0400,
> 0x10fc00, 0x7f89b40400) = -22 (Invalid
> > argument)
> > qemu: hardware error: vfio: DMA mapping failed, unable to continue

[skip]

> I don't understand how this is supposed to work, if we align to a larger
> size than the processor's, then there are processor-size pages of RAM that
> could be handed out as DMA targets for devices, but we can't map them
> through the IOMMU.  Thus if the guest tries to use them, we get IOMMU
> faults in the host and likely memory corruption in the guest because the
> device can't read or write to the page it's supposed to.  This doesn't
> seem like the right solution.

 Well, this was my first try at the problem. I get your point, but I guess we should discuss the proper solution then.
 So, I've hit this problem on ARM64. On ARM64 we can actually never have 1K pages. This page size was supported only by old 32-bit ARM CPUs, up to ARMv5 IIRC, then it was dropped. Linux never even used it.
 But, since QEMU can emulate those ancient CPUs, TARGET_PAGE_BITS is defined to 10 for ARM. And ARM64 and ARM32 are actually the same target for QEMU, which is why we still get it.
 Perhaps TARGET_PAGE_BITS should be a variable for ARM, set according to the CPU actually in use. Then this IOMMU alignment problem would disappear automatically. What do you think?
 Cc'ed Peter since he is the main ARM guy here.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-11-19 10:29   ` Pavel Fedin
@ 2015-11-19 23:33     ` Alex Williamson
  2015-11-24 15:24       ` Pavel Fedin
  2015-12-09 10:09     ` Alex Bennée
  1 sibling, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2015-11-19 23:33 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: 'Peter Maydell', qemu-devel

On Thu, 2015-11-19 at 13:29 +0300, Pavel Fedin wrote:
>  Hello!
> 
> > > On some architectures TARGET_PAGE_ALIGN() is not enough to get the right
> > > alignment. For example on ARM TARGET_PAGE_BITS is 10 because some old CPUs
> > > support 1K page size, while minimum SMMU page size is 4K.
> > >
> > > This fixes problems like:
> > >
> > > 2015-11-17T07:37:42.892265Z qemu-system-aarch64: VFIO_MAP_DMA: -22
> > > 2015-11-17T07:37:42.892309Z qemu-system-aarch64: vfio_dma_map(0x223da230, 0x80002f0400,
> > 0x10fc00, 0x7f89b40400) = -22 (Invalid
> > > argument)
> > > qemu: hardware error: vfio: DMA mapping failed, unable to continue
> 
> [skip]
> 
> > I don't understand how this is supposed to work, if we align to a larger
> > size than the processor's, then there are processor-size pages of RAM that
> > could be handed out as DMA targets for devices, but we can't map them
> > through the IOMMU.  Thus if the guest tries to use them, we get IOMMU
> > faults in the host and likely memory corruption in the guest because the
> > device can't read or write to the page it's supposed to.  This doesn't
> > seem like the right solution.
> 
>  Well, this was my first try at the problem. I get your point, but I guess we should discuss the proper solution then.
>  So, I've hit this problem on ARM64. On ARM64 we can actually never have 1K pages. This page size was supported only by old 32-bit ARM CPUs, up to ARMv5 IIRC, then it was dropped. Linux never even used it.
>  But, since QEMU can emulate those ancient CPUs, TARGET_PAGE_BITS is defined to 10 for ARM. And ARM64 and ARM32 are actually the same target for QEMU, which is why we still get it.
>  Perhaps TARGET_PAGE_BITS should be a variable for ARM, set according to the CPU actually in use. Then this IOMMU alignment problem would disappear automatically. What do you think?
>  Cc'ed Peter since he is the main ARM guy here.

Do we only see these alignments when we're emulating those old 1k page
processors?  If not, should we really be telling a 4k page VM about 1k
aligned memory?  If that's all legit, maybe we should be aligning down
rather than up, we know the host memory is at least 4k pages, so
anything in the gap between those alignments should be backed by memory,
right?  The device could theoretically access up to 3k of memory on the
edges of each mapping, but it shouldn't touch it; the memory should be
allocated and part of the VM, so could there be anything bad there?
Thanks,

Alex


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-11-19 23:33     ` Alex Williamson
@ 2015-11-24 15:24       ` Pavel Fedin
  2015-12-02 19:40         ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Fedin @ 2015-11-24 15:24 UTC (permalink / raw)
  To: 'Alex Williamson'; +Cc: 'Peter Maydell', qemu-devel

 Hello!

> >  So, I've hit this problem on ARM64. On ARM64 we can actually never have 1K pages. This page size was supported only by old 32-bit ARM CPUs, up to ARMv5 IIRC, then it was dropped. Linux never even used it.
> >  But, since QEMU can emulate those ancient CPUs, TARGET_PAGE_BITS is defined to 10 for ARM. And ARM64 and ARM32 are actually the same target for QEMU, which is why we still get it.
> >  Perhaps TARGET_PAGE_BITS should be a variable for ARM, set according to the CPU actually in use. Then this IOMMU alignment problem would disappear automatically. What do you think?
> >  Cc'ed Peter since he is the main ARM guy here.
> 
> Do we only see these alignments when we're emulating those old 1k page
> processors?

 No. We see them when we are emulating brand new ARM64. And we see them only because TARGET_PAGE_BITS is still hardcoded to 10.

> If not, should we really be telling a 4k page VM about 1k
> aligned memory?

 Well, first of all, we are not telling it about aligned memory at all. I've researched the origin of this address; it appears when we are mapping the MSI-X BAR of the VFIO device.
 My device defines this BAR to be 2M in size. In this case QEMU splits it up into three regions:
 1) Region below the MSI-X table (it's called "mmap"; for me it's empty because the table offset is 0)
 2) The MSI-X table itself (20 vectors = 0x00000140 bytes for me).
 3) Region above the MSI-X table (it's called "mmap msix-hi"). Size in my case is 2M - REAL_HOST_PAGE_ALIGN(0x140) = 0x1FF000.
Regions (1) and (3) are passed through and directly mapped by VFIO; region (2) is emulated.
 Now, we have the PBA. The device says that the PBA is placed in memory specified by the same BAR as the table, at offset 0x000f0000. The PBA is also emulated by QEMU, and as a result it overlaps the "mmap msix-hi" region, which consequently breaks up into two fragments:
 3a) offset 0x00000 size 0x0EF000
 3b) offset 0xEF008 size 0x10FFF8
 The PBA sits between these fragments, being only 8 bytes in size.
 Attempting to map (3b) causes the problem. vfio_mmap_region() is expected to align this up, and it does, but only to the minimum possible granularity for the ARM platform, which, as QEMU thinks, is always 1K.
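The fragment sizes above can be reproduced with a short sketch (the 4K value behind REAL_HOST_PAGE_ALIGN is an assumption based on the host page size discussed in this thread; the variable names are illustrative, not QEMU's):

```python
# Reconstruct the "mmap msix-hi" fragment sizes described above.
# Assumes REAL_HOST_PAGE_ALIGN rounds to a 4K host page; the BAR layout
# (2M BAR, 20-vector table at offset 0, 8-byte PBA at 0xF0000) is taken
# from the message.

def align_up(addr, size):
    return (addr + size - 1) & ~(size - 1)

bar_size    = 2 * 1024 * 1024        # 2M BAR
table_bytes = 20 * 16                # 20 MSI-X vectors, 16 bytes each = 0x140
host_page   = 1 << 12

# Region (3), above the table:
msix_hi_base = align_up(table_bytes, host_page)    # 0x1000 into the BAR
msix_hi_size = bar_size - msix_hi_base
assert msix_hi_size == 0x1FF000

# The emulated 8-byte PBA at BAR offset 0xF0000 splits region (3):
pba_in_region  = 0xF0000 - msix_hi_base            # 0xEF000
frag_3a_size   = pba_in_region
frag_3b_offset = pba_in_region + 8                 # 0xEF008
frag_3b_size   = msix_hi_size - frag_3b_offset
assert frag_3a_size == 0x0EF000
assert frag_3b_size == 0x10FFF8
```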

>  If that's all legit, maybe we should be aligning down
> rather than up, we know the host memory is at least 4k pages, so
> anything in the gap between those alignments should be backed by memory,
> right?

 You know, I also have this question. Why are we aligning up? What if the device tries to access the parts we skipped over by aligning?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-11-18 22:04 ` Alex Williamson
  2015-11-19 10:29   ` Pavel Fedin
@ 2015-11-24 15:34   ` Peter Maydell
  2015-11-25  7:00     ` Pavel Fedin
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2015-11-24 15:34 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Pavel Fedin, QEMU Developers

On 18 November 2015 at 22:04, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Tue, 2015-11-17 at 10:46 +0300, Pavel Fedin wrote:
>> On some architectures TARGET_PAGE_ALIGN() is not enough to get the right
>> alignment. For example on ARM TARGET_PAGE_BITS is 10 because some old CPUs
>> support 1K page size, while minimum SMMU page size is 4K.

> I don't understand how this is supposed to work, if we align to a larger
> size than the processor's, then there are processor-size pages of RAM that
> could be handed out as DMA targets for devices, but we can't map them
> through the IOMMU.  Thus if the guest tries to use them, we get IOMMU
> faults in the host and likely memory corruption in the guest because the
> device can't read or write to the page it's supposed to.  This doesn't
> seem like the right solution.  Thanks,

There are a number of different interesting page sizes here:
 * the host kernel page size
 * the target CPU architecture's worst-case smallest page size
 * the page size the guest kernel is actually using at the moment
   (consider a 4K-page guest kernel on a 64K-page host kernel)

These don't necessarily have to all be the same. I would
expect VFIO to be interested in the host kernel page size,
not TARGET_PAGE_ALIGN. It might also be interested in the
in-practice guest kernel page settings, but you can't actually
determine those from outside. (In general non-TCG code should
probably not try to use the TARGET_PAGE_* constants.)
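To make the distinction concrete, a small sketch (the specific 64K host / 4K guest values are the hypothetical combination mentioned above; only ARM's 1K worst case comes from this thread):

```python
# Three different "page sizes" that can be in play at once, per the list
# above. Each choice yields a different aligned address for the iova from
# the original error log, so picking the wrong one either under-aligns
# (the kernel rejects the map) or over-aligns (real memory gets skipped).

def align_up(addr, size):
    return (addr + size - 1) & ~(size - 1)

target_min_page = 1 << 10   # target arch's worst-case smallest page (ARM: 1K)
guest_page      = 1 << 12   # page size the guest kernel actually uses (assumed 4K)
host_page       = 1 << 16   # host kernel page size (assumed 64K)

addr = 0x80002F0400
assert align_up(addr, target_min_page) == 0x80002F0400
assert align_up(addr, guest_page)      == 0x80002F1000
assert align_up(addr, host_page)       == 0x8000300000
```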

thanks
-- PMM


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-11-24 15:34   ` Peter Maydell
@ 2015-11-25  7:00     ` Pavel Fedin
  2015-12-02 19:05       ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Fedin @ 2015-11-25  7:00 UTC (permalink / raw)
  To: 'Peter Maydell', 'Alex Williamson'
  Cc: 'QEMU Developers'

 Hello!

> There are a number of different interesting page sizes here:
>  * the host kernel page size
>  * the target CPU architecture's worst-case smallest page size
>  * the page size the guest kernel is actually using at the moment
>    (consider a 4K-page guest kernel on a 64K-page host kernel)
> 
> These don't necessarily have to all be the same. I would
> expect VFIO to be interested in the host kernel page size,
> not TARGET_PAGE_ALIGN.

 So would the appropriate fix be just replacing TARGET_PAGE_ALIGN with HOST_PAGE_ALIGN?
 Alex: there's still no answer, however, to the "why are we aligning up, not down?" question..

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-11-25  7:00     ` Pavel Fedin
@ 2015-12-02 19:05       ` Alex Williamson
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2015-12-02 19:05 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: 'Peter Maydell', 'QEMU Developers'

On Wed, 2015-11-25 at 10:00 +0300, Pavel Fedin wrote:
>  Hello!
> 
> > There are a number of different interesting page sizes here:
> >  * the host kernel page size
> >  * the target CPU architecture's worst-case smallest page size
> >  * the page size the guest kernel is actually using at the moment
> >    (consider a 4K-page guest kernel on a 64K-page host kernel)
> > 
> > These don't necessarily have to all be the same. I would
> > expect VFIO to be interested in the host kernel page size,
> > not TARGET_PAGE_ALIGN.
> 
>  So would the appropriate fix be just replacing TARGET_PAGE_ALIGN with HOST_PAGE_ALIGN?
>  Alex: there's still no answer, however, to the "why are we aligning up, not down?" question..


Sorry for the late reply, I've been on holiday.  It may simply be a
matter of "because it worked".  If we round down then any sub-mapping
within the page aliases to the same IOMMU mappings and we don't know how
to deal with that.  Duplicate IOMMU mappings will fail and any unmap
will unmap the whole page.  As it is, sub-page mappings get ignored,
which is sometimes the correct (or at least adequate) behavior.

For instance, one case I know of is Realtek NICs with a 4K PCI BAR that is
split by a quirk.  We currently align up, which means that either side
of the mapping is ignored (becomes zero sized).  Being an MMIO region,
dropping the mapping disables peer-to-peer for this area, which is
generally not something that gets used anyway.  The right thing to do
would be to ignore the quirk for the IOMMU mapping, but then we need
enough smarts to handle not failing when the second sub-region is
mapped.

We might do the same thing in some of the x86-specific PAM regions, but
these are never DMA targets, so it doesn't cause problems either.

So I don't think there's an easy solution of "just round down", it
involves something much more significant than that.  It might be the
right thing to do though if we ever expect any of those sub-mappings to
be DMA targets.  Thanks,

Alex
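The aliasing problem described above can be made concrete with a sketch (the region addresses are hypothetical; the 4K IOMMU page size is an assumption):

```python
# Why rounding *down* is not a drop-in fix: two sub-page regions that fall
# within the same IOMMU page both round down to the same iova, producing
# duplicate (aliased) IOMMU mappings. The second map request would fail as
# a duplicate, and unmapping either region would tear down the shared page.

def align_down(addr, size):
    return addr & ~(size - 1)

iommu_page = 1 << 12

# Two sub-page mappings created by splitting a BAR around an emulated range:
region_a = 0x80000400   # fragment below the emulated range
region_b = 0x80000800   # fragment above it

# Both alias to the same IOMMU page:
assert align_down(region_a, iommu_page) == 0x80000000
assert align_down(region_b, iommu_page) == 0x80000000
```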


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-11-24 15:24       ` Pavel Fedin
@ 2015-12-02 19:40         ` Alex Williamson
  2015-12-03  9:02           ` Pavel Fedin
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2015-12-02 19:40 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: 'Peter Maydell', qemu-devel

On Tue, 2015-11-24 at 18:24 +0300, Pavel Fedin wrote:
>  Hello!
> 
> > >  So, I've hit this problem on ARM64. On ARM64 we can actually never have 1K pages. This page size was supported only by old 32-bit ARM CPUs, up to ARMv5 IIRC, then it was dropped. Linux never even used it.
> > >  But, since QEMU can emulate those ancient CPUs, TARGET_PAGE_BITS is defined to 10 for ARM. And ARM64 and ARM32 are actually the same target for QEMU, which is why we still get it.
> > >  Perhaps TARGET_PAGE_BITS should be a variable for ARM, set according to the CPU actually in use. Then this IOMMU alignment problem would disappear automatically. What do you think?
> > >  Cc'ed Peter since he is the main ARM guy here.
> > 
> > Do we only see these alignments when we're emulating those old 1k page
> > processors?
> 
>  No. We see them when we are emulating brand new ARM64. And we see them only because TARGET_PAGE_BITS is still hardcoded to 10.
> 
> > If not, should we really be telling a 4k page VM about 1k
> > aligned memory?
> 
>  Well, first of all, we are not telling it about aligned memory at all. I've researched the origin of this address; it appears when we are mapping the MSI-X BAR of the VFIO device.
>  My device defines this BAR to be 2M in size. In this case QEMU splits it up into three regions:
>  1) Region below the MSI-X table (it's called "mmap"; for me it's empty because the table offset is 0)
>  2) The MSI-X table itself (20 vectors = 0x00000140 bytes for me).
>  3) Region above the MSI-X table (it's called "mmap msix-hi"). Size in my case is 2M - REAL_HOST_PAGE_ALIGN(0x140) = 0x1FF000.
> Regions (1) and (3) are passed through and directly mapped by VFIO; region (2) is emulated.
>  Now, we have the PBA. The device says that the PBA is placed in memory specified by the same BAR as the table, at offset 0x000f0000. The PBA is also emulated by QEMU, and as a result it overlaps the "mmap msix-hi" region, which consequently breaks up into two fragments:
>  3a) offset 0x00000 size 0x0EF000
>  3b) offset 0xEF008 size 0x10FFF8
>  The PBA sits between these fragments, being only 8 bytes in size.
>  Attempting to map (3b) causes the problem. vfio_mmap_region() is expected to align this up, and it does, but only to the minimum possible granularity for the ARM platform, which, as QEMU thinks, is always 1K.

Yep, on x86 we'd have target page size == host page size == minimum
iommu page size and we'd align that up to something that can be mapped,
which means we wouldn't have an iommu mapping for peer-to-peer in this
space, but nobody is likely to notice.  There are devices that want to
start taking advantage of the fact that we map mmio through the iommu
and start enabling p2p though, so it's also not something I'm eager to
remove.

Really, the problem seems to be the flat _processor_ view of memory vs
what we'd like to map through the iommu, the flat _device_ view of
memory.  All the quirks and QEMU trapping of sub-regions get wiped away
from the device view.  Thanks,

Alex


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-12-02 19:40         ` Alex Williamson
@ 2015-12-03  9:02           ` Pavel Fedin
  2015-12-03 16:26             ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Fedin @ 2015-12-03  9:02 UTC (permalink / raw)
  To: 'Alex Williamson'; +Cc: 'Peter Maydell', qemu-devel

 Hello!

> >  My device defines this BAR to be 2M in size. In this case QEMU splits it up into three regions:
> >  1) Region below the MSI-X table (it's called "mmap"; for me it's empty because the table offset is 0)
> >  2) The MSI-X table itself (20 vectors = 0x00000140 bytes for me).
> >  3) Region above the MSI-X table (it's called "mmap msix-hi"). Size in my case is 2M - REAL_HOST_PAGE_ALIGN(0x140) = 0x1FF000.
> > Regions (1) and (3) are passed through and directly mapped by VFIO; region (2) is emulated.
> >  Now, we have the PBA. The device says that the PBA is placed in memory specified by the same BAR as the table, at offset 0x000f0000. The PBA is also emulated by QEMU, and as a result it overlaps the "mmap msix-hi" region, which consequently breaks up into two fragments:
> >  3a) offset 0x00000 size 0x0EF000
> >  3b) offset 0xEF008 size 0x10FFF8
> >  The PBA sits between these fragments, being only 8 bytes in size.
> >  Attempting to map (3b) causes the problem. vfio_mmap_region() is expected to align this up, and it does, but only to the minimum possible granularity for the ARM platform, which, as QEMU thinks, is always 1K.
> 
> Yep, on x86 we'd have target page size == host page size == minimum
> iommu page size and we'd align that up to something that can be mapped,
> which means we wouldn't have an iommu mapping for peer-to-peer in this
> space, but nobody is likely to notice.

 So, OK, to keep the quoting short... I've also read your previous reply about why rounding up "just works". Let's sum things up.

1. Rounding up "just works" in my case too, so I don't see a direct reason to change it.
2. The only problem is rounding up to 1K, which TARGET_PAGE_ALIGN() does on ARM. What we need is 4K,
   which is appropriate for the host too, and, by the way, for modern ARM guests too. So, we can do either of:
   a) Keep my original approach - TARGET_PAGE_ALIGN(), then align to vfio_container_granularity().
   b) Just align to vfio_container_granularity() instead of using TARGET_PAGE_ALIGN().
   c) Use HOST_PAGE_ALIGN() instead of TARGET_PAGE_ALIGN() (what Peter suggested).

   On x86 there would be no difference, because all these alignments are identical. On ARM, all of these approaches would also give a correct result, because only TARGET_PAGE_ALIGN() is wrong in this case. So - which do you think is the most appropriate? Let's make a choice and I'll send a proper patch.
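For what it's worth, on the failing iova from the original report the three options agree, as a quick sketch shows (the 4K host page and 4K vfio_container_granularity() values are assumptions consistent with this thread):

```python
# Compare the three proposed alignment options on the iova from the
# original VFIO_MAP_DMA failure.

def align_up(addr, size):
    return (addr + size - 1) & ~(size - 1)

iova        = 0x80002F0400
TARGET_PAGE = 1 << 10    # ARM TARGET_PAGE_BITS == 10
HOST_PAGE   = 1 << 12    # assumed 4K host page
GRANULARITY = 1 << 12    # assumed 4K IOMMU granularity

opt_a = align_up(align_up(iova, TARGET_PAGE), GRANULARITY)  # (a) patch as posted
opt_b = align_up(iova, GRANULARITY)                         # (b) granularity only
opt_c = align_up(iova, HOST_PAGE)                           # (c) HOST_PAGE_ALIGN
assert opt_a == opt_b == opt_c == 0x80002F1000
```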

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-12-03  9:02           ` Pavel Fedin
@ 2015-12-03 16:26             ` Alex Williamson
  2015-12-03 16:33               ` Peter Maydell
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2015-12-03 16:26 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: 'Peter Maydell', qemu-devel

On Thu, 2015-12-03 at 12:02 +0300, Pavel Fedin wrote:
>  Hello!
> 
> > >  My device defines this BAR to be 2M in size. In this case QEMU splits it up into three regions:
> > >  1) Region below the MSI-X table (it's called "mmap"; for me it's empty because the table offset is 0)
> > >  2) The MSI-X table itself (20 vectors = 0x00000140 bytes for me).
> > >  3) Region above the MSI-X table (it's called "mmap msix-hi"). Size in my case is 2M - REAL_HOST_PAGE_ALIGN(0x140) = 0x1FF000.
> > > Regions (1) and (3) are passed through and directly mapped by VFIO; region (2) is emulated.
> > >  Now, we have the PBA. The device says that the PBA is placed in memory specified by the same BAR as the table, at offset 0x000f0000. The PBA is also emulated by QEMU, and as a result it overlaps the "mmap msix-hi" region, which consequently breaks up into two fragments:
> > >  3a) offset 0x00000 size 0x0EF000
> > >  3b) offset 0xEF008 size 0x10FFF8
> > >  The PBA sits between these fragments, being only 8 bytes in size.
> > >  Attempting to map (3b) causes the problem. vfio_mmap_region() is expected to align this up, and it does, but only to the minimum possible granularity for the ARM platform, which, as QEMU thinks, is always 1K.
> > 
> > Yep, on x86 we'd have target page size == host page size == minimum
> > iommu page size and we'd align that up to something that can be mapped,
> > which means we wouldn't have an iommu mapping for peer-to-peer in this
> > space, but nobody is likely to notice.
> 
>  So, OK, to keep the quoting short... I've also read your previous reply about why rounding up "just works". Let's sum things up.
> 
> 1. Rounding up "just works" in my case too, so I don't see a direct reason to change it.
> 2. The only problem is rounding up to 1K, which TARGET_PAGE_ALIGN() does on ARM. What we need is 4K,
>    which is appropriate for the host too, and, by the way, for modern ARM guests too. So, we can do either of:
>    a) Keep my original approach - TARGET_PAGE_ALIGN(), then align to vfio_container_granularity().
>    b) Just align to vfio_container_granularity() instead of using TARGET_PAGE_ALIGN().
>    c) Use HOST_PAGE_ALIGN() instead of TARGET_PAGE_ALIGN() (what Peter suggested).
> 
>    On x86 there would be no difference, because all these alignments are identical. On ARM, all of these approaches would also give a correct result, because only TARGET_PAGE_ALIGN() is wrong in this case. So - which do you think is the most appropriate? Let's make a choice and I'll send a proper patch.

I feel a lot more comfortable if we limit the scope to MMIO regions of
PCI devices.  The problems I brought up before about the device not
being able to DMA to a target aligned RAM address are still a
possibility that I think we want to catch.  To do that, I think we just
need:

Object *obj = memory_region_owner(section->mr);

if (object_dynamic_cast(obj, "pci-device")) {
    /* HOST_PAGE_ALIGN... */
} else {
    /* TARGET_PAGE_ALIGN... */
}

Thanks,
Alex


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-12-03 16:26             ` Alex Williamson
@ 2015-12-03 16:33               ` Peter Maydell
  2015-12-03 17:19                 ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2015-12-03 16:33 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Pavel Fedin, QEMU Developers

On 3 December 2015 at 16:26, Alex Williamson <alex.williamson@redhat.com> wrote:
> I feel a lot more comfortable if we limit the scope to MMIO regions of
> PCI devices.  The problems I brought up before about the device not
> being able to DMA to a target aligned RAM address are still a
> possibility that I think we want to catch.  To do that, I think we just
> need:
>
> Object *obj = memory_region_owner(section->mr);
>
> if (object_dynamic_cast(obj, "pci-device")) {
>     /* HOST_PAGE_ALIGN... */
> } else {
>     /* TARGET_PAGE_ALIGN... */
> }

This looks very odd to me, in two ways: (a) behaving differently
for PCI passthrough vs other kinds of passthrough, and (b) caring
about TARGET_PAGE_ALIGN at all. TARGET_PAGE_ALIGN really isn't
something vfio should need to care about I think.

thanks
-- PMM


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-12-03 16:33               ` Peter Maydell
@ 2015-12-03 17:19                 ` Alex Williamson
  2015-12-03 17:36                   ` Peter Maydell
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2015-12-03 17:19 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Pavel Fedin, QEMU Developers

On Thu, 2015-12-03 at 16:33 +0000, Peter Maydell wrote:
> On 3 December 2015 at 16:26, Alex Williamson <alex.williamson@redhat.com> wrote:
> > I feel a lot more comfortable if we limit the scope to MMIO regions of
> > PCI devices.  The problems I brought up before about the device not
> > being able to DMA to a target aligned RAM address are still a
> > possibility that I think we want to catch.  To do that, I think we just
> > need:
> >
> > Object *obj = memory_region_owner(section->mr);
> >
> > if (object_dynamic_cast(obj, "pci-device")) {
> >     /* HOST_PAGE_ALIGN... */
> > } else {
> >     /* TARGET_PAGE_ALIGN... */
> > }
> 
> This looks very odd to me, in two ways: (a) behaving differently
> for PCI passthrough vs other kinds of passthrough,

It's a matter of risk.  If we align an MMIO range out of existence all
we've prevented is peer-to-peer DMA between assigned devices.  Chances
of anyone caring about that are slim to none.  If we align RAM out of
existence, that's a much, much more significant risk that we've just
introduced a data integrity issue for the VM.

>  and (b) caring
> about TARGET_PAGE_ALIGN at all. TARGET_PAGE_ALIGN really isn't
> something vfio should need to care about I think.

But I think we do.  If a RAM address is target page aligned, it could be
a valid DMA target for the device.  If we align it out of existence and
the device is programmed to perform a DMA to that address, the IOMMU
will block it, the VM will not be informed and will continue executing
with invalid data.  The host page alignment is only relevant here if we
wanted to round down, which is probably the more correct thing to do,
but is much more complicated due to the aliasing issue I mentioned in a
previous reply.  To do that we really need a MemoryListener on the
device view of the address map rather than the processor view of the
address map.  Lacking that, we want the IOMMU to fault when we're
asking it to do mappings below the granularity that it's able to do.
Thanks,

Alex


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-12-03 17:19                 ` Alex Williamson
@ 2015-12-03 17:36                   ` Peter Maydell
  2015-12-03 17:58                     ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2015-12-03 17:36 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Pavel Fedin, QEMU Developers

On 3 December 2015 at 17:19, Alex Williamson <alex.williamson@redhat.com> wrote:
> On Thu, 2015-12-03 at 16:33 +0000, Peter Maydell wrote:
>> On 3 December 2015 at 16:26, Alex Williamson <alex.williamson@redhat.com> wrote:
>> > I feel a lot more comfortable if we limit the scope to MMIO regions of
>> > PCI devices.  The problems I brought up before about the device not
>> > being able to DMA to a target aligned RAM address are still a
>> > possibility that I think we want to catch.  To do that, I think we just
>> > need:
>> >
>> > Object *obj = memory_region_owner(section->mr);
>> >
>> > if (object_dynamic_cast(obj, "pci-device")) {
>> >     /* HOST_PAGE_ALIGN... */
>> > } else {
>> >     /* TARGET_PAGE_ALIGN... */
>> > }
>>
>> This looks very odd to me, in two ways: (a) behaving differently
>> for PCI passthrough vs other kinds of passthrough,
>
> It's a matter of risk.  If we align an MMIO range out of existence all
> we've prevented is peer-to-peer DMA between assigned devices.  Chances
> of anyone caring about that are slim to none.  If we align RAM out of
> existence, that's a much, much more significant risk that we've just
> introduced a data integrity issue for the VM.

I don't see why this is different for PCI devices versus
memory-mapped passthrough devices, though. If what you mean
is "is this MemoryRegion not RAM" maybe you want
if (!memory_region_is_ram(mr))  ?

>>  and (b) caring
>> about TARGET_PAGE_ALIGN at all. TARGET_PAGE_ALIGN really isn't
>> something vfio should need to care about I think.
>
> But I think we do.  If a RAM address is target page aligned, it could be
> a valid DMA target for the device.

TARGET_PAGE_ALIGN doesn't tell you whether an address is actually
page aligned for the guest, though. In fact, you can't tell what
page size the guest happens to be using (or what the alignment
restrictions on doing DMA might be, or the page size being used by
the IOMMU, which isn't necessarily the guest page size either).

> If we align it out of existence and
> the device is programmed to perform a DMA to that address, the IOMMU
> will block it, the VM will not be informed and will continue executing
> with invalid data.

Shouldn't this cause the device to say "hey, my DMA transaction
failed, I will flag that up as an error" ? (That's not much better
as a failure situation, of course.)

thanks
-- PMM


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-12-03 17:36                   ` Peter Maydell
@ 2015-12-03 17:58                     ` Alex Williamson
  2015-12-07 10:53                       ` Pavel Fedin
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2015-12-03 17:58 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Pavel Fedin, QEMU Developers

On Thu, 2015-12-03 at 17:36 +0000, Peter Maydell wrote:
> On 3 December 2015 at 17:19, Alex Williamson <alex.williamson@redhat.com> wrote:
> > On Thu, 2015-12-03 at 16:33 +0000, Peter Maydell wrote:
> >> On 3 December 2015 at 16:26, Alex Williamson <alex.williamson@redhat.com> wrote:
> >> > I feel a lot more comfortable if we limit the scope to MMIO regions of
> >> > PCI devices.  The problems I brought up before about the device not
> >> > being able to DMA to a target aligned RAM address are still a
> >> > possibility that I think we want to catch.  To do that, I think we just
> >> > need:
> >> >
> >> > Object *obj = memory_region_owner(section->mr);
> >> >
> >> > if (object_dynamic_cast(obj, "pci-device")) {
> >> >     /* HOST_PAGE_ALIGN... */
> >> > } else {
> >> >     /* TARGET_PAGE_ALIGN... */
> >> > }
> >>
> >> This looks very odd to me, in two ways: (a) behaving differently
> >> for PCI passthrough vs other kinds of passthrough,
> >
> > It's a matter of risk.  If we align an MMIO range out of existence all
> > we've prevented is peer-to-peer DMA between assigned devices.  Chances
> > of anyone caring about that are slim to none.  If we align RAM out of
> > existence, that's a much, much more significant risk that we've just
> > introduced a data integrity issue for the VM.
> 
> I don't see why this is different for PCI devices versus
> memory-mapped passthrough devices, though. If what you mean
> is "is this MemoryRegion not RAM" maybe you want
> if (!memory_region_is_ram(mr))  ?

We would already skip it entirely if that worked, see
hw/vfio/common.c:vfio_listener_skipped_section().  MemoryRegions backed
by a ram ptr still sneak through that.  We're dealing with sub-host-page
MemoryRegionSections that are backed by an mmap to a vfio region.

> >>  and (b) caring
> >> about TARGET_PAGE_ALIGN at all. TARGET_PAGE_ALIGN really isn't
> >> something vfio should need to care about I think.
> >
> > But I think we do.  If a RAM address is target page aligned, it could be
> > a valid DMA target for the device.
> 
> TARGET_PAGE_ALIGN doesn't tell you whether an address is actually
> page aligned for the guest, though. In fact, you can't tell what
> page size the guest happens to be using (or what the alignment
> restrictions on doing DMA might be, or the page size being used by
> the IOMMU, which isn't necessarily the guest page size either).

TARGET_PAGE_ALIGN tells us that it *could* be a valid DMA target though.
The VM model is capable of using that as a page size, which means we
assume it is and want to generate a fault.

> > If we align it out of existence and
> > the device is programmed to perform a DMA to that address, the IOMMU
> > will block it, the VM will not be informed and will continue executing
> > with invalid data.
> 
> Shouldn't this cause the device to say "hey, my DMA transaction
> failed, I will flag that up as an error" ? (That's not much better
> as a failure situation, of course.)

DMA reads can get a target abort, DMA writes fail silently, other than
the possible IOMMU fault from the IOMMU driver, for which we have
nothing resembling an infrastructure for collecting that and reporting
it out to the user.  Thanks,

Alex


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-12-03 17:58                     ` Alex Williamson
@ 2015-12-07 10:53                       ` Pavel Fedin
  2015-12-07 11:20                         ` Peter Maydell
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Fedin @ 2015-12-07 10:53 UTC (permalink / raw)
  To: 'Alex Williamson', 'Peter Maydell'
  Cc: 'QEMU Developers'

 Hello!

> TARGET_PAGE_ALIGN tells us that it *could* be a valid DMA target though.
> The VM model is capable of using that as a page size, which means we
> assume it is and want to generate a fault.

 We seem to have looped back. So...
 It is possible to fix this according to this assumption. In this case we would need to make TARGET_PAGE_BITS a variable. If we are emulating ancient armv5te, it will be set to 10. For modern targets, ARMv6 and newer, it will be 12.
 Peter, would you ACK this approach? BTW, we even have a kind of hack in target-arm/cpu.h:
--- cut ---
#if defined(CONFIG_USER_ONLY)
#define TARGET_PAGE_BITS 12
#else
/* The ARM MMU allows 1k pages.  */
/* ??? Linux doesn't actually use these, and they're deprecated in recent
   architecture revisions.  Maybe a configure option to disable them.  */
#define TARGET_PAGE_BITS 10
#endif
--- cut ---

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-12-07 10:53                       ` Pavel Fedin
@ 2015-12-07 11:20                         ` Peter Maydell
  2015-12-08 23:42                           ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2015-12-07 11:20 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: Alex Williamson, QEMU Developers

On 7 December 2015 at 10:53, Pavel Fedin <p.fedin@samsung.com> wrote:
>> TARGET_PAGE_ALIGN tells us that it *could* be a valid DMA target though.
>> The VM model is capable of using that as a page size, which means we
>> assume it is and want to generate a fault.
>
>  We seem to have looped back. So...
>  It is possible to fix this according to this assumption. In this
> case we would need to make TARGET_PAGE_BITS a variable. If we are
> emulating ancient armv5te, it will be set to 10. For modern targets,
> ARMv6 and newer, it will be 12.

You can't just make TARGET_PAGE_BITS a variable, it is used as a compile
time constant in a bunch of TCG internal stuff. It would be nice
if we didn't require it to be compile time, but it would be a lot of
work to fix (especially if you want to avoid it being a performance
hit).

In any case, that still doesn't fix the problem. On an AArch64
target CPU, TARGET_PAGE_BITS still has to be 12 (for a 4K
minimum page size), but the guest and host could still be using
64K pages. So your VFIO code *must* be able to deal with the
situation where TARGET_PAGE_BITS is smaller than any alignment
that the guest, host or IOMMU need to care about.

I still think the VFIO code needs to figure out what alignment
it actually cares about and find some way to determine what
that is, or alternatively if the relevant alignment is not
possible to determine, write the code so that it doesn't
need to care. Either way, TARGET_PAGE_ALIGN is not the answer.

thanks
-- PMM


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-12-07 11:20                         ` Peter Maydell
@ 2015-12-08 23:42                           ` Alex Williamson
  2015-12-09  8:08                             ` Pavel Fedin
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2015-12-08 23:42 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Pavel Fedin, QEMU Developers

On Mon, 2015-12-07 at 11:20 +0000, Peter Maydell wrote:
> On 7 December 2015 at 10:53, Pavel Fedin <p.fedin@samsung.com> wrote:
> >> TARGET_PAGE_ALIGN tells us that it *could* be a valid DMA target though.
> >> The VM model is capable of using that as a page size, which means we
> >> assume it is and want to generate a fault.
> >
> >  We seem to have looped back. So...
> >  It is possible to fix this according to this assumption. In this
> > case we would need to make TARGET_PAGE_BITS a variable. If we are
> > emulating ancient armv5te, it will be set to 10. For modern targets,
> > ARMv6 and newer, it will be 12.
> 
> You can't just make TARGET_PAGE_BITS a variable, it is used as a compile
> time constant in a bunch of TCG internal stuff. It would be nice
> if we didn't require it to be compile time, but it would be a lot of
> work to fix (especially if you want to avoid it being a performance
> hit).
> 
> In any case, that still doesn't fix the problem. On an AArch64
> target CPU, TARGET_PAGE_BITS still has to be 12 (for a 4K
> minimum page size), but the guest and host could still be using
> 64K pages. So your VFIO code *must* be able to deal with the
> situation where TARGET_PAGE_BITS is smaller than any alignment
> that the guest, host or IOMMU need to care about.
> 
> I still think the VFIO code needs to figure out what alignment
> it actually cares about and find some way to determine what
> that is, or alternatively if the relevant alignment is not
> possible to determine, write the code so that it doesn't
> need to care. Either way, TARGET_PAGE_ALIGN is not the answer.

Ok, let's work our way down through the relevant page sizes, host,
IOMMU, and target.

The host page size is relevant because this is the granularity with
which the kernel can pin pages.  Every IOMMU mapping must be backed by a
pinned page in the current model since we don't really have hardware to
support IOMMU page faults.

The IOMMU page size defines the granularity with which we can map IOVA
to physical memory.  The IOMMU may support multiple page sizes, but what
we're really talking about here is the minimum page size.

The target page size is relevant because this defines the minimum
possible page size used within the VM.  We presume that anything less
than TARGET_PAGE_ALIGN cannot be referenced as a page by the VM CPU and
therefore is probably not allocated as a DMA buffer for a driver running
within the guest.

An implementation detail here is that the vfio type1 IOMMU model
currently exposes the host page size as the minimum IOMMU page size.
The reason for this is to simplify page accounting, if we don't allow
sub-host page mappings we don't need per page reference counting.  This
can be fixed within the current API, but kernel changes are required or
else locked page requirements due to over-counting become a problem.
The benefit though is that this abstracts the host page size from QEMU.

So let's take the easy scenario first, if target page size is greater
than or equal to the minimum IOMMU page size, we're golden.  We can map
anything that could be a target DMA buffer.  This leads to the current
situation that we simply ignore any ranges which disappear when we align
to the target page size.  It can't be a DMA buffer, ignore it.  Note
that the 64k host, 4k target problem goes away if type1 accounting is
fixed to allow IOMMU granularity mapping, since I think in the cases we
care about the IOMMU still supports 4k pages, otherwise...

Then we come to the scenario here, where target page size is less than
the minimum IOMMU page size.  The current code is intentionally trying
to trigger the vfio type1 error that this cannot be mapped.  To resolve
this, QEMU needs to decide if it's ok to provide the device with DMA
access to everything on that IOMMU granularity page, ensure that aliases
mapping the same IOMMU page are consistent and handle the reference
counting for those sub-mappings to avoid duplicate mappings and
premature unmaps.

So I think in the end, the one page size we care about is the minimum
IOMMU granularity.  We don't really care about the target page size at
all and maybe we only care about the host page size for determining what
might share a page with a sub-page mapping.  However, there's work to
get there (QEMU, kernel, or both depending on the specific config) and
the target page size trick has so far been a useful simplification.
Thanks,

Alex


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-12-08 23:42                           ` Alex Williamson
@ 2015-12-09  8:08                             ` Pavel Fedin
  0 siblings, 0 replies; 20+ messages in thread
From: Pavel Fedin @ 2015-12-09  8:08 UTC (permalink / raw)
  To: 'Alex Williamson', 'Peter Maydell'
  Cc: 'QEMU Developers'

 Hello!

> So I think in the end, the one page size we care about is the minimum
> IOMMU granularity.  We don't really care about the target page size at
> all and maybe we only care about the host page size for determining what
> might share a page with a sub-page mapping.

 Ok, so, in v2 I remove TARGET_PAGE_ALIGN and simply align to vfio_container_granularity. Would it be OK for now?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


* Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size
  2015-11-19 10:29   ` Pavel Fedin
  2015-11-19 23:33     ` Alex Williamson
@ 2015-12-09 10:09     ` Alex Bennée
  1 sibling, 0 replies; 20+ messages in thread
From: Alex Bennée @ 2015-12-09 10:09 UTC (permalink / raw)
  To: Pavel Fedin
  Cc: 'Peter Maydell', 'Alex Williamson', qemu-devel


Pavel Fedin <p.fedin@samsung.com> writes:

>  Hello!
>
>> > On some architectures TARGET_PAGE_ALIGN() is not enough to get the right
>> > alignment. For example on ARM TARGET_PAGE_BITS is 10 because some old CPUs
>> > support 1K page size, while minimum SMMU page size is 4K.
>> >
>> > This fixes problems like:
>> >
>> > 2015-11-17T07:37:42.892265Z qemu-system-aarch64: VFIO_MAP_DMA: -22
>> > 2015-11-17T07:37:42.892309Z qemu-system-aarch64: vfio_dma_map(0x223da230, 0x80002f0400,
>> 0x10fc00, 0x7f89b40400) = -22 (Invalid
>> > argument)
>> > qemu: hardware error: vfio: DMA mapping failed, unable to continue
>
> [skip]
>
>> I don't understand how this is supposed to work, if we align to a larger
>> size than the processor, then there are processor-size pages of RAM that
>> could be handed out as DMA targets for devices, but we can't map them
>> through the IOMMU.  Thus if the guest tries to use them, we get IOMMU
>> faults in the host and likely memory corruption in the guest because the
>> device can't read or write to the page it's supposed to.  This doesn't
>> seem like the right solution.
>
>  Well, this was my first try at the problem. I see your point. But I guess we should discuss the proper solution then.
>  So, I've got this problem on ARM64. On ARM64 we actually can never have 1K pages. This page size was supported only by old 32-bit ARM CPUs, up to ARMv5 IIRC, then it was dropped. Linux never even used it.
>  But, since QEMU can emulate those ancient CPUs, TARGET_PAGE_BITS is defined to 10 for ARM. And since ARM64 and ARM32 are actually the same target for QEMU, we still get it.
>  Perhaps, TARGET_PAGE_BITS should be a variable for ARM, and we should
>  set it according to the actual used CPU. Then this IOMMU alignment
>  problem would disappear automatically. What do you think?

Yes it should be. For one thing we pay a fairly high performance penalty
for using these smaller pages for no reason. What the best way to do
this is remains to be seen, as I think there are a lot of fixed-size arrays
currently in the system based on various derivations of TARGET_PAGE_BITS.

>  Cc'ed Peter since he is the main ARM guy here.
>
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia


--
Alex Bennée


end of thread, other threads:[~2015-12-09 10:10 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
2015-11-17  7:46 [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size Pavel Fedin
2015-11-18 22:04 ` Alex Williamson
2015-11-19 10:29   ` Pavel Fedin
2015-11-19 23:33     ` Alex Williamson
2015-11-24 15:24       ` Pavel Fedin
2015-12-02 19:40         ` Alex Williamson
2015-12-03  9:02           ` Pavel Fedin
2015-12-03 16:26             ` Alex Williamson
2015-12-03 16:33               ` Peter Maydell
2015-12-03 17:19                 ` Alex Williamson
2015-12-03 17:36                   ` Peter Maydell
2015-12-03 17:58                     ` Alex Williamson
2015-12-07 10:53                       ` Pavel Fedin
2015-12-07 11:20                         ` Peter Maydell
2015-12-08 23:42                           ` Alex Williamson
2015-12-09  8:08                             ` Pavel Fedin
2015-12-09 10:09     ` Alex Bennée
2015-11-24 15:34   ` Peter Maydell
2015-11-25  7:00     ` Pavel Fedin
2015-12-02 19:05       ` Alex Williamson
