* Shattering superpages impact on IOMMU in Xen
@ 2017-04-03 16:24 Oleksandr Tyshchenko
  2017-04-03 16:42 ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Oleksandr Tyshchenko @ 2017-04-03 16:24 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Oleksandr Andrushchenko, al1img,
	Andrii Anisov, Volodymyr Babchuk, Julien Grall, Artem Mygaiev

Hi all,

While playing with a non-shared IOMMU in Xen on ARM I came across one
interesting thing: I found out that superpages get shattered during
the domain life cycle.
This happens as a result of mapping foreign pages, ballooning memory,
the domain mapping Xen shared pages, etc.
I am not concerned about memory fragmentation at the moment, but
shattering does concern me from the IOMMU point of view.
As Xen owns the IOMMU, it may manipulate the IOMMU page tables while a
passed-through/protected device is doing DMA in Linux. It is hard to
detect when no DMA transaction is in progress in order to prevent this
race. So, if a device has an in-flight transaction when we change the
IOMMU mapping, we may get into trouble. Unfortunately, the faulting
transaction cannot be restarted in all cases. The chance of hitting
the problem increases during shattering.

I ran the following test:
On my setup dom0 contains an ethernet IP that is protected by the
IOMMU. What is more, as the IOMMU I am playing with supports
superpages (2M, 1G), the IOMMU driver takes these capabilities into
account when building page tables. As I gave dom0 256 MB, the IOMMU
mapping was built from 2M memory blocks only. As I am using NFS for
both dom0 and the domU, the ethernet IP performs DMA transactions
almost all the time.
Sometimes I see IOMMU page faults while a guest domain is being
created. I think they happen while Xen is shattering 2M mappings into
4K mappings (it unmaps dom0 pages one 4K page at a time, then maps
domU pages there for copying the domU images).
But I don't see any page faults when the IOMMU page table is built
from 4K pages only.

I had a talk with Julien on IRC and we came to the conclusion that the
safest way would be to use 4K pages to prevent shattering, i.e. the
IOMMU driver shouldn't report superpage capability.
On the other hand, if we build the IOMMU tables from 4K pages only, we
will get a performance drop (when building and walking page tables),
TLB pressure, etc.
Another possible solution Julien suggested is to always balloon in 2M
or 1G chunks, never 4K. That would help us avoid the shattering
effect.
The discussion was moved to the ML since this seems to be a generic
issue and the right solution should be thought through.

What do you think is the right way to follow? Use 4K pages and not
bother with shattering, or try to optimize? And if the idea of making
the balloon mechanism smarter makes sense, how do we teach the balloon
driver to do so?
Thank you.

-- 
Regards,

Oleksandr Tyshchenko

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-03 16:24 Shattering superpages impact on IOMMU in Xen Oleksandr Tyshchenko
@ 2017-04-03 16:42 ` Andrew Cooper
  2017-04-03 17:02   ` Julien Grall
  2017-04-03 18:26   ` Oleksandr Tyshchenko
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Cooper @ 2017-04-03 16:42 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Stefano Stabellini, Oleksandr Andrushchenko, Julien Grall,
	Andrii Anisov, Volodymyr Babchuk, al1img, Artem Mygaiev

On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
> Hi, all.
>
> Playing with non-shared IOMMU in Xen on ARM I faced one interesting
> thing. I found out that the superpages were shattered during domain
> life cycle.
> This is the result of mapping of foreign pages, ballooning memory,
> even if domain maps Xen shared pages, etc.
> I don't bother with the memory fragmentation at the moment. But,
> shattering bothers me from the IOMMU point of view.
> As the Xen owns IOMMU it might manipulate IOMMU page tables when
> passthoughed/protected device doing DMA in Linux. It is hard to detect
> when the DMA transaction isn't in progress
> in order to prevent this race. So, if we have inflight transaction
> from a device when changing IOMMU mapping we might get into trouble.
> Unfortunately, not in all the cases the
> faulting transaction can be restarted. The chance to hit the problem
> increases during shattering.
>
> I did next test:
> The dom0 on my setup contains ethernet IP that are protected by IOMMU.
> What is more, as the IOMMU I am playing with supports superpages (2M,
> 1G) the IOMMU driver
> takes into account these capabilities when building page tables. As I
> gave 256 MB for dom0, the IOMMU mapping was built by 2M memory blocks
> only. As I am using NFS for both dom0 and domU the ethernet IP
> performs DMA transactions almost all the time.
> Sometimes, I see the IOMMU page faults during creating guest domain. I
> think, it happens during Xen is shattering 2M mappings 4K mappings (it
> unmaps dom0 pages by one 4K page at a time, then maps domU pages there
> for copying domU images).
> But, I don't see any page faults when the IOMMU page table was built
> by 4K pages only.
>
> I had a talk with Julien on IIRC and we came to conclusion that the
> safest way would be to use 4K pages to prevent shattering, so the
> IOMMU shouldn't report superpage capability.
> On the other hand, if we build IOMMU from 4K pages we will have
> performance drop (during building, walking page tables), TLB pressure,
> etc.
> Another possible solution Julien was suggesting is to always
> ballooning with 2M, 1G, and not using 4K. That would help us to
> prevent shattering effect.
> The discussion was moved to the ML since it seems to be a generic
> issue and the right solution should be think of.
>
> What do you think is the right way to follow? Use 4K pages and don't
> bother with shattering or try to optimize? And if the idea to make
> balloon mechanism smarter makes sense how to teach balloon to do so?
> Thank you.

Ballooning and foreign mappings are terrible for trying to retain
superpage mappings.  No OS, not even Linux, can sensibly provide victim
pages in a useful way to avoid shattering.

If you care about performance, don't ever balloon.  Foreign mappings in
translated guests should start from the top of RAM, and work upwards.


As for the IOMMU specifically, things are rather easier.  It is the
guest's responsibility to ensure that frames offered up for ballooning
or foreign mappings are unused.  Therefore, if anything cares about the
specific 4K region becoming non-present in the IOMMU mappings, it is the
guest kernel's fault for offering up a frame already in use.

For the shattering, however, it is Xen's responsibility to ensure that
all other mappings stay valid at all points.  The correct way to do this
is to construct a new L1 table, mirroring the L2 superpage but lacking
the specific 4K mapping in question, then atomically replace the L2
superpage entry with the new L1 table, then issue an IOMMU TLB
invalidation to remove any cached mappings.

By following that procedure, all DMA within the 2M region, but not
hitting the 4K frame, won't observe any interim lack of mappings.  It
appears from your description that Xen isn't following the procedure.
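[Editor's sketch: the procedure above, as a minimal user-space simulation.
The entry layout, names and sizes are hypothetical simplifications, not
Xen's real structures; the point is the ordering: the L1 table is fully
populated before one atomic store publishes it, so a concurrent walker
never sees a missing entry for any untouched 4K frame.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define L1_ENTRIES 512
#define PAGE_SIZE  0x1000ULL
#define PTE_VALID  (1ULL << 0)
#define PTE_TABLE  (1ULL << 1)   /* entry points to a next-level table */

typedef uint64_t pte_t;

/* Shatter a 2M block entry into an L1 table lacking one 4K mapping.
 * The table is built in full first; only then is the L2 entry swapped
 * with a single atomic store.  A real implementation would follow the
 * store with an IOMMU TLB invalidation for the 2M region. */
static pte_t *shatter_2m(_Atomic pte_t *l2_entry, unsigned int skip_idx)
{
    pte_t block = atomic_load(l2_entry);
    uint64_t base = block & ~(L1_ENTRIES * PAGE_SIZE - 1);
    pte_t *l1 = calloc(L1_ENTRIES, sizeof(pte_t));

    for (unsigned int i = 0; i < L1_ENTRIES; i++)
        if (i != skip_idx)                /* mirror the block mapping... */
            l1[i] = (base + i * PAGE_SIZE) | PTE_VALID;
    /* ...except the 4K frame being unmapped (left non-present). */

    atomic_store(l2_entry, (pte_t)(uintptr_t)l1 | PTE_TABLE | PTE_VALID);
    /* IOMMU TLB invalidation would go here. */
    return l1;
}
```

Note this is a make-before-break ordering; whether the translation
hardware tolerates it is exactly what the rest of the thread disputes
for ARM.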

~Andrew


* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-03 16:42 ` Andrew Cooper
@ 2017-04-03 17:02   ` Julien Grall
  2017-04-03 17:16     ` Andrew Cooper
  2017-04-03 17:39     ` Oleksandr Tyshchenko
  2017-04-03 18:26   ` Oleksandr Tyshchenko
  1 sibling, 2 replies; 15+ messages in thread
From: Julien Grall @ 2017-04-03 17:02 UTC (permalink / raw)
  To: Andrew Cooper, Oleksandr Tyshchenko, xen-devel
  Cc: Stefano Stabellini, Oleksandr Andrushchenko, Andrii Anisov,
	Volodymyr Babchuk, al1img, Artem Mygaiev

Hi Andrew,

On 03/04/17 17:42, Andrew Cooper wrote:
> On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
>> Hi, all.
>>
>> Playing with non-shared IOMMU in Xen on ARM I faced one interesting
>> thing. I found out that the superpages were shattered during domain
>> life cycle.
>> This is the result of mapping of foreign pages, ballooning memory,
>> even if domain maps Xen shared pages, etc.
>> I don't bother with the memory fragmentation at the moment. But,
>> shattering bothers me from the IOMMU point of view.
>> As the Xen owns IOMMU it might manipulate IOMMU page tables when
>> passthoughed/protected device doing DMA in Linux. It is hard to detect
>> when the DMA transaction isn't in progress
>> in order to prevent this race. So, if we have inflight transaction
>> from a device when changing IOMMU mapping we might get into trouble.
>> Unfortunately, not in all the cases the
>> faulting transaction can be restarted. The chance to hit the problem
>> increases during shattering.
>>
>> I did next test:
>> The dom0 on my setup contains ethernet IP that are protected by IOMMU.
>> What is more, as the IOMMU I am playing with supports superpages (2M,
>> 1G) the IOMMU driver
>> takes into account these capabilities when building page tables. As I
>> gave 256 MB for dom0, the IOMMU mapping was built by 2M memory blocks
>> only. As I am using NFS for both dom0 and domU the ethernet IP
>> performs DMA transactions almost all the time.
>> Sometimes, I see the IOMMU page faults during creating guest domain. I
>> think, it happens during Xen is shattering 2M mappings 4K mappings (it
>> unmaps dom0 pages by one 4K page at a time, then maps domU pages there
>> for copying domU images).
>> But, I don't see any page faults when the IOMMU page table was built
>> by 4K pages only.
>>
>> I had a talk with Julien on IIRC and we came to conclusion that the
>> safest way would be to use 4K pages to prevent shattering, so the
>> IOMMU shouldn't report superpage capability.
>> On the other hand, if we build IOMMU from 4K pages we will have
>> performance drop (during building, walking page tables), TLB pressure,
>> etc.
>> Another possible solution Julien was suggesting is to always
>> ballooning with 2M, 1G, and not using 4K. That would help us to
>> prevent shattering effect.
>> The discussion was moved to the ML since it seems to be a generic
>> issue and the right solution should be think of.
>>
>> What do you think is the right way to follow? Use 4K pages and don't
>> bother with shattering or try to optimize? And if the idea to make
>> balloon mechanism smarter makes sense how to teach balloon to do so?
>> Thank you.
>
> Ballooning and foreign mappings are terrible for trying to retain
> superpage mappings.  No OS, not even Linux, can sensibly provide victim
> pages in a useful way to avoid shattering.
>
> If you care about performance, don't ever balloon.  Foreign mappings in
> translated guests should start from the top of RAM, and work upwards.

I am not sure I understand this. Can you expand?

>
>
> As for the IOMMU specifically, things are rather easier.  It is the
> guests responsibility to ensure that frames offered up for ballooning or
> foreign mappings are unused.  Therefore, if anything cares about the
> specific 4K region becoming non-present in the IOMMU mappings, it is the
> guest kernels fault for offering up a frame already in use.
>
> For the shattering however, It is Xen's responsibility to ensure that
> all other mappings stay valid at all points.  The correct way to do this
> is to construct a new L1 table, mirroring the L2 superpage but lacking
> the specific 4K mapping in question, then atomically replace the L2
> superpage entry with the new L1 table, then issue an IOMMU TLB
> invalidation to remove any cached mappings.
>
> By following that procedure, all DMA within the 2M region, but not
> hitting the 4K frame, won't observe any interim lack of mappings.  It
> appears from your description that Xen isn't following the procedure.

Xen is following what the ARM ARM mandates. For shattering a page 
table, we have to follow the break-before-make sequence, i.e.:
	- Invalidate the L2 entry
	- Flush the TLBs
	- Add the new L1 table

See D4-1816 in ARM DDI 0487A.k_iss10775 for details. So we end up with 
a small window where there is no valid mapping. It is easy to trap a 
data abort from the processor and restart it, but not a device memory 
transaction.
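[Editor's sketch of that break-before-make ordering, with hypothetical
helper names; real Xen code would issue the proper TLBI/barrier
sequence instead of the stub.]

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;

/* Stand-in for the IOMMU TLB invalidation (TLBI + barriers in reality). */
static void flush_iommu_tlb(void) { }

/* Break-before-make: between step 1 and step 3 the whole 2M region has
 * no valid mapping.  A CPU access in that window traps as a data abort
 * and can be restarted; an in-flight device DMA transaction generally
 * cannot, which is the fault window discussed in this thread. */
static void shatter_bbm(pte_t *l2_entry, pte_t new_l1_table_entry)
{
    *l2_entry = 0;                    /* 1. break: invalidate block entry */
    flush_iommu_tlb();                /* 2. flush cached translations     */
    *l2_entry = new_l1_table_entry;   /* 3. make: install the L1 table    */
}
```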

Xen by default shares stage-2 page tables between the IOMMU and the 
MMU. However, from the discussion I had with Oleksandr, they are not 
sharing page tables and still see the problem. I am not sure how they 
are updating the page tables here. Oleksandr, can you provide more 
details?

Cheers,

-- 
Julien Grall


* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-03 17:02   ` Julien Grall
@ 2017-04-03 17:16     ` Andrew Cooper
  2017-04-03 18:06       ` Julien Grall
  2017-04-03 17:39     ` Oleksandr Tyshchenko
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2017-04-03 17:16 UTC (permalink / raw)
  To: Julien Grall, Oleksandr Tyshchenko, xen-devel
  Cc: Stefano Stabellini, Oleksandr Andrushchenko, Andrii Anisov,
	Volodymyr Babchuk, al1img, Artem Mygaiev

On 03/04/17 18:02, Julien Grall wrote:
> Hi Andrew,
>
> On 03/04/17 17:42, Andrew Cooper wrote:
>> On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
>>> Hi, all.
>>>
>>> Playing with non-shared IOMMU in Xen on ARM I faced one interesting
>>> thing. I found out that the superpages were shattered during domain
>>> life cycle.
>>> This is the result of mapping of foreign pages, ballooning memory,
>>> even if domain maps Xen shared pages, etc.
>>> I don't bother with the memory fragmentation at the moment. But,
>>> shattering bothers me from the IOMMU point of view.
>>> As the Xen owns IOMMU it might manipulate IOMMU page tables when
>>> passthoughed/protected device doing DMA in Linux. It is hard to detect
>>> when the DMA transaction isn't in progress
>>> in order to prevent this race. So, if we have inflight transaction
>>> from a device when changing IOMMU mapping we might get into trouble.
>>> Unfortunately, not in all the cases the
>>> faulting transaction can be restarted. The chance to hit the problem
>>> increases during shattering.
>>>
>>> I did next test:
>>> The dom0 on my setup contains ethernet IP that are protected by IOMMU.
>>> What is more, as the IOMMU I am playing with supports superpages (2M,
>>> 1G) the IOMMU driver
>>> takes into account these capabilities when building page tables. As I
>>> gave 256 MB for dom0, the IOMMU mapping was built by 2M memory blocks
>>> only. As I am using NFS for both dom0 and domU the ethernet IP
>>> performs DMA transactions almost all the time.
>>> Sometimes, I see the IOMMU page faults during creating guest domain. I
>>> think, it happens during Xen is shattering 2M mappings 4K mappings (it
>>> unmaps dom0 pages by one 4K page at a time, then maps domU pages there
>>> for copying domU images).
>>> But, I don't see any page faults when the IOMMU page table was built
>>> by 4K pages only.
>>>
>>> I had a talk with Julien on IIRC and we came to conclusion that the
>>> safest way would be to use 4K pages to prevent shattering, so the
>>> IOMMU shouldn't report superpage capability.
>>> On the other hand, if we build IOMMU from 4K pages we will have
>>> performance drop (during building, walking page tables), TLB pressure,
>>> etc.
>>> Another possible solution Julien was suggesting is to always
>>> ballooning with 2M, 1G, and not using 4K. That would help us to
>>> prevent shattering effect.
>>> The discussion was moved to the ML since it seems to be a generic
>>> issue and the right solution should be think of.
>>>
>>> What do you think is the right way to follow? Use 4K pages and don't
>>> bother with shattering or try to optimize? And if the idea to make
>>> balloon mechanism smarter makes sense how to teach balloon to do so?
>>> Thank you.
>>
>> Ballooning and foreign mappings are terrible for trying to retain
>> superpage mappings.  No OS, not even Linux, can sensibly provide victim
>> pages in a useful way to avoid shattering.
>>
>> If you care about performance, don't ever balloon.  Foreign mappings in
>> translated guests should start from the top of RAM, and work upwards.
>
> I am not sure to understand this. Can you extend?

I am not sure what is unclear.  Handing random frames of RAM back to the
hypervisor is what exacerbates host superpage fragmentation, and all
balloon drivers currently do it.

If you want to avoid host superpage fragmentation, don't use a
scattergun approach of handing frames back to Xen.  However, because
even Linux doesn't provide enough hooks into the physical memory
management logic, the only solution is to not balloon at all, and to use
already-unoccupied frames for foreign mappings.

>
>>
>>
>> As for the IOMMU specifically, things are rather easier.  It is the
>> guests responsibility to ensure that frames offered up for ballooning or
>> foreign mappings are unused.  Therefore, if anything cares about the
>> specific 4K region becoming non-present in the IOMMU mappings, it is the
>> guest kernels fault for offering up a frame already in use.
>>
>> For the shattering however, It is Xen's responsibility to ensure that
>> all other mappings stay valid at all points.  The correct way to do this
>> is to construct a new L1 table, mirroring the L2 superpage but lacking
>> the specific 4K mapping in question, then atomically replace the L2
>> superpage entry with the new L1 table, then issue an IOMMU TLB
>> invalidation to remove any cached mappings.
>>
>> By following that procedure, all DMA within the 2M region, but not
>> hitting the 4K frame, won't observe any interim lack of mappings.  It
>> appears from your description that Xen isn't following the procedure.
>
> Xen is following what's the ARM ARM is mandating. For shattering page
> table, we have to follow the break-before-sequence i.e:
>     - Invalidate the L2 entry
>     - Flush the TLBs
>     - Add the new L1 table
> See D4-1816 in ARM DDI 0487A.k_iss10775 for details. So we end up in a
> small window where there are no valid mapping. It is easy to trap data
> abort from processor and restarting it but not for device memory
> transactions.
>
> Xen by default is sharing stage-2 page tables with between the IOMMU
> and the MMU. However, from the discussion I had with Oleksandr, they
> are not sharing page tables and still see the problem. I am not sure
> how they are updating the page table here. Oleksandr, can you provide
> more details?

Are you saying that ARM has no way of making atomic updates to the IOMMU
mappings?  (How do I get access to that document?  Google gets me to
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.architecture.reference/index.html,
but
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k/index.html
which looks like the document you specified results in 404.)

If so, that is an architecture bug IMO.  By design, the IOMMU is out of
control of guest software, and the hypervisor should be able to make
atomic modifications without guest cooperation.

~Andrew


* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-03 17:02   ` Julien Grall
  2017-04-03 17:16     ` Andrew Cooper
@ 2017-04-03 17:39     ` Oleksandr Tyshchenko
  2017-04-03 17:53       ` Oleksandr Tyshchenko
  1 sibling, 1 reply; 15+ messages in thread
From: Oleksandr Tyshchenko @ 2017-04-03 17:39 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Oleksandr Andrushchenko, Andrew Cooper,
	Andrii Anisov, Volodymyr Babchuk, al1img, xen-devel,
	Artem Mygaiev

Hi, Julien.

On Mon, Apr 3, 2017 at 8:02 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi Andrew,
>
>
> On 03/04/17 17:42, Andrew Cooper wrote:
>>
>> On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
>>>
>>> Hi, all.
>>>
>>> Playing with non-shared IOMMU in Xen on ARM I faced one interesting
>>> thing. I found out that the superpages were shattered during domain
>>> life cycle.
>>> This is the result of mapping of foreign pages, ballooning memory,
>>> even if domain maps Xen shared pages, etc.
>>> I don't bother with the memory fragmentation at the moment. But,
>>> shattering bothers me from the IOMMU point of view.
>>> As the Xen owns IOMMU it might manipulate IOMMU page tables when
>>> passthoughed/protected device doing DMA in Linux. It is hard to detect
>>> when the DMA transaction isn't in progress
>>> in order to prevent this race. So, if we have inflight transaction
>>> from a device when changing IOMMU mapping we might get into trouble.
>>> Unfortunately, not in all the cases the
>>> faulting transaction can be restarted. The chance to hit the problem
>>> increases during shattering.
>>>
>>> I did next test:
>>> The dom0 on my setup contains ethernet IP that are protected by IOMMU.
>>> What is more, as the IOMMU I am playing with supports superpages (2M,
>>> 1G) the IOMMU driver
>>> takes into account these capabilities when building page tables. As I
>>> gave 256 MB for dom0, the IOMMU mapping was built by 2M memory blocks
>>> only. As I am using NFS for both dom0 and domU the ethernet IP
>>> performs DMA transactions almost all the time.
>>> Sometimes, I see the IOMMU page faults during creating guest domain. I
>>> think, it happens during Xen is shattering 2M mappings 4K mappings (it
>>> unmaps dom0 pages by one 4K page at a time, then maps domU pages there
>>> for copying domU images).
>>> But, I don't see any page faults when the IOMMU page table was built
>>> by 4K pages only.
>>>
>>> I had a talk with Julien on IIRC and we came to conclusion that the
>>> safest way would be to use 4K pages to prevent shattering, so the
>>> IOMMU shouldn't report superpage capability.
>>> On the other hand, if we build IOMMU from 4K pages we will have
>>> performance drop (during building, walking page tables), TLB pressure,
>>> etc.
>>> Another possible solution Julien was suggesting is to always
>>> ballooning with 2M, 1G, and not using 4K. That would help us to
>>> prevent shattering effect.
>>> The discussion was moved to the ML since it seems to be a generic
>>> issue and the right solution should be think of.
>>>
>>> What do you think is the right way to follow? Use 4K pages and don't
>>> bother with shattering or try to optimize? And if the idea to make
>>> balloon mechanism smarter makes sense how to teach balloon to do so?
>>> Thank you.
>>
>>
>> Ballooning and foreign mappings are terrible for trying to retain
>> superpage mappings.  No OS, not even Linux, can sensibly provide victim
>> pages in a useful way to avoid shattering.
>>
>> If you care about performance, don't ever balloon.  Foreign mappings in
>> translated guests should start from the top of RAM, and work upwards.
>
>
> I am not sure to understand this. Can you extend?
>
>>
>>
>> As for the IOMMU specifically, things are rather easier.  It is the
>> guests responsibility to ensure that frames offered up for ballooning or
>> foreign mappings are unused.  Therefore, if anything cares about the
>> specific 4K region becoming non-present in the IOMMU mappings, it is the
>> guest kernels fault for offering up a frame already in use.
>>
>> For the shattering however, It is Xen's responsibility to ensure that
>> all other mappings stay valid at all points.  The correct way to do this
>> is to construct a new L1 table, mirroring the L2 superpage but lacking
>> the specific 4K mapping in question, then atomically replace the L2
>> superpage entry with the new L1 table, then issue an IOMMU TLB
>> invalidation to remove any cached mappings.
>>
>> By following that procedure, all DMA within the 2M region, but not
>> hitting the 4K frame, won't observe any interim lack of mappings.  It
>> appears from your description that Xen isn't following the procedure.
>
>
> Xen is following what's the ARM ARM is mandating. For shattering page table,
> we have to follow the break-before-sequence i.e:
>         - Invalidate the L2 entry
>         - Flush the TLBs
>         - Add the new L1 table
>
> See D4-1816 in ARM DDI 0487A.k_iss10775 for details. So we end up in a small
> window where there are no valid mapping. It is easy to trap data abort from
> processor and restarting it but not for device memory transactions.
>
> Xen by default is sharing stage-2 page tables with between the IOMMU and the
> MMU. However, from the discussion I had with Oleksandr, they are not sharing
> page tables and still see the problem. I am not sure how they are updating
> the page table here. Oleksandr, can you provide more details?

Yes, the IOMMU is an IPMMU-VMSA that doesn't share page tables with
the CPU; it uses a stage-1 page table. So, the IOMMU driver builds its
own page table and feeds it to the HW IP.
For this reason I ported the ARM LPAE page table allocator [1] and
modified it to work inside Xen [2]. I hope that I didn't break or
change the allocation/updating/removing logic.

What happens during shattering:
In order to unmap a single 4K page within a 2M memory block, the IOMMU
driver has to split the memory block into 512 4K pages and map them at
the next level, except the one page we are trying to unmap; then it
has to replace the old block entry with the new table entry and
perform a cache flush.

[1] http://lxr.free-electrons.com/source/drivers/iommu/io-pgtable-arm.c
[2] https://github.com/otyshchenko1/xen/blob/ipmmu_ml/xen/drivers/passthrough/arm/io-pgtable-arm.c
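[Editor's sketch of the split path just described, as a user-space
simulation with simplified, hypothetical entry encodings; the real
io-pgtable-arm code is considerably more involved. The point of the
ordering shown is that the old block entry is only replaced after the
new level has been fully populated, then the cache is flushed.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PTES_PER_TABLE 512
#define PAGE_SIZE      0x1000ULL
#define BLOCK_SIZE     (PTES_PER_TABLE * PAGE_SIZE)   /* 2M */
#define PTE_VALID      (1ULL << 0)
#define PTE_TABLE      (1ULL << 1)

typedef uint64_t pte_t;

/* Stand-in for the page-table cache maintenance after the update. */
static void flush_pt_cache(pte_t *table) { (void)table; }

/* Unmap one 4K page inside a 2M block: allocate a next-level table,
 * map all 512 slots except the target, then replace the block entry
 * with a table entry and flush. */
static pte_t *split_block_unmap(pte_t *l2_entry, uint64_t iova)
{
    uint64_t base = *l2_entry & ~(BLOCK_SIZE - 1);
    unsigned int target = (unsigned int)((iova & (BLOCK_SIZE - 1)) / PAGE_SIZE);
    pte_t *tbl = calloc(PTES_PER_TABLE, sizeof(pte_t));

    for (unsigned int i = 0; i < PTES_PER_TABLE; i++)
        if (i != target)
            tbl[i] = (base + i * PAGE_SIZE) | PTE_VALID;

    *l2_entry = (pte_t)(uintptr_t)tbl | PTE_TABLE | PTE_VALID;
    flush_pt_cache(tbl);
    return tbl;
}
```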

>
> Cheers,
>
> --
> Julien Grall



-- 
Regards,

Oleksandr Tyshchenko


* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-03 17:39     ` Oleksandr Tyshchenko
@ 2017-04-03 17:53       ` Oleksandr Tyshchenko
  0 siblings, 0 replies; 15+ messages in thread
From: Oleksandr Tyshchenko @ 2017-04-03 17:53 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Oleksandr Andrushchenko, Andrew Cooper,
	Andrii Anisov, Volodymyr Babchuk, al1img, xen-devel,
	Artem Mygaiev

On Mon, Apr 3, 2017 at 8:39 PM, Oleksandr Tyshchenko
<olekstysh@gmail.com> wrote:
> Hi, Julien.
>
> On Mon, Apr 3, 2017 at 8:02 PM, Julien Grall <julien.grall@arm.com> wrote:
>> Hi Andrew,
>>
>>
>> On 03/04/17 17:42, Andrew Cooper wrote:
>>>
>>> On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
>>>>
>>>> Hi, all.
>>>>
>>>> Playing with non-shared IOMMU in Xen on ARM I faced one interesting
>>>> thing. I found out that the superpages were shattered during domain
>>>> life cycle.
>>>> This is the result of mapping of foreign pages, ballooning memory,
>>>> even if domain maps Xen shared pages, etc.
>>>> I don't bother with the memory fragmentation at the moment. But,
>>>> shattering bothers me from the IOMMU point of view.
>>>> As the Xen owns IOMMU it might manipulate IOMMU page tables when
>>>> passthoughed/protected device doing DMA in Linux. It is hard to detect
>>>> when the DMA transaction isn't in progress
>>>> in order to prevent this race. So, if we have inflight transaction
>>>> from a device when changing IOMMU mapping we might get into trouble.
>>>> Unfortunately, not in all the cases the
>>>> faulting transaction can be restarted. The chance to hit the problem
>>>> increases during shattering.
>>>>
>>>> I did next test:
>>>> The dom0 on my setup contains ethernet IP that are protected by IOMMU.
>>>> What is more, as the IOMMU I am playing with supports superpages (2M,
>>>> 1G) the IOMMU driver
>>>> takes into account these capabilities when building page tables. As I
>>>> gave 256 MB for dom0, the IOMMU mapping was built by 2M memory blocks
>>>> only. As I am using NFS for both dom0 and domU the ethernet IP
>>>> performs DMA transactions almost all the time.
>>>> Sometimes, I see the IOMMU page faults during creating guest domain. I
>>>> think, it happens during Xen is shattering 2M mappings 4K mappings (it
>>>> unmaps dom0 pages by one 4K page at a time, then maps domU pages there
>>>> for copying domU images).
>>>> But, I don't see any page faults when the IOMMU page table was built
>>>> by 4K pages only.
>>>>
>>>> I had a talk with Julien on IIRC and we came to conclusion that the
>>>> safest way would be to use 4K pages to prevent shattering, so the
>>>> IOMMU shouldn't report superpage capability.
>>>> On the other hand, if we build IOMMU from 4K pages we will have
>>>> performance drop (during building, walking page tables), TLB pressure,
>>>> etc.
>>>> Another possible solution Julien was suggesting is to always
>>>> ballooning with 2M, 1G, and not using 4K. That would help us to
>>>> prevent shattering effect.
>>>> The discussion was moved to the ML since it seems to be a generic
>>>> issue and the right solution should be think of.
>>>>
>>>> What do you think is the right way to follow? Use 4K pages and don't
>>>> bother with shattering or try to optimize? And if the idea to make
>>>> balloon mechanism smarter makes sense how to teach balloon to do so?
>>>> Thank you.
>>>
>>>
>>> Ballooning and foreign mappings are terrible for trying to retain
>>> superpage mappings.  No OS, not even Linux, can sensibly provide victim
>>> pages in a useful way to avoid shattering.
>>>
>>> If you care about performance, don't ever balloon.  Foreign mappings in
>>> translated guests should start from the top of RAM, and work upwards.
>>
>>
>> I am not sure to understand this. Can you extend?
>>
>>>
>>>
>>> As for the IOMMU specifically, things are rather easier.  It is the
>>> guests responsibility to ensure that frames offered up for ballooning or
>>> foreign mappings are unused.  Therefore, if anything cares about the
>>> specific 4K region becoming non-present in the IOMMU mappings, it is the
>>> guest kernels fault for offering up a frame already in use.
>>>
>>> For the shattering however, It is Xen's responsibility to ensure that
>>> all other mappings stay valid at all points.  The correct way to do this
>>> is to construct a new L1 table, mirroring the L2 superpage but lacking
>>> the specific 4K mapping in question, then atomically replace the L2
>>> superpage entry with the new L1 table, then issue an IOMMU TLB
>>> invalidation to remove any cached mappings.
>>>
>>> By following that procedure, all DMA within the 2M region, but not
>>> hitting the 4K frame, won't observe any interim lack of mappings.  It
>>> appears from your description that Xen isn't following the procedure.
>>
>>
>> Xen is following what's the ARM ARM is mandating. For shattering page table,
>> we have to follow the break-before-sequence i.e:
>>         - Invalidate the L2 entry
>>         - Flush the TLBs
>>         - Add the new L1 table
>>
>> See D4-1816 in ARM DDI 0487A.k_iss10775 for details. So we end up in a small
>> window where there are no valid mapping. It is easy to trap data abort from
>> processor and restarting it but not for device memory transactions.
>>
>> Xen by default is sharing stage-2 page tables with between the IOMMU and the
>> MMU. However, from the discussion I had with Oleksandr, they are not sharing
>> page tables and still see the problem. I am not sure how they are updating
>> the page table here. Oleksandr, can you provide more details?
>
> Yes, the IOMMU is an IPMMU-VMSA that doesn't share page tables with the
> CPU. It uses a stage-1 page table. So, the IOMMU driver builds its own page
> table and feeds it to the HW IP.
> For this reason I ported the ARM LPAE page table allocator [1] and
> modified it to work inside Xen [2]. I hope that I didn't break or
> change the allocation/updating/removing logic.
>
> What happens during shattering:
> In order to unmap a single 4K page within a 2M memory block, the IOMMU
> driver has to split the memory block into 512 4K pages
> and map them at the next level, except the one page we are trying to
> unmap; then it has to replace the old block entry with the new table
> entry and perform a cache flush.
>
> [1] http://lxr.free-electrons.com/source/drivers/iommu/io-pgtable-arm.c
> [2] https://github.com/otyshchenko1/xen/blob/ipmmu_ml/xen/drivers/passthrough/arm/io-pgtable-arm.c
>
>>
>> Cheers,
>>
>> --
>> Julien Grall
>
>
>
> --
> Regards,
>
> Oleksandr Tyshchenko

Julien,

Did you mean how I update the IOMMU mappings from the P2M code?
If so, here it is:

https://www.mail-archive.com/xen-devel@lists.xen.org/msg100470.html

As the IOMMU supports the same page granularities as the CPU on ARM (1G, 2M,
4K), I think the IOMMU stage-1 page table ends up containing the
same superpages as the CPU stage-2 page table.

-- 
Regards,

Oleksandr Tyshchenko

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-03 17:16     ` Andrew Cooper
@ 2017-04-03 18:06       ` Julien Grall
  2017-04-03 18:21         ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 15+ messages in thread
From: Julien Grall @ 2017-04-03 18:06 UTC (permalink / raw)
  To: Andrew Cooper, Oleksandr Tyshchenko, xen-devel
  Cc: Stefano Stabellini, Oleksandr Andrushchenko, Andrii Anisov,
	Volodymyr Babchuk, al1img, Artem Mygaiev

Hi Andrew,

On 03/04/17 18:16, Andrew Cooper wrote:
> On 03/04/17 18:02, Julien Grall wrote:
>> Hi Andrew,
>>
>> On 03/04/17 17:42, Andrew Cooper wrote:
>>> On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
>>>> Hi, all.
>>>>
>>>> Playing with a non-shared IOMMU in Xen on ARM, I came across one interesting
>>>> thing: I found that superpages were being shattered during the domain
>>>> life cycle.
>>>> This is the result of mapping foreign pages, ballooning memory,
>>>> a domain mapping Xen shared pages, etc.
>>>> I am not bothered by the memory fragmentation at the moment, but
>>>> shattering bothers me from the IOMMU point of view.
>>>> As Xen owns the IOMMU, it might manipulate the IOMMU page tables while a
>>>> passthrough/protected device is doing DMA in Linux. It is hard to detect
>>>> when no DMA transaction is in progress
>>>> in order to prevent this race. So, if we have an in-flight transaction
>>>> from a device when changing an IOMMU mapping, we might get into trouble.
>>>> Unfortunately, the faulting
>>>> transaction cannot be restarted in all cases. The chance of hitting the
>>>> problem increases during shattering.
>>>>
>>>> I ran the following test:
>>>> The dom0 on my setup contains an ethernet IP that is protected by the IOMMU.
>>>> What is more, as the IOMMU I am playing with supports superpages (2M,
>>>> 1G), the IOMMU driver
>>>> takes these capabilities into account when building page tables. As I
>>>> gave 256 MB to dom0, the IOMMU mapping was built from 2M memory blocks
>>>> only. As I am using NFS for both dom0 and domU, the ethernet IP
>>>> performs DMA transactions almost all the time.
>>>> Sometimes, I see IOMMU page faults while creating a guest domain. I
>>>> think this happens while Xen is shattering 2M mappings into 4K mappings (it
>>>> unmaps dom0 pages one 4K page at a time, then maps domU pages there
>>>> for copying domU images).
>>>> But I don't see any page faults when the IOMMU page table was built
>>>> from 4K pages only.
>>>>
>>>> I had a talk with Julien on IRC and we came to the conclusion that the
>>>> safest way would be to use 4K pages to prevent shattering, so the
>>>> IOMMU shouldn't report the superpage capability.
>>>> On the other hand, if we build the IOMMU tables from 4K pages we will have
>>>> a performance drop (building and walking page tables), TLB pressure,
>>>> etc.
>>>> Another possible solution Julien suggested is to always
>>>> balloon with 2M or 1G pages, never 4K. That would help us
>>>> prevent the shattering effect.
>>>> The discussion was moved to the ML since it seems to be a generic
>>>> issue and the right solution should be thought through.
>>>>
>>>> What do you think is the right way forward? Use 4K pages and not
>>>> bother with shattering, or try to optimize? And if the idea of making
>>>> the balloon mechanism smarter makes sense, how do we teach the balloon to do so?
>>>> Thank you.
>>>
>>> Ballooning and foreign mappings are terrible for trying to retain
>>> superpage mappings.  No OS, not even Linux, can sensibly provide victim
>>> pages in a useful way to avoid shattering.
>>>
>>> If you care about performance, don't ever balloon.  Foreign mappings in
>>> translated guests should start from the top of RAM, and work upwards.
>>
>> I am not sure to understand this. Can you extend?
>
> I am not sure what is unclear.  Handing random frames of RAM back to the
> hypervisor is what exacerbates host superpage fragmentation, and all
> balloon drivers currently do it.
>
> If you want to avoid host superpage fragmentation, don't use a
> scattergun approach of handing frames back to Xen.  However, because
> even Linux doesn't provide enough hooks into the physical memory
> management logic, the only solution is to not balloon at all, and to use
> already-unoccupied frames for foreign mappings.

Do you have any pointers to the Linux code?

>
>>
>>>
>>>
>>> As for the IOMMU specifically, things are rather easier.  It is the
>>> guest's responsibility to ensure that frames offered up for ballooning or
>>> foreign mappings are unused.  Therefore, if anything cares about the
>>> specific 4K region becoming non-present in the IOMMU mappings, it is the
>>> guest kernel's fault for offering up a frame already in use.
>>>
>>> For the shattering, however, it is Xen's responsibility to ensure that
>>> all other mappings stay valid at all points.  The correct way to do this
>>> is to construct a new L1 table, mirroring the L2 superpage but lacking
>>> the specific 4K mapping in question, then atomically replace the L2
>>> superpage entry with the new L1 table, then issue an IOMMU TLB
>>> invalidation to remove any cached mappings.
>>>
>>> By following that procedure, all DMA within the 2M region, but not
>>> hitting the 4K frame, won't observe any interim lack of mappings.  It
>>> appears from your description that Xen isn't following the procedure.
>>
>> Xen is following what the ARM ARM mandates. For shattering page
>> tables, we have to follow the break-before-make sequence, i.e.:
>>     - Invalidate the L2 entry
>>     - Flush the TLBs
>>     - Add the new L1 table
>> See D4-1816 in ARM DDI 0487A.k_iss10775 for details. So we end up in a
>> small window where there is no valid mapping. It is easy to trap a data
>> abort from the processor and restart it, but not a device memory
>> transaction.
>>
>> Xen by default shares stage-2 page tables between the IOMMU
>> and the MMU. However, from the discussion I had with Oleksandr, they
>> are not sharing page tables and still see the problem. I am not sure
>> how they are updating the page table here. Oleksandr, can you provide
>> more details?
>
> Are you saying that ARM has no way of making atomic updates to the IOMMU
> mappings?  (How do I get access to that document?  Google gets me to
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.architecture.reference/index.html,
> but
> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k/index.html
> which looks like the document you specified results in 404.)

Here is the link; I am not sure why Google does not surface it:

http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k_10775/index.html

>
> If so, that is an architecture bug IMO.  By design, the IOMMU is out of
> control of guest software, and the hypervisor should be able to make
> atomic modifications without guest cooperation.

I think you misread what I meant: the IOMMU supports atomic operations.
However, if you share the page tables, we have to apply Break-Before-Make
when shattering a superpage. This is mandatory if you want to get Xen
running on all the micro-architectures.

Some IOMMUs may cope with BBM, some may not. I haven't seen any issue so
far (which does not mean there are none).

The IOMMU used by Oleksandr (the IPMMU-VMSA) is an IP from Renesas which
I have never used myself. In his case he needs separate page tables because
the layouts are not the same.

Oleksandr, looking at the code you provided, the superpages are split
the way Andrew said, i.e.:
	1) allocate a level-3 table minus the 4K mapping
	2) replace the level-2 entry with the new table

Am I right?

Cheers,

-- 
Julien Grall


* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-03 18:06       ` Julien Grall
@ 2017-04-03 18:21         ` Oleksandr Tyshchenko
  2017-04-03 20:33           ` Stefano Stabellini
  0 siblings, 1 reply; 15+ messages in thread
From: Oleksandr Tyshchenko @ 2017-04-03 18:21 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Oleksandr Andrushchenko, Andrew Cooper,
	Andrii Anisov, Volodymyr Babchuk, al1img, xen-devel,
	Artem Mygaiev

On Mon, Apr 3, 2017 at 9:06 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi Andrew,
>
>
> On 03/04/17 18:16, Andrew Cooper wrote:
>>
>> On 03/04/17 18:02, Julien Grall wrote:
>>>
>>> Hi Andrew,
>>>
>>> On 03/04/17 17:42, Andrew Cooper wrote:
>>>>
>>>> On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
>>>>>
>>>>> Hi, all.
>>>>>
>>>>> Playing with a non-shared IOMMU in Xen on ARM, I came across one interesting
>>>>> thing: I found that superpages were being shattered during the domain
>>>>> life cycle.
>>>>> This is the result of mapping foreign pages, ballooning memory,
>>>>> a domain mapping Xen shared pages, etc.
>>>>> I am not bothered by the memory fragmentation at the moment, but
>>>>> shattering bothers me from the IOMMU point of view.
>>>>> As Xen owns the IOMMU, it might manipulate the IOMMU page tables while a
>>>>> passthrough/protected device is doing DMA in Linux. It is hard to detect
>>>>> when no DMA transaction is in progress
>>>>> in order to prevent this race. So, if we have an in-flight transaction
>>>>> from a device when changing an IOMMU mapping, we might get into trouble.
>>>>> Unfortunately, the faulting
>>>>> transaction cannot be restarted in all cases. The chance of hitting the
>>>>> problem increases during shattering.
>>>>>
>>>>> I ran the following test:
>>>>> The dom0 on my setup contains an ethernet IP that is protected by the IOMMU.
>>>>> What is more, as the IOMMU I am playing with supports superpages (2M,
>>>>> 1G), the IOMMU driver
>>>>> takes these capabilities into account when building page tables. As I
>>>>> gave 256 MB to dom0, the IOMMU mapping was built from 2M memory blocks
>>>>> only. As I am using NFS for both dom0 and domU, the ethernet IP
>>>>> performs DMA transactions almost all the time.
>>>>> Sometimes, I see IOMMU page faults while creating a guest domain. I
>>>>> think this happens while Xen is shattering 2M mappings into 4K mappings (it
>>>>> unmaps dom0 pages one 4K page at a time, then maps domU pages there
>>>>> for copying domU images).
>>>>> But I don't see any page faults when the IOMMU page table was built
>>>>> from 4K pages only.
>>>>>
>>>>> I had a talk with Julien on IRC and we came to the conclusion that the
>>>>> safest way would be to use 4K pages to prevent shattering, so the
>>>>> IOMMU shouldn't report the superpage capability.
>>>>> On the other hand, if we build the IOMMU tables from 4K pages we will have
>>>>> a performance drop (building and walking page tables), TLB pressure,
>>>>> etc.
>>>>> Another possible solution Julien suggested is to always
>>>>> balloon with 2M or 1G pages, never 4K. That would help us
>>>>> prevent the shattering effect.
>>>>> The discussion was moved to the ML since it seems to be a generic
>>>>> issue and the right solution should be thought through.
>>>>>
>>>>> What do you think is the right way forward? Use 4K pages and not
>>>>> bother with shattering, or try to optimize? And if the idea of making
>>>>> the balloon mechanism smarter makes sense, how do we teach the balloon to do so?
>>>>> Thank you.
>>>>
>>>>
>>>> Ballooning and foreign mappings are terrible for trying to retain
>>>> superpage mappings.  No OS, not even Linux, can sensibly provide victim
>>>> pages in a useful way to avoid shattering.
>>>>
>>>> If you care about performance, don't ever balloon.  Foreign mappings in
>>>> translated guests should start from the top of RAM, and work upwards.
>>>
>>>
>>> I am not sure to understand this. Can you extend?
>>
>>
>> I am not sure what is unclear.  Handing random frames of RAM back to the
>> hypervisor is what exacerbates host superpage fragmentation, and all
>> balloon drivers currently do it.
>>
>> If you want to avoid host superpage fragmentation, don't use a
>> scattergun approach of handing frames back to Xen.  However, because
>> even Linux doesn't provide enough hooks into the physical memory
>> management logic, the only solution is to not balloon at all, and to use
>> already-unoccupied frames for foreign mappings.
>
>
> Do you have any pointer in the Linux code?
>
>
>>
>>>
>>>>
>>>>
>>>> As for the IOMMU specifically, things are rather easier.  It is the
>>>> guest's responsibility to ensure that frames offered up for ballooning or
>>>> foreign mappings are unused.  Therefore, if anything cares about the
>>>> specific 4K region becoming non-present in the IOMMU mappings, it is the
>>>> guest kernel's fault for offering up a frame already in use.
>>>>
>>>> For the shattering, however, it is Xen's responsibility to ensure that
>>>> all other mappings stay valid at all points.  The correct way to do this
>>>> is to construct a new L1 table, mirroring the L2 superpage but lacking
>>>> the specific 4K mapping in question, then atomically replace the L2
>>>> superpage entry with the new L1 table, then issue an IOMMU TLB
>>>> invalidation to remove any cached mappings.
>>>>
>>>> By following that procedure, all DMA within the 2M region, but not
>>>> hitting the 4K frame, won't observe any interim lack of mappings.  It
>>>> appears from your description that Xen isn't following the procedure.
>>>
>>>
>>> Xen is following what the ARM ARM mandates. For shattering page
>>> tables, we have to follow the break-before-make sequence, i.e.:
>>>     - Invalidate the L2 entry
>>>     - Flush the TLBs
>>>     - Add the new L1 table
>>> See D4-1816 in ARM DDI 0487A.k_iss10775 for details. So we end up in a
>>> small window where there is no valid mapping. It is easy to trap a data
>>> abort from the processor and restart it, but not a device memory
>>> transaction.
>>>
>>> Xen by default shares stage-2 page tables between the IOMMU
>>> and the MMU. However, from the discussion I had with Oleksandr, they
>>> are not sharing page tables and still see the problem. I am not sure
>>> how they are updating the page table here. Oleksandr, can you provide
>>> more details?
>>
>>
>> Are you saying that ARM has no way of making atomic updates to the IOMMU
>> mappings?  (How do I get access to that document?  Google gets me to
>>
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.architecture.reference/index.html,
>> but
>> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k/index.html
>> which looks like the document you specified results in 404.)
>
>
> Here is the link; I am not sure why Google does not surface it:
>
> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k_10775/index.html
>
>>
>> If so, that is an architecture bug IMO.  By design, the IOMMU is out of
>> control of guest software, and the hypervisor should be able to make
>> atomic modifications without guest cooperation.
>
>
> I think you misread what I meant: the IOMMU supports atomic operations. However,
> if you share the page tables we have to apply Break-Before-Make when
> shattering a superpage. This is mandatory if you want to get Xen running on
> all the micro-architectures.
>
> Some IOMMUs may cope with BBM, some may not. I haven't seen any issue so far
> (which does not mean there are none).
>
> The IOMMU used by Oleksandr (the IPMMU-VMSA) is an IP from Renesas which I
> have never used myself. In his case he needs separate page tables because the
> layouts are not the same.
>
> Oleksandr, looking at the code you provided, the superpages are split the
> way Andrew said, i.e.:
>         1) allocate a level-3 table minus the 4K mapping
>         2) replace the level-2 entry with the new table
>
> Am I right?

It seems so, yes. Walking down the page table when trying to unmap, we
hit a leaf entry (the 2M mapping),
so mappings for the rest of the 2M block are inserted at the next level
and after that the page table entry is replaced.

>
> Cheers,
>
> --
> Julien Grall



-- 
Regards,

Oleksandr Tyshchenko


* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-03 16:42 ` Andrew Cooper
  2017-04-03 17:02   ` Julien Grall
@ 2017-04-03 18:26   ` Oleksandr Tyshchenko
  1 sibling, 0 replies; 15+ messages in thread
From: Oleksandr Tyshchenko @ 2017-04-03 18:26 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Oleksandr Andrushchenko, Julien Grall,
	Andrii Anisov, Volodymyr Babchuk, al1img, xen-devel,
	Artem Mygaiev

Hi, Andrew

On Mon, Apr 3, 2017 at 7:42 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
>> Hi, all.
>>
>> Playing with a non-shared IOMMU in Xen on ARM, I came across one interesting
>> thing: I found that superpages were being shattered during the domain
>> life cycle.
>> This is the result of mapping foreign pages, ballooning memory,
>> a domain mapping Xen shared pages, etc.
>> I am not bothered by the memory fragmentation at the moment, but
>> shattering bothers me from the IOMMU point of view.
>> As Xen owns the IOMMU, it might manipulate the IOMMU page tables while a
>> passthrough/protected device is doing DMA in Linux. It is hard to detect
>> when no DMA transaction is in progress
>> in order to prevent this race. So, if we have an in-flight transaction
>> from a device when changing an IOMMU mapping, we might get into trouble.
>> Unfortunately, the faulting
>> transaction cannot be restarted in all cases. The chance of hitting the
>> problem increases during shattering.
>>
>> I ran the following test:
>> The dom0 on my setup contains an ethernet IP that is protected by the IOMMU.
>> What is more, as the IOMMU I am playing with supports superpages (2M,
>> 1G), the IOMMU driver
>> takes these capabilities into account when building page tables. As I
>> gave 256 MB to dom0, the IOMMU mapping was built from 2M memory blocks
>> only. As I am using NFS for both dom0 and domU, the ethernet IP
>> performs DMA transactions almost all the time.
>> Sometimes, I see IOMMU page faults while creating a guest domain. I
>> think this happens while Xen is shattering 2M mappings into 4K mappings (it
>> unmaps dom0 pages one 4K page at a time, then maps domU pages there
>> for copying domU images).
>> But I don't see any page faults when the IOMMU page table was built
>> from 4K pages only.
>>
>> I had a talk with Julien on IRC and we came to the conclusion that the
>> safest way would be to use 4K pages to prevent shattering, so the
>> IOMMU shouldn't report the superpage capability.
>> On the other hand, if we build the IOMMU tables from 4K pages we will have
>> a performance drop (building and walking page tables), TLB pressure,
>> etc.
>> Another possible solution Julien suggested is to always
>> balloon with 2M or 1G pages, never 4K. That would help us
>> prevent the shattering effect.
>> The discussion was moved to the ML since it seems to be a generic
>> issue and the right solution should be thought through.
>>
>> What do you think is the right way forward? Use 4K pages and not
>> bother with shattering, or try to optimize? And if the idea of making
>> the balloon mechanism smarter makes sense, how do we teach the balloon to do so?
>> Thank you.
>
> Ballooning and foreign mappings are terrible for trying to retain
> superpage mappings.  No OS, not even Linux, can sensibly provide victim
> pages in a useful way to avoid shattering.
>
> If you care about performance, don't ever balloon.  Foreign mappings in
> translated guests should start from the top of RAM, and work upwards.

I understand the point about disabling the ballooning mechanism. I will keep it in mind.

>
>
> As for the IOMMU specifically, things are rather easier.  It is the
> guest's responsibility to ensure that frames offered up for ballooning or
> foreign mappings are unused.  Therefore, if anything cares about the
> specific 4K region becoming non-present in the IOMMU mappings, it is the
> guest kernel's fault for offering up a frame already in use.
>
> For the shattering, however, it is Xen's responsibility to ensure that
> all other mappings stay valid at all points.  The correct way to do this
> is to construct a new L1 table, mirroring the L2 superpage but lacking
> the specific 4K mapping in question, then atomically replace the L2
> superpage entry with the new L1 table, then issue an IOMMU TLB
> invalidation to remove any cached mappings.

I think I do almost the same.

>
> By following that procedure, all DMA within the 2M region, but not
> hitting the 4K frame, won't observe any interim lack of mappings.  It
> appears from your description that Xen isn't following the procedure.
>
> ~Andrew

Thank you.

-- 
Regards,

Oleksandr Tyshchenko


* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-03 18:21         ` Oleksandr Tyshchenko
@ 2017-04-03 20:33           ` Stefano Stabellini
  2017-04-04  9:28             ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 15+ messages in thread
From: Stefano Stabellini @ 2017-04-03 20:33 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, Oleksandr Andrushchenko, Andrew Cooper,
	al1img, Andrii Anisov, Volodymyr Babchuk, Julien Grall,
	xen-devel, Artem Mygaiev

On Mon, 3 Apr 2017, Oleksandr Tyshchenko wrote:
> On Mon, Apr 3, 2017 at 9:06 PM, Julien Grall <julien.grall@arm.com> wrote:
> > Hi Andrew,
> >
> >
> > On 03/04/17 18:16, Andrew Cooper wrote:
> >>
> >> On 03/04/17 18:02, Julien Grall wrote:
> >>>
> >>> Hi Andrew,
> >>>
> >>> On 03/04/17 17:42, Andrew Cooper wrote:
> >>>>
> >>>> On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
> >>>>>
> >>>>> Hi, all.
> >>>>>
> >>>>> Playing with a non-shared IOMMU in Xen on ARM, I came across one interesting
> >>>>> thing: I found that superpages were being shattered during the domain
> >>>>> life cycle.
> >>>>> This is the result of mapping foreign pages, ballooning memory,
> >>>>> a domain mapping Xen shared pages, etc.
> >>>>> I am not bothered by the memory fragmentation at the moment, but
> >>>>> shattering bothers me from the IOMMU point of view.
> >>>>> As Xen owns the IOMMU, it might manipulate the IOMMU page tables while a
> >>>>> passthrough/protected device is doing DMA in Linux. It is hard to detect
> >>>>> when no DMA transaction is in progress
> >>>>> in order to prevent this race. So, if we have an in-flight transaction
> >>>>> from a device when changing an IOMMU mapping, we might get into trouble.
> >>>>> Unfortunately, the faulting
> >>>>> transaction cannot be restarted in all cases. The chance of hitting the
> >>>>> problem increases during shattering.
> >>>>>
> >>>>> I ran the following test:
> >>>>> The dom0 on my setup contains an ethernet IP that is protected by the IOMMU.
> >>>>> What is more, as the IOMMU I am playing with supports superpages (2M,
> >>>>> 1G), the IOMMU driver
> >>>>> takes these capabilities into account when building page tables. As I
> >>>>> gave 256 MB to dom0, the IOMMU mapping was built from 2M memory blocks
> >>>>> only. As I am using NFS for both dom0 and domU, the ethernet IP
> >>>>> performs DMA transactions almost all the time.
> >>>>> Sometimes, I see IOMMU page faults while creating a guest domain. I
> >>>>> think this happens while Xen is shattering 2M mappings into 4K mappings (it
> >>>>> unmaps dom0 pages one 4K page at a time, then maps domU pages there
> >>>>> for copying domU images).
> >>>>> But I don't see any page faults when the IOMMU page table was built
> >>>>> from 4K pages only.
> >>>>>
> >>>>> I had a talk with Julien on IRC and we came to the conclusion that the
> >>>>> safest way would be to use 4K pages to prevent shattering, so the
> >>>>> IOMMU shouldn't report the superpage capability.
> >>>>> On the other hand, if we build the IOMMU tables from 4K pages we will have
> >>>>> a performance drop (building and walking page tables), TLB pressure,
> >>>>> etc.
> >>>>> Another possible solution Julien suggested is to always
> >>>>> balloon with 2M or 1G pages, never 4K. That would help us
> >>>>> prevent the shattering effect.
> >>>>> The discussion was moved to the ML since it seems to be a generic
> >>>>> issue and the right solution should be thought through.
> >>>>>
> >>>>> What do you think is the right way forward? Use 4K pages and not
> >>>>> bother with shattering, or try to optimize? And if the idea of making
> >>>>> the balloon mechanism smarter makes sense, how do we teach the balloon to do so?
> >>>>> Thank you.
> >>>>
> >>>>
> >>>> Ballooning and foreign mappings are terrible for trying to retain
> >>>> superpage mappings.  No OS, not even Linux, can sensibly provide victim
> >>>> pages in a useful way to avoid shattering.
> >>>>
> >>>> If you care about performance, don't ever balloon.  Foreign mappings in
> >>>> translated guests should start from the top of RAM, and work upwards.
> >>>
> >>>
> >>> I am not sure to understand this. Can you extend?
> >>
> >>
> >> I am not sure what is unclear.  Handing random frames of RAM back to the
> >> hypervisor is what exacerbates host superpage fragmentation, and all
> >> balloon drivers currently do it.
> >>
> >> If you want to avoid host superpage fragmentation, don't use a
> >> scattergun approach of handing frames back to Xen.  However, because
> >> even Linux doesn't provide enough hooks into the physical memory
> >> management logic, the only solution is to not balloon at all, and to use
> >> already-unoccupied frames for foreign mappings.
> >
> >
> > Do you have any pointer in the Linux code?
> >
> >
> >>
> >>>
> >>>>
> >>>>
> >>>> As for the IOMMU specifically, things are rather easier.  It is the
> >>>> guest's responsibility to ensure that frames offered up for ballooning or
> >>>> foreign mappings are unused.  Therefore, if anything cares about the
> >>>> specific 4K region becoming non-present in the IOMMU mappings, it is the
> >>>> guest kernel's fault for offering up a frame already in use.
> >>>>
> >>>> For the shattering, however, it is Xen's responsibility to ensure that
> >>>> all other mappings stay valid at all points.  The correct way to do this
> >>>> is to construct a new L1 table, mirroring the L2 superpage but lacking
> >>>> the specific 4K mapping in question, then atomically replace the L2
> >>>> superpage entry with the new L1 table, then issue an IOMMU TLB
> >>>> invalidation to remove any cached mappings.
> >>>>
> >>>> By following that procedure, all DMA within the 2M region, but not
> >>>> hitting the 4K frame, won't observe any interim lack of mappings.  It
> >>>> appears from your description that Xen isn't following the procedure.
> >>>
> >>>
> >>> Xen is following what the ARM ARM mandates. For shattering page
> >>> tables, we have to follow the break-before-make sequence, i.e.:
> >>>     - Invalidate the L2 entry
> >>>     - Flush the TLBs
> >>>     - Add the new L1 table
> >>> See D4-1816 in ARM DDI 0487A.k_iss10775 for details. So we end up in a
> >>> small window where there is no valid mapping. It is easy to trap a data
> >>> abort from the processor and restart it, but not a device memory
> >>> transaction.
> >>>
> >>> Xen by default shares stage-2 page tables between the IOMMU
> >>> and the MMU. However, from the discussion I had with Oleksandr, they
> >>> are not sharing page tables and still see the problem. I am not sure
> >>> how they are updating the page table here. Oleksandr, can you provide
> >>> more details?
> >>
> >>
> >> Are you saying that ARM has no way of making atomic updates to the IOMMU
> >> mappings?  (How do I get access to that document?  Google gets me to
> >>
> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.architecture.reference/index.html,
> >> but
> >> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k/index.html
> >> which looks like the document you specified results in 404.)
> >
> >
> > Here is the link; I am not sure why Google does not surface it:
> >
> > http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k_10775/index.html
> >
> >>
> >> If so, that is an architecture bug IMO.  By design, the IOMMU is out of
> >> control of guest software, and the hypervisor should be able to make
> >> atomic modifications without guest cooperation.
> >
> >
> > I think you misread what I meant: the IOMMU supports atomic operations. However,
> > if you share the page tables we have to apply Break-Before-Make when
> > shattering a superpage. This is mandatory if you want to get Xen running on
> > all the micro-architectures.
> >
> > Some IOMMUs may cope with BBM, some may not. I haven't seen any issue so far
> > (which does not mean there are none).
> >
> > The IOMMU used by Oleksandr (the IPMMU-VMSA) is an IP from Renesas which I
> > have never used myself. In his case he needs separate page tables because the
> > layouts are not the same.
> >
> > Oleksandr, looking at the code you provided, the superpages are split the
> > way Andrew said, i.e.:
> >         1) allocate a level-3 table minus the 4K mapping
> >         2) replace the level-2 entry with the new table
> >
> > Am I right?
> 
> It seems so, yes. Walking down the page table when trying to unmap, we
> hit a leaf entry (the 2M mapping),
> so mappings for the rest of the 2M block are inserted at the next level
> and after that the page table entry is replaced.

Let me premise that Andrew pointed out well what the right approach to
dealing with this issue should be. However, if we have to use
break-before-make for IOMMU pagetables, then it means we cannot do
atomic updates to IOMMU mappings, as Andrew wrote. Therefore, we
have to make a choice: we either disable superpage IOMMU mappings or
ballooning. I would disable IOMMU superpage mappings, on the grounds that
supporting superpage mappings without supporting atomic shattering or
restartable transactions is not really supporting superpage mappings.

However, you are not doing break-before-make here. I would investigate
whether break-before-make is required by the IPMMU-VMSA. If it is not
required, why are you seeing DMA faults?


* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-03 20:33           ` Stefano Stabellini
@ 2017-04-04  9:28             ` Oleksandr Tyshchenko
  2017-04-06 18:59               ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 15+ messages in thread
From: Oleksandr Tyshchenko @ 2017-04-04  9:28 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Andrushchenko, Andrew Cooper, al1img, Andrii Anisov,
	Volodymyr Babchuk, Julien Grall, xen-devel, Artem Mygaiev

Hi, Stefano.

On Mon, Apr 3, 2017 at 11:33 PM, Stefano Stabellini
<sstabellini@kernel.org> wrote:
> On Mon, 3 Apr 2017, Oleksandr Tyshchenko wrote:
>> On Mon, Apr 3, 2017 at 9:06 PM, Julien Grall <julien.grall@arm.com> wrote:
>> > Hi Andrew,
>> >
>> >
>> > On 03/04/17 18:16, Andrew Cooper wrote:
>> >>
>> >> On 03/04/17 18:02, Julien Grall wrote:
>> >>>
>> >>> Hi Andrew,
>> >>>
>> >>> On 03/04/17 17:42, Andrew Cooper wrote:
>> >>>>
>> >>>> On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
>> >>>>>
>> >>>>> Hi, all.
>> >>>>>
>> >>>>> Playing with non-shared IOMMU in Xen on ARM I faced one interesting
>> >>>>> thing. I found out that the superpages were shattered during domain
>> >>>>> life cycle.
>> >>>>> This is the result of mapping of foreign pages, ballooning memory,
>> >>>>> even if domain maps Xen shared pages, etc.
>> >>>>> I don't bother with the memory fragmentation at the moment. But,
>> >>>>> shattering bothers me from the IOMMU point of view.
>> >>>>> As the Xen owns IOMMU it might manipulate IOMMU page tables when
>> >>>>> passthoughed/protected device doing DMA in Linux. It is hard to detect
>> >>>>> when the DMA transaction isn't in progress
>> >>>>> in order to prevent this race. So, if we have inflight transaction
>> >>>>> from a device when changing IOMMU mapping we might get into trouble.
>> >>>>> Unfortunately, not in all the cases the
>> >>>>> faulting transaction can be restarted. The chance to hit the problem
>> >>>>> increases during shattering.
>> >>>>>
>> >>>>> I did next test:
>> >>>>> The dom0 on my setup contains ethernet IP that are protected by IOMMU.
>> >>>>> What is more, as the IOMMU I am playing with supports superpages (2M,
>> >>>>> 1G) the IOMMU driver
>> >>>>> takes into account these capabilities when building page tables. As I
>> >>>>> gave 256 MB for dom0, the IOMMU mapping was built by 2M memory blocks
>> >>>>> only. As I am using NFS for both dom0 and domU the ethernet IP
>> >>>>> performs DMA transactions almost all the time.
>> >>>>> Sometimes, I see the IOMMU page faults during creating guest domain. I
>> >>>>> think, it happens during Xen is shattering 2M mappings 4K mappings (it
>> >>>>> unmaps dom0 pages by one 4K page at a time, then maps domU pages there
>> >>>>> for copying domU images).
>> >>>>> But, I don't see any page faults when the IOMMU page table was built
>> >>>>> by 4K pages only.
>> >>>>>
>> >>>>> I had a talk with Julien on IIRC and we came to conclusion that the
>> >>>>> safest way would be to use 4K pages to prevent shattering, so the
>> >>>>> IOMMU shouldn't report superpage capability.
>> >>>>> On the other hand, if we build IOMMU from 4K pages we will have
>> >>>>> performance drop (during building, walking page tables), TLB pressure,
>> >>>>> etc.
>> >>>>> Another possible solution Julien was suggesting is to always
>> >>>>> ballooning with 2M, 1G, and not using 4K. That would help us to
>> >>>>> prevent shattering effect.
>> >>>>> The discussion was moved to the ML since it seems to be a generic
>> >>>>> issue and the right solution should be think of.
>> >>>>>
>> >>>>> What do you think is the right way to follow? Use 4K pages and don't
>> >>>>> bother with shattering or try to optimize? And if the idea to make
>> >>>>> balloon mechanism smarter makes sense how to teach balloon to do so?
>> >>>>> Thank you.
>> >>>>
>> >>>>
>> >>>> Ballooning and foreign mappings are terrible for trying to retain
>> >>>> superpage mappings.  No OS, not even Linux, can sensibly provide victim
>> >>>> pages in a useful way to avoid shattering.
>> >>>>
>> >>>> If you care about performance, don't ever balloon.  Foreign mappings in
>> >>>> translated guests should start from the top of RAM, and work upwards.
>> >>>
>> >>>
>> >>> I am not sure to understand this. Can you extend?
>> >>
>> >>
>> >> I am not sure what is unclear.  Handing random frames of RAM back to the
>> >> hypervisor is what exacerbates host superpage fragmentation, and all
>> >> balloon drivers currently do it.
>> >>
>> >> If you want to avoid host superpage fragmentation, don't use a
>> >> scattergun approach of handing frames back to Xen.  However, because
>> >> even Linux doesn't provide enough hooks into the physical memory
>> >> management logic, the only solution is to not balloon at all, and to use
>> >> already-unoccupied frames for foreign mappings.
>> >
>> >
>> > Do you have any pointer in the Linux code?
>> >
>> >
>> >>
>> >>>
>> >>>>
>> >>>>
>> >>>> As for the IOMMU specifically, things are rather easier.  It is the
>> >>>> guests responsibility to ensure that frames offered up for ballooning or
>> >>>> foreign mappings are unused.  Therefore, if anything cares about the
>> >>>> specific 4K region becoming non-present in the IOMMU mappings, it is the
>> >>>> guest kernels fault for offering up a frame already in use.
>> >>>>
>> >>>> For the shattering however, It is Xen's responsibility to ensure that
>> >>>> all other mappings stay valid at all points.  The correct way to do this
>> >>>> is to construct a new L1 table, mirroring the L2 superpage but lacking
>> >>>> the specific 4K mapping in question, then atomically replace the L2
>> >>>> superpage entry with the new L1 table, then issue an IOMMU TLB
>> >>>> invalidation to remove any cached mappings.
>> >>>>
>> >>>> By following that procedure, all DMA within the 2M region, but not
>> >>>> hitting the 4K frame, won't observe any interim lack of mappings.  It
>> >>>> appears from your description that Xen isn't following the procedure.
>> >>>
>> >>>
>> >>> Xen is following what's the ARM ARM is mandating. For shattering page
>> >>> table, we have to follow the break-before-sequence i.e:
>> >>>     - Invalidate the L2 entry
>> >>>     - Flush the TLBs
>> >>>     - Add the new L1 table
>> >>> See D4-1816 in ARM DDI 0487A.k_iss10775 for details. So we end up in a
>> >>> small window where there are no valid mapping. It is easy to trap data
>> >>> abort from processor and restarting it but not for device memory
>> >>> transactions.
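The break-before-make sequence quoted above can be sketched in the same
style. The TLB flush is a hypothetical stub (a real implementation issues
actual TLB maintenance), and the comments mark the window in which no valid
mapping exists:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PTE_VALID (1ULL << 0)
#define PTE_TABLE (1ULL << 1)

/* Hypothetical stand-in for a CPU/IOMMU TLB invalidation. */
static void tlb_flush_stub(void) { }

/* Break-before-make replacement of a 2M block entry with a table
 * entry. Between step 1 and step 3 the whole 2M region has no valid
 * mapping: a CPU access there faults and can be restarted, but an
 * in-flight DMA transaction generally cannot, which is the problem
 * discussed in this thread. */
static void bbm_replace(_Atomic uint64_t *l2e, uint64_t *new_l3)
{
    atomic_store(l2e, 0);                 /* 1) break: invalidate entry */
    tlb_flush_stub();                     /* 2) flush the TLBs          */
    atomic_store(l2e,                     /* 3) make: install new table */
                 (uint64_t)(uintptr_t)new_l3 | PTE_VALID | PTE_TABLE);
}
```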
>> >>>
>> >>> Xen by default is sharing stage-2 page tables with between the IOMMU
>> >>> and the MMU. However, from the discussion I had with Oleksandr, they
>> >>> are not sharing page tables and still see the problem. I am not sure
>> >>> how they are updating the page table here. Oleksandr, can you provide
>> >>> more details?
>> >>
>> >>
>> >> Are you saying that ARM has no way of making atomic updates to the IOMMU
>> >> mappings?  (How do I get access to that document?  Google gets me to
>> >>
>> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.architecture.reference/index.html,
>> >> but
>> >> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k/index.html
>> >> which looks like the document you specified results in 404.)
>> >
>> >
>> > Below a link, I am not sure why google does not refer it:
>> >
>> > http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k_10775/index.html
>> >
>> >>
>> >> If so, that is an architecture bug IMO.  By design, the IOMMU is out of
>> >> control of guest software, and the hypervisor should be able to make
>> >> atomic modifications without guest cooperation.
>> >
>> >
>> > I think you misread what I meant, IOMMU supports atomic operations. However,
>> > if you share the page table we have to apply Break-Before-Make when
>> > shattering superpage. This is mandatory if you want to get Xen running on
>> > all the micro-architectures.
>> >
>> > Some IOMMU may cope with the BBM, some not. I haven't seen any issue so far
>> > (it does not mean there are none).
>> >
>> > The IOMMU used by Oleksandr (e.g VMSA-IPMMU) is an IP from Renesas which I
>> > never used myself. In his case he needs different page tables because the
>> > layouts are not the same.
>> >
>> > Oleksandr, looking at the code your provided, the superpage are split the
>> > way Andrew said, i.e:
>> >         1) allocating level 3 table minus the 4K mapping
>> >         2) replace level 2 entry with the new table
>> >
>> > Am I right?
>>
>> It seems, yes. Walking the page table down when trying to unmap we
>> bump into leaf entry (2M mapping),
>> so 2M-4K mapping are inserted at the next level and after that the
>> page table entry are replaced.
>
> Let me premise that Andrew well pointed out what should be the right
> approach on dealing with this issue. However, if we have to use
> break-before-make for IOMMU pagetables, then it means we cannot do
> atomic updates to IOMMU mappings, like Andrew wrote. Therefore, we
> have to make a choice: we either disable superpage IOMMU mappings or
> ballooning. I would disable IOMMU superpage mappings, on the ground that
> supporting superpage mappings without supporting atomic shattering or
> restartable transactions is not really supporting superpage mappings.

Sounds reasonable. As Julien also mentioned, "using 4K pages only" is
the safest way, at least until I find the reason why DMA faults take
place despite the fact that shattering is done in an atomic way.

>
> However, you are not doing break-before-make here. I would investigate
> if break-before-make is required by VMSA-IPMMU. If it is not required,
> why are you seeing DMA faults?

Unfortunately, I can't say whether a break-before-make sequence is
required for the IPMMU at the moment; the TRM says nothing about it.

-- 
Regards,

Oleksandr Tyshchenko


* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-04  9:28             ` Oleksandr Tyshchenko
@ 2017-04-06 18:59               ` Oleksandr Tyshchenko
  2017-04-06 19:22                 ` Julien Grall
  0 siblings, 1 reply; 15+ messages in thread
From: Oleksandr Tyshchenko @ 2017-04-06 18:59 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Andrushchenko, Andrew Cooper, al1img, Andrii Anisov,
	Volodymyr Babchuk, Julien Grall, xen-devel, Artem Mygaiev

On Tue, Apr 4, 2017 at 12:28 PM, Oleksandr Tyshchenko
<olekstysh@gmail.com> wrote:
> Hi, Stefano.
>
> On Mon, Apr 3, 2017 at 11:33 PM, Stefano Stabellini
> <sstabellini@kernel.org> wrote:
>> On Mon, 3 Apr 2017, Oleksandr Tyshchenko wrote:
>>> [...]
>>
>> Let me premise that Andrew well pointed out what should be the right
>> approach on dealing with this issue. However, if we have to use
>> break-before-make for IOMMU pagetables, then it means we cannot do
>> atomic updates to IOMMU mappings, like Andrew wrote. Therefore, we
>> have to make a choice: we either disable superpage IOMMU mappings or
>> ballooning. I would disable IOMMU superpage mappings, on the ground that
>> supporting superpage mappings without supporting atomic shattering or
>> restartable transactions is not really supporting superpage mappings.
>
> Sounds reasonable. As Julien mentioned too "using 4K pages only" is
> the safest way.
> At least until I will find a reason why DMA faults take place despite
> the fast that shattering is
> doing in an atomic way.
>
>>
>> However, you are not doing break-before-make here. I would investigate
>> if break-before-make is required by VMSA-IPMMU. If it is not required,
>> why are you seeing DMA faults?
>
> Unfortunally, I can't say about break-before-make sequence for IPMMU
> at the moment.
> TRM says nothing about it.
>
> --
> Regards,
>
> Oleksandr Tyshchenko

Hi, guys.

It seems it was only my fault. The issue wasn't exactly in shattering;
shattering just increased the probability of IOMMU page faults
occurring. I didn't do a clean_dcache on the page table entry after
updating it. With clean_dcache in place, I don't see page faults when
shattering superpages!
BTW, can I configure the domheap pages (which I am using for the IOMMU
page tables) to be uncached? What do you think?
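For reference, the pattern being described (write the entry, then clean it
to the point of coherency so a non-snooping IOMMU walker sees it) might look
like the sketch below. `clean_dcache_range` is a hypothetical stand-in,
stubbed out so the snippet stays self-contained; on ARMv8 it would issue a
DC CVAC per cache line:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cache maintenance: on real ARMv8 hardware
 * this would execute "dc cvac" on each cache line in [va, va + size). */
static void clean_dcache_range(volatile void *va, size_t size)
{
    (void)va; (void)size;        /* no-op in this host-side sketch */
}

/* Publish a new IOMMU page-table entry for a walker that does not
 * snoop the data cache: write the entry, then clean it to memory.
 * Without the clean, the walker may read a stale entry from DRAM,
 * which matches the intermittent faults described in this thread. */
static void set_iommu_pte(volatile uint64_t *pte, uint64_t val)
{
    *pte = val;
    clean_dcache_range(pte, sizeof(*pte));
    /* A barrier (DSB) and IOMMU TLB invalidation would follow here. */
}
```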

-- 
Regards,

Oleksandr Tyshchenko


* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-06 18:59               ` Oleksandr Tyshchenko
@ 2017-04-06 19:22                 ` Julien Grall
  2017-04-06 20:36                   ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 15+ messages in thread
From: Julien Grall @ 2017-04-06 19:22 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, Stefano Stabellini
  Cc: Oleksandr Andrushchenko, Andrew Cooper, Andrii Anisov,
	Volodymyr Babchuk, al1img, xen-devel, Artem Mygaiev

Hi Oleksandr,

On 04/06/2017 07:59 PM, Oleksandr Tyshchenko wrote:
> Hi, guys.
>
> Seems, it was only my fault. The issue wasn't exactly in shattering,
> the shattering just increased probability for IOMMU page faults to
> occur. I didn't do clean_dcache for the page table entry after
> updating it. So, with clean_dcache I don't see page faults when
> shattering superpages!
> BTW, can I configure domheap pages (which I am using for the IOMMU
> page table) to be uncached? What do you think?

I am not sure whether you suggest configuring all the domheap pages as
uncached or only a limited number.

If you switch all of the domheap to uncached, you will have trouble
when copying data to/from the guest in hypercalls because of mismatched
attributes.

If you only configure some of the domheap pages, you lose the advantage
of the 1GB mappings of the domheap in the hypervisor tables and
increase Xen's memory usage. Also, you will have to be careful when
switching the domheap memory attributes back and forth between cached
and uncached.

If the IOMMU is not able to snoop the cache, then the way forward is to
use a clean_dcache operation after writing a page table entry. This is
how we deal with it in the p2m code.

Cheers,

-- 
Julien Grall


* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-06 19:22                 ` Julien Grall
@ 2017-04-06 20:36                   ` Oleksandr Tyshchenko
  2017-04-06 20:39                     ` Julien Grall
  0 siblings, 1 reply; 15+ messages in thread
From: Oleksandr Tyshchenko @ 2017-04-06 20:36 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Oleksandr Andrushchenko, Andrew Cooper,
	Andrii Anisov, Volodymyr Babchuk, al1img, xen-devel,
	Artem Mygaiev



6 апр. 2017 г. 22:22 пользователь "Julien Grall" <julien.grall@arm.com>
написал:

Hi Oleksandr,

Hi Julien.



On 04/06/2017 07:59 PM, Oleksandr Tyshchenko wrote:

> Hi, guys.
>
> Seems, it was only my fault. The issue wasn't exactly in shattering,
> the shattering just increased probability for IOMMU page faults to
> occur. I didn't do clean_dcache for the page table entry after
> updating it. So, with clean_dcache I don't see page faults when
> shattering superpages!
> BTW, can I configure domheap pages (which I am using for the IOMMU
> page table) to be uncached? What do you think?
>

I am not sure if you suggest to configure all the domheap pages to be
uncached or only a limited number.

I meant a limited number: only the pages the IOMMU page tables were built from.


In the case where you switch all domheap to uncached, you will have some
trouble when copy data to/from the guest in hypercall because of mismatch
attribute.

In the case where you only configure some of the domheap pages, you will
remove the advantage of 1GB mapping of the domheap in the hypervisor table
and will increase the memory usage of Xen. Also, you will have to be
careful when switching back and forth the domheap memory attribute between
cache and uncache.

I got it. For me this means that performing a cache flush after updating a
page table entry is the safest and easiest way.


If the IOMMU is not able to snoop the cache, then the way forward is to use
a clean_dcache operation after writing a page table entry. This is how we
deal in the p2m code.

Agree.

As we update the page table in an atomic way (no BBM sequence) and the
reason for the page faults was found, I think the IPMMU driver can declare
superpage capability?

Thank you.


Cheers,

-- 
Julien Grall



* Re: Shattering superpages impact on IOMMU in Xen
  2017-04-06 20:36                   ` Oleksandr Tyshchenko
@ 2017-04-06 20:39                     ` Julien Grall
  0 siblings, 0 replies; 15+ messages in thread
From: Julien Grall @ 2017-04-06 20:39 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, Oleksandr Andrushchenko, Andrew Cooper,
	Andrii Anisov, Volodymyr Babchuk, al1img, xen-devel,
	Artem Mygaiev

Hi Oleksandr,

Please try to configure the mail client to quote with '>'. Using tabs for
quoting makes it quite difficult to follow.

On 04/06/2017 09:36 PM, Oleksandr Tyshchenko wrote:
> 6 апр. 2017 г. 22:22 пользователь "Julien Grall" <julien.grall@arm.com
> <mailto:julien.grall@arm.com>> написал:
>     If the IOMMU is not able to snoop the cache, then the way forward is
>     to use a clean_dcache operation after writing a page table entry.
>     This is how we deal in the p2m code.
>
> Agree.
>
> As we update page table in an atomic way (no BBM sequence) and the
> reason caused page faults was found, I think that the IPMMU driver can
> declare superpage capability?

Yes for the IPMMU driver. Although we would have to do some things for 
the SMMU driver.

Cheers,

-- 
Julien Grall


end of thread, other threads:[~2017-04-06 20:39 UTC | newest]

Thread overview: 15+ messages
2017-04-03 16:24 Shattering superpages impact on IOMMU in Xen Oleksandr Tyshchenko
2017-04-03 16:42 ` Andrew Cooper
2017-04-03 17:02   ` Julien Grall
2017-04-03 17:16     ` Andrew Cooper
2017-04-03 18:06       ` Julien Grall
2017-04-03 18:21         ` Oleksandr Tyshchenko
2017-04-03 20:33           ` Stefano Stabellini
2017-04-04  9:28             ` Oleksandr Tyshchenko
2017-04-06 18:59               ` Oleksandr Tyshchenko
2017-04-06 19:22                 ` Julien Grall
2017-04-06 20:36                   ` Oleksandr Tyshchenko
2017-04-06 20:39                     ` Julien Grall
2017-04-03 17:39     ` Oleksandr Tyshchenko
2017-04-03 17:53       ` Oleksandr Tyshchenko
2017-04-03 18:26   ` Oleksandr Tyshchenko
