* [RFC] virtio-iommu version 0.5
@ 2017-10-23  9:32 ` Jean-Philippe Brucker
  0 siblings, 0 replies; 25+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-23  9:32 UTC (permalink / raw)
  To: iommu, kvm, virtualization, virtio-dev
  Cc: will.deacon, robin.murphy, lorenzo.pieralisi, mst, jasowang,
	marc.zyngier, eric.auger, eric.auger.pro, bharat.bhushan, peterx,
	kevin.tian, Jayachandran.Nair, ashok.raj

This is version 0.5 of the virtio-iommu specification, the paravirtualized
IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
Please find the specification, LaTeX sources and pdf, at:
git://linux-arm.org/virtio-iommu.git viommu/v0.5
http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf

A detailed changelog since v0.4 follows. You can find the pdf diff at:
http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf

* Add an event virtqueue for the device to report translation faults to
  the driver. For the moment only unrecoverable faults are available but
  future versions will extend it.
* Simplify PROBE request by removing the ack part, and flattening RESV
  properties.
* Rename "address space" to "domain". The change might seem futile, but it
  allows PASIDs and other features to be introduced cleanly in the next
  versions. In the same vein, the few remaining "device" occurrences were
  replaced by "endpoint", to avoid any confusion with "the device"
  referring to the virtio device across the document.
* Add implementation notes for RESV_MEM properties.
* Update ACPI table definition.
* Fix typos and clarify a few things.

I will publish the Linux driver for v0.5 shortly. For the next versions,
I'll focus on optimizations and adding support for hardware acceleration.

Existing implementations are simple and can certainly be optimized, even
without architectural changes. But the architecture itself can also be
improved in a number of ways. Currently it is designed to work well with
VFIO. However, having explicit MAP requests is less efficient* than page
tables for emulated and PV endpoints, and the current architecture doesn't
address this. Binding page tables is an obvious way to improve throughput
in that case, but we can explore cleverer (and possibly simpler) ways to
do it.
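
To make the "explicit MAP requests" model concrete, here is a toy sketch of a
domain whose mappings change only through individual map/unmap calls, with
failed translations recorded in a list that stands in for the event virtqueue.
It is an illustration under assumptions, not the specification: the class,
method and field names are hypothetical, and no request layout from the spec
is reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Toy model of a domain: a set of non-overlapping IOVA -> PA mappings."""
    mappings: list = field(default_factory=list)  # (iova_start, iova_end, phys_start)
    faults: list = field(default_factory=list)    # stand-in for the event virtqueue

    def map(self, iova, size, phys):
        """Handle one explicit MAP request; reject overlapping ranges."""
        end = iova + size - 1
        for start, stop, _ in self.mappings:
            if iova <= stop and start <= end:
                raise ValueError("range overlaps an existing mapping")
        self.mappings.append((iova, end, phys))

    def unmap(self, iova, size):
        """Handle one UNMAP request; drop mappings fully inside the range."""
        end = iova + size - 1
        before = len(self.mappings)
        self.mappings = [m for m in self.mappings
                         if not (iova <= m[0] and m[1] <= end)]
        return before - len(self.mappings)

    def translate(self, endpoint, iova):
        """Translate an endpoint access, or record an unrecoverable fault."""
        for start, stop, phys in self.mappings:
            if start <= iova <= stop:
                return phys + (iova - start)
        self.faults.append((endpoint, iova))
        return None


domain = Domain()
domain.map(iova=0x1000, size=0x2000, phys=0x80000000)
hit = domain.translate(endpoint=5, iova=0x1800)    # inside the mapped range
miss = domain.translate(endpoint=5, iova=0x4000)   # unmapped: recorded as a fault
```

In this model every mapping change is a separate request to the device, which
is the cost the paragraph above contrasts with letting the guest populate page
tables directly.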

So first we'll work on getting the base device and driver merged, then
we'll analyze and compare several ideas for improving performance.

Thanks,
Jean

* I have yet to study this behaviour, and would be interested in any
prior art on the subject of analyzing device DMA patterns (virtio and
others).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] virtio-iommu version 0.5
       [not found] ` <20171023093241.20113-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
@ 2017-10-24  6:27   ` Linu Cherian
  2017-10-24  8:37       ` [virtio-dev] " Jean-Philippe Brucker
  0 siblings, 1 reply; 25+ messages in thread
From: Linu Cherian @ 2017-10-24  6:27 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: virtio-dev, kvm, mst, marc.zyngier, jasowang, will.deacon,
	Jayachandran.Nair, virtualization, iommu, sunil.goutham,
	linu.cherian, eric.auger.pro

Hi Jean,

On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> This is version 0.5 of the virtio-iommu specification, the paravirtualized
> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
> Please find the specification, LaTeX sources and pdf, at:
> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> 
> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> 
> * Add an event virtqueue for the device to report translation faults to
>   the driver. For the moment only unrecoverable faults are available but
>   future versions will extend it.
> * Simplify PROBE request by removing the ack part, and flattening RESV
>   properties.
> * Rename "address space" to "domain". The change might seem futile but
>   allows to introduce PASIDs and other features cleanly in the next
>   versions. In the same vein, the few remaining "device" occurrences were
>   replaced by "endpoint", to avoid any confusion with "the device"
>   referring to the virtio device across the document.
> * Add implementation notes for RESV_MEM properties.
> * Update ACPI table definition.
> * Fix typos and clarify a few things.
> 
> I will publish the Linux driver for v0.5 shortly. Then for next versions
> I'll focus on optimizations and adding support for hardware acceleration.
> 
> Existing implementations are simple and can certainly be optimized, even
> without architectural changes. But the architecture itself can also be
> improved in a number of ways. Currently it is designed to work well with
> VFIO. However, having explicit MAP requests is less efficient* than page
> tables for emulated and PV endpoints, and the current architecture doesn't
> address this. Binding page tables is an obvious way to improve throughput
> in that case, but we can explore cleverer (and possibly simpler) ways to
> do it.
> 
> So first we'll work on getting the base device and driver merged, then
> we'll analyze and compare several ideas for improving performance.
> 
> Thanks,
> Jean
> 
> * I have yet to study this behaviour, and would be interested in any
> prior art on the subject of analyzing devices DMA patterns (virtio and
> others)


From the spec, under future extensions:

"Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"

I had a few questions on this.

1. Did you mean SVM support for vfio-pci devices attached to guest processes here?

2. Can you give some hints on how this is going to work, since the virtio-iommu guest kernel
   driver would need to create stage 1 page tables as required by the hardware, which is not the case now?
   CMIIW.


-- 
Linu cherian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] virtio-iommu version 0.5
  2017-10-24  6:27   ` Linu Cherian
@ 2017-10-24  8:37       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 25+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-24  8:37 UTC (permalink / raw)
  To: Linu Cherian
  Cc: virtio-dev, Lorenzo Pieralisi, ashok.raj, kvm, mst, Marc Zyngier,
	Will Deacon, Jayachandran.Nair, virtualization, eric.auger,
	iommu, sunil.goutham, linu.cherian, Robin Murphy, eric.auger.pro

Hi Linu,

On 24/10/17 07:27, Linu Cherian wrote:
> Hi Jean,
> 
> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
>> This is version 0.5 of the virtio-iommu specification, the paravirtualized
>> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
>> Please find the specification, LaTeX sources and pdf, at:
>> git://linux-arm.org/virtio-iommu.git viommu/v0.5
>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
>>
>> A detailed changelog since v0.4 follows. You can find the pdf diff at:
>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
>>
>> * Add an event virtqueue for the device to report translation faults to
>>   the driver. For the moment only unrecoverable faults are available but
>>   future versions will extend it.
>> * Simplify PROBE request by removing the ack part, and flattening RESV
>>   properties.
>> * Rename "address space" to "domain". The change might seem futile but
>>   allows to introduce PASIDs and other features cleanly in the next
>>   versions. In the same vein, the few remaining "device" occurrences were
>>   replaced by "endpoint", to avoid any confusion with "the device"
>>   referring to the virtio device across the document.
>> * Add implementation notes for RESV_MEM properties.
>> * Update ACPI table definition.
>> * Fix typos and clarify a few things.
>>
>> I will publish the Linux driver for v0.5 shortly. Then for next versions
>> I'll focus on optimizations and adding support for hardware acceleration.
>>
>> Existing implementations are simple and can certainly be optimized, even
>> without architectural changes. But the architecture itself can also be
>> improved in a number of ways. Currently it is designed to work well with
>> VFIO. However, having explicit MAP requests is less efficient* than page
>> tables for emulated and PV endpoints, and the current architecture doesn't
>> address this. Binding page tables is an obvious way to improve throughput
>> in that case, but we can explore cleverer (and possibly simpler) ways to
>> do it.
>>
>> So first we'll work on getting the base device and driver merged, then
>> we'll analyze and compare several ideas for improving performance.
>>
>> Thanks,
>> Jean
>>
>> * I have yet to study this behaviour, and would be interested in any
>> prior art on the subject of analyzing devices DMA patterns (virtio and
>> others)
> 
> 
> From the spec,
> Under future extensions.
> 
> "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
> 
> Had few questions on this.
> 
> 1. Did you mean SVM support for vfio-pci devices attached to guest processes here.

Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
and adding requests in pretty much the same format to virtio-iommu.

> 2. Can you give some hints on how this is going to work , since virtio-iommu guest kernel 
>    driver need to create stage 1 page table as required by hardware which is not the case now. 
>    CMIIW. 

The virtio-iommu device advertises which PASID/page table format is
supported by the host (obtained via sysfs and communicated in the PROBE
request), then the guest binds page tables or PASID tables to a domain and
populates it. Binding page tables alone is easy because we already have
the required drivers in the guest (io-pgtable or arch/* for SVM) and code
in the host to manage PASID tables. But since the PASID table pointer is
translated by stage-2, it would require a little more work in the host
for obtaining GPA buffers from the guest on demand. In addition the BIND
ioctl is different from the one used by VT-d, so this solution didn't get
much appreciation.
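
As a rough illustration of the flow described above (advertise formats, pick
one, bind a table to a domain), the following sketch may help. Nothing here
comes from the spec: the format strings, the `probe`, `choose_format` and
`bind` helpers, and the returned dictionary are all hypothetical stand-ins.

```python
# Hypothetical format identifiers; the real encoding is not defined in this thread.
HOST_FORMATS = ["arm-lpae-s1", "pasid-table-smmuv3"]   # what PROBE might advertise
GUEST_DRIVERS = {"arm-lpae-s1": "io-pgtable"}          # formats the guest can populate


def probe(endpoint):
    """Stand-in for a PROBE request: return the formats the host supports."""
    return list(HOST_FORMATS)


def choose_format(advertised, drivers):
    """Pick the first advertised format the guest has a driver for, else None."""
    for fmt in advertised:
        if fmt in drivers:
            return fmt
    return None


def bind(domain, fmt, table_gpa):
    """Stand-in for a bind request: attach a guest-owned table to a domain."""
    return {"domain": domain, "format": fmt, "table_gpa": table_gpa}


fmt = choose_format(probe(endpoint=5), GUEST_DRIVERS)
binding = bind(domain=1, fmt=fmt, table_gpa=0x40000000) if fmt else None
```

If no advertised format matches a guest driver, the guest in this sketch binds
nothing and would keep using explicit MAP requests.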

The alternative is to bind PASID tables. It requires factoring the guest
PASID handling code into a library, which is difficult for SMMU. Luckily
I'm still working on adding PASID code for SMMUv3, so extracting it out of
the driver isn't a big overhead. The good thing about this solution is
that it reuses any specification work done for VFIO (and vice versa) and
any host driver change made for vSMMU/VT-d emulations.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] virtio-iommu version 0.5
  2017-10-24  8:37       ` [virtio-dev] " Jean-Philippe Brucker
  (?)
@ 2017-10-24 16:58       ` Linu Cherian
  2017-10-25  7:07         ` Linu Cherian
  2017-10-25  7:07         ` Linu Cherian
  -1 siblings, 2 replies; 25+ messages in thread
From: Linu Cherian @ 2017-10-24 16:58 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Linu Cherian, iommu, kvm, virtualization, virtio-dev,
	Will Deacon, Robin Murphy, Lorenzo Pieralisi, mst, jasowang,
	Marc Zyngier, eric.auger, eric.auger.pro, bharat.bhushan, peterx,
	kevin.tian@intel.com

Hi Jean,
Thanks for your reply.

On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> Hi Linu,
> 
> On 24/10/17 07:27, Linu Cherian wrote:
> > Hi Jean,
> > 
> > On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> >> This is version 0.5 of the virtio-iommu specification, the paravirtualized
> >> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
> >> Please find the specification, LaTeX sources and pdf, at:
> >> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> >> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> >>
> >> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> >> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> >>
> >> * Add an event virtqueue for the device to report translation faults to
> >>   the driver. For the moment only unrecoverable faults are available but
> >>   future versions will extend it.
> >> * Simplify PROBE request by removing the ack part, and flattening RESV
> >>   properties.
> >> * Rename "address space" to "domain". The change might seem futile but
> >>   allows to introduce PASIDs and other features cleanly in the next
> >>   versions. In the same vein, the few remaining "device" occurrences were
> >>   replaced by "endpoint", to avoid any confusion with "the device"
> >>   referring to the virtio device across the document.
> >> * Add implementation notes for RESV_MEM properties.
> >> * Update ACPI table definition.
> >> * Fix typos and clarify a few things.
> >>
> >> I will publish the Linux driver for v0.5 shortly. Then for next versions
> >> I'll focus on optimizations and adding support for hardware acceleration.
> >>
> >> Existing implementations are simple and can certainly be optimized, even
> >> without architectural changes. But the architecture itself can also be
> >> improved in a number of ways. Currently it is designed to work well with
> >> VFIO. However, having explicit MAP requests is less efficient* than page
> >> tables for emulated and PV endpoints, and the current architecture doesn't
> >> address this. Binding page tables is an obvious way to improve throughput
> >> in that case, but we can explore cleverer (and possibly simpler) ways to
> >> do it.
> >>
> >> So first we'll work on getting the base device and driver merged, then
> >> we'll analyze and compare several ideas for improving performance.
> >>
> >> Thanks,
> >> Jean
> >>
> >> * I have yet to study this behaviour, and would be interested in any
> >> prior art on the subject of analyzing devices DMA patterns (virtio and
> >> others)
> > 
> > 
> > From the spec,
> > Under future extensions.
> > 
> > "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
> > 
> > Had few questions on this.
> > 
> > 1. Did you mean SVM support for vfio-pci devices attached to guest processes here.
> 
> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> and adding requests in pretty much the same format to virtio-iommu.
> 
> > 2. Can you give some hints on how this is going to work , since virtio-iommu guest kernel 
> >    driver need to create stage 1 page table as required by hardware which is not the case now. 
> >    CMIIW. 
> 
> The virtio-iommu device advertises which PASID/page table format is
> supported by the host (obtained via sysfs and communicated in the PROBE
> request), then the guest binds page tables or PASID tables to a domain and
> populates it. Binding page tables alone is easy because we already have
> the required drivers in the guest (io-pgtable or arch/* for SVM) and code
> in the host to manage PASID tables. But since the PASID table pointer is
> translated by stage-2, it would requires a little more work in the host
> for obtaining GPA buffers from the guest on demand.

Is this for resolving PCI PRI requests?
IIUC, PCI PRI requests for devices owned by the guest need to be resolved
by the guest itself.

> In addition the BIND
> ioctl is different from the one used by VT-d, so this solution didn't get
> much appreciation.

Could you please share the links on this?

> 
> The alternative is to bind PASID tables. 

Sorry, I didn't get the difference here.

> It requires to factor the guest
> PASID handling code into a library, which is difficult for SMMU. Luckily
> I'm still working on adding PASID code for SMMUv3, so extracting it out of
> the driver isn't a big overhead. The good thing about this solution is
> that it reuses any specification work done for VFIO (and vice versa) and
> any host driver change made for vSMMU/VT-d emulations.
> 
> Thanks,
> Jean

-- 
Linu cherian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] virtio-iommu version 0.5
  2017-10-24  8:37       ` [virtio-dev] " Jean-Philippe Brucker
  (?)
  (?)
@ 2017-10-24 16:58       ` Linu Cherian
  -1 siblings, 0 replies; 25+ messages in thread
From: Linu Cherian @ 2017-10-24 16:58 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: virtio-dev, Lorenzo Pieralisi, ashok.raj, kvm, mst, Marc Zyngier,
	Will Deacon, Jayachandran.Nair, virtualization, eric.auger,
	iommu, sunil.goutham, Linu Cherian, Robin Murphy, eric.auger.pro

Hi Jean,
Thanks for your reply.

On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> Hi Linu,
> 
> On 24/10/17 07:27, Linu Cherian wrote:
> > Hi Jean,
> > 
> > On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> >> This is version 0.5 of the virtio-iommu specification, the paravirtualized
> >> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
> >> Please find the specification, LaTeX sources and pdf, at:
> >> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> >> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> >>
> >> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> >> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> >>
> >> * Add an event virtqueue for the device to report translation faults to
> >>   the driver. For the moment only unrecoverable faults are available but
> >>   future versions will extend it.
> >> * Simplify PROBE request by removing the ack part, and flattening RESV
> >>   properties.
> >> * Rename "address space" to "domain". The change might seem futile but
> >>   allows to introduce PASIDs and other features cleanly in the next
> >>   versions. In the same vein, the few remaining "device" occurrences were
> >>   replaced by "endpoint", to avoid any confusion with "the device"
> >>   referring to the virtio device across the document.
> >> * Add implementation notes for RESV_MEM properties.
> >> * Update ACPI table definition.
> >> * Fix typos and clarify a few things.
> >>
> >> I will publish the Linux driver for v0.5 shortly. Then for next versions
> >> I'll focus on optimizations and adding support for hardware acceleration.
> >>
> >> Existing implementations are simple and can certainly be optimized, even
> >> without architectural changes. But the architecture itself can also be
> >> improved in a number of ways. Currently it is designed to work well with
> >> VFIO. However, having explicit MAP requests is less efficient* than page
> >> tables for emulated and PV endpoints, and the current architecture doesn't
> >> address this. Binding page tables is an obvious way to improve throughput
> >> in that case, but we can explore cleverer (and possibly simpler) ways to
> >> do it.
> >>
> >> So first we'll work on getting the base device and driver merged, then
> >> we'll analyze and compare several ideas for improving performance.
> >>
> >> Thanks,
> >> Jean
> >>
> >> * I have yet to study this behaviour, and would be interested in any
> >> prior art on the subject of analyzing devices DMA patterns (virtio and
> >> others)
> > 
> > 
> > From the spec,
> > Under future extensions.
> > 
> > "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
> > 
> > Had few questions on this.
> > 
> > 1. Did you mean SVM support for vfio-pci devices attached to guest processes here.
> 
> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> and adding requests in pretty much the same format to virtio-iommu.
> 
> > 2. Can you give some hints on how this is going to work , since virtio-iommu guest kernel 
> >    driver need to create stage 1 page table as required by hardware which is not the case now. 
> >    CMIIW. 
> 
> The virtio-iommu device advertises which PASID/page table format is
> supported by the host (obtained via sysfs and communicated in the PROBE
> request), then the guest binds page tables or PASID tables to a domain and
> populates it. Binding page tables alone is easy because we already have
> the required drivers in the guest (io-pgtable or arch/* for SVM) and code
> in the host to manage PASID tables. But since the PASID table pointer is
> translated by stage-2, it would requires a little more work in the host
> for obtaining GPA buffers from the guest on demand.
  Is this for resolving PCI PRI requests?
  IIUC, PCI PRI requests for devices owned by the guest need to be resolved
  by the guest itself.


 In addition the BIND
> ioctl is different from the one used by VT-d, so this solution didn't get
> much appreciation.

Could you please share the links on this?

> 
> The alternative is to bind PASID tables. 

Sorry, I didn't get the difference here.

It requires to factor the guest
> PASID handling code into a library, which is difficult for SMMU. Luckily
> I'm still working on adding PASID code for SMMUv3, so extracting it out of
> the driver isn't a big overhead. The good thing about this solution is
> that it reuses any specification work done for VFIO (and vice versa) and
> any host driver change made for vSMMU/VT-d emulations.
> 
> Thanks,
> Jean

-- 
Linu cherian

* Re: [RFC] virtio-iommu version 0.5
  2017-10-24 16:58       ` Linu Cherian
@ 2017-10-25  7:07         ` Linu Cherian
  2017-10-25  9:07           ` Jean-Philippe Brucker
  2017-10-25  9:07             ` [virtio-dev] " Jean-Philippe Brucker
  2017-10-25  7:07         ` Linu Cherian
  1 sibling, 2 replies; 25+ messages in thread
From: Linu Cherian @ 2017-10-25  7:07 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Linu Cherian, iommu, kvm, virtualization, virtio-dev,
	Will Deacon, Robin Murphy, Lorenzo Pieralisi, mst, jasowang,
	Marc Zyngier, eric.auger, eric.auger.pro, bharat.bhushan, peterx,
	kevin.tian@intel.com

Hi Jean,

On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
> Hi Jean,
> Thanks for your reply.
> 
> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> > Hi Linu,
> > 
> > On 24/10/17 07:27, Linu Cherian wrote:
> > > Hi Jean,
> > > 
> > > On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> > >> This is version 0.5 of the virtio-iommu specification, the paravirtualized
> > >> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
> > >> Please find the specification, LaTeX sources and pdf, at:
> > >> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> > >> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> > >>
> > >> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> > >> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> > >>
> > >> * Add an event virtqueue for the device to report translation faults to
> > >>   the driver. For the moment only unrecoverable faults are available but
> > >>   future versions will extend it.
> > >> * Simplify PROBE request by removing the ack part, and flattening RESV
> > >>   properties.
> > >> * Rename "address space" to "domain". The change might seem futile but
> > >>   allows to introduce PASIDs and other features cleanly in the next
> > >>   versions. In the same vein, the few remaining "device" occurrences were
> > >>   replaced by "endpoint", to avoid any confusion with "the device"
> > >>   referring to the virtio device across the document.
> > >> * Add implementation notes for RESV_MEM properties.
> > >> * Update ACPI table definition.
> > >> * Fix typos and clarify a few things.
> > >>
> > >> I will publish the Linux driver for v0.5 shortly. Then for next versions
> > >> I'll focus on optimizations and adding support for hardware acceleration.
> > >>
> > >> Existing implementations are simple and can certainly be optimized, even
> > >> without architectural changes. But the architecture itself can also be
> > >> improved in a number of ways. Currently it is designed to work well with
> > >> VFIO. However, having explicit MAP requests is less efficient* than page
> > >> tables for emulated and PV endpoints, and the current architecture doesn't
> > >> address this. Binding page tables is an obvious way to improve throughput
> > >> in that case, but we can explore cleverer (and possibly simpler) ways to
> > >> do it.
> > >>
> > >> So first we'll work on getting the base device and driver merged, then
> > >> we'll analyze and compare several ideas for improving performance.
> > >>
> > >> Thanks,
> > >> Jean
> > >>
> > >> * I have yet to study this behaviour, and would be interested in any
> > >> prior art on the subject of analyzing devices DMA patterns (virtio and
> > >> others)
> > > 
> > > 
> > > From the spec,
> > > Under future extensions.
> > > 
> > > "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
> > > 
> > > Had few questions on this.
> > > 
> > > 1. Did you mean SVM support for vfio-pci devices attached to guest processes here.
> > 
> > Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> > and adding requests in pretty much the same format to virtio-iommu.
> > 
> > > 2. Can you give some hints on how this is going to work , since virtio-iommu guest kernel 
> > >    driver need to create stage 1 page table as required by hardware which is not the case now. 
> > >    CMIIW. 
> > 
> > The virtio-iommu device advertises which PASID/page table format is
> > supported by the host (obtained via sysfs and communicated in the PROBE
> > request), then the guest binds page tables or PASID tables to a domain and
> > populates it. Binding page tables alone is easy because we already have
> > the required drivers in the guest (io-pgtable or arch/* for SVM) and code
> > in the host to manage PASID tables. But since the PASID table pointer is
> > translated by stage-2, it would requires a little more work in the host
> > for obtaining GPA buffers from the guest on demand.
>   Is this for resolving PCI PRI requests ?. 
>   IIUC, PCI PRI requests for devices owned by guest need to be resolved
>   by guest itself.
> 
> 
>  In addition the BIND
> > ioctl is different from the one used by VT-d, so this solution didn't get
> > much appreciation.
> 
> Could you please share the links on this ?
> 
> > 
> > The alternative is to bind PASID tables. 
> 
> Sorry, i didnt get the difference here.
>

Also, does this solution intend to cover page table sharing for non-SVM
cases? For example, if we need to share the IOMMU page table for
a device used in the guest kernel, so that map/unmap is handled directly by
the guest and only TLB invalidations go through a virtio-iommu channel.
 
> It requires to factor the guest
> > PASID handling code into a library, which is difficult for SMMU. Luckily
> > I'm still working on adding PASID code for SMMUv3, so extracting it out of
> > the driver isn't a big overhead. The good thing about this solution is
> > that it reuses any specification work done for VFIO (and vice versa) and
> > any host driver change made for vSMMU/VT-d emulations.
> > 
> > Thanks,
> > Jean
> 
> -- 
> Linu cherian

-- 
Linu cherian

* Re: [RFC] virtio-iommu version 0.5
  2017-10-25  7:07         ` Linu Cherian
@ 2017-10-25  9:07             ` Jean-Philippe Brucker
  2017-10-25  9:07             ` [virtio-dev] " Jean-Philippe Brucker
  1 sibling, 0 replies; 25+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-25  9:07 UTC (permalink / raw)
  To: Linu Cherian
  Cc: Linu Cherian, iommu, kvm, virtualization, virtio-dev,
	Will Deacon, Robin Murphy, Lorenzo Pieralisi, mst, jasowang,
	Marc Zyngier, eric.auger, eric.auger.pro, bharat.bhushan, peterx,
	kevin.tian@intel.com

On 25/10/17 08:07, Linu Cherian wrote:
> Hi Jean,
> 
> On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
>> Hi Jean,
>> Thanks for your reply.
>>
>> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
>>> Hi Linu,
>>>
>>> On 24/10/17 07:27, Linu Cherian wrote:
>>>> Hi Jean,
>>>>
>>>> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
>>>>> This is version 0.5 of the virtio-iommu specification, the paravirtualized
>>>>> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
>>>>> Please find the specification, LaTeX sources and pdf, at:
>>>>> git://linux-arm.org/virtio-iommu.git viommu/v0.5
>>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
>>>>>
>>>>> A detailed changelog since v0.4 follows. You can find the pdf diff at:
>>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
>>>>>
>>>>> * Add an event virtqueue for the device to report translation faults to
>>>>>   the driver. For the moment only unrecoverable faults are available but
>>>>>   future versions will extend it.
>>>>> * Simplify PROBE request by removing the ack part, and flattening RESV
>>>>>   properties.
>>>>> * Rename "address space" to "domain". The change might seem futile but
>>>>>   allows to introduce PASIDs and other features cleanly in the next
>>>>>   versions. In the same vein, the few remaining "device" occurrences were
>>>>>   replaced by "endpoint", to avoid any confusion with "the device"
>>>>>   referring to the virtio device across the document.
>>>>> * Add implementation notes for RESV_MEM properties.
>>>>> * Update ACPI table definition.
>>>>> * Fix typos and clarify a few things.
>>>>>
>>>>> I will publish the Linux driver for v0.5 shortly. Then for next versions
>>>>> I'll focus on optimizations and adding support for hardware acceleration.
>>>>>
>>>>> Existing implementations are simple and can certainly be optimized, even
>>>>> without architectural changes. But the architecture itself can also be
>>>>> improved in a number of ways. Currently it is designed to work well with
>>>>> VFIO. However, having explicit MAP requests is less efficient* than page
>>>>> tables for emulated and PV endpoints, and the current architecture doesn't
>>>>> address this. Binding page tables is an obvious way to improve throughput
>>>>> in that case, but we can explore cleverer (and possibly simpler) ways to
>>>>> do it.
>>>>>
>>>>> So first we'll work on getting the base device and driver merged, then
>>>>> we'll analyze and compare several ideas for improving performance.
>>>>>
>>>>> Thanks,
>>>>> Jean
>>>>>
>>>>> * I have yet to study this behaviour, and would be interested in any
>>>>> prior art on the subject of analyzing devices DMA patterns (virtio and
>>>>> others)
>>>>
>>>>
>>>> From the spec,
>>>> Under future extensions.
>>>>
>>>> "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
>>>>
>>>> Had few questions on this.
>>>>
>>>> 1. Did you mean SVM support for vfio-pci devices attached to guest processes here.
>>>
>>> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
>>> and adding requests in pretty much the same format to virtio-iommu.
>>>
>>>> 2. Can you give some hints on how this is going to work , since virtio-iommu guest kernel 
>>>>    driver need to create stage 1 page table as required by hardware which is not the case now. 
>>>>    CMIIW. 
>>>
>>> The virtio-iommu device advertises which PASID/page table format is
>>> supported by the host (obtained via sysfs and communicated in the PROBE
>>> request), then the guest binds page tables or PASID tables to a domain and
>>> populates it. Binding page tables alone is easy because we already have
>>> the required drivers in the guest (io-pgtable or arch/* for SVM) and code
>>> in the host to manage PASID tables. But since the PASID table pointer is
>>> translated by stage-2, it would requires a little more work in the host
>>> for obtaining GPA buffers from the guest on demand.
>>   Is this for resolving PCI PRI requests ?. 
>>   IIUC, PCI PRI requests for devices owned by guest need to be resolved
>>   by guest itself.

Supporting PCI PRI is a separate problem, which will be addressed by
extending the event queue proposed in v0.5. Once the guest has bound the
PASID table and created the page tables, it starts a DMA job in the
device. If a page isn't mapped, the pIOMMU sends a PRI Request (a page
fault) to its driver, which VFIO relays to userspace and then to the
guest via virtio-iommu. The guest handles the fault, then sends a PRI
response on the virtio-iommu request queue. The response is relayed to the
pIOMMU driver via VFIO, and the device retries the access.
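
To make the round trip easier to follow, here is a rough sketch that simply
enumerates the hops in order. The enum and function names are made up for
illustration and don't come from the spec or from Linux; the event queue hop
assumes the PRI extension mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: one value per hop of the page request round trip. */
enum pri_hop {
	HOP_PIOMMU_PAGE_REQUEST,  /* pIOMMU raises a PRI Request (page fault) */
	HOP_HOST_IOMMU_DRIVER,    /* host pIOMMU driver receives it */
	HOP_VFIO_TO_USERSPACE,    /* VFIO relays the fault to userspace */
	HOP_VIOMMU_EVENT_QUEUE,   /* fault reaches the guest via virtio-iommu */
	HOP_GUEST_FAULT_HANDLER,  /* guest handles the fault */
	HOP_VIOMMU_REQUEST_QUEUE, /* guest sends the PRI response */
	HOP_VFIO_TO_HOST_DRIVER,  /* response relayed to the pIOMMU driver via VFIO */
	HOP_DEVICE_RETRY,         /* device retries the access */
	HOP_COUNT                 /* sentinel: round trip finished */
};

/* Returns the hop following 'hop', or HOP_COUNT after the device retry. */
static enum pri_hop pri_next_hop(enum pri_hop hop)
{
	assert(hop <= HOP_COUNT);
	return hop < HOP_DEVICE_RETRY ? (enum pri_hop)(hop + 1) : HOP_COUNT;
}

/* Counts the hops from the initial PRI Request to the device retry. */
static size_t pri_round_trip_len(void)
{
	size_t n = 0;
	enum pri_hop h;

	for (h = HOP_PIOMMU_PAGE_REQUEST; h != HOP_COUNT; h = pri_next_hop(h))
		n++;
	return n;
}
```

The point is only that a single fault crosses the host/guest boundary twice,
once on the event queue and once on the request queue.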

>>  In addition the BIND
>>> ioctl is different from the one used by VT-d, so this solution didn't get
>>> much appreciation.
>>
>> Could you please share the links on this ?

Please find the latest discussion at
https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg20189.html

>>> The alternative is to bind PASID tables. 
>>
>> Sorry, i didnt get the difference here.

The PASID table is what we call the Context Table in SMMU: it's the array
associating a PASID (SSID) with a context descriptor. In SMMUv3, the
stream table entry (device descriptor) points to a PASID table. Each
context descriptor in the PASID table points to a page directory (pgd).
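
As a simplified sketch of that chain (all type and field names below are
hypothetical; real SMMUv3 descriptors are packed hardware structures with
many more fields):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified context descriptor: one per PASID (SSID). */
struct context_desc {
	uint64_t pgd;   /* address of the page directory for this PASID */
	int valid;      /* non-zero if the descriptor is in use */
};

/* Hypothetical PASID table (Context Table): array indexed by PASID. */
struct pasid_table {
	struct context_desc *cd;
	size_t num_entries;
};

/* Hypothetical stream table entry (device descriptor). */
struct stream_table_entry {
	struct pasid_table *pasid_table;
};

/*
 * Walks stream table entry -> PASID table -> context descriptor.
 * Stores the page directory in *pgd and returns 0, or returns -1 if
 * there is no valid context descriptor for this PASID.
 */
static int lookup_pgd(const struct stream_table_entry *ste, uint32_t pasid,
		      uint64_t *pgd)
{
	const struct pasid_table *tbl;

	assert(ste != NULL && pgd != NULL);
	tbl = ste->pasid_table;
	if (!tbl || pasid >= tbl->num_entries || !tbl->cd[pasid].valid)
		return -1;
	*pgd = tbl->cd[pasid].pgd;
	return 0;
}
```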

So the first solution was for the guest to send a BIND with pasid+pgd, and
let the host deal with the context tables. The second solution is to send
a BIND with a PASID table pointer, and have the guest handle the context
table.
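
To contrast the two options, the requests might look roughly like this.
These layouts are purely illustrative assumptions: neither request exists in
virtio-iommu v0.5, and the struct, field, and enum names are invented here.

```c
#include <stdint.h>

/*
 * Option 1 (hypothetical layout): the guest passes one PASID and its page
 * directory. The host owns the context table and writes the descriptor.
 */
struct bind_pgd_req {
	uint32_t domain;
	uint32_t pasid;
	uint64_t pgd;          /* guest-physical address of the page directory */
};

/*
 * Option 2 (hypothetical layout): the guest passes a pointer to a PASID
 * table that it allocates and manages itself.
 */
struct bind_pasid_table_req {
	uint32_t domain;
	uint32_t pasid_bits;   /* assumed field: log2 of the number of entries */
	uint64_t pasid_table;  /* guest-physical address of the PASID table */
};

enum bind_kind { BIND_PGD, BIND_PASID_TABLE };
enum table_owner { OWNER_HOST, OWNER_GUEST };

/* Who manages the context (PASID) table under each option. */
static enum table_owner context_table_owner(enum bind_kind kind)
{
	return kind == BIND_PGD ? OWNER_HOST : OWNER_GUEST;
}
```

With option 1 the host needs its own code to manage context tables; with
option 2 that code lives in the guest.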

> Also does this solution intend to cover the page table sharing of non SVM 
> cases. For example, if we need to share the IOMMU page table for 
> a device used in guest kernel, so that map/unmap gets directly handled by the guest
> and only TLB invalidates happens through a virtio-iommu channel.

Yes, for non-SVM in SMMUv3 you still have a context table, but with a
single descriptor, so the interface stays the same. With the second
solution, however, nested translation with SMMUv2 isn't supported, since
SMMUv2 doesn't have context tables. The second solution was considered
simpler to implement, so we'll go with that one first.

Thanks,
Jean

>> It requires to factor the guest
>>> PASID handling code into a library, which is difficult for SMMU. Luckily
>>> I'm still working on adding PASID code for SMMUv3, so extracting it out of
>>> the driver isn't a big overhead. The good thing about this solution is
>>> that it reuses any specification work done for VFIO (and vice versa) and
>>> any host driver change made for vSMMU/VT-d emulations.
>>>
>>> Thanks,
>>> Jean
>>
>> -- 
>> Linu cherian
> 

* [virtio-dev] Re: [RFC] virtio-iommu version 0.5
@ 2017-10-25  9:07             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 25+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-25  9:07 UTC (permalink / raw)
  To: Linu Cherian
  Cc: Linu Cherian, iommu, kvm, virtualization, virtio-dev,
	Will Deacon, Robin Murphy, Lorenzo Pieralisi, mst, jasowang,
	Marc Zyngier, eric.auger, eric.auger.pro, bharat.bhushan, peterx,
	kevin.tian, Jayachandran.Nair, ashok.raj, sunil.goutham

On 25/10/17 08:07, Linu Cherian wrote:
> Hi Jean,
> 
> On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
>> Hi Jean,
>> Thanks for your reply.
>>
>> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
>>> Hi Linu,
>>>
>>> On 24/10/17 07:27, Linu Cherian wrote:
>>>> Hi Jean,
>>>>
>>>> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
>>>>> This is version 0.5 of the virtio-iommu specification, the paravirtualized
>>>>> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
>>>>> Please find the specification, LaTeX sources and pdf, at:
>>>>> git://linux-arm.org/virtio-iommu.git viommu/v0.5
>>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
>>>>>
>>>>> A detailed changelog since v0.4 follows. You can find the pdf diff at:
>>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
>>>>>
>>>>> * Add an event virtqueue for the device to report translation faults to
>>>>>   the driver. For the moment only unrecoverable faults are available but
>>>>>   future versions will extend it.
>>>>> * Simplify PROBE request by removing the ack part, and flattening RESV
>>>>>   properties.
>>>>> * Rename "address space" to "domain". The change might seem futile but
>>>>>   allows to introduce PASIDs and other features cleanly in the next
>>>>>   versions. In the same vein, the few remaining "device" occurrences were
>>>>>   replaced by "endpoint", to avoid any confusion with "the device"
>>>>>   referring to the virtio device across the document.
>>>>> * Add implementation notes for RESV_MEM properties.
>>>>> * Update ACPI table definition.
>>>>> * Fix typos and clarify a few things.
>>>>>
>>>>> I will publish the Linux driver for v0.5 shortly. Then for next versions
>>>>> I'll focus on optimizations and adding support for hardware acceleration.
>>>>>
>>>>> Existing implementations are simple and can certainly be optimized, even
>>>>> without architectural changes. But the architecture itself can also be
>>>>> improved in a number of ways. Currently it is designed to work well with
>>>>> VFIO. However, having explicit MAP requests is less efficient* than page
>>>>> tables for emulated and PV endpoints, and the current architecture doesn't
>>>>> address this. Binding page tables is an obvious way to improve throughput
>>>>> in that case, but we can explore cleverer (and possibly simpler) ways to
>>>>> do it.
>>>>>
>>>>> So first we'll work on getting the base device and driver merged, then
>>>>> we'll analyze and compare several ideas for improving performance.
>>>>>
>>>>> Thanks,
>>>>> Jean
>>>>>
>>>>> * I have yet to study this behaviour, and would be interested in any
>>>>> prior art on the subject of analyzing devices DMA patterns (virtio and
>>>>> others)
>>>>
>>>>
>>>> From the spec,
>>>> Under future extensions.
>>>>
>>>> "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
>>>>
>>>> Had few questions on this.
>>>>
>>>> 1. Did you mean SVM support for vfio-pci devices attached to guest processes here.
>>>
>>> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
>>> and adding requests in pretty much the same format to virtio-iommu.
>>>
>>>> 2. Can you give some hints on how this is going to work , since virtio-iommu guest kernel 
>>>>    driver need to create stage 1 page table as required by hardware which is not the case now. 
>>>>    CMIIW. 
>>>
>>> The virtio-iommu device advertises which PASID/page table format is
>>> supported by the host (obtained via sysfs and communicated in the PROBE
>>> request), then the guest binds page tables or PASID tables to a domain and
>>> populates it. Binding page tables alone is easy because we already have
>>> the required drivers in the guest (io-pgtable or arch/* for SVM) and code
>>> in the host to manage PASID tables. But since the PASID table pointer is
>>> translated by stage-2, it would requires a little more work in the host
>>> for obtaining GPA buffers from the guest on demand.
>>   Is this for resolving PCI PRI requests ?. 
>>   IIUC, PCI PRI requests for devices owned by guest need to be resolved
>>   by guest itself.

Supporting PCI PRI is a separate problem, which will be implemented by
extending the event queue proposed in v0.5. Once the guest has bound the
PASID table and created the page tables, it will start some DMA job in the
device. If a page isn't mapped, the pIOMMU sends a PRI Request (a page
fault) to its driver, which is relayed to userspace by VFIO, then to the
guest via virtio-iommu. The guest handles the fault, then sends a PRI
response on the virtio-iommu request queue, which is relayed to the pIOMMU
driver via VFIO, and the device retries the access.
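
As a rough illustration of that round trip, here is a minimal Python model.
It is only a sketch: none of this is defined by v0.5, and every name in it
(PageRequest, Guest, relay_page_request) is hypothetical. It assumes the
fault reaches the guest through the extended event queue and that the guest
answers by mapping the page.

```python
from dataclasses import dataclass, field


@dataclass
class PageRequest:
    """Hypothetical PRI request: which PASID faulted, and at which IOVA."""
    pasid: int
    iova: int


@dataclass
class Guest:
    # Pages currently mapped per PASID (stand-in for the guest page tables).
    mapped: dict = field(default_factory=dict)

    def handle_fault(self, req):
        # The guest maps the missing page, then reports success.
        self.mapped.setdefault(req.pasid, set()).add(req.iova)
        return "success"


def relay_page_request(req, guest, trace):
    """Record each hop of the fault and of the response, in order."""
    trace.append("pIOMMU -> pIOMMU driver: PRI request")
    trace.append("pIOMMU driver -> userspace: relayed by VFIO")
    trace.append("userspace -> guest: virtio-iommu event queue")
    status = guest.handle_fault(req)
    trace.append("guest -> userspace: PRI response on the request queue")
    trace.append("userspace -> pIOMMU driver: relayed by VFIO")
    trace.append("device: retries the access")
    return status
```

Calling relay_page_request(PageRequest(1, 0x1000), Guest(), []) walks the
six hops in the order described above.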

>>  In addition the BIND
>>> ioctl is different from the one used by VT-d, so this solution didn't get
>>> much appreciation.
>>
>> Could you please share the links on this ?

Please find the latest discussion at
https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg20189.html

>>> The alternative is to bind PASID tables. 
>>
>> Sorry, i didnt get the difference here.

The PASID table is what we call the Context Table in SMMU: it's the array
associating a PASID (SSID) with a context descriptor. In SMMUv3 the
stream table entry (device descriptor) points to a PASID table. Each
context descriptor in the PASID table points to a page directory (pgd).

So the first solution was for the guest to send a BIND with pasid+pgd, and
let the host deal with the context tables. The second solution is to send
a BIND with a PASID table pointer, and have the guest handle the context
table.
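
To make the difference concrete, here is a small Python model of the two
options. It is a sketch under simplifying assumptions, not the proposed
interface: the class and method names are made up for illustration, and the
PASID table is passed by reference instead of by a guest-physical pointer
translated by stage-2.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextDescriptor:
    pgd: int  # address of the page directory for this PASID


@dataclass
class PasidTable:
    # PASID (SSID) -> context descriptor
    entries: dict = field(default_factory=dict)


@dataclass
class StreamTableEntry:
    # Device descriptor; points to the PASID table once one is bound.
    pasid_table: Optional[PasidTable] = None


class Host:
    def __init__(self):
        self.ste = StreamTableEntry()

    def bind_pgd(self, pasid, pgd):
        """First solution: guest sends pasid+pgd, host owns the table."""
        if self.ste.pasid_table is None:
            self.ste.pasid_table = PasidTable()
        self.ste.pasid_table.entries[pasid] = ContextDescriptor(pgd)

    def bind_pasid_table(self, table):
        """Second solution: guest hands over a table it manages itself."""
        self.ste.pasid_table = table

    def lookup_pgd(self, pasid):
        # Walk: stream table entry -> PASID table -> context descriptor.
        return self.ste.pasid_table.entries[pasid].pgd
```

With the second option the guest can add a context descriptor later without
sending another BIND, because the host only holds a reference to the table.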

> Also does this solution intend to cover the page table sharing of non SVM 
> cases. For example, if we need to share the IOMMU page table for 
> a device used in guest kernel, so that map/unmap gets directly handled by the guest
> and only TLB invalidates happens through a virtio-iommu channel.

Yes, for non-SVM in SMMUv3 you still have a context table, but with a
single descriptor, so the interface stays the same. But with the second
solution, nesting with SMMUv2 isn't supported since it doesn't have context
tables. The second solution was considered simpler to implement, so we'll
first go with this one.

Thanks,
Jean

>> It requires to factor the guest
>>> PASID handling code into a library, which is difficult for SMMU. Luckily
>>> I'm still working on adding PASID code for SMMUv3, so extracting it out of
>>> the driver isn't a big overhead. The good thing about this solution is
>>> that it reuses any specification work done for VFIO (and vice versa) and
>>> any host driver change made for vSMMU/VT-d emulations.
>>>
>>> Thanks,
>>> Jean
>>
>> -- 
>> Linu cherian
> 





* Re: [RFC] virtio-iommu version 0.5
       [not found]             ` <6e5c3a23-9e00-1936-f80c-085faf42c420-5wv7dgnIgG8@public.gmane.org>
@ 2017-10-25  9:26               ` Linu Cherian
  2017-10-25 11:05               ` Linu Cherian
  1 sibling, 0 replies; 25+ messages in thread
From: Linu Cherian @ 2017-10-25  9:26 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: virtio-dev-sDuHXQ4OtrM4h7I2RyI4rWD2FQJk+8+b,
	kvm-u79uwXL29TY76Z2rM5mHXA, mst-H+wXaHxf7aLQT0dZR+AlfA,
	Marc Zyngier, jasowang-H+wXaHxf7aLQT0dZR+AlfA, Will Deacon,
	Jayachandran.Nair-YGCgFSpz5w/QT0dZR+AlfA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	sunil.goutham-YGCgFSpz5w/QT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w

Hi Jean,

On Wed Oct 25, 2017 at 10:07:53AM +0100, Jean-Philippe Brucker wrote:
> On 25/10/17 08:07, Linu Cherian wrote:
> > Hi Jean,
> > 
> > On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
> >> Hi Jean,
> >> Thanks for your reply.
> >>
> >> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> >>> Hi Linu,
> >>>
> >>> On 24/10/17 07:27, Linu Cherian wrote:
> >>>> Hi Jean,
> >>>>
> >>>> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> >>>>> This is version 0.5 of the virtio-iommu specification, the paravirtualized
> >>>>> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
> >>>>> Please find the specification, LaTeX sources and pdf, at:
> >>>>> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> >>>>>
> >>>>> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> >>>>>
> >>>>> * Add an event virtqueue for the device to report translation faults to
> >>>>>   the driver. For the moment only unrecoverable faults are available but
> >>>>>   future versions will extend it.
> >>>>> * Simplify PROBE request by removing the ack part, and flattening RESV
> >>>>>   properties.
> >>>>> * Rename "address space" to "domain". The change might seem futile but
> >>>>>   allows to introduce PASIDs and other features cleanly in the next
> >>>>>   versions. In the same vein, the few remaining "device" occurrences were
> >>>>>   replaced by "endpoint", to avoid any confusion with "the device"
> >>>>>   referring to the virtio device across the document.
> >>>>> * Add implementation notes for RESV_MEM properties.
> >>>>> * Update ACPI table definition.
> >>>>> * Fix typos and clarify a few things.
> >>>>>
> >>>>> I will publish the Linux driver for v0.5 shortly. Then for next versions
> >>>>> I'll focus on optimizations and adding support for hardware acceleration.
> >>>>>
> >>>>> Existing implementations are simple and can certainly be optimized, even
> >>>>> without architectural changes. But the architecture itself can also be
> >>>>> improved in a number of ways. Currently it is designed to work well with
> >>>>> VFIO. However, having explicit MAP requests is less efficient* than page
> >>>>> tables for emulated and PV endpoints, and the current architecture doesn't
> >>>>> address this. Binding page tables is an obvious way to improve throughput
> >>>>> in that case, but we can explore cleverer (and possibly simpler) ways to
> >>>>> do it.
> >>>>>
> >>>>> So first we'll work on getting the base device and driver merged, then
> >>>>> we'll analyze and compare several ideas for improving performance.
> >>>>>
> >>>>> Thanks,
> >>>>> Jean
> >>>>>
> >>>>> * I have yet to study this behaviour, and would be interested in any
> >>>>> prior art on the subject of analyzing devices DMA patterns (virtio and
> >>>>> others)
> >>>>
> >>>>
> >>>> From the spec,
> >>>> Under future extensions.
> >>>>
> >>>> "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
> >>>>
> >>>> Had few questions on this.
> >>>>
> >>>> 1. Did you mean SVM support for vfio-pci devices attached to guest processes here.
> >>>
> >>> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> >>> and adding requests in pretty much the same format to virtio-iommu.
> >>>
> >>>> 2. Can you give some hints on how this is going to work , since virtio-iommu guest kernel 
> >>>>    driver need to create stage 1 page table as required by hardware which is not the case now. 
> >>>>    CMIIW. 
> >>>
> >>> The virtio-iommu device advertises which PASID/page table format is
> >>> supported by the host (obtained via sysfs and communicated in the PROBE
> >>> request), then the guest binds page tables or PASID tables to a domain and
> >>> populates it. Binding page tables alone is easy because we already have
> >>> the required drivers in the guest (io-pgtable or arch/* for SVM) and code
> >>> in the host to manage PASID tables. But since the PASID table pointer is
> >>> translated by stage-2, it would requires a little more work in the host
> >>> for obtaining GPA buffers from the guest on demand.
> >>   Is this for resolving PCI PRI requests ?. 
> >>   IIUC, PCI PRI requests for devices owned by guest need to be resolved
> >>   by guest itself.
> 
> Supporting PCI PRI is a separate problem, that will be implemented by
> extending the event queue proposed in v0.5. Once the guest bound the PASID
> table and created the page tables, it will start some DMA job in the
> device. If a page isn't mapped, the pIOMMU sends a PRI Request (a page
> fault) to its driver, which is relayed to userspace by VFIO, then to the
> guest via virtio-iommu. The guest handles the fault, then sends a PRI
> response on the virtio-iommu request queue, relayed to the pIOMMU driver
> via VFIO and the device retries the access.
> 
> >>  In addition the BIND
> >>> ioctl is different from the one used by VT-d, so this solution didn't get
> >>> much appreciation.
> >>
> >> Could you please share the links on this ?
> 
> Please find the latest discussion at
> https://www.mail-archive.com/iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg20189.html
> 
> >>> The alternative is to bind PASID tables. 
> >>
> >> Sorry, i didnt get the difference here.
> 
> PASID table is what we call Context Table in SMMU, it's the array
> associating a PASID (SSID) to a context descriptor. In the SMMUv3 the
> stream table entry (device descriptor) points to a PASID table. Each
> context descriptor in the PASID table points to a page directory (pgd).
> 
> So the first solution was for the guest to send a BIND with pasid+pgd, and
> let the host deal with the context tables. The second solution is to send
> a BIND with a PASID table pointer, and have the guest handle the context
> table.
> 
> > Also does this solution intend to cover the page table sharing of non SVM 
> > cases. For example, if we need to share the IOMMU page table for 
> > a device used in guest kernel, so that map/unmap gets directly handled by the guest
> > and only TLB invalidates happens through a virtio-iommu channel.
> 
> Yes for non-SVM in SMMuv3, you still have a context table but with a
> single descriptor, so the interface stays the same. But with the second
> solution, nested with SMMUv2 isn't supported since it doesn't have context
> tables. The second solution was considered simpler to implement, so we'll
> first go with this one.
> 
> Thanks,
> Jean
>

Thanks a lot for the pointers and the explanation.

 
> >> It requires to factor the guest
> >>> PASID handling code into a library, which is difficult for SMMU. Luckily
> >>> I'm still working on adding PASID code for SMMUv3, so extracting it out of
> >>> the driver isn't a big overhead. The good thing about this solution is
> >>> that it reuses any specification work done for VFIO (and vice versa) and
> >>> any host driver change made for vSMMU/VT-d emulations.
> >>>
> >>> Thanks,
> >>> Jean
> >>
> >> -- 
> >> Linu cherian
> > 

-- 
Linu cherian


* Re: [RFC] virtio-iommu version 0.5
  2017-10-25  9:07             ` [virtio-dev] " Jean-Philippe Brucker
  (?)
@ 2017-10-25  9:26             ` Linu Cherian
  -1 siblings, 0 replies; 25+ messages in thread
From: Linu Cherian @ 2017-10-25  9:26 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: virtio-dev, Lorenzo Pieralisi, ashok.raj, kvm, mst, Marc Zyngier,
	Will Deacon, Jayachandran.Nair, virtualization, eric.auger,
	iommu, sunil.goutham, Linu Cherian, Robin Murphy, eric.auger.pro

Hi Jean,

On Wed Oct 25, 2017 at 10:07:53AM +0100, Jean-Philippe Brucker wrote:
> On 25/10/17 08:07, Linu Cherian wrote:
> > Hi Jean,
> > 
> > On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
> >> Hi Jean,
> >> Thanks for your reply.
> >>
> >> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> >>> Hi Linu,
> >>>
> >>> On 24/10/17 07:27, Linu Cherian wrote:
> >>>> Hi Jean,
> >>>>
> >>>> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> >>>>> This is version 0.5 of the virtio-iommu specification, the paravirtualized
> >>>>> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
> >>>>> Please find the specification, LaTeX sources and pdf, at:
> >>>>> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> >>>>>
> >>>>> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> >>>>>
> >>>>> * Add an event virtqueue for the device to report translation faults to
> >>>>>   the driver. For the moment only unrecoverable faults are available but
> >>>>>   future versions will extend it.
> >>>>> * Simplify PROBE request by removing the ack part, and flattening RESV
> >>>>>   properties.
> >>>>> * Rename "address space" to "domain". The change might seem futile but
> >>>>>   allows to introduce PASIDs and other features cleanly in the next
> >>>>>   versions. In the same vein, the few remaining "device" occurrences were
> >>>>>   replaced by "endpoint", to avoid any confusion with "the device"
> >>>>>   referring to the virtio device across the document.
> >>>>> * Add implementation notes for RESV_MEM properties.
> >>>>> * Update ACPI table definition.
> >>>>> * Fix typos and clarify a few things.
> >>>>>
> >>>>> I will publish the Linux driver for v0.5 shortly. Then for next versions
> >>>>> I'll focus on optimizations and adding support for hardware acceleration.
> >>>>>
> >>>>> Existing implementations are simple and can certainly be optimized, even
> >>>>> without architectural changes. But the architecture itself can also be
> >>>>> improved in a number of ways. Currently it is designed to work well with
> >>>>> VFIO. However, having explicit MAP requests is less efficient* than page
> >>>>> tables for emulated and PV endpoints, and the current architecture doesn't
> >>>>> address this. Binding page tables is an obvious way to improve throughput
> >>>>> in that case, but we can explore cleverer (and possibly simpler) ways to
> >>>>> do it.
> >>>>>
> >>>>> So first we'll work on getting the base device and driver merged, then
> >>>>> we'll analyze and compare several ideas for improving performance.
> >>>>>
> >>>>> Thanks,
> >>>>> Jean
> >>>>>
> >>>>> * I have yet to study this behaviour, and would be interested in any
> >>>>> prior art on the subject of analyzing devices DMA patterns (virtio and
> >>>>> others)
> >>>>
> >>>>
> >>>> From the spec,
> >>>> Under future extensions.
> >>>>
> >>>> "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
> >>>>
> >>>> Had few questions on this.
> >>>>
> >>>> 1. Did you mean SVM support for vfio-pci devices attached to guest processes here.
> >>>
> >>> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> >>> and adding requests in pretty much the same format to virtio-iommu.
> >>>
> >>>> 2. Can you give some hints on how this is going to work , since virtio-iommu guest kernel 
> >>>>    driver need to create stage 1 page table as required by hardware which is not the case now. 
> >>>>    CMIIW. 
> >>>
> >>> The virtio-iommu device advertises which PASID/page table format is
> >>> supported by the host (obtained via sysfs and communicated in the PROBE
> >>> request), then the guest binds page tables or PASID tables to a domain and
> >>> populates it. Binding page tables alone is easy because we already have
> >>> the required drivers in the guest (io-pgtable or arch/* for SVM) and code
> >>> in the host to manage PASID tables. But since the PASID table pointer is
> >>> translated by stage-2, it would requires a little more work in the host
> >>> for obtaining GPA buffers from the guest on demand.
> >>   Is this for resolving PCI PRI requests ?. 
> >>   IIUC, PCI PRI requests for devices owned by guest need to be resolved
> >>   by guest itself.
> 
> Supporting PCI PRI is a separate problem, that will be implemented by
> extending the event queue proposed in v0.5. Once the guest bound the PASID
> table and created the page tables, it will start some DMA job in the
> device. If a page isn't mapped, the pIOMMU sends a PRI Request (a page
> fault) to its driver, which is relayed to userspace by VFIO, then to the
> guest via virtio-iommu. The guest handles the fault, then sends a PRI
> response on the virtio-iommu request queue, relayed to the pIOMMU driver
> via VFIO and the device retries the access.
> 
> >>  In addition the BIND
> >>> ioctl is different from the one used by VT-d, so this solution didn't get
> >>> much appreciation.
> >>
> >> Could you please share the links on this ?
> 
> Please find the latest discussion at
> https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg20189.html
> 
> >>> The alternative is to bind PASID tables. 
> >>
> >> Sorry, i didnt get the difference here.
> 
> PASID table is what we call Context Table in SMMU, it's the array
> associating a PASID (SSID) to a context descriptor. In the SMMUv3 the
> stream table entry (device descriptor) points to a PASID table. Each
> context descriptor in the PASID table points to a page directory (pgd).
> 
> So the first solution was for the guest to send a BIND with pasid+pgd, and
> let the host deal with the context tables. The second solution is to send
> a BIND with a PASID table pointer, and have the guest handle the context
> table.
> 
> > Also does this solution intend to cover the page table sharing of non SVM 
> > cases. For example, if we need to share the IOMMU page table for 
> > a device used in guest kernel, so that map/unmap gets directly handled by the guest
> > and only TLB invalidates happens through a virtio-iommu channel.
> 
> Yes for non-SVM in SMMuv3, you still have a context table but with a
> single descriptor, so the interface stays the same. But with the second
> solution, nested with SMMUv2 isn't supported since it doesn't have context
> tables. The second solution was considered simpler to implement, so we'll
> first go with this one.
> 
> Thanks,
> Jean
>

Thanks a lot for the pointers and the explanation.

 
> >> It requires to factor the guest
> >>> PASID handling code into a library, which is difficult for SMMU. Luckily
> >>> I'm still working on adding PASID code for SMMUv3, so extracting it out of
> >>> the driver isn't a big overhead. The good thing about this solution is
> >>> that it reuses any specification work done for VFIO (and vice versa) and
> >>> any host driver change made for vSMMU/VT-d emulations.
> >>>
> >>> Thanks,
> >>> Jean
> >>
> >> -- 
> >> Linu cherian
> > 

-- 
Linu cherian


* Re: [RFC] virtio-iommu version 0.5
       [not found]             ` <6e5c3a23-9e00-1936-f80c-085faf42c420-5wv7dgnIgG8@public.gmane.org>
  2017-10-25  9:26               ` Linu Cherian
@ 2017-10-25 11:05               ` Linu Cherian
  2017-10-25 12:05                 ` Jean-Philippe Brucker
  2017-10-25 12:05                   ` Jean-Philippe Brucker
  1 sibling, 2 replies; 25+ messages in thread
From: Linu Cherian @ 2017-10-25 11:05 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: virtio-dev-sDuHXQ4OtrM4h7I2RyI4rWD2FQJk+8+b,
	kvm-u79uwXL29TY76Z2rM5mHXA, mst-H+wXaHxf7aLQT0dZR+AlfA,
	Marc Zyngier, jasowang-H+wXaHxf7aLQT0dZR+AlfA, Will Deacon,
	Jayachandran.Nair-YGCgFSpz5w/QT0dZR+AlfA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	sunil.goutham-YGCgFSpz5w/QT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w

Hi Jean,

On Wed Oct 25, 2017 at 10:07:53AM +0100, Jean-Philippe Brucker wrote:
> On 25/10/17 08:07, Linu Cherian wrote:
> > Hi Jean,
> > 
> > On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
> >> Hi Jean,
> >> Thanks for your reply.
> >>
> >> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> >>> Hi Linu,
> >>>
> >>> On 24/10/17 07:27, Linu Cherian wrote:
> >>>> Hi Jean,
> >>>>
> >>>> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> >>>>> This is version 0.5 of the virtio-iommu specification, the paravirtualized
> >>>>> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
> >>>>> Please find the specification, LaTeX sources and pdf, at:
> >>>>> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> >>>>>
> >>>>> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> >>>>>
> >>>>> * Add an event virtqueue for the device to report translation faults to
> >>>>>   the driver. For the moment only unrecoverable faults are available but
> >>>>>   future versions will extend it.
> >>>>> * Simplify PROBE request by removing the ack part, and flattening RESV
> >>>>>   properties.
> >>>>> * Rename "address space" to "domain". The change might seem futile but
> >>>>>   allows to introduce PASIDs and other features cleanly in the next
> >>>>>   versions. In the same vein, the few remaining "device" occurrences were
> >>>>>   replaced by "endpoint", to avoid any confusion with "the device"
> >>>>>   referring to the virtio device across the document.
> >>>>> * Add implementation notes for RESV_MEM properties.
> >>>>> * Update ACPI table definition.
> >>>>> * Fix typos and clarify a few things.
> >>>>>
> >>>>> I will publish the Linux driver for v0.5 shortly. Then for next versions
> >>>>> I'll focus on optimizations and adding support for hardware acceleration.
> >>>>>
> >>>>> Existing implementations are simple and can certainly be optimized, even
> >>>>> without architectural changes. But the architecture itself can also be
> >>>>> improved in a number of ways. Currently it is designed to work well with
> >>>>> VFIO. However, having explicit MAP requests is less efficient* than page
> >>>>> tables for emulated and PV endpoints, and the current architecture doesn't
> >>>>> address this. Binding page tables is an obvious way to improve throughput
> >>>>> in that case, but we can explore cleverer (and possibly simpler) ways to
> >>>>> do it.
> >>>>>
> >>>>> So first we'll work on getting the base device and driver merged, then
> >>>>> we'll analyze and compare several ideas for improving performance.
> >>>>>
> >>>>> Thanks,
> >>>>> Jean
> >>>>>
> >>>>> * I have yet to study this behaviour, and would be interested in any
> >>>>> prior art on the subject of analyzing devices DMA patterns (virtio and
> >>>>> others)
> >>>>
> >>>>
> >>>> From the spec,
> >>>> Under future extensions.
> >>>>
> >>>> "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
> >>>>
> >>>> Had few questions on this.
> >>>>
> >>>> 1. Did you mean SVM support for vfio-pci devices attached to guest processes here.
> >>>
> >>> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> >>> and adding requests in pretty much the same format to virtio-iommu.
> >>>
> >>>> 2. Can you give some hints on how this is going to work , since virtio-iommu guest kernel 
> >>>>    driver need to create stage 1 page table as required by hardware which is not the case now. 
> >>>>    CMIIW. 
> >>>
> >>> The virtio-iommu device advertises which PASID/page table format is
> >>> supported by the host (obtained via sysfs and communicated in the PROBE
> >>> request), then the guest binds page tables or PASID tables to a domain and
> >>> populates it. Binding page tables alone is easy because we already have
> >>> the required drivers in the guest (io-pgtable or arch/* for SVM) and code
> >>> in the host to manage PASID tables. But since the PASID table pointer is
> >>> translated by stage-2, it would requires a little more work in the host
> >>> for obtaining GPA buffers from the guest on demand.
> >>   Is this for resolving PCI PRI requests ?. 
> >>   IIUC, PCI PRI requests for devices owned by guest need to be resolved
> >>   by guest itself.
> 
> Supporting PCI PRI is a separate problem, that will be implemented by
> extending the event queue proposed in v0.5. Once the guest bound the PASID
> table and created the page tables, it will start some DMA job in the
> device. If a page isn't mapped, the pIOMMU sends a PRI Request (a page
> fault) to its driver, which is relayed to userspace by VFIO, then to the
> guest via virtio-iommu. The guest handles the fault, then sends a PRI
> response on the virtio-iommu request queue, relayed to the pIOMMU driver
> via VFIO and the device retries the access.
> 
> >>  In addition the BIND
> >>> ioctl is different from the one used by VT-d, so this solution didn't get
> >>> much appreciation.
> >>
> >> Could you please share the links on this ?
> 
> Please find the latest discussion at
> https://www.mail-archive.com/iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg20189.html
> 
> >>> The alternative is to bind PASID tables. 
> >>
> >> Sorry, i didnt get the difference here.
> 
> PASID table is what we call Context Table in SMMU, it's the array
> associating a PASID (SSID) to a context descriptor. In the SMMUv3 the
> stream table entry (device descriptor) points to a PASID table. Each
> context descriptor in the PASID table points to a page directory (pgd).
> 
> So the first solution was for the guest to send a BIND with pasid+pgd, and
> let the host deal with the context tables. The second solution is to send
> a BIND with a PASID table pointer, and have the guest handle the context
> table.
> 
> > Also does this solution intend to cover the page table sharing of non SVM 
> > cases. For example, if we need to share the IOMMU page table for 
> > a device used in guest kernel, so that map/unmap gets directly handled by the guest
> > and only TLB invalidates happens through a virtio-iommu channel.
> 
> Yes for non-SVM in SMMuv3, you still have a context table but with a
> single descriptor, so the interface stays the same. 

So for the non-SVM case, the guest virtio-iommu driver will program the
context descriptor in such a way that the ASID is not in the shared set
(ASET = 1b), and hence physical IOMMU TLB invalidations would be triggered
from software for every viommu_unmap (in the guest kernel) through QEMU
(using VFIO ioctls)?

And for the SVM case, the ASID would be in the shared set and explicit TLB
invalidations are not required from software?

> But with the second
> solution, nested with SMMUv2 isn't supported since it doesn't have context
> tables. The second solution was considered simpler to implement, so we'll
> first go with this one.
> 
> Thanks,
> Jean
> 
> >> It requires to factor the guest
> >>> PASID handling code into a library, which is difficult for SMMU. Luckily
> >>> I'm still working on adding PASID code for SMMUv3, so extracting it out of
> >>> the driver isn't a big overhead. The good thing about this solution is
> >>> that it reuses any specification work done for VFIO (and vice versa) and
> >>> any host driver change made for vSMMU/VT-d emulations.
> >>>
> >>> Thanks,
> >>> Jean
> >>
> >> -- 
> >> Linu cherian
> > 

-- 
Linu cherian


* Re: [RFC] virtio-iommu version 0.5
  2017-10-25  9:07             ` [virtio-dev] " Jean-Philippe Brucker
  (?)
  (?)
@ 2017-10-25 11:05             ` Linu Cherian
  -1 siblings, 0 replies; 25+ messages in thread
From: Linu Cherian @ 2017-10-25 11:05 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: virtio-dev, Lorenzo Pieralisi, ashok.raj, kvm, mst, Marc Zyngier,
	Will Deacon, Jayachandran.Nair, virtualization, eric.auger,
	iommu, sunil.goutham, Linu Cherian, Robin Murphy, eric.auger.pro

Hi Jean,

On Wed Oct 25, 2017 at 10:07:53AM +0100, Jean-Philippe Brucker wrote:
> On 25/10/17 08:07, Linu Cherian wrote:
> > Hi Jean,
> > 
> > On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
> >> Hi Jean,
> >> Thanks for your reply.
> >>
> >> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> >>> Hi Linu,
> >>>
> >>> On 24/10/17 07:27, Linu Cherian wrote:
> >>>> Hi Jean,
> >>>>
> >>>> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> >>>>> This is version 0.5 of the virtio-iommu specification, the paravirtualized
> >>>>> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
> >>>>> Please find the specification, LaTeX sources and pdf, at:
> >>>>> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> >>>>>
> >>>>> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> >>>>>
> >>>>> * Add an event virtqueue for the device to report translation faults to
> >>>>>   the driver. For the moment only unrecoverable faults are available but
> >>>>>   future versions will extend it.
> >>>>> * Simplify PROBE request by removing the ack part, and flattening RESV
> >>>>>   properties.
> >>>>> * Rename "address space" to "domain". The change might seem futile but
> >>>>>   allows to introduce PASIDs and other features cleanly in the next
> >>>>>   versions. In the same vein, the few remaining "device" occurrences were
> >>>>>   replaced by "endpoint", to avoid any confusion with "the device"
> >>>>>   referring to the virtio device across the document.
> >>>>> * Add implementation notes for RESV_MEM properties.
> >>>>> * Update ACPI table definition.
> >>>>> * Fix typos and clarify a few things.
> >>>>>
> >>>>> I will publish the Linux driver for v0.5 shortly. Then for next versions
> >>>>> I'll focus on optimizations and adding support for hardware acceleration.
> >>>>>
> >>>>> Existing implementations are simple and can certainly be optimized, even
> >>>>> without architectural changes. But the architecture itself can also be
> >>>>> improved in a number of ways. Currently it is designed to work well with
> >>>>> VFIO. However, having explicit MAP requests is less efficient* than page
> >>>>> tables for emulated and PV endpoints, and the current architecture doesn't
> >>>>> address this. Binding page tables is an obvious way to improve throughput
> >>>>> in that case, but we can explore cleverer (and possibly simpler) ways to
> >>>>> do it.
> >>>>>
> >>>>> So first we'll work on getting the base device and driver merged, then
> >>>>> we'll analyze and compare several ideas for improving performance.
> >>>>>
> >>>>> Thanks,
> >>>>> Jean
> >>>>>
> >>>>> * I have yet to study this behaviour, and would be interested in any
> >>>>> prior art on the subject of analyzing devices DMA patterns (virtio and
> >>>>> others)
> >>>>
> >>>>
> >>>> From the spec,
> >>>> Under future extensions.
> >>>>
> >>>> "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
> >>>>
> >>>> Had few questions on this.
> >>>>
> >>>> 1. Did you mean SVM support for vfio-pci devices attached to guest processes here.
> >>>
> >>> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> >>> and adding requests in pretty much the same format to virtio-iommu.
> >>>
> >>>> 2. Can you give some hints on how this is going to work , since virtio-iommu guest kernel 
> >>>>    driver need to create stage 1 page table as required by hardware which is not the case now. 
> >>>>    CMIIW. 
> >>>
> >>> The virtio-iommu device advertises which PASID/page table format is
> >>> supported by the host (obtained via sysfs and communicated in the PROBE
> >>> request), then the guest binds page tables or PASID tables to a domain and
> >>> populates it. Binding page tables alone is easy because we already have
> >>> the required drivers in the guest (io-pgtable or arch/* for SVM) and code
> >>> in the host to manage PASID tables. But since the PASID table pointer is
> >>> translated by stage-2, it would require a little more work in the host
> >>> for obtaining GPA buffers from the guest on demand.
> >>   Is this for resolving PCI PRI requests ?. 
> >>   IIUC, PCI PRI requests for devices owned by guest need to be resolved
> >>   by guest itself.
> 
> Supporting PCI PRI is a separate problem, that will be implemented by
> extending the event queue proposed in v0.5. Once the guest has bound the PASID
> table and created the page tables, it will start some DMA job in the
> device. If a page isn't mapped, the pIOMMU sends a PRI Request (a page
> fault) to its driver, which is relayed to userspace by VFIO, then to the
> guest via virtio-iommu. The guest handles the fault, then sends a PRI
> response on the virtio-iommu request queue, relayed to the pIOMMU driver
> via VFIO and the device retries the access.
> 
> >>  In addition the BIND
> >>> ioctl is different from the one used by VT-d, so this solution didn't get
> >>> much appreciation.
> >>
> >> Could you please share the links on this ?
> 
> Please find the latest discussion at
> https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg20189.html
> 
> >>> The alternative is to bind PASID tables. 
> >>
> >> Sorry, i didnt get the difference here.
> 
> PASID table is what we call Context Table in SMMU, it's the array
> associating a PASID (SSID) to a context descriptor. In the SMMUv3 the
> stream table entry (device descriptor) points to a PASID table. Each
> context descriptor in the PASID table points to a page directory (pgd).
> 
> So the first solution was for the guest to send a BIND with pasid+pgd, and
> let the host deal with the context tables. The second solution is to send
> a BIND with a PASID table pointer, and have the guest handle the context
> table.
> 
> > Also does this solution intend to cover the page table sharing of non SVM 
> > cases. For example, if we need to share the IOMMU page table for 
> > a device used in guest kernel, so that map/unmap gets directly handled by the guest
> > and only TLB invalidates happens through a virtio-iommu channel.
> 
> Yes, for non-SVM in SMMUv3, you still have a context table but with a
> single descriptor, so the interface stays the same. 

So for the non-SVM case, the guest virtio-iommu driver will program the
context descriptor in such a way that the ASID is not in the shared set
(ASET = 1b), and hence physical IOMMU TLB invalidations would be triggered
from software for every viommu_unmap (in the guest kernel) through QEMU
(using VFIO ioctls)?

And for the SVM case, the ASID would be in the shared set and explicit TLB
invalidations are not required from software?

> But with the second
> solution, nested with SMMUv2 isn't supported since it doesn't have context
> tables. The second solution was considered simpler to implement, so we'll
> first go with this one.
> 
> Thanks,
> Jean
> 
> >> It requires to factor the guest
> >>> PASID handling code into a library, which is difficult for SMMU. Luckily
> >>> I'm still working on adding PASID code for SMMUv3, so extracting it out of
> >>> the driver isn't a big overhead. The good thing about this solution is
> >>> that it reuses any specification work done for VFIO (and vice versa) and
> >>> any host driver change made for vSMMU/VT-d emulations.
> >>>
> >>> Thanks,
> >>> Jean
> >>
> >> -- 
> >> Linu cherian
> > 

-- 
Linu cherian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] virtio-iommu version 0.5
  2017-10-25 11:05               ` Linu Cherian
  2017-10-25 12:05                 ` Jean-Philippe Brucker
@ 2017-10-25 12:05                   ` Jean-Philippe Brucker
  1 sibling, 0 replies; 25+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-25 12:05 UTC (permalink / raw)
  To: Linu Cherian
  Cc: Linu Cherian, iommu, kvm, virtualization, virtio-dev,
	Will Deacon, Robin Murphy, Lorenzo Pieralisi, mst, jasowang,
	Marc Zyngier, eric.auger, eric.auger.pro, bharat.bhushan, peterx,
	kevin.tian, Jayachandran.Nair, ashok.raj@intel.com

On 25/10/17 12:05, Linu Cherian wrote:
>>> Also does this solution intend to cover the page table sharing of non SVM 
>>> cases. For example, if we need to share the IOMMU page table for 
>>> a device used in guest kernel, so that map/unmap gets directly handled by the guest
>>> and only TLB invalidates happens through a virtio-iommu channel.
>>
>> Yes, for non-SVM in SMMUv3, you still have a context table but with a
>> single descriptor, so the interface stays the same. 
> 
> So for the non-SVM case, the guest virtio-iommu driver will program the
> context descriptor in such a way that the ASID is not in the shared set
> (ASET = 1b), and hence physical IOMMU TLB invalidations would be triggered
> from software for every viommu_unmap (in the guest kernel) through QEMU
> (using VFIO ioctls)?

That's right. viommu_unmap will send an INVALIDATE request on the virtio
request queue, forwarded to the pIOMMU driver in the host via a VFIO ioctl.

> And for the SVM case, the ASID would be in the shared set and explicit TLB
> invalidations are not required from software?

Yes, although that's only true for integrated devices (non-PCI). On PCI,
devices will have an ATC (a device IOTLB), in which case we still have to
send an invalidation request through virtio and then VFIO, so that the
SMMU driver sends a CMD_ATC_INV.
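
The two answers above can be condensed into a small decision function.
This is only a sketch: the flag names and required_invalidations() are
hypothetical (they appear neither in the virtio-iommu spec nor in the
SMMU driver), and the "non-SVM with ATC" case assumes that both
invalidations are needed, which the discussion does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags, for illustration only. */
#define INV_TLB (1u << 0) /* INVALIDATE request targeting the IOMMU TLB */
#define INV_ATC (1u << 1) /* invalidation that ends up as CMD_ATC_INV */

static_assert((INV_TLB & INV_ATC) == 0, "flags must not overlap");

/*
 * Which explicit invalidations does software have to send on unmap?
 *
 * asid_shared: the ASID is in the shared set (SVM case).
 * has_atc:     the endpoint is a PCI device with an ATC (device IOTLB).
 */
unsigned int required_invalidations(bool asid_shared, bool has_atc)
{
        unsigned int inv = 0;

        /* Non-SVM: software must invalidate the IOMMU TLB itself. */
        if (!asid_shared)
                inv |= INV_TLB;

        /* An ATC always needs an invalidation through virtio, then VFIO. */
        if (has_atc)
                inv |= INV_ATC;

        return inv;
}
```

For an integrated (non-PCI) endpoint in the SVM case the function returns
0, which matches the "explicit TLB invalidates are not required" answer.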

Thanks,
Jean

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] virtio-iommu version 0.5
  2017-10-23  9:32 ` [virtio-dev] " Jean-Philippe Brucker
@ 2017-12-01 15:46   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 25+ messages in thread
From: Jean-Philippe Brucker @ 2017-12-01 15:46 UTC (permalink / raw)
  To: iommu, kvm, virtualization, virtio-dev
  Cc: Will Deacon, Robin Murphy, Lorenzo Pieralisi, mst, jasowang,
	Marc Zyngier, eric.auger, eric.auger.pro, bharat.bhushan, peterx,
	kevin.tian, Jayachandran.Nair, ashok.raj, Alex Williamson

Hello,

Since we still have time to introduce disruptive changes, I'm
considering changing the UNMAP parameters slightly. Behavior stays the
same, but instead of passing virt_addr and size, we pass virt_start and
virt_end:

struct virtio_iommu_req_unmap {
        le32    domain;
        le64    virt_start;
        le64    virt_end;
        le32    reserved;
};

And for symmetry, also change MAP:

struct virtio_iommu_req_map {
        le32    domain;
        le64    phys_start;
        le64    virt_start;
        le64    virt_end;
        le32    flags;
};

This would allow expressing the full 64-bit range in MAP and UNMAP
requests. Currently the MAP description dismisses this case with "just
use VIRTIO_IOMMU_F_BYPASS if you need an identity mapping", and I think
it still holds. But for UNMAP it can be useful to send a single request
covering the full address space instead of lots of individual requests.
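
To make the difference concrete, here is a minimal sketch of converting a
v0.4-style (addr, size) pair into the proposed inclusive (start, end)
pair. The helper name range_to_start_end() is hypothetical and only
serves as an illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert (addr, size) into an inclusive [start, end] range.
 * Returns 0 on success, -1 if the range is empty or wraps around the
 * end of the 64-bit address space.
 */
int range_to_start_end(uint64_t addr, uint64_t size,
                       uint64_t *start, uint64_t *end)
{
        assert(start && end);

        /* size == 0 is empty; a wrapped sum means the range overflows. */
        if (size == 0 || addr + size - 1 < addr)
                return -1;

        *start = addr;
        *end = addr + size - 1;
        return 0;
}
```

With this helper, (addr = 0x1000, size = 0x1000) becomes [0x1000; 0x1fff].
The largest range that (addr = 0, size) can describe ends at 2^64 - 2,
since a size of 2^64 does not fit in a le64 field, whereas
[0; 0xffffffffffffffff] is directly expressible with start and end.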

I had this problem when implementing ATTACH/DETACH in kvmtool, because
VFIO doesn't have an explicit unmap-all command (and kvmtool doesn't
keep track of the mappings). When changing a device's domain, I had to
send two unmap commands, one for each half of the address space. It's
not a huge problem, just a bit inconvenient.
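
As an illustration of the two-halves workaround, the sketch below builds
the requests needed to cover the whole address space in both formats. The
structs are trimmed-down, hypothetical versions of the request layouts
(domain and reserved fields omitted), and the function names are made up
for this example; this is not the kvmtool code.

```c
#include <assert.h>
#include <stdint.h>

/* Size-based format: virt_addr + size. */
struct unmap_size_req {
        uint64_t virt_addr;
        uint64_t size;
};

/* Proposed format: inclusive virt_start..virt_end. */
struct unmap_range_req {
        uint64_t virt_start;
        uint64_t virt_end;
};

/*
 * A size of 2^64 does not fit in 64 bits, so covering the whole address
 * space takes two requests of 2^63 bytes each. Returns the number of
 * requests written.
 */
int unmap_all_by_size(struct unmap_size_req reqs[2])
{
        const uint64_t half = UINT64_C(1) << 63;

        assert(reqs);
        reqs[0] = (struct unmap_size_req){ .virt_addr = 0, .size = half };
        reqs[1] = (struct unmap_size_req){ .virt_addr = half, .size = half };
        return 2;
}

/* With start/end, a single request covers everything. */
int unmap_all_by_range(struct unmap_range_req *req)
{
        assert(req);
        *req = (struct unmap_range_req){
                .virt_start = 0,
                .virt_end = UINT64_MAX,
        };
        return 1;
}
```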

I don't see how unmap-all would be useful for virtio-iommu at the moment
(since detaching the domain unmaps everything), but just as it turned out
to be desirable in VFIO, I'm sure someone will need it in virtio-iommu one
day.

Alternatively we could change \field{reserved} of the unmap request into
\field{flags} and add an UNMAP_ALL flag. This is backward-compatible and
less invasive. Introducing a new UNMAP_ALL request would probably be
cleaner though. In any case, I personally prefer start and end
parameters; they look nicer. The changeset below looks scary, but it's
mostly reformatting.

Please let me know if you have any objections or other comments for 0.5.

Thanks,
Jean

--- 8< ---
Subject: [PATCH] Change MAP and UNMAP parameters

Passing start/end instead of start/size to MAP and UNMAP offers more
flexibility. UNMAP can now be used to unmap the whole address space in
one go. UNMAP(domain, 0, ~0ULL) should now remove all mappings.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 device-operations.tex | 124 +++++++++++++++++++++++++-------------------------
 1 file changed, 61 insertions(+), 63 deletions(-)

diff --git a/device-operations.tex b/device-operations.tex
index 9b35283..3ecafd3 100644
--- a/device-operations.tex
+++ b/device-operations.tex
@@ -21,15 +21,15 @@ types:
   \texttt{attach(device = 0x104, domain = 1)}
 \item Create a mapping between a range of guest-virtual and guest-physical
   address. \\
-  \texttt{map(domain = 1, virt = 0x1000, phys = 0xa000,
-          size = 0x1000, flags = READ)}
+  \texttt{map(domain = 1, virt_start = 0x1000, virt_end = 0x1fff,
+          phys = 0xa000, flags = READ)}

   Endpoint 0x104, for example a hardware PCI endpoint, can now read at
   addresses 0x1000-0x1fff. These accesses are translated into
   system-physical addresses by the IOMMU.

 \item Remove the mapping.\\
-  \texttt{unmap(domain = 1, virt = 0x1000, size = 0x1000)}
+  \texttt{unmap(domain = 1, virt_start= 0x1000, virt_end = 0x1fff)}

   Any access to addresses 0x1000-0x1fff by endpoint 0x104 would now be
   rejected.
@@ -286,8 +286,9 @@ written to it, the driver SHOULD interpret it as a failure from the device
 to parse the request.

 If the VIRTIO_IOMMU_F_INPUT_RANGE feature is offered, the driver SHOULD
-NOT send requests with \field{virt_addr} less than
-\field{input_range.start} or greater than \field{input_range.end}.
+NOT send requests with \field{virt_start} less than
+\field{input_range.start} or \field{virt_end} greater than
+\field{input_range.end}.

 If the VIRTIO_IOMMU_F_DOMAIN_BITS feature is offered, the driver SHOULD
 NOT send requests with \field{domain} greater than the size described by
@@ -321,7 +322,7 @@ The device MUST ignore reserved fields of the head and the tail of a
 request.

 If the VIRTIO_IOMMU_F_INPUT_RANGE feature is offered, the device MUST
-truncate the range described by \field{virt_addr} and \field{size} in
+truncate the range described by \field{virt_start} and \field{virt_end} in
 requests to fit in the range described by \field{input_range}.

 If the VIRTIO_IOMMU_F_DOMAIN_BITS is offered, the device MUST ignore bits
@@ -446,9 +447,9 @@ endpoint cannot access any mapping from that domain.
 \begin{lstlisting}
 struct virtio_iommu_req_map {
 	le32	domain;
-	le64	phys_addr;
-	le64	virt_addr;
-	le64	size;
+	le64	phys_start;
+	le64	virt_start;
+	le64	virt_end;
 	le32	flags;
 };

@@ -461,30 +462,21 @@ struct virtio_iommu_req_map {
 Map a range of virtually-contiguous addresses to a range of
 physically-contiguous addresses of the same size. After the request
 succeeds, all endpoints attached to this domain can access memory in the
-range $[phys\_addr; phys\_addr + size[$. For example, if an endpoint
-accesses address $VA \in [virt\_addr; virt\_addr + size[$, the device (or
-the physical IOMMU) translates the address: $PA = VA - virt\_addr +
-phys\_addr$. If the access parameters are compatible with \field{flags}
-(for instance, the access is write and \field{flags} are
-VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE) then the IOMMU allows
-the access to reach $PA$.
-
-The range defined by (\field{virt_addr}, \field{size}) must be within the
-limits specified by \field{input_range}. The range defined by
-(\field{phys_addr}, \field{size}) must be within the guest-physical
-address space. This includes upper and lower limits, as well as any
-carving of guest-physical addresses for use by the host (for instance MSI
-doorbells). Guest physical boundaries are set by the host using a firmware
-mechanism outside the scope of this specification.
-
-\begin{note}
-This format prevents from creating the identity mapping in a single
-request \texttt{[0x0; 0xfff....fff] $\rightarrow$ [0x0; 0xfff...fff]},
-since it would result in a size of zero. Hopefully allowing
-VIRTIO_IOMMU_F_BYPASS eliminates the need for issuing such request. It
-would also be unlikely to conform to the physical range restrictions
-from the previous paragraph.
-\end{note}
+range $[virt\_start; virt\_end]$. For example, if an endpoint accesses
+address $VA \in [virt\_start; virt\_end]$, the device (or the physical
+IOMMU) translates the address: $PA = VA - virt\_start + phys\_start$. If
+the access parameters are compatible with \field{flags} (for instance, the
+access is write and \field{flags} are VIRTIO_IOMMU_MAP_F_READ |
+VIRTIO_IOMMU_MAP_F_WRITE) then the IOMMU allows the access to reach $PA$.
+
+The range defined by \field{virt_start} and \field{virt_end} should be
+within the limits specified by \field{input_range}. Given $phys\_end =
+phys\_start + virt\_end - virt\_start$, the range defined by
+\field{phys_start} and phys_end should be within the guest-physical address
+space. This includes upper and lower limits, as well as any carving of
+guest-physical addresses for use by the host (for instance MSI doorbells).
+Guest physical boundaries are set by the host using a firmware mechanism
+outside the scope of this specification.

 \begin{note}
 On flags: it is unlikely that all possible combinations of flags will be
@@ -503,11 +495,13 @@ negotiated.

 The driver SHOULD set undefined \field{flags} bits to zero.

+\field{virt_end} MUST be strictly greater than \field{virt_start}.
+
 \devicenormative{\paragraph}{MAP request}{Device Types / IOMMU Device / Device operations / MAP request}

-If \field{virt_addr}, \field{phys_addr} or \field{size} is not aligned on
-the page granularity, the device SHOULD set the request \field{status} to
-VIRTIO_IOMMU_S_RANGE and SHOULD NOT create the mapping.
+If \field{virt_start}, \field{phys_start} or (\field{virt_end} + 1) is
+not aligned on the page granularity, the device SHOULD set the request
+\field{status} to VIRTIO_IOMMU_S_RANGE and SHOULD NOT create the mapping.

 If the device doesn't recognize a \field{flags} bit, it SHOULD set the
 request \field{status} to VIRTIO_IOMMU_S_INVAL. In this case the device
@@ -524,45 +518,49 @@ If \field{domain} does not exist, the device SHOULD set the request
 \begin{lstlisting}
 struct virtio_iommu_req_unmap {
 	le32	domain;
-	le64	virt_addr;
-	le64	size;
+	le64	virt_start;
+	le64	virt_end;
 	le32	reserved;
 };
 \end{lstlisting}

 Unmap a range of addresses mapped with VIRTIO_IOMMU_T_MAP. We define here
 a mapping as a virtual region created with a single MAP request. All
-mappings covered by the range $[virt\_addr; virt\_addr + size [$ are
-removed.
+mappings covered by the range $[virt\_start; virt\_end]$ are removed.

-The semantics of unmapping are specified below, and illustrated with the
-following requests, assuming each example sequence starts with a blank
-address space. We define two pseudocode functions \texttt{map(virt\_addr,
-size) -> mapping} and \texttt{unmap(virt\_addr, size)}.
+The semantics of unmapping are specified in \ref{drivernormative:Device
+Types / IOMMU Device / Device operations / UNMAP request} and
+\ref{devicenormative:Device Types / IOMMU Device / Device operations /
+UNMAP request}, and illustrated with the following requests, assuming each
+example sequence starts with a blank address space. We define two
+pseudocode functions \texttt{map(virt_start, virt_end) -> mapping} and
+\texttt{unmap(virt_start, virt_end)}.

 \begin{lstlisting}
-(1) unmap(addr=0, size=5)        -> succeeds, doesn't unmap anything
+(1) unmap(virt_start=0,
+          virt_end=4)            -> succeeds, doesn't unmap anything

-(2) a = map(addr=0, size=10);
-    unmap(0, 10)                 -> succeeds, unmaps a
+(2) a = map(virt_start=0,
+            virt_end=9);
+    unmap(0, 9)                  -> succeeds, unmaps a

-(3) a = map(0, 5);
-    b = map(5, 5);
-    unmap(0, 10)                 -> succeeds, unmaps a and b
+(3) a = map(0, 4);
+    b = map(5, 9);
+    unmap(0, 9)                  -> succeeds, unmaps a and b

-(4) a = map(0, 10);
-    unmap(0, 5)                  -> faults, doesn't unmap anything
+(4) a = map(0, 9);
+    unmap(0, 4)                  -> faults, doesn't unmap anything

-(5) a = map(0, 5);
-    b = map(5, 5);
-    unmap(0, 5)                  -> succeeds, unmaps a
+(5) a = map(0, 4);
+    b = map(5, 9);
+    unmap(0, 4)                  -> succeeds, unmaps a

-(6) a = map(0, 5);
-    unmap(0, 10)                 -> succeeds, unmaps a
+(6) a = map(0, 4);
+    unmap(0, 9)                  -> succeeds, unmaps a

-(7) a = map(0, 5);
-    b = map(10, 5);
-    unmap(0, 15)                 -> succeeds, unmaps a and b
+(7) a = map(0, 4);
+    b = map(10, 14);
+    unmap(0, 14)                 -> succeeds, unmaps a and b
 \end{lstlisting}

 This request is only available when VIRTIO_IOMMU_F_MAP_UNMAP has been
@@ -572,9 +570,9 @@ negotiated.

 The driver SHOULD set the \field{reserved} field to zero.

-The range, defined by \field{virt_addr} and \field{size}, SHOULD cover one
-or more contiguous mappings created with MAP requests. The range MAY spill
-over unmapped virtual addresses.
+The range, defined by \field{virt_start} and \field{virt_end}, SHOULD
+cover one or more contiguous mappings created with MAP requests. The range
+MAY spill over unmapped virtual addresses.

 The first address of a range SHOULD either be the first address of a
 mapping or be outside any mapping. The last address of a range SHOULD
--
2.14.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC] virtio-iommu version 0.5
  2017-10-23  9:32 ` [virtio-dev] " Jean-Philippe Brucker
                   ` (3 preceding siblings ...)
  (?)
@ 2017-12-01 15:46 ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 25+ messages in thread
From: Jean-Philippe Brucker @ 2017-12-01 15:46 UTC (permalink / raw)
  To: iommu, kvm, virtualization, virtio-dev
  Cc: Lorenzo Pieralisi, ashok.raj, mst, Marc Zyngier, Will Deacon,
	Jayachandran.Nair, eric.auger, Robin Murphy, eric.auger.pro

Hello,

Since we still have time to introduce disruptive changes, I'm
considering changing the UNMAP parameters slightly. Behavior stays the
same, but instead of passing virt_addr and size, we pass virt_start and
virt_end:

struct virtio_iommu_req_unmap {
        le32    domain;
        le64    virt_start;
        le64    virt_end;
        le32    reserved;
};

And for symmetry, also change MAP:

struct virtio_iommu_req_map {
        le32    domain;
        le64    phys_start;
        le64    virt_start;
        le64    virt_end;
        le32    flags;
};

This would allow expressing the full 64-bit range in MAP and UNMAP
requests. Currently the MAP description dismisses this case with "just
use VIRTIO_IOMMU_F_BYPASS if you need an identity mapping", and I think
it still holds. But for UNMAP it can be useful to send a single request
covering the full address space instead of lots of individual requests.

I had this problem when implementing ATTACH/DETACH in kvmtool, because
VFIO doesn't have an explicit unmap-all command (and kvmtool doesn't
keep track of the mappings). When changing a device's domain, I had to
send two unmap commands, one for each half of the address space. It's
not a huge problem, just a bit inconvenient.

I don't see how unmap-all would be useful for virtio-iommu at the moment
(since detaching the domain unmaps everything), but just as it turned out
to be desirable in VFIO, I'm sure someone will need it in virtio-iommu one
day.

Alternatively we could change \field{reserved} of the unmap request into
\field{flags} and add an UNMAP_ALL flag. This is backward-compatible and
less invasive. Introducing a new UNMAP_ALL request would probably be
cleaner though. In any case, I personally prefer start and end
parameters; they look nicer. The changeset below looks scary, but it's
mostly reformatting.

Please let me know if you have any objections or other comments for 0.5.

Thanks,
Jean

--- 8< ---
Subject: [PATCH] Change MAP and UNMAP parameters

Passing start/end instead of start/size to MAP and UNMAP offers more
flexibility. UNMAP can now be used to unmap the whole address space in
one go. UNMAP(domain, 0, ~0ULL) should now remove all mappings.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 device-operations.tex | 124 +++++++++++++++++++++++++-------------------------
 1 file changed, 61 insertions(+), 63 deletions(-)

diff --git a/device-operations.tex b/device-operations.tex
index 9b35283..3ecafd3 100644
--- a/device-operations.tex
+++ b/device-operations.tex
@@ -21,15 +21,15 @@ types:
   \texttt{attach(device = 0x104, domain = 1)}
 \item Create a mapping between a range of guest-virtual and guest-physical
   address. \\
-  \texttt{map(domain = 1, virt = 0x1000, phys = 0xa000,
-          size = 0x1000, flags = READ)}
+  \texttt{map(domain = 1, virt_start = 0x1000, virt_end = 0x1fff,
+          phys = 0xa000, flags = READ)}

   Endpoint 0x104, for example a hardware PCI endpoint, can now read at
   addresses 0x1000-0x1fff. These accesses are translated into
   system-physical addresses by the IOMMU.

 \item Remove the mapping.\\
-  \texttt{unmap(domain = 1, virt = 0x1000, size = 0x1000)}
+  \texttt{unmap(domain = 1, virt_start= 0x1000, virt_end = 0x1fff)}

   Any access to addresses 0x1000-0x1fff by endpoint 0x104 would now be
   rejected.
@@ -286,8 +286,9 @@ written to it, the driver SHOULD interpret it as a failure from the device
 to parse the request.

 If the VIRTIO_IOMMU_F_INPUT_RANGE feature is offered, the driver SHOULD
-NOT send requests with \field{virt_addr} less than
-\field{input_range.start} or greater than \field{input_range.end}.
+NOT send requests with \field{virt_start} less than
+\field{input_range.start} or \field{virt_end} greater than
+\field{input_range.end}.

 If the VIRTIO_IOMMU_F_DOMAIN_BITS feature is offered, the driver SHOULD
 NOT send requests with \field{domain} greater than the size described by
@@ -321,7 +322,7 @@ The device MUST ignore reserved fields of the head and the tail of a
 request.

 If the VIRTIO_IOMMU_F_INPUT_RANGE feature is offered, the device MUST
-truncate the range described by \field{virt_addr} and \field{size} in
+truncate the range described by \field{virt_start} and \field{virt_end} in
 requests to fit in the range described by \field{input_range}.

 If the VIRTIO_IOMMU_F_DOMAIN_BITS is offered, the device MUST ignore bits
@@ -446,9 +447,9 @@ endpoint cannot access any mapping from that domain.
 \begin{lstlisting}
 struct virtio_iommu_req_map {
 	le32	domain;
-	le64	phys_addr;
-	le64	virt_addr;
-	le64	size;
+	le64	phys_start;
+	le64	virt_start;
+	le64	virt_end;
 	le32	flags;
 };

@@ -461,30 +462,21 @@ struct virtio_iommu_req_map {
 Map a range of virtually-contiguous addresses to a range of
 physically-contiguous addresses of the same size. After the request
 succeeds, all endpoints attached to this domain can access memory in the
-range $[phys\_addr; phys\_addr + size[$. For example, if an endpoint
-accesses address $VA \in [virt\_addr; virt\_addr + size[$, the device (or
-the physical IOMMU) translates the address: $PA = VA - virt\_addr +
-phys\_addr$. If the access parameters are compatible with \field{flags}
-(for instance, the access is write and \field{flags} are
-VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE) then the IOMMU allows
-the access to reach $PA$.
-
-The range defined by (\field{virt_addr}, \field{size}) must be within the
-limits specified by \field{input_range}. The range defined by
-(\field{phys_addr}, \field{size}) must be within the guest-physical
-address space. This includes upper and lower limits, as well as any
-carving of guest-physical addresses for use by the host (for instance MSI
-doorbells). Guest physical boundaries are set by the host using a firmware
-mechanism outside the scope of this specification.
-
-\begin{note}
-This format prevents from creating the identity mapping in a single
-request \texttt{[0x0; 0xfff....fff] $\rightarrow$ [0x0; 0xfff...fff]},
-since it would result in a size of zero. Hopefully allowing
-VIRTIO_IOMMU_F_BYPASS eliminates the need for issuing such request. It
-would also be unlikely to conform to the physical range restrictions
-from the previous paragraph.
-\end{note}
+range $[virt\_start; virt\_end]$. For example, if an endpoint accesses
+address $VA \in [virt\_start; virt\_end]$, the device (or the physical
+IOMMU) translates the address: $PA = VA - virt\_start + phys\_start$. If
+the access parameters are compatible with \field{flags} (for instance, the
+access is write and \field{flags} are VIRTIO_IOMMU_MAP_F_READ |
+VIRTIO_IOMMU_MAP_F_WRITE) then the IOMMU allows the access to reach $PA$.
+
+The range defined by \field{virt_start} and \field{virt_end} should be
+within the limits specified by \field{input_range}. Given $phys\_end =
+phys\_start + virt\_end - virt\_start$, the range defined by
+\field{phys_start} and phys_end should be within the guest-physical address
+space. This includes upper and lower limits, as well as any carving of
+guest-physical addresses for use by the host (for instance MSI doorbells).
+Guest physical boundaries are set by the host using a firmware mechanism
+outside the scope of this specification.

 \begin{note}
 On flags: it is unlikely that all possible combinations of flags will be
@@ -503,11 +495,13 @@ negotiated.

 The driver SHOULD set undefined \field{flags} bits to zero.

+\field{virt_end} MUST be strictly greater than \field{virt_start}.
+
 \devicenormative{\paragraph}{MAP request}{Device Types / IOMMU Device / Device operations / MAP request}

-If \field{virt_addr}, \field{phys_addr} or \field{size} is not aligned on
-the page granularity, the device SHOULD set the request \field{status} to
-VIRTIO_IOMMU_S_RANGE and SHOULD NOT create the mapping.
+If \field{virt_start}, \field{phys_start} or (\field{virt_end} + 1) is
+not aligned on the page granularity, the device SHOULD set the request
+\field{status} to VIRTIO_IOMMU_S_RANGE and SHOULD NOT create the mapping.

 If the device doesn't recognize a \field{flags} bit, it SHOULD set the
 request \field{status} to VIRTIO_IOMMU_S_INVAL. In this case the device
@@ -524,45 +518,49 @@ If \field{domain} does not exist, the device SHOULD set the request
 \begin{lstlisting}
 struct virtio_iommu_req_unmap {
 	le32	domain;
-	le64	virt_addr;
-	le64	size;
+	le64	virt_start;
+	le64	virt_end;
 	le32	reserved;
 };
 \end{lstlisting}

 Unmap a range of addresses mapped with VIRTIO_IOMMU_T_MAP. We define here
 a mapping as a virtual region created with a single MAP request. All
-mappings covered by the range $[virt\_addr; virt\_addr + size [$ are
-removed.
+mappings covered by the range $[virt\_start; virt\_end]$ are removed.

-The semantics of unmapping are specified below, and illustrated with the
-following requests, assuming each example sequence starts with a blank
-address space. We define two pseudocode functions \texttt{map(virt\_addr,
-size) -> mapping} and \texttt{unmap(virt\_addr, size)}.
+The semantics of unmapping are specified in \ref{drivernormative:Device
+Types / IOMMU Device / Device operations / UNMAP request} and
+\ref{devicenormative:Device Types / IOMMU Device / Device operations /
+UNMAP request}, and illustrated with the following requests, assuming each
+example sequence starts with a blank address space. We define two
+pseudocode functions \texttt{map(virt_start, virt_end) -> mapping} and
+\texttt{unmap(virt_start, virt_end)}.

 \begin{lstlisting}
-(1) unmap(addr=0, size=5)        -> succeeds, doesn't unmap anything
+(1) unmap(virt_start=0,
+          virt_end=4)            -> succeeds, doesn't unmap anything

-(2) a = map(addr=0, size=10);
-    unmap(0, 10)                 -> succeeds, unmaps a
+(2) a = map(virt_start=0,
+            virt_end=9);
+    unmap(0, 9)                  -> succeeds, unmaps a

-(3) a = map(0, 5);
-    b = map(5, 5);
-    unmap(0, 10)                 -> succeeds, unmaps a and b
+(3) a = map(0, 4);
+    b = map(5, 9);
+    unmap(0, 9)                  -> succeeds, unmaps a and b

-(4) a = map(0, 10);
-    unmap(0, 5)                  -> faults, doesn't unmap anything
+(4) a = map(0, 9);
+    unmap(0, 4)                  -> faults, doesn't unmap anything

-(5) a = map(0, 5);
-    b = map(5, 5);
-    unmap(0, 5)                  -> succeeds, unmaps a
+(5) a = map(0, 4);
+    b = map(5, 9);
+    unmap(0, 4)                  -> succeeds, unmaps a

-(6) a = map(0, 5);
-    unmap(0, 10)                 -> succeeds, unmaps a
+(6) a = map(0, 4);
+    unmap(0, 9)                  -> succeeds, unmaps a

-(7) a = map(0, 5);
-    b = map(10, 5);
-    unmap(0, 15)                 -> succeeds, unmaps a and b
+(7) a = map(0, 4);
+    b = map(10, 14);
+    unmap(0, 14)                 -> succeeds, unmaps a and b
 \end{lstlisting}

 This request is only available when VIRTIO_IOMMU_F_MAP_UNMAP has been
@@ -572,9 +570,9 @@ negotiated.

 The driver SHOULD set the \field{reserved} field to zero.

-The range, defined by \field{virt_addr} and \field{size}, SHOULD cover one
-or more contiguous mappings created with MAP requests. The range MAY spill
-over unmapped virtual addresses.
+The range, defined by \field{virt_start} and \field{virt_end}, SHOULD
+cover one or more contiguous mappings created with MAP requests. The range
+MAY spill over unmapped virtual addresses.

 The first address of a range SHOULD either be the first address of a
 mapping or be outside any mapping. The last address of a range SHOULD
--
2.14.2


* [virtio-dev] Re: [RFC] virtio-iommu version 0.5
@ 2017-12-01 15:46   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 25+ messages in thread
From: Jean-Philippe Brucker @ 2017-12-01 15:46 UTC (permalink / raw)
  To: iommu, kvm, virtualization, virtio-dev
  Cc: Will Deacon, Robin Murphy, Lorenzo Pieralisi, mst, jasowang,
	Marc Zyngier, eric.auger, eric.auger.pro, bharat.bhushan, peterx,
	kevin.tian, Jayachandran.Nair, ashok.raj, Alex Williamson

Hello,

Since we still have time to introduce disruptive changes, I'm
considering changing the UNMAP parameters slightly. Behavior stays the
same, but instead of passing virt_addr and size, we pass virt_start and
virt_end:

struct virtio_iommu_req_unmap {
        le32    domain;
        le64    virt_start;
        le64    virt_end;
        le32    reserved;
};

And for symmetry, also change MAP:

struct virtio_iommu_req_map {
        le32    domain;
        le64    phys_start;
        le64    virt_start;
        le64    virt_end;
        le32    flags;
};
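
As a rough illustration of how these fields are meant to be read, here
is a minimal C sketch of the translation and alignment rules described
in the patch below. The names viommu_mapping, viommu_map_is_aligned and
viommu_translate are hypothetical and only serve the example; this is
not code from any existing implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side view of a mapping created by one MAP request. */
struct viommu_mapping {
	uint64_t phys_start;
	uint64_t virt_start;
	uint64_t virt_end;	/* inclusive */
};

/*
 * Alignment rule from the patch: virt_start, phys_start and
 * (virt_end + 1) must be aligned on the page granularity.
 * Assumption: granule is a non-zero power of two. When virt_end is
 * UINT64_MAX, virt_end + 1 wraps to 0, which still counts as aligned.
 */
bool viommu_map_is_aligned(const struct viommu_mapping *m, uint64_t granule)
{
	uint64_t mask = granule - 1;

	assert(granule && !(granule & mask));
	return !((m->virt_start | m->phys_start | (m->virt_end + 1)) & mask);
}

/*
 * PA = VA - virt_start + phys_start, for VA in [virt_start; virt_end].
 * Returns false when VA falls outside the mapping.
 */
bool viommu_translate(const struct viommu_mapping *m, uint64_t va,
		      uint64_t *pa)
{
	if (va < m->virt_start || va > m->virt_end)
		return false;
	*pa = va - m->virt_start + m->phys_start;
	return true;
}
```

With virt_start = 0x1000, virt_end = 0x1fff and phys_start = 0xa000, an
access to 0x1234 would translate to 0xa234, and an access to 0x2000
would be rejected.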

This would allow expressing the full 64-bit range in MAP and UNMAP
requests. Currently the MAP description dismisses this case with "just
use VIRTIO_IOMMU_F_BYPASS if you need an identity mapping", and I think
it still holds. But for UNMAP it can be useful to send a single request
covering the full address space instead of lots of individual requests.

I had this problem when implementing ATTACH/DETACH in kvmtool, because
VFIO doesn't have an explicit unmap-all command (and kvmtool doesn't
keep track of the mappings). When changing a device's domain, I had to
send two unmap commands, one for each half of the address space. It's
not a huge problem, just a bit inconvenient.
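
To make the difference concrete, here is a small C sketch comparing the
two encodings for an unmap-all operation. The structures carry only the
range fields, and the names unmap_by_size, unmap_by_range and the two
helper functions are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request payloads; only the range fields are shown. */
struct unmap_by_size  { uint64_t virt_addr;  uint64_t size; };
struct unmap_by_range { uint64_t virt_start; uint64_t virt_end; };

/*
 * With (virt_addr, size), a size of 2^64 does not fit in 64 bits, so
 * covering the whole address space takes two requests, one per half.
 * Returns the number of requests written to req.
 */
size_t unmap_all_by_size(struct unmap_by_size req[2])
{
	const uint64_t half = UINT64_C(1) << 63;

	assert(req != NULL);
	req[0] = (struct unmap_by_size){ .virt_addr = 0,    .size = half };
	req[1] = (struct unmap_by_size){ .virt_addr = half, .size = half };
	return 2;
}

/* With an inclusive end, [0; ~0ULL] covers everything in one request. */
size_t unmap_all_by_range(struct unmap_by_range req[1])
{
	assert(req != NULL);
	req[0] = (struct unmap_by_range){ .virt_start = 0,
					  .virt_end = UINT64_MAX };
	return 1;
}
```

The sum of the two sizes in the first variant wraps to zero in 64-bit
arithmetic, which is why a single (virt_addr, size) pair cannot describe
the full range.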

I don't see how unmap-all would be useful for virtio-iommu at the moment
(since detaching the domain unmaps all) but just like it turns out to be
desirable in VFIO, I'm sure someone will need it in virtio-iommu one
day.

Alternatively we could change \field{reserved} of the unmap request into
\field{flags} and add an UNMAP_ALL flag. This is backward-compatible and
less invasive. Introducing a new UNMAP_ALL request would probably be
cleaner though. In any case, I personally prefer start and end
parameters; they look nicer. The changeset below looks scary, but it's
mostly reformatting.

Please let me know if you have any objections or other comments for 0.5.

Thanks,
Jean

--- 8< ---
Subject: [PATCH] Change MAP and UNMAP parameters

Passing start/end instead of start/size to MAP and UNMAP offers more
flexibility. UNMAP can now be used to unmap the whole address space in
one go. UNMAP(domain, 0, ~0ULL) should now remove all mappings.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 device-operations.tex | 124 +++++++++++++++++++++++++-------------------------
 1 file changed, 61 insertions(+), 63 deletions(-)

diff --git a/device-operations.tex b/device-operations.tex
index 9b35283..3ecafd3 100644
--- a/device-operations.tex
+++ b/device-operations.tex
@@ -21,15 +21,15 @@ types:
   \texttt{attach(device = 0x104, domain = 1)}
 \item Create a mapping between a range of guest-virtual and guest-physical
   address. \\
-  \texttt{map(domain = 1, virt = 0x1000, phys = 0xa000,
-          size = 0x1000, flags = READ)}
+  \texttt{map(domain = 1, virt_start = 0x1000, virt_end = 0x1fff,
+          phys = 0xa000, flags = READ)}

   Endpoint 0x104, for example a hardware PCI endpoint, can now read at
   addresses 0x1000-0x1fff. These accesses are translated into
   system-physical addresses by the IOMMU.

 \item Remove the mapping.\\
-  \texttt{unmap(domain = 1, virt = 0x1000, size = 0x1000)}
+  \texttt{unmap(domain = 1, virt_start = 0x1000, virt_end = 0x1fff)}

   Any access to addresses 0x1000-0x1fff by endpoint 0x104 would now be
   rejected.
@@ -286,8 +286,9 @@ written to it, the driver SHOULD interpret it as a failure from the device
 to parse the request.

 If the VIRTIO_IOMMU_F_INPUT_RANGE feature is offered, the driver SHOULD
-NOT send requests with \field{virt_addr} less than
-\field{input_range.start} or greater than \field{input_range.end}.
+NOT send requests with \field{virt_start} less than
+\field{input_range.start} or \field{virt_end} greater than
+\field{input_range.end}.

 If the VIRTIO_IOMMU_F_DOMAIN_BITS feature is offered, the driver SHOULD
 NOT send requests with \field{domain} greater than the size described by
@@ -321,7 +322,7 @@ The device MUST ignore reserved fields of the head and the tail of a
 request.

 If the VIRTIO_IOMMU_F_INPUT_RANGE feature is offered, the device MUST
-truncate the range described by \field{virt_addr} and \field{size} in
+truncate the range described by \field{virt_start} and \field{virt_end} in
 requests to fit in the range described by \field{input_range}.

 If the VIRTIO_IOMMU_F_DOMAIN_BITS is offered, the device MUST ignore bits
@@ -446,9 +447,9 @@ endpoint cannot access any mapping from that domain.
 \begin{lstlisting}
 struct virtio_iommu_req_map {
 	le32	domain;
-	le64	phys_addr;
-	le64	virt_addr;
-	le64	size;
+	le64	phys_start;
+	le64	virt_start;
+	le64	virt_end;
 	le32	flags;
 };

@@ -461,30 +462,21 @@ struct virtio_iommu_req_map {
 Map a range of virtually-contiguous addresses to a range of
 physically-contiguous addresses of the same size. After the request
 succeeds, all endpoints attached to this domain can access memory in the
-range $[phys\_addr; phys\_addr + size[$. For example, if an endpoint
-accesses address $VA \in [virt\_addr; virt\_addr + size[$, the device (or
-the physical IOMMU) translates the address: $PA = VA - virt\_addr +
-phys\_addr$. If the access parameters are compatible with \field{flags}
-(for instance, the access is write and \field{flags} are
-VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE) then the IOMMU allows
-the access to reach $PA$.
-
-The range defined by (\field{virt_addr}, \field{size}) must be within the
-limits specified by \field{input_range}. The range defined by
-(\field{phys_addr}, \field{size}) must be within the guest-physical
-address space. This includes upper and lower limits, as well as any
-carving of guest-physical addresses for use by the host (for instance MSI
-doorbells). Guest physical boundaries are set by the host using a firmware
-mechanism outside the scope of this specification.
-
-\begin{note}
-This format prevents from creating the identity mapping in a single
-request \texttt{[0x0; 0xfff....fff] $\rightarrow$ [0x0; 0xfff...fff]},
-since it would result in a size of zero. Hopefully allowing
-VIRTIO_IOMMU_F_BYPASS eliminates the need for issuing such request. It
-would also be unlikely to conform to the physical range restrictions
-from the previous paragraph.
-\end{note}
+range $[virt\_start; virt\_end]$. For example, if an endpoint accesses
+address $VA \in [virt\_start; virt\_end]$, the device (or the physical
+IOMMU) translates the address: $PA = VA - virt\_start + phys\_start$. If
+the access parameters are compatible with \field{flags} (for instance, the
+access is write and \field{flags} are VIRTIO_IOMMU_MAP_F_READ |
+VIRTIO_IOMMU_MAP_F_WRITE) then the IOMMU allows the access to reach $PA$.
+
+The range defined by \field{virt_start} and \field{virt_end} should be
+within the limits specified by \field{input_range}. Given $phys\_end =
+phys\_start + virt\_end - virt\_start$, the range defined by
+\field{phys_start} and phys_end should be within the guest-physical address
+space. This includes upper and lower limits, as well as any carving of
+guest-physical addresses for use by the host (for instance MSI doorbells).
+Guest physical boundaries are set by the host using a firmware mechanism
+outside the scope of this specification.

 \begin{note}
 On flags: it is unlikely that all possible combinations of flags will be
@@ -503,11 +495,13 @@ negotiated.

 The driver SHOULD set undefined \field{flags} bits to zero.

+\field{virt_end} MUST be strictly greater than \field{virt_start}.
+
 \devicenormative{\paragraph}{MAP request}{Device Types / IOMMU Device / Device operations / MAP request}

-If \field{virt_addr}, \field{phys_addr} or \field{size} is not aligned on
-the page granularity, the device SHOULD set the request \field{status} to
-VIRTIO_IOMMU_S_RANGE and SHOULD NOT create the mapping.
+If \field{virt_start}, \field{phys_start} or (\field{virt_end} + 1) is
+not aligned on the page granularity, the device SHOULD set the request
+\field{status} to VIRTIO_IOMMU_S_RANGE and SHOULD NOT create the mapping.

 If the device doesn't recognize a \field{flags} bit, it SHOULD set the
 request \field{status} to VIRTIO_IOMMU_S_INVAL. In this case the device
@@ -524,45 +518,49 @@ If \field{domain} does not exist, the device SHOULD set the request
 \begin{lstlisting}
 struct virtio_iommu_req_unmap {
 	le32	domain;
-	le64	virt_addr;
-	le64	size;
+	le64	virt_start;
+	le64	virt_end;
 	le32	reserved;
 };
 \end{lstlisting}

 Unmap a range of addresses mapped with VIRTIO_IOMMU_T_MAP. We define here
 a mapping as a virtual region created with a single MAP request. All
-mappings covered by the range $[virt\_addr; virt\_addr + size [$ are
-removed.
+mappings covered by the range $[virt\_start; virt\_end]$ are removed.

-The semantics of unmapping are specified below, and illustrated with the
-following requests, assuming each example sequence starts with a blank
-address space. We define two pseudocode functions \texttt{map(virt\_addr,
-size) -> mapping} and \texttt{unmap(virt\_addr, size)}.
+The semantics of unmapping are specified in \ref{drivernormative:Device
+Types / IOMMU Device / Device operations / UNMAP request} and
+\ref{devicenormative:Device Types / IOMMU Device / Device operations /
+UNMAP request}, and illustrated with the following requests, assuming each
+example sequence starts with a blank address space. We define two
+pseudocode functions \texttt{map(virt_start, virt_end) -> mapping} and
+\texttt{unmap(virt_start, virt_end)}.

 \begin{lstlisting}
-(1) unmap(addr=0, size=5)        -> succeeds, doesn't unmap anything
+(1) unmap(virt_start=0,
+          virt_end=4)            -> succeeds, doesn't unmap anything

-(2) a = map(addr=0, size=10);
-    unmap(0, 10)                 -> succeeds, unmaps a
+(2) a = map(virt_start=0,
+            virt_end=9);
+    unmap(0, 9)                  -> succeeds, unmaps a

-(3) a = map(0, 5);
-    b = map(5, 5);
-    unmap(0, 10)                 -> succeeds, unmaps a and b
+(3) a = map(0, 4);
+    b = map(5, 9);
+    unmap(0, 9)                  -> succeeds, unmaps a and b

-(4) a = map(0, 10);
-    unmap(0, 5)                  -> faults, doesn't unmap anything
+(4) a = map(0, 9);
+    unmap(0, 4)                  -> faults, doesn't unmap anything

-(5) a = map(0, 5);
-    b = map(5, 5);
-    unmap(0, 5)                  -> succeeds, unmaps a
+(5) a = map(0, 4);
+    b = map(5, 9);
+    unmap(0, 4)                  -> succeeds, unmaps a

-(6) a = map(0, 5);
-    unmap(0, 10)                 -> succeeds, unmaps a
+(6) a = map(0, 4);
+    unmap(0, 9)                  -> succeeds, unmaps a

-(7) a = map(0, 5);
-    b = map(10, 5);
-    unmap(0, 15)                 -> succeeds, unmaps a and b
+(7) a = map(0, 4);
+    b = map(10, 14);
+    unmap(0, 14)                 -> succeeds, unmaps a and b
 \end{lstlisting}

 This request is only available when VIRTIO_IOMMU_F_MAP_UNMAP has been
@@ -572,9 +570,9 @@ negotiated.

 The driver SHOULD set the \field{reserved} field to zero.

-The range, defined by \field{virt_addr} and \field{size}, SHOULD cover one
-or more contiguous mappings created with MAP requests. The range MAY spill
-over unmapped virtual addresses.
+The range, defined by \field{virt_start} and \field{virt_end}, SHOULD
+cover one or more contiguous mappings created with MAP requests. The range
+MAY spill over unmapped virtual addresses.

 The first address of a range SHOULD either be the first address of a
 mapping or be outside any mapping. The last address of a range SHOULD
--
2.14.2
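
For readers who prefer code to prose, the UNMAP examples (1) to (7) in
the patch above can be approximated by the following C sketch. It is
only one possible reading of the rules: the fixed-size table, the names
viommu_domain, viommu_map and viommu_unmap, and the -1 return value on
failure are all assumptions made for the example, and the sketch does
not reject overlapping MAP requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 8

struct viommu_map_entry { uint64_t start, end; bool used; }; /* end inclusive */
struct viommu_domain { struct viommu_map_entry maps[MAX_MAPPINGS]; };

/* Record a mapping [start; end]; returns false when the table is full. */
bool viommu_map(struct viommu_domain *d, uint64_t start, uint64_t end)
{
	assert(end > start);	/* virt_end strictly greater than virt_start */
	for (size_t i = 0; i < MAX_MAPPINGS; i++) {
		if (!d->maps[i].used) {
			d->maps[i] = (struct viommu_map_entry){ start, end, true };
			return true;
		}
	}
	return false;
}

/*
 * Remove every mapping fully covered by [start; end]. If the range would
 * split a mapping, fail without removing anything. Returns the number of
 * mappings removed, or -1 on failure.
 */
int viommu_unmap(struct viommu_domain *d, uint64_t start, uint64_t end)
{
	int removed = 0;

	/* First pass: refuse ranges that only partially cover a mapping. */
	for (size_t i = 0; i < MAX_MAPPINGS; i++) {
		const struct viommu_map_entry *m = &d->maps[i];
		bool overlaps = m->used && m->start <= end && m->end >= start;

		if (overlaps && (m->start < start || m->end > end))
			return -1;
	}
	/* Second pass: drop all mappings contained in the range. */
	for (size_t i = 0; i < MAX_MAPPINGS; i++) {
		struct viommu_map_entry *m = &d->maps[i];

		if (m->used && m->start >= start && m->end <= end) {
			m->used = false;
			removed++;
		}
	}
	return removed;
}
```

Checking for partial coverage before removing anything is what gives
example (4) its "doesn't unmap anything" behaviour, while holes in the
range, as in examples (6) and (7), are simply skipped.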





* [RFC] virtio-iommu version 0.5
@ 2017-10-23  9:32 Jean-Philippe Brucker
  0 siblings, 0 replies; 25+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-23  9:32 UTC (permalink / raw)
  To: iommu, kvm, virtualization, virtio-dev
  Cc: lorenzo.pieralisi, ashok.raj, mst, marc.zyngier, will.deacon,
	Jayachandran.Nair, eric.auger, robin.murphy, eric.auger.pro

This is version 0.5 of the virtio-iommu specification, the paravirtualized
IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
Please find the specification, LaTeX sources and pdf, at:
git://linux-arm.org/virtio-iommu.git viommu/v0.5
http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf

A detailed changelog since v0.4 follows. You can find the pdf diff at:
http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf

* Add an event virtqueue for the device to report translation faults to
  the driver. For the moment only unrecoverable faults are available but
  future versions will extend it.
* Simplify PROBE request by removing the ack part, and flattening RESV
  properties.
* Rename "address space" to "domain". The change might seem futile but
  it allows PASIDs and other features to be introduced cleanly in the
  next versions. In the same vein, the few remaining "device" occurrences
  were replaced by "endpoint", to avoid any confusion with "the device"
  referring to the virtio device across the document.
* Add implementation notes for RESV_MEM properties.
* Update ACPI table definition.
* Fix typos and clarify a few things.

I will publish the Linux driver for v0.5 shortly. Then, for the next
versions, I'll focus on optimizations and adding support for hardware
acceleration.

Existing implementations are simple and can certainly be optimized, even
without architectural changes. But the architecture itself can also be
improved in a number of ways. Currently it is designed to work well with
VFIO. However, having explicit MAP requests is less efficient* than page
tables for emulated and PV endpoints, and the current architecture doesn't
address this. Binding page tables is an obvious way to improve throughput
in that case, but we can explore cleverer (and possibly simpler) ways to
do it.

So first we'll work on getting the base device and driver merged, then
we'll analyze and compare several ideas for improving performance.

Thanks,
Jean

* I have yet to study this behaviour, and would be interested in any
prior art on the subject of analyzing devices' DMA patterns (virtio and
others).


end of thread, other threads:[~2017-12-01 15:46 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-23  9:32 [RFC] virtio-iommu version 0.5 Jean-Philippe Brucker
2017-10-23  9:32 ` [virtio-dev] " Jean-Philippe Brucker
2017-10-24  6:27 ` Linu Cherian
     [not found] ` <20171023093241.20113-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2017-10-24  6:27   ` Linu Cherian
2017-10-24  8:37     ` Jean-Philippe Brucker
2017-10-24  8:37       ` [virtio-dev] " Jean-Philippe Brucker
2017-10-24 16:58       ` Linu Cherian
2017-10-25  7:07         ` Linu Cherian
2017-10-25  9:07           ` Jean-Philippe Brucker
2017-10-25  9:07           ` Jean-Philippe Brucker
2017-10-25  9:07             ` [virtio-dev] " Jean-Philippe Brucker
2017-10-25  9:26             ` Linu Cherian
2017-10-25 11:05             ` Linu Cherian
     [not found]             ` <6e5c3a23-9e00-1936-f80c-085faf42c420-5wv7dgnIgG8@public.gmane.org>
2017-10-25  9:26               ` Linu Cherian
2017-10-25 11:05               ` Linu Cherian
2017-10-25 12:05                 ` Jean-Philippe Brucker
2017-10-25 12:05                 ` Jean-Philippe Brucker
2017-10-25 12:05                   ` [virtio-dev] " Jean-Philippe Brucker
2017-10-25 12:05                   ` Jean-Philippe Brucker
2017-10-25  7:07         ` Linu Cherian
2017-10-24 16:58       ` Linu Cherian
2017-12-01 15:46 ` Jean-Philippe Brucker
2017-12-01 15:46   ` [virtio-dev] " Jean-Philippe Brucker
2017-12-01 15:46 ` Jean-Philippe Brucker
2017-10-23  9:32 Jean-Philippe Brucker
