* [Qemu-devel] Guest IOMMU and Cisco usnic
@ 2014-02-12 18:10 Benoît Canet
  2014-02-12 19:34 ` Alex Williamson
  0 siblings, 1 reply; 5+ messages in thread
From: Benoît Canet @ 2014-02-12 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Williamson


Hi Alex,

After the IRC conversation we had a few days ago, I understood that a guest IOMMU
was not implemented.

I have a real use case for it:

Cisco usnic allows writing MPI applications that drive the network card from
userspace in order to optimize latency. It's made for compute clusters.

The typical cloud provider doesn't provide bare-metal access but only VMs on top
of Cisco's hardware, hence VFIO uses the host IOMMU to pass the NIC through to the
guest, and no IOMMU is present in the guest.

Questions: Would writing a performant guest IOMMU implementation be possible?
           How complex does this project look to someone who knows IOMMU issues?

The ideal implementation would forward the IOMMU work to the host hardware for
speed.

I can devote time to writing the feature if it's doable.

Best regards

Benoît


* Re: [Qemu-devel] Guest IOMMU and Cisco usnic
  2014-02-12 18:10 [Qemu-devel] Guest IOMMU and Cisco usnic Benoît Canet
@ 2014-02-12 19:34 ` Alex Williamson
  2014-02-12 22:38   ` Benoît Canet
  2014-02-12 22:51   ` Benoît Canet
  0 siblings, 2 replies; 5+ messages in thread
From: Alex Williamson @ 2014-02-12 19:34 UTC (permalink / raw)
  To: Benoît Canet; +Cc: qemu-devel

On Wed, 2014-02-12 at 19:10 +0100, Benoît Canet wrote:
> Hi Alex,
> 
> After the IRC conversation we had a few days ago, I understood that a guest IOMMU
> was not implemented.
> 
> I have a real use case for it:
> 
> Cisco usnic allows writing MPI applications that drive the network card from
> userspace in order to optimize latency. It's made for compute clusters.
> 
> The typical cloud provider doesn't provide bare-metal access but only VMs on top
> of Cisco's hardware, hence VFIO uses the host IOMMU to pass the NIC through to the
> guest, and no IOMMU is present in the guest.
> 
> Questions: Would writing a performant guest IOMMU implementation be possible?
>            How complex does this project look to someone who knows IOMMU issues?
> 
> The ideal implementation would forward the IOMMU work to the host hardware for
> speed.
> 
> I can devote time to writing the feature if it's doable.

Hi Benoît,

I imagine it's doable, but it's certainly not trivial, beyond that I
haven't put much thought into it.

VFIO running in a guest would need an IOMMU that implements both the
IOMMU API and IOMMU groups.  Whether that comes from an emulated
physical IOMMU (like VT-d) or from a new paravirt IOMMU would be for you
to decide.  VT-d would imply using a PCIe chipset like Q35 and trying to
bandage on VT-d or updating Q35 to something that natively supports
VT-d.  Getting a sufficiently similar PCIe hierarchy between host and
guest would also be required.
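
To make "implements both the IOMMU API and IOMMU groups" concrete, here is a
very rough guest-kernel sketch of the paravirt option.  It is only a sketch:
the callback set is incomplete, kernel-internal signatures vary between
versions, and the pv_iommu_hcall_* helpers are hypothetical stand-ins for
whatever guest/host transport such a driver would define.

#include <linux/iommu.h>
#include <linux/pci.h>
#include <linux/err.h>
#include <linux/errno.h>
#include <linux/module.h>

/* Hypothetical placeholders for a real hypercall or virtio channel. */
static int pv_iommu_hcall_attach(struct device *dev)
{
        return -ENOSYS;         /* would relay the attach to the host IOMMU */
}

static int pv_iommu_hcall_map(unsigned long iova, phys_addr_t paddr,
                              size_t size, int prot)
{
        return -ENOSYS;         /* would relay the mapping to the host IOMMU */
}

static int pv_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
{
        /* ask the host to attach the device to a host-side domain */
        return pv_iommu_hcall_attach(dev);
}

static int pv_iommu_map(struct iommu_domain *domain, unsigned long iova,
                        phys_addr_t paddr, size_t size, int prot)
{
        /* forward the guest mapping request to the host IOMMU */
        return pv_iommu_hcall_map(iova, paddr, size, prot);
}

static int pv_iommu_add_device(struct device *dev)
{
        struct iommu_group *group;
        int ret;

        /* every device must end up in an IOMMU group, mirroring the host */
        group = iommu_group_alloc();
        if (IS_ERR(group))
                return PTR_ERR(group);
        ret = iommu_group_add_device(group, dev);
        iommu_group_put(group);
        return ret;
}

static struct iommu_ops pv_iommu_ops = {
        .attach_dev    = pv_iommu_attach_dev,
        .map           = pv_iommu_map,
        .add_device    = pv_iommu_add_device,
        /* .detach_dev, .unmap, .iova_to_phys, domain alloc/teardown, ...
         * are also required but omitted here. */
        .pgsize_bitmap = ~0xfffUL,      /* 4k pages and larger */
};

static int __init pv_iommu_init(void)
{
        return bus_set_iommu(&pci_bus_type, &pv_iommu_ops);
}
module_init(pv_iommu_init);
MODULE_LICENSE("GPL");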

The current model of putting all guest devices in a single IOMMU domain
on the host is likely not what you would want and might imply a new VFIO
IOMMU backend that is better tuned for separate domains, sparse
mappings, and low-latency.  VFIO has a modular IOMMU design, so this
isn't architecturally a problem.  The VFIO user (QEMU) is able to select
which backend to use and the code is written with supporting multiple
backends in mind.

A complication you'll have is that the granularity of IOMMU operations
through VFIO is at the IOMMU group level, so the guest would not be able
to easily split devices grouped together on the host between separate
users in the guest.  That could be modeled as a conventional PCI bridge
masking the requester ID of devices in the guest such that host groups
are mirrored as guest groups.
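
For reference, the two paragraphs above map onto the existing user-level flow
roughly as in the sketch below (the group number and device name are
placeholders, error handling is omitted).  A latency-tuned backend would be
selected at the VFIO_SET_IOMMU step, and every operation stays group-granular.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

int main(void)
{
        int container = open("/dev/vfio/vfio", O_RDWR);
        int group = open("/dev/vfio/26", O_RDWR);       /* host IOMMU group */
        struct vfio_group_status status = { .argsz = sizeof(status) };
        struct vfio_iommu_type1_dma_map map = { .argsz = sizeof(map) };

        ioctl(group, VFIO_GROUP_GET_STATUS, &status);   /* group must be viable */
        ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);

        /* Backend selection: a low-latency backend would be picked here
         * instead of the current type1 backend. */
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

        /* Map 1MB of anonymous memory at IOVA 0 for device DMA. */
        map.vaddr = (__u64)(unsigned long)mmap(NULL, 1024 * 1024,
                                               PROT_READ | PROT_WRITE,
                                               MAP_PRIVATE | MAP_ANONYMOUS,
                                               -1, 0);
        map.iova  = 0;
        map.size  = 1024 * 1024;
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        ioctl(container, VFIO_IOMMU_MAP_DMA, &map);

        /* Devices are only reachable through their group. */
        int device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");

        return device < 0;
}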

There might also be more simple "punch-through" ways to do it, for
instance what if instead of trying to make it work like it does on the
host we invented a paravirt VFIO interface and the vfio-pv driver in the
guest populated /dev/vfio as slightly modified passthroughs to the host
fds.  The guest OS may not even really need to be aware of the device.
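
Purely to illustrate the shape of that idea (this is hand-waving in code form,
and pv_vfio_forward_ioctl() is a made-up placeholder for whatever guest/host
channel would carry the calls), a guest-side vfio-pv stub might be little more
than an ioctl trampoline; group and device fds would need the same treatment.

#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/errno.h>
#include <linux/module.h>

/* Hypothetical transport that relays a VFIO ioctl to the matching
 * host-side file descriptor. */
static long pv_vfio_forward_ioctl(unsigned int cmd, unsigned long arg)
{
        return -ENOSYS;         /* would hand the call to the host fd */
}

static long vfio_pv_ioctl(struct file *filp, unsigned int cmd,
                          unsigned long arg)
{
        /* pass the VFIO container ioctls straight through to the host */
        return pv_vfio_forward_ioctl(cmd, arg);
}

static const struct file_operations vfio_pv_fops = {
        .owner          = THIS_MODULE,
        .unlocked_ioctl = vfio_pv_ioctl,
};

static struct miscdevice vfio_pv_dev = {
        .minor    = MISC_DYNAMIC_MINOR,
        .name     = "vfio-pv",
        .nodename = "vfio/vfio",        /* shows up where VFIO users expect it */
        .fops     = &vfio_pv_fops,
};

static int __init vfio_pv_init(void)
{
        return misc_register(&vfio_pv_dev);
}
module_init(vfio_pv_init);
MODULE_LICENSE("GPL");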

It's an interesting project and certainly a valid use case.  I'd also
like to see things like Intel's DPDK move to using VFIO, but the current
UIO DPDK is often used in guests.  Thanks,

Alex


* Re: [Qemu-devel] Guest IOMMU and Cisco usnic
  2014-02-12 19:34 ` Alex Williamson
@ 2014-02-12 22:38   ` Benoît Canet
  2014-02-12 22:51   ` Benoît Canet
  1 sibling, 0 replies; 5+ messages in thread
From: Benoît Canet @ 2014-02-12 22:38 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Benoît Canet, qemu-devel

The Wednesday 12 Feb 2014 à 12:34:25 (-0700), Alex Williamson wrote :
> On Wed, 2014-02-12 at 19:10 +0100, Benoît Canet wrote:
> > Hi Alex,
> > 
> > After the IRC conversation we had a few days ago, I understood that a guest IOMMU
> > was not implemented.
> > 
> > I have a real use case for it:
> > 
> > Cisco usnic allows writing MPI applications that drive the network card from
> > userspace in order to optimize latency. It's made for compute clusters.
> > 
> > The typical cloud provider doesn't provide bare-metal access but only VMs on top
> > of Cisco's hardware, hence VFIO uses the host IOMMU to pass the NIC through to the
> > guest, and no IOMMU is present in the guest.
> > 
> > Questions: Would writing a performant guest IOMMU implementation be possible?
> >            How complex does this project look to someone who knows IOMMU issues?
> > 
> > The ideal implementation would forward the IOMMU work to the host hardware for
> > speed.
> > 
> > I can devote time to writing the feature if it's doable.
> 
> Hi Benoît,
> 
> I imagine it's doable, but it's certainly not trivial, beyond that I
> haven't put much thought into it.

Thanks for the answer.
I am scared when an expert in the field says "not trivial" :)

Best regards

Benoît

> 
> VFIO running in a guest would need an IOMMU that implements both the
> IOMMU API and IOMMU groups.  Whether that comes from an emulated
> physical IOMMU (like VT-d) or from a new paravirt IOMMU would be for you
> to decide.  VT-d would imply using a PCIe chipset like Q35 and trying to
> bandage on VT-d or updating Q35 to something that natively supports
> VT-d.  Getting a sufficiently similar PCIe hierarchy between host and
> guest would also be required.
> 
> The current model of putting all guest devices in a single IOMMU domain
> on the host is likely not what you would want and might imply a new VFIO
> IOMMU backend that is better tuned for separate domains, sparse
> mappings, and low-latency.  VFIO has a modular IOMMU design, so this
> isn't architecturally a problem.  The VFIO user (QEMU) is able to select
> which backend to use and the code is written with supporting multiple
> backends in mind.
> 
> A complication you'll have is that the granularity of IOMMU operations
> through VFIO is at the IOMMU group level, so the guest would not be able
> to easily split devices grouped together on the host between separate
> users in the guest.  That could be modeled as a conventional PCI bridge
> masking the requester ID of devices in the guest such that host groups
> are mirrored as guest groups.
> 
> There might also be more simple "punch-through" ways to do it, for
> instance what if instead of trying to make it work like it does on the
> host we invented a paravirt VFIO interface and the vfio-pv driver in the
> guest populated /dev/vfio as slightly modified passthroughs to the host
> fds.  The guest OS may not even really need to be aware of the device.
> 
> It's an interesting project and certainly a valid use case.  I'd also
> like to see things like Intel's DPDK move to using VFIO, but the current
> UIO DPDK is often used in guests.  Thanks,
> 
> Alex
> 
> 


* Re: [Qemu-devel] Guest IOMMU and Cisco usnic
  2014-02-12 19:34 ` Alex Williamson
  2014-02-12 22:38   ` Benoît Canet
@ 2014-02-12 22:51   ` Benoît Canet
  2014-02-13  0:03     ` Alex Williamson
  1 sibling, 1 reply; 5+ messages in thread
From: Benoît Canet @ 2014-02-12 22:51 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Benoît Canet, qemu-devel

The Wednesday 12 Feb 2014 à 12:34:25 (-0700), Alex Williamson wrote :
> On Wed, 2014-02-12 at 19:10 +0100, Benoît Canet wrote:
> > Hi Alex,
> > 
> > After the IRC conversation we had a few days ago, I understood that a guest IOMMU
> > was not implemented.
> > 
> > I have a real use case for it:
> > 
> > Cisco usnic allows writing MPI applications that drive the network card from
> > userspace in order to optimize latency. It's made for compute clusters.
> > 
> > The typical cloud provider doesn't provide bare-metal access but only VMs on top
> > of Cisco's hardware, hence VFIO uses the host IOMMU to pass the NIC through to the
> > guest, and no IOMMU is present in the guest.
> > 
> > Questions: Would writing a performant guest IOMMU implementation be possible?
> >            How complex does this project look to someone who knows IOMMU issues?
> > 
> > The ideal implementation would forward the IOMMU work to the host hardware for
> > speed.
> > 
> > I can devote time to writing the feature if it's doable.
> 
> Hi Benoît,
> 
> I imagine it's doable, but it's certainly not trivial, beyond that I
> haven't put much thought into it.
> 
> VFIO running in a guest would need an IOMMU that implements both the
> IOMMU API and IOMMU groups.  Whether that comes from an emulated
> physical IOMMU (like VT-d) or from a new paravirt IOMMU would be for you
> to decide.  VT-d would imply using a PCIe chipset like Q35 and trying to
> bandage on VT-d or updating Q35 to something that natively supports
> VT-d.  Getting a sufficiently similar PCIe hierarchy between host and
> guest would also be required.

This Cisco usnic thing (driver/infiniband/hw/usnic) does not seem to use VFIO
at all and seems to be hardcoded to use an Intel IOMMU.

I don't know if it's a good thing or not.

> 
> The current model of putting all guest devices in a single IOMMU domain
> on the host is likely not what you would want and might imply a new VFIO
> IOMMU backend that is better tuned for separate domains, sparse
> mappings, and low-latency.  VFIO has a modular IOMMU design, so this
> isn't architecturally a problem.  The VFIO user (QEMU) is able to select
> which backend to use and the code is written with supporting multiple
> backends in mind.
> 
> A complication you'll have is that the granularity of IOMMU operations
> through VFIO is at the IOMMU group level, so the guest would not be able
> to easily split devices grouped together on the host between separate
> users in the guest.  That could be modeled as a conventional PCI bridge
> masking the requester ID of devices in the guest such that host groups
> are mirrored as guest groups.

I think that users would be happy with only one palo ucs VF wrapped by usnic
in the guest. I definitely need to check this point.

> 
> There might also be more simple "punch-through" ways to do it, for
> instance what if instead of trying to make it work like it does on the
> host we invented a paravirt VFIO interface and the vfio-pv driver in the
> guest populated /dev/vfio as slightly modified passthroughs to the host
> fds.  The guest OS may not even really need to be aware of the device.
> 

As I am not really interested in nesting VFIO but in using the Intel IOMMU directly
in the guest, a "punch-through" method would be fine.

> It's an interesting project and certainly a valid use case.  I'd also
> like to see things like Intel's DPDK move to using VFIO, but the current
> UIO DPDK is often used in guests.  Thanks,

I will ask Thomas Monjalon, the DPDK maintainer, about this.

Thanks,

Best regards

Benoît

> 
> Alex
> 
> 


* Re: [Qemu-devel] Guest IOMMU and Cisco usnic
  2014-02-12 22:51   ` Benoît Canet
@ 2014-02-13  0:03     ` Alex Williamson
  0 siblings, 0 replies; 5+ messages in thread
From: Alex Williamson @ 2014-02-13  0:03 UTC (permalink / raw)
  To: Benoît Canet; +Cc: qemu-devel

On Wed, 2014-02-12 at 23:51 +0100, Benoît Canet wrote:
> The Wednesday 12 Feb 2014 à 12:34:25 (-0700), Alex Williamson wrote :
> > On Wed, 2014-02-12 at 19:10 +0100, Benoît Canet wrote:
> > > Hi Alex,
> > > 
> > > After the IRC conversation we had a few days ago, I understood that a guest IOMMU
> > > was not implemented.
> > > 
> > > I have a real use case for it:
> > > 
> > > Cisco usnic allows writing MPI applications that drive the network card from
> > > userspace in order to optimize latency. It's made for compute clusters.
> > > 
> > > The typical cloud provider doesn't provide bare-metal access but only VMs on top
> > > of Cisco's hardware, hence VFIO uses the host IOMMU to pass the NIC through to the
> > > guest, and no IOMMU is present in the guest.
> > > 
> > > Questions: Would writing a performant guest IOMMU implementation be possible?
> > >            How complex does this project look to someone who knows IOMMU issues?
> > > 
> > > The ideal implementation would forward the IOMMU work to the host hardware for
> > > speed.
> > > 
> > > I can devote time to writing the feature if it's doable.
> > 
> > Hi Benoît,
> > 
> > I imagine it's doable, but it's certainly not trivial, beyond that I
> > haven't put much thought into it.
> > 
> > VFIO running in a guest would need an IOMMU that implements both the
> > IOMMU API and IOMMU groups.  Whether that comes from an emulated
> > physical IOMMU (like VT-d) or from a new paravirt IOMMU would be for you
> > to decide.  VT-d would imply using a PCIe chipset like Q35 and trying to
> > bandage on VT-d or updating Q35 to something that natively supports
> > VT-d.  Getting a sufficiently similar PCIe hierarchy between host and
> > guest would also be required.
> 
> This Cisco usnic thing (driver/infiniband/hw/usnic) does not seem to use VFIO
> at all and seems to be hardcoded to use an Intel IOMMU.
> 
> I don't know if it's a good thing or not.

Sorry, I got a little off track assuming usnic was a VFIO userspace
driver.  Peeking quickly at it, it looks like it also uses the IOMMU
API, so unless I missed the VT-d specific parts, a pv IOMMU in the guest
might allow some simplification if you don't care about non-Linux
support.
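
Roughly, the pattern such a driver depends on looks like the sketch below
(placeholder device, IOVA and physical address, not the actual usnic code, and
kernel-internal signatures of that era); these are the calls an emulated or
paravirt guest IOMMU would have to service.

#include <linux/iommu.h>
#include <linux/pci.h>
#include <linux/mm.h>
#include <linux/errno.h>

static int usnic_style_map_one(struct pci_dev *pdev, phys_addr_t paddr)
{
        struct iommu_domain *domain;
        int ret;

        /* per-user protection domain */
        domain = iommu_domain_alloc(&pci_bus_type);
        if (!domain)
                return -ENOMEM;

        ret = iommu_attach_device(domain, &pdev->dev);
        if (ret)
                goto out_free;

        /* one 4k IOVA->phys translation for userspace-driven DMA */
        ret = iommu_map(domain, 0x100000, paddr, PAGE_SIZE,
                        IOMMU_READ | IOMMU_WRITE);
        if (ret)
                goto out_detach;

        /* real code keeps the domain around for later unmap/teardown */
        return 0;

out_detach:
        iommu_detach_device(domain, &pdev->dev);
out_free:
        iommu_domain_free(domain);
        return ret;
}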
 
> > The current model of putting all guest devices in a single IOMMU domain
> > on the host is likely not what you would want and might imply a new VFIO
> > IOMMU backend that is better tuned for separate domains, sparse
> > mappings, and low-latency.  VFIO has a modular IOMMU design, so this
> > isn't architecturally a problem.  The VFIO user (QEMU) is able to select
> > which backend to use and the code is written with supporting multiple
> > backends in mind.
> > 
> > A complication you'll have is that the granularity of IOMMU operations
> > through VFIO is at the IOMMU group level, so the guest would not be able
> > to easily split devices grouped together on the host between separate
> > users in the guest.  That could be modeled as a conventional PCI bridge
> > masking the requester ID of devices in the guest such that host groups
> > are mirrored as guest groups.
> 
> I think that users would be happy with only one palo ucs VF wrapped by usnic
> in the guest. I definitely need to check this point.

The solution should support multiple devices though; it may just require
multiple guest IOMMUs and fairly strict configuration constraints.

> > There might also be more simple "punch-through" ways to do it, for
> > instance what if instead of trying to make it work like it does on the
> > host we invented a paravirt VFIO interface and the vfio-pv driver in the
> > guest populated /dev/vfio as slightly modified passthroughs to the host
> > fds.  The guest OS may not even really need to be aware of the device.
> > 
> 
> As I am not really interested in nesting VFIO but in using the Intel IOMMU directly
> in the guest, a "punch-through" method would be fine.

I was doing a lot of hand-waving for a vfio-pv punch-through, but I don't
even have a vague idea of what an IOMMU API punch-through would look
like.  Seems like you need to evaluate if the pain of emulating VT-d is
greater than the pain of creating a new pv IOMMU and which is likely to
perform better.  Thanks,

Alex

> > It's an interesting project and certainly a valid use case.  I'd also
> > like to see things like Intel's DPDK move to using VFIO, but the current
> > UIO DPDK is often used in guests.  Thanks,
> 
> I will ask Thomas Monjalon, the DPDK maintainer, about this.
> 
> Thanks,
> 
> Best regards
> 
> Benoît
> 
> > 
> > Alex
> > 
> > 

