Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found] <CALCETrU2o27udqjvREDFyoNtKtaKvuHTyYRaK+Qt_1Q3tPWYDQ@mail.gmail.com>
@ 2015-07-28  7:05 ` Christian Borntraeger
  2015-07-28  8:16 ` Paolo Bonzini
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Christian Borntraeger @ 2015-07-28  7:05 UTC (permalink / raw)
  To: Andy Lutomirski, Rusty Russell, Michael S. Tsirkin
  Cc: linux-s390, Benjamin Herrenschmidt, Linux Virtualization,
	Paolo Bonzini, Stefan Hajnoczi, Cornelia Huck, linux390,
	xen-devel

Am 28.07.2015 um 03:08 schrieb Andy Lutomirski:
> On Mon, Sep 1, 2014 at 10:39 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> This fixes virtio on Xen guests as well as on any other platform
>> that uses virtio_pci on which physical addresses don't match bus
>> addresses.
>>
>> This can be tested with:
>>
>>     virtme-run --xen xen --kimg arch/x86/boot/bzImage --console
>>
>> using virtme from here:
>>
>>     https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git
>>
>> Without these patches, the guest hangs forever.  With these patches,
>> everything works.
>>
> 
> Dusting off an ancient thread.
> 
> Now that the dust has accumulated^Wsettled, is it worth pursuing this?
>  I think the situation is considerably worse than it was when I
> originally wrote these patches: I think that QEMU now supports a nasty
> mode in which the guest's PCI bus appears to be behind an IOMMU but
> the virtio devices on that bus punch straight through that IOMMU.
> 
> I have a half-hearted port to modern kernels here:
> 
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=virtio_ring_xen
> 
> I didn't implement DMA API access for virtio_pci_modern, and I have no
> idea what to do about detecting whether a given virtio device honors
> its IOMMU or not.

I think its really tricky.

Looking at where virtio came from, the virtio ring was always native access without
IOMMU. This was true for the early lguest things and then the early s390 transport,
(which is quite close to the lguest interface). virtio-pci used the same scheme
 - ignoring all iommu considerations.

I understand that for PCI we  actually might want to follow iommu restrictions from
a correctness and security point of view and from the ccw point of view we do not.
No idea about virtio-mmio.

I think the proper way of handling this is to take this into the TC for virtio - dont
know what would be the right thing to do. A feature bit, always iommu for pci, something
else?

Michael, Conny, 

do you agree?

Christian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found] <CALCETrU2o27udqjvREDFyoNtKtaKvuHTyYRaK+Qt_1Q3tPWYDQ@mail.gmail.com>
  2015-07-28  7:05 ` [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API Christian Borntraeger
@ 2015-07-28  8:16 ` Paolo Bonzini
       [not found] ` <55B73A49.9050206@redhat.com>
  2015-07-28 13:08 ` Michael S. Tsirkin
  3 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-07-28  8:16 UTC (permalink / raw)
  To: Andy Lutomirski, Rusty Russell, Michael S. Tsirkin
  Cc: linux-s390, Benjamin Herrenschmidt, Linux Virtualization,
	Christian Borntraeger, Stefan Hajnoczi, Cornelia Huck, linux390,
	xen-devel



On 28/07/2015 03:08, Andy Lutomirski wrote:
> On Mon, Sep 1, 2014 at 10:39 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> This fixes virtio on Xen guests as well as on any other platform
>> that uses virtio_pci on which physical addresses don't match bus
>> addresses.
>>
>> This can be tested with:
>>
>>     virtme-run --xen xen --kimg arch/x86/boot/bzImage --console
>>
>> using virtme from here:
>>
>>     https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git
>>
>> Without these patches, the guest hangs forever.  With these patches,
>> everything works.
>>
> 
> Dusting off an ancient thread.
> 
> Now that the dust has accumulated^Wsettled, is it worth pursuing this?
>  I think the situation is considerably worse than it was when I
> originally wrote these patches: I think that QEMU now supports a nasty
> mode in which the guest's PCI bus appears to be behind an IOMMU but
> the virtio devices on that bus punch straight through that IOMMU.

That is an experimental feature (it's x-iommu), so it can change.

The plan was:

- for PPC, virtio never honors IOMMU

- for non-PPC, either have virtio always honor IOMMU, or enforce that
virtio is not under IOMMU.

Paolo

> I have a half-hearted port to modern kernels here:
> 
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=virtio_ring_xen
> 
> I didn't implement DMA API access for virtio_pci_modern, and I have no
> idea what to do about detecting whether a given virtio device honors
> its IOMMU or not.
> 
> --Andy
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found] ` <55B73A49.9050206@redhat.com>
@ 2015-07-28 10:12   ` Benjamin Herrenschmidt
       [not found]   ` <1438078345.7562.133.camel@kernel.crashing.org>
  1 sibling, 0 replies; 32+ messages in thread
From: Benjamin Herrenschmidt @ 2015-07-28 10:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-s390, xen-devel, Michael S. Tsirkin, Stefan Hajnoczi,
	Rusty Russell, Andy Lutomirski, Christian Borntraeger,
	Cornelia Huck, linux390, Linux Virtualization

On Tue, 2015-07-28 at 10:16 +0200, Paolo Bonzini wrote:
> 
> On 28/07/2015 03:08, Andy Lutomirski wrote:
> > On Mon, Sep 1, 2014 at 10:39 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> >> This fixes virtio on Xen guests as well as on any other platform
> >> that uses virtio_pci on which physical addresses don't match bus
> >> addresses.
> >>
> >> This can be tested with:
> >>
> >>     virtme-run --xen xen --kimg arch/x86/boot/bzImage --console
> >>
> >> using virtme from here:
> >>
> >>     https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git
> >>
> >> Without these patches, the guest hangs forever.  With these patches,
> >> everything works.
> >>
> > 
> > Dusting off an ancient thread.
> > 
> > Now that the dust has accumulated^Wsettled, is it worth pursuing this?
> >  I think the situation is considerably worse than it was when I
> > originally wrote these patches: I think that QEMU now supports a nasty
> > mode in which the guest's PCI bus appears to be behind an IOMMU but
> > the virtio devices on that bus punch straight through that IOMMU.
> 
> That is an experimental feature (it's x-iommu), so it can change.
> 
> The plan was:
> 
> - for PPC, virtio never honors IOMMU
> 
> - for non-PPC, either have virtio always honor IOMMU, or enforce that
> virtio is not under IOMMU.
> 

I dislike having PPC special cased.

In fact, today x86 guests also assume that virtio bypasses IOMMU I
believe. In fact *all* guests do.

I would much prefer if the information as to whether it honors or not
gets passed to the guest somewhat. My preference goes for passing it via
the virtio config space but there were objections that it should be a
bus property (which is tricky to do with PCI and doesn't properly
reflect the fact that in qemu you can mix & match IOMMU-honoring devices
and bypassing-virtio on the same bus). 

Ben.

> Paolo
> 
> > I have a half-hearted port to modern kernels here:
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=virtio_ring_xen
> > 
> > I didn't implement DMA API access for virtio_pci_modern, and I have no
> > idea what to do about detecting whether a given virtio device honors
> > its IOMMU or not.
> > 
> > --Andy
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]   ` <1438078345.7562.133.camel@kernel.crashing.org>
@ 2015-07-28 12:46     ` Paolo Bonzini
       [not found]     ` <55B7799C.3060908@redhat.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-07-28 12:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-s390, xen-devel, Michael S. Tsirkin, Stefan Hajnoczi,
	Rusty Russell, Andy Lutomirski, Christian Borntraeger,
	Cornelia Huck, linux390, Linux Virtualization



On 28/07/2015 12:12, Benjamin Herrenschmidt wrote:
>> > That is an experimental feature (it's x-iommu), so it can change.
>> > 
>> > The plan was:
>> > 
>> > - for PPC, virtio never honors IOMMU
>> > 
>> > - for non-PPC, either have virtio always honor IOMMU, or enforce that
>> > virtio is not under IOMMU.
>> > 
> I dislike having PPC special cased.
> 
> In fact, today x86 guests also assume that virtio bypasses IOMMU I
> believe. In fact *all* guests do.

This doesn't matter much, since the only guests that implement an IOMMU
in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind
of stability.

> I would much prefer if the information as to whether it honors or not
> gets passed to the guest somewhat. My preference goes for passing it via
> the virtio config space but there were objections that it should be a
> bus property (which is tricky to do with PCI and doesn't properly
> reflect the fact that in qemu you can mix & match IOMMU-honoring devices
> and bypassing-virtio on the same bus). 

Yes, for example on x86 it must be passed through the DMAR table.
virtio-pci device must have a separate DRHD for them.  In QEMU, you
could add an "under-iommu" property to PCI bridges, and walk the
hierarchy of bridges to build the DRHDs.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]     ` <55B7799C.3060908@redhat.com>
@ 2015-07-28 13:06       ` Michael S. Tsirkin
       [not found]       ` <20150728160358-mutt-send-email-mst@redhat.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2015-07-28 13:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-s390, xen-devel, Benjamin Herrenschmidt, Rusty Russell,
	Andy Lutomirski, Christian Borntraeger, jan.kiszka,
	Stefan Hajnoczi, Cornelia Huck, linux390, Linux Virtualization

On Tue, Jul 28, 2015 at 02:46:20PM +0200, Paolo Bonzini wrote:
> 
> 
> On 28/07/2015 12:12, Benjamin Herrenschmidt wrote:
> >> > That is an experimental feature (it's x-iommu), so it can change.
> >> > 
> >> > The plan was:
> >> > 
> >> > - for PPC, virtio never honors IOMMU
> >> > 
> >> > - for non-PPC, either have virtio always honor IOMMU, or enforce that
> >> > virtio is not under IOMMU.
> >> > 
> > I dislike having PPC special cased.
> > 
> > In fact, today x86 guests also assume that virtio bypasses IOMMU I
> > believe. In fact *all* guests do.
> 
> This doesn't matter much, since the only guests that implement an IOMMU
> in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind
> of stability.

Hmm I think Jan (cc) said it was already used out there.


> > I would much prefer if the information as to whether it honors or not
> > gets passed to the guest somewhat. My preference goes for passing it via
> > the virtio config space but there were objections that it should be a
> > bus property (which is tricky to do with PCI and doesn't properly
> > reflect the fact that in qemu you can mix & match IOMMU-honoring devices
> > and bypassing-virtio on the same bus). 
> 
> Yes, for example on x86 it must be passed through the DMAR table.
> virtio-pci device must have a separate DRHD for them.  In QEMU, you
> could add an "under-iommu" property to PCI bridges, and walk the
> hierarchy of bridges to build the DRHDs.
> 
> Paolo

-- 
MST

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found] <CALCETrU2o27udqjvREDFyoNtKtaKvuHTyYRaK+Qt_1Q3tPWYDQ@mail.gmail.com>
                   ` (2 preceding siblings ...)
       [not found] ` <55B73A49.9050206@redhat.com>
@ 2015-07-28 13:08 ` Michael S. Tsirkin
  3 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2015-07-28 13:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-s390, Benjamin Herrenschmidt, Rusty Russell,
	Linux Virtualization, Christian Borntraeger, Paolo Bonzini,
	Stefan Hajnoczi, Cornelia Huck, linux390, xen-devel

On Mon, Jul 27, 2015 at 06:08:59PM -0700, Andy Lutomirski wrote:
> On Mon, Sep 1, 2014 at 10:39 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > This fixes virtio on Xen guests as well as on any other platform
> > that uses virtio_pci on which physical addresses don't match bus
> > addresses.
> >
> > This can be tested with:
> >
> >     virtme-run --xen xen --kimg arch/x86/boot/bzImage --console
> >
> > using virtme from here:
> >
> >     https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git
> >
> > Without these patches, the guest hangs forever.  With these patches,
> > everything works.
> >
> 
> Dusting off an ancient thread.
> 
> Now that the dust has accumulated^Wsettled, is it worth pursuing this?
>  I think the situation is considerably worse than it was when I
> originally wrote these patches: I think that QEMU now supports a nasty
> mode in which the guest's PCI bus appears to be behind an IOMMU but
> the virtio devices on that bus punch straight through that IOMMU.
> 
> I have a half-hearted port to modern kernels here:
> 
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=virtio_ring_xen
> 
> I didn't implement DMA API access for virtio_pci_modern, and I have no
> idea what to do about detecting whether a given virtio device honors
> its IOMMU or not.
> 
> --Andy

It's worth thinking about. I'll need to measure what's the overhead of
supporting both modes - probably after I'm back from the KVM forum.

-- 
MST

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]       ` <20150728160358-mutt-send-email-mst@redhat.com>
@ 2015-07-28 13:11         ` Jan Kiszka
       [not found]         ` <55B77F8C.7010804@siemens.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2015-07-28 13:11 UTC (permalink / raw)
  To: Michael S. Tsirkin, Paolo Bonzini
  Cc: linux-s390, xen-devel, Benjamin Herrenschmidt, Rusty Russell,
	Andy Lutomirski, Christian Borntraeger, Stefan Hajnoczi,
	Cornelia Huck, linux390, Linux Virtualization

On 2015-07-28 15:06, Michael S. Tsirkin wrote:
> On Tue, Jul 28, 2015 at 02:46:20PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 28/07/2015 12:12, Benjamin Herrenschmidt wrote:
>>>>> That is an experimental feature (it's x-iommu), so it can change.
>>>>>
>>>>> The plan was:
>>>>>
>>>>> - for PPC, virtio never honors IOMMU
>>>>>
>>>>> - for non-PPC, either have virtio always honor IOMMU, or enforce that
>>>>> virtio is not under IOMMU.
>>>>>
>>> I dislike having PPC special cased.
>>>
>>> In fact, today x86 guests also assume that virtio bypasses IOMMU I
>>> believe. In fact *all* guests do.
>>
>> This doesn't matter much, since the only guests that implement an IOMMU
>> in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind
>> of stability.
> 
> Hmm I think Jan (cc) said it was already used out there.

Yes, no known issues with vt-d emulation for almost a year now. Error
reporting could be improved, and interrupt remapping is still missing,
but those are minor issues in this context.

In my testing setups, I also have virtio devices in use, passed through
to an L2 guest, but only in 1:1 mapping so that their broken IOMMU
support causes no practical problems.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]         ` <55B77F8C.7010804@siemens.com>
@ 2015-07-28 16:11           ` Andy Lutomirski
  2015-07-28 16:36           ` Paolo Bonzini
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2015-07-28 16:11 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: linux-s390, Benjamin Herrenschmidt, Michael S. Tsirkin,
	Stefan Hajnoczi, Rusty Russell, xen-devel, Christian Borntraeger,
	Paolo Bonzini, Cornelia Huck, linux390, Linux Virtualization

On Jul 28, 2015 6:11 AM, "Jan Kiszka" <jan.kiszka@siemens.com> wrote:
>
> On 2015-07-28 15:06, Michael S. Tsirkin wrote:
> > On Tue, Jul 28, 2015 at 02:46:20PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 28/07/2015 12:12, Benjamin Herrenschmidt wrote:
> >>>>> That is an experimental feature (it's x-iommu), so it can change.
> >>>>>
> >>>>> The plan was:
> >>>>>
> >>>>> - for PPC, virtio never honors IOMMU
> >>>>>
> >>>>> - for non-PPC, either have virtio always honor IOMMU, or enforce that
> >>>>> virtio is not under IOMMU.
> >>>>>
> >>> I dislike having PPC special cased.
> >>>
> >>> In fact, today x86 guests also assume that virtio bypasses IOMMU I
> >>> believe. In fact *all* guests do.
> >>
> >> This doesn't matter much, since the only guests that implement an IOMMU
> >> in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind
> >> of stability.
> >
> > Hmm I think Jan (cc) said it was already used out there.
>
> Yes, no known issues with vt-d emulation for almost a year now. Error
> reporting could be improved, and interrupt remapping is still missing,
> but those are minor issues in this context.
>
> In my testing setups, I also have virtio devices in use, passed through
> to an L2 guest, but only in 1:1 mapping so that their broken IOMMU
> support causes no practical problems.
>

How are you getting 1:1 to work?  Is it something that L0 QEMU can
advertise to L1?  If so, can we just do that unconditionally, which
would make my patch work?

I have no objection to 1:1 devices in general.  It's only devices that
the PCI code on the guest identifies as not 1:1 but that are
nonetheless 1:1 that cause problems.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]         ` <55B77F8C.7010804@siemens.com>
  2015-07-28 16:11           ` Andy Lutomirski
@ 2015-07-28 16:36           ` Paolo Bonzini
       [not found]           ` <55B7AF99.2080209@redhat.com>
       [not found]           ` <CALCETrVF1-=z60bCtsMBctbd14zjJgpUPNNo+EcJj_fXhww1og@mail.gmail.com>
  3 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-07-28 16:36 UTC (permalink / raw)
  To: Jan Kiszka, Michael S. Tsirkin
  Cc: linux-s390, xen-devel, Benjamin Herrenschmidt, Rusty Russell,
	Andy Lutomirski, Christian Borntraeger, Stefan Hajnoczi,
	Cornelia Huck, linux390, Linux Virtualization



On 28/07/2015 15:11, Jan Kiszka wrote:
>>> >>
>>> >> This doesn't matter much, since the only guests that implement an IOMMU
>>> >> in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind
>>> >> of stability.
>> > 
>> > Hmm I think Jan (cc) said it was already used out there.
> Yes, no known issues with vt-d emulation for almost a year now. Error
> reporting could be improved, and interrupt remapping is still missing,
> but those are minor issues in this context.

On the other hand interrupt remapping is absolutely necessary for
production use, hence my point that x86 does not promise API stability.

("Any kind of stability" actually didn't include crashes; those are not
expected :))

The Google patches for userspace PIC and IOAPIC are proceeding well, so
hopefully we can have interrupt remapping soon.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]           ` <55B7AF99.2080209@redhat.com>
@ 2015-07-28 16:42             ` Jan Kiszka
       [not found]             ` <55B7B105.50200@siemens.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2015-07-28 16:42 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin
  Cc: linux-s390, xen-devel, Benjamin Herrenschmidt, Rusty Russell,
	Andy Lutomirski, Christian Borntraeger, Stefan Hajnoczi,
	Cornelia Huck, linux390, Linux Virtualization

On 2015-07-28 18:36, Paolo Bonzini wrote:
> On 28/07/2015 15:11, Jan Kiszka wrote:
>>>>>>
>>>>>> This doesn't matter much, since the only guests that implement an IOMMU
>>>>>> in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind
>>>>>> of stability.
>>>>
>>>> Hmm I think Jan (cc) said it was already used out there.
>> Yes, no known issues with vt-d emulation for almost a year now. Error
>> reporting could be improved, and interrupt remapping is still missing,
>> but those are minor issues in this context.
> 
> On the other hand interrupt remapping is absolutely necessary for
> production use, hence my point that x86 does not promise API stability.

Well, we currently implement the features that the Q35 used to expose.
Adding interrupt remapping will require a new chipset and/or a hack
switch to ignore compatibility.

> 
> ("Any kind of stability" actually didn't include crashes; those are not
> expected :))
> 
> The Google patches for userspace PIC and IOAPIC are proceeding well, so
> hopefully we can have interrupt remapping soon.

If the day had 48 hours... I'd love to look into this, first adding QEMU
support for the new irqchip architecture.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]           ` <CALCETrVF1-=z60bCtsMBctbd14zjJgpUPNNo+EcJj_fXhww1og@mail.gmail.com>
@ 2015-07-28 16:44             ` Jan Kiszka
       [not found]             ` <55B7B15C.4010101@siemens.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2015-07-28 16:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-s390, Benjamin Herrenschmidt, Michael S. Tsirkin,
	Stefan Hajnoczi, Rusty Russell, xen-devel, Christian Borntraeger,
	Paolo Bonzini, Cornelia Huck, linux390, Linux Virtualization

On 2015-07-28 18:11, Andy Lutomirski wrote:
> On Jul 28, 2015 6:11 AM, "Jan Kiszka" <jan.kiszka@siemens.com> wrote:
>>
>> On 2015-07-28 15:06, Michael S. Tsirkin wrote:
>>> On Tue, Jul 28, 2015 at 02:46:20PM +0200, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 28/07/2015 12:12, Benjamin Herrenschmidt wrote:
>>>>>>> That is an experimental feature (it's x-iommu), so it can change.
>>>>>>>
>>>>>>> The plan was:
>>>>>>>
>>>>>>> - for PPC, virtio never honors IOMMU
>>>>>>>
>>>>>>> - for non-PPC, either have virtio always honor IOMMU, or enforce that
>>>>>>> virtio is not under IOMMU.
>>>>>>>
>>>>> I dislike having PPC special cased.
>>>>>
>>>>> In fact, today x86 guests also assume that virtio bypasses IOMMU I
>>>>> believe. In fact *all* guests do.
>>>>
>>>> This doesn't matter much, since the only guests that implement an IOMMU
>>>> in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind
>>>> of stability.
>>>
>>> Hmm I think Jan (cc) said it was already used out there.
>>
>> Yes, no known issues with vt-d emulation for almost a year now. Error
>> reporting could be improved, and interrupt remapping is still missing,
>> but those are minor issues in this context.
>>
>> In my testing setups, I also have virtio devices in use, passed through
>> to an L2 guest, but only in 1:1 mapping so that their broken IOMMU
>> support causes no practical problems.
>>
> 
> How are you getting 1:1 to work?  Is it something that L0 QEMU can
> advertise to L1?  If so, can we just do that unconditionally, which
> would make my patch work?

The guest hypervisor is Jailhouse and the guest is the root cell that
loaded the hypervisor, thus continues with identity mappings. You
usually don't have 1:1 mapping with other setups - maybe with some Xen
configuration? Dunno.

> 
> I have no objection to 1:1 devices in general.  It's only devices that
> the PCI code on the guest identifies as not 1:1 but that are
> nonetheless 1:1 that cause problems.

The ability to have virtio on systems with IOMMU in place makes testing
much more efficient for us. Ideally, we would have it in non-identity
mapping scenarios as well, e.g. to start secondary Linux instances in
the test VMs, giving them their own virtio devices. And we will
eventually have this need on ARM as well.

Virtio needs to be backward compatible, so the change to put these
devices under IOMMU control could be advertised during feature
negotiations and controlled on QEMU side via a device property. Newer
guest drivers would have to acknowledge that they support virtio via
IOMMUs. Older ones would refuse to work, and the admin could instead
spawn VMs with this feature disabled.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]             ` <55B7B15C.4010101@siemens.com>
@ 2015-07-28 17:10               ` Andy Lutomirski
       [not found]               ` <CALCETrW_7tmUg1zf3m1xYeSruW8njNYP9N4s8tBKJcTGJf_0QA@mail.gmail.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2015-07-28 17:10 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: linux-s390, Benjamin Herrenschmidt, Michael S. Tsirkin,
	Stefan Hajnoczi, Rusty Russell, xen-devel, Christian Borntraeger,
	Paolo Bonzini, Cornelia Huck, linux390, Linux Virtualization

On Tue, Jul 28, 2015 at 9:44 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> The ability to have virtio on systems with IOMMU in place makes testing
> much more efficient for us. Ideally, we would have it in non-identity
> mapping scenarios as well, e.g. to start secondary Linux instances in
> the test VMs, giving them their own virtio devices. And we will
> eventually have this need on ARM as well.
>
> Virtio needs to be backward compatible, so the change to put these
> devices under IOMMU control could be advertised during feature
> negotiations and controlled on QEMU side via a device property. Newer
> guest drivers would have to acknowledge that they support virtio via
> IOMMUs. Older ones would refuse to work, and the admin could instead
> spawn VMs with this feature disabled.
>

The trouble is that this is really a property of the bus and not of
the device.  If you build a virtio device that physically plugs into a
PCIe slot, the device has no concept of an IOMMU in the first place.
Similarly, if you take an L0-provided IOMMU-supporting device and pass
it through to L2 using current QEMU on L1 (with Q35 emulation and
iommu enabled), then, from L2's perspective, the device is 1:1 no
matter what the device thinks.

IOW, I think the original design was wrong and now we have to deal
with it.  I think the best solution would be to teach QEMU to fix its
ACPI tables so that 1:1 virtio devices are actually exposed as 1:1.

--Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]             ` <55B7B105.50200@siemens.com>
@ 2015-07-28 17:15               ` Paolo Bonzini
       [not found]               ` <55B7B8AF.5060603@redhat.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-07-28 17:15 UTC (permalink / raw)
  To: Jan Kiszka, Michael S. Tsirkin
  Cc: linux-s390, xen-devel, Benjamin Herrenschmidt, Rusty Russell,
	Andy Lutomirski, Christian Borntraeger, Stefan Hajnoczi,
	Cornelia Huck, linux390, Linux Virtualization



On 28/07/2015 18:42, Jan Kiszka wrote:
> > On the other hand interrupt remapping is absolutely necessary for
> > production use, hence my point that x86 does not promise API stability.
> 
> Well, we currently implement the features that the Q35 used to expose.
> Adding interrupt remapping will require a new chipset and/or a hack
> switch to ignore compatibility.

Isn't the VT-d register space separate from other Q35 features and
backwards-compatible?  You could even add it to PIIX in theory just by
adding a DMAR.

It's not like for example SMRAM, where the registers are in the
northbridge configuration space and move around in every chipset generation.

> > ("Any kind of stability" actually didn't include crashes; those are not
> > expected :))
> > 
> > The Google patches for userspace PIC and IOAPIC are proceeding well, so
> > hopefully we can have interrupt remapping soon.
> 
> If the day had 48 hours... I'd love to look into this, first adding QEMU
> support for the new irqchip architecture.

I hope I can squeeze in some time for that...  Google also had an intern
that was looking at it.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]               ` <CALCETrW_7tmUg1zf3m1xYeSruW8njNYP9N4s8tBKJcTGJf_0QA@mail.gmail.com>
@ 2015-07-28 17:17                 ` Jan Kiszka
       [not found]                 ` <55B7B91E.40200@siemens.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2015-07-28 17:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-s390, Benjamin Herrenschmidt, Michael S. Tsirkin,
	Stefan Hajnoczi, Rusty Russell, xen-devel, Christian Borntraeger,
	Paolo Bonzini, Cornelia Huck, linux390, Linux Virtualization

On 2015-07-28 19:10, Andy Lutomirski wrote:
> On Tue, Jul 28, 2015 at 9:44 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> The ability to have virtio on systems with IOMMU in place makes testing
>> much more efficient for us. Ideally, we would have it in non-identity
>> mapping scenarios as well, e.g. to start secondary Linux instances in
>> the test VMs, giving them their own virtio devices. And we will
>> eventually have this need on ARM as well.
>>
>> Virtio needs to be backward compatible, so the change to put these
>> devices under IOMMU control could be advertised during feature
>> negotiations and controlled on QEMU side via a device property. Newer
>> guest drivers would have to acknowledge that they support virtio via
>> IOMMUs. Older ones would refuse to work, and the admin could instead
>> spawn VMs with this feature disabled.
>>
> 
> The trouble is that this is really a property of the bus and not of
> the device.  If you build a virtio device that physically plugs into a
> PCIe slot, the device has no concept of an IOMMU in the first place.

If one would build a real virtio device today, it would be broken
because every IOMMU would start to translate its requests. Already from
that POV, we really need to introduce a feature flag "I will be
IOMMU-translated" so that a potential physical implementation can carry
it unconditionally.

> Similarly, if you take an L0-provided IOMMU-supporting device and pass
> it through to L2 using current QEMU on L1 (with Q35 emulation and
> iommu enabled), then, from L2's perspective, the device is 1:1 no
> matter what the device thinks.
> 
> IOW, I think the original design was wrong and now we have to deal
> with it.  I think the best solution would be to teach QEMU to fix its
> ACPI tables so that 1:1 virtio devices are actually exposed as 1:1.

Only the current drivers are broken. And we can easily tell them apart
from newer ones via feature flags. Sorry, don't get the problem.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]               ` <55B7B8AF.5060603@redhat.com>
@ 2015-07-28 17:19                 ` Jan Kiszka
       [not found]                 ` <55B7B992.8070509@siemens.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2015-07-28 17:19 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin
  Cc: linux-s390, xen-devel, Benjamin Herrenschmidt, Rusty Russell,
	Andy Lutomirski, Christian Borntraeger, Stefan Hajnoczi,
	Cornelia Huck, linux390, Linux Virtualization

On 2015-07-28 19:15, Paolo Bonzini wrote:
> 
> 
> On 28/07/2015 18:42, Jan Kiszka wrote:
>>> On the other hand interrupt remapping is absolutely necessary for
>>> production use, hence my point that x86 does not promise API stability.
>>
>> Well, we currently implement the features that the Q35 used to expose.
>> Adding interrupt remapping will require a new chipset and/or a hack
>> switch to ignore compatibility.
> 
> Isn't the VT-d register space separate from other Q35 features and
> backwards-compatible?  You could even add it to PIIX in theory just by
> adding a DMAR.

Yes, it's practically working, but it's not accurate /wrt how that
hardware looked like in reality.

> 
> It's not like for example SMRAM, where the registers are in the
> northbridge configuration space and move around in every chipset generation.
> 
>>> ("Any kind of stability" actually didn't include crashes; those are not
>>> expected :))
>>>
>>> The Google patches for userspace PIC and IOAPIC are proceeding well, so
>>> hopefully we can have interrupt remapping soon.
>>
>> If the day had 48 hours... I'd love to look into this, first adding QEMU
>> support for the new irqchip architecture.
> 
> I hope I can squeeze in some time for that...  Google also had an intern
> that was looking at it.

Great!

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                 ` <55B7B992.8070509@siemens.com>
@ 2015-07-28 17:31                   ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-07-28 17:31 UTC (permalink / raw)
  To: Jan Kiszka, Michael S. Tsirkin
  Cc: linux-s390, xen-devel, Benjamin Herrenschmidt, Rusty Russell,
	Andy Lutomirski, Christian Borntraeger, Stefan Hajnoczi,
	Cornelia Huck, linux390, Linux Virtualization



On 28/07/2015 19:19, Jan Kiszka wrote:
> On 2015-07-28 19:15, Paolo Bonzini wrote:
>>
>>
>> On 28/07/2015 18:42, Jan Kiszka wrote:
>>>> On the other hand interrupt remapping is absolutely necessary for
>>>> production use, hence my point that x86 does not promise API stability.
>>>
>>> Well, we currently implement the features that the Q35 used to expose.
>>> Adding interrupt remapping will require a new chipset and/or a hack
>>> switch to ignore compatibility.
>>
>> Isn't the VT-d register space separate from other Q35 features and
>> backwards-compatible?  You could even add it to PIIX in theory just by
>> adding a DMAR.
> 
> Yes, it's practically working, but it's not accurate /wrt how that
> hardware looked like in reality.

We've done that for a long time.  Real PIIX3 didn't have ACPI too, for
example (and it had a USB UHCI that is optional in QEMU).

Of course I'm not advocating adding the IOMMU to PIIX (assuming that
would work even just practically)... but I don't think adding interrupt
remapping to Q35 is a big deal.  It would be optional, just in case you
want to debug something without interrupt remapping, but it can be added.

>>>> The Google patches for userspace PIC and IOAPIC are proceeding well, so
>>>> hopefully we can have interrupt remapping soon.
>>>
>>> If the day had 48 hours... I'd love to look into this, first adding QEMU
>>> support for the new irqchip architecture.
>>
>> I hope I can squeeze in some time for that...  Google also had an intern
>> that was looking at it.
> 
> Great!

In theory it's easy with the latest series.  All you need is support for
converting IOAPIC routes to KVM routes (and of course the glue code to
enable the capability and create the userspace devices); everything else
should work just by reusing the -machine kernel_irqchip=on code.  In
theory...

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                 ` <55B7B91E.40200@siemens.com>
@ 2015-07-28 18:22                   ` Andy Lutomirski
       [not found]                   ` <CALCETrWUVQWQ=mydbdiSJnA0uLSFarVYCnWWKpcgYD0A8hpfXQ@mail.gmail.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2015-07-28 18:22 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: linux-s390, Benjamin Herrenschmidt, Michael S. Tsirkin,
	Stefan Hajnoczi, Rusty Russell, xen-devel, Christian Borntraeger,
	Paolo Bonzini, Cornelia Huck, linux390, Linux Virtualization

On Tue, Jul 28, 2015 at 10:17 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2015-07-28 19:10, Andy Lutomirski wrote:
>> The trouble is that this is really a property of the bus and not of
>> the device.  If you build a virtio device that physically plugs into a
>> PCIe slot, the device has no concept of an IOMMU in the first place.
>
> If one would build a real virtio device today, it would be broken
> because every IOMMU would start to translate its requests. Already from
> that POV, we really need to introduce a feature flag "I will be
> IOMMU-translated" so that a potential physical implementation can carry
> it unconditionally.
>

Except that, with my patches, it would work correctly.  ISTM the thing
that's broken right now is QEMU and the virtio_pci driver.  My patches
fix the driver.  Last year that would have been the end of the story
except for PPC.  Now we have to deal with QEMU.

>> Similarly, if you take an L0-provided IOMMU-supporting device and pass
>> it through to L2 using current QEMU on L1 (with Q35 emulation and
>> iommu enabled), then, from L2's perspective, the device is 1:1 no
>> matter what the device thinks.
>>
>> IOW, I think the original design was wrong and now we have to deal
>> with it.  I think the best solution would be to teach QEMU to fix its
>> ACPI tables so that 1:1 virtio devices are actually exposed as 1:1.
>
> Only the current drivers are broken. And we can easily tell them apart
> from newer ones via feature flags. Sorry, don't get the problem.

I still don't see how feature flags solve the problem.  Suppose we
added a feature flag meaning "respects IOMMU".

Bad case 1:  Build a malicious device that advertises
non-IOMMU-respecting virtio.  Plug it in behind an IOMMU.  Host starts
leaking physical addresses to the device (and the device doesn't work,
of course).  Maybe that's only barely a security problem, but still...

Bad case 2:  Use current QEMU w/ IOMMU enabled.  Assign a virtio
device provided by L0 QEMU to L2.  L1 crashes.  I consider *that* to
be a security problem, although in practice no one will configure
their system that way because it has zero chance of actually working.
Nonetheless, the device does work if L1 accesses it directly?  The
issue is vfio doesn't notice that the device doesn't respect the IOMMU
because "respects-IOMMU" is a property of the PCI bus and the platform
IOMMU, and vfio assumes it works correctly.

Bad case 2: Some hypothetical well-behaved new QEMU provides a virtio
device that *does* respect the IOMMU and sets the feature flag.  They
emulate Q35 with an IOMMU.  They boot Linux 4.1.  Data corruption in
the guest.

We could make the rule that *all* virtio-pci devices (except on PPC)
respect the bus rules.  We'd have to fix QEMU so that virtio devices
on Q35 iommu=on systems set up a PCI topology where the devices
*aren't* behind the IOMMU or are protected by RMRRs or whatever.  Then
old kernels would work correctly on new hosts, new kernels would work
correctly except on old iommu-providing hosts, and Xen would work.

In fact, on Xen, it's impossible without colossal hacks to support
non-IOMMU-respecting virtio devices because Xen acts as an
intermediate IOMMU between the Linux dom0 guest and the actual host.
The QEMU host doesn't even know that Xen is involved.  This is why Xen
and virtio don't currently work together (without my patches): the
device thinks it doesn't respect the IOMMU, the driver thinks the
device doesn't respect the IOMMU, and they're both wrong.

TL;DR: I think there are only two cases.  Either a device respects the
IOMMU or a device doesn't know whether it respects the IOMMU.  The
latter case is problematic.

--Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                   ` <CALCETrWUVQWQ=mydbdiSJnA0uLSFarVYCnWWKpcgYD0A8hpfXQ@mail.gmail.com>
@ 2015-07-28 19:06                     ` Jan Kiszka
       [not found]                     ` <55B7D2A9.6040700@siemens.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2015-07-28 19:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-s390, Benjamin Herrenschmidt, Michael S. Tsirkin,
	Stefan Hajnoczi, Rusty Russell, xen-devel, Christian Borntraeger,
	Paolo Bonzini, Cornelia Huck, linux390, Linux Virtualization

On 2015-07-28 20:22, Andy Lutomirski wrote:
> On Tue, Jul 28, 2015 at 10:17 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2015-07-28 19:10, Andy Lutomirski wrote:
>>> The trouble is that this is really a property of the bus and not of
>>> the device.  If you build a virtio device that physically plugs into a
>>> PCIe slot, the device has no concept of an IOMMU in the first place.
>>
>> If one would build a real virtio device today, it would be broken
>> because every IOMMU would start to translate its requests. Already from
>> that POV, we really need to introduce a feature flag "I will be
>> IOMMU-translated" so that a potential physical implementation can carry
>> it unconditionally.
>>
> 
> Except that, with my patches, it would work correctly.  ISTM the thing

I haven't looked at your patches yet - they make the virtio PCI driver
in Linux IOMMU-compatible? Perfect - except for a compatibility check,
right?

> that's broken right now is QEMU and the virtio_pci driver.  My patches
> fix the driver.  Last year that would have been the end of the story
> except for PPC.  Now we have to deal with QEMU.
> 
>>> Similarly, if you take an L0-provided IOMMU-supporting device and pass
>>> it through to L2 using current QEMU on L1 (with Q35 emulation and
>>> iommu enabled), then, from L2's perspective, the device is 1:1 no
>>> matter what the device thinks.
>>>
>>> IOW, I think the original design was wrong and now we have to deal
>>> with it.  I think the best solution would be to teach QEMU to fix its
>>> ACPI tables so that 1:1 virtio devices are actually exposed as 1:1.
>>
>> Only the current drivers are broken. And we can easily tell them apart
>> from newer ones via feature flags. Sorry, don't get the problem.
> 
> I still don't see how feature flags solve the problem.  Suppose we
> added a feature flag meaning "respects IOMMU".
> 
> Bad case 1:  Build a malicious device that advertises
> non-IOMMU-respecting virtio.  Plug it in behind an IOMMU.  Host starts
> leaking physical addresses to the device (and the device doesn't work,
> of course).  Maybe that's only barely a security problem, but still...

I don't see right now how critical such a hypothetical case could be.
But the OS / its drivers could still decide to refuse talking to such a
device.

> 
> Bad case 2:  Use current QEMU w/ IOMMU enabled.  Assign a virtio
> device provided by L0 QEMU to L2.  L1 crashes.  I consider *that* to
> be a security problem, although in practice no one will configure
> their system that way because it has zero chance of actually working.
> Nonetheless, the device does work if L1 accesses it directly?  The
> issue is vfio doesn't notice that the device doesn't respect the IOMMU
> because "respects-IOMMU" is a property of the PCI bus and the platform
> IOMMU, and vfio assumes it works correctly.

I would have no problem with rejecting configurations in future QEMU
that try to expose unconfined virtio devices in the presence of IOMMU
emulation. Once we can do better, it's just about letting the guest know
about the difference.

The current situation is indeed just broken, we don't need to discuss
this as we can't change history to prevent this.

> 
> Bad case 2: Some hypothetical well-behaved new QEMU provides a virtio
> device that *does* respect the IOMMU and sets the feature flag.  They
> emulate Q35 with an IOMMU.  They boot Linux 4.1.  Data corruption in
> the guest.

No. In that case, the feature negotiation of "virtio-with-iommu-support"
would have failed for older drivers, and the device would have never
been used by the guest.

> 
> We could make the rule that *all* virtio-pci devices (except on PPC)
> respect the bus rules.  We'd have to fix QEMU so that virtio devices
> on Q35 iommu=on systems set up a PCI topology where the devices
> *aren't* behind the IOMMU or are protected by RMRRs or whatever.  Then
> old kernels would work correctly on new hosts, new kernels would work
> correctly except on old iommu-providing hosts, and Xen would work.

I don't see a point in doing anything about old QEMU with IOMMU enabled
and virtio devices plugged except declaring such setups broken. No one
should have configured this for production purposes, only for test
setups (like we, with the knowledge about the limitations).

> 
> In fact, on Xen, it's impossible without colossal hacks to support
> non-IOMMU-respecting virtio devices because Xen acts as an
> intermediate IOMMU between the Linux dom0 guest and the actual host.
> The QEMU host doesn't even know that Xen is involved.  This is why Xen
> and virtio don't currently work together (without my patches): the
> device thinks it doesn't respect the IOMMU, the driver thinks the
> device doesn't respect the IOMMU, and they're both wrong.
> 
> TL;DR: I think there are only two cases.  Either a device respects the
> IOMMU or a device doesn't know whether it respects the IOMMU.  The
> latter case is problematic.

See above, the latter is only problematic on setups that actually use an
IOMMU. If that includes Xen, then no one should use it until virtio can
declare itself IOMMU compatible, and drivers exist that process this.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                     ` <55B7D2A9.6040700@siemens.com>
@ 2015-07-28 19:24                       ` Andy Lutomirski
       [not found]                       ` <CALCETrVeT-M+qMxXwDUB50xnFW1B6yu+ofiX-ydvtjaGgRLUNA@mail.gmail.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2015-07-28 19:24 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: linux-s390, Benjamin Herrenschmidt, Michael S. Tsirkin,
	Stefan Hajnoczi, Rusty Russell, xen-devel, Christian Borntraeger,
	Paolo Bonzini, Cornelia Huck, linux390, Linux Virtualization

On Tue, Jul 28, 2015 at 12:06 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2015-07-28 20:22, Andy Lutomirski wrote:
>> On Tue, Jul 28, 2015 at 10:17 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2015-07-28 19:10, Andy Lutomirski wrote:
>>>> The trouble is that this is really a property of the bus and not of
>>>> the device.  If you build a virtio device that physically plugs into a
>>>> PCIe slot, the device has no concept of an IOMMU in the first place.
>>>
>>> If one would build a real virtio device today, it would be broken
>>> because every IOMMU would start to translate its requests. Already from
>>> that POV, we really need to introduce a feature flag "I will be
>>> IOMMU-translated" so that a potential physical implementation can carry
>>> it unconditionally.
>>>
>>
>> Except that, with my patches, it would work correctly.  ISTM the thing
>
> I haven't looked at your patches yet - they make the virtio PCI driver
> in Linux IOMMU-compatible? Perfect - except for a compatibility check,
> right?

Yes.  (virtio_pci_legacy, anyway.  Presumably virtio_pci_modern is
easy to adapt, too.)

>
>> that's broken right now is QEMU and the virtio_pci driver.  My patches
>> fix the driver.  Last year that would have been the end of the story
>> except for PPC.  Now we have to deal with QEMU.
>>
>>>> Similarly, if you take an L0-provided IOMMU-supporting device and pass
>>>> it through to L2 using current QEMU on L1 (with Q35 emulation and
>>>> iommu enabled), then, from L2's perspective, the device is 1:1 no
>>>> matter what the device thinks.
>>>>
>>>> IOW, I think the original design was wrong and now we have to deal
>>>> with it.  I think the best solution would be to teach QEMU to fix its
>>>> ACPI tables so that 1:1 virtio devices are actually exposed as 1:1.
>>>
>>> Only the current drivers are broken. And we can easily tell them apart
>>> from newer ones via feature flags. Sorry, don't get the problem.
>>
>> I still don't see how feature flags solve the problem.  Suppose we
>> added a feature flag meaning "respects IOMMU".
>>
>> Bad case 1:  Build a malicious device that advertises
>> non-IOMMU-respecting virtio.  Plug it in behind an IOMMU.  Host starts
>> leaking physical addresses to the device (and the device doesn't work,
>> of course).  Maybe that's only barely a security problem, but still...
>
> I don't see right now how critical such a hypothetical case could be.
> But the OS / its drivers could still decide to refuse talking to such a
> device.
>

How does OS know it's such a device as opposed to a QEMU-supplied thing?

>>
>> Bad case 2: Some hypothetical well-behaved new QEMU provides a virtio
>> device that *does* respect the IOMMU and sets the feature flag.  They
>> emulate Q35 with an IOMMU.  They boot Linux 4.1.  Data corruption in
>> the guest.
>
> No. In that case, the feature negotiation of "virtio-with-iommu-support"
> would have failed for older drivers, and the device would have never
> been used by the guest.

So are you suggesting that newer virtio devices always provide this
feature flag and, if supplied by QEMU with iommu=on, simply refuse to
operate of the driver doesn't support that flag?

That could work as long as QEMU with the current (broken?) iommu=on
never exposes such a device.

>
>>
>> We could make the rule that *all* virtio-pci devices (except on PPC)
>> respect the bus rules.  We'd have to fix QEMU so that virtio devices
>> on Q35 iommu=on systems set up a PCI topology where the devices
>> *aren't* behind the IOMMU or are protected by RMRRs or whatever.  Then
>> old kernels would work correctly on new hosts, new kernels would work
>> correctly except on old iommu-providing hosts, and Xen would work.
>
> I don't see a point in doing anything about old QEMU with IOMMU enabled
> and virtio devices plugged except declaring such setups broken. No one
> should have configured this for production purposes, only for test
> setups (like we, with the knowledge about the limitations).
>

I'm fine with that.  In fact, I proposed these patches before QEMU had
this feature in the first place.

>>
>> In fact, on Xen, it's impossible without colossal hacks to support
>> non-IOMMU-respecting virtio devices because Xen acts as an
>> intermediate IOMMU between the Linux dom0 guest and the actual host.
>> The QEMU host doesn't even know that Xen is involved.  This is why Xen
>> and virtio don't currently work together (without my patches): the
>> device thinks it doesn't respect the IOMMU, the driver thinks the
>> device doesn't respect the IOMMU, and they're both wrong.
>>
>> TL;DR: I think there are only two cases.  Either a device respects the
>> IOMMU or a device doesn't know whether it respects the IOMMU.  The
>> latter case is problematic.
>
> See above, the latter is only problematic on setups that actually use an
> IOMMU. If that includes Xen, then no one should use it until virtio can
> declare itself IOMMU compatible, and drivers exist that process this.

Xen works right now with my patches on standard QEMU (as long as
iommu=off).  Certainly no one except me uses it now with virtio
because it doesn't work with mainline kernels.

If we apply something similar enough to my patches, then even old
hypervisors (e.g. Amazon's hardware virt systems) will support Xen
with virtio devices passed in just fine.

--Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                       ` <CALCETrVeT-M+qMxXwDUB50xnFW1B6yu+ofiX-ydvtjaGgRLUNA@mail.gmail.com>
@ 2015-07-28 19:33                         ` Jan Kiszka
       [not found]                         ` <55B7D8F5.1000902@siemens.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2015-07-28 19:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-s390, Benjamin Herrenschmidt, Michael S. Tsirkin,
	Stefan Hajnoczi, Rusty Russell, xen-devel, Christian Borntraeger,
	Paolo Bonzini, Cornelia Huck, linux390, Linux Virtualization

On 2015-07-28 21:24, Andy Lutomirski wrote:
> On Tue, Jul 28, 2015 at 12:06 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2015-07-28 20:22, Andy Lutomirski wrote:
>>> On Tue, Jul 28, 2015 at 10:17 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> On 2015-07-28 19:10, Andy Lutomirski wrote:
>>>>> The trouble is that this is really a property of the bus and not of
>>>>> the device.  If you build a virtio device that physically plugs into a
>>>>> PCIe slot, the device has no concept of an IOMMU in the first place.
>>>>
>>>> If one would build a real virtio device today, it would be broken
>>>> because every IOMMU would start to translate its requests. Already from
>>>> that POV, we really need to introduce a feature flag "I will be
>>>> IOMMU-translated" so that a potential physical implementation can carry
>>>> it unconditionally.
>>>>
>>>
>>> Except that, with my patches, it would work correctly.  ISTM the thing
>>
>> I haven't looked at your patches yet - they make the virtio PCI driver
>> in Linux IOMMU-compatible? Perfect - except for a compatibility check,
>> right?
> 
> Yes.  (virtio_pci_legacy, anyway.  Presumably virtio_pci_modern is
> easy to adapt, too.)
> 
>>
>>> that's broken right now is QEMU and the virtio_pci driver.  My patches
>>> fix the driver.  Last year that would have been the end of the story
>>> except for PPC.  Now we have to deal with QEMU.
>>>
>>>>> Similarly, if you take an L0-provided IOMMU-supporting device and pass
>>>>> it through to L2 using current QEMU on L1 (with Q35 emulation and
>>>>> iommu enabled), then, from L2's perspective, the device is 1:1 no
>>>>> matter what the device thinks.
>>>>>
>>>>> IOW, I think the original design was wrong and now we have to deal
>>>>> with it.  I think the best solution would be to teach QEMU to fix its
>>>>> ACPI tables so that 1:1 virtio devices are actually exposed as 1:1.
>>>>
>>>> Only the current drivers are broken. And we can easily tell them apart
>>>> from newer ones via feature flags. Sorry, don't get the problem.
>>>
>>> I still don't see how feature flags solve the problem.  Suppose we
>>> added a feature flag meaning "respects IOMMU".
>>>
>>> Bad case 1:  Build a malicious device that advertises
>>> non-IOMMU-respecting virtio.  Plug it in behind an IOMMU.  Host starts
>>> leaking physical addresses to the device (and the device doesn't work,
>>> of course).  Maybe that's only barely a security problem, but still...
>>
>> I don't see right now how critical such a hypothetical case could be.
>> But the OS / its drivers could still decide to refuse talking to such a
>> device.
>>
> 
> How does OS know it's such a device as opposed to a QEMU-supplied thing?

It can restrict itself to virtio devices exposing the feature if it
feels uncomfortable that it might be talking to some evil piece of
silicon (instead of the hypervisor, which has to be trusted anyway).

> 
>>>
>>> Bad case 2: Some hypothetical well-behaved new QEMU provides a virtio
>>> device that *does* respect the IOMMU and sets the feature flag.  They
>>> emulate Q35 with an IOMMU.  They boot Linux 4.1.  Data corruption in
>>> the guest.
>>
>> No. In that case, the feature negotiation of "virtio-with-iommu-support"
>> would have failed for older drivers, and the device would have never
>> been used by the guest.
> 
> So are you suggesting that newer virtio devices always provide this
> feature flag and, if supplied by QEMU with iommu=on, simply refuse to
> operate of the driver doesn't support that flag?

Exactly.

> 
> That could work as long as QEMU with the current (broken?) iommu=on
> never exposes such a device.

QEMU would have to be adjusted first so that all its virtio-pci device
models take IOMMUs into account - if they exist or not. Only then it
could expose the feature and expect the guest to acknowledge it.

For compat reasons, QEMU should still be able to expose virtio devices
without the flag set - but then without any IOMMU emulation enabled as
well. That would prevent the current setup we are using today, but it's
trivial to update the guest kernel to a newer virtio driver which would
restore our scenario again.

> 
>>
>>>
>>> We could make the rule that *all* virtio-pci devices (except on PPC)
>>> respect the bus rules.  We'd have to fix QEMU so that virtio devices
>>> on Q35 iommu=on systems set up a PCI topology where the devices
>>> *aren't* behind the IOMMU or are protected by RMRRs or whatever.  Then
>>> old kernels would work correctly on new hosts, new kernels would work
>>> correctly except on old iommu-providing hosts, and Xen would work.
>>
>> I don't see a point in doing anything about old QEMU with IOMMU enabled
>> and virtio devices plugged except declaring such setups broken. No one
>> should have configured this for production purposes, only for test
>> setups (like we, with the knowledge about the limitations).
>>
> 
> I'm fine with that.  In fact, I proposed these patches before QEMU had
> this feature in the first place.
> 
>>>
>>> In fact, on Xen, it's impossible without colossal hacks to support
>>> non-IOMMU-respecting virtio devices because Xen acts as an
>>> intermediate IOMMU between the Linux dom0 guest and the actual host.
>>> The QEMU host doesn't even know that Xen is involved.  This is why Xen
>>> and virtio don't currently work together (without my patches): the
>>> device thinks it doesn't respect the IOMMU, the driver thinks the
>>> device doesn't respect the IOMMU, and they're both wrong.
>>>
>>> TL;DR: I think there are only two cases.  Either a device respects the
>>> IOMMU or a device doesn't know whether it respects the IOMMU.  The
>>> latter case is problematic.
>>
>> See above, the latter is only problematic on setups that actually use an
>> IOMMU. If that includes Xen, then no one should use it until virtio can
>> declare itself IOMMU compatible, and drivers exist that process this.
> 
> Xen works right now with my patches on standard QEMU (as long as
> iommu=off).  Certainly no one except me uses it now with virtio
> because it doesn't work with mainline kernels.
> 
> If we apply something similar enough to my patches, then even old
> hypervisors (e.g. Amazon's hardware virt systems) will support Xen
> with virtio devices passed in just fine.

Then it seems we can make everyone happy - perfect. :)

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                         ` <55B7D8F5.1000902@siemens.com>
@ 2015-07-28 21:16                           ` Andy Lutomirski
       [not found]                           ` <CALCETrWZM=1y_nwBW2RcMNqKcPPGa0uqqGS_fKijfJzYJL4L-Q@mail.gmail.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2015-07-28 21:16 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: linux-s390, Benjamin Herrenschmidt, Michael S. Tsirkin,
	Stefan Hajnoczi, Rusty Russell, xen-devel, Christian Borntraeger,
	Paolo Bonzini, Cornelia Huck, linux390, Linux Virtualization

On Tue, Jul 28, 2015 at 12:33 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2015-07-28 21:24, Andy Lutomirski wrote:
>> On Tue, Jul 28, 2015 at 12:06 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2015-07-28 20:22, Andy Lutomirski wrote:
>>>> On Tue, Jul 28, 2015 at 10:17 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>> On 2015-07-28 19:10, Andy Lutomirski wrote:
>>>>>> The trouble is that this is really a property of the bus and not of
>>>>>> the device.  If you build a virtio device that physically plugs into a
>>>>>> PCIe slot, the device has no concept of an IOMMU in the first place.
>>>>>
>>>>> If one would build a real virtio device today, it would be broken
>>>>> because every IOMMU would start to translate its requests. Already from
>>>>> that POV, we really need to introduce a feature flag "I will be
>>>>> IOMMU-translated" so that a potential physical implementation can carry
>>>>> it unconditionally.
>>>>>
>>>>
>>>> Except that, with my patches, it would work correctly.  ISTM the thing
>>>
>>> I haven't looked at your patches yet - they make the virtio PCI driver
>>> in Linux IOMMU-compatible? Perfect - except for a compatibility check,
>>> right?
>>
>> Yes.  (virtio_pci_legacy, anyway.  Presumably virtio_pci_modern is
>> easy to adapt, too.)
>>
>>>
>>>> that's broken right now is QEMU and the virtio_pci driver.  My patches
>>>> fix the driver.  Last year that would have been the end of the story
>>>> except for PPC.  Now we have to deal with QEMU.
>>>>
>>>>>> Similarly, if you take an L0-provided IOMMU-supporting device and pass
>>>>>> it through to L2 using current QEMU on L1 (with Q35 emulation and
>>>>>> iommu enabled), then, from L2's perspective, the device is 1:1 no
>>>>>> matter what the device thinks.
>>>>>>
>>>>>> IOW, I think the original design was wrong and now we have to deal
>>>>>> with it.  I think the best solution would be to teach QEMU to fix its
>>>>>> ACPI tables so that 1:1 virtio devices are actually exposed as 1:1.
>>>>>
>>>>> Only the current drivers are broken. And we can easily tell them apart
>>>>> from newer ones via feature flags. Sorry, don't get the problem.
>>>>
>>>> I still don't see how feature flags solve the problem.  Suppose we
>>>> added a feature flag meaning "respects IOMMU".
>>>>
>>>> Bad case 1:  Build a malicious device that advertises
>>>> non-IOMMU-respecting virtio.  Plug it in behind an IOMMU.  Host starts
>>>> leaking physical addresses to the device (and the device doesn't work,
>>>> of course).  Maybe that's only barely a security problem, but still...
>>>
>>> I don't see right now how critical such a hypothetical case could be.
>>> But the OS / its drivers could still decide to refuse talking to such a
>>> device.
>>>
>>
>> How does OS know it's such a device as opposed to a QEMU-supplied thing?
>
> It can restrict itself to virtio devices exposing the feature if it
> feels uncomfortable that it might be talking to some evil piece of
> silicon (instead of the hypervisor, which has to be trusted anyway).
>
>>
>>>>
>>>> Bad case 2: Some hypothetical well-behaved new QEMU provides a virtio
>>>> device that *does* respect the IOMMU and sets the feature flag.  They
>>>> emulate Q35 with an IOMMU.  They boot Linux 4.1.  Data corruption in
>>>> the guest.
>>>
>>> No. In that case, the feature negotiation of "virtio-with-iommu-support"
>>> would have failed for older drivers, and the device would have never
>>> been used by the guest.
>>
>> So are you suggesting that newer virtio devices always provide this
>> feature flag and, if supplied by QEMU with iommu=on, simply refuse to
>> operate of the driver doesn't support that flag?
>
> Exactly.
>
>>
>> That could work as long as QEMU with the current (broken?) iommu=on
>> never exposes such a device.
>
> QEMU would have to be adjusted first so that all its virtio-pci device
> models take IOMMUs into account - if they exist or not. Only then it
> could expose the feature and expect the guest to acknowledge it.
>
> For compat reasons, QEMU should still be able to expose virtio devices
> without the flag set - but then without any IOMMU emulation enabled as
> well. That would prevent the current setup we are using today, but it's
> trivial to update the guest kernel to a newer virtio driver which would
> restore our scenario again.

Seems reasonable.

>>
>> If we apply something similar enough to my patches, then even old
>> hypervisors (e.g. Amazon's hardware virt systems) will support Xen
>> with virtio devices passed in just fine.
>
> Then it seems we can make everyone happy - perfect. :)

Yay.

FWIW, I have no intention to touch the QEMU code for this.  I'm
willing to do the vring bit and the virtio-pci bit as long as it's
well specified.

--Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                           ` <CALCETrWZM=1y_nwBW2RcMNqKcPPGa0uqqGS_fKijfJzYJL4L-Q@mail.gmail.com>
@ 2015-07-28 22:43                             ` Andy Lutomirski
       [not found]                             ` <CALCETrW+YynS13qgXnUR-Vw9uyve9Js3RQb2RV9NBHxwyft5=w@mail.gmail.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2015-07-28 22:43 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: linux-s390, Benjamin Herrenschmidt, Michael S. Tsirkin,
	Stefan Hajnoczi, Rusty Russell, xen-devel, Christian Borntraeger,
	Paolo Bonzini, Cornelia Huck, linux390, Linux Virtualization

Let me try to summarize a proposal:

Add a feature flag that indicates IOMMU support.

New kernels acknowledge that flag on any device that advertises it.

New kernels always respect the IOMMU (except on PowerPC).  New kernels
optionally refuse to talk to devices that don't have that feature flag
if the device appears to be behind an IOMMU.  (This presumably
includes any device whatsoever on an x86 platform with an IOMMU,
including Xen's fake IOMMU.)

New QEMU always respects the IOMMU, if any, except on PPC.  New QEMU
always advertises this feature flag.  If iommu=on, QEMU's virtio
devices refuse to work unless the driver acknowledges the flag.

On PPC, new QEMU will not respect the IOMMU and will not set the flag.
New kernels will not talk to devices that set the flag.  If someone
wants to fix that, then they get to figure out how.

This results in:

New kernels work fine with old QEMU unless iommu=on.

New kernels work with new devices (QEMU and physical devices that set
the flag) under all circumstances, except on PPC where physical
devices are and remain broken.

Xen works work new QEMU and cleanly refuses to interoperate with old
QEMU.  (This is worse than with just my patches, but it's better than
the status quo in which the Xen guest corrupts itself and possibly
corrupts the Xen hypervisor.)

New kernels with old QEMU with iommu=on optionally refuses to interoperate.

Old kernels are oblivious.  They work exactly the same as they do
today except that they fail cleanly with new QEMU with iommu=on.  Old
kernels continue to fail with physical virtio devices if they're
behind an iommu.

Old physical virtio devices that don't advertise the flag fail cleanly
if the host uses an iommu.  The driver could optionally whitelist such
devices.

PPC works as well as it currently does.

I'm unsure about the arm64 situation.

Did I get this right?

--Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                             ` <CALCETrW+YynS13qgXnUR-Vw9uyve9Js3RQb2RV9NBHxwyft5=w@mail.gmail.com>
@ 2015-07-28 23:21                               ` Benjamin Herrenschmidt
       [not found]                               ` <1438125694.7562.177.camel@kernel.crashing.org>
  1 sibling, 0 replies; 32+ messages in thread
From: Benjamin Herrenschmidt @ 2015-07-28 23:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-s390, Michael S. Tsirkin, Stefan Hajnoczi, Jan Kiszka,
	Rusty Russell, xen-devel, Christian Borntraeger, Paolo Bonzini,
	Cornelia Huck, linux390, Linux Virtualization

On Tue, 2015-07-28 at 15:43 -0700, Andy Lutomirski wrote:
> Let me try to summarize a proposal:
> 
> Add a feature flag that indicates IOMMU support.
> 
> New kernels acknowledge that flag on any device that advertises it.
> 
> New kernels always respect the IOMMU (except on PowerPC).

Why ? I disagree, the flag should be honored when set in any
architecture. PowerPC is no different than any other platform in that
regard.

>   New kernels
> optionally refuse to talk to devices that don't have that feature flag
> if the device appears to be behind an IOMMU.  (This presumably
> includes any device whatsoever on an x86 platform with an IOMMU,
> including Xen's fake IOMMU.)
> 
> New QEMU always respects the IOMMU, if any, except on PPC.

This is just a matter of what is the default of the flag, ie we
should have a machine flag that indicates what the default is for
new virtio devices, otherwise, it should be specified per device
as an attribute of the device instance.

I would argue that we should default to "bypass IOMMU" on *all*
architecture due to the performance impact, and to essentially
default to the same behaviour as today. With things like DDW even
powerpc might be able to mostly alleviate the performance impact
so we might to change in the long term, but I tend to prefer
more incremental approaches.

>   New QEMU
> always advertises this feature flag.  If iommu=on, QEMU's virtio
> devices refuse to work unless the driver acknowledges the flag.

This should be configurable.

> On PPC, new QEMU will not respect the IOMMU and will not set the flag.
> New kernels will not talk to devices that set the flag.  If someone
> wants to fix that, then they get to figure out how.

I disagree with the kernel bit and I disagree with special casing PPC in
any shape or form in the code. The only difference should be a default
value for the iommu mode of virtio in qemu set per machine.

You can then feel free to change that default (in a separate patch for
bisectability) on x86 for the sake of Xen.

Ben.

> This results in:
> 
> New kernels work fine with old QEMU unless iommu=on.
> 
> New kernels work with new devices (QEMU and physical devices that set
> the flag) under all circumstances, except on PPC where physical
> devices are and remain broken.
> 
> Xen works work new QEMU and cleanly refuses to interoperate with old
> QEMU.  (This is worse than with just my patches, but it's better than
> the status quo in which the Xen guest corrupts itself and possibly
> corrupts the Xen hypervisor.)
> 
> New kernels with old QEMU with iommu=on optionally refuses to interoperate.
> 
> Old kernels are oblivious.  They work exactly the same as they do
> today except that they fail cleanly with new QEMU with iommu=on.  Old
> kernels continue to fail with physical virtio devices if they're
> behind an iommu.
> 
> Old physical virtio devices that don't advertise the flag fail cleanly
> if the host uses an iommu.  The driver could optionally whitelist such
> devices.
> 
> PPC works as well as it currently does.
> 
> I'm unsure about the arm64 situation.
> 
> 
> Did I get this right?
> 
> --Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                               ` <1438125694.7562.177.camel@kernel.crashing.org>
@ 2015-07-28 23:33                                 ` Andy Lutomirski
       [not found]                                 ` <CALCETrXFMXCfCWLTdE2nff=gibsu2_2XEwwOxC6T3vQZphGeBg@mail.gmail.com>
  2015-07-29  8:07                                 ` Jan Kiszka
  2 siblings, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2015-07-28 23:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-s390, Michael S. Tsirkin, Stefan Hajnoczi, Jan Kiszka,
	Rusty Russell, xen-devel, Christian Borntraeger, Paolo Bonzini,
	Cornelia Huck, linux390, Linux Virtualization

On Tue, Jul 28, 2015 at 4:21 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Tue, 2015-07-28 at 15:43 -0700, Andy Lutomirski wrote:
>> Let me try to summarize a proposal:
>>
>> Add a feature flag that indicates IOMMU support.
>>
>> New kernels acknowledge that flag on any device that advertises it.
>>
>> New kernels always respect the IOMMU (except on PowerPC).
>
> Why ? I disagree, the flag should be honored when set in any
> architecture. PowerPC is no different than any other platform in that
> regard.

Perhaps I should have said instead "someone more familiar with PPC
than I am should figure out what PPC should do".  For the non-PPC
case, there is only one instance that I know of in which ignoring the
IOMMU is beneficial, and that case is the experimental Q35 thing.

If new kernels ignore the IOMMU for devices that don't set the flag
and there are physical devices that already exist and don't set the
flag, then those devices won't work reliably on most modern
non-virtual platforms, PPC included.

>
>>   New kernels
>> optionally refuse to talk to devices that don't have that feature flag
>> if the device appears to be behind an IOMMU.  (This presumably
>> includes any device whatsoever on an x86 platform with an IOMMU,
>> including Xen's fake IOMMU.)
>>
>> New QEMU always respects the IOMMU, if any, except on PPC.
>
> This is just a matter of what is the default of the flag, ie we
> should have a machine flag that indicates what the default is for
> new virtio devices, otherwise, it should be specified per device
> as an attribute of the device instance.

On x86, I think that even super-peformance-critical virtio devices
should always honor the iommu, but that the iommu in question should
be a 1:1 iommu.  I *think* that x86 supports that.  IOW x86 would
always set the feature flag.

>
> I would argue that we should default to "bypass IOMMU" on *all*
> architecture due to the performance impact, and to essentially
> default to the same behaviour as today. With things like DDW even
> powerpc might be able to mostly alleviate the performance impact
> so we might to change in the long term, but I tend to prefer
> more incremental approaches.

As above, there's a difference between "bypass IOMMU" and "there is no
IOMMU".  x86 and, I think, most other platforms are capable of the
latter.  I'm not sure PPC is.

I think that, in an ideal world, there would be no feature flag and
all virtio devices would always respect the IOMMU.  Unfortunately we
have existing practice in the form of PPC and Q35 iommu=on that
conflict with that.

>
>>   New QEMU
>> always advertises this feature flag.  If iommu=on, QEMU's virtio
>> devices refuse to work unless the driver acknowledges the flag.
>
> This should be configurable.

Would any non-PPC user ever configure it differently?  I suppose if
you want to support old kernels on new QEMU, you'd flip the switch.

>
>> On PPC, new QEMU will not respect the IOMMU and will not set the flag.
>> New kernels will not talk to devices that set the flag.  If someone
>> wants to fix that, then they get to figure out how.
>
> I disagree with the kernel bit and I disagree with special casing PPC in
> any shape or form in the code. The only difference should be a default
> value for the iommu mode of virtio in qemu set per machine.
>
> You can then feel free to change that default (in a separate patch for
> bisectability) on x86 for the sake of Xen.

I think we should flip the default everywhere to "respects IOMMU".
That's the setting that will work in all cases on new guest + new
host, and it's the setting that's safest.  vfio will probably always
malfunction if given a device that looks like it's behind an IOMMU but
doesn't respect it.  For people who need the last bit of performance,
they should use bus-level controls where available (they should be
available everywhere except PPC and maybe arm64) and, ideally, someone
would teach PPC how to exclude devices from the IOMMU cleanly if
possible.  If that can't be done, then there can be an option to
bypass the IOMMU the way it's currently done and no one except PPC
would do it.

PPC really is different from everything except x86 Q35 iommu=on, and
the latter is experimental.  AFAIK in all other cases, the IOMMU is
respected by virtio, but there is no non-1:1 IOMMU.

--Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                                 ` <CALCETrXFMXCfCWLTdE2nff=gibsu2_2XEwwOxC6T3vQZphGeBg@mail.gmail.com>
@ 2015-07-29  0:36                                   ` Benjamin Herrenschmidt
       [not found]                                   ` <1438130185.7562.186.camel@kernel.crashing.org>
  1 sibling, 0 replies; 32+ messages in thread
From: Benjamin Herrenschmidt @ 2015-07-29  0:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-s390, Michael S. Tsirkin, Stefan Hajnoczi, Jan Kiszka,
	Rusty Russell, xen-devel, Christian Borntraeger, Paolo Bonzini,
	Cornelia Huck, linux390, Linux Virtualization

On Tue, 2015-07-28 at 16:33 -0700, Andy Lutomirski wrote:
> On Tue, Jul 28, 2015 at 4:21 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > On Tue, 2015-07-28 at 15:43 -0700, Andy Lutomirski wrote:
> >> Let me try to summarize a proposal:
> >>
> >> Add a feature flag that indicates IOMMU support.
> >>
> >> New kernels acknowledge that flag on any device that advertises it.
> >>
> >> New kernels always respect the IOMMU (except on PowerPC).
> >
> > Why ? I disagree, the flag should be honored when set in any
> > architecture. PowerPC is no different than any other platform in that
> > regard.
> 
> Perhaps I should have said instead "someone more familiar with PPC
> than I am should figure out what PPC should do".  For the non-PPC
> case, there is only one instance that I know of in which ignoring the
> IOMMU is beneficial, and that case is the experimental Q35 thing.

"ppc" is many fairly different platforms, some with iommu, some without,
some benefiting from bypass, some less etc... I think ARM will soon be
in a similar basket.

> If new kernels ignore the IOMMU for devices that don't set the flag
> and there are physical devices that already exist and don't set the
> flag, then those devices won't work reliably on most modern
> non-virtual platforms, PPC included.

Are there many virtio physical devices out there ? We are talking about
a virtio flag right ? Or have you been considering something else ?

> >>   New kernels
> >> optionally refuse to talk to devices that don't have that feature flag
> >> if the device appears to be behind an IOMMU.  (This presumably
> >> includes any device whatsoever on an x86 platform with an IOMMU,
> >> including Xen's fake IOMMU.)
> >>
> >> New QEMU always respects the IOMMU, if any, except on PPC.
> >
> > This is just a matter of what is the default of the flag, ie we
> > should have a machine flag that indicates what the default is for
> > new virtio devices, otherwise, it should be specified per device
> > as an attribute of the device instance.
> 
> On x86, I think that even super-peformance-critical virtio devices
> should always honor the iommu, but that the iommu in question should
> be a 1:1 iommu.  I *think* that x86 supports that.  IOW x86 would
> always set the feature flag.

Ok.

> > I would argue that we should default to "bypass IOMMU" on *all*
> > architecture due to the performance impact, and to essentially
> > default to the same behaviour as today. With things like DDW even
> > powerpc might be able to mostly alleviate the performance impact
> > so we might to change in the long term, but I tend to prefer
> > more incremental approaches.
> 
> As above, there's a difference between "bypass IOMMU" and "there is no
> IOMMU".  x86 and, I think, most other platforms are capable of the
> latter.  I'm not sure PPC is.

Depends on the platform. "pseries" isn't since it's already a
paravirtualized plaform, but there are other ppc platforms out there
which behave differently. That's why I think:

 - The kernel should just honor what qemu says, ie, whether the qemu
device honors or bypasses the iommu.

 - Qemu default behaviour should be set via a machine attribute which
can be overriden both globally (the machine one) or per-device. 

> I think that, in an ideal world, there would be no feature flag and
> all virtio devices would always respect the IOMMU.  Unfortunately we
> have existing practice in the form of PPC and Q35 iommu=on that
> conflict with that.

And possibly more as in this is how the qemu virtio devices are written
today, they do not use the proper DMA accessors, they always bypass,
whatever the platform is (so sparc would be in the same boat for
example).

> >>   New QEMU
> >> always advertises this feature flag.  If iommu=on, QEMU's virtio
> >> devices refuse to work unless the driver acknowledges the flag.
> >
> > This should be configurable.
> 
> Would any non-PPC user ever configure it differently?  I suppose if
> you want to support old kernels on new QEMU, you'd flip the switch.

Possibly, have we looked at what ia64, sparc, arm, ... do ? At least
sparc has iommus as well.

Let's try to not make it an architecture issue. As I said above, we have
a kernel that just reacts appropriately based on what qemu says it's
doing, and what qemu does is a per-machine flag to set the default.

> >> On PPC, new QEMU will not respect the IOMMU and will not set the flag.
> >> New kernels will not talk to devices that set the flag.  If someone
> >> wants to fix that, then they get to figure out how.
> >
> > I disagree with the kernel bit and I disagree with special casing PPC in
> > any shape or form in the code. The only difference should be a default
> > value for the iommu mode of virtio in qemu set per machine.
> >
> > You can then feel free to change that default (in a separate patch for
> > bisectability) on x86 for the sake of Xen.
> 
> I think we should flip the default everywhere to "respects IOMMU".

On new machine types, we shouldn't change the behaviour of an existing
machine type, and we should keep the default to 0 on ppc/pseries because
of backward compatibility issue. But that should be the only place that
is "ppc specific", ie, a default value in a machine def structure.

> That's the setting that will work in all cases on new guest + new
> host, and it's the setting that's safest.  vfio will probably always
> malfunction if given a device that looks like it's behind an IOMMU but
> doesn't respect it.  For people who need the last bit of performance,
> they should use bus-level controls where available (they should be
> available everywhere except PPC and maybe arm64) and, ideally, someone
> would teach PPC how to exclude devices from the IOMMU cleanly if
> possible.  If that can't be done, then there can be an option to
> bypass the IOMMU the way it's currently done and no one except PPC
> would do it.
> 
> PPC really is different from everything except x86 Q35 iommu=on, and
> the latter is experimental.  AFAIK in all other cases, the IOMMU is
> respected by virtio, but there is no non-1:1 IOMMU.

What about sparc ? I though it was pretty similar to PPC in that
regard...

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                                   ` <1438130185.7562.186.camel@kernel.crashing.org>
@ 2015-07-29  0:47                                     ` Andy Lutomirski
       [not found]                                     ` <CALCETrWrLypYdF3Yhzj=ku9i=EmBJm4Ux5ouJ33MuXf7tEgJ0w@mail.gmail.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2015-07-29  0:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-s390, Michael S. Tsirkin, Stefan Hajnoczi, Jan Kiszka,
	Rusty Russell, xen-devel, Christian Borntraeger, Paolo Bonzini,
	Cornelia Huck, linux390, Linux Virtualization

On Tue, Jul 28, 2015 at 5:36 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Tue, 2015-07-28 at 16:33 -0700, Andy Lutomirski wrote:
>> On Tue, Jul 28, 2015 at 4:21 PM, Benjamin Herrenschmidt
>> <benh@kernel.crashing.org> wrote:
>> > On Tue, 2015-07-28 at 15:43 -0700, Andy Lutomirski wrote:
>> >> Let me try to summarize a proposal:
>> >>
>> >> Add a feature flag that indicates IOMMU support.
>> >>
>> >> New kernels acknowledge that flag on any device that advertises it.
>> >>
>> >> New kernels always respect the IOMMU (except on PowerPC).
>> >
>> > Why ? I disagree, the flag should be honored when set in any
>> > architecture. PowerPC is no different than any other platform in that
>> > regard.
>>
>> Perhaps I should have said instead "someone more familiar with PPC
>> than I am should figure out what PPC should do".  For the non-PPC
>> case, there is only one instance that I know of in which ignoring the
>> IOMMU is beneficial, and that case is the experimental Q35 thing.
>
> "ppc" is many fairly different platforms, some with iommu, some without,
> some benefiting from bypass, some less etc... I think ARM will soon be
> in a similar basket.
>
>> If new kernels ignore the IOMMU for devices that don't set the flag
>> and there are physical devices that already exist and don't set the
>> flag, then those devices won't work reliably on most modern
>> non-virtual platforms, PPC included.
>
> Are there many virtio physical devices out there ? We are talking about
> a virtio flag right ? Or have you been considering something else ?

Yes, virtio flag.  I dislike having a virtio flag at all, but so far
no one has come up with any better ideas.  If there was a reliable,
cross-platform mechanism for per-device PCI bus properties, I'd be all
for using that instead.

>
>> >>   New kernels
>> >> optionally refuse to talk to devices that don't have that feature flag
>> >> if the device appears to be behind an IOMMU.  (This presumably
>> >> includes any device whatsoever on an x86 platform with an IOMMU,
>> >> including Xen's fake IOMMU.)
>> >>
>> >> New QEMU always respects the IOMMU, if any, except on PPC.
>> >
>> > This is just a matter of what is the default of the flag, ie we
>> > should have a machine flag that indicates what the default is for
>> > new virtio devices, otherwise, it should be specified per device
>> > as an attribute of the device instance.
>>
>> On x86, I think that even super-peformance-critical virtio devices
>> should always honor the iommu, but that the iommu in question should
>> be a 1:1 iommu.  I *think* that x86 supports that.  IOW x86 would
>> always set the feature flag.
>
> Ok.
>
>> > I would argue that we should default to "bypass IOMMU" on *all*
>> > architecture due to the performance impact, and to essentially
>> > default to the same behaviour as today. With things like DDW even
>> > powerpc might be able to mostly alleviate the performance impact
>> > so we might to change in the long term, but I tend to prefer
>> > more incremental approaches.
>>
>> As above, there's a difference between "bypass IOMMU" and "there is no
>> IOMMU".  x86 and, I think, most other platforms are capable of the
>> latter.  I'm not sure PPC is.
>
> Depends on the platform. "pseries" isn't since it's already a
> paravirtualized plaform, but there are other ppc platforms out there
> which behave differently. That's why I think:
>
>  - The kernel should just honor what qemu says, ie, whether the qemu
> device honors or bypasses the iommu.

Except for vfio, which maybe just needs a special case: vfio checks if
the device claims to be virtio and doesn't set the flag, in which case
vfio just refuses to bind the device.

>
>  - Qemu default behaviour should be set via a machine attribute which
> can be overriden both globally (the machine one) or per-device.
>
>> I think that, in an ideal world, there would be no feature flag and
>> all virtio devices would always respect the IOMMU.  Unfortunately we
>> have existing practice in the form of PPC and Q35 iommu=on that
>> conflict with that.
>
> And possibly more as in this is how the qemu virtio devices are written
> today, they do not use the proper DMA accessors, they always bypass,
> whatever the platform is (so sparc would be in the same boat for
> example).

Except that AFAIK Q35 is the only QEMU platform that supports a
nontrivial IOMMU in the first place.  Are there pseries hosts that
have a working IOMMU?  Maybe I've just misunderstood.

>
>> >>   New QEMU
>> >> always advertises this feature flag.  If iommu=on, QEMU's virtio
>> >> devices refuse to work unless the driver acknowledges the flag.
>> >
>> > This should be configurable.
>>
>> Would any non-PPC user ever configure it differently?  I suppose if
>> you want to support old kernels on new QEMU, you'd flip the switch.
>
> Possibly, have we looked at what ia64, sparc, arm, ... do ? At least
> sparc has iommus as well.

I think (I hope!) that ia64 is irrelevant, and last I checked ARM
didn't have a QEMU-emulated IOMMU.  Maybe things have changed.

>
> Let's try to not make it an architecture issue. As I said above, we have
> a kernel that just reacts appropriately based on what qemu says it's
> doing, and what qemu does is a per-machine flag to set the default.
>
>> >> On PPC, new QEMU will not respect the IOMMU and will not set the flag.
>> >> New kernels will not talk to devices that set the flag.  If someone
>> >> wants to fix that, then they get to figure out how.
>> >
>> > I disagree with the kernel bit and I disagree with special casing PPC in
>> > any shape or form in the code. The only difference should be a default
>> > value for the iommu mode of virtio in qemu set per machine.
>> >
>> > You can then feel free to change that default (in a separate patch for
>> > bisectability) on x86 for the sake of Xen.
>>
>> I think we should flip the default everywhere to "respects IOMMU".
>
> On new machine types, we shouldn't change the behaviour of an existing
> machine type, and we should keep the default to 0 on ppc/pseries because
> of backward compatibility issue. But that should be the only place that
> is "ppc specific", ie, a default value in a machine def structure.

Fair enough, except I still think we should change the default to be
"respect IOMMU" on machine types that don't have an IOMMU in the first
place.  That way Xen works with old machine types, and I don't think
we lose anything.

>
>> That's the setting that will work in all cases on new guest + new
>> host, and it's the setting that's safest.  vfio will probably always
>> malfunction if given a device that looks like it's behind an IOMMU but
>> doesn't respect it.  For people who need the last bit of performance,
>> they should use bus-level controls where available (they should be
>> available everywhere except PPC and maybe arm64) and, ideally, someone
>> would teach PPC how to exclude devices from the IOMMU cleanly if
>> possible.  If that can't be done, then there can be an option to
>> bypass the IOMMU the way it's currently done and no one except PPC
>> would do it.
>>
>> PPC really is different from everything except x86 Q35 iommu=on, and
>> the latter is experimental.  AFAIK in all other cases, the IOMMU is
>> respected by virtio, but there is no non-1:1 IOMMU.
>
> What about sparc ? I though it was pretty similar to PPC in that
> regard...

No clue, honestly.  I could be wrong about the set of existing QEMU
machine types.

--Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                                     ` <CALCETrWrLypYdF3Yhzj=ku9i=EmBJm4Ux5ouJ33MuXf7tEgJ0w@mail.gmail.com>
@ 2015-07-29  0:54                                       ` Benjamin Herrenschmidt
  2015-07-29  8:17                                       ` Paolo Bonzini
       [not found]                                       ` <55B88C0E.9050104@redhat.com>
  2 siblings, 0 replies; 32+ messages in thread
From: Benjamin Herrenschmidt @ 2015-07-29  0:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-s390, Michael S. Tsirkin, Stefan Hajnoczi, Jan Kiszka,
	Rusty Russell, xen-devel, Christian Borntraeger, Paolo Bonzini,
	Cornelia Huck, linux390, Linux Virtualization

On Tue, 2015-07-28 at 17:47 -0700, Andy Lutomirski wrote:

> Yes, virtio flag.  I dislike having a virtio flag at all, but so far
> no one has come up with any better ideas.  If there was a reliable,
> cross-platform mechanism for per-device PCI bus properties, I'd be all
> for using that instead.

There isn't that I know of, so I think it's the best approach we have.

 .../...

> >  - The kernel should just honor what qemu says, ie, whether the qemu
> > device honors or bypasses the iommu.
> 
> Except for vfio, which maybe just needs a special case: vfio checks if
> the device claims to be virtio and doesn't set the flag, in which case
> vfio just refuses to bind the device.

Right but passing virtio through isn't the highest priority on the
radar, but yes, indeed, it should identify them and reject them.

> >  - Qemu default behaviour should be set via a machine attribute which
> > can be overriden both globally (the machine one) or per-device.
> >
> >> I think that, in an ideal world, there would be no feature flag and
> >> all virtio devices would always respect the IOMMU.  Unfortunately we
> >> have existing practice in the form of PPC and Q35 iommu=on that
> >> conflict with that.
> >
> > And possibly more as in this is how the qemu virtio devices are written
> > today, they do not use the proper DMA accessors, they always bypass,
> > whatever the platform is (so sparc would be in the same boat for
> > example).
> 
> Except that AFAIK Q35 is the only QEMU platform that supports a
> nontrivial IOMMU in the first place.  Are there pseries hosts that
> have a working IOMMU?  Maybe I've just misunderstood.

You may well be correct, I remember that we actually created the iommu
infrastructure to a large extent in qemu for ppc/pseries, then it got
extended when q35 came in.

> >> >>   New QEMU
> >> >> always advertises this feature flag.  If iommu=on, QEMU's virtio
> >> >> devices refuse to work unless the driver acknowledges the flag.
> >> >
> >> > This should be configurable.
> >>
> >> Would any non-PPC user ever configure it differently?  I suppose if
> >> you want to support old kernels on new QEMU, you'd flip the switch.
> >
> > Possibly, have we looked at what ia64, sparc, arm, ... do ? At least
> > sparc has iommus as well.
> 
> I think (I hope!) that ia64 is irrelevant, and last I checked ARM
> didn't have a QEMU-emulated IOMMU.  Maybe things have changed.

Not yet...

 .../...
> >
> > On new machine types, we shouldn't change the behaviour of an existing
> > machine type, and we should keep the default to 0 on ppc/pseries because
> > of backward compatibility issue. But that should be the only place that
> > is "ppc specific", ie, a default value in a machine def structure.
> 
> Fair enough, except I still think we should change the default to be
> "respect IOMMU" on machine types that don't have an IOMMU in the first
> place. 

Ok, but do it in a separate patch because it *is* a behaviour change to
some extent.

>  That way Xen works with old machine types, and I don't think
> we lose anything.
> 
> >
> >> That's the setting that will work in all cases on new guest + new
> >> host, and it's the setting that's safest.  vfio will probably always
> >> malfunction if given a device that looks like it's behind an IOMMU but
> >> doesn't respect it.  For people who need the last bit of performance,
> >> they should use bus-level controls where available (they should be
> >> available everywhere except PPC and maybe arm64) and, ideally, someone
> >> would teach PPC how to exclude devices from the IOMMU cleanly if
> >> possible.  If that can't be done, then there can be an option to
> >> bypass the IOMMU the way it's currently done and no one except PPC
> >> would do it.
> >>
> >> PPC really is different from everything except x86 Q35 iommu=on, and
> >> the latter is experimental.  AFAIK in all other cases, the IOMMU is
> >> respected by virtio, but there is no non-1:1 IOMMU.
> >
> > What about sparc ? I though it was pretty similar to PPC in that
> > regard...
> 
> No clue, honestly.  I could be wrong about the set of existing QEMU
> machine types.

Ok.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                               ` <1438125694.7562.177.camel@kernel.crashing.org>
  2015-07-28 23:33                                 ` Andy Lutomirski
       [not found]                                 ` <CALCETrXFMXCfCWLTdE2nff=gibsu2_2XEwwOxC6T3vQZphGeBg@mail.gmail.com>
@ 2015-07-29  8:07                                 ` Jan Kiszka
  2 siblings, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2015-07-29  8:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Andy Lutomirski
  Cc: linux-s390, Michael S. Tsirkin, Stefan Hajnoczi, Rusty Russell,
	xen-devel, Christian Borntraeger, Paolo Bonzini, Cornelia Huck,
	linux390, Linux Virtualization

On 2015-07-29 01:21, Benjamin Herrenschmidt wrote:
> On Tue, 2015-07-28 at 15:43 -0700, Andy Lutomirski wrote:
>>   New QEMU
>> always advertises this feature flag.  If iommu=on, QEMU's virtio
>> devices refuse to work unless the driver acknowledges the flag.
> 
> This should be configurable.

Advertisement of that flag must be configurable, or we won't be able to
run older guests anymore which don't know it, thus will reject it. The
only precondition: there must be no IOMMU if we turn it off.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                                     ` <CALCETrWrLypYdF3Yhzj=ku9i=EmBJm4Ux5ouJ33MuXf7tEgJ0w@mail.gmail.com>
  2015-07-29  0:54                                       ` Benjamin Herrenschmidt
@ 2015-07-29  8:17                                       ` Paolo Bonzini
       [not found]                                       ` <55B88C0E.9050104@redhat.com>
  2 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-07-29  8:17 UTC (permalink / raw)
  To: Andy Lutomirski, Benjamin Herrenschmidt
  Cc: linux-s390, Michael S. Tsirkin, Jan Kiszka, Rusty Russell,
	xen-devel, Christian Borntraeger, Stefan Hajnoczi, Cornelia Huck,
	linux390, Linux Virtualization



On 29/07/2015 02:47, Andy Lutomirski wrote:
> > > If new kernels ignore the IOMMU for devices that don't set the flag
> > > and there are physical devices that already exist and don't set the
> > > flag, then those devices won't work reliably on most modern
> > > non-virtual platforms, PPC included.
> >
> > Are there many virtio physical devices out there ? We are talking about
> > a virtio flag right ? Or have you been considering something else ?
>
> Yes, virtio flag.  I dislike having a virtio flag at all, but so far
> no one has come up with any better ideas.  If there was a reliable,
> cross-platform mechanism for per-device PCI bus properties, I'd be all
> for using that instead.

No, a virtio flag doesn't make sense.

Blindly using system memory is a bug in QEMU; it has to be fixed to use
the right address space, and then whatever the system provides to
describe "the right address space" can be used (like the DMAR table on x86).

On PPC I suppose you could use the host bridge's device tree?  If you
need a hook, you can add a

	bool virtio_should_bypass_iommu(void)
	{
		/* lookup something in the device tree?!? */
	}
	EXPORT_SYMBOL_GPL(virtio_should_bypass_iommu);

in some pseries.c file, and in the driver:

	static bool virtio_bypass_iommu(void)
	{
		bool (*fn)(void);
	
		fn = symbol_get(virtio_should_bypass_iommu);
		return fn && fn();
	}

Awful, but that's what this thing is.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                                       ` <55B88C0E.9050104@redhat.com>
@ 2015-07-29  8:20                                         ` Jan Kiszka
  2015-07-29  9:21                                         ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2015-07-29  8:20 UTC (permalink / raw)
  To: Paolo Bonzini, Andy Lutomirski, Benjamin Herrenschmidt
  Cc: linux-s390, Michael S. Tsirkin, Stefan Hajnoczi, Rusty Russell,
	xen-devel, Christian Borntraeger, Cornelia Huck, linux390,
	Linux Virtualization

On 2015-07-29 10:17, Paolo Bonzini wrote:
> 
> 
> On 29/07/2015 02:47, Andy Lutomirski wrote:
>>>> If new kernels ignore the IOMMU for devices that don't set the flag
>>>> and there are physical devices that already exist and don't set the
>>>> flag, then those devices won't work reliably on most modern
>>>> non-virtual platforms, PPC included.
>>>
>>> Are there many virtio physical devices out there ? We are talking about
>>> a virtio flag right ? Or have you been considering something else ?
>>
>> Yes, virtio flag.  I dislike having a virtio flag at all, but so far
>> no one has come up with any better ideas.  If there was a reliable,
>> cross-platform mechanism for per-device PCI bus properties, I'd be all
>> for using that instead.
> 
> No, a virtio flag doesn't make sense.

That will create the risk of subtly breaking old guests over new setups.
I wouldn't suggest this.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
       [not found]                                       ` <55B88C0E.9050104@redhat.com>
  2015-07-29  8:20                                         ` Jan Kiszka
@ 2015-07-29  9:21                                         ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 32+ messages in thread
From: Benjamin Herrenschmidt @ 2015-07-29  9:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-s390, Michael S. Tsirkin, Stefan Hajnoczi, Rusty Russell,
	xen-devel, Christian Borntraeger, Jan Kiszka, Cornelia Huck,
	linux390, Andy Lutomirski, Linux Virtualization

On Wed, 2015-07-29 at 10:17 +0200, Paolo Bonzini wrote:
> 
> On 29/07/2015 02:47, Andy Lutomirski wrote:
> > > > If new kernels ignore the IOMMU for devices that don't set the flag
> > > > and there are physical devices that already exist and don't set the
> > > > flag, then those devices won't work reliably on most modern
> > > > non-virtual platforms, PPC included.
> > >
> > > Are there many virtio physical devices out there ? We are talking about
> > > a virtio flag right ? Or have you been considering something else ?
> >
> > Yes, virtio flag.  I dislike having a virtio flag at all, but so far
> > no one has come up with any better ideas.  If there was a reliable,
> > cross-platform mechanism for per-device PCI bus properties, I'd be all
> > for using that instead.
> 
> No, a virtio flag doesn't make sense.

It wouldn't if we were creating virtio from scratch.

However we have to be realistic here, we are contending with existing
practices and implementation. The fact is qemu *does* bypass any iommu
and has been doing so for a long time, *and* the guest drivers are
written today *also* bypassing all DMA mapping mechanisms and just
passing everything accross.

So if it's a bug, it's a bug on both sides of the fence. We are no
longer in "bug fixing" territory here, it's a fundamental change of ABI.
The ABI might not be what was intended (but that's arguable, see below),
but it is that way.

Arguably it was even known and considered a *feature* by some (including
myself) at the time. It somewhat improved performances on archs where
otherwise every page would have to be mapped/unmapped in guest iommu. In
fact, it also makes vhost a lot easier.

So I disagree, it's de-facto a feature (even if unintended) of the
existing virtio implementations and changing that would be a major
interface change, and thus should be exposed as such.

> Blindly using system memory is a bug in QEMU; it has to be fixed to use
> the right address space, and then whatever the system provides to
> describe "the right address space" can be used (like the DMAR table on x86).

Except that it's not so easy.

For example, on PPC PAPR guests, there is no such thing as a "no IOMMU"
space, the concept doesn't exist. So we have at least three things to
deal with:

 - Existing guests, so we must preserve the existing behaviour for
backward compatibility.

 - vhost is made more complex because it now needs to be informed of the
guest iommu updates

 - New guests have the "new driver" that knows how to map and unmap
would have a performance loss unless some mechanism to create a "no
iommu" space exists which for us would need to be added. Either that or
we rely on DDW which is a way for a guest to create a permanent mapping
of its entire address space in an IOMMU but that incur a significant
waste of host kernel memory.

> On PPC I suppose you could use the host bridge's device tree?  If you
> need a hook, you can add a

No because we can mix and match virtio and other devices on the same
host bridge. Unless we put a property that only applies to virtio
children of the host bridge.

> 	bool virtio_should_bypass_iommu(void)
> 	{
> 		/* lookup something in the device tree?!? */
> 	}
> 	EXPORT_SYMBOL_GPL(virtio_should_bypass_iommu);
> 
> in some pseries.c file, and in the driver:
> 
> 	static bool virtio_bypass_iommu(void)
> 	{
> 		bool (*fn)(void);
> 	
> 		fn = symbol_get(virtio_should_bypass_iommu);
> 		return fn && fn();
> 	}
> 
> Awful, but that's what this thing is.

Ben.

> Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
@ 2015-07-28  1:08 Andy Lutomirski
  0 siblings, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2015-07-28  1:08 UTC (permalink / raw)
  To: Rusty Russell, Michael S. Tsirkin
  Cc: linux-s390, xen-devel, Benjamin Herrenschmidt,
	Linux Virtualization, Christian Borntraeger, Paolo Bonzini,
	Stefan Hajnoczi, Cornelia Huck, linux390, Andy Lutomirski

On Mon, Sep 1, 2014 at 10:39 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> This fixes virtio on Xen guests as well as on any other platform
> that uses virtio_pci on which physical addresses don't match bus
> addresses.
>
> This can be tested with:
>
>     virtme-run --xen xen --kimg arch/x86/boot/bzImage --console
>
> using virtme from here:
>
>     https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git
>
> Without these patches, the guest hangs forever.  With these patches,
> everything works.
>

Dusting off an ancient thread.

Now that the dust has accumulated^Wsettled, is it worth pursuing this?
 I think the situation is considerably worse than it was when I
originally wrote these patches: I think that QEMU now supports a nasty
mode in which the guest's PCI bus appears to be behind an IOMMU but
the virtio devices on that bus punch straight through that IOMMU.

I have a half-hearted port to modern kernels here:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=virtio_ring_xen

I didn't implement DMA API access for virtio_pci_modern, and I have no
idea what to do about detecting whether a given virtio device honors
its IOMMU or not.

--Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2015-07-29  9:21 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CALCETrU2o27udqjvREDFyoNtKtaKvuHTyYRaK+Qt_1Q3tPWYDQ@mail.gmail.com>
2015-07-28  7:05 ` [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API Christian Borntraeger
2015-07-28  8:16 ` Paolo Bonzini
     [not found] ` <55B73A49.9050206@redhat.com>
2015-07-28 10:12   ` Benjamin Herrenschmidt
     [not found]   ` <1438078345.7562.133.camel@kernel.crashing.org>
2015-07-28 12:46     ` Paolo Bonzini
     [not found]     ` <55B7799C.3060908@redhat.com>
2015-07-28 13:06       ` Michael S. Tsirkin
     [not found]       ` <20150728160358-mutt-send-email-mst@redhat.com>
2015-07-28 13:11         ` Jan Kiszka
     [not found]         ` <55B77F8C.7010804@siemens.com>
2015-07-28 16:11           ` Andy Lutomirski
2015-07-28 16:36           ` Paolo Bonzini
     [not found]           ` <55B7AF99.2080209@redhat.com>
2015-07-28 16:42             ` Jan Kiszka
     [not found]             ` <55B7B105.50200@siemens.com>
2015-07-28 17:15               ` Paolo Bonzini
     [not found]               ` <55B7B8AF.5060603@redhat.com>
2015-07-28 17:19                 ` Jan Kiszka
     [not found]                 ` <55B7B992.8070509@siemens.com>
2015-07-28 17:31                   ` Paolo Bonzini
     [not found]           ` <CALCETrVF1-=z60bCtsMBctbd14zjJgpUPNNo+EcJj_fXhww1og@mail.gmail.com>
2015-07-28 16:44             ` Jan Kiszka
     [not found]             ` <55B7B15C.4010101@siemens.com>
2015-07-28 17:10               ` Andy Lutomirski
     [not found]               ` <CALCETrW_7tmUg1zf3m1xYeSruW8njNYP9N4s8tBKJcTGJf_0QA@mail.gmail.com>
2015-07-28 17:17                 ` Jan Kiszka
     [not found]                 ` <55B7B91E.40200@siemens.com>
2015-07-28 18:22                   ` Andy Lutomirski
     [not found]                   ` <CALCETrWUVQWQ=mydbdiSJnA0uLSFarVYCnWWKpcgYD0A8hpfXQ@mail.gmail.com>
2015-07-28 19:06                     ` Jan Kiszka
     [not found]                     ` <55B7D2A9.6040700@siemens.com>
2015-07-28 19:24                       ` Andy Lutomirski
     [not found]                       ` <CALCETrVeT-M+qMxXwDUB50xnFW1B6yu+ofiX-ydvtjaGgRLUNA@mail.gmail.com>
2015-07-28 19:33                         ` Jan Kiszka
     [not found]                         ` <55B7D8F5.1000902@siemens.com>
2015-07-28 21:16                           ` Andy Lutomirski
     [not found]                           ` <CALCETrWZM=1y_nwBW2RcMNqKcPPGa0uqqGS_fKijfJzYJL4L-Q@mail.gmail.com>
2015-07-28 22:43                             ` Andy Lutomirski
     [not found]                             ` <CALCETrW+YynS13qgXnUR-Vw9uyve9Js3RQb2RV9NBHxwyft5=w@mail.gmail.com>
2015-07-28 23:21                               ` Benjamin Herrenschmidt
     [not found]                               ` <1438125694.7562.177.camel@kernel.crashing.org>
2015-07-28 23:33                                 ` Andy Lutomirski
     [not found]                                 ` <CALCETrXFMXCfCWLTdE2nff=gibsu2_2XEwwOxC6T3vQZphGeBg@mail.gmail.com>
2015-07-29  0:36                                   ` Benjamin Herrenschmidt
     [not found]                                   ` <1438130185.7562.186.camel@kernel.crashing.org>
2015-07-29  0:47                                     ` Andy Lutomirski
     [not found]                                     ` <CALCETrWrLypYdF3Yhzj=ku9i=EmBJm4Ux5ouJ33MuXf7tEgJ0w@mail.gmail.com>
2015-07-29  0:54                                       ` Benjamin Herrenschmidt
2015-07-29  8:17                                       ` Paolo Bonzini
     [not found]                                       ` <55B88C0E.9050104@redhat.com>
2015-07-29  8:20                                         ` Jan Kiszka
2015-07-29  9:21                                         ` Benjamin Herrenschmidt
2015-07-29  8:07                                 ` Jan Kiszka
2015-07-28 13:08 ` Michael S. Tsirkin
2015-07-28  1:08 Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).