On Fri, 26 Jun 2020 14:49:37 +0200
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 6/26/20 12:58 PM, Daniel P. Berrangé wrote:
> > On Fri, Jun 26, 2020 at 11:29:03AM +0100, Dr. David Alan Gilbert wrote:
> >> * Janosch Frank (frankja@linux.ibm.com) wrote:
> >>> On 6/26/20 11:32 AM, Daniel P. Berrangé wrote:
> >>>> On Fri, Jun 26, 2020 at 11:01:58AM +0200, Janosch Frank wrote:
> >>>>> On 6/26/20 8:53 AM, David Hildenbrand wrote:
> >>>>>>>>>> Does this have any implications when probing with the 'none' machine?
> >>>>>>>>>
> >>>>>>>>> I'm not sure.  In your case, I guess the cpu bit would still show up
> >>>>>>>>> as before, so it would tell you base feature availability, but not
> >>>>>>>>> whether you can use the new configuration option.
> >>>>>>>>>
> >>>>>>>>> Since the HTL option is generic, you could still set it on the "none"
> >>>>>>>>> machine, though it wouldn't really have any effect.  That is, if you
> >>>>>>>>> could create a suitable object to point it at, which would depend on
> >>>>>>>>> ... details.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> The important point is that we never want the (expanded) host cpu model
> >>>>>>>> to look different when either specifying or not specifying the HTL
> >>>>>>>> property.
> >>>>>>>
> >>>>>>> Ah, yes, I see your point.  So my current suggestion will satisfy
> >>>>>>> that, basically it is:
> >>>>>>>
> >>>>>>> cpu has unpack (inc. by default) && htl specified
> >>>>>>> 	=> works (allowing secure), as expected
> >>>>>>
> >>>>>> ack
> >>>>>>
> >>>>>>>
> >>>>>>> !cpu has unpack && htl specified
> >>>>>>> 	=> bails out with an error
> >>>>>>
> >>>>>> ack
> >>>>>>
> >>>>>>>
> >>>>>>> !cpu has unpack && !htl specified
> >>>>>>> 	=> works for a non-secure guest, as expected
> >>>>>>> 	=> guest will fail if it attempts to go secure
> >>>>>>
> >>>>>> ack, behavior just like running on older hw without unpack
> >>>>>>
> >>>>>>>
> >>>>>>> cpu has unpack && !htl specified
> >>>>>>> 	=> works as expected for a non-secure guest (unpack feature is
> >>>>>>> 	   present, but unused)
> >>>>>>> 	=> secure guest may work "by accident", but only if all virtio
> >>>>>>> 	   properties have the right values, which is the user's
> >>>>>>> 	   problem
> >>>>>>>
> >>>>>>> That last case is kinda ugly, but I think it's tolerable.
> >>>>>>
> >>>>>> Right, we must not affect non-secure guests, and existing secure setups
> >>>>>> (e.g., older qemu machines). Will have to think about this some more,
> >>>>>> but does not sound too crazy.
> >>>>>
> >>>>> I severely dislike having to specify things to make PV work.
> >>>>> The IOMMU is already a thorn in our side and we're working on making the
> >>>>> whole ordeal completely transparent so the only requirement to make this
> >>>>> work is the right machine, kernel, qemu and kernel cmd line option
> >>>>> "prot_virt=1". That's why we do the reboot into PV mode in the first place.
> >>>>>
> >>>>> I.e. the goal is that if customers convert compatible guests into
> >>>>> protected ones and start them up on a z15 on a distro with PV support
> >>>>> they can just use the guest without having to change XML or command line
> >>>>> parameters.
> >>>>
> >>>> If you're exposing new features to the guest machine, then it is usually
> >>>> to be expected that XML and QEMU command line will change. Some simple
> >>>> things might be hidable behind a new QEMU machine type or CPU model, but
> >>>> there's a limit to how much should be hidden that way while staying sane.
> >>>>
> >>>> I'd really expect the configuration to change when switching a guest to
> >>>> a new hardware platform and wanting major new functionality to be enabled.
> >>>> The XML / QEMU config is a low level instantiation of a particular feature
> >>>> set, optimized for a specific machine, rather than a high level description
> >>>> of ideal "best" config independent of host machine.
> >>>
> >>> You still have to set the host command line and make sure that unpack is
> >>> available. Currently you also have to specify the IOMMU which we like to
> >>> drop as a requirement. Everything else is dependent on runtime
> >>> information which tells us if we need to take a PV or non-PV branch.
> >>> Having the unpack facility should be enough to use the unpack facility.
> >>>
> >>> Keep in mind that we have no real concept of a special protected VM to
> >>> begin with. If the VM never boots into a protected kernel it will never
> >>> be protected. On a reboot it drops from protected into unprotected mode
> >>> to execute the bios and boot loader and then may or may not move back
> >>> into a protected state.
> >>
> >> My worry isn't actually how painful adding all the iommu glue is, but
> >> what happens when users forget; especially if they forget for one
> >> device.
> >>
> >> I could appreciate having a machine option to cause iommu to then get
> >> turned on with all other devices; but I think also we could do with
> >> something that failed with a nice error if an iommu flag was missing.
> >> For SEV this could be done pretty early, but for power/s390 I guess
> >> you'd have to do this when someone tried to enable secure mode, but
> >> I'm not sure you can tell.
> > 
> > What is the cost / downside of turning on the iommu option for virtio
> > devices? Is it something that is reasonable for a mgmt app to do
> > unconditionally, regardless of whether memory encryption is in use,
> > or will that have a negative impact on things?
> 
> speed, memory usage and compatibility problems.
> There might also be a problem with s390 having to use <=2GB iommu areas
> in the guest, I need to check with Halil if this is still true.

It is partially true. The coherent_dma_mask is 31 bit and the dma_mask
is 64 bit. That means if iommu=on but !PV, the coherent allocations
(the stuff allocated by virtio core, like virtqueues, CCWs, etc.) will
sit below 2GB, but there will be no bounce buffering. We don't even
initialize swiotlb if !PV.
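
To make the mask arithmetic concrete, here is a minimal Python sketch. It
is not kernel code: dma_bit_mask() and fits_mask() are made-up helper
names for illustration, and the only facts taken from above are the 31 bit
coherent mask and the 64 bit streaming mask.

```python
GIB = 1 << 30


def dma_bit_mask(bits: int) -> int:
    """Highest bus address reachable with an n-bit DMA mask (illustrative helper)."""
    if not 1 <= bits <= 64:
        raise ValueError("mask width must be between 1 and 64 bits")
    return (1 << bits) - 1


# Values quoted in this mail: coherent_dma_mask is 31 bit, dma_mask is 64 bit.
COHERENT_DMA_MASK = dma_bit_mask(31)
STREAMING_DMA_MASK = dma_bit_mask(64)


def fits_mask(addr: int, size: int, mask: int) -> bool:
    """True if the whole buffer [addr, addr + size) is addressable under mask."""
    if size <= 0:
        raise ValueError("size must be positive")
    return addr + size - 1 <= mask


if __name__ == "__main__":
    page = 4096
    # Last page below 2 GiB: still reachable for coherent allocations.
    print(fits_mask(2 * GIB - page, page, COHERENT_DMA_MASK))
    # First page at 2 GiB: out of reach for coherent allocations ...
    print(fits_mask(2 * GIB, page, COHERENT_DMA_MASK))
    # ... but fine for streaming DMA with the 64 bit mask.
    print(fits_mask(2 * GIB, page, STREAMING_DMA_MASK))
```

So a 31 bit mask tops out at 2GB - 1, which is where the "<= 2GB" limit for
the coherent stuff comes from, while ordinary data buffers are not
restricted by the 64 bit mask.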

I agree with Janosch, we want iommu='on' only when really needed. I've
tried to make that point several times.

Regards,
Halil

> 
> Also, if the default or specified IOMMU buffer size isn't big enough for
> your IO workload the guest is gonna have a very bad time. I.e. if
> somebody has an alternative implementation of bounce buffers we'd be
> happy to take it :)
> 
> > 
> > Regards,
> > Daniel
> > 
> 
>