Re: x86/vMSI-X emulation issue

From: Paul Durrant <Paul.Durrant@citrix.com>
To: Jan Beulich <JBeulich@suse.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>
Subject: Re: x86/vMSI-X emulation issue
Date: Thu, 24 Mar 2016 09:09:34 +0000
Message-ID: <06625a3cd114425c833bf1a845308b0a@AMSPEX02CL03.citrite.net>
In-Reply-To: <56F3AA9302000078000DFE82@prv-mh.provo.novell.com>

> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Jan
> Beulich
> Sent: 24 March 2016 07:52
> To: xen-devel
> Cc: Andrew Cooper
> Subject: Re: [Xen-devel] x86/vMSI-X emulation issue
> 
> >>> On 23.03.16 at 18:05, <JBeulich@suse.com> wrote:
> > All,
> >
> > so I've just learned that Windows (at least some versions and some
> > of their code paths) use REP MOVSD to read/write the MSI-X table.
> > The way at least msixtbl_write() works is not compatible with this
> > (msixtbl_read() also seems affected, albeit to a lesser degree), and
> > apparently it just worked by accident until the XSA-120 and 128-131
> > and follow-up changes - most notably commit ad28e42bd1 ("x86/MSI:
> > track host and guest masking separately"), as without the call to
> > guest_mask_msi_irq() interrupts won't ever get unmasked.
> >
> > The problem with emulating REP MOVSD is that msixtbl_write()
> > intentionally returns X86EMUL_UNHANDLEABLE on all writes to
> > words 0, 1, and 2. When in the process of emulating multiple
> > writes, we therefore hand the entire batch of 3 or 4 writes to qemu,
> > and the hypervisor doesn't get to see any other than the initial
> > iteration.
> >
> > Now I see a couple of possible solutions, but none of them look
> > really neat, hence I'm seeking a second opinion (including, of
> > course, further alternative ideas):
> >
> > 1) Introduce another X86EMUL_* like status that's not really to be
> >     used by the emulator itself, but only by the two vMSI-X functions
> >     to indicate to their caller that prior to forwarding the request it
> >     should be chopped to a single repetition.
> >
> > 2) Do aforementioned chopping automatically on seeing
> >     X86EMUL_UNHANDLEABLE, on the basis that the .check
> >     handler had indicated that the full range was acceptable. That
> >     would at once cover other similarly undesirable cases like the
> >     vLAPIC code returning this error. However, any stdvga like
> >     emulated device would clearly not want such to happen, and
> >     would instead prefer the entire batch to get forwarded in one
> >     go (stdvga itself sits on a different path). Otoh, with the
> >     devices we have currently, this would seem to be the least
> >     intrusive solution.
> 
> Having thought about it more over night, I think this indeed is
> the most reasonable route, not just because it's least intrusive:
> For non-buffered internally handled I/O requests, no good can
> come from forwarding full batches to qemu, when the respective
> range checking function has indicated that this is an acceptable
> request. And in fact neither vHPET nor vIO-APIC code generates
> X86EMUL_UNHANDLEABLE. And vLAPIC code doing so is also
> just apparently so - I'll submit a patch to make this obvious once
> tested.
> 
> Otoh stdvga_intercept_pio() uses X86EMUL_UNHANDLEABLE in
> a manner similar to the vMSI-X code - for internal caching and
> then forwarding to qemu. Clearly that is also broken for
> REP OUTS, and hence a similar rep count reduction is going to
> be needed for the port I/O case.
> 

This suggests that such cache-and/or-forward models should probably sit
somewhere else in the flow, possibly invoked from hvm_send_ioreq(), since
there should indeed be a selected ioreq server for these cases.
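
For reference, the rep count reduction discussed in the quoted text can be
modelled with a small standalone sketch. This is not Xen code: every name
below (sketch_status, sketch_stats, sketch_internal_write, sketch_dispatch,
sketch_rep_write) is hypothetical, and the handler only mimics the
behaviour described above, where writes to words 0, 1 and 2 of an entry
are reported as unhandleable and only the last word is completed
internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for X86EMUL_OKAY / X86EMUL_UNHANDLEABLE. */
enum sketch_status { SKETCH_OKAY, SKETCH_UNHANDLEABLE };

/* Counts how many dword writes each side got to see. */
struct sketch_stats {
    unsigned int internal;   /* completed by the internal handler */
    unsigned int forwarded;  /* handed to the external device model */
};

/*
 * Toy vMSI-X-like write handler (assumption, not the real msixtbl_write()):
 * dwords 0, 1 and 2 of each 16-byte entry are reported as unhandleable,
 * dword 3 is completed internally.
 */
static enum sketch_status sketch_internal_write(uint64_t addr,
                                                struct sketch_stats *st)
{
    if (((addr >> 2) & 3) != 3)
        return SKETCH_UNHANDLEABLE;
    st->internal++;
    return SKETCH_OKAY;
}

/*
 * Process the start of a batch of 'count' dword writes at 'addr' and
 * return how many iterations were consumed. With 'chop' set, an
 * unhandleable batch is cut down to a single repetition before being
 * forwarded; without it, the whole remaining batch is forwarded at once.
 */
static unsigned int sketch_dispatch(uint64_t addr, unsigned int count,
                                    bool chop, struct sketch_stats *st)
{
    unsigned int n;

    assert(count > 0);
    if (sketch_internal_write(addr, st) == SKETCH_OKAY)
        return 1;
    n = chop ? 1 : count;
    st->forwarded += n;
    return n;
}

/* Model of a REP MOVSD-style write: re-dispatch until the batch is done. */
static struct sketch_stats sketch_rep_write(uint64_t addr, unsigned int count,
                                            bool chop)
{
    struct sketch_stats st = { 0, 0 };

    while (count > 0) {
        unsigned int done = sketch_dispatch(addr, count, chop, &st);

        addr += 4u * done;
        count -= done;
    }
    return st;
}
```

In this model, a 4-dword write starting at word 0 with chopping enabled
forwards three iterations and lets the internal handler see the fourth;
with chopping disabled all four are forwarded and the internal handler
sees none, which is the failure mode described in the quoted text.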

  Paul

> vRTC code would misbehave too, albeit there it is quite hard to
> see what use REP INS or REP OUTS could be. Yet we can't
> exclude a guest using such, so we should make it behave
> correctly.
> 
> For handle_pmt_io(), otoh, forwarding the full batch would be
> okay, but since there shouldn't be any writes, breaking up such
> batches wouldn't be a problem either. Then again forwarding such
> invalid requests to qemu is kind of pointless - we could as well
> terminate them right in Xen, just like we terminate requests
> of other than 4 byte width - again I'll submit a patch to make
> this obvious once tested.
> 
> Jan
> 
> > 3) Have emulation backends provide some kind of (static) flag
> >     indicating which forwarding behavior they would like.
> >
> > 4) Expose the full ioreq to the emulation backends, so they can
> >     fiddle with the request to their liking.
> >
> > Thanks, Jan
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> 