xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: "Jan Beulich" <JBeulich@suse.com>
To: Paul Durrant <Paul.Durrant@citrix.com>
Cc: Anthony Perard <anthony.perard@citrix.com>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>,
	"Keir (Xen.org)" <keir@xen.org>,
	Stefano Stabellini <Stefano.Stabellini@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH RFC] x86/vMSI-X: avoid missing first unmask of vectors
Date: Fri, 01 Apr 2016 07:42:51 -0600	[thread overview]
Message-ID: <56FE96FB02000078000E2138@prv-mh.provo.novell.com> (raw)
In-Reply-To: <88fb0367a4ab40f8b36cb1996a0e2f59@AMSPEX02CL03.citrite.net>

>>> On 01.04.16 at 15:01, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 01 April 2016 12:21
>> >>> On 01.04.16 at 12:56, <Paul.Durrant@citrix.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> Sent: 01 April 2016 10:59
>> >> >>> On 01.04.16 at 11:15, <JBeulich@suse.com> wrote:
>> >> > Recent changes to Linux result in there just being a single unmask
>> >> > operation prior to expecting the first interrupts to arrive. However,
>> >> > we've had a chicken-and-egg problem here: Qemu invokes
>> >> > xc_domain_update_msi_irq(), ultimately leading to
>> >> > msixtbl_pt_register(), upon seeing that first unmask operation. Yet
>> >> > for msixtbl_range() to return true (in order to msixtbl_write() to get
>> >> > invoked at all) msixtbl_pt_register() must have completed.
>> >> >
>> >> > Deal with this by snooping suitable writes in msixtbl_range() and
>> >> > triggering the invocation of msix_write_completion() from
>> >> > msixtbl_pt_register() when that happens in the context of a still in
>> >> > progress vector control field write.
>> >> >
>> >> > Note that the seemingly unrelated deletion of the redundant
>> >> > irq_desc->msi_desc checks in msixtbl_pt_register() is to make clear to
>> >> > any compiler version used that the "msi_desc" local variable isn't
>> >> > being used uninitialized. (Doing the same in msixtbl_pt_unregister() is
>> >> > just for consistency reasons.)
>> >> >
>> >> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> >> > ---
>> >> > TODO: How to deal with REP MOVS to the MSI-X table (in
>> msixtbl_range())?
>> >>
>> >> Some more detail on the thoughts I so far had for this aspect:
>> >> It has always been puzzling me that the hypervisor doesn't
>> >> see _all_ the MSI-X table accesses (which is a result of the
>> >> addresses only getting registered via XEN_DOMCTL_bind_pt_irq);
>> >> it's quite natural that this is an at least latent issue possibly
>> >> causing guest misbehavior. I cannot, however, currently see any
>> >> way to address this without altering both Xen and qemu, since for
>> >> Xen to see all accesses it would need to become aware of the
>> >> GPA of the MSI-X table much earlier (read: before the domain
>> >> actually start, or at the latest when the domain first enables
>> >> memory decoding on the device).
>> >>
>> >> The mapping of the MMIO BARs of the device into guest memory,
>> >> however, intentionally excludes the page(s) covering the MSI-X
>> >> table, so the hypervisor can't become aware of them by just
>> >> looking at data it gets presented today. Hence either we need to
>> >> add some new hypercall for qemu to invoke, or we need to make
>> >> qemu map the full BAR ranges, filtering out MSI-X table pages
>> >> in the hypervisor (using those mapping requests just to learn the
>> >> GPA of the MSI-X table, without entering them into P2M).
>> >>
>> >> Unless someone can think of a way which doesn't require altering
>> >> both qemu and Xen (creating the well known compatibility issues
>> >> between unmatched pairs), I think the patch as presented should
>> >> be okay without handling this case, i.e. best possible effort, and a
>> >> subsequent change then ought to be to deal with this by changing
>> >> both components. In which case I'd suggest that the change here
>> >> go into 4.7, but the full fix would then likely need deferring until
>> >> 4.8.
>> >
>> > I guess it could be handled entirely in Xen if we are willing to snoop on
>> > PCI configuration. It would not be too hard to snoop guest writes to the
>> BARs
>> > in config space so that Xen can keep track of where they are. Snooping on
>> the
>> > MSI-X capability could then tell Xen when to start interposing on the table,
>> > and allow it to discover the GPA at that point (via the BIR and offset
>> > values).
>> 
>> Well, that's a possibility, but won't - afaict - work without qemu's
>> help at another point: So far we don't know the guest's PCI bus
>> topology, hence we can't correlate vBAR writes we might snoop
>> with the physical devices they correspond to.
> 
> Does Xen need to know that correspondence just to track state? I thought the 
> problem here was that Xen does not see every guest access to an MSI-X table. 
> If Xen always interposes on MSI-X tables then it can at least track the state 
> of the emulated table, even if we end up just forwarding the access for QEMU 
> to handle at first. When the mapping is created to the actual h/w table then, 
> presumably, Xen's idea of state should correspond to QEMU's.

But Xen doesn't see the guest view of config space, and the
whereabouts of the MSI-X table are in read-only config space fields
(i.e. snooping writes to those doesn't make sense, as the guest may
not ever write them or write anything, which would just get dropped
on the floor).

And additionally msixtbl_addr_to_desc() needs to know the physical
device.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2016-04-01 13:42 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-01  9:15 [PATCH RFC] x86/vMSI-X: avoid missing first unmask of vectors Jan Beulich
2016-04-01  9:58 ` Jan Beulich
2016-04-01 10:56   ` Paul Durrant
2016-04-01 11:21     ` Jan Beulich
2016-04-01 13:01       ` Paul Durrant
2016-04-01 13:42         ` Jan Beulich [this message]
2016-04-01 13:54           ` Paul Durrant
2016-04-01 14:17             ` Jan Beulich
2016-04-01 14:19               ` Paul Durrant
2016-04-01 14:28                 ` Jan Beulich
2016-04-01 14:27               ` Paul Durrant
2016-04-01 14:40                 ` Jan Beulich
2016-04-01 14:58                   ` Paul Durrant
2016-04-01 15:28                     ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56FE96FB02000078000E2138@prv-mh.provo.novell.com \
    --to=jbeulich@suse.com \
    --cc=Andrew.Cooper3@citrix.com \
    --cc=Paul.Durrant@citrix.com \
    --cc=Stefano.Stabellini@citrix.com \
    --cc=anthony.perard@citrix.com \
    --cc=keir@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).