From mboxrd@z Thu Jan 1 00:00:00 1970
From: Roger Pau Monné
Subject: Re: [PATCH v3 10/10] x86/MSI-X: provide hypercall interface for mask-all control
Date: Fri, 19 Jun 2015 16:07:43 +0200
Message-ID: <5584222F.90707@citrix.com>
References: <55719F9D0200007800081425@mail.emea.novell.com> <5571A3F202000078000814CA@mail.emea.novell.com> <557964870200007800083706@mail.emea.novell.com> <55795A3D.7060304@citrix.com> <55842E9702000078000870C9@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1Z5wy3-00060L-AZ for xen-devel@lists.xenproject.org; Fri, 19 Jun 2015 14:08:27 +0000
In-Reply-To: <55842E9702000078000870C9@mail.emea.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich, Andrew Cooper
Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel, dgdegra@tycho.nsa.gov, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

On 19/06/15 at 15.00, Jan Beulich wrote:
>>>> On 11.06.15 at 11:51, wrote:
>> On 11/06/15 09:35, Jan Beulich wrote:
>>> While I continue to be of the opinion that all direct writes to
>>> interrupt masking bits (MSI-X mask-all, MSI-X per-entry mask,
>>> MSI per entry mask) outside of the hypervisor are wrong and
>>> should be eliminated, the scope of the problem now clearly
>>> going beyond qemu made me reconsider whether we shouldn't,
>>> as advocated by Stefano, follow the trap-and-emulate route
>>> instead.
>>> This would not only mean adding code to x86's existing
>>> port CF8/CFC intercepts, but also write-protecting the MMCFG
>>> pages for all PCI devices being MSI or MSI-X capable, emulating
>>> writes with inspection / modification of writes to any of the mask
>>> bits located in PCI config space. (A subsequent optimization to
>>> this may then be a hypercall to do config space writes,
>>> eliminating the emulation overhead, accompanied by a bitmap
>>> indicating which devices' CFG space can be written directly.)
>>>
>>> For a (from now on) timely resolution of the original problem I'd
>>> really appreciate opinions (or alternative suggestions).
>>
>> A very definite +1 from me. I have previously suggested as much.
>
> And now that I started looking into what it takes to make this
> work, I'm having a deja vu: In order for us to reliably intercept
> all CFG accesses, we need to whitelist the MMCFG pages of
> devices we know we don't care about being written. I.e. we
> need to start out with all of them being read-only. And the
> affected MFNs have to be known before Dom0 maps these
> pages (or else we would have to hunt down all the mappings in
> the page tables, which is nothing I consider even remotely
> reasonable). Yet, and here comes the deja vu, upstream Linux
> _still_ doesn't make use of PHYSDEVOP_pci_mmcfg_reserved.
> No idea whether FreeBSD or whatever else can be used as Dom0
> do. So no matter how we turn it, we have a dependency on the
> Dom0 kernel also being adjusted. In which case we might as well
> go the original route of requiring hypercalls to be used for certain
> operations to deal with the problem here.

FreeBSD doesn't implement PHYSDEVOP_pci_mmcfg_reserved at the moment.
I had a patch to implement it, but it's completely useless with the
way we map MMIO regions on PVH right now. Every hole in the e820 is
basically mapped as an MMIO region _before_ starting Dom0, making the
white/black listing done in PHYSDEVOP_pci_mmcfg_reserved completely
moot.
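
To make the whitelisting argument concrete: with the standard ECAM
layout every PCI function owns one 4 KiB page of the MMCFG window, so
"start out read-only, then whitelist" amounts to a bitmap with one bit
per bus/device/function. A minimal sketch in C follows. struct
mmcfg_window and the mmcfg_* helpers are hypothetical names made up for
illustration, not Xen or FreeBSD code, and the window base is assumed to
be the address of bus 0's config space.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bookkeeping for one MMCFG (ECAM) window; not a Xen structure. */
struct mmcfg_window {
    uint64_t base;                /* assumed: address of bus 0's config space */
    uint8_t  writable[65536 / 8]; /* one bit per bus/dev/fn page, 0 = read-only */
};

/* ECAM layout: bus in bits 27:20, device in 19:15, function in 14:12. */
static uint64_t mmcfg_page_addr(const struct mmcfg_window *w,
                                uint8_t bus, uint8_t dev, uint8_t fn)
{
    return w->base + (((uint64_t)bus << 20) |
                      ((uint64_t)(dev & 0x1f) << 15) |
                      ((uint64_t)(fn & 0x7) << 12));
}

/* Every page starts out read-only, i.e. every write would be intercepted. */
static void mmcfg_init(struct mmcfg_window *w, uint64_t base)
{
    w->base = base;
    memset(w->writable, 0, sizeof(w->writable));
}

/* Whitelist one function whose config space may be written directly. */
static void mmcfg_allow_write(struct mmcfg_window *w,
                              uint8_t bus, uint8_t dev, uint8_t fn)
{
    unsigned int idx =
        (unsigned int)((mmcfg_page_addr(w, bus, dev, fn) - w->base) >> 12);

    w->writable[idx / 8] |= (uint8_t)(1u << (idx % 8));
}

/* Would a write to this address go through without interception? */
static bool mmcfg_write_allowed(const struct mmcfg_window *w, uint64_t addr)
{
    unsigned int idx;

    if (addr < w->base || addr - w->base >= (1ull << 28))
        return false;             /* outside the 256-bus window */

    idx = (unsigned int)((addr - w->base) >> 12);
    return (w->writable[idx / 8] & (1u << (idx % 8))) != 0;
}
```

The sketch only works if the bitmap is consulted before any mapping
exists, which is exactly what mapping every e820 hole up front defeats.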

> Otoh the write interception has the potential of dealing with other
> problems (like that of XSAs 120 and 126), but making the security
> of Xen (in presence of the fix/workaround to the original problem
> here) dependent on a Dom0 side change not even on its way into
> the master Linux branch yet makes me really hesitant to try going
> that route. (And no, I'm not up to fighting for another pv-ops hook
> considering that I've never been really convinced of the pv-ops
> model in the first place.)
>
> But then again the one thing we might consider saving us on the
> Linux side is that as of 2.6.25 base config space accesses don't
> get done via MMCFG anymore, and we don't have an immediate
> need to intercept extended ones (i.e. initially we might even get
> away without snooping MMCFG writes at all). Roger - how do
> things look like on the FreeBSD side?

I don't mind adding a PHYSDEVOP_pci_mmcfg_reserved call to FreeBSD,
but for it to have any effect we need to stop unconditionally mapping
everything as MMIO regions on PVH Dom0. Then we need to either expand
XENMEM_add_to_physmap_batch so it can be used to map MMIO regions on
demand from Dom0, or modify PHYSDEVOP_pci_mmcfg_reserved so it sets up
the right (1:1) mappings for auto-translated guests.

Roger.
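
For reference, the port-based path discussed above only reaches base
config space (the CF8 register field is 8 bits wide), which is where
the MSI-X capability and its mask-all bit live. A rough sketch of what
filtering such a write could look like is below. The function names are
hypothetical and msix_pos (the offset of the MSI-X capability) is
assumed to have been looked up by the caller; the bit positions are the
standard CF8 and MSI-X Message Control layouts. This is an illustration,
not the code in Xen.

```c
#include <stdbool.h>
#include <stdint.h>

#define CF8_ENABLE (1u << 31)

struct cf8_target {
    uint8_t bus, dev, fn, reg;
};

/* Decode a value latched in port 0xCF8 (configuration mechanism #1). */
static bool cf8_decode(uint32_t cf8, struct cf8_target *t)
{
    if (!(cf8 & CF8_ENABLE))
        return false;             /* not a config cycle */

    t->bus = (cf8 >> 16) & 0xff;
    t->dev = (cf8 >> 11) & 0x1f;
    t->fn  = (cf8 >> 8) & 0x7;
    t->reg = cf8 & 0xfc;          /* dword-aligned register offset */
    return true;
}

/*
 * Filter a config space write of 'size' bytes at byte offset 'reg'.
 * Message Control sits at msix_pos + 2; its bit 14 (Function Mask, the
 * "mask-all" bit) is bit 6 of the byte at msix_pos + 3. If the write
 * covers that byte, force the bit to the state the hypervisor wants.
 */
static uint32_t filter_cfg_write(unsigned int reg, unsigned int size,
                                 uint32_t val, unsigned int msix_pos,
                                 bool want_maskall)
{
    unsigned int byte = msix_pos + 3;
    uint32_t bit;

    if (byte < reg || byte >= reg + size)
        return val;               /* write does not touch mask-all */

    bit = 1u << ((byte - reg) * 8 + 6);
    return want_maskall ? (val | bit) : (val & ~bit);
}
```

A CF8/CFC intercept would call cf8_decode() on the latched address, add
the low two bits of the data port to get the byte offset, and pass the
guest's value through filter_cfg_write() before forwarding it.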