From: "Tian, Kevin"
To: Alex Williamson
CC: Yongji Xie, David Laight, "kvm@vger.kernel.org",
    "linux-kernel@vger.kernel.org", "linux-pci@vger.kernel.org",
    "linuxppc-dev@lists.ozlabs.org", "iommu@lists.linux-foundation.org",
    "bhelgaas@google.com", "aik@ozlabs.ru", "benh@kernel.crashing.org",
    "paulus@samba.org", "mpe@ellerman.id.au", "joro@8bytes.org",
    "warrier@linux.vnet.ibm.com", "zhong@linux.vnet.ibm.com",
    "nikunj@linux.vnet.ibm.com", "eric.auger@linaro.org",
    "will.deacon@arm.com", "gwshan@linux.vnet.ibm.com",
    "alistair@popple.id.au", "ruscur@russell.cc"
Subject: RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported
Date: Wed, 11 May 2016 06:29:06 +0000
References: <1461761010-5452-1-git-send-email-xyjxie@linux.vnet.ibm.com>
    <1461761010-5452-6-git-send-email-xyjxie@linux.vnet.ibm.com>
    <063D6719AE5E284EB5DD2968C1650D6D5F4B52B5@AcuExch.aculab.com>
    <4be013bc-e81b-84c5-06d3-e1b3f46b3227@linux.vnet.ibm.com>
    <20160505090513.56886c12@t450s.home>
In-Reply-To: <20160505090513.56886c12@t450s.home>
X-Mailing-List: linux-kernel@vger.kernel.org

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Thursday, May 05, 2016 11:05 PM
>
> On Thu, 5 May 2016 12:15:46 +0000
> "Tian, Kevin" wrote:
>
> > > From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com]
> > > Sent: Thursday, May 05, 2016 7:43 PM
> > >
> > > Hi David and Kevin,
> > >
> > > On 2016/5/5 17:54, David Laight wrote:
> > >
> > > > From: Tian, Kevin
> > > >> Sent: 05 May 2016 10:37
> > > > ...
> > > >>> Actually, we are not aimed at accessing MSI-X table from
> > > >>> guest. So I think it's safe to passthrough MSI-X table if we
> > > >>> can make sure guest kernel would not touch MSI-X table in
> > > >>> normal code path such as para-virtualized guest kernel on PPC64.
> > > >>>
> > > >> Then how do you prevent malicious guest kernel accessing it?
> > > > Or a malicious guest driver for an ethernet card setting up
> > > > the receive buffer ring to contain a single word entry that
> > > > contains the address associated with an MSI-X interrupt and
> > > > then using a loopback mode to cause a specific packet be
> > > > received that writes the required word through that address.
> > > >
> > > > Remember the PCIe cycle for an interrupt is a normal memory write
> > > > cycle.
> > > >
> > > > David
> > > >
> > >
> > > If we have enough permission to load a malicious driver or
> > > kernel, we can easily break the guest without exposed
> > > MSI-X table.
> > >
> > > I think it should be safe to expose MSI-X table if we can
> > > make sure that malicious guest driver/kernel can't use
> > > the MSI-X table to break other guest or host. The
> > > capability of IRQ remapping could provide this
> > > kind of protection.
> > >
> >
> > With IRQ remapping it doesn't mean you can pass through MSI-X
> > structure to guest. I know actual IRQ remapping might be platform
> > specific, but at least for Intel VT-d specification, MSI-X entry must
> > be configured with a remappable format by host kernel which
> > contains an index into IRQ remapping table. The index will find an
> > IRQ remapping entry which controls interrupt routing for a specific
> > device. If you allow a malicious program random index into MSI-X
> > entry of assigned device, the hole is obvious...
> >
> > Above might make sense only for an IRQ remapping implementation
> > which doesn't rely on extended MSI-X format (e.g. simply based on
> > BDF). If that's the case for PPC, then you should build MSI-X
> > passthrough based on this fact instead of general IRQ remapping
> > enabled or not.
>
> I don't think anyone is expecting that we can expose the MSI-X vector
> table to the guest and the guest can make direct use of it.  The end
> goal here is that the guest on a power system is already
> paravirtualized to not program the device MSI-X by directly writing to
> the MSI-X vector table.  They have hypercalls for this since they
> always run virtualized.  Therefore a) they never intend to touch the
> MSI-X vector table and b) they have sufficient isolation that a guest
> can only hurt itself by doing so.
>
> On x86 we don't have a), our method of programming the MSI-X vector
> table is to directly write to it.  Therefore we will always require QEMU
> to place a MemoryRegion over the vector table to intercept those
> accesses.  However with interrupt remapping, we do have b) on x86, which
> means that we don't need to be so strict in disallowing user accesses
> to the MSI-X vector table.  It's not useful for configuring MSI-X on
> the device, but the user should only be able to hurt themselves by
> writing it directly.  x86 doesn't really get anything out of this
> change, but it helps this special case on power pretty significantly
> aiui.  Thanks,
>

Allowing a guest to write directly to the MSI-X table has system-wide
impact. As I explained earlier, the hypervisor needs to control the
"interrupt_index" programmed into each MSI-X entry, which associates
that entry with a specific IRQ remapping entry. If you expose the whole
MSI-X table to the guest, it can program an arbitrary index into an
MSI-X entry and so associate it with any IRQ remapping entry, and then
there is no isolation at all.

See "5.5.2 MSI and MSI-X Register Programming" in the VT-d spec.

Thanks
Kevin
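
To make the "interrupt_index" point concrete, here is a minimal sketch of how the index sits inside a remappable-format MSI address. It assumes the bit layout described in the VT-d spec (index bits 14:0 in address bits 19:5, index bit 15 in address bit 2, format flag in bit 4, SubHandle Valid in bit 3); the function and constant names are hypothetical and are not taken from the patch or from any kernel code.

```python
# Sketch only: field positions assume the remappable-format MSI address
# described in the VT-d spec; all names here are hypothetical.

MSI_ADDR_BASE = 0xFEE00000      # fixed upper bits of an x86 MSI address
MSI_ADDR_IR_FORMAT = 1 << 4     # 1 = remappable format
MSI_ADDR_IR_SHV = 1 << 3        # SubHandle Valid


def compose_remappable_addr(index, shv=False):
    """Encode a 16-bit interrupt_index into the low 32 address bits."""
    if not 0 <= index <= 0xFFFF:
        raise ValueError("interrupt_index is 16 bits wide")
    addr = MSI_ADDR_BASE | MSI_ADDR_IR_FORMAT
    addr |= (index & 0x7FFF) << 5      # interrupt_index[14:0] -> bits 19:5
    addr |= ((index >> 15) & 1) << 2   # interrupt_index[15]   -> bit 2
    if shv:
        addr |= MSI_ADDR_IR_SHV
    return addr


def decode_interrupt_index(addr):
    """Recover the interrupt_index from a remappable-format address."""
    return ((addr >> 5) & 0x7FFF) | (((addr >> 2) & 1) << 15)


# The host programs the index it allocated for this device...
host_addr = compose_remappable_addr(0x0012)
# ...but whoever can write the MSI-X entry can encode any other index.
guest_addr = compose_remappable_addr(0x0345)

print(hex(host_addr), hex(decode_interrupt_index(host_addr)))
print(hex(guest_addr), hex(decode_interrupt_index(guest_addr)))
```

Nothing in the address encoding itself ties the index to the device that owns the MSI-X table, which is why the index has to stay under hypervisor control.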