From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756065AbbLQKhs (ORCPT ); Thu, 17 Dec 2015 05:37:48 -0500
Received: from e28smtp02.in.ibm.com ([125.16.236.2]:49366 "EHLO
	e28smtp02.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755976AbbLQKhq (ORCPT );
	Thu, 17 Dec 2015 05:37:46 -0500
X-IBM-Helo: d28dlp03.in.ibm.com
X-IBM-MailFrom: xyjxie@linux.vnet.ibm.com
X-IBM-RcptTo: kvm@vger.kernel.org;linux-api@vger.kernel.org;linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported
To: Alex Williamson , kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
References: <1449823994-3356-1-git-send-email-xyjxie@linux.vnet.ibm.com>
	<1449823994-3356-4-git-send-email-xyjxie@linux.vnet.ibm.com>
	<1450296869.2674.62.camel@redhat.com>
Cc: aik@ozlabs.ru, benh@kernel.crashing.org, paulus@samba.org,
	mpe@ellerman.id.au, warrier@linux.vnet.ibm.com,
	zhong@linux.vnet.ibm.com, nikunj@linux.vnet.ibm.com
From: yongji xie
Message-ID: <5672906C.5010708@linux.vnet.ibm.com>
Date: Thu, 17 Dec 2015 18:37:32 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0
MIME-Version: 1.0
In-Reply-To: <1450296869.2674.62.camel@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15121710-0005-0000-0000-0000095FA29D
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2015/12/17 4:14, Alex Williamson wrote:
> On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote:
>> Current vfio-pci implementation disallows to mmap MSI-X table in
>> case that user get to touch this directly.
>>
>> However, EEH mechanism could ensure that a given pci device
>> can only shoot the MSIs assigned for its PE and guest kernel also
>> would not write to MSI-X table in pci_enable_msix() because
>> para-virtualization on PPC64 platform. So MSI-X table is safe to
>> access directly from the guest with EEH mechanism enabled.
> The MSI-X table is paravirtualized on vfio in general and interrupt
> remapping theoretically protects against errant interrupts, so why is
> this PPC64 specific?  We have the same safeguards on x86 if we want to
> decide they're sufficient.  Offhand, the only way I can think that a
> device can touch the MSI-X table is via backdoors or p2p DMA with
> another device.

Maybe I didn't make my point clear. The reasons why we can mmap the
MSI-X table on PPC64 are:

1. The EEH mechanism ensures that a given PCI device can only shoot
the MSIs assigned for its PE. So if we pass the MSI-X table through to
the guest and the guest writes a garbage MSI-X address/data pair to
the vector table, it would do no harm to other memory space.

2. The guest kernel does not write to the MSI-X table on the PPC64
platform when device drivers call pci_enable_msix() to initialize
MSI-X interrupts.

So I think it is safe to mmap/passthrough the MSI-X table on the PPC64
platform. And I'm not sure whether other architectures can ensure
these two points.

Thanks.

Regards
Yongji Xie

>> This patch adds support for this case and allow to mmap MSI-X
>> table if EEH is supported on PPC64 platform.
>>
>> And we also add a VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP flag to notify
>> userspace that it's safe to mmap MSI-X table.
>>
>> Signed-off-by: Yongji Xie
>> ---
>>  drivers/vfio/pci/vfio_pci.c         |    5 ++++-
>>  drivers/vfio/pci/vfio_pci_private.h |    5 +++++
>>  include/uapi/linux/vfio.h           |    2 ++
>>  3 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index dbcad99..85d9980 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -446,6 +446,9 @@ static long vfio_pci_ioctl(void *device_data,
>>  		if (vfio_pci_bar_page_aligned())
>>  			info.flags |= VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED;
>>
>> +		if (vfio_msix_table_mmap_enabled())
>> +			info.flags |= VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP;
>> +
>>  		info.num_regions = VFIO_PCI_NUM_REGIONS;
>>  		info.num_irqs = VFIO_PCI_NUM_IRQS;
>>
>> @@ -871,7 +874,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>>  	if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
>>  		return -EINVAL;
>>
>> -	if (index == vdev->msix_bar) {
>> +	if (index == vdev->msix_bar && !vfio_msix_table_mmap_enabled()) {
>>  		/*
>>  		 * Disallow mmaps overlapping the MSI-X table; users don't
>>  		 * get to touch this directly.  We could find somewhere
>> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
>> index 319352a..835619e 100644
>> --- a/drivers/vfio/pci/vfio_pci_private.h
>> +++ b/drivers/vfio/pci/vfio_pci_private.h
>> @@ -74,6 +74,11 @@ static inline bool vfio_pci_bar_page_aligned(void)
>>  	return IS_ENABLED(CONFIG_PPC64);
>>  }
>>
>> +static inline bool vfio_msix_table_mmap_enabled(void)
>> +{
>> +	return IS_ENABLED(CONFIG_EEH);
>> +}
> I really dislike these.
>
>> +
>>  extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
>>  extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
>>
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 1fc8066..289e662 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -173,6 +173,8 @@ struct vfio_device_info {
>>  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)	/* vfio-amba device */
>>  /* Platform support all PCI MMIO BARs to be page aligned */
>>  #define VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED	(1 << 4)
>> +/* Platform support mmapping PCI MSI-X vector table */
>> +#define VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP	(1 << 5)
> Again, not sure why this is on the device versus the region, but I'd
> prefer to investigate whether we can handle this with the sparse mmap
> capability (or lack of) in the capability chains I proposed[1].  Thanks,
>
> Alex
>
> [1] https://lkml.org/lkml/2015/11/23/748
>

Good idea! I will investigate it.

Thanks.

Regards
Yongji Xie

>>  	__u32	num_regions;	/* Max region index + 1 */
>>  	__u32	num_irqs;	/* Max IRQ index + 1 */
>>  };
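
For readers following the thread, the two checks the patch touches can be
modelled in a small standalone C sketch. This is only an illustration under
stated assumptions, not the kernel code: RFC_VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP
copies the bit value proposed in this RFC and is not assumed to exist in any
released header, and the helpers msix_table_mmap_advertised() and
mmap_request_allowed(), together with their parameters, are hypothetical
names introduced here. The overlap test follows the "Disallow mmaps
overlapping the MSI-X table" comment quoted in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit value proposed in this RFC; assumed here, not a released uapi macro. */
#define RFC_VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP (1u << 5)

/* Hypothetical helper: does the device info report the proposed flag? */
static bool msix_table_mmap_advertised(uint32_t info_flags)
{
	return (info_flags & RFC_VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP) != 0;
}

/*
 * Hypothetical model of the patched condition in vfio_pci_mmap():
 * a request on the MSI-X BAR is refused only when table mmap is not
 * enabled and [req_start, req_start + req_len) overlaps the table.
 */
static bool mmap_request_allowed(bool is_msix_bar, bool table_mmap_enabled,
				 uint64_t req_start, uint64_t req_len,
				 uint64_t table_off, uint64_t table_size)
{
	if (!is_msix_bar || table_mmap_enabled)
		return true;

	/* Allowed if the request lies entirely after or before the table. */
	return req_start >= table_off + table_size ||
	       req_start + req_len <= table_off;
}

/* Usage example with made-up offsets (table at 0x800, 0x100 bytes long). */
void example_checks(void)
{
	assert(msix_table_mmap_advertised(RFC_VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP));
	/* Without the proposed behaviour, a mapping over the table is refused. */
	assert(!mmap_request_allowed(true, false, 0x0, 0x1000, 0x800, 0x100));
	/* With it enabled, the same request passes this check. */
	assert(mmap_request_allowed(true, true, 0x0, 0x1000, 0x800, 0x100));
}
```

Under the sparse mmap approach suggested above, userspace would instead learn
which ranges are mappable from the region info, so a device-level flag like
the one modelled here would not be needed.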