From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1750963AbbJGGwS (ORCPT <rfc822;w@1wt.eu>);
	Wed, 7 Oct 2015 02:52:18 -0400
Received: from mail-wi0-f172.google.com ([209.85.212.172]:36991 "EHLO
	mail-wi0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750730AbbJGGwQ (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 7 Oct 2015 02:52:16 -0400
Subject: Re: [PATCH v3 2/3] uio_pci_generic: add MSI/MSI-X support
To: Alex Williamson <alex.williamson@redhat.com>
References: <1443991398-23761-1-git-send-email-vladz@cloudius-systems.com>
 <1443991398-23761-3-git-send-email-vladz@cloudius-systems.com>
 <20151005031159.GB27303@kroah.com> <56123493.9000602@scylladb.com>
 <20151005094932.GA5236@kroah.com> <56124EDB.3070701@scylladb.com>
 <20151006143821.GA11541@redhat.com> <5613DE26.1090202@cloudius-systems.com>
 <20151006174648-mutt-send-email-mst@redhat.com>
 <5613E75E.1040002@scylladb.com> <1444157480.4059.67.camel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
        Vlad Zolotarov <vladz@cloudius-systems.com>,
        Greg KH <gregkh@linuxfoundation.org>, linux-kernel@vger.kernel.org,
        hjk@hansjkoch.de, corbet@lwn.net, bruce.richardson@intel.com,
        avi@cloudius-systems.com, gleb@cloudius-systems.com,
        stephen@networkplumber.org, alexander.duyck@gmail.com
From: Avi Kivity <avi@scylladb.com>
Message-ID: <5614C11B.6090601@scylladb.com>
Date: Wed, 7 Oct 2015 09:52:11 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <1444157480.4059.67.camel@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


On 10/06/2015 09:51 PM, Alex Williamson wrote:
> On Tue, 2015-10-06 at 18:23 +0300, Avi Kivity wrote:
>> On 10/06/2015 05:56 PM, Michael S. Tsirkin wrote:
>>> On Tue, Oct 06, 2015 at 05:43:50PM +0300, Vlad Zolotarov wrote:
>>>> The only "like VFIO" behavior we implement here is binding the MSI-X
>>>> interrupt notification to eventfd descriptor.
>>> There will be more if you add some basic memory protections.
>>>
>>> Besides, that's not true.
>>> Your patch queries MSI capability, sets # of vectors.
>>> You even hinted you want to add BAR mapping down the road.
>> BAR mapping is already available from sysfs; it is not mandatory.
>>
>>> VFIO does all of that.
>>>
>> Copying vfio maintainer Alex (hi!).
>>
>> vfio's charter is modern iommu-capable configurations. It is designed to
>> be secure enough to be usable by an unprivileged user.
>>
>> For performance and hardware reasons, many dpdk deployments use
>> uio_pci_generic.  They are willing to trade off the security provided by
>> vfio for the performance and deployment flexibility of uio_pci_generic.
>> Forcing these features into vfio will compromise its security and
>> needlessly complicate its code (I guess it can be done with a "null"
>> iommu, but then vfio will have to decide whether it is secure or not).
> It's not just the iommu model vfio uses, it's that vfio is built around
> iommu groups.  For instance to use a device in vfio, the user opens the
> vfio group file and asks for the device within that group.  That's a
> fairly fundamental part of the mechanics to sidestep.
>
> However, is there an opportunity at a lower level?  Systems without an
> iommu typically have dma ops handled via a software iotlb (ie. bounce
> buffers), but I think they simply don't have iommu ops registered.
> Could a no-iommu, iommu subsystem provide enough dummy iommu ops to fake
> out vfio?  It would need to iterate the devices on the bus and come up
> with dummy iommu groups and dummy versions of iommu_map and unmap.  The
> grouping is easy, one device per group, there's no isolation anyway.
> The vfio type1 iommu backend will do pinning, which seems like an
> improvement over the mlock that uio users probably try to do now.

Right now, people use hugetlbfs maps, which both lock the memory and 
provide better performance.
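
For illustration, a minimal sketch of a huge-page-backed buffer.  It 
assumes an anonymous MAP_HUGETLB mapping rather than a file on a 
hugetlbfs mount, which is a related but not identical mechanism, and the 
helper names are made up for this example.  Huge pages are not swapped, 
so the buffer stays resident without a separate mlock.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes backed by huge pages.
 * `len` should be a multiple of the huge page size.
 * Returns NULL if the mapping fails, e.g. when no huge pages
 * have been reserved on the system.
 */
static void *alloc_hugepage_buffer(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: release a buffer from alloc_hugepage_buffer(). */
static void free_hugepage_buffer(void *p, size_t len)
{
    if (p)
        munmap(p, len);
}
```

A caller would request something like 2 MiB (the usual x86-64 huge page 
size, assumed here) and fall back to ordinary pages when NULL comes back.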

>    I
> guess the no-iommu map would error if the IOVA isn't simply the bus
> address of the page mapped.
>
> Of course this is entirely unsafe and this no-iommu driver should taint
> the kernel, but it at least standardizes on one userspace API and you're
> already doing completely unsafe things with uio.  vfio should be
> enlightened at least to the point that it allows only privileged users
> access to devices under such a (lack of) iommu.

There is an additional complication.  With an iommu, userspace programs 
the device with virtual addresses, but without one it has to program 
physical addresses.  So vfio would need to communicate this bit of 
information.

We can go further and define a better translation API than the current 
one (reading /proc/self/pagemap).  But it's going to be a bigger change 
to vfio than I thought at first.
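
To make the current translation path concrete, here is a rough sketch of 
a pagemap lookup.  The function names are made up for this example; the 
entry layout (bit 63 = page present, bits 0-54 = PFN) is the one 
documented for pagemap.  Recent kernels zero the PFN for callers without 
CAP_SYS_ADMIN, so the sketch treats a zero PFN as a failure.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGEMAP_PRESENT   (1ULL << 63)          /* page is in RAM */
#define PAGEMAP_PFN_MASK  ((1ULL << 55) - 1)    /* bits 0-54: PFN */

/*
 * Hypothetical helper: decode one 64-bit pagemap entry.
 * Returns 0 and stores the physical address in *phys, or -1 if the
 * page is not present or the kernel hid the PFN (reported as zero).
 */
static int pagemap_entry_to_phys(uint64_t entry, uint64_t page_size,
                                 uint64_t offset_in_page, uint64_t *phys)
{
    uint64_t pfn;

    if (!(entry & PAGEMAP_PRESENT))
        return -1;
    pfn = entry & PAGEMAP_PFN_MASK;
    if (pfn == 0)
        return -1;
    *phys = pfn * page_size + offset_in_page;
    return 0;
}

/*
 * Hypothetical helper: translate a virtual address of the calling
 * process to a physical address.  The page must already be faulted in
 * and must stay locked for the result to remain valid.
 */
static int virt_to_phys(const void *vaddr, uint64_t *phys)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t va = (uint64_t)(uintptr_t)vaddr;
    uint64_t entry;
    ssize_t n;
    int fd;

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    /* One 8-byte entry per virtual page, indexed by page number. */
    n = pread(fd, &entry, sizeof(entry),
              (off_t)((va / page_size) * sizeof(entry)));
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return -1;
    return pagemap_entry_to_phys(entry, page_size, va % page_size, phys);
}
```

Every buffer handed to the device needs a lookup like this, and nothing 
ties the result to the lifetime of the mapping, which is part of why a 
proper translation API would be an improvement.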