From: "Liang, Cunming" <cunming.liang@intel.com>
To: "Bie, Tiwei" <tiwei.bie@intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"ddutile@redhat.com" <ddutile@redhat.com>,
	"Duyck, Alexander H" <alexander.h.duyck@intel.com>,
	"virtio-dev@lists.oasis-open.org"
	<virtio-dev@lists.oasis-open.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"virtualization@lists.linux-foundation.org"
	<virtualization@lists.linux-foundation.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"Daly, Dan" <dan.daly@intel.com>,
	"Wang, Zhihong" <zhihong.wang@intel.com>,
	"Tan, Jianfeng" <jianfeng.tan@intel.com>,
	"Wang, Xiao W" <xiao.w.wang@intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>
Subject: RE: [RFC] vhost: introduce mdev based hardware vhost backend
Date: Fri, 20 Apr 2018 03:50:41 +0000
Message-ID: <D0158A423229094DA7ABF71CF2FA0DA34E9511D5@SHSMSX104.ccr.corp.intel.com>
In-Reply-To: <20180420032806.i3jy7xb7emgil6eu@debian>



> -----Original Message-----
> From: Bie, Tiwei
> Sent: Friday, April 20, 2018 11:28 AM
> To: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>; alex.williamson@redhat.com;
> ddutile@redhat.com; Duyck, Alexander H <alexander.h.duyck@intel.com>;
> virtio-dev@lists.oasis-open.org; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org; virtualization@lists.linux-foundation.org;
> netdev@vger.kernel.org; Daly, Dan <dan.daly@intel.com>; Liang, Cunming
> <cunming.liang@intel.com>; Wang, Zhihong <zhihong.wang@intel.com>; Tan,
> Jianfeng <jianfeng.tan@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>;
> Tian, Kevin <kevin.tian@intel.com>
> Subject: Re: [RFC] vhost: introduce mdev based hardware vhost backend
> 
> On Thu, Apr 19, 2018 at 09:40:23PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 10, 2018 at 03:25:45PM +0800, Jason Wang wrote:
> > > > > > One problem is that different virtio-ring-compatible devices
> > > > > > may have different device interfaces. That is to say, we will
> > > > > > need different drivers in QEMU, which could be troublesome, and
> > > > > > that's what this patch is trying to fix. The idea behind this
> > > > > > patch is very simple: mdev is a standard way to emulate devices
> > > > > > in the kernel.
> > > > > So you just move the abstraction layer from qemu to the kernel,
> > > > > and you still need different drivers in the kernel for different
> > > > > device interfaces of accelerators. This looks even more complex
> > > > > than leaving it in qemu. As you said, another idea is to implement
> > > > > a userspace vhost backend for accelerators, which seems easier and
> > > > > could work together with other parts of qemu without inventing new
> > > > > types of messages.
> > > > I'm not quite sure. Do you think it's acceptable to add various
> > > > vendor-specific hardware drivers in QEMU?
> > > >
> > >
> > > I don't object but we need to figure out the advantages of doing it
> > > in qemu too.
> > >
> > > Thanks
> >
> > To be frank, the kernel is exactly where device drivers belong.  DPDK
> > did move them to userspace, but that's merely a requirement for the
> > data path.  *If* you can have them in the kernel, that is best:
> > - update the kernel and there's no need to rebuild userspace
> > - apps can be written in any language; no need to maintain multiple
> >   libraries or add wrappers
> > - security concerns are much smaller (OK, people are trying to
> >   raise the bar with IOMMUs and such, but it's already pretty
> >   good even without them)
> >
> > The biggest issue is that you let userspace poke at the device, which
> > is also allowed by the IOMMU to poke at kernel memory (needed for the
> > kernel driver to work).
> 
> I think the device won't and shouldn't be allowed to poke at kernel memory.
> Its kernel driver needs some kernel memory to work, but the device doesn't
> have access to that memory. Instead, the device only has access to:
> 
> (1) the entire memory of the VM (if a vIOMMU isn't used), or
> (2) the memory that belongs to the guest virtio device (if
>     a vIOMMU is being used).
> 
> Here is the reason:
> 
> For the first case, we program the IOMMU for the hardware device based on
> the info in the memory table, which covers the entire memory of the VM.
> 
> For the second case, we program the IOMMU for the hardware device based on
> the info in the shadow page table of the vIOMMU.
> 
> So the memory that can be accessed by the device is limited, and it should
> be safe, especially in the second case.
> 
> My concern is that, in this RFC, we don't program the IOMMU for the mdev
> device from userspace via the VFIO API directly. Instead, we pass the
> memory table to the kernel driver via the mdev device (BAR0) and ask the
> driver to do the IOMMU programming. Some people may not like that. The main
> reason we don't program the IOMMU via the VFIO API in userspace directly is
> that IOMMU drivers currently don't support the mdev bus.
> 
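To make the first case concrete -- this is only a rough userspace sketch,
not code from this RFC, and the mem_region layout below is a simplified
stand-in rather than the actual uapi -- "programming the IOMMU via the
VFIO API directly" would roughly mean walking the memory table and mapping
each region with the standard VFIO type1 ioctl:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Simplified stand-in for one entry of the vhost memory table. */
    struct mem_region {
            uint64_t guest_phys_addr;   /* IOVA the device will use */
            uint64_t userspace_addr;    /* HVA backing this region */
            uint64_t memory_size;
    };

    /* Map every region of the table into the container's IOMMU domain. */
    static int map_memory_table(int container_fd,
                                const struct mem_region *regions, int n)
    {
            int i;

            for (i = 0; i < n; i++) {
                    struct vfio_iommu_type1_dma_map map = {
                            .argsz = sizeof(map),
                            .flags = VFIO_DMA_MAP_FLAG_READ |
                                     VFIO_DMA_MAP_FLAG_WRITE,
                            .vaddr = regions[i].userspace_addr,
                            .iova  = regions[i].guest_phys_addr,
                            .size  = regions[i].memory_size,
                    };

                    if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0)
                            return -1;
            }
            return 0;
    }

The second case would be the same mapping operation, just driven by the
shadow page table updates of the vIOMMU instead of the static memory table.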
> >
> > Yes, maybe if the device is not buggy it's all fine, but it's better if
> > we do not have to trust the device; otherwise the security picture
> > becomes murkier.
> >
> > I suggested attaching a PASID to (some) queues - see my old post
> > "using PASIDs to enable a safe variant of direct ring access".
> 
Ideally, we could have the device bound to its normal driver in the host,
while still being able to allocate a few queues on demand with a PASID
attached to them. Through the vhost mdev transport channel, the data path
capability of those queues (acting as a device) can then be exposed to the
QEMU vhost adaptor as a vDPA instance. That way we avoid the VF number
limitation and can provide vhost data path acceleration at a finer
granularity.
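As a rough illustration of that direction -- hypothetical names only, not
taken from an existing driver or from this RFC -- the parent device would
keep its normal host driver binding and hand out small queue groups, each
with its own PASID, and each group would be what gets exposed as one
vhost-mdev (vDPA) instance:

    #include <linux/types.h>    /* kernel-style sketch, u16/u32 */

    /* Hypothetical: one on-demand queue group carved out of the parent
     * device.  The group, not a whole PF/VF, is the unit exposed to
     * QEMU's vhost adaptor as a vDPA instance. */
    struct vdpa_queue_group {
            u16 first_qid;      /* first hardware queue in the group */
            u16 nr_queues;      /* number of queues allocated on demand */
            u32 pasid;          /* PASID attached to these queues */
            void *vhost_mdev;   /* the vhost-mdev instance shown to QEMU */
    };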

> It's pretty cool. We also have some similar ideas.
> Cunming will talk more about this.
> 
> Best regards,
> Tiwei Bie
> 
> >
> > Then use the IOMMU with VFIO to limit access through the queue to the
> > correct ranges of memory.
> >
> >
> > --
> > MST
