linux-pci.vger.kernel.org archive mirror
* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
       [not found] <20190116163253.23780-1-vincent.whitchurch@axis.com>
@ 2019-01-16 17:07 ` Arnd Bergmann
  2019-01-17 10:54   ` Vincent Whitchurch
  2019-01-18 23:49 ` Stephen Warren
  1 sibling, 1 reply; 17+ messages in thread
From: Arnd Bergmann @ 2019-01-16 17:07 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: sudeep.dutt, ashutosh.dixit, gregkh, Linux Kernel Mailing List,
	Vincent Whitchurch, Kishon Vijay Abraham I, Lorenzo Pieralisi,
	linux-pci, linux-ntb, Jon Mason, Dave Jiang, Allen Hubbe

On Wed, Jan 16, 2019 at 5:33 PM Vincent Whitchurch
<vincent.whitchurch@axis.com> wrote:
>
> The Virtio-over-PCIe framework living under drivers/misc/mic/vop implements a
> generic framework to use virtio between two Linux systems, given shared memory
> and a couple of interrupts.  It does not actually require the Intel MIC
> hardware, x86-64, or even PCIe for that matter.  This patch series makes it
> buildable on more systems and adds a loopback driver to test it without special
> hardware.
>
> Note that I don't have access to Intel MIC hardware so some testing of the
> patchset (especially the patch "vop: Use consistent DMA") on that platform
> would be appreciated, to ensure that the series does not break anything there.

Hi Vincent,

First of all, I think it is a very good idea to make virtio over PCIe available
more generally. Your patches also make sense here; they mostly fix
portability bugs, so no objection there.

I think we need to take a step back though and discuss what combinations
we actually do want to support. I have not actually read the whole mic/vop
driver, so I don't know if this would be a good fit as a generic interface --
it may or may not be, and any other input would be helpful.

Aside from that, I should note that we have two related subsystems
in the kernel: the PCIe endpoint subsystem maintained by Kishon and
Lorenzo, and the NTB subsystem maintained by Jon, Dave and Allen.

In order to properly support virtio over PCIe, I would hope we can come
up with a user space interface that looks the same way for configuring
virtio drivers in mic, pcie-endpoint and ntb, if at all possible. Have
you looked at those two subsystems?

        Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-16 17:07 ` [PATCH 0/8] Virtio-over-PCIe on non-MIC Arnd Bergmann
@ 2019-01-17 10:54   ` Vincent Whitchurch
  2019-01-17 12:39     ` Arnd Bergmann
  0 siblings, 1 reply; 17+ messages in thread
From: Vincent Whitchurch @ 2019-01-17 10:54 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: sudeep.dutt, ashutosh.dixit, gregkh, Linux Kernel Mailing List,
	Kishon Vijay Abraham I, Lorenzo Pieralisi, linux-pci, linux-ntb,
	Jon Mason, Dave Jiang, Allen Hubbe

On Wed, Jan 16, 2019 at 06:07:53PM +0100, Arnd Bergmann wrote:
> On Wed, Jan 16, 2019 at 5:33 PM Vincent Whitchurch <vincent.whitchurch@axis.com> wrote:
> > The Virtio-over-PCIe framework living under drivers/misc/mic/vop implements a
> > generic framework to use virtio between two Linux systems, given shared memory
> > and a couple of interrupts.  It does not actually require the Intel MIC
> > hardware, x86-64, or even PCIe for that matter.  This patch series makes it
> > buildable on more systems and adds a loopback driver to test it without special
> > hardware.
> >
> > Note that I don't have access to Intel MIC hardware so some testing of the
> > patchset (especially the patch "vop: Use consistent DMA") on that platform
> > would be appreciated, to ensure that the series does not break anything there.
> 
> I think we need to take a step back though and discuss what combinations
> we actually do want to support. I have not actually read the whole mic/vop
> driver, so I don't know if this would be a good fit as a generic interface --
> it may or may not be, and any other input would be helpful.

The MIC driver as a whole is uninteresting as a generic interface since
it is quite tied to the Intel hardware.  The VOP parts though are
logically separated and have no relation to that hardware, even if the
ioctls are called MIC_VIRTIO_*.

The samples/mic/mpssd/mpssd.c code handles both the boot of the MIC
(sysfs) and the VOP parts (ioctls).

> Aside from that, I should note that we have two related subsystems
> in the kernel: the PCIe endpoint subsystem maintained by Kishon and
> Lorenzo, and the NTB subsystem maintained by Jon, Dave and Allen.
> 
> In order to properly support virtio over PCIe, I would hope we can come
> up with a user space interface that looks the same way for configuring
> virtio drivers in mic, pcie-endpoint and ntb, if at all possible. Have
> you looked at those two subsystems?

pcie-endpoint is a generic framework that allows Linux to act as an
endpoint and set up the BARs, etc.  mic appears to have Intel
MIC-specific code for this (pre-dating pcie-endpoint) but this is
separate from the vop code.  pcie-endpoint and vop do not have
overlapping functionality and can be used together.
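For reference, pcie-endpoint functions are configured from userspace via
configfs (see Documentation/PCI/endpoint/ in the kernel tree); a hypothetical
vop endpoint function could presumably be bound the same way. The
"pci_epf_vop" function name, the IDs, and the controller name below are made
up for illustration, modelled on the pci_epf_test example from the docs:

```shell
# Create an instance of a (hypothetical) vop endpoint function
mkdir /sys/kernel/config/pci_ep/functions/pci_epf_vop/func1

# Set the vendor/device IDs the endpoint will present to the host
echo 0x104c > /sys/kernel/config/pci_ep/functions/pci_epf_vop/func1/vendorid
echo 0xb500 > /sys/kernel/config/pci_ep/functions/pci_epf_vop/func1/deviceid

# Bind the function to an endpoint controller and start the link
ln -s /sys/kernel/config/pci_ep/functions/pci_epf_vop/func1 \
      /sys/kernel/config/pci_ep/controllers/51000000.pcie_ep/
echo 1 > /sys/kernel/config/pci_ep/controllers/51000000.pcie_ep/start
```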

I'm not familiar with NTB, but from a quick look it seems to be tied to
special hardware, and I don't see any virtio-related code there.  A vop
backend built on top of NTB would presumably work to allow virtio
functionality there.

* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 10:54   ` Vincent Whitchurch
@ 2019-01-17 12:39     ` Arnd Bergmann
  2019-01-17 15:15       ` Christoph Hellwig
  2019-01-17 15:19       ` Vincent Whitchurch
  0 siblings, 2 replies; 17+ messages in thread
From: Arnd Bergmann @ 2019-01-17 12:39 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: sudeep.dutt, ashutosh.dixit, gregkh, Linux Kernel Mailing List,
	Kishon Vijay Abraham I, Lorenzo Pieralisi, linux-pci, linux-ntb,
	Jon Mason, Dave Jiang, Allen Hubbe

On Thu, Jan 17, 2019 at 11:54 AM Vincent Whitchurch
<vincent.whitchurch@axis.com> wrote:
>
> On Wed, Jan 16, 2019 at 06:07:53PM +0100, Arnd Bergmann wrote:
> > On Wed, Jan 16, 2019 at 5:33 PM Vincent Whitchurch <vincent.whitchurch@axis.com> wrote:
> > > The Virtio-over-PCIe framework living under drivers/misc/mic/vop implements a
> > > generic framework to use virtio between two Linux systems, given shared memory
> > > and a couple of interrupts.  It does not actually require the Intel MIC
> > > hardware, x86-64, or even PCIe for that matter.  This patch series makes it
> > > buildable on more systems and adds a loopback driver to test it without special
> > > hardware.
> > >
> > > Note that I don't have access to Intel MIC hardware so some testing of the
> > > patchset (especially the patch "vop: Use consistent DMA") on that platform
> > > would be appreciated, to ensure that the series does not break anything there.
> >
> > I think we need to take a step back though and discuss what combinations
> > we actually do want to support. I have not actually read the whole mic/vop
> > driver, so I don't know if this would be a good fit as a generic interface --
> > it may or may not be, and any other input would be helpful.
>
> The MIC driver as a whole is uninteresting as a generic interface since
> it is quite tied to the Intel hardware.  The VOP parts though are
> logically separated and have no relation to that hardware, even if the
> ioctls are called MIC_VIRTIO_*.
>
> The samples/mic/mpssd/mpssd.c code handles both the boot of the MIC
> (sysfs) and the VOP parts (ioctls).

Right, I wasn't talking about the MIC driver here, just the VOP
stuff. Since that comes with an ioctl interface that you want to keep
using on other hardware, this still means we have to review if it is
a good fit as a general-purpose API.

> > Aside from that, I should note that we have two related subsystems
> > in the kernel: the PCIe endpoint subsystem maintained by Kishon and
> > Lorenzo, and the NTB subsystem maintained by Jon, Dave and Allen.
> >
> > In order to properly support virtio over PCIe, I would hope we can come
> > up with a user space interface that looks the same way for configuring
> > virtio drivers in mic, pcie-endpoint and ntb, if at all possible. Have
> > you looked at those two subsystems?
>
> pcie-endpoint is a generic framework that allows Linux to act as an
> endpoint and set up the BARs, etc.  mic appears to have Intel
> MIC-specific code for this (pre-dating pcie-endpoint) but this is
> separate from the vop code.  pcie-endpoint and vop do not have
> overlapping functionality and can be used together.

What we need to find out though is whether the combination of vop
with pcie-endpoint provides a good abstraction for what users
actually need when they want to use e.g. a virtio-net connection on top
of PCIe endpoint hardware.

> I'm not familiar with NTB, but from a quick look it seems to be tied to
> special hardware, and I don't see any virtio-related code there.  A vop
> backend for NTB-backend would presumably work to allow virtio
> functionality there.

Correct, and again we have to see if this is a good interface. The NTB
and PCIe-endpoint interfaces have a number of differences and a
number of similarities. In particular they should both be usable with
virtio-style drivers, but the underlying hardware differs mainly in how
it is probed by the system: an NTB is seen as a PCI device attached
to two host bridges, while an endpoint is typically a platform_device
on one side, but a pci_dev on the other side.

Can you describe how you expect a VOP device over NTB or
PCIe-endpoint would get created, configured and used?
Is there always one master side that is responsible for creating
virtio devices on it, with the slave side automatically attaching to
them, or can either side create virtio devices? Is there any limit on
the number of virtio devices or queues within a VOP device?

       Arnd

* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 12:39     ` Arnd Bergmann
@ 2019-01-17 15:15       ` Christoph Hellwig
  2019-01-17 15:19         ` Christoph Hellwig
  2019-01-17 15:19       ` Vincent Whitchurch
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2019-01-17 15:15 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Vincent Whitchurch, sudeep.dutt, ashutosh.dixit, gregkh,
	Linux Kernel Mailing List, Kishon Vijay Abraham I,
	Lorenzo Pieralisi, linux-pci, linux-ntb, Jon Mason, Dave Jiang,
	Allen Hubbe

On Thu, Jan 17, 2019 at 01:39:27PM +0100, Arnd Bergmann wrote:
> Can you describe how you expect a VOP device over NTB or
> PCIe-endpoint would get created, configured and used?
> Is there always one master side that is responsible for creating
> virtio devices on it, with the slave side automatically attaching to
> them, or can either side create virtio devices? Is there any limit on
> the number of virtio devices or queues within a VOP device?

For a VOP device over NTB you configure your device using configfs
on one side, and for the other side it will just show up like any
other PCIe device, because it is.

* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 12:39     ` Arnd Bergmann
  2019-01-17 15:15       ` Christoph Hellwig
@ 2019-01-17 15:19       ` Vincent Whitchurch
  2019-01-17 15:21         ` Christoph Hellwig
                           ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Vincent Whitchurch @ 2019-01-17 15:19 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: sudeep.dutt, ashutosh.dixit, gregkh, Linux Kernel Mailing List,
	Kishon Vijay Abraham I, Lorenzo Pieralisi, linux-pci, linux-ntb,
	Jon Mason, Dave Jiang, Allen Hubbe

On Thu, Jan 17, 2019 at 01:39:27PM +0100, Arnd Bergmann wrote:
> Correct, and again we have to see if this is a good interface. The NTB
> and PCIe-endpoint interfaces have a number of differences and a
> number of similarities. In particular they should both be usable with
> virtio-style drivers, but the underlying hardware differs mainly in how
> it is probed by the system: an NTB is seen as a PCI device attached
> to two host bridges, while an endpoint is typically a platform_device
> on one side, but a pci_dev on the other side.
> 
> Can you describe how you expect a VOP device over NTB or
> PCIe-endpoint would get created, configured and used?

Assuming PCIe-endpoint:

On the RC, a vop-host-backend driver (PCI driver) sets up some shared
memory area which the RC and the endpoint can use to communicate the
location of the MIC device descriptors and other information such as the
MSI address.  It implements vop callbacks to allow the vop framework to
obtain the address of the MIC descriptors and send/receive interrupts
to/from the guest.

On the endpoint, the PCIe endpoint driver sets up (hardcoded) BARs and
memory regions as required to allow the endpoint and the root complex to
access each other's memory.

On the endpoint, the vop-guest-backend, via the shared memory set up by
the vop-host-backend, obtains the address of the MIC device page and the
MSI address, and a method to receive vop interrupts from the host.  This
information is used to implement the vop callbacks allowing the vop
framework to access the MIC device page and send/receive interrupts
from/to the host.

vop (despite its name) doesn't care about PCIe.  The vop-guest-backend
doesn't actually need to talk to the PCIe endpoint driver.  The
vop-guest-backend can be probed via any means, such as via a device tree
on the endpoint.

On the RC, userspace opens the vop device and adds the virtio devices,
which end up in the MIC device page set up by the vop-host-backend.

On the endpoint, when the vop framework (via the vop-guest-backend) sees
these devices, it registers devices on the virtio bus and the virtio
drivers are probed.

On the RC, userspace implements the device end of the virtio
communication in userspace, using the MIC_VIRTIO_COPY_DESC ioctl.  I
also have patches to support vhost.

> Is there always one master side that is responsible for creating
> virtio devices on it, with the slave side automatically attaching to
> them, or can either side create virtio devices?

Only the master can create virtio devices.  The virtio drivers run on
the slave.

> Is there any limit on
> the number of virtio devices or queues within a VOP device?

The virtio device information (mic_device_desc) is put into the MIC
device page whose size is limited by the ABI header in
include/uapi/linux/mic_ioctl.h (MIC_DP_SIZE, 4096 bytes).  So the number
of devices is limited by the limit of the number of device descriptors
that can fit in that size.  There is also a per-device limit on the
number of vrings (MIC_MAX_VRINGS) and vring entries
(MIC_VRING_ENTRIES) in the ABI header.

* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 15:15       ` Christoph Hellwig
@ 2019-01-17 15:19         ` Christoph Hellwig
  2019-01-17 15:31           ` Arnd Bergmann
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2019-01-17 15:19 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Vincent Whitchurch, sudeep.dutt, ashutosh.dixit, gregkh,
	Linux Kernel Mailing List, Kishon Vijay Abraham I,
	Lorenzo Pieralisi, linux-pci, linux-ntb, Jon Mason, Dave Jiang,
	Allen Hubbe

On Thu, Jan 17, 2019 at 07:15:29AM -0800, Christoph Hellwig wrote:
> On Thu, Jan 17, 2019 at 01:39:27PM +0100, Arnd Bergmann wrote:
> > Can you describe how you expect a VOP device over NTB or
> > PCIe-endpoint would get created, configured and used?
> > Is there always one master side that is responsible for creating
> > virtio devices on it, with the slave side automatically attaching to
> > them, or can either side create virtio devices? Is there any limit on
> > the number of virtio devices or queues within a VOP device?
> 
> For a VOP device over NTB you configure your device using configfs
> on one side, and for the other side it will just show up like any
> other PCIe device, because it is.

Sorry, I meant over the PCI-EP infrastructure, of course.  NTB actually
is rather hairy and complicated.

* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 15:19       ` Vincent Whitchurch
@ 2019-01-17 15:21         ` Christoph Hellwig
  2019-01-17 15:32           ` Vincent Whitchurch
  2019-01-17 15:53         ` Arnd Bergmann
  2019-01-17 22:17         ` Logan Gunthorpe
  2 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2019-01-17 15:21 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: Arnd Bergmann, sudeep.dutt, ashutosh.dixit, gregkh,
	Linux Kernel Mailing List, Kishon Vijay Abraham I,
	Lorenzo Pieralisi, linux-pci, linux-ntb, Jon Mason, Dave Jiang,
	Allen Hubbe

On Thu, Jan 17, 2019 at 04:19:06PM +0100, Vincent Whitchurch wrote:
> On the RC, a vop-host-backend driver (PCI driver) sets up some shared
> memory area which the RC and the endpoint can use to communicate the
> location of the MIC device descriptors and other information such as the
> MSI address.  It implements vop callbacks to allow the vop framework to
> obtain the address of the MIC descriptors and send/receive interrupts
> to/from the guest.

Why would we require any work on the RC / host side?  A properly
set up software-controlled virtio device should just show up as a
normal PCIe device, and the virtio-pci device should bind to it.

* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 15:19         ` Christoph Hellwig
@ 2019-01-17 15:31           ` Arnd Bergmann
  0 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2019-01-17 15:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vincent Whitchurch, sudeep.dutt, ashutosh.dixit, gregkh,
	Linux Kernel Mailing List, Kishon Vijay Abraham I,
	Lorenzo Pieralisi, linux-pci, linux-ntb, Jon Mason, Dave Jiang,
	Allen Hubbe

On Thu, Jan 17, 2019 at 4:19 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Jan 17, 2019 at 07:15:29AM -0800, Christoph Hellwig wrote:
> > On Thu, Jan 17, 2019 at 01:39:27PM +0100, Arnd Bergmann wrote:
> > > Can you describe how you expect a VOP device over NTB or
> > > PCIe-endpoint would get created, configured and used?
> > > Is there always one master side that is responsible for creating
> > > virtio devices on it, with the slave side automatically attaching to
> > > them, or can either side create virtio devices? Is there any limit on
> > > the number of virtio devices or queues within a VOP device?
> >
> > For a VOP device over NTB you configure your device using configfs
> > on one side, and for the other side it will just show up like any
> > other PCIe device, because it is.
>
> Sorry, I meant over the PCI-EP infrastructure, of course.  NTB actually
> is rather hairy and complicated.

My understanding was that with virtio, we would be able to have multiple
virtio devices on a single PCI-EP port, so you need a multi-step
configuration: You first set up the PCI-EP to instantiate a VOP device,
which is then seen on both ends of the connection. The question
is how to create a particular virtio device instance (or a set of those)
inside of it.

      Arnd

* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 15:21         ` Christoph Hellwig
@ 2019-01-17 15:32           ` Vincent Whitchurch
  2019-01-17 15:46             ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Vincent Whitchurch @ 2019-01-17 15:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Arnd Bergmann, sudeep.dutt, ashutosh.dixit, gregkh,
	Linux Kernel Mailing List, Kishon Vijay Abraham I,
	Lorenzo Pieralisi, linux-pci, linux-ntb, Jon Mason, Dave Jiang,
	Allen Hubbe

On Thu, Jan 17, 2019 at 07:21:42AM -0800, Christoph Hellwig wrote:
> On Thu, Jan 17, 2019 at 04:19:06PM +0100, Vincent Whitchurch wrote:
> > On the RC, a vop-host-backend driver (PCI driver) sets up some shared
> > memory area which the RC and the endpoint can use to communicate the
> > location of the MIC device descriptors and other information such as the
> > MSI address.  It implements vop callbacks to allow the vop framework to
> > obtain the address of the MIC descriptors and send/receive interrupts
> > to/from the guest.
> 
> Why would we require any work on the RC / host side?  A properly
> set up software-controlled virtio device should just show up as a
> normal PCIe device, and the virtio-pci device should bind to it.

If I understand you correctly, I think you're talking about the RC
running the virtio drivers and the endpoint implementing the virtio
device?  This vop stuff is used for the other way around: the virtio
device is implemented on the RC and the endpoint runs the virtio drivers.

* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 15:32           ` Vincent Whitchurch
@ 2019-01-17 15:46             ` Christoph Hellwig
  2019-01-17 16:18               ` Arnd Bergmann
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2019-01-17 15:46 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: Christoph Hellwig, Arnd Bergmann, sudeep.dutt, ashutosh.dixit,
	gregkh, Linux Kernel Mailing List, Kishon Vijay Abraham I,
	Lorenzo Pieralisi, linux-pci, linux-ntb, Jon Mason, Dave Jiang,
	Allen Hubbe

On Thu, Jan 17, 2019 at 04:32:06PM +0100, Vincent Whitchurch wrote:
> If I understand you correctly, I think you're talking about the RC
> running the virtio drivers and the endpoint implementing the virtio
> device?  This vop stuff is used for the other way around: the virtio
> device is implemented on the RC and the endpoint runs the virtio drivers.

Oh.  That is really weird and not the way I'd implement it...

* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 15:19       ` Vincent Whitchurch
  2019-01-17 15:21         ` Christoph Hellwig
@ 2019-01-17 15:53         ` Arnd Bergmann
  2019-01-17 16:26           ` Vincent Whitchurch
  2019-01-17 22:17         ` Logan Gunthorpe
  2 siblings, 1 reply; 17+ messages in thread
From: Arnd Bergmann @ 2019-01-17 15:53 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: sudeep.dutt, ashutosh.dixit, gregkh, Linux Kernel Mailing List,
	Kishon Vijay Abraham I, Lorenzo Pieralisi, linux-pci, linux-ntb,
	Jon Mason, Dave Jiang, Allen Hubbe

On Thu, Jan 17, 2019 at 4:19 PM Vincent Whitchurch
<vincent.whitchurch@axis.com> wrote:
>
> On Thu, Jan 17, 2019 at 01:39:27PM +0100, Arnd Bergmann wrote:
> > Correct, and again we have to see if this is a good interface. The NTB
> > and PCIe-endpoint interfaces have a number of differences and a
> > number of similarities. In particular they should both be usable with
> > virtio-style drivers, but the underlying hardware differs mainly in how
> > it is probed by the system: an NTB is seen as a PCI device attached
> > to two host bridges, while an endpoint is typically a platform_device
> > on one side, but a pci_dev on the other side.
> >
> > Can you describe how you expect a VOP device over NTB or
> > PCIe-endpoint would get created, configured and used?
>
> Assuming PCIe-endpoint:
>
> On the RC, a vop-host-backend driver (PCI driver) sets up some shared
> memory area which the RC and the endpoint can use to communicate the
> location of the MIC device descriptors and other information such as the
> MSI address.  It implements vop callbacks to allow the vop framework to
> obtain the address of the MIC descriptors and send/receive interrupts
> to/from the guest.
>
> On the endpoint, the PCIe endpoint driver sets up (hardcoded) BARs and
> memory regions as required to allow the endpoint and the root complex to
> access each other's memory.
>
> On the endpoint, the vop-guest-backend, via the shared memory set up by
> the vop-host-backend, obtains the address of the MIC device page and the
> MSI address, and a method to receive vop interrupts from the host.  This
> information is used to implement the vop callbacks allowing the vop
> framework to access the MIC device page and send/receive interrupts
> from/to the host.

Ok, this seems fine so far. So the vop-host-backend is a regular PCI
driver that implements the VOP protocol from the host side, and it
can talk to either a MIC, or another guest-backend written for the PCI-EP
framework to implement the same protocol, right?

> vop (despite its name) doesn't care about PCIe.  The vop-guest-backend
> doesn't actually need to talk to the PCIe endpoint driver.  The
> vop-guest-backend can be probed via any means, such as via a device tree
> on the endpoint.
>
> On the RC, userspace opens the vop device and adds the virtio devices,
> which end up in the MIC device page set up by the vop-host-backend.
>
> On the endpoint, when the vop framework (via the vop-guest-backend) sees
> these devices, it registers devices on the virtio bus and the virtio
> drivers are probed.

Ah, so the direction is fixed, and it's the opposite of what Christoph
and I were expecting. This is probably something we need to discuss
a bit. From what I understand, there is no technical requirement why
it has to be this direction, right?

What I mean is that the same vop framework could work with
a PCI-EP driver implementing the vop-host-backend and
a PCI driver implementing the vop-guest-backend? In order
to do this, the PCI-EP configuration would need to pick whether
it wants the EP to be the vop host or guest, but having more
flexibility in it (letting each side add virtio devices) would be
harder to do.

> On the RC, userspace implements the device end of the virtio
> communication in userspace, using the MIC_VIRTIO_COPY_DESC ioctl.  I
> also have patches to support vhost.

This is a part I don't understand yet. Does this mean that the
normal operation is between a user space process on the vop-host
talking to the kernel on the vop-guest?

I'm a bit worried about the ioctl interface here, as this combines the
configuration side with the actual data transfer, and that seems
a bit inflexible.

> > Is there always one master side that is responsible for creating
> > virtio devices on it, with the slave side automatically attaching to
> > them, or can either side create virtio devices?
>
> Only the master can create virtio devices.  The virtio drivers run on
> the slave.

Ok.

> > Is there any limit on
> > the number of virtio devices or queues within a VOP device?
>
> The virtio device information (mic_device_desc) is put into the MIC
> device page whose size is limited by the ABI header in
> include/uapi/linux/mic_ioctl.h (MIC_DP_SIZE, 4096 bytes).  So the number
> of devices is limited by the limit of the number of device descriptors
> that can fit in that size.  There is also a per-device limit on the
> number of vrings (MIC_MAX_VRINGS) and vring entries
> (MIC_VRING_ENTRIES) in the ABI header.

Ok, so you can have multiple virtio devices (e.g. a virtio-net and
virtio-console) but not an arbitrary number? I suppose we can always
extend it later if that becomes a problem.

       Arnd

* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 15:46             ` Christoph Hellwig
@ 2019-01-17 16:18               ` Arnd Bergmann
  0 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2019-01-17 16:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vincent Whitchurch, sudeep.dutt, ashutosh.dixit, gregkh,
	Linux Kernel Mailing List, Kishon Vijay Abraham I,
	Lorenzo Pieralisi, linux-pci, linux-ntb, Jon Mason, Dave Jiang,
	Allen Hubbe

On Thu, Jan 17, 2019 at 4:46 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Jan 17, 2019 at 04:32:06PM +0100, Vincent Whitchurch wrote:
> > If I understand you correctly, I think you're talking about the RC
> > running the virtio drivers and the endpoint implementing the virtio
> > device?  This vop stuff is used for the other way around: the virtio
> > device is implemented on the RC and the endpoint runs the virtio drivers.
>
> Oh.  That is really weird and not the way I'd implement it...

It does make sense to me for the very special requirements of the MIC
device, which has a regular PC-style server that provides the environment
for a special embedded device inside of a PCIe card, so the PCI-EP
stuff is just used as a transport here going one way, and then the
configuration of the devices implemented through it goes the other
way, providing network connectivity and file system to the embedded
machine on the PCI-EP.

This is actually very similar to a setup that I considered implementing
over USB, where one might have an embedded machine (or a bunch
of them on a USB hub) connected to a USB host port, and then
use it in the opposite way of a regular gadget driver, by providing
a virtfs over USB to the gadget with files residing on a disk on the
USB host.

Apparently Vincent has the same use case that both the Intel
MIC folks and I had here, so doing it like this is clearly useful.
On the other hand, I agree that there are lots of other use cases
that need the opposite, so we should try to come up with a
design that can cover both.
An example of this might be a PCIe-endpoint device providing
network connectivity to the host using a vhost-net device, which
ideally just shows up as a device on the host as a virtio-net
without requiring any configuration.

So for configuring this, I'd like to see a way to have
either the PCI-EP or the PCI-host side be the one that can
create virtio devices that show up on the other end. This
configuration is currently done using an ioctl interface, which
was probably the easiest to do for the MIC case, but for
consistency with the PCI-EP framework, using configfs
is probably better.

A different matter is the question of what a virtio device
talks to. A lot of virtio devices are fundamentally
asymmetric (9pfs, rng, block, ...), so you'd have to
have the virtio device on one side, and a user space
or vhost driver on the other. The VOP driver seems to assume
that it's always the slave that uses virtio, while the
master side (which could be on the PCI EP or PCI
host for the sake of this argument) implements it in user
space or otherwise. Is this a safe assumption, or can
we imagine cases where this would be reversed as well?

        Arnd

* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 15:53         ` Arnd Bergmann
@ 2019-01-17 16:26           ` Vincent Whitchurch
  2019-01-17 16:34             ` Arnd Bergmann
  0 siblings, 1 reply; 17+ messages in thread
From: Vincent Whitchurch @ 2019-01-17 16:26 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: sudeep.dutt, ashutosh.dixit, gregkh, Linux Kernel Mailing List,
	Kishon Vijay Abraham I, Lorenzo Pieralisi, linux-pci, linux-ntb,
	Jon Mason, Dave Jiang, Allen Hubbe

On Thu, Jan 17, 2019 at 04:53:25PM +0100, Arnd Bergmann wrote:
> On Thu, Jan 17, 2019 at 4:19 PM Vincent Whitchurch
> <vincent.whitchurch@axis.com> wrote:
> > On Thu, Jan 17, 2019 at 01:39:27PM +0100, Arnd Bergmann wrote:
> > > Can you describe how you expect a VOP device over NTB or
> > > PCIe-endpoint would get created, configured and used?
> >
> > Assuming PCIe-endpoint:
> >
> > On the RC, a vop-host-backend driver (PCI driver) sets up some shared
> > memory area which the RC and the endpoint can use to communicate the
> > location of the MIC device descriptors and other information such as the
> > MSI address.  It implements vop callbacks to allow the vop framework to
> > obtain the address of the MIC descriptors and send/receive interrupts
> > to/from the guest.
> >
> > On the endpoint, the PCIe endpoint driver sets up (hardcoded) BARs and
> > memory regions as required to allow the endpoint and the root complex to
> > access each other's memory.
> >
> > On the endpoint, the vop-guest-backend, via the shared memory set up by
> > the vop-host-backend, obtains the address of the MIC device page and the
> > MSI address, and a method to receive vop interrupts from the host.  This
> > information is used to implement the vop callbacks allowing the vop
> > framework to access the MIC device page and send/receive interrupts
> > from/to the host.
> 
> Ok, this seems fine so far. So the vop-host-backend is a regular PCI
> driver that implements the VOP protocol from the host side, and it
> can talk to either a MIC, or another guest-backend written for the PCI-EP
> framework to implement the same protocol, right?

Yes, but just to clarify:  the placement of the device page and the way
to communicate the location of the device page address and any other
information needed by the guest-backend are hardware-specific so there
is no generic vop-host-backend implementation which can talk to both a
MIC and to something else.

> > vop (despite its name) doesn't care about PCIe.  The vop-guest-backend
> > doesn't actually need to talk to the PCIe endpoint driver.  The
> > vop-guest-backend can be probed via any means, such as via a device tree
> > on the endpoint.
> >
> > On the RC, userspace opens the vop device and adds the virtio devices,
> > which end up in the MIC device page set up by the vop-host-backend.
> >
> > On the endpoint, when the vop framework (via the vop-guest-backend) sees
> > these devices, it registers devices on the virtio bus and the virtio
> > drivers are probed.
> 
> Ah, so the direction is fixed, and it's the opposite of what Christoph
> and I were expecting. This is probably something we need to discuss
> a bit. From what I understand, there is no technical requirement why
> it has to be this direction, right?

I don't think the vop framework itself has any such requirement.

The MIC uses it in this way (see Documentation/mic/mic_overview.txt) and
it also makes sense (to me, at least) if one wants to treat the endpoint
like one would treat a virtualized guest.

> What I mean is that the same vop framework could work with
> a PCI-EP driver implementing the vop-host-backend and
> a PCI driver implementing the vop-guest-backend? In order
> to do this, the PCI-EP configuration would need to pick whether
> it wants the EP to be the vop host or guest, but having more
> flexibility in it (letting each side add virtio devices) would be
> harder to do.

Correct, this is my understanding also.

> > On the RC, userspace implements the device end of the virtio
> > communication in userspace, using the MIC_VIRTIO_COPY_DESC ioctl.  I
> > also have patches to support vhost.
> 
> This is a part I don't understand yet. Does this mean that the
> normal operation is between a user space process on the vop-host
> talking to the kernel on the vop-guest?

Yes.  For example, the guest mounts a 9p filesystem with virtio-9p and
the 9p server is implemented in a userspace process on the host.  This
is again similar to virtualization.

> I'm a bit worried about the ioctl interface here, as this combines the
> configuration side with the actual data transfer, and that seems
> a bit inflexible.
>
> > > Is there always one master side that is responsible for creating
> > > virtio devices on it, with the slave side automatically attaching to
> > > them, or can either side create virtio devices?
> >
> > Only the master can create virtio devices.  The virtio drivers run on
> > the slave.
> 
> Ok.
> 
> > > Is there any limit on
> > > the number of virtio devices or queues within a VOP device?
> >
> > The virtio device information (mic_device_desc) is put into the MIC
> > device page whose size is limited by the ABI header in
> > include/uapi/linux/mic_ioctl.h (MIC_DP_SIZE, 4096 bytes).  So the number
> > of devices is limited by the limit of the number of device descriptors
> > that can fit in that size.  There is also a per-device limit on the
> > number of vrings (MIC_MAX_VRINGS) and vring entries
> > (MIC_VRING_ENTRIES) in the ABI header.
> 
> Ok, so you can have multiple virtio devices (e.g. a virtio-net and
> virtio-console) but not an arbitrary number? I suppose we can always
> extend it later if that becomes a problem.

Yes.


* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 16:26           ` Vincent Whitchurch
@ 2019-01-17 16:34             ` Arnd Bergmann
  0 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2019-01-17 16:34 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: sudeep.dutt, ashutosh.dixit, gregkh, Linux Kernel Mailing List,
	Kishon Vijay Abraham I, Lorenzo Pieralisi, linux-pci, linux-ntb,
	Jon Mason, Dave Jiang, Allen Hubbe

On Thu, Jan 17, 2019 at 5:26 PM Vincent Whitchurch
<vincent.whitchurch@axis.com> wrote:
> On Thu, Jan 17, 2019 at 04:53:25PM +0100, Arnd Bergmann wrote:
> > On Thu, Jan 17, 2019 at 4:19 PM Vincent Whitchurch

> > Ok, this seems fine so far. So the vop-host-backend is a regular PCI
> > driver that implements the VOP protocol from the host side, and it
> > can talk to either a MIC, or another guest-backend written for the PCI-EP
> > framework to implement the same protocol, right?
>
> Yes, but just to clarify:  the placement of the device page and the way
> to communicate the location of the device page address and any other
> information needed by the guest-backend are hardware-specific so there
> is no generic vop-host-backend implementation which can talk to both a
> MIC and to something else.

I'm not sure I understand what is hardware specific about it. Shouldn't
it be possible to define at least a vop-host-backend that could work with
any guest-backend running on the PCI-EP framework?

This may have to be different from the interface used on MIC, but
generally speaking that is what I expect from a PCI device.

       Arnd


* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-17 15:19       ` Vincent Whitchurch
  2019-01-17 15:21         ` Christoph Hellwig
  2019-01-17 15:53         ` Arnd Bergmann
@ 2019-01-17 22:17         ` Logan Gunthorpe
  2 siblings, 0 replies; 17+ messages in thread
From: Logan Gunthorpe @ 2019-01-17 22:17 UTC (permalink / raw)
  To: Vincent Whitchurch, Arnd Bergmann
  Cc: sudeep.dutt, ashutosh.dixit, gregkh, Linux Kernel Mailing List,
	Kishon Vijay Abraham I, Lorenzo Pieralisi, linux-pci, linux-ntb,
	Jon Mason, Dave Jiang, Allen Hubbe



On 2019-01-17 8:19 a.m., Vincent Whitchurch wrote:
> On the endpoint, the PCIe endpoint driver sets up (hardcoded) BARs and
> memory regions as required to allow the endpoint and the root complex to
> access each other's memory.

This statement describes NTB hardware pretty well. In essence that's
what an NTB device is: a BAR that maps to a window in the other host's memory.

Right now the entire NTB upstream software stack (ntb_transport and
ntb_netdev) is specific to that ecosystem and only exposes a network
device so the hosts can communicate. This code works but has some issues
and was never able to perform at full PCIe line speeds (which everyone
expects). So it's not clear to me if anyone is doing anything real with
it. The companies that are working on NTB, that I'm aware of, have
mostly done their own out-of-tree stuff.

It would be interesting to unify ntb_transport with the virtio stack
because I suspect they do very similar things right now and there's a
lot more devices above virtio than just a network device. However, the
main problem people working on NTB face (besides performance) is trying
to get multi-host working in a general and sensible way given that the
hardware typically has limited BAR resources (among other limitations).

Logan


* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
       [not found] <20190116163253.23780-1-vincent.whitchurch@axis.com>
  2019-01-16 17:07 ` [PATCH 0/8] Virtio-over-PCIe on non-MIC Arnd Bergmann
@ 2019-01-18 23:49 ` Stephen Warren
  2019-01-21 16:25   ` Vincent Whitchurch
  1 sibling, 1 reply; 17+ messages in thread
From: Stephen Warren @ 2019-01-18 23:49 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: sudeep.dutt, ashutosh.dixit, gregkh, arnd, linux-kernel,
	Vincent Whitchurch, ABRAHAM, KISHON VIJAY, Lorenzo Pieralisi,
	linux-pci, linux-ntb, Jon Mason, Dave Jiang, Allen Hubbe,
	Christoph Hellwig

On 1/16/19 9:32 AM, Vincent Whitchurch wrote:
> The Virtio-over-PCIe framework living under drivers/misc/mic/vop implements a
> generic framework to use virtio between two Linux systems, given shared memory
> and a couple of interrupts.  It does not actually require the Intel MIC
> hardware, x86-64, or even PCIe for that matter.  This patch series makes it
> buildable on more systems and adds a loopback driver to test it without special
> hardware.
> 
> Note that I don't have access to Intel MIC hardware so some testing of the
> patchset (especially the patch "vop: Use consistent DMA") on that platform
> would be appreciated, to ensure that the series does not break anything there.

So a while ago I took a look at running virtio over PCIe. I found virtio 
basically had two parts:

1) The protocol used to enumerate which virtio devices exist, and 
perhaps configure them.

2) The ring buffer protocol that actually transfers the data.

I recall that data transfer was purely based on simple shared memory and 
interrupts, and hence could run over PCIe (e.g. via the PCIe endpoint 
subsystem in the kernel) without issue.

However, the enumeration/configuration protocol requires the host to be 
able to do all kinds of strange things that can't possibly be emulated 
over PCIe; IIRC the configuration data contains "registers" that when 
written select the data other "registers" access. When the virtio device 
is exposed by a hypervisor, and all the accesses are emulated 
synchronously through a trap, this is easy enough to implement. However, 
if the two ends of this configuration parsing are on different ends of a 
PCIe bus, there's no way this can work.

Are you thinking of doing something different for 
enumeration/configuration, and just using the virtio ring buffer 
protocol over PCIe?

I did post asking about this quite a while back, but IIRC I didn't 
receive much of a response. Yes, here it is:

> https://lists.linuxfoundation.org/pipermail/virtualization/2018-March/037276.html
"virtio over SW-defined/CPU-driven PCIe endpoint"


* Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC
  2019-01-18 23:49 ` Stephen Warren
@ 2019-01-21 16:25   ` Vincent Whitchurch
  0 siblings, 0 replies; 17+ messages in thread
From: Vincent Whitchurch @ 2019-01-21 16:25 UTC (permalink / raw)
  To: Stephen Warren
  Cc: sudeep.dutt, ashutosh.dixit, gregkh, arnd, linux-kernel, ABRAHAM,
	KISHON VIJAY, Lorenzo Pieralisi, linux-pci, linux-ntb, Jon Mason,
	Dave Jiang, Allen Hubbe, Christoph Hellwig, virtualization

On Fri, Jan 18, 2019 at 04:49:16PM -0700, Stephen Warren wrote:
> On 1/16/19 9:32 AM, Vincent Whitchurch wrote:
> > The Virtio-over-PCIe framework living under drivers/misc/mic/vop implements a
> > generic framework to use virtio between two Linux systems, given shared memory
> > and a couple of interrupts.  It does not actually require the Intel MIC
> > hardware, x86-64, or even PCIe for that matter.  This patch series makes it
> > buildable on more systems and adds a loopback driver to test it without special
> > hardware.
> > 
> > Note that I don't have access to Intel MIC hardware so some testing of the
> > patchset (especially the patch "vop: Use consistent DMA") on that platform
> > would be appreciated, to ensure that the series does not break anything there.
> 
> So a while ago I took a look at running virtio over PCIe. I found virtio
> basically had two parts:
> 
> 1) The protocol used to enumerate which virtio devices exist, and perhaps
> configure them.
> 
> 2) The ring buffer protocol that actually transfers the data.
> 
> I recall that data transfer was purely based on simple shared memory and
> interrupts, and hence could run over PCIe (e.g. via the PCIe endpoint
> subsystem in the kernel) without issue.
> 
> However, the enumeration/configuration protocol requires the host to be able
> to do all kinds of strange things that can't possibly be emulated over PCIe;
> IIRC the configuration data contains "registers" that when written select
> the data other "registers" access. When the virtio device is exposed by a
> hypervisor, and all the accesses are emulated synchronously through a trap,
> this is easy enough to implement. However, if the two ends of this
> configuration parsing are on different ends of a PCIe bus, there's no way
> this can work.

Correct, and that's why the MIC "Virtio-over-PCIe framework" does not
try to implement the standard "Virtio Over PCI Bus".  (Yes, it's
confusing.)

> Are you thinking of doing something different for enumeration/configuration,
> and just using the virtio ring buffer protocol over PCIe?

The mic/vop code already does this.  See
Documentation/mic/mic_overview.txt for some information.

> I did post asking about this quite a while back, but IIRC I didn't receive
> much of a response. Yes, here it is:
> 
> > https://lists.linuxfoundation.org/pipermail/virtualization/2018-March/037276.html
> "virtio over SW-defined/CPU-driven PCIe endpoint"

I came to essentially the same conclusions before I found the MIC code.

(Your "aside" in that email about virtio doing PCIe reads instead of
 writes is not solved by the MIC code, since that is how the standard
 virtio devices/drivers work.)


end of thread, other threads:[~2019-01-21 16:25 UTC | newest]

Thread overview: 17+ messages
     [not found] <20190116163253.23780-1-vincent.whitchurch@axis.com>
2019-01-16 17:07 ` [PATCH 0/8] Virtio-over-PCIe on non-MIC Arnd Bergmann
2019-01-17 10:54   ` Vincent Whitchurch
2019-01-17 12:39     ` Arnd Bergmann
2019-01-17 15:15       ` Christoph Hellwig
2019-01-17 15:19         ` Christoph Hellwig
2019-01-17 15:31           ` Arnd Bergmann
2019-01-17 15:19       ` Vincent Whitchurch
2019-01-17 15:21         ` Christoph Hellwig
2019-01-17 15:32           ` Vincent Whitchurch
2019-01-17 15:46             ` Christoph Hellwig
2019-01-17 16:18               ` Arnd Bergmann
2019-01-17 15:53         ` Arnd Bergmann
2019-01-17 16:26           ` Vincent Whitchurch
2019-01-17 16:34             ` Arnd Bergmann
2019-01-17 22:17         ` Logan Gunthorpe
2019-01-18 23:49 ` Stephen Warren
2019-01-21 16:25   ` Vincent Whitchurch
