linux-pci.vger.kernel.org archive mirror
From: Jason Wang <jasowang@redhat.com>
To: Kishon Vijay Abraham I <kishon@ti.com>,
	Haotian Wang <haotian.wang@sifive.com>,
	mst@redhat.com, lorenzo.pieralisi@arm.com, bhelgaas@google.com,
	Alan Mikhak <alan.mikhak@sifive.com>
Cc: linux-pci@vger.kernel.org, haotian.wang@duke.edu,
	Jon Mason <jdmason@kudzu.us>, KVM list <kvm@vger.kernel.org>
Subject: Re: [PATCH] pci: endpoint: functions: Add a virtnet EP function
Date: Tue, 26 Nov 2019 17:58:48 +0800	[thread overview]
Message-ID: <d87fbe2f-b3ae-5cb1-448a-41335febc460@redhat.com>
In-Reply-To: <2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com>


On 2019/11/25 8:49 PM, Kishon Vijay Abraham I wrote:
> +Alan, Jon
>
> Hi Jason, Haotian, Alan,
>
> On 05/09/19 8:26 AM, Jason Wang wrote:
>> On 2019/9/5 5:58 AM, Haotian Wang wrote:
>>> Hi Jason,
>>>
>>> I have an additional comment regarding using vring.
>>>
>>> On Tue, Sep 3, 2019 at 6:42 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> Kind of. In order to address the above limitation, you probably want to
>>>> implement a vringh-based netdevice and driver. It would work like this:
>>>> instead of trying to present a virtio-net device to the endpoint, present
>>>> a new type of network device that uses two vringh rings instead of
>>>> virtio rings. The vringh ring is usually used to implement the
>>>> counterpart of a virtio driver. The advantages are obvious:
>>>>
>>>> - no need to deal with two sets of features, config space, etc.
>>>> - it's network specific: from the point of view of endpoint Linux, it's
>>>> not a virtio device, so there's no need to care about transport details
>>>> or to embed internal virtio-net specific data structures
>>>> - it reuses the existing code (vringh) to avoid duplicated bugs;
>>>> implementing a virtqueue is kind of a challenge
>>> With vringh.c, there is no easy way to interface with virtio_net.c.
>>>
>>> vringh.c is linked with vhost/net.c nicely
>>
>> Let me clarify: vhost_net doesn't use vringh at all (though there is a
>> plan to switch it to vringh).
>>
>>
>>> but again, it's not easy to
>>> interface vhost/net.c with the network stack of the endpoint kernel. The
>>> vhost drivers were not designed with the purpose of creating another
>>> suite of virtual devices in the host kernel in the first place. If I try
>>> to manually write code for this interfacing, it seems that I will end up
>>> duplicating work that virtio_net.c already does.
>>
>> Let me explain:
>>
>> - I'm not suggesting you use vhost_net, since it can only deal with
>> userspace virtio rings.
>> - I suggest introducing a netdev that has vringh vrings associated with it.
>> Vringh was designed to deal with virtio rings located in different types
>> of memory. It supports userspace vrings and kernel vrings currently, but
>> it should not be too hard to add support for e.g. an endpoint device that
>> requires DMA or whatever other method to access the vring. So it was by
>> design meant to talk directly with e.g. a kernel virtio device.
>> - In your case, you can read the vring address from the virtio config space
>> through the endpoint framework and then create the vringh. It's as simple
>> as: create a netdev, read the vring address, and initialize the vringh. Then
>> you can use the vringh helpers to get the iov, build an skb, etc. (similar
>> to caif_virtio); a rough sketch follows.
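>>
>> For example, something like the below (untested sketch just to show the
>> vringh helpers; struct epnet, the epnet_* names and EPNET_PKT_LEN are all
>> made up, and handling of the virtio-net header is skipped):
>>
>>     #define EPNET_PKT_LEN 1600
>>
>>     /* desc/avail/used are the (mapped) addresses the host-side driver
>>      * wrote into the config space exposed through the endpoint BAR */
>>     static int epnet_init_ring(struct epnet *priv, u64 features,
>>                                unsigned int num, struct vring_desc *desc,
>>                                struct vring_avail *avail,
>>                                struct vring_used *used)
>>     {
>>             return vringh_init_kern(&priv->tx_vrh, features, num, false,
>>                                     desc, avail, used);
>>     }
>>
>>     /* pull packets the host placed on its TX virtqueue and feed them
>>      * to the endpoint's network stack */
>>     static void epnet_rx(struct epnet *priv)
>>     {
>>             struct vringh *vrh = &priv->tx_vrh;
>>             struct vringh_kiov riov;
>>             struct kvec kv[8];
>>             struct sk_buff *skb;
>>             ssize_t len;
>>             u16 head;
>>
>>             vringh_kiov_init(&riov, kv, ARRAY_SIZE(kv));
>>
>>             while (vringh_getdesc_kern(vrh, &riov, NULL, &head,
>>                                        GFP_ATOMIC) == 1) {
>>                     skb = netdev_alloc_skb(priv->ndev, EPNET_PKT_LEN);
>>                     if (!skb)
>>                             break;
>>                     len = vringh_iov_pull_kern(&riov,
>>                                                skb_put(skb, EPNET_PKT_LEN),
>>                                                EPNET_PKT_LEN);
>>                     if (len < 0) {
>>                             kfree_skb(skb);
>>                             break;
>>                     }
>>                     skb_trim(skb, len);
>>                     skb->protocol = eth_type_trans(skb, priv->ndev);
>>                     netif_rx(skb);
>>                     vringh_complete_kern(vrh, head, len);
>>             }
>>     }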
>  From the discussions above and from looking at Jason's mdev patches [1], I've
> created the block diagram below.
>
> While this patch (from Haotian) deals with the RC<->EP connection, I'd also like
> this to be extended to NTB (using multiple EP instances: RC<->EP<->EP<->RC)
> [2][3].
>
> +-----------------------------------+   +-------------------------------------+
> |                                   |   |                                     |
> |  +------------+  +--------------+ |   | +------------+  +--------------+    |
> |  | vringh_net |  | vringh_rpmsg | |   | | virtio_net |  | virtio_rpmsg |    |
> |  +------------+  +--------------+ |   | +------------+  +--------------+    |
> |                                   |   |                                     |
> |          +---------------+        |   |          +---------------+          |
> |          |  vringh_mdev  |        |   |          |  virtio_mdev  |          |
> |          +---------------+        |   |          +---------------+          |
> |                                   |   |                                     |
> |  +------------+   +------------+  |   | +-------------------+ +------------+|
> |  | vringh_epf |   | vringh_ntb |  |   | | virtio_pci_common | | virtio_ntb ||
> |  +------------+   +------------+  |   | +-------------------+ +------------+|
> | (PCI EP Device)   (NTB Secondary  |   |        (PCI RC)       (NTB Primary  |
> |                       Device)     |   |                          Device)    |
> |                                   |   |                                     |
> |                                   |   |                                     |
> |             (A)                   |   |              (B)                    |
> +-----------------------------------+   +-------------------------------------+
>
> GUEST SIDE (B):
> ===============
> In virtualization terminology, the side labeled (B) will be the guest side.
> Here it will be the place where the PCIe host (RC) side SW executes. (Ignore NTB
> for this discussion, since PCIe host side SW will execute on both ends of the
> link in the case of NTB. However, I've included it in the block diagram since the
> design we adopt should be able to be extended to NTB as well.)
>
> Most of the pieces in (B) already exist.
> 1) virtio_net and virtio_rpmsg: No modifications needed; these can be used as
>     they are.
> 2) virtio_mdev: Jason has sent this [1]. This could be used as it is for EP
>     use cases as well. Jason has created mvnet based on virtio_mdev, but for EP
>     use cases virtio_pci_common and virtio_ntb should use it.


Can we implement NTB as a transport for virtio? Then there would be no need 
for virtio_mdev.
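
Just to illustrate what I mean by a transport, here is a very rough, untested
skeleton where an NTB memory window carries the feature/status area. All
offsets and the ntbv_*/ntb_virtio names below are invented, and .get/.set for
the device config as well as .find_vqs/.del_vqs (vrings in a shared window,
doorbell for notification) are omitted:

    #include <linux/virtio.h>
    #include <linux/virtio_config.h>
    #include <linux/io.h>

    struct ntb_virtio {
            struct virtio_device vdev;
            void __iomem *cfg;  /* feature/status area in an NTB memory window */
    };

    #define to_ntb_virtio(v) container_of(v, struct ntb_virtio, vdev)

    static u64 ntbv_get_features(struct virtio_device *vdev)
    {
            void __iomem *cfg = to_ntb_virtio(vdev)->cfg;
            u64 f;

            f = readl(cfg + 0x04);          /* offsets are invented */
            f <<= 32;
            f |= readl(cfg + 0x00);
            return f;
    }

    static int ntbv_finalize_features(struct virtio_device *vdev)
    {
            void __iomem *cfg = to_ntb_virtio(vdev)->cfg;

            writel((u32)vdev->features, cfg + 0x08);
            writel((u32)(vdev->features >> 32), cfg + 0x0c);
            return 0;
    }

    static u8 ntbv_get_status(struct virtio_device *vdev)
    {
            return readb(to_ntb_virtio(vdev)->cfg + 0x10);
    }

    static void ntbv_set_status(struct virtio_device *vdev, u8 status)
    {
            writeb(status, to_ntb_virtio(vdev)->cfg + 0x10);
    }

    static void ntbv_reset(struct virtio_device *vdev)
    {
            ntbv_set_status(vdev, 0);
    }

    static const struct virtio_config_ops ntb_virtio_config_ops = {
            .get_features      = ntbv_get_features,
            .finalize_features = ntbv_finalize_features,
            .get_status        = ntbv_get_status,
            .set_status        = ntbv_set_status,
            .reset             = ntbv_reset,
    };

The probe on the NTB primary side would then point vdev.config at these ops,
fill in vdev.id.device, and call register_virtio_device(), after which
existing drivers like virtio_net bind as usual.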


> 3) virtio_pci_common: This should be used when a PCIe EPF is connected. It
>     should be modified to create a virtio_mdev instead of directly creating a
>     virtio device.
> 4) virtio_ntb: This is used for NTB, where one end of the link should use
>     virtio_ntb. This should create a virtio_mdev.
>
> With this, virtio_mdev can abstract virtio_pci_common and virtio_ntb, and
> ideally any virtio driver can be used for EP or NTB (in the block diagram above,
> virtio_net and virtio_rpmsg can be used).
>
> HOST SIDE (A):
> ===============
> In virtualization terminology, the side labeled (A) will be the host side.
> Here it will be the place where the PCIe device (Endpoint) side SW executes.
>
> Bits and pieces of (A) exist, but there will be considerable work here.
> 1) vringh_net: There should be vringh drivers corresponding to
>     the virtio drivers on the guest side (B). vringh_net should register with
>     the net core. The vringh_net device should be created by vringh_mdev. This
>     will be new development.
> 2) vringh_rpmsg: vringh_rpmsg should register with the rpmsg core. The
>     vringh_rpmsg device should be created by vringh_mdev.
> 3) vringh_mdev: This layer should define ops specific to vringh (e.g.
>     get_desc_addr() should give the vring descriptor address and will depend on
>     either the EP device or the NTB device). I haven't looked further into what
>     other ops will be needed. IMO this layer should also decide whether the
>     _kern() or _user() vringh helpers should be invoked.


Right, but it doesn't necessarily have to be called "mdev"; it could just be 
some abstraction in the form of a set of callbacks.
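
Purely as an illustration (none of this is an existing API, all names are
invented), the EPF and NTB backends could each implement something like:

    /*
     * Abstraction over how the "device" side reaches the vrings that the
     * driver side (PCI RC or NTB primary) has set up. Hypothetical sketch.
     */
    struct vringh_backend_ops {
            /* map vring 'index' set up by the driver side, return its size */
            void *(*get_vring)(struct device *dev, int index,
                               unsigned int *num);
            /* features and config space negotiated by the driver side */
            u64 (*get_features)(struct device *dev);
            void (*get_config)(struct device *dev, unsigned int offset,
                               void *buf, unsigned int len);
            /* notify the driver side: MSI through the EP controller for
             * the EPF backend, a doorbell for the NTB backend */
            void (*notify)(struct device *dev, int index);
            /* register a handler for the driver side kicking us */
            int (*set_callback)(struct device *dev, int index,
                                void (*cb)(void *priv), void *priv);
    };

vringh_net/vringh_rpmsg would then only see these callbacks plus the vringh
helpers, so whether the backend is an EPF or an NTB device stays hidden.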


> 4) vringh_epf: This will be used for the PCIe endpoint. It will implement ops
>     to get the vring descriptor address.
> 5) vringh_ntb: Similar to vringh_epf, but it will interface with the NTB device
>     instead of the EPF device.
>
> Jason,
>
> Can you give your comments on the above design? Do you see any flaws/issues
> with the above approach?


Looks good overall; see my questions above.

Thanks


>
> Thanks
> Kishon
>
> [1] -> https://lkml.org/lkml/2019/11/18/261
> [2] -> https://lkml.org/lkml/2019/9/26/291
> [3] ->
> https://www.linuxplumbersconf.org/event/4/contributions/395/attachments/284/481/Implementing_NTB_Controller_Using_PCIe_Endpoint_-_final.pdf
>>
>>> There will be two more main disadvantages probably.
>>>
>>> Firstly, there will be two layers of overhead. vhost/net.c uses
>>> vringh.c to channel data buffers into some struct sockets; this is the
>>> first layer of overhead. That the virtual network device will have to
>>> use these sockets somehow adds another layer of overhead.
>>
>> As I said, it doesn't work like vhost and no socket is needed at all.
>>
>>
>>> Secondly, probing, initialization and de-initialization of the virtual
>>> network device are already non-trivial. I'll likely end up copying this
>>> part almost verbatim from virtio_net.c, so in the end there will be more
>>> duplicate code.
>>
>> It will be a new type of network device instead of a virtio one; you don't
>> need to care about any virtio details in your code, only vringh. So it looks
>> to me like it would be much simpler and more compact.
>>
>> But I'm not saying your method is a no-go; you would just have to deal with
>> lots of other issues, as I explained in the previous mail. What you
>> want to achieve is:
>>
>> 1) Host (virtio-pci) <-> virtio ring <-> virtual eth device <-> virtio
>> ring <-> Endpoint (virtio with customized config_ops).
>>
>> But what I suggest is:
>>
>> 2) Host (virtio-pci) <-> virtio ring <-> virtual eth device <-> vringh
>> vring (virtio ring in the Host) <-> network device
>>
>> The differences are:
>> - Complexity: In your proposal there will be two virtio devices and 4
>> virtqueues. That means you need to prepare two sets of features, config
>> ops, etc., and dealing with inconsistent features will be a pain. It may
>> work for a simple case like a virtio-net device with only _F_MAC, but it
>> would be hard to extend. If we decide to go for vringh, there will
>> be a single virtio device and 2 virtqueues. On the endpoint side, there
>> will be 2 vringh vrings (which actually point to the same virtqueues from
>> the Host side) and a normal network device. There's no need to deal with
>> inconsistency, since vringh basically serves as a device
>> implementation; the feature negotiation is just between the device (network
>> device with vringh) and the driver (virtio-pci) from the view of the Linux
>> running on the PCI Host.
>> - Maintainability: Yours is a third path for dealing with virtio rings. We
>> already have vhost and vringh; a third path will add a lot of overhead when
>> trying to maintain them. My proposal tries to reuse vringh, so
>> there's no need for a new path.
>> - Layer violation: We want to hide the transport details from the device
>> and make the virtio-net device usable without modification. But your
>> code tries to poke at information like virtnet_info. My proposal is to just
>> have a new networking device that won't need to care about virtio at all.
>> It's not as hard as you imagine to have a new type of netdev; I suggest you
>> take a look at how caif_virtio is done. It would be helpful, and a rough
>> sketch of the transmit path follows below.
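>>
>> The transmit path of such a netdev could look roughly like this (untested
>> sketch; struct epnet and its notify hook are invented, and the virtio-net
>> header that a real driver has to prepend is skipped):
>>
>>     static netdev_tx_t epnet_xmit(struct sk_buff *skb,
>>                                   struct net_device *ndev)
>>     {
>>             struct epnet *priv = netdev_priv(ndev);
>>             struct vringh *vrh = &priv->rx_vrh; /* host's RX virtqueue */
>>             struct vringh_kiov wiov;
>>             struct kvec kv[8];
>>             u16 head;
>>
>>             vringh_kiov_init(&wiov, kv, ARRAY_SIZE(kv));
>>
>>             if (vringh_getdesc_kern(vrh, NULL, &wiov, &head,
>>                                     GFP_ATOMIC) != 1) {
>>                     /* the host hasn't posted a receive buffer yet */
>>                     netif_stop_queue(ndev);
>>                     return NETDEV_TX_BUSY;
>>             }
>>
>>             vringh_iov_push_kern(&wiov, skb->data, skb->len);
>>             vringh_complete_kern(vrh, head, skb->len);
>>             priv->notify(priv); /* e.g. raise MSI via the EP controller */
>>
>>             dev_kfree_skb_any(skb);
>>             return NETDEV_TX_OK;
>>     }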
>>
>> If you still decide to go with the two-virtio-device model, you probably
>> need to:
>> - provide two sets of config and features, and deal with inconsistency
>> - try to reuse the vringh code
>> - not refer to internal structures from virtio_net.c
>>
>> But I recommend taking a first step of trying the vringh method, which
>> should be much simpler.
>>
>> Thanks
>>
>>
>>> Thank you for your patience!
>>>
>>> Best,
>>> Haotian


Thread overview: 24+ messages
2019-08-23 21:31 [PATCH] pci: endpoint: functions: Add a virtnet EP function Haotian Wang
2019-08-26 10:51 ` Kishon Vijay Abraham I
2019-08-26 21:59   ` Haotian Wang
2019-08-27  8:12     ` Kishon Vijay Abraham I
2019-08-27 18:01       ` Haotian Wang
2019-08-30  6:11 ` Jason Wang
2019-08-30 23:06   ` Haotian Wang
2019-09-02  3:50     ` Jason Wang
2019-09-02 20:05       ` Haotian Wang
2019-09-03 10:42         ` Jason Wang
2019-09-04  0:55           ` Haotian Wang
2019-09-04 21:58           ` Haotian Wang
2019-09-05  2:56             ` Jason Wang
2019-09-05  3:28               ` Haotian Wang
2019-11-25 12:49               ` Kishon Vijay Abraham I
2019-11-26  9:58                 ` Jason Wang [this message]
2019-11-26 12:35                   ` Kishon Vijay Abraham I
2019-11-26 21:55                     ` Alan Mikhak
2019-11-26 22:01                       ` Alan Mikhak
2019-11-27  3:04                         ` Jason Wang
2019-09-03  6:25 ` Michael S. Tsirkin
2019-09-03 20:39   ` Haotian Wang
2019-09-05  7:07     ` Michael S. Tsirkin
2019-09-05 16:15       ` Haotian Wang
