From: Jason Wang <jasowang@redhat.com>
To: Kishon Vijay Abraham I <kishon@ti.com>,
Haotian Wang <haotian.wang@sifive.com>,
mst@redhat.com, lorenzo.pieralisi@arm.com, bhelgaas@google.com,
Alan Mikhak <alan.mikhak@sifive.com>
Cc: linux-pci@vger.kernel.org, haotian.wang@duke.edu,
Jon Mason <jdmason@kudzu.us>, KVM list <kvm@vger.kernel.org>
Subject: Re: [PATCH] pci: endpoint: functions: Add a virtnet EP function
Date: Tue, 26 Nov 2019 17:58:48 +0800 [thread overview]
Message-ID: <d87fbe2f-b3ae-5cb1-448a-41335febc460@redhat.com> (raw)
In-Reply-To: <2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com>
On 2019/11/25 8:49 PM, Kishon Vijay Abraham I wrote:
> +Alan, Jon
>
> Hi Jason, Haotian, Alan,
>
> On 05/09/19 8:26 AM, Jason Wang wrote:
>> On 2019/9/5 5:58 AM, Haotian Wang wrote:
>>> Hi Jason,
>>>
>>> I have an additional comment regarding using vring.
>>>
>>> On Tue, Sep 3, 2019 at 6:42 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> Kind of. In order to address the above limitation, you probably want
>>>> to implement a vringh-based netdevice and driver. It would work like
>>>> this: instead of trying to present a virtio-net device to the
>>>> endpoint, present a new type of network device that uses two vringh
>>>> rings instead of virtio rings. A vringh ring is usually used to
>>>> implement the counterpart of a virtio driver. The advantages are obvious:
>>>>
>>>> - no need to deal with two sets of features, config space, etc.
>>>> - network-specific: from the point of view of the endpoint Linux,
>>>> it's not a virtio device, so there's no need to care about transport
>>>> details or embed internal virtio-net-specific data structures
>>>> - reuse the existing code (vringh) to avoid duplicating bugs;
>>>> implementing a virtqueue is kind of a challenge
>>> With vringh.c, there is no easy way to interface with virtio_net.c.
>>>
>>> vringh.c is linked with vhost/net.c nicely
>>
>> Let me clarify: vhost_net doesn't use vringh at all (though there's a
>> plan to switch to vringh).
>>
>>
>>> but again it's not easy to
>>> interface vhost/net.c with the endpoint kernel's network stack. The
>>> vhost drivers were not designed for the purpose of creating another
>>> suite of virtual devices in the host kernel in the first place. If I
>>> try to write this interfacing code manually, it seems I will duplicate
>>> work that virtio_net.c already does.
>>
>> Let me explain:
>>
>> - I'm not suggesting using vhost_net, since it can only deal with
>> userspace virtio rings.
>> - I suggest introducing a netdev that has vringh vrings associated with
>> it. Vringh was designed to deal with virtio rings located in different
>> types of memory. It supports userspace and kernel vrings currently, but
>> it should not be too hard to add support for e.g. an endpoint device
>> that requires DMA or some other method to access the vring. So it was
>> designed to talk directly with e.g. a kernel virtio device.
>> - In your case, you can read the vring address from the virtio config
>> space through the endpoint framework and then create a vringh. It's as
>> simple as: create a netdev, read the vring address, and initialize the
>> vringh. Then you can use the vringh helpers to get iovs and build skbs
>> etc. (similar to caif_virtio).
> From the discussions above and from looking at Jason's mdev patches [1], I've
> created the block diagram below.
>
> While this patch (from Haotian) deals with the RC<->EP connection, I'd
> also like this to be extended to NTB (using multiple EP instances:
> RC<->EP<->EP<->RC) [2][3].
>
> +-----------------------------------+ +-------------------------------------+
> | | | |
> | +------------+ +--------------+ | | +------------+ +--------------+ |
> | | vringh_net | | vringh_rpmsg | | | | virtio_net | | virtio_rpmsg | |
> | +------------+ +--------------+ | | +------------+ +--------------+ |
> | | | |
> | +---------------+ | | +---------------+ |
> | | vringh_mdev | | | | virtio_mdev | |
> | +---------------+ | | +---------------+ |
> | | | |
> | +------------+ +------------+ | | +-------------------+ +------------+|
> | | vringh_epf | | vringh_ntb | | | | virtio_pci_common | | virtio_ntb ||
> | +------------+ +------------+ | | +-------------------+ +------------+|
> | (PCI EP Device) (NTB Secondary | | (PCI RC) (NTB Primary |
> | Device) | | Device) |
> | | | |
> | | | |
> | (A) | | (B) |
> +-----------------------------------+ +-------------------------------------+
>
> GUEST SIDE (B):
> ===============
> In virtualization terminology, the side labeled (B) would be the guest
> side. Here it is where the PCIe host (RC) side SW executes. (Ignore NTB
> for this discussion, since in the NTB case PCIe host side SW executes on
> both ends of the link. However, I've included it in the block diagram
> since the design we adopt should be extensible to NTB as well.)
>
> Most of the pieces in (B) already exist.
> 1) virtio_net and virtio_rpmsg: No modifications needed; they can be
> used as they are.
> 2) virtio_mdev: Jason has sent this [1]. It could be used as-is for EP
> use cases as well. Jason has created mvnet based on virtio_mdev, but for
> EP use cases virtio_pci_common and virtio_ntb should use it.
Can we implement NTB as a transport for virtio? Then there would be no
need for virtio_mdev.
> 3) virtio_pci_common: This should be used when a PCIe EPF is connected.
> It should be modified to create a virtio_mdev device instead of directly
> creating a virtio device.
> 4) virtio_ntb: This is used for NTB, where one end of the link should
> use virtio_ntb. It should create a virtio_mdev device.
>
> With this, virtio_mdev can abstract virtio_pci_common and virtio_ntb,
> and ideally any virtio driver can be used for EP or NTB (in the block
> diagram above, virtio_net and virtio_rpmsg).
>
> HOST SIDE (A):
> ===============
> In virtualization terminology, the side labeled (A) would be the host
> side. Here it is where the PCIe device (Endpoint) side SW executes.
>
> Bits and pieces of (A) exist, but considerable work remains here.
> 1) vringh_net: There should be vringh drivers corresponding to the
> virtio drivers on the guest side (B). vringh_net should register with
> the net core. The vringh_net device should be created by vringh_mdev.
> This would be new development.
> 2) vringh_rpmsg: vringh_rpmsg should register with the rpmsg core. The
> vringh_rpmsg device should be created by vringh_mdev.
> 3) vringh_mdev: This layer should define ops specific to vringh (e.g.
> get_desc_addr() should return the vring descriptor address and will
> depend on either the EP device or the NTB device). I haven't looked
> further into what other ops will be needed. IMO this layer should also
> decide whether the _kern() or _user() vringh helpers are invoked.
Right, but it doesn't necessarily need to be called "mdev"; it could just
be an abstraction in the form of a set of callbacks.
> 4) vringh_epf: This will be used for the PCIe endpoint. It will
> implement the ops to get the vring descriptor address.
> 5) vringh_ntb: Similar to vringh_epf, but it will interface with the NTB
> device instead of the EPF device.
>
> Jason,
>
> Can you give your comments on the above design? Do you see any flaws/issues
> with the above approach?
Looks good overall, see questions above.
Thanks
>
> Thanks
> Kishon
>
> [1] -> https://lkml.org/lkml/2019/11/18/261
> [2] -> https://lkml.org/lkml/2019/9/26/291
> [3] ->
> https://www.linuxplumbersconf.org/event/4/contributions/395/attachments/284/481/Implementing_NTB_Controller_Using_PCIe_Endpoint_-_final.pdf
>>
>>> There will probably be two more main disadvantages.
>>>
>>> Firstly, there will be two layers of overhead. vhost/net.c uses
>>> vringh.c to channel data buffers into struct sockets; this is the
>>> first layer of overhead. That the virtual network device will have to
>>> use these sockets somehow adds another layer of overhead.
>>
>> As I said, it doesn't work like vhost and no socket is needed at all.
>>
>>
>>> Secondly, probing, initialization and de-initialization of the
>>> virtual network_device are already non-trivial. I'll likely copy this
>>> part almost verbatim from virtio_net.c in the end, so there will be
>>> more duplicate code.
>>
>> It will be a new type of network device instead of a virtio device;
>> you don't need to care about any virtio internals other than vringh in
>> your code. So it looks to me like it would be much simpler and more
>> compact.
>>
>> I'm not saying your method is a dead end, but you would have to deal
>> with lots of other issues, as I noted in my previous mail. What you
>> want to achieve is
>>
>> 1) Host (virtio-pci) <-> virtio ring <-> virtual eth device <-> virtio
>> ring <-> Endpoint (virtio with customized config_ops).
>>
>> But I suggest is
>>
>> 2) Host (virtio-pci) <-> virtio ring <-> virtual eth device <-> vringh
>> vring (virtio ring in the Host) <-> network device
>>
>> The differences are:
>> - Complexity: In your proposal, there will be two virtio devices and 4
>> virtqueues. That means you need to prepare two sets of features, config
>> ops, etc., and dealing with inconsistent features will be a pain. It
>> may work for a simple case like a virtio-net device with only _F_MAC,
>> but it would be hard to extend. If we go with vringh instead, there
>> will be a single virtio device and 2 virtqueues. On the endpoint side,
>> there will be 2 vringh vrings (which actually point at the same
>> virtqueues from the Host side) and a normal network device. There's no
>> need to deal with inconsistency, since vringh basically serves as a
>> device implementation; the feature negotiation is just between the
>> device (network device with vringh) and the driver (virtio-pci) from
>> the point of view of the Linux running on the PCI Host.
>> - Maintainability: A third path for dealing with virtio rings. We
>> already have vhost and vringh; a third path will add a lot of overhead
>> when trying to maintain them. My proposal reuses vringh, so there's no
>> need for a new path.
>> - Layer violation: We want to hide the transport details from the
>> device and make the virtio-net device usable without modification. But
>> your code pokes at internal information like virtnet_info. My proposal
>> is to just have a new networking device that doesn't need to care about
>> virtio at all. It's not as hard as you might imagine to add a new type
>> of netdev; I suggest taking a look at how caif_virtio is done, it
>> should be helpful.
>>
>> If you still decide to go with the two-virtio-device model, you
>> probably need to:
>> - Provide two sets of configs and features, and deal with inconsistency
>> - Try to reuse the vringh code
>> - Not refer to internal structures from virtio-net.c
>>
>> But I recommend first trying the vringh method, which should be much
>> simpler.
>>
>> Thanks
>>
>>
>>> Thank you for your patience!
>>>
>>> Best,
>>> Haotian
Thread overview: 24+ messages
2019-08-23 21:31 [PATCH] pci: endpoint: functions: Add a virtnet EP function Haotian Wang
2019-08-26 10:51 ` Kishon Vijay Abraham I
2019-08-26 21:59 ` Haotian Wang
2019-08-27 8:12 ` Kishon Vijay Abraham I
2019-08-27 18:01 ` Haotian Wang
2019-08-30 6:11 ` Jason Wang
2019-08-30 23:06 ` Haotian Wang
2019-09-02 3:50 ` Jason Wang
2019-09-02 20:05 ` Haotian Wang
2019-09-03 10:42 ` Jason Wang
2019-09-04 0:55 ` Haotian Wang
2019-09-04 21:58 ` Haotian Wang
2019-09-05 2:56 ` Jason Wang
2019-09-05 3:28 ` Haotian Wang
2019-11-25 12:49 ` Kishon Vijay Abraham I
2019-11-26 9:58 ` Jason Wang [this message]
2019-11-26 12:35 ` Kishon Vijay Abraham I
2019-11-26 21:55 ` Alan Mikhak
2019-11-26 22:01 ` Alan Mikhak
2019-11-27 3:04 ` Jason Wang
2019-09-03 6:25 ` Michael S. Tsirkin
2019-09-03 20:39 ` Haotian Wang
2019-09-05 7:07 ` Michael S. Tsirkin
2019-09-05 16:15 ` Haotian Wang