netdev.vger.kernel.org archive mirror
From: Si-Wei Liu <si-wei.liu@oracle.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>, Eli Cohen <elic@nvidia.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	virtualization <virtualization@lists.linux-foundation.org>,
	netdev <netdev@vger.kernel.org>
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)
Date: Wed, 15 Dec 2021 18:01:55 -0800
Message-ID: <71d2a69c-94a7-76b5-2971-570026760bf0@oracle.com>
In-Reply-To: <20211215162917-mutt-send-email-mst@kernel.org>



On 12/15/2021 1:33 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 15, 2021 at 12:52:20PM -0800, Si-Wei Liu wrote:
>>
>> On 12/14/2021 6:06 PM, Jason Wang wrote:
>>> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
>>>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
>>>>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
>>>>>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
>>>>>>>> Sorry for reviving this ancient thread. I was a bit lost as to the conclusion
>>>>>>>> it ended up with. I have the following questions:
>>>>>>>>
>>>>>>>> 1. Legacy guest support: from past conversations it doesn't seem that
>>>>>>>> support will be completely taken off the table; is my understanding
>>>>>>>> correct? We're interested in supporting virtio v0.95 guests on x86,
>>>>>>>> which is backed by the spec at
>>>>>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
>>>>>>>> if there's a request/need to support even older legacy virtio versions
>>>>>>>> beyond that.
>>>>>>> I personally feel it's less work to add in kernel than try to
>>>>>>> work around it in userspace. Jason feels differently.
>>>>>>> Maybe post the patches and this will prove to Jason it's not
>>>>>>> too terrible?
>>>>>> I suppose if the vdpa vendor does support 0.95 at the datapath and ring
>>>>>> layout level and is limited to x86 only, there should be an easy way out.
>>>>> Note a subtle difference: what matters is that the guest, not the host, is x86.
>>>>> Matters for emulators which might reorder memory accesses.
>>>>> I guess this enforcement belongs in QEMU then?
>>>> Right. I mean that, to get started, the initial guest driver support and the
>>>> corresponding QEMU support for a transitional vdpa backend can be limited
>>>> to x86 guest/host only. Since the config space is emulated in QEMU, I
>>>> suppose it's not hard to enforce in QEMU.
>>> It's more than just config space, most devices have headers before the buffer.
>> The ordering in the datapath (data VQs) would have to rely on vendor support.
>> Since ORDER_PLATFORM is fairly new (v1.1), I guess vdpa h/w vendors nowadays
>> can/should support the case where ORDER_PLATFORM is not acked by the
>> driver (this feature is actually filtered out by the QEMU vhost-vdpa driver
>> today), even with a v1.0 spec-conforming, modern-only vDPA device. The
>> control VQ is implemented in software in the kernel, which can be easily
>> accommodated/fixed when needed.
>>
>>>> QEMU can drive GET_LEGACY,
>>>> GET_ENDIAN et al ioctls in advance to get the capability from the
>>>> individual vendor driver. For that, we need another negotiation protocol
>>>> similar to vhost_user's protocol_features between the vdpa kernel and
>>>> QEMU, way before the guest driver is ever probed and its feature
>>>> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
>>>> from the device, but we can assume weak ordering for legacy at this
>>>> point (x86 only)?
>>> I'm lost here, we have get_features() so:
>> I assume here you are referring to get_device_features(), which Eli just
>> renamed.
>>> 1) VERSION_1 means the device uses LE if provided, otherwise native
>>> 2) ORDER_PLATFORM means device requires platform ordering
>>>
>>> Any reason for having a new API for this?
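
Jason's points 1) and 2) amount to reading two bits out of the device feature word. A minimal sketch, assuming the virtio 1.x bit numbers (VIRTIO_F_VERSION_1 is bit 32, VIRTIO_F_ORDER_PLATFORM is bit 36); struct dev_caps and decode_device_features() are hypothetical names for illustration, not vdpa core API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio 1.x specification. */
#define VIRTIO_F_VERSION_1      32
#define VIRTIO_F_ORDER_PLATFORM 36

/* Hypothetical summary of what the device feature word implies. */
struct dev_caps {
	bool little_endian;     /* VERSION_1 offered: LE, otherwise guest-native */
	bool platform_ordering; /* ORDER_PLATFORM offered: platform ordering needed */
};

static bool has_feature(uint64_t features, unsigned int bit)
{
	assert(bit < 64); /* shifting a 64-bit value by >= 64 is undefined */
	return (features >> bit) & 1;
}

static struct dev_caps decode_device_features(uint64_t device_features)
{
	struct dev_caps caps = {
		.little_endian = has_feature(device_features, VIRTIO_F_VERSION_1),
		.platform_ordering = has_feature(device_features, VIRTIO_F_ORDER_PLATFORM),
	};
	return caps;
}
```

This only describes what the device offers. For a transitional device, whether the legacy interface is actually in use depends on the guest driver not acking VERSION_1, which is the gap raised in the reply.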
>> Are you going to require all vDPA hardware vendors to support the
>> transitional model for legacy guests, meaning a guest not acknowledging
>> VERSION_1 would use the legacy interfaces captured in the spec section 7.4
>> (ring layout, native endianness, message framing, vq alignment of 4096,
>> 32-bit features, no FEATURES_OK bit in status, IO port interface, i.e.
>> all of it) instead? Note that we don't yet have a set_device_features()
>> that lets the vdpa device tell whether it is operating in transitional
>> or modern-only mode. For software virtio, all the support for the legacy
>> part of a transitional model is already built in; however, it's not easy
>> for vDPA vendors to implement all the requirements for all-or-nothing
>> legacy guest support (big-endian guests, for example). To these vendors,
>> legacy support within a transitional model is more of a feature, and
>> it's best to leave them some flexibility to implement partial legacy
>> support. That in turn calls for a negotiation API, similar to vhost-user
>> protocol features, that can reject unsupported guest setups as early as
>> backend_init, before the VM is launched.
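
Of the legacy requirements listed above, the 4096-byte vq alignment is the easiest to pin down concretely. A minimal sketch of the legacy split-ring size computation, following the same formula as vring_size() in Linux's include/uapi/linux/virtio_ring.h; the struct and function names here are hypothetical copies for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEGACY_VRING_ALIGN 4096u

/* Split-ring descriptor (16 bytes) and used-ring element (8 bytes). */
struct legacy_vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct legacy_vring_used_elem {
	uint32_t id;
	uint32_t len;
};

static size_t legacy_vring_size(unsigned int num, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0); /* power of two */

	/* Descriptor table + available ring (flags, idx, ring[num], used_event). */
	size_t size = sizeof(struct legacy_vring_desc) * num +
		      sizeof(uint16_t) * (3 + (size_t)num);

	/* The used ring starts at the next align boundary. */
	size = (size + align - 1) & ~(align - 1);

	/* Used ring: flags, idx, ring[num], avail_event. */
	return size + sizeof(uint16_t) * 3 +
	       sizeof(struct legacy_vring_used_elem) * num;
}
```

For a 256-entry ring, the descriptor table plus available ring (4614 bytes) is padded to 8192, and the used ring adds 2054 bytes, for 10246 bytes in total.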
> Right. Of note is the fact that it's a spec bug which I
> still hope to fix, though due to existing guest code the
> fix won't be complete.
I thought you once pointed out to me that the spec does allow config 
space reads before FEATURES_OK is set, and that only config writes 
before FEATURES_OK are prohibited. I haven't read through the full thread 
of Halil's VERSION_1 work for transitional big-endian devices yet, but what is 
the spec bug you hope to fix?
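
The read/write distinction above is the one drawn in the virtio 1.x driver initialization sequence: while negotiating features, the driver MAY read, but MUST NOT write, the device-specific configuration fields. A minimal sketch of a device-side check, assuming the standard status bit values; config_write_permitted() is a hypothetical helper, and the legacy branch reflects that the legacy interface has no FEATURES_OK bit to gate on:

```c
#include <stdbool.h>
#include <stdint.h>

/* Device status bits from the virtio specification. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_FEATURES_OK 8

/*
 * Hypothetical device-side check. Reads are allowed during feature
 * negotiation; a modern driver may write config only after FEATURES_OK.
 * The legacy interface has no FEATURES_OK bit, so nothing is gated.
 */
static bool config_write_permitted(uint8_t status, bool legacy)
{
	if (legacy)
		return true;
	return (status & VIRTIO_CONFIG_S_FEATURES_OK) != 0;
}
```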

>
> WRT ioctls, one thing we can do, though, is abuse set_features:
> QEMU calls it early on with just the VERSION_1 bit set, to
> distinguish between the legacy and modern interfaces. This happens
> before config space accesses and FEATURES_OK.
>
> Halil has been working on this, pls take a look and maybe help him out.
Interesting thread. I'm reading it now to see how I may leverage it or help there.
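
A minimal sketch of how a vdpa parent driver might act on the set_features trick, assuming QEMU issues one early set_features call after reset, before any config access or FEATURES_OK, with VERSION_1 set for a modern guest and clear for a legacy one. All names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32

enum iface_mode { IFACE_UNKNOWN, IFACE_LEGACY, IFACE_MODERN };

/* Hypothetical per-device state in a vdpa parent driver. */
struct sketch_dev {
	enum iface_mode mode;
	uint64_t driver_features;
};

/* Device reset: forget the interface decision. */
static void sketch_reset(struct sketch_dev *d)
{
	d->mode = IFACE_UNKNOWN;
	d->driver_features = 0;
}

/*
 * Hypothetical set_features hook. The first call after reset is assumed
 * to be QEMU's early probe: VERSION_1 set selects the modern interface,
 * VERSION_1 clear the legacy one. The decision sticks until reset.
 */
static void sketch_set_features(struct sketch_dev *d, uint64_t features)
{
	if (d->mode == IFACE_UNKNOWN)
		d->mode = ((features >> VIRTIO_F_VERSION_1) & 1) ? IFACE_MODERN
								  : IFACE_LEGACY;
	d->driver_features = features;
}

static bool sketch_is_legacy(const struct sketch_dev *d)
{
	return d->mode == IFACE_LEGACY;
}
```

Latching the decision on the first call until reset keeps a later feature write from flipping the interface mode mid-initialization; whether that is the right policy is an assumption of this sketch.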

>>>>>> I checked with Eli and other Mellanox/NVIDIA folks about hardware/firmware
>>>>>> level 0.95 support; it seems all the ingredients have been there since the
>>>>>> DPDK days. The only major limitation is in the vDPA software: the
>>>>>> current vdpa core assumes VIRTIO_F_ACCESS_PLATFORM for a few DMA
>>>>>> setup ops, which is virtio 1.0 only.
>>>>>>
>>>>>>>> 2. Supposing some form of legacy guest support needs to be there, how do we
>>>>>>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
>>>>>>>> One intuitive fix seems to be moving the vdpa_set_features() call out
>>>>>>>> of vdpa_get_config() and into vdpa_set_config().
>>>>>>>>
>>>>>>>>             /*
>>>>>>>>              * Config accesses aren't supposed to trigger before features are set.
>>>>>>>>              * If it does happen we assume a legacy guest.
>>>>>>>>              */
>>>>>>>>             if (!vdev->features_valid)
>>>>>>>>                     vdpa_set_features(vdev, 0);
>>>>>>>>             ops->get_config(vdev, offset, buf, len);
>>>>>>>>
>>>>>>>> I can post a patch to fix 2) if consensus has already been reached.
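
The relocation proposed in 2) could look roughly like this. A minimal sketch over a hypothetical, heavily simplified stand-in for struct vdpa_device (the real vdpa core goes through ops->get_config/ops->set_config with locking, all omitted here). The reasoning: a modern driver may legitimately read config before features are set but must not write it, so only an early write is a reliable sign of a legacy guest:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CONFIG_SIZE 64u

/* Hypothetical, heavily simplified stand-in for struct vdpa_device. */
struct sketch_vdpa {
	bool features_valid;
	uint64_t features;
	uint8_t config[SKETCH_CONFIG_SIZE];
};

static void sketch_vdpa_set_features(struct sketch_vdpa *vdev, uint64_t features)
{
	vdev->features = features;
	vdev->features_valid = true;
}

static bool sketch_range_ok(unsigned int offset, unsigned int len)
{
	return offset <= SKETCH_CONFIG_SIZE && len <= SKETCH_CONFIG_SIZE - offset;
}

/* A read before features are set no longer implies a legacy guest. */
static void sketch_vdpa_get_config(struct sketch_vdpa *vdev, unsigned int offset,
				   void *buf, unsigned int len)
{
	if (!sketch_range_ok(offset, len))
		return;
	memcpy(buf, vdev->config + offset, len);
}

/* A write before features are set is what only a legacy guest would do. */
static void sketch_vdpa_set_config(struct sketch_vdpa *vdev, unsigned int offset,
				   const void *buf, unsigned int len)
{
	if (!sketch_range_ok(offset, len))
		return;
	if (!vdev->features_valid)
		sketch_vdpa_set_features(vdev, 0); /* assume a legacy guest */
	memcpy(vdev->config + offset, buf, len);
}
```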
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Siwei
>>>>>>> I'm not sure how important it is to change that.
>>>>>>> In any case it only affects transitional devices, right?
>>>>>>> Legacy only should not care ...
>>>>>> Yes, I'd like to distinguish a legacy driver (say 0.95) from a
>>>>>> modern one within a transitional device model, rather than being legacy only.
>>>>>> That way a vdpa parent supporting both v0.95 and v1.0 can serve both types of
>>>>>> guests without having to reconfigure. Or are you suggesting that limiting to
>>>>>> legacy only at vdpa creation time would simplify the implementation a lot?
>>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>> I don't know for sure. Take a look at the work Halil was doing
>>>>> to try and support transitional devices with BE guests.
>>>> Hmmm, we can have those endianness ioctls defined, but the initial QEMU
>>>> implementation can start by supporting x86 guest/host with little
>>>> endian and weak memory ordering first. The real trick is detecting a
>>>> legacy guest: I am not sure if it's feasible to shift all the legacy
>>>> detection work to QEMU, or whether the kernel has to be part of the
>>>> detection as well (e.g. for the kick-before-DRIVER_OK heuristic, we would
>>>> have to duplicate the tracking effort in QEMU). Let me take a further look and get back.
>>> Michael may think differently but I think doing this in Qemu is much easier.
>> I think the key question is whether we position this as emulating the legacy
>> interfaces in QEMU, translating on top of a v1.0 modern-only device in the
>> kernel, or whether we allow the vdpa core (or, you could say, vhost-vdpa) and
>> the vendor driver to support a transitional model in the kernel that works
>> for both v0.95 and v1.0 drivers, with some slight aid from QEMU for
>> detection/emulation/shadowing (e.g. for the CVQ, I/O port relay). I guess
>> for the former we still rely on the vendor for a performant data VQ
>> implementation, so what eventually ends up in the kernel is effectively
>> the latter anyway.
>>
>> Thanks,
>> -Siwei
>
> My suggestion is post the kernel patches, and we can evaluate
> how much work they are.
Thanks for the feedback. I will do some reading and then get back to you, 
probably after the winter break. Stay tuned.

Thanks,
-Siwei

>
>>> Thanks
>>>
>>>
>>>
>>>> Meanwhile, I'll check internally to see if a legacy only model would
>>>> work. Thanks.
>>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>>
>>>>>>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
>>>>>>>>> On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
>>>>>>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
>>>>>>>>>>> On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>>>>>>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
>>>>>>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
>>>>>>>>>>>>> Well, although I think adding an ioctl is doable, may I know what
>>>>>>>>>>>>> the use case would be for the kernel to leverage such info
>>>>>>>>>>>>> directly? Is there a case QEMU can't handle with dedicated ioctls
>>>>>>>>>>>>> later, if differentiation (legacy vs. modern) is indeed needed?
>>>>>>>>>>>> BTW a good API could be
>>>>>>>>>>>>
>>>>>>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>>>>>>> #define VHOST_GET_ENDIAN _IOR(VHOST_VIRTIO, ?, int)
>>>>>>>>>>>>
>>>>>>>>>>>> we did it per vring but maybe that was a mistake ...
>>>>>>>>>>> Actually, I wonder whether it's a good time to just not support
>>>>>>>>>>> legacy drivers for vDPA. Consider:
>>>>>>>>>>>
>>>>>>>>>>> 1) Its definition is non-normative
>>>>>>>>>>> 2) It is a lot of code burden
>>>>>>>>>>>
>>>>>>>>>>> So QEMU can still present the legacy device, since the config space
>>>>>>>>>>> and other things presented by vhost-vDPA are not expected to be
>>>>>>>>>>> accessed by the guest directly. QEMU can do the endian conversion
>>>>>>>>>>> when necessary in this case?
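
For the QEMU-side conversion suggested here, a minimal sketch of re-encoding one 16-bit field from a legacy guest's native byte order into the little-endian layout a modern-only vDPA device expects. The function names are hypothetical, not QEMU API. This only covers fields QEMU actually mediates (e.g. emulated config space); as noted elsewhere in the thread, buffer headers are read by the device directly:

```c
#include <stdbool.h>
#include <stdint.h>

/* Read a 16-bit field as laid out in the guest's native byte order. */
static uint16_t load16(const uint8_t *p, bool big_endian)
{
	return big_endian ? (uint16_t)((p[0] << 8) | p[1])
			  : (uint16_t)(p[0] | (p[1] << 8));
}

/* Write a 16-bit field in little-endian order, as VERSION_1 requires. */
static void store16_le(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/*
 * Re-encode one field written by a legacy guest (guest-native order)
 * for a modern-only device (little-endian).
 */
static void legacy_field16_to_modern(uint8_t dst[2], const uint8_t src[2],
				     bool guest_big_endian)
{
	store16_le(dst, load16(src, guest_big_endian));
}
```

For a little-endian guest such as x86 the conversion is the identity, which is part of why the thread scopes initial legacy support to x86.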
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>> Overall I would be fine with this approach but we need to avoid breaking
>>>>>>>>>> working userspace, qemu releases with vdpa support are out there and
>>>>>>>>>> seem to work for people. Any changes need to take that into account
>>>>>>>>>> and document compatibility concerns.
>>>>>>>>> Agree, let me check.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>       I note that any hardware
>>>>>>>>>> implementation is already broken for legacy except on platforms with
>>>>>>>>>> strong ordering which might be helpful in reducing the scope.
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>


Thread overview: 88+ messages
2021-02-19 11:54 [PATCH] vdpa/mlx5: set_features should allow reset to zero Si-Wei Liu
2021-02-21 14:44 ` Eli Cohen
2021-02-21 21:52   ` Michael S. Tsirkin
2021-02-22  6:05     ` Eli Cohen
2021-02-23  9:26       ` Michael S. Tsirkin
2021-02-23  9:48         ` Jason Wang
2021-02-23  9:55           ` Michael S. Tsirkin
2021-02-22  4:14 ` Jason Wang
2021-02-22  7:34   ` Michael S. Tsirkin
2021-02-23  1:12     ` Si-Wei Liu
2021-02-23  2:03       ` Jason Wang
2021-02-23 13:26         ` Michael S. Tsirkin
2021-02-23 19:35           ` Si-Wei Liu
2021-02-24  3:20             ` Jason Wang
2021-02-24  5:17               ` Michael S. Tsirkin
2021-02-24  6:02                 ` Jason Wang
2021-02-24  6:45                 ` Eli Cohen
2021-02-24  6:47                   ` Michael S. Tsirkin
2021-02-24  6:55                     ` Jason Wang
2021-02-24  7:12                       ` Michael S. Tsirkin
2021-02-24 12:40                         ` Eli Cohen
2021-02-24  7:17                       ` Eli Cohen
2021-02-24  5:04             ` Michael S. Tsirkin
2021-02-24  6:04               ` Jason Wang
2021-02-24  6:46                 ` Michael S. Tsirkin
2021-02-24  6:53                   ` Jason Wang
2021-02-24  7:17                     ` Michael S. Tsirkin
     [not found]                       ` <babc654d-8dcd-d8a2-c3b6-d20cc4fc554c@redhat.com>
2021-02-24  8:43                         ` Michael S. Tsirkin
2021-02-24  9:30                           ` Jason Wang
2021-02-28 21:30                             ` Michael S. Tsirkin
2021-03-01  3:53                               ` Jason Wang
2021-02-24 18:24               ` Si-Wei Liu
2021-02-26  0:56                 ` Si-Wei Liu
2021-02-28 21:27                   ` Michael S. Tsirkin
2021-03-01 18:08                     ` Si-Wei Liu
2021-02-28 21:28                 ` Michael S. Tsirkin
2021-02-28 21:34                 ` Michael S. Tsirkin
2021-03-01  3:56                   ` Jason Wang
2021-03-02  9:47                     ` Michael S. Tsirkin
2021-03-02 10:53                       ` Jason Wang
2021-12-11  1:44                         ` vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero) Si-Wei Liu
2021-12-12  9:26                           ` Michael S. Tsirkin
2021-12-13  3:02                             ` Jason Wang
2021-12-13  8:06                               ` Michael S. Tsirkin
2021-12-13  8:57                                 ` Jason Wang
2021-12-13 10:42                                   ` Michael S. Tsirkin
2021-12-14  1:13                               ` Si-Wei Liu
2021-12-14  1:59                             ` Si-Wei Liu
2021-12-14  3:01                               ` Jason Wang
2021-12-14  5:06                               ` Michael S. Tsirkin
2021-12-15  1:05                                 ` Si-Wei Liu
2021-12-15  2:06                                   ` Jason Wang
2021-12-15 20:52                                     ` Si-Wei Liu
2021-12-15 21:33                                       ` Michael S. Tsirkin
2021-12-16  2:01                                         ` Si-Wei Liu [this message]
2021-12-16  2:53                                           ` Jason Wang
2021-12-16 22:32                                             ` Si-Wei Liu
2021-12-17  1:57                                               ` Jason Wang
2021-12-17  2:00                                                 ` Michael S. Tsirkin
2021-12-17  2:15                                                   ` Jason Wang
2021-12-16  6:35                                           ` Michael S. Tsirkin
2021-12-16  3:43                                       ` Jason Wang
2021-12-17  1:08                                         ` Si-Wei Liu
2021-12-17  2:01                                           ` Jason Wang
2021-02-22 17:09   ` [PATCH] vdpa/mlx5: set_features should allow reset to zero Si-Wei Liu
2021-02-23  2:03     ` Jason Wang
2021-02-23  9:25     ` Michael S. Tsirkin
2021-02-23  9:46       ` Jason Wang
2021-02-23 10:01         ` Michael S. Tsirkin
2021-02-23 10:17           ` Jason Wang
2021-02-24  9:40             ` Jason Wang
2021-02-23 10:04         ` [virtio-dev] " Cornelia Huck
2021-02-23 10:31           ` Jason Wang
2021-02-23 10:58             ` Cornelia Huck
2021-02-24  9:29               ` Jason Wang
2021-02-24 11:12                 ` Cornelia Huck
2021-02-25  4:36                   ` Jason Wang
2021-02-25 13:26                     ` Cornelia Huck
2021-02-25 18:53                     ` Michael S. Tsirkin
2021-02-26  8:19                       ` Jason Wang
2021-02-28 21:25                         ` Michael S. Tsirkin
2021-03-01  3:51                           ` Jason Wang
2021-03-02 12:08                             ` Cornelia Huck
     [not found]                               ` <5f6972fe-7246-b622-958d-9cab8dd98e21@redhat.com>
2021-03-03  8:29                                 ` Cornelia Huck
2021-03-04  8:24                                   ` Jason Wang
2021-03-04 13:50                                     ` Cornelia Huck
2021-03-05  3:01                                       ` Jason Wang
2021-02-23 12:26 ` Michael S. Tsirkin
