From: Si-Wei Liu <si-wei.liu@oracle.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Cindy Lu <lulu@redhat.com>, mst <mst@redhat.com>,
	Yongji Xie <xieyongji@bytedance.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Gautam Dawar <gdawar@xilinx.com>,
	virtualization <virtualization@lists.linux-foundation.org>,
	eperezma <eperezma@redhat.com>,
	Wu Zongyong <wuzongyong@linux.alibaba.com>,
	Eli Cohen <elic@nvidia.com>,
	Zhu Lingshan <lingshan.zhu@intel.com>
Subject: Re: [PATCH V2 2/3] vdpa_sim_net: support feature provisioning
Date: Tue, 27 Sep 2022 02:41:07 -0700
Message-ID: <c5a96de5-699a-8b5e-0e89-bfe1822e1105@oracle.com>
In-Reply-To: <CACGkMEsWPbTs+D4PBHQL2hUOtGWj_6zo-669cUhYK5zK039QCQ@mail.gmail.com>


On 9/26/2022 8:59 PM, Jason Wang wrote:
> On Tue, Sep 27, 2022 at 9:02 AM Si-Wei Liu<si-wei.liu@oracle.com>  wrote:
>>
>>
>> On 9/26/2022 12:11 AM, Jason Wang wrote:
>>
>> On Sat, Sep 24, 2022 at 4:01 AM Si-Wei Liu<si-wei.liu@oracle.com>  wrote:
>>
>>
>> On 9/21/2022 7:43 PM, Jason Wang wrote:
>>
>> This patch implements features provisioning for vdpa_sim_net.
>>
>> 1) validating the provisioned features to be a subset of the parent
>>      features.
>> 2) clearing the features that is not wanted by the userspace
>>
>> For example:
>>
>> # vdpa mgmtdev show
>> vdpasim_net:
>>     supported_classes net
>>     max_supported_vqs 3
>>     dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 ACCESS_PLATFORM
>>
>> Sighs, not to blame any one and it's perhaps too late, but this
>> "dev_features" attr in "mgmtdev show" command output should have been
>> called "supported_features" in the first place.
>>
>> Not sure I get this, but I guess this is the negotiated features actually.
>>
>> Actually no, that is why I said the name is a bit confusing and "supported_features" might sound better.
> You're right, it's an mgmtdev show actually.
>
>> This attribute in the parent device (mgmtdev) denotes the real device capability, i.e. which virtio features the parent device can support. Any unprivileged user can check this field to learn the parent device's capability without having to create a child vDPA device at all. The features that a child vDPA device may support should be a subset of, or at most equal to, what the parent device offers. E.g. the vdpa device dev1 you created below can expose the same or fewer device_features bits than 0x308820028 (MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 ACCESS_PLATFORM), but no more than what the parent device can actually support.
> Yes, I didn't see anything wrong with "dev_features",
Yep, nothing looked wrong to me at first sight either, which is why I 
gave my R-b on the series that introduced this attribute. But it is not 
a perfect name. Parav later pointed out that the corresponding enum 
definition for this attribute should follow the pre-existing naming 
convention, i.e. we should perhaps do 
s/VDPA_ATTR_DEV_SUPPORTED_FEATURES/VDPA_ATTR_MGMTDEV_SUPPORTED_FEATURES/ 
since this is a mgmtdev-level attribute, and I agree. 
With the upcoming "device_features" attribute (vdpa dev level) from this 
series, the two similar names are likely to be confused even though they 
represent things at different levels. All the other attributes in 
"mgmtdev show" already use the "supported_" wording, e.g. 
supported_classes and max_supported_vqs, and the fact that they describe 
the device is already implied by "mgmtdev" in the command. For clarity 
and easy distinction, "supported_features" seems to be a better name 
than "dev_features".

>   it aligns to the
> virtio spec which means the features could be used to create a vdpa
> device. But if everyone agree on the renaming, I'm fine.
Never mind, if it's too late we don't have to bother.

>
>>
>> I think Ling Shan is working on reporting both negotiated features
>> with the device features.
>>
>> Does it imply this series is connected to another work in parallel? Is it possible to add a reference in the cover letter?
> I'm not sure, I remember Ling Shan did some work to not block the
> config show in this commit:
>
> commit a34bed37fc9d3da319bb75dfbf02a7d3e95e12de
> Author: Zhu Lingshan<lingshan.zhu@intel.com>
> Date:   Fri Jul 22 19:53:07 2022 +0800
>
>      vDPA: !FEATURES_OK should not block querying device config space
>
> We need some changes in the vdpa tool to show device_features
> unconditionally in the "dev config show" command.
That's true. I think I pointed this out to Lingshan before: there is no 
need to expose those config space fields in "dev config show" output if 
the only intent is to carry device features between nodes for live 
migration. What matters most for vDPA live migration is the 
configuration parameters specified at vdpa creation; the userspace VMM 
(QEMU) is supposed to take care of saving and restoring live device 
state. I think it's easier to extend the "vdpa dev show" output to 
include device_features and other config params than to count on the 
validity of various config space fields.

https://lore.kernel.org/virtualization/454bdf1b-daa1-aa67-2b8c-bc15351c1851@oracle.com/
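
For readers following the feature values in this thread, here is a 
minimal userspace sketch (an illustration only, not part of the series 
or of the vdpa tool) of how a 64-bit device_features value maps to the 
names printed by the CLI. The bit positions are the ones defined by the 
virtio spec; the table covers only the bits mentioned in this thread, 
and features_to_str() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit positions from the virtio spec (same values as the uapi headers). */
struct feature_name {
	unsigned int bit;
	const char *name;
};

static const struct feature_name feature_names[] = {
	{  3, "MTU" },
	{  5, "MAC" },
	{ 17, "CTRL_VQ" },
	{ 23, "CTRL_MAC_ADDR" },
	{ 27, "ANY_LAYOUT" },
	{ 32, "VERSION_1" },
	{ 33, "ACCESS_PLATFORM" },
};

/*
 * Hypothetical helper: write the space-separated names of the known bits
 * set in 'features' into buf. Output stops at the first name that does
 * not fit; unknown bits are silently ignored.
 */
static char *features_to_str(uint64_t features, char *buf, size_t len)
{
	size_t i, used = 0;

	assert(buf != NULL && len > 0);
	for (i = 0; i < sizeof(feature_names) / sizeof(feature_names[0]); i++) {
		const char *name = feature_names[i].name;
		size_t n = strlen(name);

		if (!(features & (1ULL << feature_names[i].bit)))
			continue;
		if (used + n + 2 > len)	/* separator + name + NUL */
			break;
		if (used)
			buf[used++] = ' ';
		memcpy(buf + used, name, n);
		used += n;
	}
	buf[used] = '\0';
	return buf;
}
```

With the values from this thread, 0x308820028 decodes to the 
dev_features line shown by "vdpa mgmtdev show" for vdpasim_net, and 
0x300020000 decodes to CTRL_VQ VERSION_1 ACCESS_PLATFORM.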

Creating a vDPA device from the config space fields is not just 
insufficient, it is sometimes incorrect. For instance, the MAC address 
in config space can be changed temporarily (until device reset) via the 
ctrl_vq VIRTIO_NET_CTRL_MAC_ADDR_SET command, so it is incorrect to 
create the vDPA device using the MAC address shown in config space. 
Another example: if the source vDPA device has a MAC address table size 
limit of 100, then at the destination we should pick a parent device 
whose size limit is no smaller than that, and create the vDPA device on 
the remote node with exactly the same size. No config space field can 
help here.
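
To make the parent selection rule concrete, a rough sketch follows. It 
assumes the mgmt software keeps a record of the creation parameters; 
both structs, the mac_table_size field and parent_can_host() are 
hypothetical names for illustration, not existing vdpa attributes or 
kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of what the source vDPA device was created with. */
struct vdpa_create_params {
	uint64_t device_features;	/* features provisioned at creation */
	uint32_t mac_table_size;	/* hypothetical creation parameter */
};

/* Hypothetical summary of what a destination parent (mgmtdev) offers. */
struct vdpa_parent_caps {
	uint64_t supported_features;
	uint32_t max_mac_table_size;
};

/*
 * A destination parent qualifies only if it supports every provisioned
 * feature bit and its MAC table limit is no smaller than the source's.
 * Nothing here is read from the device config space.
 */
static bool parent_can_host(const struct vdpa_parent_caps *caps,
			    const struct vdpa_create_params *p)
{
	assert(caps != NULL && p != NULL);
	if (p->device_features & ~caps->supported_features)
		return false;
	return caps->max_mac_table_size >= p->mac_table_size;
}
```

Once a parent passes this check, the device would be created with the 
recorded parameters as they are (e.g. the same table size of 100), not 
with the parent's maximum.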

One further example: if in the future we introduce mandatory features 
(e.g. VERSION_1, RING_PACKED), where the device is unable to operate 
with the feature left off, the destination device should be created 
with exactly the same mandatory device features, and for that only the 
vDPA creation parameters matter. I can't think of a case where the mgmt 
software or live migration tool would have to count on config space 
fields only.


>
>>
>> 1) provision vDPA device with all features that are supported by the
>>      net simulator
>>
>> # vdpa dev add name dev1 mgmtdev vdpasim_net
>> # vdpa dev config show
>> dev1: mac 00:00:00:00:00:00 link up link_announce false mtu 1500
>>     negotiated_features MTU MAC CTRL_VQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM
>>
>> Maybe not in this patch, but for completeness for the whole series,
>> could we also add device_features to the output?
>>
>> Lingshan, could you please share your thoughts or patch on this?
>>
>> Noted here the device_features argument specified during vdpa creation is introduced by this series itself, it somehow slightly changed the original semantics of what device_features used to be.
> I'm not sure I get this, we don't support device_features in the past
> and it is used to provision device features to the vDPA device which
> seems to be fine.
Before this change, looking at dev_features in "mgmtdev show" and 
remembering the creation parameters was sufficient to get all the info 
needed for creating the vDPA device at the destination. After this 
change, dev_features in "mgmtdev show" becomes less relevant, as mgmt 
software would need to remember the vdpa creation parameters plus the 
device_features attribute. While this series allows cross-vendor live 
migration, it also complicates the implementation of mgmt software.

>
>>
>> When simply look at the "vdpa dev config show" output, I cannot really
>> tell the actual device_features that was used in vdpa creation. For e.g.
>> there is a missing feature ANY_LAYOUT from negotiated_features compared
>> with supported_features in mgmtdev, but the orchestration software
>> couldn't tell if the vdpa device on destination host should be created
>> with or without the ANY_LAYOUT feature.
>>
>> I think VERSION_1 implies ANY_LAYOUT.
>>
>> Right, ANY_LAYOUT is a bad example. A good example might be that, I knew the parent mgmtdev on migration source node supports CTRL_MAC_ADDR, but I don't find it in negotiated_features.
> I think we should use the features that we got from "mgmtdev show"
> instead of "negotiated features".
That was how it was supposed to work previously, but with this series I 
think the newly introduced device_features will be needed instead of the 
one in "mgmtdev show".

>
>> On the migration destination node, the parent device does support all features as the source offers, including CTRL_MAC_ADDR. What device features you would expect the mgmt software to create destination vdpa device with, if not otherwise requiring mgmt software to remember all the arguments on device creation?
> So in this example, we need use "dev_features" so we get exact the
> same features after and operation as either src or dst.
If the device_features the vDPA device was created with at the source 
doesn't include CTRL_MAC_ADDR even though the parent supports it, then 
the vDPA device created at the destination shouldn't come with 
CTRL_MAC_ADDR either, regardless of whether CTRL_MAC_ADDR is present in 
the destination "mgmtdev show".

However, by looking only at negotiated_features, mgmt software 
implementations that don't persist the creation parameters can't 
recover the device features a given vDPA device at the source node was 
created with.
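
A small sketch of why negotiated_features cannot stand in for 
device_features. It uses a simplified model in which the driver acks 
exactly those offered bits it understands; negotiate() is a hypothetical 
helper, and the feature values are taken from the examples in this 
thread.

```c
#include <stdint.h>

#define F_CTRL_MAC_ADDR	(1ULL << 23)	/* VIRTIO_NET_F_CTRL_MAC_ADDR */

/*
 * Simplified negotiation model (an assumption for illustration): the
 * result is the intersection of what the device offers and what the
 * driver understands.
 */
static uint64_t negotiate(uint64_t device_features, uint64_t driver_supported)
{
	return device_features & driver_supported;
}
```

If the guest driver does not ack CTRL_MAC_ADDR, a device created with 
0x308820028 and one created with 0x308020028 (the same value minus 
CTRL_MAC_ADDR) both end up with negotiated features 0x300020028, so the 
value used at creation cannot be told apart afterwards.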

>
>> SOURCE# vdpa mgmtdev show
>> vdpasim_net:
>>     supported_classes net
>>     max_supported_vqs 3
>>     dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 ACCESS_PLATFORM
>> SOURCE# vdpa dev config show
>> dev1: mac 00:00:00:00:00:00 link up link_announce false mtu 1500
>>     negotiated_features MTU MAC CTRL_VQ VERSION_1 ACCESS_PLATFORM
>>
>> DESTINATION# vdpa mgmtdev show
>> vdpasim_net:
>>     supported_classes net
>>     max_supported_vqs 3
>>     dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 ACCESS_PLATFORM
>>
>>   But it should be sufficient to
>> use features_src & feature_dst in this case. Actually, it should work
>> similar as to the cpu flags, the management software should introduce
>> the concept of cluster which means the maximal set of common features
>> is calculated and provisioned during device creation to allow
>> migration among the nodes inside the cluster.
>>
>> Yes, this is one way mgmt software may implement, but I am not sure if it's the only way. For e.g. for cpu flags, mgmt software can infer the guest cpus features in use from all qemu command line arguments and host cpu features/capability, which doesn't need to remember creation arguments and is easy to recover from failure without having to make the VM config persistent in data store. I thought it would be great if vdpa CLI design could offer the same.
> One minor difference is that we have cpu model abstraction, so we can
> have things like:
>
> ./qemu-system-x86_64 -cpu EPYC
>
> Which implies the cpu features/flags where vDPA doesn't have. But
> consider it's just a 64bit (or 128 in the future), it doesn't seems to
> be too complex for the management to know, we probably need to start
> from this and then we can try to introduce some generation/model after
> it is agreed on most of the vendors.
What you refer to is the so-called named model for CPU flags. I think 
defining some generation or named model for vDPA would be a good 
addition. But I don't see how it relates to exposing the actual value 
of the device features. Are you saying that in this case you'd rather 
expose the model name than the actual value of the feature bits? 
I think we can expose both in different fields when there's really such 
a need.

BTW, with regard to the CPU model in mgmt software, the one implemented 
in libvirt is a mixed "Host model" [1]: it takes a QEMU named model and 
adds individual CPU features on top to get close to what the host CPU 
offers. I think this implies that mgmt software has to understand what 
a model name really means in terms of individual CPU features, so 
having the feature bit value exposed will only help if vDPA goes the 
same way.
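
As an illustration of what mgmt software can do once the feature bit 
values are exposed, the cluster baseline mentioned earlier in the thread 
could be computed roughly as below. This is only a sketch; 
cluster_common_features() is a hypothetical helper, and its input is 
assumed to be the dev_features value read from "vdpa mgmtdev show" on 
each node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: the largest feature set that every node in the
 * cluster can provision, i.e. the intersection of the per-node supported
 * features. Returns 0 for an empty cluster.
 */
static uint64_t cluster_common_features(const uint64_t *node_features,
					size_t n)
{
	uint64_t common = ~0ULL;
	size_t i;

	assert(node_features != NULL || n == 0);
	if (n == 0)
		return 0;
	for (i = 0; i < n; i++)
		common &= node_features[i];
	return common;
}
```

The result would then be passed as device_features when creating the 
vDPA device on any node, so that the device can migrate within the 
cluster.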


Regards,
-Siwei

[1] 
https://qemu-project.gitlab.io/qemu/system/qemu-cpu-models.html#two-ways-to-configure-cpu-models-with-qemu-kvm

>
> Thanks
>
>> Thanks,
>> -Siwei
>>
>>
>> Thanks
>>
>> Thanks,
>> -Siwei
>>
>>
>> 2) provision vDPA device with a subset of the features
>>
>> # vdpa dev add name dev1 mgmtdev vdpasim_net device_features 0x300020000
>> # vdpa dev config show
>> dev1: mac 00:00:00:00:00:00 link up link_announce false mtu 1500
>>     negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM
>>
>> Reviewed-by: Eli Cohen<elic@nvidia.com>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>> ---
>>    drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 11 ++++++++++-
>>    1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
>> index 886449e88502..a9ba02be378b 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
>> @@ -254,6 +254,14 @@ static int vdpasim_net_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
>>        dev_attr.work_fn = vdpasim_net_work;
>>        dev_attr.buffer_size = PAGE_SIZE;
>>
>> +     if (config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) {
>> +             if (config->device_features &
>> +                 ~dev_attr.supported_features)
>> +                     return -EINVAL;
>> +             dev_attr.supported_features &=
>> +                      config->device_features;
>> +     }
>> +
>>        simdev = vdpasim_create(&dev_attr);
>>        if (IS_ERR(simdev))
>>                return PTR_ERR(simdev);
>> @@ -294,7 +302,8 @@ static struct vdpa_mgmt_dev mgmt_dev = {
>>        .id_table = id_table,
>>        .ops = &vdpasim_net_mgmtdev_ops,
>>        .config_attr_mask = (1 << VDPA_ATTR_DEV_NET_CFG_MACADDR |
>> -                          1 << VDPA_ATTR_DEV_NET_CFG_MTU),
>> +                          1 << VDPA_ATTR_DEV_NET_CFG_MTU |
>> +                          1 << VDPA_ATTR_DEV_FEATURES),
>>        .max_supported_vqs = VDPASIM_NET_VQ_NUM,
>>        .supported_features = VDPASIM_NET_FEATURES,
>>    };
>>
>>


