All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: Parav Pandit <parav@nvidia.com>,
	"virtualization@lists.linux-foundation.org"
	<virtualization@lists.linux-foundation.org>
Cc: Eli Cohen <elic@nvidia.com>, "mst@redhat.com" <mst@redhat.com>
Subject: Re: [PATCH linux-next v3 2/6] vdpa: Introduce query of device config layout
Date: Thu, 24 Jun 2021 15:05:05 +0800	[thread overview]
Message-ID: <4e0708fb-34e3-471d-ca98-44c75f693b32@redhat.com> (raw)
In-Reply-To: <PH0PR12MB54812DD2DADAD0897E24CFA7DC079@PH0PR12MB5481.namprd12.prod.outlook.com>


在 2021/6/24 下午2:29, Parav Pandit 写道:
>
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Thursday, June 24, 2021 11:13 AM
>>
>> 在 2021/6/23 下午12:22, Parav Pandit 写道:
>>>> From: Jason Wang <jasowang@redhat.com>
>>>> Sent: Wednesday, June 23, 2021 9:39 AM
>>>>
>>>> 在 2021/6/22 下午10:03, Parav Pandit 写道:
>>>>>> Is it better to use a separate enum for net specific attributes?
>>>>>>
>>>>> Yes, because they are only net specific.
>>>>> I guess it is related to your below question.
>>>>>
>>>>>> Another question (sorry if it has been asked before). Can we simply
>>>>>> return the config (binary) to the userspace, then usespace can use
>>>>>> the existing uAPI like virtio_net_config plus the feature to
>>>>>> explain the
>>>> config?
>>>>> We did discuss in v2.
>>>>> Usually returning the whole blob and parsing is not desired via netlink.
>>>>> Returning individual fields give the full flexibility to return only
>>>>> the valid
>>>> fields.
>>>>> Otherwise we need to implement another bitmask too to tell which
>>>>> fields
>>>> from the struct are valid and share with user space.
>>>>> Returning individual fields is the widely used approach.
>>>> The main concerns are:
>>>>
>>>> 1) The blob will be self contained if it was passed with the
>>>> negotiated features, so we don't need bitmask.
>>> Which fields of the struct are valid is told by additional fields.
>>>> 2) Using individual fields means it must duplicate the config fields
>>>> of every virtio devices
>>>>
>>> Mostly no. if there are common config fields across two device types,
>>> they would be named as
>>> VDPA_ATTR_DEV_CFG_*
>>> Net specific will be,
>>> VDPA_ATTR_DEV_NET_CFG_*
>>> Block specific, will be,
>>> VDPA_ATTR_DEV_BLK_CFG_*
>>
>> I meant it looks like VDPA_ATTR_DEV_NET will duplicate all the fields of:
>>
>> struct virtio_net_config;
>>
>> And VDPA_ATTR_DEV_BLOCK will duplicate all the fields of
>>
>> struct virtio_blk_config; which has ~21 fields.
>>
>> And we had a plenty of other types of virtio devices.
>>
>> Consider we had a mature set of virtio specific uAPI for config space.
>> It would be a burden if we need an unnecessary translation layer of netlink in
>> the middle:
>>
>> [vDPA parent (virtio_net_config)] <-> [netlink (VDPA_ATTR_DEV_NET_XX)]
>> <-> [userspace (VDPA_ATTR_DEV_NET_XX)]
>>> <-> [ user (virtio_net_config)]
> This translation is not there. We show relevant net config fields as VDPA_ATTR_DEV_NET individually.
> It is not a binary dump which is harder for users to parse and make any use of it.


The is done implicitly, user needs to understand the semantic of 
virtio_net_config and map the individual fields to the vdpa tool 
sub-command.


>
> It is only one level of translation from virtio_net_config (kernel) -> netlink vdpa fields.
> It is similar to 'struct netdevice' -> rtnl info fields.


I think not, the problem is that the netdevice is not a part of uAPI but 
virtio_net_config is.


>
>> If we make netlink simply a transport, it would be much easier. And we had
>> the chance to unify the logic of build_config() and set_config() in the driver.
> How? We need bit mask to tell that out of 21 fields which fields to update and which not.
> And that is further mixed with offset and length.


So set_config() could be called from userspace, so did build_config(). 
The only difference is:

1) they're using different transport, ioctl vs netlink
2) build_config() is only expected to be called by the management tool

If qemu works well via set_config ioctl, netlink should work as well.

Btw, what happens if management tool tries to modify the mac of vDPA 
when the device is already used by the driver?


>
>>
>>>> And actually, it's not the binary blob since uapi clearly define the
>>>> format (e.g struct virtio_net_config), can we find a way to use that?
>>>> E.g introduce device/net specific command and passing the blob with
>>>> length and negotiated features.
>>> Length may change in future, mostly expand. And parsing based on length
>> is not such a clean way.
>>
>>
>> Length is only for legal checking. The config is self contained with:
>>
> Unlikely. When structure size increases later, the parsing will change based on the length.
> Because older kernel would return shorter length with older iproute2 tool.


This is fine since the older kernel only support less features. The only 
possible issue if the old iproute 2 runs on new kernel. With the current 
proposal, it may cause some config fields can't not be showed.

I think it might be useful to introduce a command to simply dump the 
config space.


> So user space always have to deal and have nasty parsing/typecasting based on the length.


That's how userspace (Qemu) is expected to work now. The userspace 
should determine the semantic of the fields based on the features.

Differentiate config fields doesn't help much, e.g userspace still need 
to differ LINK_UP and ANNOUNCE for the status field.


>
>> 1) device id
>> 2) features
>>
>>
>>> Parsing fields require knowledge of features as well and application needs
>> to make multiple netlink calls to parse the config space.
>>
>>
>> I think we don't care about the performance in this case. It's about three
>> netlink calls:
>>
> Its not about performance. By the time 1st call is made, features got updated and it is out of sync with config.
>
>> 1) get config
>> 2) get device id
>> 3) get features
>>
> This requires using features from 3rd netlink output to decode output of 1st netlink output.
> Which is a bit odd of netlink.
> Other netlink nla_put() probably sending whole structure doesn’t need to do it.


Well, we can pack them all into a single skb isn't it? (probably with a 
config len).


>
>> For build config, it's only one
>>
>> 1) build config
>>
>>
>>> I prefer to follow rest of the kernel style to return self contained
>> invidividual fields.
>>
>>
>> But I saw a lot of kernel codes choose to use e.g nla_put() directly with
>> module specific structure.
>>
> It might be self-contained structure that probably has not found the need to expand.


I think it's just a matter of putting the config length with the config 
data. Note that we've already had .get_config_size() ops for validating 
inputs through VHOST_SET_CONFIG/VHOST_GET_CONFIG.

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

  reply	other threads:[~2021-06-24  7:05 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-16 19:11 [PATCH linux-next v3 0/6] vdpa: enable user to set mac, mtu Parav Pandit
2021-06-16 19:11 ` [PATCH linux-next v3 1/6] vdpa: Introduce and use vdpa device get, set config helpers Parav Pandit
2021-06-22  7:08   ` Jason Wang
2021-06-16 19:11 ` [PATCH linux-next v3 2/6] vdpa: Introduce query of device config layout Parav Pandit
2021-06-22  7:20   ` Jason Wang
2021-06-22 14:03     ` Parav Pandit
2021-06-23  4:08       ` Jason Wang
2021-06-23  4:22         ` Parav Pandit
2021-06-24  5:43           ` Jason Wang
2021-06-24  6:29             ` Parav Pandit
2021-06-24  7:05               ` Jason Wang [this message]
2021-06-24  7:59                 ` Parav Pandit
2021-06-25  3:28                   ` Jason Wang
2021-06-25  6:45                     ` Parav Pandit
2021-06-28  5:03                       ` Jason Wang
2021-06-28 10:56                         ` Parav Pandit
2021-06-29  3:52                           ` Jason Wang
2021-06-29  9:49                             ` Parav Pandit
2021-06-30  4:31                               ` Jason Wang
2021-06-30  6:03                                 ` Parav Pandit
2021-07-01  3:34                                   ` Jason Wang
2021-07-01  7:00                                     ` Parav Pandit
2021-07-01  7:43                                       ` Jason Wang
2021-07-02  6:04                                         ` Parav Pandit
2021-07-05  4:35                                           ` Jason Wang
2021-07-06 17:07                                             ` Parav Pandit
2021-07-07  4:03                                               ` Jason Wang
2021-06-28 22:39                         ` Michael S. Tsirkin
2021-06-29  3:41                           ` Jason Wang
2021-06-29 20:01                             ` Michael S. Tsirkin
2021-06-30  3:46                               ` Jason Wang
2021-06-16 19:11 ` [PATCH linux-next v3 3/6] vdpa: Enable user to set mac and mtu of vdpa device Parav Pandit
2021-06-22  7:43   ` Jason Wang
2021-06-22 14:09     ` Parav Pandit
2021-06-16 19:11 ` [PATCH linux-next v3 4/6] vdpa_sim_net: Enable user to set mac address and mtu Parav Pandit
2021-06-16 19:11 ` [PATCH linux-next v3 5/6] vdpa/mlx5: Support configuration of MAC Parav Pandit
2021-06-16 19:11 ` [PATCH linux-next v3 6/6] vdpa/mlx5: Forward only packets with allowed MAC address Parav Pandit
2021-08-05  9:57 ` [PATCH linux-next v3 0/6] vdpa: enable user to set mac, mtu Michael S. Tsirkin
2021-08-05 10:13   ` Parav Pandit via Virtualization
2021-08-05 12:05     ` Michael S. Tsirkin
2021-08-06  2:50   ` Jason Wang
2021-08-06  8:42     ` Michael S. Tsirkin
2021-08-06  8:55       ` Parav Pandit via Virtualization
2021-08-09  3:07         ` Jason Wang
2021-08-09  3:13           ` Parav Pandit via Virtualization
2021-08-09  3:29             ` Jason Wang
     [not found]           ` <20210809052121.GA209158@mtl-vdi-166.wap.labs.mlnx>
2021-08-09  5:42             ` Parav Pandit via Virtualization
     [not found]               ` <20210809055748.GA210406@mtl-vdi-166.wap.labs.mlnx>
2021-08-09  6:01                 ` Parav Pandit via Virtualization
     [not found]                   ` <20210809060746.GA210718@mtl-vdi-166.wap.labs.mlnx>
2021-08-09  6:10                     ` Parav Pandit via Virtualization
2021-08-09  7:05                       ` Jason Wang
2021-08-16 20:51                         ` Michael S. Tsirkin
2021-08-09  9:40         ` Michael S. Tsirkin
2021-08-09  9:51           ` Parav Pandit via Virtualization
2021-08-16 20:54             ` Michael S. Tsirkin
2021-08-18  3:14               ` Parav Pandit via Virtualization
2021-08-18  4:31                 ` Jason Wang
2021-08-18  4:36                   ` Parav Pandit via Virtualization
2021-08-19  4:18                     ` Jason Wang
2021-08-18 17:33                   ` Michael S. Tsirkin
2021-08-19  4:22                     ` Jason Wang
2021-08-19  5:23                       ` Parav Pandit via Virtualization
2021-08-19  7:15                         ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4e0708fb-34e3-471d-ca98-44c75f693b32@redhat.com \
    --to=jasowang@redhat.com \
    --cc=elic@nvidia.com \
    --cc=mst@redhat.com \
    --cc=parav@nvidia.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.