From: Yunsheng Lin <linyunsheng@huawei.com>
To: Parav Pandit <parav@nvidia.com>,
	"dsahern@gmail.com" <dsahern@gmail.com>,
	"stephen@networkplumber.org" <stephen@networkplumber.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Cc: Jiri Pirko <jiri@nvidia.com>,
	"moyufeng@huawei.com" <moyufeng@huawei.com>,
	"linuxarm@openeuler.org" <linuxarm@openeuler.org>
Subject: Re: [PATCH RESEND iproute2-next] devlink: Add optional controller user input
Date: Tue, 8 Jun 2021 11:27:51 +0800
Message-ID: <8c3e48ce-f5ed-d35d-4f5e-1b572f251bd1@huawei.com>
In-Reply-To: <PH0PR12MB5481FB8528A90E34FA3578C1DC389@PH0PR12MB5481.namprd12.prod.outlook.com>

On 2021/6/7 19:12, Parav Pandit wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Sent: Monday, June 7, 2021 4:27 PM
>>

[..]

>>>
>>>> 2. each PF's devlink instance has three types of port, which is
>>>>    FLAVOUR_PHYSICAL, FLAVOUR_PCI_PF and FLAVOUR_PCI_VF(supposing I
>>>>    understand port flavour correctly).
>>>>
>>> FLAVOUR_PCI_{PF,VF,SF} belongs to eswitch (representor) side on
>>> switchdev device.
>>
>> If devlink instance or eswitch is in DEVLINK_ESWITCH_MODE_LEGACY mode,
>> the FLAVOUR_PCI_{PF,VF,SF} port instance does not need to be created?
> No. In eswitch legacy mode, there are no representor netdevices or devlink ports.

It seems each devlink port instance corresponds to a netdevice.
More specifically, the devlink instance is created in the
struct pci_driver's probe function of a PCI function, and a devlink
port instance is created and registered to that devlink instance
when a netdev of that PCI function is created?

As in diagram [1], the devlink port instance (flavour FLAVOUR_PHYSICAL)
for ctrl-0-pf0 is created when the netdev of ctrl-0-pf0 is created in
the host of smartNIC, and the devlink port instance (flavour FLAVOUR_VIRTUAL)
for ctrl-0-pf0vfN is created when the netdev of ctrl-0-pf0vfN is created
in the host of smartNIC, right?

When eswitch mode is set to DEVLINK_ESWITCH_MODE_SWITCHDEV, the representor
netdev for PF/VF in "controller_num=1" is created in the host of smartNIC,
so is the devlink port instance (FLAVOUR_PCI_{PF,VF,SF}) corresponding to that
representor netdev just created in the host of smartNIC? More specifically,
a devlink port instance (flavour FLAVOUR_PCI_PF) for ctrl-1-pf0 and a devlink
port instance (flavour FLAVOUR_PCI_VF) for ctrl-1-pf0vfN?
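
The question above can be made concrete with a toy model. The sketch below
is not kernel code and only encodes one reading of diagram [1]: the class and
function names are made up for illustration, and the name format (a
"c<controller>" prefix on ports that serve a function of a non-zero
controller) is an assumption, not something confirmed in this thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Flavour(Enum):
    # Names mirror the representor flavours mentioned in this thread.
    PCI_PF = "pcipf"
    PCI_VF = "pcivf"


@dataclass(frozen=True)
class Port:
    """Hypothetical stand-in for one representor port on the eswitch host."""
    flavour: Flavour
    controller: int = 0
    pf: int = 0
    vf: Optional[int] = None

    def name(self) -> str:
        # Assumption: ports for a non-zero controller carry a "c<N>" prefix.
        prefix = f"c{self.controller}" if self.controller != 0 else ""
        if self.flavour is Flavour.PCI_VF:
            return f"{prefix}pf{self.pf}vf{self.vf}"
        return f"{prefix}pf{self.pf}"


def representor_ports(mode: str, controller: int, pf: int, num_vfs: int) -> List[Port]:
    """Ports created on the eswitch host for one PF and its VFs.

    Legacy mode yields no representor ports, as stated in the quoted reply.
    """
    if mode != "switchdev":
        return []
    ports = [Port(Flavour.PCI_PF, controller, pf)]
    ports += [Port(Flavour.PCI_VF, controller, pf, vf) for vf in range(num_vfs)]
    return ports
```

In this model, switching the eswitch on controller 0 to "switchdev" is the
only thing that makes the ctrl-1 ports appear, and they all live on the
smartNIC host regardless of which controller they serve.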

When "controller_num=1" is plugged into a server, does the server host
create a devlink instance and devlink port instances in the host of server,
similar to ctrl-0-pf0 and ctrl-0-pf0vfN in the host of smartNIC?

> 
>>
>>>
>>>> If I understand above correctly, all ports in the same devlink
>>>> instance should have the same controller id, right? If yes, why not
>>>> put the controller id in the devlink instance?
>>> Need not be. All PCI_{PF,VF,SF} can have controller id different for
>>> different controllers.
>>
>> The point is that two VFs from different PFs may be in different hosts,
>> but all VFs of a specific PF need to be in the same host, right?
>> Otherwise it may break the PCI enumeration process?
>>
> Sure. VFs belong to a PF, a PF belongs to a controller, and a controller is plugged into a host root complex.
> 
>> If yes, as PCI_{PF,VF,SF} belongs to eswitch (representor) side on switchdev
>> device(which means PCI_{PF,VF,SF} port instance is in the same host, as the
>> host corresponding to "controller_num=0" in diagram [1]), so it seems all the
>> PCI_{PF,VF,SF} of a specific PF should have the same controller id, 
> Yes.
> 
>> and using
>> a controller id of the devlink instance in "controller_num=0" in diagram [1]
>> seems enough?
> Yes.
> 
>>
>>> Usually each multi-host is a different controller.
>>> Refer to this diagram [1] and detailed description.
>>
>> devlink instance does not exist in the host corresponding to
>> "controller_num=1" in diagram [1]?
> Devlink instances do exist for controller=1 related PCI PF,VF,SF devices when those functions are plugged in the host.

> 
>> Or devlink instance does exist in the host corresponding to
>> "controller_num=1", but the mode of that devlink instance is
>> DEVLINK_ESWITCH_MODE_LEGACY in diagram [1]?
> As you can see that eswitch is located only on controller=0.
> This eswitch is serving PF, VF, SFs of controller=1 + controller=0 as well.

How do we decide where the eswitch is located? Through some fw/hw
configuration?

It seems that if the eswitch is enabled on "controller=1", that is
a nested eswitch too, like the one you mentioned below?

>>
>> Also, eswitch mode can only be set on the devlink instance corresponding to
>> PF, but not for VF/SF(supposing that VF/SF could have its own devlink
>> instance too), right?
> Yes. Eswitch can be located on the VF too. Mlx5 driver doesn't have it yet on VF.
> This may be some nested eswitch in future. I do not know when.
> 
>> by the network/sysadmin.
>>> While devlink instance of a given PF,VF,SF is managed by the user of such
>>> function itself.
>>
>> 'devlink port function' means "struct devlink_port", right?
> 'function' is the object managing the function connected on the other side of this port.
> This includes its hw_addr, rate, state, operational state.

Does "other side of this port" mean the PCI function that has most
likely been passed through to a VM?

"devlink port" without the "function" represents the representor
netdev on the host where eswitch is located?
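
As a way of picturing the split being asked about, here is a minimal Python
sketch in which the port is the object on the eswitch host and the function
is the object behind it. All names are hypothetical; the attributes follow
the list in the quoted reply (hw_addr, state, operational state; rate is
left out), and the "active"/"inactive" and "attached"/"detached" values are
assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PortFunction:
    """Hypothetical model of the function behind a port (e.g. a VF in a VM)."""
    hw_addr: str = "00:00:00:00:00:00"
    state: str = "inactive"    # assumed: set by the admin on the eswitch host
    opstate: str = "detached"  # assumed: changes once a driver binds the function


@dataclass
class Port:
    """Hypothetical model of a representor port on the eswitch host."""
    name: str
    function: Optional[PortFunction] = None


def configure_function(port: Port, hw_addr: str) -> PortFunction:
    """Admin-side step: give the function a MAC address and activate it."""
    if port.function is None:
        port.function = PortFunction()
    port.function.hw_addr = hw_addr
    port.function.state = "active"
    return port.function
```

Here the admin only touches the port and its function from the eswitch host;
nothing in the sketch reaches into the devlink instance that the VF's own
user would see inside the VM.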

> 
>> It seems 'devlink port function' in the host is representing a VF when devlink
>> instance of that VF is in the VM?
> Right.
>>
>>> For example when a VF is mapped to a VM, devlink instance of this VF
>>> resides in the VM managed by the guest VM.
>>
>> Does the user in VM really care about devlink info or configuration when
>> network/sysadmin has configured the VF through 'devlink port function'
>> in the host?
> Yes. devlink instance offers many knobs in uniform way on PF, VF, SF.
> They are in use in mlx5 for devlink params, reload, net ns.

"net ns" refers to "net namespace", right?
I am not sure yet how a devlink instance is related to a net namespace.
I thought devlink is not limited to networking, and can be used for
PCIe devices other than ethernet devices?

> 
>> which devlink info or configuration does user need to query or configure in a
>> VM?
> Usually not much.
> A few examples of what mlx5 users do with the devlink instance of a VF in a VM are devlink params, devlink reload, board info, health reporters and health recovery, to name a few.
> 

