netdev.vger.kernel.org archive mirror
From: Parav Pandit <parav@nvidia.com>
To: Yunsheng Lin <linyunsheng@huawei.com>,
	"dsahern@gmail.com" <dsahern@gmail.com>,
	"stephen@networkplumber.org" <stephen@networkplumber.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Cc: Jiri Pirko <jiri@nvidia.com>,
	"moyufeng@huawei.com" <moyufeng@huawei.com>,
	"linuxarm@openeuler.org" <linuxarm@openeuler.org>
Subject: RE: Re: [PATCH RESEND iproute2-next] devlink: Add optional controller user input
Date: Wed, 9 Jun 2021 09:24:03 +0000
Message-ID: <DM8PR12MB5480BE54D27770DEB39EA009DC369@DM8PR12MB5480.namprd12.prod.outlook.com>
In-Reply-To: <4e696fd6-3c7b-b48c-18da-16aa57da4d54@huawei.com>



> From: Yunsheng Lin <linyunsheng@huawei.com>
> Sent: Tuesday, June 8, 2021 3:02 PM
> 
> On 2021/6/8 16:47, Parav Pandit wrote:
> >> From: Yunsheng Lin <linyunsheng@huawei.com>
> >> Sent: Tuesday, June 8, 2021 1:06 PM
> >>
> >> On 2021/6/8 13:26, Parav Pandit wrote:
> >>>> From: Yunsheng Lin <linyunsheng@huawei.com>
> >>>> Sent: Tuesday, June 8, 2021 8:58 AM
> >>>>
> >>>> On 2021/6/7 19:12, Parav Pandit wrote:
> >>>>>> From: Yunsheng Lin <linyunsheng@huawei.com>
> >>>>>> Sent: Monday, June 7, 2021 4:27 PM
> >>>>>>
> >>>>
> >>>> [..]
> >>>>
> >>>>>>>
> >>>>>>>> 2. each PF's devlink instance has three types of port, which are
> >>>>>>>>    FLAVOUR_PHYSICAL, FLAVOUR_PCI_PF and FLAVOUR_PCI_VF (supposing I
> >>>>>>>>    understand port flavour correctly).
> >>>>>>>>
> >>>>>>> FLAVOUR_PCI_{PF,VF,SF} belong to the eswitch (representor) side on a
> >>>>>>> switchdev device.
> >>>>>>
> >>>>>> If the devlink instance or eswitch is in DEVLINK_ESWITCH_MODE_LEGACY
> >>>>>> mode, the FLAVOUR_PCI_{PF,VF,SF} port instances do not need to be
> >>>>>> created?
> >>>>> No. In eswitch legacy mode, there are no representor netdevices or
> >>>>> devlink ports.
> >>>>
> >>>> It seems each devlink port instance corresponds to a netdevice.
> >>>> More specifically, the devlink instance is created in the struct
> >>>> pci_driver's probe function of a pci function, and a devlink port
> >>>> instance is created and registered to that devlink instance when a
> >>>> netdev of that pci function is created?
> >>>>
> >>> Yes.
> >>>
> >>>> As in diagram [1], the devlink port instance(flavour
> >>>> FLAVOUR_PHYSICAL) for
> >>>> ctrl-0-pf0 is created when the netdev of ctrl-0-pf0 is created in
> >>>> the host of smartNIC, the devlink port instance(flavour
> >>>> FLAVOUR_VIRTUAL) for ctrl-0-pf0vfN is created when the netdev of
> >>>> ctrl-0-pf0vfN is created in the host of smartNIC, right?
> >>>>
> >>> Ctrl-0-pf0vfN and ctrl-0-pf0 ports are eswitch ports. They are created
> >>> where there is an eswitch, usually in the smartNIC, where the eswitch
> >>> is located.
> >>
> >> Does the diagram in [1] correspond to the multi-host (two-host) setup
> >> mentioned previously?
> >> H1.pf0.physical_port = p0.
> >> H1.pf1.physical_port = p1.
> >> H2.pf0.physical_port = p0.
> >> H2.pf1.physical_port = p1.
> >>
> > Yes.
> >
> >> Let's say H1 = server and H2 = smartNIC as the pci rc connected to below:
> >>                  ---------------------------------------------------------
> >>                  |                                                       |
> >>                  |           --------- ---------         ------- ------- |
> >>     -----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
> >>     | server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
> >>     | pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
> >>     | connect |  | -------                       -------                 |
> >>     -----------  |     | controller_num=1 (no eswitch)                   |
> >>                  ------|--------------------------------------------------
> >>                  (internal wire)
> >>                        |
> >>                  ---------------------------------------------------------
> >>                  | devlink eswitch ports and reps                        |
> >>                  | ----------------------------------------------------- |
> >>                  | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
> >>                  | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
> >>                  | ----------------------------------------------------- |
> >>                  | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
> >>                  | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
> >>                  | ----------------------------------------------------- |
> >>                  |                                                       |
> >>                  |                                                       |
> >>     -----------  |           --------- ---------         ------- ------- |
> >>     | smartNIC|  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
> >>     | pci rc  |==| -------   ----/---- ---/----- ------- ---/--- ---/--- |
> >>     | connect |  | | pf0 |______/________/       | pf1 |___/_______/     |
> >>     -----------  | -------                       -------                 |
> >>                  |                                                       |
> >>                  |  local controller_num=0 (eswitch)                     |
> >>                  ---------------------------------------------------------
> >>
> >> A vanilla kernel can run on the smartNIC host, right?
> > Right.
> >
> >> What the smartNIC host sees is two PFs corresponding to ctrl-0-pf0 and
> >> ctrl-0-pf1 when the kernel has just booted and the mlx driver is not
> >> loaded yet, right?
> >>
> >> I am not sure it is ok to leave out the VF and SF, but let's leave
> >> them out for simplicity now.
> >> When the mlx driver is loaded, two devlink instances are created, which
> >> correspond to ctrl-0-pf0 and ctrl-0-pf1, and two devlink port
> >> instances (flavour FLAVOUR_PHYSICAL) are created and registered to
> >> the corresponding devlink instances just created, right?
> >>
> >> As the eswitch mode is based on the devlink instance, let's only set the
> >> mode of ctrl-0-pf0's devlink instance to
> >> DEVLINK_ESWITCH_MODE_SWITCHDEV. Is the representor netdev of ctrl-1-pf0
> >> then created, and the devlink port instance of that representor netdev
> >> created and registered to the devlink instance corresponding to ctrl-0-pf0?
> >>
> >> I think I am missing something here; the above does not seem right,
> >> because:
> >> 1. For the single-host case: the PF is not passed through to the VM, so
> >>    the devlink port instance of a VF's representor netdev can be
> >>    registered to the devlink instance corresponding to its PF, right?
> > Yes, if I understand your question right.
> >
> >> 2. But for the two-host case as above, do we need to create a devlink
> >>    instance for the PF corresponding to ctrl-1-pf0 in the smartNIC host?
> > You can choose not to create a devlink instance in the external controller PF.
> > There may not even be a Linux OS running there.
> >
> > I read the questions a few more times, but I find it hard to understand
> > what you really want to ask.
> > I am not sure I understood you.
> >
> > Trying again,
> >
> > The model is really very straightforward, as visible in the diagram.
> >
> > There is one PF that has the eswitch. Eswitch contains representor ports.
> 
> I thought the representor ports of a PF's eswitch are decided by the functions
> under a specific PF (for example, the PF itself and the VFs under this PF)?

The eswitch is not per PF in the context of smartNIC/multi-host.
One PF _has_ the eswitch, and that eswitch contains the representor ports for PFs, VFs, and SFs.

> 
> > Each representor port represents either a PF, VF or SF.
> > This PF, VF or SF can belong to the local controller residing on the
> > eswitch device, or it can belong to an external controller.
> > Here external controller = 1.
> 
> If I understood the above correctly:
> The fw/hw decides which PF has the eswitch, and how many
> devlink/representor ports this eswitch has?
The number of ports is dynamic. When new SFs/VFs are created, ports get added to the eswitch.
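
To make the dynamic part concrete, here is a minimal Python sketch. It is a toy model, not driver code: the `Eswitch` class and its `function_created()`/`function_destroyed()` methods are hypothetical names chosen for illustration, and the assumption that a port disappears when its function is destroyed is mine. It only shows the port set following the set of functions.

```python
class Eswitch:
    """Toy model (hypothetical): the port set follows the functions that exist."""

    def __init__(self):
        self.ports = set()

    def function_created(self, controller, pf, kind, num):
        # kind is "vf" or "sf"; a representor port appears for the new function.
        port = (controller, pf, kind, num)
        self.ports.add(port)
        return port

    def function_destroyed(self, port):
        # Assumption: the representor port goes away together with the function.
        self.ports.discard(port)


esw = Eswitch()
sf = esw.function_created(controller=1, pf=0, kind="sf", num=88)
vf = esw.function_created(controller=0, pf=0, kind="vf", num=0)
print(len(esw.ports))   # port count after two functions were created
esw.function_destroyed(sf)
print(len(esw.ports))   # port count after the SF was destroyed
```

The port count is therefore not a fixed property of the eswitch; it changes whenever a function is created or destroyed.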

> Suppose PF0 of controller_num=0 in have the eswitch, and the eswitch may
> has devlink/representor port representing other PF, like PF1 in
> controller_num=0, and even PF0/PF1 in controller_num=1?
Yes. Correct.
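
As a rough illustration of that layout, the sketch below models a single eswitch whose ports represent functions of both controllers. Everything in it is an assumption made for illustration: the `RepPort` class, the `LOCAL_CONTROLLER` constant, and the labels, which follow the diagram's "ctrl-N pfX" style rather than any name the kernel reports.

```python
from dataclasses import dataclass
from typing import Optional

LOCAL_CONTROLLER = 0  # controller that hosts the eswitch, as in the diagram


@dataclass(frozen=True)
class RepPort:
    """Hypothetical representor port on the single eswitch."""
    controller: int
    pf: int
    kind: str = "pf"            # "pf", "vf" or "sf"
    num: Optional[int] = None   # VF/SF number; None for a PF port

    @property
    def external(self) -> bool:
        # A port is external when its function lives on another controller.
        return self.controller != LOCAL_CONTROLLER

    def label(self) -> str:
        # Diagram-style label, not a kernel-generated port name.
        suffix = "" if self.kind == "pf" else f"{self.kind}{self.num}"
        return f"ctrl-{self.controller}-pf{self.pf}{suffix}"


# All of these ports sit on the one eswitch held by PF0 of controller 0.
eswitch_ports = [
    RepPort(0, 0), RepPort(0, 1),
    RepPort(1, 0), RepPort(1, 1),
    RepPort(1, 0, "vf", 3),
]

for p in eswitch_ports:
    print(p.label(), "external" if p.external else "local")
```

The point of the model is that the controller number is part of a port's identity, so PF0 of controller 0 and PF0 of controller 1 are distinct ports on the same eswitch.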

