netdev.vger.kernel.org archive mirror
From: Parav Pandit <parav@mellanox.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Pirko <jiri@resnulli.us>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"davem@davemloft.net" <davem@davemloft.net>,
	Yuval Avnery <yuvalav@mellanox.com>,
	"jgg@ziepe.ca" <jgg@ziepe.ca>,
	Saeed Mahameed <saeedm@mellanox.com>,
	"leon@kernel.org" <leon@kernel.org>,
	"andrew.gospodarek@broadcom.com" <andrew.gospodarek@broadcom.com>,
	"michael.chan@broadcom.com" <michael.chan@broadcom.com>,
	Moshe Shemesh <moshe@mellanox.com>, Aya Levin <ayal@mellanox.com>,
	Eran Ben Elisha <eranbe@mellanox.com>,
	Vlad Buslov <vladbu@mellanox.com>,
	Yevgeny Kliteynik <kliteyn@mellanox.com>,
	"dchickles@marvell.com" <dchickles@marvell.com>,
	"sburla@marvell.com" <sburla@marvell.com>,
	"fmanlunas@marvell.com" <fmanlunas@marvell.com>,
	Tariq Toukan <tariqt@mellanox.com>,
	"oss-drivers@netronome.com" <oss-drivers@netronome.com>,
	"snelson@pensando.io" <snelson@pensando.io>,
	"drivers@pensando.io" <drivers@pensando.io>,
	"aelior@marvell.com" <aelior@marvell.com>,
	"GR-everest-linux-l2@marvell.com"
	<GR-everest-linux-l2@marvell.com>,
	"grygorii.strashko@ti.com" <grygorii.strashko@ti.com>,
	mlxsw <mlxsw@mellanox.com>, Ido Schimmel <idosch@mellanox.com>,
	Mark Zhang <markz@mellanox.com>,
	"jacob.e.keller@intel.com" <jacob.e.keller@intel.com>,
	Alex Vesker <valex@mellanox.com>,
	"linyunsheng@huawei.com" <linyunsheng@huawei.com>,
	"lihong.yang@intel.com" <lihong.yang@intel.com>,
	"vikas.gupta@broadcom.com" <vikas.gupta@broadcom.com>,
	"magnus.karlsson@intel.com" <magnus.karlsson@intel.com>
Subject: Re: [RFC] current devlink extension plan for NICs
Date: Tue, 24 Mar 2020 05:36:08 +0000
Message-ID: <de01d429-6740-51a9-62e9-10ec54074041@mellanox.com>
In-Reply-To: <20200323123116.769e50e4@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On 3/24/2020 1:01 AM, Jakub Kicinski wrote:
> On Sat, 21 Mar 2020 09:07:30 +0000 Parav Pandit wrote:
>>> I see so you want the creation to be controlled by the same entity that
>>> controls the eswitch..
>>>
>>> To me the creation should be on the side that actually needs/will use
>>> the new port. And if it's not eswitch manager then eswitch manager
>>> needs to ack it.
>>>  
>>
>> There are few reasons to create them on eswitch manager system as below.
>>
>> 1. Creation and deletion on one system and synchronizing it with eswitch
>> system requires multiple back-n-forth calls between two systems.
>>
>> 2. When this happens, system where its created, doesn't know when is the
>> right time to provision to a VM or to a application.
>> udev/systemd/Network Manager and others such software might already
>> start initializing it doing DHCP but its switch side is not yet ready.
> 
> Networking software can deal with link down..
> 
Serving a half-cooked device to an application is just not going to
work. It is not just about link status.
A typical desired flow is:

1. create device
2. configure mac address
3. configure its rate limits
4. set up policy, encap/decap settings via tc offloads, etc.
5. bring up the link via rep
6. activate the device and attach it to application

Often the administrator wants to do (2), even though the user is free to
change it later on.

This can only work if the user system and the eswitch have a secure
channel established, which often is just not available.

In another use case, the user system is not trusted to define the networkId.
In this case there is some arbitrary/undefined wait time at the user host
before it knows that steps 2 to 6 are done.

Compare that with doing steps 1 to 6 on the eswitch side by a trusted
entity and then attaching the device to the system where it is to be used,
which is elegant and secure.
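
The ordering argued for above can be sketched as a toy model. This is only
an illustration, not devlink code: the class, method, and step names are
hypothetical, and the sketch assumes that all of steps 2 to 5 are required
before activation, which is a simplification.

```python
# Hypothetical names for steps 2-5 of the flow; not a real devlink API.
CONFIG_STEPS = {"set_mac", "set_rate_limits", "set_policy", "link_up_via_rep"}


class EswitchSideDevice:
    """Toy model of a device provisioned by a trusted eswitch-side entity."""

    def __init__(self):
        # Step 1: the device exists, but nothing is configured yet.
        self.configured = set()
        self.active = False

    def configure(self, step):
        # Steps 2-5: only known steps, and only before activation.
        if step not in CONFIG_STEPS:
            raise ValueError(f"unknown step: {step}")
        if self.active:
            raise RuntimeError("device already activated")
        self.configured.add(step)

    def activate(self):
        # Step 6: refuse to expose a half-configured device.
        missing = CONFIG_STEPS - self.configured
        if missing:
            raise RuntimeError(f"not ready, missing: {sorted(missing)}")
        self.active = True

    def visible_to_user_system(self):
        # The user system sees the device only once it is activated.
        return self.active
```

In this model the user system never has to guess how long to wait: the
device simply does not appear until the trusted side has finished.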

The networkId is read-only on the system where the device is deployed.

An application/vm/container may have one or more such devices that need
to be present for its whole lifetime, regardless of their link status.

link down => detach device from container/vm
link up => attach device to container/vm
is not right.
Hence port link status doesn't drive device status.
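
A minimal sketch of that separation, with hypothetical names (nothing here
is an existing kernel or devlink interface): attachment and link state are
tracked independently, and a link event touches only the link state.

```python
class AttachedDevice:
    """Toy model: attachment and link state are independent of each other."""

    def __init__(self):
        self.attached = False
        self.link_up = False

    def attach(self):
        # Done once, when the device is given to the container/vm.
        self.attached = True

    def detach(self):
        # Done once, when the container/vm goes away.
        self.attached = False

    def on_link_event(self, up):
        # Only the link flag changes; the attachment is left untouched.
        self.link_up = up
```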

>> So it is desired to make sure that once device is fully
>> ready/configured, its activated.
>>
>> 3. Additionally it doesn't follow mirror sequence during deletion when
>> created on host.
> 
> Why so? Surely host needs to request deletion, otherwise container
> given only an SF could be cut off?
> 
Creation from the user system:
(a) create device
(b) configure device
(c) synchronous kick to create rep on other system (involving sw)

Deletion from the user system should be:
(d) synchronous kick to delete rep on other system (involving sw)
(e) unconfig the device
(f) delete the device

To achieve this mirroring, sw synchronization between the two systems is
needed, not just synchronization with the device.
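
The mirrored ordering of (a)-(c) and (d)-(f) can be illustrated with a small
sketch. It only models the ordering, using Python's contextlib.ExitStack,
which runs registered callbacks in last-in, first-out order; the function
name and the log strings are hypothetical, and no real cross-system
synchronization is performed.

```python
from contextlib import ExitStack


def create_from_user_system(log):
    """Run steps (a)-(c) and return a stack that undoes them in reverse."""
    undo = ExitStack()
    log.append("(a) create device")
    undo.callback(log.append, "(f) delete the device")
    log.append("(b) configure device")
    undo.callback(log.append, "(e) unconfig the device")
    log.append("(c) kick to create rep")
    undo.callback(log.append, "(d) kick to delete rep")
    return undo


def demo():
    log = []
    undo = create_from_user_system(log)
    undo.close()  # callbacks run last-in, first-out: (d), then (e), then (f)
    return log
```

Within one process the mirror comes for free; the point above is that across
two systems each pair of steps needs explicit software synchronization.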

Even if this is achieved somehow, it doesn't address the issue of an
untrusted user system not having the privilege to create the device with
a given NetworkID.

>> 4. eswitch administrator simply doesn't have direct access to the system
>> where this device is used. So it just cannot be created there by eswitch
>> administrator.
> 
> Right, that is the point. It's the host admin that wants the new
> entity, so if possible it'd be better if they could just ask for it 
> via devlink rather than some cloud API. Not that I'm completely opposed
> to a cloud API - just seems unnecessary here.
> 

Flow is:
trusted_administrator->cloud_api->smartnic->devlink_create,config,deploy->get_device_on_user_system.

untrusted_user_system->query_network_id->attach_to_container/vm/application.
